The importance of visual inspection in docking studies #memo #journal #chemoinformatics

Many drug discovery projects use computational approaches, and docking is one of the major tools for predicting protein-ligand binding poses. As readers know, the docking score calculated by each docking program isn't very accurate, so we need to prioritize docking poses not only by score but also by other methods. I read a very interesting article about…


Install ChEMBL28 & rdkit cartridge #chemoinformatics #RDKit

ChEMBL 28 was released recently. It's good news for chemoinformaticians and time to update your chembldb ;) Of course I did it. At first I tried to build the PostgreSQL ChEMBL 28 database in my main conda env, but it was difficult to install rdkit-postgresql because of package conflicts. So I made a clean environment for postgresql/rdkit and…

Which is better for QSAR prediction, a graph-based or a descriptor-based model? #journal #memo #chemoinformatics

Many graph convolutional network (GCN) models are now applied to QSAR tasks instead of traditional descriptor-based models. The interesting point of a GCN, I think, is that it doesn't need feature engineering: during the learning process, the GCN learns molecular features from the given molecular graph. On the other hand, the descriptor-based model…

Conformer energy minimization with Openforcefield #OpenFF #RDKit

Last month, I posted about conformer generation code with RDKit. RDK_confgen can generate multiple conformers from a molfile, and it generates them with the MMFF94s force field. On the other hand, Open Force Field has recently become a very attractive package for this area, I think. Fortunately, OpenFF provides example code for conformer energy minimization as a CLI tool.

Conformer generation script with RDKit #RDKit #Chemoinformatics

As you know, conformer sampling is an important task not only for SBDD but also for LBDD, because many drug-like molecules have rotatable bonds and can therefore adopt many conformations. Recently, many tools for compound conformer generation have become available. Jean-Paul Ebejer et al. published a very useful article about conformer generation with…

Rationalization of non-additive SAR #memo #journal

In medicinal chemistry, chemists often make compound analogues that differ in only one part, optimizing R1, R2, etc. After each R group has been optimized, a combination strategy is tried because the combination is expected to have an additive effect on potency. But sometimes the combination strategy doesn't work well. To understand this phenomenon, structural information is important. And recently…
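The additivity assumption above can be written down as simple arithmetic. This is only a sketch, not the method of the article: the function names and the pIC50 values are hypothetical, and it assumes potency is expressed on a log scale (e.g. pIC50), where independent substituent effects would simply add.

```python
def expected_additive(p_ref, p_r1, p_r2):
    """Potency expected for the R1+R2 compound if both effects were additive.

    All values are on a log scale such as pIC50 (an assumption of this sketch).
    """
    return p_ref + (p_r1 - p_ref) + (p_r2 - p_ref)


def nonadditivity(p_ref, p_r1, p_r2, p_both):
    """Observed minus expected potency of the combined compound.

    A value near zero means the two changes behave additively; a clearly
    negative value means the combination is worse than expected.
    """
    return p_both - expected_additive(p_ref, p_r1, p_r2)


# Hypothetical pIC50 values, for illustration only.
parent, r1_only, r2_only, combined = 6.0, 7.0, 6.5, 7.0

deviation = nonadditivity(parent, r1_only, r2_only, combined)
print(f"expected: {expected_additive(parent, r1_only, r2_only):.1f}")
print(f"deviation from additivity: {deviation:+.1f}")
```

In this made-up case the combined compound is no better than the R1-only analogue, which is the kind of result where looking at structural information becomes worthwhile.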

Update announcement of shape-it #chemoinformatics #RDKit #shape-it

At the end of last year, I tried to update shape-it and shared my code on GitHub. My new version of shape-it supports Open Babel 3.x. After that, I got a really wonderful offer from @dr_greg_landrum. He modified the code so that it uses RDKit instead of Open Babel and can be used as a library instead of a command-line tool. It's…

Handle rdkit molobjects with String/BytesIO #RDKit #memo

This post has no new topics; it is just a memo for myself. I often want to read and write molecules ad hoc without writing them to files. RDKit has two kinds of SDF reader, SDMolSupplier and ForwardSDMolSupplier, and one SDF writer, SDWriter. SDWriter supports StringIO and ForwardSDMolSupplier supports…
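A minimal sketch of such an in-memory round trip, assuming RDKit is installed. SDWriter writes into a StringIO, and the text is then read back with ForwardSDMolSupplier through a BytesIO. Pairing the reader with BytesIO is my assumption, since the excerpt is cut off; the helper names and example SMILES are hypothetical.

```python
from io import BytesIO, StringIO

from rdkit import Chem


def mols_to_sdf_text(mols):
    """Write molecules to an SDF string without touching the file system."""
    buffer = StringIO()
    writer = Chem.SDWriter(buffer)
    for mol in mols:
        writer.write(mol)
    writer.flush()  # make sure everything has reached the StringIO
    return buffer.getvalue()


def sdf_text_to_mols(sdf_text):
    """Read molecules back from an SDF string through a bytes buffer."""
    supplier = Chem.ForwardSDMolSupplier(BytesIO(sdf_text.encode("utf-8")))
    return [mol for mol in supplier if mol is not None]


# Example molecules chosen only for illustration.
original = [Chem.MolFromSmiles(smi) for smi in ("CCO", "c1ccccc1")]
sdf_text = mols_to_sdf_text(original)
round_tripped = sdf_text_to_mols(sdf_text)
print([Chem.MolToSmiles(mol) for mol in round_tripped])
```

The same pattern should work for SDF content received from a web form or an API response, where no file exists in the first place.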

Chemical structure generation without GPU #chemoinformatics #STONED #rdkit

Generative models are very hot not only in computer vision and natural language processing but also in chemoinformatics. As you know, recent deep-learning-based compound generators work very well, but they require huge computing resources to build the model. SMILES-based approaches also sometimes generate invalid molecules. Recently I read a very interesting…

Python package for Automated Graph Learning #DeepLearning #GraphLearning #Chemoinformatics

I hope you had a great start to 2021! This is my first post of the new year! Many kinds of data can be represented as graphs, so graph-based deep learning (GL) is a very interesting area. In chemistry, a molecule can be represented as a graph, so GL is also an attractive method for chemoinformatics. I posted…

Embed molecular editor into Streamlit app #streamlit #chemoinformatics #RDKit

I wrote some posts about combining chemoinformatics and Streamlit. One was a predictive model application that used RDKit and scikit-learn. When I tweeted about it, Jan Jansen (who is a great quantum chemist; I met him at the RDKit UGM!!!) commented that it would be useful if a molecular drawer could be used in the app ;)

Deploying a chemoinformatics app with Streamlit #streamlit #RDKit #souyakuAC2020

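Hello everyone, how are you? I have somehow managed to get by without catching a cold. This is Iwatobipen, who has started struggling to get up on these cold mornings.

This year I had no material and was busy for some reason, so I did not plan to take part, but I wanted to contribute to the community at least a little, so I decided to offer a small topic. There is zero drug discovery flavor in it, so please bear with me.

Do you know Streamlit? There are articles about it on Qiita and elsewhere: it is a package that lets you build a data analysis app, complete with a good-looking UI, in Python alone. Combined with machine learning, it makes it easy to build a model in advance and serve a prediction app with it. A little while ago I posted an article about an app that combines Streamlit and RDKit.

Reading through the documentation afterwards, I found that Streamlit apps can be deployed on the web. To share an app, you need to link it with GitHub from the page below. The procedure is described here. The sample code is here. I covered the code in the previous post, but there are two changes. When you share an app, the environment is deployed on a VM provided by Streamlit. In requirements.txt you list the packages, in addition to the usual ones, that need to be installed with pip.

I was stuck for a while after that, because conda packages listed in requirements.txt are not installed. My beloved RDKit does not get installed if you write it there. Then I found the solution! Write the package names in conda.txt as shown below. If you want to specify channels, list them in conda_channels.txt. With rdkit alone the installation failed on the VM, so conda.txt also includes GCC.

With this groundwork in place, the Streamlit app can be distributed on the web and users no longer need to set up an environment. The actual result is here. When you access it, you pick a compound index from a pull-down menu, and the solubility prediction and the structure of the corresponding molecule are displayed.

You can mostly understand App.py by looking at GitHub, but it looks like this. File paths and the like were changed so that they are visible inside the VM. The molecule image is passed as a BytesIO object so that no intermediate file is written. I do not show it this time, but the same technique can embed Matplotlib figures as well. Streamlit's write accepts a Matplotlib Figure, so that should work without anything elaborate, but the trick may be useful in other cases, lol.

Installing RDKit on Google Colab and the like is a bit of a hassle, so when you build an app to share, I think Streamlit is nice because users can test it without building an environment.

The official documentation says that conda packages can be used but does not say how to specify them, which had me stuck. When I asked the Streamlit community, it was solved in an instant. Many thanks. An open-source package plus an active community is precious for data science. It was a light topic, but I hope it is of some use. That's all.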

Update shape-it and align-it #structure_align #chemoinformatics #OpenBabel3

Shape- and pharmacophore-based molecular alignment methods are often used not only in SBDD but also in LBDD projects. ROCS is one of the major tools for this, but it is a commercial package for non-academic users. Shape-it and Align-it, which are provided by silicos-it, are very useful open-source packages for molecular alignment. I posted about shape-it…

Make interactive web app with streamlit and RDKit #RDKit #streamlit

Recently @napoles3D shared very useful code that shows how to integrate RDKit and Streamlit. Here is the code. I am interested in integrating RDKit and Streamlit because Streamlit can make a web app easily, like a Jupyter notebook, without having to design and build a UI. So today, I would like to share an example of integrating RDKit and Streamlit.

Useful ML tool for chemoinformatics #chemoinformatics #RDKit #Machine learning

Yesterday, I moved my main PC from Ubuntu 18.04 to 20.04 LTS. Now it works well, and I'm building a new (clean) env for my coding. Today I would like to share a useful package for machine learning named PyCaret. A brief introduction to PyCaret is below (from the original site): PyCaret is an open-source, low-code machine learning library in Python that automates…

Update conda package of rdkit-postgresql #rdkit #postgresql

Yesterday, I enjoyed mishima.syk #16. Due to COVID-19, we moved the meeting to a virtual format, and it worked very well. As with RDKit UGM 2020, we used Zoom for presentations and Discord for discussion and chat. Discord was very useful and easy to use. Thanks to all participants and members of the community. The…

AutomatedSeriesClassification update #RDKit #chemoinformatics

It was an honor for me to have the opportunity to present at RDKit UGM 2020. My LT topic was automated chemical series classification with pure RDKit. I uploaded my slides to the UGM repo. After the UGM, I got a great offer from @cthoyt. He proposed converting this code into a package which can…

Tool for machine learning model logging #chemoinformatics #machine_learning

It is difficult to manage machine learning models because obtaining a good model requires many trials, called parameter optimization, and these generate lots of models. Optuna is one of the useful packages for model management and parameter optimization. I like it and have posted some code about Optuna. So today, I would like…

Similarity search with chemical cartridge for SQLite3 #rdkit #sqlite3 #chemicalite

Some days ago I posted about a chemical cartridge for SQLite named 'chemicalite'. In that post I wrote how to install chemicalite and how to run a substructure search, but I didn't write about similarity search with chemicalite. The original documentation describes how to make a fingerprint table and use it; however, I couldn't reproduce it with the same…
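To make the idea of a fingerprint table plus similarity search concrete, here is a toy sketch using only Python's built-in sqlite3 module. It is not the chemicalite API: the `tanimoto` SQL function is registered from Python, the table and column names are hypothetical, and the fingerprints are hand-made 64-bit vectors rather than real RDKit fingerprints.

```python
import sqlite3


def bits_to_blob(on_bits, n_bits=64):
    """Pack a collection of on-bit positions into a fixed-length bytes value."""
    value = 0
    for bit in on_bits:
        value |= 1 << bit
    return value.to_bytes(n_bits // 8, "little")


def tanimoto(blob_a, blob_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    a = int.from_bytes(blob_a, "little")
    b = int.from_bytes(blob_b, "little")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def similarity_search(conn, query_fp, threshold):
    """Return (name, similarity) rows at or above the threshold, best first."""
    sql = (
        "SELECT name, tanimoto(fp, :q) AS sim FROM fingerprints "
        "WHERE tanimoto(fp, :q) >= :t ORDER BY sim DESC"
    )
    return conn.execute(sql, {"q": query_fp, "t": threshold}).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("tanimoto", 2, tanimoto)  # expose the Python function to SQL
conn.execute("CREATE TABLE fingerprints (name TEXT PRIMARY KEY, fp BLOB)")
conn.executemany(
    "INSERT INTO fingerprints VALUES (?, ?)",
    [
        ("mol_a", bits_to_blob({1, 2, 3, 4})),
        ("mol_b", bits_to_blob({1, 2, 3, 5})),
        ("mol_c", bits_to_blob({7, 8})),
    ],
)

hits = similarity_search(conn, bits_to_blob({1, 2, 3, 4}), threshold=0.5)
print(hits)
```

A cartridge such as chemicalite does this kind of comparison inside SQLite with real fingerprints and indexes, so this scan over every row only shows the shape of the query, not how to make it fast.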