Molecular set profiling with pandas_profiling #RDKit

Molecular descriptors are good indicator for molecular profiling. Visualize and analyze these descriptors are important to have a bird’s-eye view of given molecules set.

I often use “pandas” and “seaborn” to do it. Seaborn is powerful tool to make cool visualization but difficult to obtain statistics data.

Yesterday, I found interesting tool to analyze pandas data frame named “pandas_profiling”.
It seems very easy to make analyze report. It can be installed with conda/pip.
https://github.com/pandas-profiling/pandas-profiling

Let’s install the package and use!
First, call library.

import os
import pandas
import pandas_profiling
import pandas as pd
from rdkit import Chem
from rdkit import RDConfig
from rdkit.Chem import rdBase 
from rdkit.Chem import Descriptors
from rdkit.Chem.Descriptors import _descList
from rdkit.ML.Descriptors import MoleculeDescriptors
# I used cdk2.sdf dataset as test.
datadir =  os.path.join( RDConfig.RDDocsDir, "Book/data/cdk2.sdf" )

Then calculate descriptors and make dataframe

data = {}
for name in desc_name:
    data[name] = []
for descs in descs_list:
    for i, desc in enumerate(descs):
        data[desc_name[i]].append(desc)
df = pd.DataFrame(data)

Let’s make report. It is very very easy!!!! ;-)

pandas_profiling.ProfileReport(df)

Then you can get analyze repot with bar chart.
Snap shots are below.

This package provides not only summary of the dataset but also details of the data. It seems very cool package isn’t it?
You can check whole code is following URL.
https://nbviewer.jupyter.org/github/iwatobipen/chemo_info/blob/master/rdkit_notebook/pandas_profiling_test.ipynb

Advertisements

mishima.sykに参加した話

第一回目からで通算12回目、振り返れば早5年続いているmishima.sykに参加してきました。
今回も演題が多岐にわたり、かつ、盛りだくさんでとても勉強になりました。参加された他の皆様にも何か得るものがあったことを願っております。
X線、結晶構造解析の話から、ターゲットFindingとKnimeの話、Open source/企業の関わり方、AI創薬の話題、公共DBの最新の話題、webRTCでデバイスとカメラを連動させたアプリ開発の話!うなぎーハモーアナゴのえ!それ知らないっす!って話題からの実践的なPythonの話、ポプテピピック創薬w、寿司と刺身の違い!
面白いだけでなく、先端の情報、サイエンスの情報が盛り込まれていて毎回プレゼンターすげえなって思うわけですよ。mishima.sykすごい。興味ある方は是非ともご参加くださいねw
懇親会も大変美味しい料理と、面白い話を色々と聞けて大満足の1日でした。

私のプレゼンとマテリアルは下記のURLからゲットできます。今回は強化学習の最初の話とLTでcyRESTの話をさせていただきました。LTはファイルを消してしまっていて一番の見せ場で失敗するという失態を犯しましたが、、、、
https://github.com/Mishima-syk/12

強化学習に関してはDQNが世界を沸かせたのも早昔、今では様々なアルゴリズムがどんどこでていますね。論文でてすぐ実装が上がりの繰り返し。何か応用を考えようと思っているうちに世界は先に進んでいる。エキサイティングだけど追っていくのがとても大変。というか私は追えてない。
コンピュータサイエンスはアルゴリズムの進化とマシンリソースがうまく合わさってどんどん進んでいる感じがします。そこで生まれた提案、デザインを現実のものとして生み出すProduction部分もそれについていかないと、全体としての加速は起こらないでしょう。まあAIが100発100中の回答を出せれば別ですが、まだ創薬に関して考えるとそれは難しいかなって思うわけです。これから数年先、メディスナルケミストの役割、姿ってどうなってるんでしょうね。などとあれこれ考えながら週末を終えそうです。

何はともあれ参加者、発表者のみなさまどうもありがとうございました。

New modalities in Drug Discovery #diary

Here is a nice review of recent new modalities in Drug Discovery.
https://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.8b00378

The article covered wide range of recent technologies.
1. Peptide based drug discovery not only synthetic but also venoms.
2. DELI.
3. New structure for drug discovery partnerships. In the section, the authors well documented about compound sharing (i.e. ELF) and risk-sharing, collection leasing partnerships and crowd sourcing. I am interested in compound sharing. Because there is a consortium for library in Japan named J-CLIC: Japan Compound Library ConsortiumJ-CLIC. The consortium joint purchase many compounds from supplier in pharmaceutical companies in Japan. It will be cost effective. I think this is one of the nice proposal in non competitive area.
Also crowdsourcing is interesting for me. It means open innovation!
4. Strategies for protein structure mimetics, stabilize alpha-helix or beta sheet. I did not know scaffold grafting technology. The technology is impressive for me.
5. In the section, the author introduced 2D combinatorial libraries and informa. This technology is used for direct RNA targeting by small molecules. Modulation of translation with small molecules is challenging I think, but this approach seems work well and well designed. Also PROTAC and miRNA, anti sense oligomer.

Figure 20 in the article is very nice summary about scope and limitation of these technologies.

There are many toolkit for drug discovery today. It is not limited only small molecules. The role of medicinal chemist is still expanding. Keep open my eyes and catch up new technology and science to develop new drug for human health.

mol2graph and graph2mol #rdkit #igraph

I posted about mol to graph object before.
In the blog post, I wrote example that convert RDKit mol object to igraph object. It was one way. There was no method igraph to rdkit mol object.
So I wrote very simple converter from graph to molecule.

First, import modules.

import numpy as np
import pandas as pd
import igraph
from rdkit import Chem
from rdkit.Chem.rdchem import RWMol
from rdkit.Chem import Draw
from rdkit.Chem import rdmolops
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.ipython_useSVG = True

Then define two way function, mol2graph and graph2mol. It is very simple.I did not sanitize process because I could not handle some compounds. RWMol method is very useful to do this work.

def mol2graph(mol):
    atoms_info = [ (atom.GetIdx(), atom.GetAtomicNum(), atom.GetSymbol()) for atom in mol.GetAtoms()]
    bonds_info = [(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType(), bond.GetBondTypeAsDouble()) for bond in mol.GetBonds()]
    graph = igraph.Graph()
    for atom_info in atoms_info:
        graph.add_vertex(atom_info[0], AtomicNum=atom_info[1], AtomicSymbole=atom_info[2])
    for bond_info in bonds_info:
        graph.add_edge(bond_info[0], bond_info[1], BondType=bond_info[2], BondTypeAsDouble=bond_info[3])
    return graph

def graph2mol(graph): 
    emol = RWMol()
    for v in graph.vs():
        emol.AddAtom(Chem.Atom(v["AtomicNum"]))
    for e in graph.es():
        emol.AddBond(e.source, e.target, e['BondType'])
    mol = emol.GetMol()
    return mol

Finally, I checked my function on jupyter notebook. And It worked well.

All code is uploaded my repo and can check from following URL. https://nbviewer.jupyter.org/github/iwatobipen/chemo_info/blob/master/rdkit_notebook/mol2g_g2mol.ipynb