Molecular set profiling with pandas_profiling #RDKit

Molecular descriptors are good indicators for molecular profiling. Visualizing and analyzing these descriptors is important for getting a bird’s-eye view of a given set of molecules.

I often use “pandas” and “seaborn” to do this. Seaborn is a powerful tool for making cool visualizations, but it is difficult to obtain summary statistics with it.

Yesterday, I found an interesting tool named “pandas_profiling” for analyzing pandas data frames.
It seems to make an analysis report very easily. It can be installed with conda or pip.

Let’s install the package and use it!
First, import the libraries.

import os
import pandas_profiling
import pandas as pd
from rdkit import Chem
from rdkit import RDConfig
from rdkit.Chem import rdBase
from rdkit.Chem import Descriptors
from rdkit.Chem.Descriptors import _descList
from rdkit.ML.Descriptors import MoleculeDescriptors
# I used the cdk2.sdf dataset as a test.
datadir = os.path.join(RDConfig.RDDocsDir, "Book/data/cdk2.sdf")

Then calculate the descriptors and make a data frame.

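# Assumed setup (missing from the post as extracted): load the molecules, compute all RDKit descriptors.
mols = [mol for mol in Chem.SDMolSupplier(datadir) if mol is not None]
desc_name = [desc[0] for desc in _descList]
calc = MoleculeDescriptors.MolecularDescriptorCalculator(desc_name)
descs_list = [calc.CalcDescriptors(mol) for mol in mols]
data = {}
for name in desc_name:
    data[name] = []
for descs in descs_list:
    for i, desc in enumerate(descs):
        data[desc_name[i]].append(desc)
df = pd.DataFrame(data)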
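
The dictionary loop is not strictly needed. Here is a minimal sketch, assuming a few SMILES strings as stand-ins for cdk2.sdf (the SMILES list and the `rows` variable are only for illustration): pandas accepts the list of descriptor tuples directly, and `describe()` gives quick per-column statistics.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem.Descriptors import _descList
from rdkit.ML.Descriptors import MoleculeDescriptors

# Stand-in molecules (assumption): ethanol, acetic acid, benzene.
smiles = ["CCO", "CC(=O)O", "c1ccccc1"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

# Descriptor names come from RDKit's built-in (name, function) list.
desc_name = [name for name, _func in _descList]
calc = MoleculeDescriptors.MolecularDescriptorCalculator(desc_name)

# CalcDescriptors returns one tuple per molecule, ordered like desc_name.
rows = [calc.CalcDescriptors(mol) for mol in mols]
df = pd.DataFrame(rows, columns=desc_name)

# Summary statistics for a few familiar columns.
print(df[["MolWt", "MolLogP", "TPSA"]].describe())
```

This gives the same kind of table as the loop version, with one row per molecule and one column per descriptor.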

Let’s make the report. It is very very easy!!!! ;-)


Then you get an analysis report with bar charts.
Snapshots are below.

This package provides not only a summary of the dataset but also details of the data. It seems like a very cool package, doesn’t it?
You can check the whole code at the following URL.



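We went from X-ray and crystal structure analysis to target finding and KNIME, how open source and companies relate to each other, AI drug discovery, the latest news on public databases, and developing an app that links a device with a camera via WebRTC! Then from the "eel, pike conger, conger eel... wait, I did not know that!" topic to practical Python talk, Pop Team Epic drug discovery (lol), and the difference between sushi and sashimi!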




New modalities in Drug Discovery #diary

Here is a nice review of recent new modalities in Drug Discovery.

The article covers a wide range of recent technologies.
1. Peptide-based drug discovery, covering not only synthetic peptides but also venoms.
2. DELI.
3. New structures for drug discovery partnerships. In this section, the authors give a good account of compound sharing (e.g. ELF), risk sharing, collection-leasing partnerships, and crowdsourcing. I am interested in compound sharing, because Japan has a library consortium named J-CLIC (Japan Compound Library Consortium). Through the consortium, pharmaceutical companies in Japan jointly purchase many compounds from suppliers, which should be cost effective. I think this is a nice proposal in a non-competitive area.
Crowdsourcing is also interesting to me. It means open innovation!
4. Strategies for protein structure mimetics that stabilize an alpha-helix or a beta-sheet. I did not know about scaffold grafting technology, and it impressed me.
5. In this section, the authors introduce 2D combinatorial libraries and informa. This technology is used for direct RNA targeting by small molecules. I think modulating translation with small molecules is challenging, but this approach seems to work well and to be well designed. PROTACs, miRNA, and antisense oligomers are also covered.

Figure 20 in the article is a very nice summary of the scope and limitations of these technologies.

There are many toolkits for drug discovery today, and they are not limited to small molecules. The role of the medicinal chemist is still expanding. I will keep my eyes open and catch up on new technology and science to develop new drugs for human health.

mol2graph and graph2mol #rdkit #igraph

I posted about converting a mol to a graph object before.
In that blog post, I wrote an example that converts an RDKit mol object to an igraph object. It was one way: there was no method to go from igraph back to an RDKit mol object.
So I wrote a very simple converter from graph to molecule.

First, import modules.

import numpy as np
import pandas as pd
import igraph
from rdkit import Chem
from rdkit.Chem.rdchem import RWMol
from rdkit.Chem import Draw
from rdkit.Chem import rdmolops
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.ipython_useSVG = True

Then define the two-way functions, mol2graph and graph2mol. They are very simple. I did not include a sanitization step because I could not handle some compounds. The RWMol class is very useful for this work.

def mol2graph(mol):
    atoms_info = [ (atom.GetIdx(), atom.GetAtomicNum(), atom.GetSymbol()) for atom in mol.GetAtoms()]
    bonds_info = [(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType(), bond.GetBondTypeAsDouble()) for bond in mol.GetBonds()]
    graph = igraph.Graph()
    for atom_info in atoms_info:
        graph.add_vertex(atom_info[0], AtomicNum=atom_info[1], AtomicSymbole=atom_info[2])
    for bond_info in bonds_info:
        graph.add_edge(bond_info[0], bond_info[1], BondType=bond_info[2], BondTypeAsDouble=bond_info[3])
    return graph

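def graph2mol(graph):
    emol = RWMol()
    # Vertices become atoms, in index order.
    for v in graph.vs():
        emol.AddAtom(Chem.Atom(v["AtomicNum"]))
    # Edges become bonds with the stored RDKit bond type.
    for e in graph.es():
        emol.AddBond(e.source, e.target, e['BondType'])
    mol = emol.GetMol()
    return mol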

Finally, I checked my functions in a Jupyter notebook, and they worked well.

All the code is uploaded to my repo and can be checked at the following URL.