AMES classification with WL graph kernel #RDKit

I often feel it difficult for me to implement algorithm from zero-base… I need to more practice. ;-)
BTW, recently I can find many articles about application of graph theory for chemoinformatics.
I found some interesting articles and they provides useful packages in github!

Today, I tried a library named Grakel.
You can find original article from the URL below.

I used the package and compared the performance to traditional SVC. Open AMES dataset is used for following test.
My code is below. The Grakel package has many algorithms and easy to use for calculation of graph kernel. I calculated WL graph kernel with Adjacency matrix from RDKit and built predictive model. At the same time, I built tradicional SVC model with ECFP4(Morgan Finger print radi=2).

To compare the results, it is interesting for me that WL graph kernel worked well even if the kernel does not have details for the molecules such as charge, num of hydrogen etc.
Is it means that Graph based model is powerful? This is only one experience for the descriptor.
I would like to try any other dataset.

These model is based on feature of ligand and not include protein information. For the real world, drug discovery process is needed many informations not only ligands, but also proteins.

I would like to know possibility of graph based approach for chemoinformatics.

from grakel import GraphKernel
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
import numpy as np
import argparse
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def getparser():
    parser = argparse.ArgumentParser('argparser')
    parser.add_argument('input', help='file path and name of input')
    parser.add_argument('prop', help='properties for predict')
    return parser.parse_args()

def molg_from_smi(smiles):
    mol = Chem.MolFromSmiles(smiles)
    atom_with_idx = { i:atom.GetSymbol() for i, atom in enumerate(mol.GetAtoms())}
    adj_m = Chem.GetAdjacencyMatrix(mol, useBO=True).tolist()
    return [adj_m, atom_with_idx]

def molg_from_rdkit(mol):
    atom_with_idx = { i:atom.GetSymbol() for i, atom in enumerate(mol.GetAtoms())}
    adj_m = Chem.GetAdjacencyMatrix(mol, useBO=True).tolist()
    return [adj_m, atom_with_idx]

def mol2fp(mol):
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2)
    arr = np.zeros((1,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

if __name__=='__main__':
    args = getparser()
    mols = [mol for mol in Chem.SDMolSupplier(args.input) if mol != None]
    X = [molg_from_rdkit(mol) for mol in mols]
    Ames_dict = {'mutagen':1, 'nonmutagen':0}
    Y = [ Ames_dict[mol.GetProp('Ames test categorisation')] for mol in mols]
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y)

    gk = GraphKernel(kernel=[{"name": "weisfeiler_lehman", "niter": 5},{"name":"subtree_wl"}], normalize=True)
    K_train = gk.fit_transform(X_train)
    K_test = gk.transform(X_test)

    gclf = SVC(kernel='precomputed'), Y_train)
    y_pred_g = gclf.predict(K_test)

    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    rep = classification_report(Y_test, y_pred_g)
    print('WL graph kernel')
    print(confusion_matrix(Y_test, y_pred_g))


    X = [mol2fp(mol) for mol in mols]
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
    clf = SVC(C=20.), Y_train)
    y_pred = clf.predict(X_test)
    rep = classification_report(Y_test, y_pred)
    print(confusion_matrix(Y_test, y_pred))

WL graph kernel
[[381 125]
[ 85 494]]
precision recall f1-score support

      0       0.82      0.75      0.78       506
      1       0.80      0.85      0.82       579

avg / total 0.81 0.81 0.81 1085

[[381 110]
[111 483]]
precision recall f1-score support

      0       0.77      0.78      0.78       491
      1       0.81      0.81      0.81       594

avg / total 0.80 0.80 0.80 1085

real 0m40.446s
user 1m49.922s
sys 0m3.074s


mol2graph and graph2mol #rdkit #igraph

I posted about mol to graph object before.
In the blog post, I wrote example that convert RDKit mol object to igraph object. It was one way. There was no method igraph to rdkit mol object.
So I wrote very simple converter from graph to molecule.

First, import modules.

import numpy as np
import pandas as pd
import igraph
from rdkit import Chem
from rdkit.Chem.rdchem import RWMol
from rdkit.Chem import Draw
from rdkit.Chem import rdmolops
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.ipython_useSVG = True

Then define two way function, mol2graph and graph2mol. It is very simple.I did not sanitize process because I could not handle some compounds. RWMol method is very useful to do this work.

def mol2graph(mol):
    atoms_info = [ (atom.GetIdx(), atom.GetAtomicNum(), atom.GetSymbol()) for atom in mol.GetAtoms()]
    bonds_info = [(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType(), bond.GetBondTypeAsDouble()) for bond in mol.GetBonds()]
    graph = igraph.Graph()
    for atom_info in atoms_info:
        graph.add_vertex(atom_info[0], AtomicNum=atom_info[1], AtomicSymbole=atom_info[2])
    for bond_info in bonds_info:
        graph.add_edge(bond_info[0], bond_info[1], BondType=bond_info[2], BondTypeAsDouble=bond_info[3])
    return graph

def graph2mol(graph): 
    emol = RWMol()
    for v in graph.vs():
    for e in
        emol.AddBond(e.source,, e['BondType'])
    mol = emol.GetMol()
    return mol

Finally, I checked my function on jupyter notebook. And It worked well.

All code is uploaded my repo and can check from following URL.

Make Drug central ER diagram with python #chemoinfo

Recently I knew useful database “DrugCentral“.
From About.
DrugCentral provides information on active ingredients chemical entities, pharmaceutical products, drug mode of action, indications, pharmacologic action. We monitor FDA, EMA, and PMDA for new drug approval on regular basis to ensure currency of the resource.

By using the site, user can search many information on web browser. And also the site provides posgresql dump file with all data of DrugCentral.
I had interest the data so I got dump file and use it.
Afte download the dump file, I made local db in my postgres env and install the db.

iwatobipen$ psql -U postgres
postgres=# create database drugcentral;
postgres=# \q
iwatobipen$ psql drugcental < drugcentral.dump.08292017.sql

OK now I made local drugcentral db.
Next, I would like to know the structure of the database, schema. I could not find the schema in the site but I found good library named "eralchemy".
ERAlchemy generates Entity Relation (ER) diagram (like the one below) from databases or from SQLAlchemy models. I installed the package via pip and made ER diagram. ;-)

iwatobipen$ pip install eralchemy
iwatobipen$ eralchemy -i 'postgresql+psycopg2://postgres@' -o er.pdf

Second code of above generates ER diagram as PDF format. Let's check it.
Good! ;-)

First extract smiles and DDI risk.

iwatobipen$ psql -U postgres -D drugcentral
                                                 smiles                                                 |                    description                     |  ddi_risk
 OC[C@H]1N[C@H]([C@H](O)[C@@H]1O)C1=CNC2=C1NC=NC2=O                                                     | FLUCONAZOLE/OSPEMIFENE [VA Drug Interaction]       | Significant
 CO[C@@H]1[C@@H](C[C@H]2O[C@]1(C)N1C3=CC=CC=C3C3=C4CNC(=O)C4=C4C5=CC=CC=C5N2C4=C13)N(C)C(=O)C1=CC=CC=C1 | MERCAPTOPURINE/TOFACITINIB [VA Drug Interaction]   | Critical
 CCS(=O)(=O)N1CC(CC#N)(C1)N1C=C(C=N1)C1=NC=NC2=C1C=CN2                                                  | FLUDROCORTISONE/RISPERIDONE [VA Drug Interaction]  | Significant
 CNCC1=CC=C(C=C1)C1=C2CCNC(=O)C3=CC(F)=CC(N1)=C23                                                       | ARIPIPRAZOLE/HYDROCORTISONE [VA Drug Interaction]  | Significant
 OC(CNC1=CC=CC=N1)C1=CC=CC=C1                                                                           | CISAPRIDE/ZIPRASIDONE [VA Drug Interaction]        | Significant
 C#CC1=CC=CC(NC2=NC=NC3=C2C=C2OCCOCCOCCOC2=C3)=C1                                                       | ARIPIPRAZOLE/PHENYTOIN [VA Drug Interaction]       | Significant
 CN1CCN(CC1)C1=CC=C(NC2=NC3=C(SC=C3)C(OC3=CC=CC(NC(=O)C=C)=C3)=N2)C=C1                                  | ARIPIPRAZOLE/PREDNISOLONE [VA Drug Interaction]    | Significant
 OB1OCC2=CC(OC3=CC=C(C=C3)C#N)=CC=C12                                                                   | ARIPIPRAZOLE/FLUDROCORTISONE [VA Drug Interaction] | Significant
 CC1=CN(C=N1)C1=CC(NC(=O)C2=CC=C(C)C(NC3=NC=CC(=N3)C3=CN=CC=N3)=C2)=CC(=C1)C(F)(F)F                     | AMOBARBITAL/RISPERIDONE [VA Drug Interaction]      | Significant

Second extract smiles and mode of action.

                                     smiles                                     |           action_type            |                                                                          description
 COC1=C2OC=CC2=CC2=C1OC(=O)C=C2                                                 | PHARMACOLOGICAL CHAPERONE        | Pharmaceutical chaperones may help stabilize the protein structure thereby restoring folding and/or preventing misfolding of the protein
 FC1=CNC(=O)NC1=O                                                               | MINIMUM INHIBITORY CONCENTRATION | The lowest concentration of an antimicrobial that will inhibit the visible growth of a microorganism
 CC(=O)OC[C@H]1O[C@H]([C@H](OC(C)=O)[C@@H]1OC(C)=O)N1N=CC(=O)NC1=O              | ANTIBODY BINDING                 | Antibody binding activity
 CCCCN1CCCC[C@H]1C(=O)NC1=C(C)C=CC=C1C                                          | ANTAGONIST                       | Binds to a receptor and prevents activation by an agonist through competing for the binding site
 COC(=O)C1=C(C)NC(C)=C([C@H]1C1=CC(=CC=C1)[N+]([O-])=O)C(=O)OCCN(C)CC1=CC=CC=C1 | ANTISENSE INHIBITOR              | Prevents translation of a complementary mRNA sequence through binding to it
 CCOC(=O)C1=C(C)NC(C)=C([C@@H]1C1=CC(=CC=C1)[N+]([O-])=O)C(=O)OC                | BINDING AGENT                    | Binds to a substance such as a cell surface antigen, targetting a drug to that location, but not necessarily affecting the functioning of the substance itself
 C[C@@H](CCC1=CC=C(O)C=C1)NCCC1=CC=C(O)C(O)=C1                                  | MODULATOR                        | Effects the normal functioning of a protein in some way e.g., mixed agonist/antagonist or unclear whether action is positive or negative
 NC1=NC2=NC=C(CNC3=CC=C(C=C3)C(=O)N[C@@H](CCC(O)=O)C(O)=O)N=C2C(N)=N1           | POSITIVE MODULATOR               | Positively effects the normal functioning of a protein e.g., receptor agonist, positive allosteric modulator, ion channel activator
 NCC1=CC=C(C=C1)C(O)=O                                                          | PROTEOLYTIC ENZYME               | Hydrolyses a protein substrate through enzymatic reaction
 OC(=O)CCCC1=CC=CC=C1                                                           | SUBSTRATE                        | Carried by a transporter, possibly competing with the natural substrate for transport
(10 rows)

This is very limited example of the DB. If reader who interested in the DB how about play and analyze with the DB? And ERAlchemy is very useful!!!
* To use ERAlchemy with postgresql, you need to install psycopg2 at first.

Make MMP network and send to cytoscape #chemoinfo

Recently I use cytoscape in my laboratory. You know Cytoscape is nice tool for network visualization.
I often make data with python and import data from cytoscape. The work flow is not so bad but I am thinking that it will be nice if python can communicate with cytoscape.
Fortunately cytocape has REST plugin called cyREST and also python has py2cytoscape to do it!
It sounds nice. I tried to use these libraries.
At first I installed cyREST to my cytoscape (v3.5.1). appmanager => cyREST >
And also I installed chemviz for drawing chemical structure in cytoscape.
Then install py2cytoscape via pip. ;-)
You can access localhost:1234/v1 from web blowser when cytoscape launched if cyREST is successfully installed.

I drew simple MMP network. Code is below.
First, I made MMP from SMILES file by using RDKit MMP script.

$ cat testdata.smi
Oc1ccccc1 phenol
Oc1ccccc1O catechol
Oc1ccccc1N 2-aminophenol
Oc1ccccc1Cl 2-chlorophenol
Nc1ccccc1N o-phenylenediamine
Nc1cc(O)ccc1N amidol
Oc1cc(O)ccc1O hydroxyquinol
Nc1ccccc1 phenylamine
C1CCCC1N cyclopentanol
$ python < testdata.smi> testdata.frag
$ python < testdata.frag > testmmp.txt -r 0.2
$ cat testmmp.txt

Now I got MMP data I used the data to make edge of my network and testdata.smi is used to make node data.

Next code is example for communication between python and cytoscape.
At first, import CyRestClient and make connection. Default URL is localhost and port is 1234. But if user would like to use another IP and Port, user can modify from cytoscape.
Edit => Preferences => add => rest.url xxxxx, rest.port xxxx

I used python-igraph for making graph but py2cytoscape handle data generated by networkx, geohi and something.

import igraph
from import CyRestClient
import py2cytoscape

cy = CyRestClient()
G = igraph.Graph()

with open('testdata.smi', 'r') as vertexis:
    for v in vertexis:
        G.add_vertex(v.split(' ')[0], molname=v.split(' ')[1])

with open("testmmp.txt", "r") as edges:
    for edge in edges:
        G.add_edge(edge.split(",")[0], edge.split(",")[1], transform=edge.split(",")[4])

After making network, go to next step.
network_create_from_igraph method receives data from igraph and send to cytoscape.
Then I set network layout ‘force-directed’.
Finally I set some view style and update the graph settings.

g_cy =
cy.layout.apply(name='force-directed', network=g_cy)

mystyle ='mystyle')

defaults = {
    'NODE_HIGHT': 100,
    'NODE_WIDTH': 100,
    'NODE_LABEL_COLOR': '#555555',
    'EDGE_WIDTH': 20,

mystyle.update_defaults(defaults), network=g_cy)

View screenshots.
Before run the code, there is no network in cytoscape.

After run the code I could see MMP network without chemical structure.

Finally I set chemviz setting and run paint structure from menu, I could see structure on each node.

And also each node and edge has their own attribute that is set by igraph.
It interesting and useful because all work is done by using only python!

This example is one way python => cytoscape. But the library can send data in both directions.
There are nice documents written in Japanese such like a following URL.

inter and intra reaction handling in RDKit #RDKit

RDKit can handle reaction. Enumeration of many molecules with template reaction and building blocks are useful for library generation.
Recently I have a question about how to handle intramolecular reactions with RDKit such as micro cyclization etc.
In the case of amidation reaction that is often used for drug synthesis SMARTS query is below.
The query is inter molecular reaction in RDKit. A + B => C. So this query can not apply to intramolecular reaction such like a “OC(=O)CCCCN => C(=O)1CCCCN1”
In the RDKit, intramolecular reaction query is represented by including reactants in parentheses.
You can found in the document.

By the way how to distinguish between intra and inter reaction in my code?🤔

I propose simple solution, multiple SMILES is handled as one mol object and perform reaction and then separate it.
OK my english is difficult to understand, let’s go to code.

In the blog post, I wrote an example for amidation.

from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw

# define intra and inter molecular reaction
intra_rxn = AllChem.ReactionFromSmarts('([C:1][C:2](=[O:6])[O:3].[N:4][C:5])>>[C:1][C:2](=[O:6])[N:4][C:5]')
inter_rxn = AllChem.ReactionFromSmarts('[C:1][C:2](=[O:6])[O:3].[N:4][C:5]>>[C:1][C:2](=[O:6])[N:4][C:5]')

# basic acid / amine
acid = Chem.MolFromSmiles('CC(=O)O')
amine = Chem.MolFromSmiles('NC')
# intramolecular
aminoacid =  Chem.MolFromSmiles('N(C)CCC(O)CC(=O)O')

# two molecules in one mol object!
combmol = Chem.MolFromSmiles("CC(=O)O.N1CCC(C)1")

In the case of A + B => C is below

inter_rxn.RunReactants([acid, amine])[0][0]

In this case reaction seems good. By the way, in case of intra reaction is below.

# Intra reaction can not represent with inter molecular reactoin query

But intra SMIRKS query works fine.

#inra moleclar query works fine

Next run the reaction with one mol object in two molecules and intra molecular reaction object.

# paired molecular object also works but reactant two molecules is handled as one object

Finally combined molecules are separated with Sep_mol function. The function convert molecule to SMILES and split by ‘.’ then transforms SMILES to molecules.

def sepMol(mol):
    smi_list = Chem.MolToSmiles(mol).split('.')
    mols = [ Chem.MolFromSmiles(smi) for smi in smi_list]
    return mols
ms = sepMol(combmol)
# out

I tried the function only amidation and do not know whether the method is efficient or not.
Any comment or advice is appreciated.

All my code can check from following URL.

RDKit 2018.03.01 release! #rdkit

Dear RDKitter,
It’s good news that new version of rdkit is released!
You can find details in original repository.
There are many improvement and bug fixes in the release. I appreciate developers work!
Recent version of RDKit has lots of 3D descriptors. PMI/NPR.
And new version of rdkit has new function “ComputePrincipalAxesAndMoments” that computes principal axes and moments of inertia for a conformer.
I tried to use the function. ;-)
At first I installed new version of rdkit via conda command.

iwatobipen$ conda install -c rdkit rdkit=

Then wrote code.

from pprint import pprint
from rdkit import Chem
from rdkit import rdBase
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
# 2018.03.1
mol = Chem.AddHs(mol)
#generate 3d conf
AllChem.EmbedMolecule(mol, useExpTorsionAnglePrefs=True, useBasicKnowledge=True)
#calc PMI and NPR with default function
npr1 = AllChem.CalcNPR1(mol)
npr2 = AllChem.CalcNPR2(mol)
pprint([pmi1, pmi2, pmi3,"##",npr1, npr2])

# Next use new function
from rdkit.Chem import rdMolTransforms as rdmt
(array([[ 0.98500524,  0.17113466,  0.02185434],
        [-0.1185898 ,  0.57961538,  0.8062149 ],
        [-0.1253042 ,  0.7967176 , -0.59121901]]),
 array([ 30.83932592, 203.65737152, 218.01591889]))

There are differences in both function. Why ??? I checked source code but I can not understand the reason.
Most difference is scale, but not equal.

print([484.9333356244745/30.83932592, 2954.3752956989097/203.65737152, 3158.6098121028335/218.01591889])
[15.724511517613434, 14.506596415582127, 14.487977887965654]

RDKit 3D conformer generator shows good performance as same as commercial packages. It indicates 3D descriptors of RDKit is useful in Computer Aided Drug Discovery I think.

At last, I attached today’s snippet as PDF files.

Install indigo tool kit to OSX and make python wrapper #Indigo #chemoinfo

I am not familiar with Indigo TK. I only have used Indigo TK via Knime.
Indigo TK provides python wrapper, so if I can build indigo TK from source and python wrapper all task can do only python. ( for me )
That sounds nice. So I tried to install Indigo TK from source. It was easy to install to Linux but there are some problem against OSX. Not so big problem.
OK let’s start.
At first down load files from git repository.

iwatobipen$ git clone
iwatobipen$ Indigo

My OSX version is 10.13 Serria, indigo install script is not supported the version. But I changed some lines of cmake file and succeeded the installation

iwatobipen$ vim common/cmake/MacFrameworks.cmake
# comment out line of 8 and hard coded SDK version
8 #set(FRAMEWORK_PATH /Applications/${SSNAME}.sdk/System/Library/Frameworks)
9 set(FRAMEWORK_PATH /Applications/

iwatobipen$ vim common/cmake/SetBuildParameters.cmake
#comment out of line 57 and hard corded version.
57 # set(CMAKE_OSX_SYSROOT /Applications/${SDK_SUBSYSTEM_NAME}.sdk)
58 set(CMAKE_OSX_SYSROOT /Applications/

Ready! ;-)
Let’s run install script.

iwatobipen$ python build_scripts/ --preset=mac-universal
iwatobipen$ python build_scripts/ --preset=mac-universal

After installation I can call some commands indigo-cano, indigo-deco, indigo-depict.
Next make python wrapper! If you are familiar for Java, you can make java wrapper with same way.

iwatobipen$ build_scripts\ --type=pyhon

I can found zipped python wrapper in dist folder. ;-)
move to dist/indigo-python…
And launch ipython.
Indigo TK has atom mapping function. It is useful because RDKit does not has such function I think.

from indigo import Indigo, IndigoObject
mol = Indigo().loadMolecule('c1ccccc1C')
# 92.1384220123291
rxn = indigo.loadReaction('CCBr.N(C)C>>CCN(C)C')
# '[CH3:1][CH2:2]Br.[NH:4]([CH3:6])[CH3:5]>>[CH3:1][CH2:2][N:4]([CH3:6])[CH3:5]'

In summary Indigo TK is useful for chemoinformatics.
If reader who is interested in indigo TK please try to use command tool and Knime node.
I appreciate comments and/or advice.