mol2graph and graph2mol #rdkit #igraph

I posted about converting a mol object to a graph object before.
In that blog post, I wrote an example that converts an RDKit mol object to an igraph object. It was one way only: there was no method to convert an igraph object back to an RDKit mol object.
So I wrote a very simple converter from graph to molecule.

First, import modules.

import numpy as np
import pandas as pd
import igraph
from rdkit import Chem
from rdkit.Chem.rdchem import RWMol
from rdkit.Chem import Draw
from rdkit.Chem import rdmolops
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.ipython_useSVG = True

Then define the two conversion functions, mol2graph and graph2mol. They are very simple. I skipped the sanitization step because I could not handle some compounds with it. RWMol is very useful for this work.

def mol2graph(mol):
    atoms_info = [ (atom.GetIdx(), atom.GetAtomicNum(), atom.GetSymbol()) for atom in mol.GetAtoms()]
    bonds_info = [(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType(), bond.GetBondTypeAsDouble()) for bond in mol.GetBonds()]
    graph = igraph.Graph()
    for atom_info in atoms_info:
        graph.add_vertex(atom_info[0], AtomicNum=atom_info[1], AtomicSymbole=atom_info[2])
    for bond_info in bonds_info:
        graph.add_edge(bond_info[0], bond_info[1], BondType=bond_info[2], BondTypeAsDouble=bond_info[3])
    return graph

def graph2mol(graph): 
    emol = RWMol()
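    # Add one atom per vertex, in vertex order, so atom indices match vertex indices.
    for v in graph.vs:
        emol.AddAtom(Chem.Atom(v['AtomicNum']))
    # Add one bond per edge, reusing the stored RDKit bond type.
    for e in graph.es:
        emol.AddBond(e.source, e.target, e['BondType'])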
    mol = emol.GetMol()
    return mol

Finally, I checked my functions in a Jupyter notebook, and they worked well.

All the code is uploaded to my repo and can be checked at the following URL.

Make Drug central ER diagram with python #chemoinfo

Recently I learned about a useful database, “DrugCentral”.
From the About page:
DrugCentral provides information on active ingredients chemical entities, pharmaceutical products, drug mode of action, indications, pharmacologic action. We monitor FDA, EMA, and PMDA for new drug approval on regular basis to ensure currency of the resource.

On the site, users can search a lot of information from a web browser. The site also provides a PostgreSQL dump file with all the data of DrugCentral.
I was interested in the data, so I got the dump file and used it.
After downloading the dump file, I made a local db in my postgres environment and loaded the dump into it.

iwatobipen$ psql -U postgres
postgres=# create database drugcentral;
postgres=# \q
iwatobipen$ psql drugcentral < drugcentral.dump.08292017.sql

OK, now I have a local drugcentral db.
Next, I wanted to know the structure of the database, i.e. its schema. I could not find the schema on the site, but I found a good library named "eralchemy".
ERAlchemy generates Entity Relation (ER) diagrams from databases or from SQLAlchemy models. I installed the package via pip and made an ER diagram. ;-)

iwatobipen$ pip install eralchemy
iwatobipen$ eralchemy -i 'postgresql+psycopg2://postgres@' -o er.pdf

The second command above generates the ER diagram in PDF format. Let's check it.
Good! ;-)

First, extract SMILES and DDI risk.

iwatobipen$ psql -U postgres -d drugcentral
                                                 smiles                                                 |                    description                     |  ddi_risk
 OC[C@H]1N[C@H]([C@H](O)[C@@H]1O)C1=CNC2=C1NC=NC2=O                                                     | FLUCONAZOLE/OSPEMIFENE [VA Drug Interaction]       | Significant
 CO[C@@H]1[C@@H](C[C@H]2O[C@]1(C)N1C3=CC=CC=C3C3=C4CNC(=O)C4=C4C5=CC=CC=C5N2C4=C13)N(C)C(=O)C1=CC=CC=C1 | MERCAPTOPURINE/TOFACITINIB [VA Drug Interaction]   | Critical
 CCS(=O)(=O)N1CC(CC#N)(C1)N1C=C(C=N1)C1=NC=NC2=C1C=CN2                                                  | FLUDROCORTISONE/RISPERIDONE [VA Drug Interaction]  | Significant
 CNCC1=CC=C(C=C1)C1=C2CCNC(=O)C3=CC(F)=CC(N1)=C23                                                       | ARIPIPRAZOLE/HYDROCORTISONE [VA Drug Interaction]  | Significant
 OC(CNC1=CC=CC=N1)C1=CC=CC=C1                                                                           | CISAPRIDE/ZIPRASIDONE [VA Drug Interaction]        | Significant
 C#CC1=CC=CC(NC2=NC=NC3=C2C=C2OCCOCCOCCOC2=C3)=C1                                                       | ARIPIPRAZOLE/PHENYTOIN [VA Drug Interaction]       | Significant
 CN1CCN(CC1)C1=CC=C(NC2=NC3=C(SC=C3)C(OC3=CC=CC(NC(=O)C=C)=C3)=N2)C=C1                                  | ARIPIPRAZOLE/PREDNISOLONE [VA Drug Interaction]    | Significant
 OB1OCC2=CC(OC3=CC=C(C=C3)C#N)=CC=C12                                                                   | ARIPIPRAZOLE/FLUDROCORTISONE [VA Drug Interaction] | Significant
 CC1=CN(C=N1)C1=CC(NC(=O)C2=CC=C(C)C(NC3=NC=CC(=N3)C3=CN=CC=N3)=C2)=CC(=C1)C(F)(F)F                     | AMOBARBITAL/RISPERIDONE [VA Drug Interaction]      | Significant

Second, extract SMILES and mode of action.

                                     smiles                                     |           action_type            |                                                                          description
 COC1=C2OC=CC2=CC2=C1OC(=O)C=C2                                                 | PHARMACOLOGICAL CHAPERONE        | Pharmaceutical chaperones may help stabilize the protein structure thereby restoring folding and/or preventing misfolding of the protein
 FC1=CNC(=O)NC1=O                                                               | MINIMUM INHIBITORY CONCENTRATION | The lowest concentration of an antimicrobial that will inhibit the visible growth of a microorganism
 CC(=O)OC[C@H]1O[C@H]([C@H](OC(C)=O)[C@@H]1OC(C)=O)N1N=CC(=O)NC1=O              | ANTIBODY BINDING                 | Antibody binding activity
 CCCCN1CCCC[C@H]1C(=O)NC1=C(C)C=CC=C1C                                          | ANTAGONIST                       | Binds to a receptor and prevents activation by an agonist through competing for the binding site
 COC(=O)C1=C(C)NC(C)=C([C@H]1C1=CC(=CC=C1)[N+]([O-])=O)C(=O)OCCN(C)CC1=CC=CC=C1 | ANTISENSE INHIBITOR              | Prevents translation of a complementary mRNA sequence through binding to it
 CCOC(=O)C1=C(C)NC(C)=C([C@@H]1C1=CC(=CC=C1)[N+]([O-])=O)C(=O)OC                | BINDING AGENT                    | Binds to a substance such as a cell surface antigen, targetting a drug to that location, but not necessarily affecting the functioning of the substance itself
 C[C@@H](CCC1=CC=C(O)C=C1)NCCC1=CC=C(O)C(O)=C1                                  | MODULATOR                        | Effects the normal functioning of a protein in some way e.g., mixed agonist/antagonist or unclear whether action is positive or negative
 NC1=NC2=NC=C(CNC3=CC=C(C=C3)C(=O)N[C@@H](CCC(O)=O)C(O)=O)N=C2C(N)=N1           | POSITIVE MODULATOR               | Positively effects the normal functioning of a protein e.g., receptor agonist, positive allosteric modulator, ion channel activator
 NCC1=CC=C(C=C1)C(O)=O                                                          | PROTEOLYTIC ENZYME               | Hydrolyses a protein substrate through enzymatic reaction
 OC(=O)CCCC1=CC=CC=C1                                                           | SUBSTRATE                        | Carried by a transporter, possibly competing with the natural substrate for transport
(10 rows)

This is a very limited example of what the DB holds. If you are interested in the DB, how about playing with it and analyzing it? And ERAlchemy is very useful!!!
* To use ERAlchemy with PostgreSQL, you need to install psycopg2 first.

Do rapid SAR iteration!

I am now participating in JCUP, and it is exciting for me. Thanks to growing computer performance, such as GPU computing, in silico technology has become a very powerful method in drug discovery.
The DMTA cycle is also moving to the next stage. A recent publication from Merck amazed me: they make thousands of molecules on a very small scale and run their assays on the crude products.
There is a nice review of that article, so I would like to post about another approach for rapid SAR.

Here is a report from Cyclofluidic.
Their unique feature is a closed-loop structure activity platform to revolutionise hit and lead optimisation. In the article they explore the SAR of Hepsin, a membrane-anchored serine protease.
The compounds are built from three parts: acyl/sulfonyl, amino acid and guanidino. The protease catalytic domain has the catalytic triad of His, Asp and Ser residues. This indicates that the guanidino residue is necessary to keep activity.

The authors explored the SAR with flow chemistry. They changed the synthetic route compared to batch synthesis and used TMS-protected amino acids, because free amino acids show low solubility, which is a problem in flow. It is a good tip for flow synthesis.

Finally they obtained highly active molecules with low toxicity. It seems to be a success story for the technology. BTW, as I wrote above, the guanidino moiety is required to keep the activity, and it has a bad effect on the ADMET profile, especially permeability.
I think the low cell toxicity comes from this low permeability. Of course this target is a transmembrane protein and the compound does not need to get inside the cell. But low permeability is not a good feature for a drug (my opinion).
I am interested in the next step of this research.
I think Cyclofluidic's technology is very interesting and useful for rapid SAR.
What is your opinion? ;-)


Graph convolutional neural networks (GCN) are useful for chemoinformatics because molecules can be represented as graph structures. But the GCN approach has a difficulty in chemoinformatics: unlike image data, each graph has a different structure.
There are many reports on applying GCN to chemoinformatics. Sometimes the GCN approach outperforms other methods such as CNN with molecular fingerprints.

By the way, the authors point out several limitations of current GCN.
– First, a basic GCN can only capture local structure information of the graph.
– Second, a GCN model cannot be applied directly because it is equivariant, rather than invariant, with respect to the node order of the graph.
– Third, a GCNN model has limited ability to exploit global information for the purpose of graph classification.

They developed a novel approach, Graph Capsule Convolutional Neural Networks (GCCNN), for graph classification.
The original capsule net was proposed by Hinton’s group, and the approach solves a problem of CNNs in image classification. It can manage orientational and relative spatial relationships between small sets of data.
In GCCNN, the graph capsule function is defined with statistical moments and polynomial coefficients.

$$f^{(\ell)}(X, L) = \sigma\big(g(f^{(\ell-1)}(X, L), L)\, W^{(\ell)}\big) \qquad \text{eq (2)}$$
L is the graph Laplacian and W is the weight of the ℓ-th layer.

Their idea for permutation-invariant features in the GCAPS-CNN model is to compute the covariance of the f(X, L) layer output.
$$C(f(X, L)) = \frac{1}{N} (f(X, L) - \mu)^{T} (f(X, L) - \mu) \qquad \text{eq (7)}$$
The merit of using this matrix is not only that each element of the covariance matrix is invariant to node order, but also that the matrix carries rich information about how the node features vary together.
They can guarantee permutation invariance in the GCAPS-CNN model by using the strategy above.
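
To convince myself of the permutation invariance, here is a minimal pure-python sketch of the covariance in eq (7). It is not the authors' code: the helper name `covariance` and the feature values are made up for illustration, and I assume f(X, L) is simply an N x d table with one row per node.

```python
import random

def covariance(features):
    """Return the d x d matrix C = 1/N (F - mu)^T (F - mu).

    `features` is a list of N node-feature rows, each of length d.
    """
    n = len(features)
    d = len(features[0])
    # Column means: one mean per feature dimension.
    mu = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mu[j] for j in range(d)] for row in features]
    return [
        [sum(r[a] * r[b] for r in centered) / n for b in range(d)]
        for a in range(d)
    ]

# Hypothetical features for a 4-node graph (values are made up).
F = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [2.0, 4.0]]

# Relabel the nodes: same rows, different order.
F_shuffled = F[:]
random.Random(0).shuffle(F_shuffled)

C1 = covariance(F)
C2 = covariance(F_shuffled)

# Compare with a small tolerance because the summation order differs.
same = all(
    abs(C1[a][b] - C2[a][b]) < 1e-9
    for a in range(2) for b in range(2)
)
print(same)
```

The row order only changes the order of the terms in each sum, so the matrix stays the same whatever the node numbering is.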
Finally they defined the model and tested it on several datasets, such as COLLAB and IMDB. GCAPS-CNN outperformed the other methods.
* The graph Laplacian can be obtained from the adjacency matrix “A”, and these matrices have unique features.
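
As a small illustration of the note above, here is a sketch that builds a Laplacian from an adjacency matrix. I assume the unnormalized form L = D - A, where D is the diagonal degree matrix; the paper may use a normalized variant. The function name `graph_laplacian` and the toy graph are my own.

```python
def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric 0/1 adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Degree of each node = number of neighbours = row sum of A.
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]

# Toy graph: a chain of three atoms (0 - 1 - 2), e.g. the heavy atoms of C-C-O.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
L = graph_laplacian(A)
for row in L:
    print(row)
```

Two of the features are easy to see in this form: the matrix is symmetric, and every row sums to zero.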

There are many approaches to GCN and the field is developing very rapidly. It is an exciting area for me, but the mathematics is difficult to follow @_@.

Think about Structure Kinetics Relationship

Here is a deep analysis about SKR from Merck.

Recently, understanding ligand-target binding kinetics has become an important factor. There are experimental tools such as SPR and ITC, and in silico methods like MD.

The authors analyzed kinetic data for Hsp90. They analyzed the relationship between the R-groups of some scaffolds and Kon with two types of compound sets, called “cavity-varying” and “entrance-varying”.
The “cavity” is a hydrophobic region of Hsp90 and the “entrance” is hydrophilic.
It is interesting that the substituents of the “cavity-varying” set show a strong relationship between lipophilicity and Kon. On the other hand, the substituents of the “entrance-varying” set show only a weak correlation.

They also performed MD simulations to confirm a polar desolvation barrier. Unfortunately I am not familiar with molecular dynamics, but the simulations reveal the effect of the desolvation step in molecular binding.

The authors provide a lot of data in the article. I think it is worth checking and learning from.

Make MMP network and send to cytoscape #chemoinfo

Recently I have been using Cytoscape in my laboratory. Cytoscape is a nice tool for network visualization.
I often make data with python and import the data into Cytoscape. The workflow is not so bad, but I thought it would be nice if python could communicate with Cytoscape directly.
Fortunately Cytoscape has a REST plugin called cyREST, and python has py2cytoscape to talk to it!
It sounds nice, so I tried these libraries.
First I installed cyREST into my Cytoscape (v3.5.1). appmanager => cyREST
I also installed chemviz for drawing chemical structures in Cytoscape.
Then I installed py2cytoscape via pip. ;-)
If cyREST is installed successfully, you can access localhost:1234/v1 from a web browser while Cytoscape is running.

I drew a simple MMP network. The code is below.
First, I made MMPs from a SMILES file by using the RDKit MMP scripts.

$ cat testdata.smi
Oc1ccccc1 phenol
Oc1ccccc1O catechol
Oc1ccccc1N 2-aminophenol
Oc1ccccc1Cl 2-chlorophenol
Nc1ccccc1N o-phenylenediamine
Nc1cc(O)ccc1N amidol
Oc1cc(O)ccc1O hydroxyquinol
Nc1ccccc1 phenylamine
C1CCCC1N cyclopentanol
$ python < testdata.smi> testdata.frag
$ python < testdata.frag > testmmp.txt -r 0.2
$ cat testmmp.txt

Now I have the MMP data. I used it to make the edges of my network, and testdata.smi to make the node data.
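
To make the mapping from files to nodes and edges concrete, here is a small pure-python sketch. The two helper names are my own, and the column layout of the MMP file (SMILES 1, SMILES 2, id 1, id 2, transform, context) is an assumption; the example edge line is written by hand, not copied from my testmmp.txt.

```python
def parse_smi_line(line):
    """Split one 'SMILES name' line into a (smiles, name) node record."""
    smiles, name = line.split()[:2]
    return smiles, name

def parse_mmp_line(line):
    """Split one comma-separated MMP line into (source, target, transform).

    Assumed column layout: SMILES 1, SMILES 2, id 1, id 2, transform, context.
    """
    fields = line.strip().split(",")
    return fields[0], fields[1], fields[4]

# Node line taken from testdata.smi above.
node = parse_smi_line("Oc1ccccc1 phenol\n")

# Hypothetical edge line, written by hand to illustrate the assumed layout.
edge = parse_mmp_line(
    "Oc1ccccc1,Oc1ccccc1O,phenol,catechol,[*:1][H]>>[*:1]O,[*:1]c1ccccc1O\n"
)
print(node)
print(edge)
```

Each node is keyed by its SMILES string, so the first two columns of an MMP line can be used directly as the source and target of an edge.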

The next code is an example of communication between python and Cytoscape.
First, import CyRestClient and make a connection. The default URL is localhost and the port is 1234. If you would like to use another IP and port, you can change them from Cytoscape.
Edit => Preferences => add => rest.url xxxxx, rest.port xxxx

I used python-igraph to build the graph, but py2cytoscape can also handle data generated by networkx, geohi and others.

import igraph
from py2cytoscape.data.cyrest_client import CyRestClient
import py2cytoscape

cy = CyRestClient()
G = igraph.Graph()

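# Nodes: one vertex per line of the SMILES file, named by its SMILES string.
with open('testdata.smi', 'r') as vertices:
    for v in vertices:
        smiles, molname = v.split()[:2]
        G.add_vertex(smiles, molname=molname)

# Edges: columns 0 and 1 are the paired SMILES, column 4 is the transform.
with open("testmmp.txt", "r") as edges:
    for edge in edges:
        fields = edge.strip().split(",")
        G.add_edge(fields[0], fields[1], transform=fields[4])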

After making the network, go to the next step.
The create_from_igraph method of cy.network receives the igraph data and sends it to Cytoscape.
Then I set the network layout to ‘force-directed’.
Finally I set some view styles and updated the graph settings.

g_cy = cy.network.create_from_igraph(G)
cy.layout.apply(name='force-directed', network=g_cy)

mystyle = cy.style.create('mystyle')

defaults = {
    'NODE_HEIGHT': 100,
    'NODE_WIDTH': 100,
    'NODE_LABEL_COLOR': '#555555',
    'EDGE_WIDTH': 20,
}

mystyle.update_defaults(defaults)
cy.style.apply(mystyle, network=g_cy)

View screenshots.
Before running the code, there is no network in Cytoscape.

After running the code, I could see the MMP network, without chemical structures.

Finally I configured chemviz and ran paint structure from the menu, and I could see the structure on each node.

Also, each node and edge has its own attributes, as set through igraph.
It is interesting and useful because all the work is done using only python!

This example is one way, python => Cytoscape. But the library can send data in both directions.
There are nice documents written in Japanese, such as the one at the following URL.