Retrieving MMP squares from an MMP database

I wrote a blog post about MMPs (matched molecular pairs) and Neo4j some days ago.
That made me think I could retrieve MMP squares from Neo4j.
An MMP square is a set of four molecules linked in a cycle by four MMPs, as in the following relationship.
mol1 => mol2 (MMP), mol2 => mol3 (MMP), mol3 => mol4 (MMP), mol4 => mol1 (MMP)
This relationship is important when thinking about additive or non-additive SAR.
And Cypher can search for such a square very simply.
Let’s try it.
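Before moving to the database, the definition above can be written down in plain Python. This is only a minimal sketch to make the idea concrete: the function name find_squares and the toy pairs are made up for illustration, and it additionally requires the four molecules to be different from each other.

```python
from collections import defaultdict


def find_squares(pairs):
    """Return 4-cycles a -> b -> c -> d -> a over four distinct molecules.

    `pairs` is an iterable of (source, target) tuples, one per directed MMP.
    """
    successors = defaultdict(set)
    for src, dst in pairs:
        successors[src].add(dst)

    squares = set()
    for a in list(successors):
        for b in successors[a]:
            for c in successors.get(b, ()):
                for d in successors.get(c, ()):
                    closes = a in successors.get(d, ())
                    if closes and len({a, b, c, d}) == 4:
                        # Rotate so the smallest label comes first; this
                        # collapses the four rotations of one cycle.
                        cycle = (a, b, c, d)
                        i = cycle.index(min(cycle))
                        squares.add(cycle[i:] + cycle[:i])
    return sorted(squares)


# Hypothetical toy data: four molecules in a ring, plus one dead end.
toy_pairs = [("m1", "m2"), ("m2", "m3"), ("m3", "m4"), ("m4", "m1"), ("m2", "m5")]
print(find_squares(toy_pairs))
```

The nested loops are fine for a toy example, but on tens of thousands of pairs a graph database is a more comfortable place to run this kind of pattern search.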
First, I prepared a dataset from ChEMBL CYP3A4 inhibition data.

import pandas as pd
import numpy as np
df = pd.read_table( "bioactivity-16_12-40-28.txt", header=0, low_memory=False )
df.shape
Out[6]:(17143, 55)

df2 = df[["CANONICAL_SMILES", "MOLREGNO" ]]
df2.to_csv('chembl_cyp3a4.csv', index=False)

from rdkit import Chem
from rdkit import rdBase

!python mmpa/rfrag.py < ./chembl_cyp3a4.csv > ./cyp3a4_frag.txt
!python mmpa/indexing.py -s -r 0.1 < ./cyp3a4_frag.txt > ./cyp3a4_mmp.txt
mmps= pd.read_csv(  'cyp3a4_mmp.txt' , header=None, names = ('smi1','smi2','id1','id2','tform','core'))
mmps.shape

Out[23]:(45096, 6)
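The LOAD CSV WITH HEADERS command used later reads a file named chembl_cyp3a4mmp.csv, while the indexing.py output has no header row, and the step that produced that file is not shown here. With the DataFrame above, something like mmps.to_csv('chembl_cyp3a4mmp.csv', index=False) would presumably do it. The sketch below shows the same idea using only the standard library; the helper name add_header and the sample row are assumptions made for illustration.

```python
import csv
import io

COLUMNS = ["smi1", "smi2", "id1", "id2", "tform", "core"]


def add_header(src, dst, columns=COLUMNS):
    """Copy header-less CSV rows from `src` to `dst`, writing a header row first.

    Returns the number of data rows copied.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(columns)
    n_rows = 0
    for row in reader:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row}")
        writer.writerow(row)
        n_rows += 1
    return n_rows


# Hypothetical one-line sample in the column layout used for `mmps` above.
sample = io.StringIO("Clc1ccccc1,Cc1ccccc1,1,2,Cl[*:1]>>C[*:1],[*:1]c1ccccc1\n")
out = io.StringIO()
n = add_header(sample, out)
print(out.getvalue())
```

For real files, the two StringIO objects would be replaced by open('cyp3a4_mmp.txt', newline='') and open('chembl_cyp3a4mmp.csv', 'w', newline='').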

Data preparation is finished.
Next, I loaded the data into Neo4j from neo4j-shell.
I used LOAD CSV WITH HEADERS to do it, so the CSV file needs a header row with the column names.

neo4j-sh (?)$ LOAD CSV WITH HEADERS FROM 'file:///path/mmp_cyp3a4/chembl_cyp3a4mmp.csv' AS line
> MERGE (a:mol {smi:line.smi1, molregno: line.id1})
> MERGE (b:mol {smi:line.smi2, molregno: line.id2})
> MERGE (a)-[:MMP {tform:line.tform, core:line.core} ]->(b);
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2463
Relationships created: 36388
Properties set: 77702
Labels added: 2463
191456 ms
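The counts above are smaller than the number of input rows because MERGE only creates a node or relationship when no identical one exists yet. They are also internally consistent: every node and every relationship carries two properties, and 2 x 2463 + 2 x 36388 = 77702. As a rough cross-check, the expected counts can be estimated from the rows before loading. This is a sketch under the assumption that MERGE matches on exactly the properties written in the query; the function name and the toy rows are made up.

```python
def expected_merge_counts(rows):
    """Estimate how many nodes and relationships the MERGE statements create.

    `rows` yields (smi1, smi2, id1, id2, tform, core) tuples.
    A node is identified by (smi, molregno); a relationship by its two
    end nodes, its direction, and its (tform, core) properties.
    """
    nodes = set()
    rels = set()
    for smi1, smi2, id1, id2, tform, core in rows:
        a = (smi1, id1)
        b = (smi2, id2)
        nodes.update((a, b))
        rels.add((a, b, tform, core))
    return len(nodes), len(rels)


# Hypothetical rows: one pair in both directions, plus an exact duplicate.
toy_rows = [
    ("Clc1ccccc1", "Cc1ccccc1", "1", "2", "Cl[*:1]>>C[*:1]", "[*:1]c1ccccc1"),
    ("Cc1ccccc1", "Clc1ccccc1", "2", "1", "C[*:1]>>Cl[*:1]", "[*:1]c1ccccc1"),
    ("Clc1ccccc1", "Cc1ccccc1", "1", "2", "Cl[*:1]>>C[*:1]", "[*:1]c1ccccc1"),
]
n_nodes, n_rels = expected_merge_counts(toy_rows)
print(n_nodes, n_rels, 2 * n_nodes + 2 * n_rels)
```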

OK, finally, let's search for an MMP square using Cypher.
When I tried it, Cypher did not accept a query that used the same node variable twice in one path, so I wrote a comma-separated pattern instead.

neo4j-sh (?)$ MATCH (n)-[r1]->(a)-[r2]->(b)-[r3]->(c), (c)-[r4]->(n) RETURN n.smi,r1.tform,a.smi,r2.tform,b.smi,r3.tform,c.smi,r4.tform LIMIT 1;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n.smi                               | r1.tform          | a.smi                              | r2.tform          | b.smi                               | r3.tform          | c.smi                              | r4.tform          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1" | "Cl[*:1]>>C[*:1]" | "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1" | "C[*:1]>>CO[*:1]" | "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2" | "CO[*:1]>>C[*:1]" | "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1" | "C[*:1]>>Cl[*:1]" |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
14266 ms

Wow! It works fine!
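One caveat about the row above: a.smi and c.smi are the same SMILES, so this row walks n -> a -> b -> a -> n instead of going around four different molecules. Cypher avoids traversing the same relationship twice within one MATCH, but it does not require the nodes to differ, and indexing.py was run with the -s option, which as far as I know writes every pair in both directions. The sketch below checks a result row for this and shows a variant of the query with a WHERE clause. The variant is untested against the database, and it assumes no molecule is paired with itself.

```python
# Untested variant of the query above: the WHERE clause is an addition
# that asks for the opposite corners of the square to be different nodes.
SQUARE_QUERY = """
MATCH (n)-[r1]->(a)-[r2]->(b)-[r3]->(c), (c)-[r4]->(n)
WHERE n <> b AND a <> c
RETURN n.smi, r1.tform, a.smi, r2.tform, b.smi, r3.tform, c.smi, r4.tform
LIMIT 1
"""


def is_true_square(row):
    """True if the four molecule SMILES in a result row are all different.

    `row` is laid out like the RETURN clause: molecules at even positions,
    transforms at odd positions.
    """
    molecules = row[0::2]
    return len(molecules) == 4 and len(set(molecules)) == 4


# The row returned by the query in this post.
row_from_post = (
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1", "Cl[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>CO[*:1]",
    "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2", "CO[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>Cl[*:1]",
)
print(is_true_square(row_from_post))
```

Comparing SMILES strings only works here because all of them come from the same canonicalised source; comparing molregno values would be the more robust check.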
Next, let's visualise the result using RDKit!
Newer versions of RDKit can draw molecules as SVG very easily.

from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
IPythonConsole.ipython_useSVG=True
row = ( "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1", "Cl[*:1]>>C[*:1]" , "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1" , "C[*:1]>>CO[*:1]" , "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2" , "CO[*:1]>>C[*:1]" , "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1" ,"C[*:1]>>Cl[*:1]" )
# Even positions hold molecule SMILES; odd positions hold transforms.
# Transforms are not molecules, so Chem.MolFromSmiles would log an error
# and return None for them. Parse only the molecules.
smiles = row[0::2]
molobj = [ Chem.MolFromSmiles(smi) for smi in smiles ]
# Draw the four molecules in one row.
Draw.MolsToGridImage( molobj, molsPerRow=4 )

[Image: mmp_square]

I want to develop more chemoinformatics tools using RDKit and Neo4j.