I wrote a blog post about MMP (matched molecular pairs) and Neo4j some days ago.
I thought that I could retrieve MMP squares from Neo4j.
An MMP square is a set of four molecules linked by four matched pairs in a cycle, like the following relationship.
mol1 => mol2 (MMP), mol2 => mol3 (MMP), mol3 => mol4 (MMP), mol4 => mol1 (MMP)
This relationship is important when thinking about additive or non-additive SAR.
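To make the additivity idea concrete, here is a minimal sketch. It assumes that mol1 => mol2 and mol4 => mol3 are the same substituent change made on two different partners (note that mol4 => mol3 runs against the arrow of the cycle), so under additive SAR the two activity differences should be equal. The function name `nonadditivity` and the pIC50 values are hypothetical and are not taken from the ChEMBL data.

```python
def nonadditivity(p1, p2, p3, p4):
    """Non-additivity of the square mol1 => mol2 => mol3 => mol4 => mol1.

    Assumption: mol1 => mol2 and mol4 => mol3 are the same transformation,
    so for additive SAR (p2 - p1) should equal (p3 - p4).
    """
    return (p2 - p1) - (p3 - p4)


# Hypothetical pIC50 values, for illustration only
additive = nonadditivity(5.0, 6.0, 6.5, 5.5)      # both changes add 1.0
non_additive = nonadditivity(5.0, 6.0, 8.0, 5.5)  # second change adds 2.5

print(additive)
print(non_additive)
```

A value close to zero suggests the two changes act independently; a value well away from zero (beyond assay noise) suggests non-additive SAR.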
And Cypher can search for the square very simply.
Let’s try it.
First, I prepared a dataset from ChEMBL CYP3A4 inhibition data.
import pandas as pd
import numpy as np
df = pd.read_table("bioactivity-16_12-40-28.txt", header=0, low_memory=False)
df.shape
Out[6]: (17143, 55)
df2 = df[["CANONICAL_SMILES", "MOLREGNO"]]
df2.to_csv('chembl_cyp3a4.csv', index=False)
from rdkit import Chem
from rdkit import rdBase
!python mmpa/rfrag.py < ./chembl_cyp3a4.csv > ./cyp3a4_frag.txt
!python mmpa/indexing.py -s -r 0.1 < ./cyp3a4_frag.txt > ./cyp3a4_mmp.txt
mmps = pd.read_csv('cyp3a4_mmp.txt', header=None, names=('smi1', 'smi2', 'id1', 'id2', 'tform', 'core'))
mmps.shape
Out[23]: (45096, 6)
Data preparation is finished.
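One step is not shown above: LOAD CSV WITH HEADERS needs a file whose first row holds the column names (smi1, smi2, id1, id2, tform, core), while the output of indexing.py has no header. The following is a minimal sketch of adding that header with the standard csv module. The helper name `add_header` and the example row are hypothetical, and I am assuming the file is plain comma-separated text.

```python
import csv
import io

MMP_COLUMNS = ("smi1", "smi2", "id1", "id2", "tform", "core")


def add_header(src, dst, columns=MMP_COLUMNS):
    """Copy header-less MMP rows from src to dst, writing a header row first.

    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(columns)
    n_rows = 0
    for row in reader:
        if len(row) != len(columns):
            continue  # skip blank or malformed lines
        writer.writerow(row)
        n_rows += 1
    return n_rows


# In-memory example with one made-up row (not from the real data set)
src = io.StringIO("Clc1ccccc1,Cc1ccccc1,1,2,Cl[*:1]>>C[*:1],[*:1]c1ccccc1\n")
dst = io.StringIO()
count = add_header(src, dst)
print(dst.getvalue())
```

With real files, `src` and `dst` would be opened with `open('cyp3a4_mmp.txt', newline='')` and `open('chembl_cyp3a4mmp.csv', 'w', newline='')`. Calling `mmps.to_csv('chembl_cyp3a4mmp.csv', index=False)` on the DataFrame should give an equivalent file.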
Then I loaded the data into Neo4j from neo4j-shell.
I used the LOAD CSV WITH HEADERS clause to do it.
neo4j-sh (?)$ LOAD CSV WITH HEADERS FROM 'file:///path/mmp_cyp3a4/chembl_cyp3a4mmp.csv' AS line
> MERGE (a:mol {smi:line.smi1, molregno: line.id1})
> MERGE (b:mol {smi:line.smi2, molregno: line.id2})
> MERGE (a)-[:MMP {tform:line.tform, core:line.core} ]->(b);
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2463
Relationships created: 36388
Properties set: 77702
Labels added: 2463
191456 ms
OK, finally, let's search for an MMP square using Cypher.
In my setup, Cypher did not accept a query that reuses the same node variable within a single path, so I split the pattern into two comma-separated parts.
neo4j-sh (?)$ MATCH (n)-[r1]->(a)-[r2]->(b)-[r3]->(c), (c)-[r4]->(n) RETURN n.smi,r1.tform,a.smi,r2.tform,b.smi,r3.tform,c.smi,r4.tform LIMIT 1; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | n.smi | r1.tform | a.smi | r2.tform | b.smi | r3.tform | c.smi | r4.tform | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1" | "Cl[*:1]>>C[*:1]" | "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1" | "C[*:1]>>CO[*:1]" | "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2" | "CO[*:1]>>C[*:1]" | "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1" | "C[*:1]>>Cl[*:1]" | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row 14266 ms
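Wow! It works! Note that in this first row a.smi and c.smi are the same molecule, so this hit is a back-and-forth path rather than a square of four distinct molecules.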
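To keep only squares made of four different molecules, each returned row can be checked in Python, or the query itself can be tightened. As far as I understand, Cypher will not match the same relationship twice within one MATCH, but it does not stop two node variables from binding to the same node. The sketch below is an assumption-laden example: `is_true_square` is a hypothetical helper that compares SMILES strings (fine here because they come from the CANONICAL_SMILES column), and `STRICT_QUERY` is a variant of the query that I have not run against the database.

```python
def is_true_square(row):
    """Check that a result row contains four distinct molecules.

    row = (n.smi, r1.tform, a.smi, r2.tform, b.smi, r3.tform, c.smi, r4.tform)
    """
    molecules = row[0::2]  # every second field is a molecule SMILES
    return len(set(molecules)) == 4


# The row returned by the query above
row = (
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1", "Cl[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>CO[*:1]",
    "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2", "CO[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>Cl[*:1]",
)
print(is_true_square(row))  # False: a.smi and c.smi are the same string

# Untested variant of the query: require the opposite corners to differ.
# Adjacent corners already differ unless a molecule has an MMP edge to itself.
STRICT_QUERY = """
MATCH (n)-[r1:MMP]->(a)-[r2:MMP]->(b)-[r3:MMP]->(c), (c)-[r4:MMP]->(n)
WHERE n <> b AND a <> c
RETURN n.smi, r1.tform, a.smi, r2.tform, b.smi, r3.tform, c.smi, r4.tform
LIMIT 1
"""
```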
Next, let's visualise the result using RDKit!
Recent versions of RDKit can draw molecules as SVG very easily.
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
IPythonConsole.ipython_useSVG = True
tforms = (
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1", "Cl[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>CO[*:1]",
    "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2", "CO[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>Cl[*:1]",
)
# The transform strings are not valid SMILES, so MolFromSmiles returns None for them
# and RDKit prints parse errors. I ignored those errors; only the molecules are drawn.
molobj = [Chem.MolFromSmiles(smi) for smi in tforms]
# Draw them!
Draw.MolsToGridImage([molobj[0], molobj[2], molobj[4], molobj[6]], molsPerRow=4)
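The parse errors can be avoided by separating the molecules from the transforms before calling RDKit. This is a minimal sketch; `split_row` and `draw_square` are hypothetical helper names, and using each outgoing transform as the legend under its molecule is my own choice rather than part of the original drawing.

```python
def split_row(row):
    """Split a flat query row into (molecule SMILES, transform strings)."""
    return list(row[0::2]), list(row[1::2])


def draw_square(row):
    """Draw the four molecules, each labelled with its outgoing transform.

    RDKit is imported here so that split_row can be used without it.
    """
    from rdkit import Chem
    from rdkit.Chem import Draw

    smiles, transforms = split_row(row)
    mols = [Chem.MolFromSmiles(smi) for smi in smiles]
    return Draw.MolsToGridImage(mols, molsPerRow=4, legends=transforms)


row = (
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3Cl)c1", "Cl[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>CO[*:1]",
    "COc1ccccc1c2cc(NCc3cccc(C)c3)ncn2", "CO[*:1]>>C[*:1]",
    "Cc1cccc(CNc2cc(ncn2)c3ccccc3C)c1", "C[*:1]>>Cl[*:1]",
)
smiles, transforms = split_row(row)
print(smiles)
print(transforms)
```

In a notebook, `draw_square(row)` as the last expression of a cell should display the grid.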
I want to develop chemoinformatics tools using RDKit and Neo4j.