Use Neo4j to store MMP data.

I read the news about the ‘Panama Papers’.
The Panama Papers data was analysed with a graph database! That was exciting for me.
http://neo4j.com/blog/analyzing-panama-papers-neo4j/
So now I’m interested in Neo4j.
Fortunately, Mac users can install Neo4j with Homebrew. 😉
Neo4j has its own SQL-like query language named Cypher.
I used Cypher to load the data for the following reason.
At first I used py2neo, but it was difficult to handle a large dataset: py2neo communicates with the Neo4j server over REST, so reading the data took a long time and caused timeouts.

The first step with Cypher is creating nodes and relationships.
This can be done in a simple way.

A node is created with the following command:
CREATE ( n: name { property:value, …. } )
And a relationship is created like this:
CREATE (n)-[ r:name {property: value, ….} ]->(n1)
In (n)-[r]->(), the arrow gives each relationship a direction, so the data forms a directed graph.
Setting properties is also easy. 😉
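To make the node/relationship model concrete, here is a toy in-memory sketch in Python. It is only an illustration under my own assumptions, not Neo4j's storage model or API, and the names PropertyGraph, create_node, create_rel and outgoing are hypothetical. Each node has a label and properties; each relationship has a type, a direction (start to end) and its own properties.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                  # the ":name" in CREATE (n:name {...})
    props: dict = field(default_factory=dict)   # the {property: value, ...} map


@dataclass
class Relationship:
    rel_type: str                               # the ":name" in -[r:name {...}]->
    start: Node                                 # tail of the arrow
    end: Node                                   # head of the arrow
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (illustration only, not Neo4j internals)."""

    def __init__(self):
        self.nodes = []
        self.rels = []

    def create_node(self, label, **props):
        node = Node(label, props)
        self.nodes.append(node)
        return node

    def create_rel(self, start, rel_type, end, **props):
        rel = Relationship(rel_type, start, end, props)
        self.rels.append(rel)
        return rel

    def outgoing(self, node):
        # Direction matters: keep only relationships that start at `node`
        return [r for r in self.rels if r.start is node]


# Usage: two mol nodes joined by one directed MMP relationship
g = PropertyGraph()
a = g.create_node("mol", molid="CHEMBL2348488")
b = g.create_node("mol", molid="CHEMBL2326095")
g.create_rel(a, "MMP", b, targetid="CHEMBL1075097")
print(len(g.outgoing(a)), len(g.outgoing(b)))  # 1 0
```

Because the relationship points from the first node to the second, only the first node has an outgoing relationship.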
Let’s make a sample dataset. I got MMP data from the following URL.
https://zenodo.org/record/8418#.VxLvURN97Uo
I renamed the file and checked its contents.

iwatobipen$ head ChEMBL17_IC50_RECAP_MMP_list.csv 
Target_ChEMBLID,Cpd1_ChEMBLID,Cpd2_ChEMBLID,KeyFragment,Transformation,Cpd1_SMILES,Cpd2_SMILES
CHEMBL1075097,CHEMBL2348488,CHEMBL2326095,[R1]CCC(CCCCB(O)O)(C(=O)O)N,[R1]N(CC)CC>>[R1]N1CCCC1,OB(O)CCCCC([NH3+])(CC[NH+](CC)CC)C(=O)[O-],OB(O)CCCC[C@@]([NH3+])(CC[NH+]1CCCC1)C(=O)[O-]
CHEMBL1075097,CHEMBL2326085,CHEMBL2348486,[R1]CCC(CCCCB(O)O)(C(=O)O)N,[R1]N(CC)CC>>[R1]N1CCCC1,OB(O)CCCC[C@@]([NH3+])(CC[NH+](CC)CC)C(=O)[O-],OB(O)CCCCC([NH3+])(CC[NH+]1CCCC1)C(=O)[O-]
CHEMBL1075097,CHEMBL2348488,CHEMBL2348486,[R1]CCC(CCCCB(O)O)(C(=O)O)N,[R1]N(CC)CC>>[R1]N1CCCC1,OB(O)CCCCC([NH3+])(CC[NH+](CC)CC)C(=O)[O-],OB(O)CCCCC([NH3+])(CC[NH+]1CCCC1)C(=O)[O-]
iwatobipen$ wc ChEMBL17_IC50_RECAP_MMP_list.csv 
  240323  240323 60542670 ChEMBL17_IC50_RECAP_MMP_list.csv

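The file contains matched molecular pair (MMP) information from ChEMBL17: each row gives a target, two compound IDs, the shared key fragment, the transformation between the two compounds, and the SMILES of both compounds.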
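Before loading anything, it can help to look at the rows in Python. The sketch below assumes the column names shown in the header above; it reads the CSV with the standard csv module and splits the Transformation field at '>>' into the fragment before and after the change. FromFragment and ToFragment are hypothetical column names I made up for this example.

```python
import csv


def read_mmp_rows(lines):
    """Parse the MMP CSV (header row first) into a list of dicts.

    Adds two hypothetical keys, FromFragment and ToFragment, by splitting
    the Transformation field at '>>'.
    """
    rows = []
    for rec in csv.DictReader(lines):
        left, _sep, right = rec["Transformation"].partition(">>")
        rec["FromFragment"] = left    # e.g. [R1]N(CC)CC
        rec["ToFragment"] = right     # e.g. [R1]N1CCCC1
        rows.append(rec)
    return rows


# Usage with the real file (assumed to be in the working directory):
# with open("ChEMBL17_IC50_RECAP_MMP_list.csv", newline="") as f:
#     rows = read_mmp_rows(f)
```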
Next, I loaded the data from neo4j-shell.
The CSV file has a header row, so I used LOAD CSV WITH HEADERS FROM…
For a first test, I took only the first 10,000 lines of the original file (the header is one of them, so this leaves 9,999 pairs).
iwatobipen$ head -n 10000 ChEMBL17_IC50_RECAP_MMP_list.csv > ChEMBL17_IC50_RECAP_MMP_10000list.csv
Loading even these 10,000 lines took a long time.

neo4j-sh (?)$ USING PERIODIC COMMIT 1000
> LOAD CSV WITH HEADERS FROM "file:////Users/iwatobipen/develop/py3env/neo4jtest/ChEMBL17_IC50_RECAP_MMP_10000list.csv" AS line
> MERGE (m1:mol { molid: line.Cpd1_ChEMBLID, smi: line.Cpd1_SMILES })
> MERGE (m2:mol { molid: line.Cpd2_ChEMBLID, smi: line.Cpd2_SMILES })
> // Maybe MERGE is better than CREATE here...
> CREATE (m1)-[r:MMP { transform:line.Transformation, targetid:line.Target_ChEMBLID}]->(m2);
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 3522
Relationships created: 9999
Properties set: 27042
Labels added: 3522
63150 ms

Hmm, that took far too long. I want to find a more efficient way.
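One idea I have not benchmarked: MERGE on {molid, smi} with no index probably has to scan all :mol nodes for every row, which could explain much of the 63 seconds. A uniqueness constraint on :mol(molid) might help, and so might deduplicating molecules before loading so that nodes and relationships can be loaded in separate passes. The Python sketch below (function names are hypothetical) takes row dicts like those csv.DictReader produces, collects unique molecules keyed on (molid, smi), the same key the MERGE above matches on, and builds the edge list. It also checks the "Properties set" counter: each node gets two properties and each relationship two, so 3522 × 2 + 9999 × 2 = 27042.

```python
def split_nodes_and_edges(rows):
    """Split parsed MMP rows into unique molecules and an edge list.

    Molecules are keyed on (molid, smi), the same properties the
    MERGE (m:mol {molid: ..., smi: ...}) clauses match on.
    """
    mols = {}    # (molid, smi) -> None; a dict keeps first-seen order
    edges = []   # (cpd1_id, cpd2_id, transformation, target_id)
    for rec in rows:
        m1 = (rec["Cpd1_ChEMBLID"], rec["Cpd1_SMILES"])
        m2 = (rec["Cpd2_ChEMBLID"], rec["Cpd2_SMILES"])
        mols.setdefault(m1, None)
        mols.setdefault(m2, None)
        edges.append((m1[0], m2[0], rec["Transformation"], rec["Target_ChEMBLID"]))
    return list(mols), edges


def expected_properties_set(n_nodes, n_rels, props_per_node=2, props_per_rel=2):
    """Predict neo4j-shell's 'Properties set' counter for the load above.

    Each mol node gets molid and smi; each MMP relationship gets
    transform and targetid.
    """
    return n_nodes * props_per_node + n_rels * props_per_rel
```

The two lists could then be written out with csv.writer and loaded with two separate LOAD CSV passes, one for nodes and one for relationships.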
Anyway, the data was now in the graph database.
Then I opened http://localhost:7474 (the default setting) in a browser and got the following screen:

[Image: neo4jtop]

Next, I wrote a query: “select each node and count its outgoing relationships as its degree”.

$MATCH (node)-[r]->() RETURN node, count(r) AS degree ORDER BY degree DESC LIMIT 25;
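The same degree ranking can be reproduced on a plain edge list in Python, which may be handy for checking the query result. This is a sketch, and top_out_degree is a hypothetical helper. Because the pattern (node)-[r]->() follows the arrow, it counts only outgoing relationships; an undirected pattern such as (node)-[r]-() would count both directions.

```python
from collections import Counter


def top_out_degree(edges, limit=25):
    """Count outgoing edges per start node, highest count first.

    Mirrors MATCH (node)-[r]->() RETURN node, count(r) AS degree
    ORDER BY degree DESC LIMIT 25 on a plain (start, end) edge list.
    """
    degree = Counter(start for start, _end in edges)
    return degree.most_common(limit)


# Toy edge list with hypothetical node names
print(top_out_degree([("A", "B"), ("A", "C"), ("B", "C")]))  # [('A', 2), ('B', 1)]
```

Like the Cypher query, nodes with no outgoing relationships (here "C") do not appear at all.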

[Image: query_res_cypher2]

[Image: query_cypher1]

I think a graph database is useful for detecting matched molecular series (MMS), because an MMS can be represented as a path like nodeA -> nodeB -> nodeC ….
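And a matched molecular square, four compounds connected in a closed loop, corresponds to a path like nodeA-nodeB-nodeC-nodeD-nodeA.
These paths are easy to detect with Cypher.

MATCH (n)-->(a)-->(b)
RETURN n, a, b

or, for the square (ignoring direction, because each arrow only reflects which compound was Cpd1 and which was Cpd2 in the CSV, and requiring the diagonal corners to differ):

MATCH (n)--(a)--(b)--(c)--(n)
WHERE n <> b AND a <> c
RETURN n, a, b, c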
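To see what a matched-series query returns, here is a small Python sketch that enumerates directed paths n -> a -> b on a toy edge list (the node names and the function two_step_paths are hypothetical). Unlike Cypher, which only requires the two relationships in a path to differ, it also drops paths that come back to the start node, since a matched series needs distinct compounds.

```python
from collections import defaultdict


def two_step_paths(edges):
    """Enumerate directed paths n -> a -> b where b is not n.

    Similar in spirit to MATCH (n)-->(a)-->(b) RETURN n, a, b,
    but parallel edges are collapsed and loops back to n are skipped.
    """
    out = defaultdict(set)               # start node -> set of end nodes
    for start, end in edges:
        out[start].add(end)
    paths = []
    for n, targets in out.items():
        for a in targets:
            for b in out.get(a, ()):     # .get does not add keys while iterating
                if b != n:
                    paths.append((n, a, b))
    return sorted(paths)


print(two_step_paths([("A", "B"), ("B", "C"), ("C", "D"), ("B", "A")]))
# [('A', 'B', 'C'), ('B', 'C', 'D')]
```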

Let’s try a simple query that finds pairs of molecules linked in both directions (n -> a and a -> n):

$MATCH (n)-->(a)-->(n) RETURN n, a LIMIT 100;

I got the following result:
[Image: query_res_cypher3]
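For comparison, here is the same reciprocal-pair check written in Python on a toy edge list (reciprocal_pairs is a hypothetical helper). Cypher returns each such pair from both ends, and repeats it for parallel relationships, whereas this sketch reports each unordered pair only once.

```python
def reciprocal_pairs(edges):
    """Return unordered node pairs that have edges in both directions.

    Comparable to MATCH (n)-->(a)-->(n) RETURN n, a, with duplicates removed.
    """
    edge_set = set(edges)
    pairs = set()
    for start, end in edge_set:
        if start != end and (end, start) in edge_set:
            pairs.add(tuple(sorted((start, end))))   # store each pair once
    return sorted(pairs)


print(reciprocal_pairs([("A", "B"), ("B", "A"), ("B", "C")]))  # [('A', 'B')]
```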

I think Neo4j is interesting for chemoinformatics, and I want to analyse in-house and public data with it as soon as possible.
