Create MMPDB ( matched molecular pair )!

Matched molecular pair analysis is very common method to analyze SAR for medicinal chemists. There are lots of publications about it and applications in these area.
I often use rdkit/Contrib/mmpa to make my own MMP dataset.
The origin of the algorithm is described in following URL.
https://www.ncbi.nlm.nih.gov/pubmed/20121045

Yesterday, good news announced by @RDKit_org. It is release the package that can make MMPDB.
I tried to use the package immediately.
This package is provided from github repo. And to use the package, I need to install apsw at first. APSW can install by using conda.
And the install mmpdb by python script.

iwatobipen$ conda install -c conda-forge apsw
iwatobipen$ git clone https://github.com/rdkit/mmpdb.git
iwatobipen$ cd mmpdb
iwatobipen$ python setup.py install

After success of installation, I could found mmpdb command in terminal.
I used CYP3A4 inhibition data from ChEMBL for the test.
I prepared two files, one has smiles and id, and the another has id and ic50 value.
* means missing value. In the following case, I provided single property ( IC50 ) but the package can handle multiple properties. If reader who is interested the package, please show more details by using mmpdb –help command.

iwatobipen$ head -n 10 chembl_cyp3a4.csv 
CANONICAL_SMILES,MOLREGNO
Cc1ccccc1c2cc(C(=O)n3cccn3)c4cc(Cl)ccc4n2,924282
CN(C)CCCN1c2ccccc2CCc3ccccc13,605
Cc1ccc(cc1)S(=O)(=O)\N=C(/c2ccc(F)cc2)\n3c(C)nc4ccccc34,1698776
NC[C@@H]1O[C@@H](Cc2c(O)c(O)ccc12)C34CC5CC(CC(C5)C3)C4,59721
Cc1ccc(cc1)S(=O)(=O)N(Cc2ccccc2)c3ccccc3C(=O)NCc4occc4,759749
O=C(N1CCC2(CC1)CN(C2)c3ccc(cc3)c4ccccc4)c5ccncc5,819161
iwatobipen$ head -n 10 prop.csv 
ID	STANDARD_VALUE
924282	*
605	*
1698776	*
59721	19952.62
759749	2511.89
819161	2511.89

mmdb fragment has –cut-smarts option.
It seems attractive for me! 😉
”’
–cut-smarts SMARTS alternate SMARTS pattern to use for cutting (default:
‘[#6+0;!$(*=,#[!#6])]!@!=!#[!#0;!#1;!$([CH2]);!$([CH3]
[CH2])]’), or use one of: ‘default’,
‘cut_AlkylChains’, ‘cut_Amides’, ‘cut_all’,
‘exocyclic’, ‘exocyclic_NoMethyl’
”’
Next step, make mmpdb and join the property to db.

# run fragmentation and my input file has header, delimiter is comma ( default is white space ). Output file is cyp3a4.fragments.
# Each line of inputfile must be unique!
iwatobipen$ mmpdb fragment chembl_cyp3a4.csv --has-header --delimiter 'comma' -o cyp3a4.fragments
# rung indexing with fragmented file and create a mmpdb. 
iwatobipen$ mmpdb index cyp3a4.fragments -o cyp3a4.mmpdb

OK I got cyp3a4.mmpdb file. (sqlite3 format)
Add properties to a DB.
Type following command.

iwatobipen$ mmpdb loadprops -p prop.csv cyp3a4.mmpdb
Using dataset: MMPs from 'cyp3a4.fragments'
Reading properties from 'prop.csv'
Read 1 properties for 17143 compounds from 'prop.csv'
5944 compounds from 'prop.csv' are not in the dataset at 'cyp3a4.mmpdb'
Imported 5586 'STANDARD_VALUE' records (5586 new, 0 updated).
Generated 83759 rule statistics (1329408 rule environments, 1 properties)
Number of rule statistics added: 83759 updated: 0 deleted: 0
Loaded all properties and re-computed all rule statistics.

Ready to use DB. Let’s play with the DB.
Identify possible transforms.

iwatobipen$ mmpdb transform --smiles 'c1ccc(O)cc1' cyp3a4.mmpdb --min-pair 10 -o transfom_res.txt
iwatobipen$ head -n3 transfom_res.txt 
ID	SMILES	STANDARD_VALUE_from_smiles	STANDARD_VALUE_to_smiles	STANDARD_VALUE_radius	STANDARD_VALUE_fingerprint	STANDARD_VALUE_rule_environment_id	STANDARD_VALUE_counSTANDARD_VALUE_avg	STANDARD_VALUE_std	STANDARD_VALUE_kurtosis	STANDARD_VALUE_skewness	STANDARD_VALUE_min	STANDARD_VALUE_q1	STANDARD_VALUE_median	STANDARD_VALUE_q3	STANDARD_VALUE_max	STANDARD_VALUE_paired_t	STANDARD_VALUE_p_value
1	CC(=O)NCCO	[*:1]c1ccccc1	[*:1]CCNC(C)=O	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1049493	14	3632	5313.6	-0.71409	-0.033683	-6279.7	498.81	2190.5	7363.4	12530	-2.5576	0.023849
2	CC(C)CO	[*:1]c1ccccc1	[*:1]CC(C)C	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1026671	20	7390.7	8556.1	-1.1253	-0.082107	-6503.9	-0	8666.3	13903	23534	-3.863	0.0010478

Output file has information of transformation with statistics values.
And the db can use to make a prediction.
Following command can generate two files with prefix CYP3A-.
CYP3A_pairs.txt
CYP3A_rules.txt

iwatobipen$ mmpdb predict --reference 'c1ccc(O)cc1' --smiles 'c1ccccc1' cyp3a4.mmpdb  -p STANDARD_VALUE --save-details --prefix CYP3A
iwatobipen$ head -n 3 CYP3A_pairs.txt
rule_environment_id	from_smiles	to_smiles	radius	fingerprint	lhs_public_id	rhs_public_id	lhs_smiles	rhs_smiles	lhs_value	rhs_value	delta
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1016823	839661	C[C@]12CC[C@@H]3[C@H](CC[C@H]4C[C@@H](O)CC[C@@]43C)[C@@H]1CC[C@H]2C(=O)CO	CC(=O)[C@@H]1CC[C@H]2[C@H]3CC[C@H]4C[C@@H](O)CC[C@]4(C)[C@@H]3CC[C@@]21C	1000	15849	14849
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	3666	47209	O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	O=c1cc(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	15849	5011.9	-10837
iwatobipen$ head -n 3 CYP3A_rules.txt 
rule_environment_statistics_id	rule_id	rule_environment_id	radius	fingerprint	from_smiles	to_smiles	count	avg	std	kurtosis	skewness	min	q1	median	q3	max	paired_t	p_value
28699	143276	868610	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	[*:1]O	[*:1][H]	16	-587.88	14102	-0.47579	-0.065761	-28460	-8991.5	-3247.8	10238	23962	0.16674	0.8698
54091	143276	1140189	1	tLP3hvftAkp3EUY+MHSruGd0iZ/pu5nwnEwNA+NiAh8	[*:1]O	[*:1][H]	15	-1617	13962	-0.25757	-0.18897	-28460	-9534.4	-4646	7271.1	23962	0.44855	0.66062

It is worth that the package ca handle not only structure based information but also properties.
I learned a lot of things from the source code.
RDKit org is cool community!
I pushed my code to my repo.
https://github.com/iwatobipen/mmpdb_test

original repo URL is
https://github.com/rdkit/mmpdb
Do not miss it!

Advertisements

After mishima.syk #4

Thank a lot for all attendance and presenters at mishima.syk #4 !
I enjoyed the event and party.
All presentations were very interesting and exciting. 😉
I made brief introduction about cytoscape.js, and up loaded my slide and sample code to slide share and git hub.
slide link
code link
My sample apps will need Flask, and Flask-Bootstrap. They can be installed using pip.
My mistake was that I forgot mention about my test environment.
Sample code does not work on Chrome…..
I couldn’t fix the bug. Firefox, and Safari may be work well.
But in chrome, run the app from terminal and access 127.0.0.1:5000 .
Then open javascript console in your browser and type following code , you can draw network.

cy.load(items)

Hmm, I want to study more and more.

今回のmishimasyl勉強会は次世代シーケンサの話に始まり、ネットワーク関連の話題でサンプルをいじりながらあれこれとやったり、事例紹介をしていただいたりと、勉強になりました。
 NGsの解析は最終的な解析に至るまでのステップがかなり大変そうな印象を持ちました。
データの解析というとだいたい最後のステップの可視化から何かをひもとく部分をイメージすることが多い気もしますがデータの準備超大事ですよね、その辺のプロセスが垣間見えたのはよかったです。
 ネットワーク関連の話もたのしく聞かせていただきました。igraph便利ですね。
画像を解析するっての思想がなかったんでおおって感じです。
 早速本はポチった。
今回もありがとうございました。

*ちなみにサンプルサイトのCOLAだけがmapDataを使っていて
*トップのページだけがノードをクリックすると画像をとるサイトに飛ぶようになっていて
*後のページはすべてアラートでsmilesが出ます
html部分は後から書き足ししたこともあり、ライブラリ全部毎回読んでいたりと冗長に思われるかもしれません。

Get min and max values.

I made some web apps in our lab.
I want to make mmp app. For make it, I need function that can set range like spotfire.
HTML5 range tag is not enough.
….
I found good JS library noUiSlier !
It’s easy to get min and max value from slid-bar.
I wrote code using flask.
Sample folder are….

iwatobipen$ tree
.
├── app.py
├── static
│   ├── jquery-1.11.1.min.js
│   └── noUiSlider.7.0.8
│       ├── README.md
│       ├── jquery.nouislider.all.js
│       ├── jquery.nouislider.all.min.js
│       ├── jquery.nouislider.css
│       ├── jquery.nouislider.js
│       ├── jquery.nouislider.min.css
│       ├── jquery.nouislider.min.js
│       ├── jquery.nouislider.pips.css
│       └── jquery.nouislider.pips.min.css
└── templates
    ├── index.html
    └── sample.html

app.py was simple.

from flask import Flask, url_for, jsonify, request
from flask import render_template
from flask_bootstrap import Bootstrap


app = Flask(__name__)
Bootstrap( app )

@app.route('/')
def top():
    return render_template('index.html')

@app.route('/testpage1')
def sample():
    return render_template('sample.html')

if __name__=="__main__":
    app.run(debug=True)

Next, wrote sample.html

{% extends 'bootstrap/base.html' %} 
{% block title %} samole {% endblock title %}
{% block styles %}
{{ super() }}
<link href="{{ url_for('static', filename="noUiSlider.7.0.8/jquery.nouislider.min.css") }}" rel="stylesheet">
<link href="{{ url_for('static', filename="noUiSlider.7.0.8/jquery.nouislider.pips.min.css") }}" rel="stylesheet">
{% endblock styles %}
{% block scripts %}
{{ super() }}
<script type=text/javascript src="{{ url_for('static', filename="jquery-1.11.1.min.js") }}"></script>
<script type=text/javascript src="{{ url_for('static', filename="noUiSlider.7.0.8/jquery.nouislider.all.min.js") }}"></script>
			
<script>
$('#testslider').noUiSlider({
		start: [ 0, 20 ],
		connect: true,
			range: {
						'min': -40,
						'max': 40
								}
});
</script>
 <script>
$("#read-button").click(function(){
		alert( $('#testslider').val());
		});
 </script>

<script>
function leftValue( value, handle, slider ){
	$(this).text( handle.parent()[0].style.left );
};
$("#testslider").Link("lower").to($("#lower-value"));
$("#testslider").Link("lower").to($("#lower-offset"), leftValue);

$("#testslider").Link("upper").to($("#upper-value"));
$("#testslider").Link("upper").to($("#upper-offset"), leftValue);
</script>

{% block head %}
{{ super() }}
<style>
div.example {
	padding: 20px;
}
</style>

{% endblock %}

{% endblock scripts %}

{% block content %}
<div class="example">
 <div class="view">
	 <p> noUiSliderber test </p><br><br>
 <div id="testslider" style="width:200px;"></div>
 <br>
 <button id="read-button">read min-max</button>
 <br><br>
 minvalue<br>
 <span id="lower-value"></span><br>
 <span id="lower-offset"></span><br>
 maxvalue<br>
 <span id="upper-value"></span><br>
 <span id="upper-offset"></span><br>
{% endblock %}

This code is very simple, only slider bar and that bar can get min and max value.
So, integrate ajax, it’s easy to generate SQL with some data range.
Screen Shot 2014-10-04 at 9.57.09 PM

MMP using predict

I’m still thinking about how to use mmp data in our lab.
Inspired following nice presentation, I challenged to make predictive model from MMPA.
One of ipython notebook about rdkit. link
And another is Greg’s nice presentation about reaction finger print . link2
To make predictive model from mmps, I think, I need to convert molecular transformation to fingerprint .
So, I tried to use atompair fingerprint about pair.
Data preparation is almost same as link.
In following code, TID_CHEMBL240.txt was get from chembl19, and mmp_herg.txt was made by using rdkit mmp codes.
Steps are…
1st. Merge MMPs and herg_pKi.
2nd. Calculate delta pKi.
3rd. Calculate AtomPairFingerprint.
4th. Classify delta pKi is over 1 (10 hold increase pKi) or not.

import math
from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
from rdkit.Chem import PandasTools
from sklearn import feature_extraction
from sklearn.svm import SVR, SVC
from sklearn import svm
from sklearn.cross_validation import train_test_split
from sklearn.metrics import confusion_matrix

df = pd.read_table("TID_CHEMBL240.txt")
mmps = pd.read_csv("mmp_herg.txt", header=None, names=["smiles1", "smiles2","molregno1","molregno2","tform","core"])
PandasTools.AddMoleculeColumnToFrame(mmps,"smiles1","mol1")
PandasTools.AddMoleculeColumnToFrame(mmps,"smiles2","mol2")
mmps = mmps[["mol1","mol2","molregno1","molregno2","tform","core"]]


#make activity table
t1 = df[["MOLREGNO", "STANDARD_VALUE"]]
#1st step merge data
mmpdds=mmps.merge(t1,left_on="molregno1",right_on="MOLREGNO",suffixes=("_1","_2")).merge(t1,left_on="molregno2", right_on="MOLREGNO",suffixes=("_1","_2"))
mmpdds["pKi_1"]=mmpdds.apply(lambda row:-1*math.log10(float(row["STANDARD_VALUE_1"])*1e-9),axis=1)
mmpdds["pKi_2"]=mmpdds.apply(lambda row:-1*math.log10(float(row["STANDARD_VALUE_2"])*1e-9),axis=1)

#2nd step calc delta pKi
mmpdds["delta"]=mmpdds.pKi_1-mmpdds.pKi_2
mmpdds = mmpdds[["mol1","mol2","molregno1", "molregno2", "pKi_1", "pKi_2", "delta", "tform", "core"]]

#3rd Calc AtompairFingerprint
mmpdds["afp_1"]=mmpdds.apply(lambda row:AllChem.GetAtomPairFingerprint(row["mol1"]) ,axis=1)
mmpdds["afp_2"]=mmpdds.apply(lambda row:AllChem.GetAtomPairFingerprint(row["mol2"]) ,axis=1)
mmpdds["deltafp"]=mmpdds.afp_2-mmpdds.afp_1
mmpdds = mmpdds.dropna()
#classify delta pKi is >1 or not.
mmpdds["ov10"] = mmpdds.delta > 1

Hmm. maybe work fine.
Go next Step.
Next, I got NonzeroElements from AtomPairFingerprint and make sparse matrix from it.
Scikit-learn has good module DictVectrizer.
So, used tha method to handle sparse matrix.
And split dataset using train_test_split.
OK, read to go.
First, build SVR classification module.
Then apply to test set.

nzf = [fp.GetNonzeroElements() for fp in mmpdds.deltafp]
v=feature_extraction.DictVectorizer(sparse=True)
sparse_mat=v.fit_transform(nzf)
x_train, x_test, y_train, y_test = train_test_split(
    sparse_mat, mmpdds.ov10, test_size=0.3, random_state=42)
clf=svm.SVC()
clf.fit(x_train,y_train)
pred=clf.predict(x_test)

cm = confusion_matrix(y_test, pred)
print(cm)

confusion matrix was…
[[359 0]
[ 5 64]]
Fine !

文章が文章になっているのか不明ですね、、、
とりあえず動作は確認できたのでベースはよしということで。