Create MMPDB ( matched molecular pair )!

Matched molecular pair analysis is very common method to analyze SAR for medicinal chemists. There are lots of publications about it and applications in these area.
I often use rdkit/Contrib/mmpa to make my own MMP dataset.
The origin of the algorithm is described in following URL.

Yesterday, good news announced by @RDKit_org. It is release the package that can make MMPDB.
I tried to use the package immediately.
This package is provided from github repo. And to use the package, I need to install apsw at first. APSW can install by using conda.
And the install mmpdb by python script.

iwatobipen$ conda install -c conda-forge apsw
iwatobipen$ git clone
iwatobipen$ cd mmpdb
iwatobipen$ python install

After success of installation, I could found mmpdb command in terminal.
I used CYP3A4 inhibition data from ChEMBL for the test.
I prepared two files, one has smiles and id, and the another has id and ic50 value.
* means missing value. In the following case, I provided single property ( IC50 ) but the package can handle multiple properties. If reader who is interested the package, please show more details by using mmpdb –help command.

iwatobipen$ head -n 10 chembl_cyp3a4.csv 
iwatobipen$ head -n 10 prop.csv 
924282	*
605	*
1698776	*
59721	19952.62
759749	2511.89
819161	2511.89

mmdb fragment has –cut-smarts option.
It seems attractive for me! 😉
–cut-smarts SMARTS alternate SMARTS pattern to use for cutting (default:
[CH2])]’), or use one of: ‘default’,
‘cut_AlkylChains’, ‘cut_Amides’, ‘cut_all’,
‘exocyclic’, ‘exocyclic_NoMethyl’
Next step, make mmpdb and join the property to db.

# run fragmentation and my input file has header, delimiter is comma ( default is white space ). Output file is cyp3a4.fragments.
# Each line of inputfile must be unique!
iwatobipen$ mmpdb fragment chembl_cyp3a4.csv --has-header --delimiter 'comma' -o cyp3a4.fragments
# rung indexing with fragmented file and create a mmpdb. 
iwatobipen$ mmpdb index cyp3a4.fragments -o cyp3a4.mmpdb

OK I got cyp3a4.mmpdb file. (sqlite3 format)
Add properties to a DB.
Type following command.

iwatobipen$ mmpdb loadprops -p prop.csv cyp3a4.mmpdb
Using dataset: MMPs from 'cyp3a4.fragments'
Reading properties from 'prop.csv'
Read 1 properties for 17143 compounds from 'prop.csv'
5944 compounds from 'prop.csv' are not in the dataset at 'cyp3a4.mmpdb'
Imported 5586 'STANDARD_VALUE' records (5586 new, 0 updated).
Generated 83759 rule statistics (1329408 rule environments, 1 properties)
Number of rule statistics added: 83759 updated: 0 deleted: 0
Loaded all properties and re-computed all rule statistics.

Ready to use DB. Let’s play with the DB.
Identify possible transforms.

iwatobipen$ mmpdb transform --smiles 'c1ccc(O)cc1' cyp3a4.mmpdb --min-pair 10 -o transfom_res.txt
iwatobipen$ head -n3 transfom_res.txt 
1	CC(=O)NCCO	[*:1]c1ccccc1	[*:1]CCNC(C)=O	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1049493	14	3632	5313.6	-0.71409	-0.033683	-6279.7	498.81	2190.5	7363.4	12530	-2.5576	0.023849
2	CC(C)CO	[*:1]c1ccccc1	[*:1]CC(C)C	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1026671	20	7390.7	8556.1	-1.1253	-0.082107	-6503.9	-0	8666.3	13903	23534	-3.863	0.0010478

Output file has information of transformation with statistics values.
And the db can use to make a prediction.
Following command can generate two files with prefix CYP3A-.

iwatobipen$ mmpdb predict --reference 'c1ccc(O)cc1' --smiles 'c1ccccc1' cyp3a4.mmpdb  -p STANDARD_VALUE --save-details --prefix CYP3A
iwatobipen$ head -n 3 CYP3A_pairs.txt
rule_environment_id	from_smiles	to_smiles	radius	fingerprint	lhs_public_id	rhs_public_id	lhs_smiles	rhs_smiles	lhs_value	rhs_value	delta
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1016823	839661	C[C@]12CC[C@@H]3[C@H](CC[C@H]4C[C@@H](O)CC[C@@]43C)[C@@H]1CC[C@H]2C(=O)CO	CC(=O)[C@@H]1CC[C@H]2[C@H]3CC[C@H]4C[C@@H](O)CC[C@]4(C)[C@@H]3CC[C@@]21C	1000	15849	14849
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	3666	47209	O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	O=c1cc(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	15849	5011.9	-10837
iwatobipen$ head -n 3 CYP3A_rules.txt 
rule_environment_statistics_id	rule_id	rule_environment_id	radius	fingerprint	from_smiles	to_smiles	count	avg	std	kurtosis	skewness	min	q1	median	q3	max	paired_t	p_value
28699	143276	868610	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	[*:1]O	[*:1][H]	16	-587.88	14102	-0.47579	-0.065761	-28460	-8991.5	-3247.8	10238	23962	0.16674	0.8698
54091	143276	1140189	1	tLP3hvftAkp3EUY+MHSruGd0iZ/pu5nwnEwNA+NiAh8	[*:1]O	[*:1][H]	15	-1617	13962	-0.25757	-0.18897	-28460	-9534.4	-4646	7271.1	23962	0.44855	0.66062

It is worth that the package ca handle not only structure based information but also properties.
I learned a lot of things from the source code.
RDKit org is cool community!
I pushed my code to my repo.

original repo URL is
Do not miss it!

After mishima.syk #4

Thank a lot for all attendance and presenters at mishima.syk #4 !
I enjoyed the event and party.
All presentations were very interesting and exciting. 😉
I made brief introduction about cytoscape.js, and up loaded my slide and sample code to slide share and git hub.
slide link
code link
My sample apps will need Flask, and Flask-Bootstrap. They can be installed using pip.
My mistake was that I forgot mention about my test environment.
Sample code does not work on Chrome…..
I couldn’t fix the bug. Firefox, and Safari may be work well.
But in chrome, run the app from terminal and access .
Then open javascript console in your browser and type following code , you can draw network.


Hmm, I want to study more and more.



Get min and max values.

I made some web apps in our lab.
I want to make mmp app. For make it, I need function that can set range like spotfire.
HTML5 range tag is not enough.
I found good JS library noUiSlier !
It’s easy to get min and max value from slid-bar.
I wrote code using flask.
Sample folder are….

iwatobipen$ tree
├── static
│   ├── jquery-1.11.1.min.js
│   └── noUiSlider.7.0.8
│       ├──
│       ├── jquery.nouislider.all.js
│       ├── jquery.nouislider.all.min.js
│       ├── jquery.nouislider.css
│       ├── jquery.nouislider.js
│       ├── jquery.nouislider.min.css
│       ├── jquery.nouislider.min.js
│       ├── jquery.nouislider.pips.css
│       └── jquery.nouislider.pips.min.css
└── templates
    ├── index.html
    └── sample.html was simple.

from flask import Flask, url_for, jsonify, request
from flask import render_template
from flask_bootstrap import Bootstrap

app = Flask(__name__)
Bootstrap( app )

def top():
    return render_template('index.html')

def sample():
    return render_template('sample.html')

if __name__=="__main__":

Next, wrote sample.html

{% extends 'bootstrap/base.html' %} 
{% block title %} samole {% endblock title %}
{% block styles %}
{{ super() }}
<link href="{{ url_for('static', filename="noUiSlider.7.0.8/jquery.nouislider.min.css") }}" rel="stylesheet">
<link href="{{ url_for('static', filename="noUiSlider.7.0.8/jquery.nouislider.pips.min.css") }}" rel="stylesheet">
{% endblock styles %}
{% block scripts %}
{{ super() }}
<script type=text/javascript src="{{ url_for('static', filename="jquery-1.11.1.min.js") }}"></script>
<script type=text/javascript src="{{ url_for('static', filename="noUiSlider.7.0.8/jquery.nouislider.all.min.js") }}"></script>
		start: [ 0, 20 ],
		connect: true,
			range: {
						'min': -40,
						'max': 40
		alert( $('#testslider').val());

function leftValue( value, handle, slider ){
	$(this).text( handle.parent()[0].style.left );
$("#testslider").Link("lower").to($("#lower-offset"), leftValue);

$("#testslider").Link("upper").to($("#upper-offset"), leftValue);

{% block head %}
{{ super() }}
div.example {
	padding: 20px;

{% endblock %}

{% endblock scripts %}

{% block content %}
<div class="example">
 <div class="view">
	 <p> noUiSliderber test </p><br><br>
 <div id="testslider" style="width:200px;"></div>
 <button id="read-button">read min-max</button>
 <span id="lower-value"></span><br>
 <span id="lower-offset"></span><br>
 <span id="upper-value"></span><br>
 <span id="upper-offset"></span><br>
{% endblock %}

This code is very simple, only slider bar and that bar can get min and max value.
So, integrate ajax, it’s easy to generate SQL with some data range.
Screen Shot 2014-10-04 at 9.57.09 PM

MMP using predict

I’m still thinking about how to use mmp data in our lab.
Inspired following nice presentation, I challenged to make predictive model from MMPA.
One of ipython notebook about rdkit. link
And another is Greg’s nice presentation about reaction finger print . link2
To make predictive model from mmps, I think, I need to convert molecular transformation to fingerprint .
So, I tried to use atompair fingerprint about pair.
Data preparation is almost same as link.
In following code, TID_CHEMBL240.txt was get from chembl19, and mmp_herg.txt was made by using rdkit mmp codes.
Steps are…
1st. Merge MMPs and herg_pKi.
2nd. Calculate delta pKi.
3rd. Calculate AtomPairFingerprint.
4th. Classify delta pKi is over 1 (10 hold increase pKi) or not.

import math
from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
from rdkit.Chem import PandasTools
from sklearn import feature_extraction
from sklearn.svm import SVR, SVC
from sklearn import svm
from sklearn.cross_validation import train_test_split
from sklearn.metrics import confusion_matrix

df = pd.read_table("TID_CHEMBL240.txt")
mmps = pd.read_csv("mmp_herg.txt", header=None, names=["smiles1", "smiles2","molregno1","molregno2","tform","core"])
mmps = mmps[["mol1","mol2","molregno1","molregno2","tform","core"]]

#make activity table
#1st step merge data
mmpdds=mmps.merge(t1,left_on="molregno1",right_on="MOLREGNO",suffixes=("_1","_2")).merge(t1,left_on="molregno2", right_on="MOLREGNO",suffixes=("_1","_2"))
mmpdds["pKi_1"]=mmpdds.apply(lambda row:-1*math.log10(float(row["STANDARD_VALUE_1"])*1e-9),axis=1)
mmpdds["pKi_2"]=mmpdds.apply(lambda row:-1*math.log10(float(row["STANDARD_VALUE_2"])*1e-9),axis=1)

#2nd step calc delta pKi
mmpdds = mmpdds[["mol1","mol2","molregno1", "molregno2", "pKi_1", "pKi_2", "delta", "tform", "core"]]

#3rd Calc AtompairFingerprint
mmpdds["afp_1"]=mmpdds.apply(lambda row:AllChem.GetAtomPairFingerprint(row["mol1"]) ,axis=1)
mmpdds["afp_2"]=mmpdds.apply(lambda row:AllChem.GetAtomPairFingerprint(row["mol2"]) ,axis=1)
mmpdds = mmpdds.dropna()
#classify delta pKi is >1 or not.
mmpdds["ov10"] = > 1

Hmm. maybe work fine.
Go next Step.
Next, I got NonzeroElements from AtomPairFingerprint and make sparse matrix from it.
Scikit-learn has good module DictVectrizer.
So, used tha method to handle sparse matrix.
And split dataset using train_test_split.
OK, read to go.
First, build SVR classification module.
Then apply to test set.

nzf = [fp.GetNonzeroElements() for fp in mmpdds.deltafp]
x_train, x_test, y_train, y_test = train_test_split(
    sparse_mat, mmpdds.ov10, test_size=0.3, random_state=42)

cm = confusion_matrix(y_test, pred)

confusion matrix was…
[[359 0]
[ 5 64]]
Fine !