Create MMPDB ( matched molecular pair )!

Matched molecular pair analysis is very common method to analyze SAR for medicinal chemists. There are lots of publications about it and applications in these area.
I often use rdkit/Contrib/mmpa to make my own MMP dataset.
The origin of the algorithm is described in following URL.

Yesterday, good news announced by @RDKit_org. It is release the package that can make MMPDB.
I tried to use the package immediately.
This package is provided from github repo. And to use the package, I need to install apsw at first. APSW can install by using conda.
And the install mmpdb by python script.

iwatobipen$ conda install -c conda-forge apsw
iwatobipen$ git clone
iwatobipen$ cd mmpdb
iwatobipen$ python install

After success of installation, I could found mmpdb command in terminal.
I used CYP3A4 inhibition data from ChEMBL for the test.
I prepared two files, one has smiles and id, and the another has id and ic50 value.
* means missing value. In the following case, I provided single property ( IC50 ) but the package can handle multiple properties. If reader who is interested the package, please show more details by using mmpdb –help command.

iwatobipen$ head -n 10 chembl_cyp3a4.csv 
iwatobipen$ head -n 10 prop.csv 
924282	*
605	*
1698776	*
59721	19952.62
759749	2511.89
819161	2511.89

mmdb fragment has –cut-smarts option.
It seems attractive for me! 😉
–cut-smarts SMARTS alternate SMARTS pattern to use for cutting (default:
[CH2])]’), or use one of: ‘default’,
‘cut_AlkylChains’, ‘cut_Amides’, ‘cut_all’,
‘exocyclic’, ‘exocyclic_NoMethyl’
Next step, make mmpdb and join the property to db.

# run fragmentation and my input file has header, delimiter is comma ( default is white space ). Output file is cyp3a4.fragments.
# Each line of inputfile must be unique!
iwatobipen$ mmpdb fragment chembl_cyp3a4.csv --has-header --delimiter 'comma' -o cyp3a4.fragments
# rung indexing with fragmented file and create a mmpdb. 
iwatobipen$ mmpdb index cyp3a4.fragments -o cyp3a4.mmpdb

OK I got cyp3a4.mmpdb file. (sqlite3 format)
Add properties to a DB.
Type following command.

iwatobipen$ mmpdb loadprops -p prop.csv cyp3a4.mmpdb
Using dataset: MMPs from 'cyp3a4.fragments'
Reading properties from 'prop.csv'
Read 1 properties for 17143 compounds from 'prop.csv'
5944 compounds from 'prop.csv' are not in the dataset at 'cyp3a4.mmpdb'
Imported 5586 'STANDARD_VALUE' records (5586 new, 0 updated).
Generated 83759 rule statistics (1329408 rule environments, 1 properties)
Number of rule statistics added: 83759 updated: 0 deleted: 0
Loaded all properties and re-computed all rule statistics.

Ready to use DB. Let’s play with the DB.
Identify possible transforms.

iwatobipen$ mmpdb transform --smiles 'c1ccc(O)cc1' cyp3a4.mmpdb --min-pair 10 -o transfom_res.txt
iwatobipen$ head -n3 transfom_res.txt 
1	CC(=O)NCCO	[*:1]c1ccccc1	[*:1]CCNC(C)=O	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1049493	14	3632	5313.6	-0.71409	-0.033683	-6279.7	498.81	2190.5	7363.4	12530	-2.5576	0.023849
2	CC(C)CO	[*:1]c1ccccc1	[*:1]CC(C)C	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1026671	20	7390.7	8556.1	-1.1253	-0.082107	-6503.9	-0	8666.3	13903	23534	-3.863	0.0010478

Output file has information of transformation with statistics values.
And the db can use to make a prediction.
Following command can generate two files with prefix CYP3A-.

iwatobipen$ mmpdb predict --reference 'c1ccc(O)cc1' --smiles 'c1ccccc1' cyp3a4.mmpdb  -p STANDARD_VALUE --save-details --prefix CYP3A
iwatobipen$ head -n 3 CYP3A_pairs.txt
rule_environment_id	from_smiles	to_smiles	radius	fingerprint	lhs_public_id	rhs_public_id	lhs_smiles	rhs_smiles	lhs_value	rhs_value	delta
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1016823	839661	C[C@]12CC[C@@H]3[C@H](CC[C@H]4C[C@@H](O)CC[C@@]43C)[C@@H]1CC[C@H]2C(=O)CO	CC(=O)[C@@H]1CC[C@H]2[C@H]3CC[C@H]4C[C@@H](O)CC[C@]4(C)[C@@H]3CC[C@@]21C	1000	15849	14849
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	3666	47209	O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	O=c1cc(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	15849	5011.9	-10837
iwatobipen$ head -n 3 CYP3A_rules.txt 
rule_environment_statistics_id	rule_id	rule_environment_id	radius	fingerprint	from_smiles	to_smiles	count	avg	std	kurtosis	skewness	min	q1	median	q3	max	paired_t	p_value
28699	143276	868610	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	[*:1]O	[*:1][H]	16	-587.88	14102	-0.47579	-0.065761	-28460	-8991.5	-3247.8	10238	23962	0.16674	0.8698
54091	143276	1140189	1	tLP3hvftAkp3EUY+MHSruGd0iZ/pu5nwnEwNA+NiAh8	[*:1]O	[*:1][H]	15	-1617	13962	-0.25757	-0.18897	-28460	-9534.4	-4646	7271.1	23962	0.44855	0.66062

It is worth that the package ca handle not only structure based information but also properties.
I learned a lot of things from the source code.
RDKit org is cool community!
I pushed my code to my repo.

original repo URL is
Do not miss it!


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s