Let’s use mmpdb v3 #memo #rdkit #chemoinformatics

Matched molecular pair (MMP) analysis is not an AI-based compound design method, but it is still a useful and powerful approach to compound design.

MMPDB is one of the coolest packages for MMP analysis, and Andrew Dalke, its developer, presented a new version of MMPDB at the RDKit UGM 2022.


Version 3 is still in development, but you can install it from Andrew’s repo. I installed it and tried mmpdb v3.

$ gh repo clone adalke/mmpdb -- -b v3-dev
$ cd mmpdb
$ pip install -e .

After running the commands above, the mmpdb command is available ;)

Next, make cdk2.smi from cdk2.sdf with RDKit.

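from rdkit import Chem

# Read the CDK2 SD file and write one "SMILES ID" line per molecule
mols = Chem.SDMolSupplier('cdk2.sdf')
with open('cdk2.smi', 'w') as of:  # open for writing
    for m in mols:
        if m is None:  # skip records RDKit could not parse
            continue
        molid = m.GetProp('_Name')
        smi = Chem.MolToSmiles(m)
        of.write(f'{smi} {molid}\n')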

With the SMILES file ready, let’s build the mmpdb database.

$ mmpdb fragment cdk2.smi -o cdk2.fragment
$ mmpdb index cdk2.fragment -o cdk2.mmpdb
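Conceptually, `mmpdb fragment` cuts each molecule into a constant part and a variable part, and `mmpdb index` pairs up molecules that share the same constant part; the change between their variable parts becomes a rule. The pure-Python sketch below illustrates only that pairing step, on hypothetical hand-written fragment records (`index_pairs` and the molecule IDs are made up). It is not mmpdb's code: the real indexer also handles canonicalization, hydrogen replacements and chemical environments.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical single-cut records: (molecule id, constant part, variable part).
# These are NOT real output of `mmpdb fragment`.
records = [
    ("mol_A", "[*:1]c1ccccc1", "[*:1]C"),
    ("mol_B", "[*:1]c1ccccc1", "[*:1]c1nccs1"),
    ("mol_C", "[*:1]c1ccccc1", "[*:1]C"),  # same variable as mol_A: no rule
    ("mol_D", "[*:1]CC1CC1", "[*:1]N"),     # constant shared with nobody: no pair
]

def index_pairs(records):
    """Group records by constant part and pair every two molecules whose
    variable parts differ (a simplified MMP indexing step)."""
    by_constant = defaultdict(list)
    for mol_id, constant, variable in records:
        by_constant[constant].append((mol_id, variable))

    rules = defaultdict(list)  # (from_smiles, to_smiles) -> [(from_id, to_id), ...]
    for members in by_constant.values():
        for (id1, var1), (id2, var2) in combinations(members, 2):
            if var1 == var2:
                continue  # identical variable parts are not a transformation
            # Sketch choice: store each rule once, with fragments in sorted order
            if var1 > var2:
                id1, var1, id2, var2 = id2, var2, id1, var1
            rules[(var1, var2)].append((id1, id2))
    return dict(rules)

for (frm, to), pairs in index_pairs(records).items():
    print(f"{frm} >> {to}: {pairs}")
```

Sorting the fragment SMILES is just a simple way to avoid storing A >> B and B >> A separately in this sketch.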

Check the rule list with the rulecat command.

$ mmpdb rulecat cdk2.mmpdb
id	from_smiles	to_smiles
1	[*:1]c1[nH]cc2c1CCOC2=O	[*:1]c1cnc[nH]1
2	[*:1]c1c(F)cc(Br)cc1F	[*:1]c1nccs1
3	[*:1]Oc1nc([*:2])nc(N)c1N=O	[*:1]Oc1nc([*:2])nc2[nH]cnc12
4	[*:1]Oc1nc([*:2])nc2[nH]cnc12	[*:1]Oc1nc(N)nc([*:2])c1N=O
5	[*:1]C1CC1	[*:1]c1ccccc1
6	[*:1]C(=O)C(C)C	[*:1][C@@H]1CCCO1
7	[*:1]C(=O)C(C)C	[*:1][C@H]1CCC(=O)N1
8	[*:1]C(=O)C(C)C	[*:1]C1CCCCC1
9	[*:1]C(=O)C(C)C	[*:1][C@@H]1CC=CCC1
10	[*:1][C@@H]1CCCO1	[*:1][C@H]1CCC(=O)N1
11	[*:1]C1CCCCC1	[*:1][C@@H]1CCCO1
12	[*:1][C@@H]1CC=CCC1	[*:1][C@@H]1CCCO1
13	[*:1]C1CCCCC1	[*:1][C@H]1CCC(=O)N1
14	[*:1][C@@H]1CC=CCC1	[*:1][C@H]1CCC(=O)N1
15	[*:1]C1CCCCC1	[*:1][C@@H]1CC=CCC1
16	[*:1]c1ccc(C(=O)[O-])c([*:2])c1	[*:1]c1cccc([*:2])c1
17	[*:1]Nc1ccc(C(=O)[O-])c([*:2])c1	[*:1]Nc1cccc([*:2])c1
18	[*:1]c1nc([*:2])c(N=O)c(N)n1	[*:1]c1nc([*:2])c2nc[nH]c2n1
19	[*:1]c1nc([*:2])c2nc[nH]c2n1	[*:1]c1nc(N)nc([*:2])c1N=O
20	[*:1]C	[*:1]c1nccs1
21	[*:1]c1ccc(C(N)=O)cc1	[*:1][H]
22	[*:1]c1ccc(C(=O)[O-])c(Cl)c1	[*:1]c1cccc(Cl)c1
23	[*:1]CC1CC1	[*:1]Cc1ccccc1
24	[*:1]c1ccc(S(N)(=O)=O)cc1	[*:1]c1ccccc1
25	[*:1]c1ccccc1	[*:1][H]
26	[*:1]c1ccc(S(N)(=O)=O)cc1	[*:1][H]
27	[*:1]c1nc(N)nc(N)c1N=O	[*:1]c1nc(N)nc2[nH]cnc12
28	[*:1]CC(=O)C(C)C	[*:1]C[C@@H]1CCCO1
29	[*:1]CC(=O)C(C)C	[*:1]C[C@H]1CCC(=O)N1
30	[*:1]CC(=O)C(C)C	[*:1]CC1CCCCC1
31	[*:1]CC(=O)C(C)C	[*:1]C[C@@H]1CC=CCC1
32	[*:1]C[C@@H]1CCCO1	[*:1]C[C@H]1CCC(=O)N1
33	[*:1]CC1CCCCC1	[*:1]C[C@@H]1CCCO1
34	[*:1]C[C@@H]1CC=CCC1	[*:1]C[C@@H]1CCCO1
35	[*:1]CC1CCCCC1	[*:1]C[C@H]1CCC(=O)N1
36	[*:1]C[C@@H]1CC=CCC1	[*:1]C[C@H]1CCC(=O)N1
37	[*:1]CC1CCCCC1	[*:1]C[C@@H]1CC=CCC1
38	[*:1]c1ccc(OC)cc1	[*:1]c1cccs1
39	[*:1]N1CCC[C@H](C(N)=O)C1	[*:1][H]
40	[*:1]OC	[*:1]SCC[NH3+]
41	[*:1]S(=O)(=O)NC	[*:1]S(=O)(=O)NC(=N)N
42	[*:1]S(=O)(=O)NC	[*:1]S(=O)(=O)Nc1nccs1
43	[*:1]S(=O)(=O)NC(=N)N	[*:1]S(=O)(=O)Nc1nccs1
44	[*:1]C(=O)[O-]	[*:1][H]
45	[*:1]S(N)(=O)=O	[*:1][H]
46	[*:1]OC[C@@H](O)C[NH+](C)C	[*:1][H]
47	[*:1]NC(=O)NN(C)C	[*:1]NC(N)=O
48	[*:1]N	[*:1]Nc1ccc(C(N)=O)cc1
49	[*:1]OCC(=O)C(C)C	[*:1]OC[C@@H]1CCCO1
50	[*:1]OCC(=O)C(C)C	[*:1]OC[C@H]1CCC(=O)N1
51	[*:1]OCC(=O)C(C)C	[*:1]OCC1CCCCC1
52	[*:1]OCC(=O)C(C)C	[*:1]OC[C@@H]1CC=CCC1
53	[*:1]OC[C@@H]1CCCO1	[*:1]OC[C@H]1CCC(=O)N1
54	[*:1]OCC1CCCCC1	[*:1]OC[C@@H]1CCCO1
55	[*:1]OC[C@@H]1CC=CCC1	[*:1]OC[C@@H]1CCCO1
56	[*:1]OCC1CCCCC1	[*:1]OC[C@H]1CCC(=O)N1
57	[*:1]OC[C@@H]1CC=CCC1	[*:1]OC[C@H]1CCC(=O)N1
58	[*:1]OCC1CCCCC1	[*:1]OC[C@@H]1CC=CCC1
59	[*:1]NCC1CC1	[*:1]NCc1ccccc1
60	[*:1]N	[*:1]Nc1ccccc1

60 rules were generated from cdk2.smi, which contains 47 molecules.
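If you want to work with the rule list in Python, here is a minimal sketch that reads rulecat's tab-separated output and counts rules by the number of `[*:n]` attachment points (single cut vs. multiple cuts). It assumes you first saved the output to a file, e.g. `mmpdb rulecat cdk2.mmpdb > cdk2_rules.tsv` (hypothetical filename); the embedded sample is three rows copied from the output above, and `load_rules`/`count_cuts` are my own helper names.

```python
import csv
import io
import re
from collections import Counter

# Three rows copied from the `mmpdb rulecat cdk2.mmpdb` output above.
# For the full list, pass open("cdk2_rules.tsv") instead of io.StringIO.
RULECAT_TSV = (
    "id\tfrom_smiles\tto_smiles\n"
    "3\t[*:1]Oc1nc([*:2])nc(N)c1N=O\t[*:1]Oc1nc([*:2])nc2[nH]cnc12\n"
    "20\t[*:1]C\t[*:1]c1nccs1\n"
    "25\t[*:1]c1ccccc1\t[*:1][H]\n"
)

ATTACHMENT = re.compile(r"\[\*:\d+\]")  # matches [*:1], [*:2], ...

def load_rules(handle):
    """Read rulecat-style TSV into a list of dicts keyed by the header."""
    return list(csv.DictReader(handle, delimiter="\t"))

def count_cuts(rules):
    """Count rules by the number of attachment points in from_smiles."""
    return Counter(len(ATTACHMENT.findall(r["from_smiles"])) for r in rules)

rules = load_rules(io.StringIO(RULECAT_TSV))
print(len(rules), count_cuts(rules))
```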

One of the interesting features of mmpdb v3 is the new generate command. It lets the user generate molecules by giving a user-defined substructure as the query.

cdk2.mmpdb contains a methyl-to-thiazole rule ([*:1]C >> [*:1]c1nccs1, rule 20 above), so I tried the generate command with a simple molecule, toluene.
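With the query `*C` on toluene, the part to replace is the methyl `[*:1]C` and the rest (`*c1ccccc1`) stays constant. The heart of the idea is a rule lookup keyed by that variable fragment. The sketch below is a hypothetical, simplified version using a few rows copied from the rulecat output (rules 20, 48 and 60); it matches SMILES strings exactly and ignores the chemical environment, which mmpdb does take into account. Checking both directions is a choice of this sketch, since rulecat lists each fragment pair only once.

```python
# Hypothetical in-memory rule table: (from_smiles, to_smiles) rows copied
# from the rulecat output above (rules 20, 48 and 60).
RULES = [
    ("[*:1]C", "[*:1]c1nccs1"),
    ("[*:1]N", "[*:1]Nc1ccc(C(N)=O)cc1"),
    ("[*:1]N", "[*:1]Nc1ccccc1"),
]

def candidate_replacements(variable, rules, both_directions=True):
    """Return replacement fragments for a variable fragment by exact
    SMILES match (sketch only: no environment/radius check)."""
    hits = []
    for frm, to in rules:
        if frm == variable:
            hits.append(to)
        elif both_directions and to == variable:
            hits.append(frm)  # apply the rule in reverse
    return hits

print(candidate_replacements("[*:1]C", RULES))
```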

$ mmpdb generate --smiles 'Cc1ccccc1' --query '*C' cdk2.mmpdb
start	constant	from_smiles	to_smiles	r	pseudosmiles	final	heavies_diff	#pairs	pair_from_id	pair_from_smiles	pair_to_id	pair_to_smiles
  EXEC: *C (22, 22, 1, 1)
Cc1ccccc1	*c1ccccc1	[*:1]C	[*:1]c1nccs1	0	[*:1](~*)	c1ccc(-c2nccs2)cc1	4	1	ZINC03814443	CNS(=O)(=O)c1ccc(N/C=C2\C(=O)Nc3ccccc32)cc1	ZINC03814447	O=C1Nc2ccccc2/C1=C/Nc1ccc(S(=O)(=O)Nc2nccs2)cc1
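The generate output is tab-separated with a header row, so it is easy to post-process. Here is a minimal sketch (`read_generate_rows` is my own helper name) that turns each result row into a dict and skips any line whose field count does not match the header, such as the `EXEC:` line shown above. The sample text is the header and row copied from the output above; with real output you would pass a file handle instead.

```python
import csv
import io

# Header and result row copied from the `mmpdb generate` output above,
# plus the EXEC: line to show it being skipped.
GENERATE_OUTPUT = (
    "start\tconstant\tfrom_smiles\tto_smiles\tr\tpseudosmiles\tfinal\theavies_diff\t"
    "#pairs\tpair_from_id\tpair_from_smiles\tpair_to_id\tpair_to_smiles\n"
    "  EXEC: *C (22, 22, 1, 1)\n"
    "Cc1ccccc1\t*c1ccccc1\t[*:1]C\t[*:1]c1nccs1\t0\t[*:1](~*)\tc1ccc(-c2nccs2)cc1\t4\t1\t"
    "ZINC03814443\tCNS(=O)(=O)c1ccc(N/C=C2\\C(=O)Nc3ccccc32)cc1\t"
    "ZINC03814447\tO=C1Nc2ccccc2/C1=C/Nc1ccc(S(=O)(=O)Nc2nccs2)cc1\n"
)

def read_generate_rows(handle):
    """Yield one dict per result row; lines whose field count differs
    from the header (e.g. diagnostic lines) are skipped."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    for fields in reader:
        if len(fields) == len(header):
            yield dict(zip(header, fields))

rows = list(read_generate_rows(io.StringIO(GENERATE_OUTPUT)))
for row in rows:
    print(row["final"], row["heavies_diff"], row["#pairs"])
```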

The molecule generated from “Cc1ccccc1” is ‘c1ccc(-c2nccs2)cc1’ (2-phenylthiazole), and the image is shown below.
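The `final` SMILES is the constant part (`*c1ccccc1`) joined to the new fragment (`[*:1]c1nccs1`). If you want to reproduce such a join yourself, RDKit's `Chem.molzip` is one option; this is my own stand-in and not necessarily how mmpdb builds the product. As an assumption of the sketch, the constant's bare `*` is relabelled as `[*:1]` by hand so that both dummy atoms carry the same atom-map number, which is what `molzip` pairs on by default. It needs an RDKit release recent enough to include `molzip`.

```python
from rdkit import Chem

# Constant part from the generate output, with "*" relabelled as "[*:1]"
# so it matches the attachment point of the rule's to_smiles.
constant = Chem.MolFromSmiles("[*:1]c1ccccc1")
replacement = Chem.MolFromSmiles("[*:1]c1nccs1")

# molzip bonds the atoms attached to dummy atoms that share the same
# atom-map number and removes the dummies.
product = Chem.molzip(constant, replacement)
print(Chem.MolToSmiles(product))
```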

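The generate command also accepts a radius option, which controls how much of the chemical environment around the attachment point has to match. This gives the user more control over which rules are applied.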
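As I understand mmpdb's design, the radius counts bonds outward from the attachment point: the environment is described by the atoms within that many bonds, so a larger radius means a more specific match. The pure-Python sketch below only illustrates that "atoms within r bonds" idea with a breadth-first search on a hand-written graph (the ring graph and `atoms_within_radius` are hypothetical); it does not reproduce mmpdb's environment fingerprints.

```python
from collections import deque

def atoms_within_radius(adjacency, start, radius):
    """Return the atom indices at most `radius` bonds away from `start`
    (breadth-first search over a bond adjacency list)."""
    seen = {start: 0}  # atom index -> distance in bonds
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if seen[atom] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[atom]:
            if neighbor not in seen:
                seen[neighbor] = seen[atom] + 1
                queue.append(neighbor)
    return set(seen)

# Hypothetical heavy-atom graph for the constant part *c1ccccc1:
# atom 0 is the ring carbon carrying the attachment point.
benzene_ring = {
    0: [1, 5],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4],
    4: [3, 5],
    5: [4, 0],
}

for r in range(4):
    print(r, sorted(atoms_within_radius(benzene_ring, 0, r)))
```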

The new features of mmpdb v3 are not limited to the generate command shown here, so I recommend reading Andrew’s presentation and checking his repo!

Thanks for reading ;)


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
