Matched molecular pair (MMP) analysis is not an AI-based compound design method, but it is still a useful and powerful approach to compound design.
MMPDB is one of the nicest packages for MMP analysis, and Andrew, the developer of MMPDB, presented a new version at the RDKit UGM 2022.
Version 3 is still under development, but you can install it from Andrew's repo. I installed it and tried mmpdb v3.
$ gh repo clone adalke/mmpdb -- -b v3-dev
$ cd mmpdb
$ pip install -e .
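After the commands above, I could call the mmpdb command ;)
Make cdk2.smi from cdk2.sdf with RDKit.
from rdkit import Chem

# SDMolSupplier yields None for records it cannot parse, so skip those
mols = Chem.SDMolSupplier('cdk2.sdf')
# Open in write mode; each output line is "<SMILES> <molecule id>"
with open('cdk2.smi', 'w') as of:
    for m in mols:
        if m is None:
            continue
        molid = m.GetProp('_Name')
        smi = Chem.MolToSmiles(m)
        of.write(f'{smi} {molid}\n')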
After making the smi file, let's build the MMP database.
$ mmpdb fragment cdk2.smi -o cdk2.fragment
$ mmpdb index cdk2.fragment -o cdk2.mmpdb
Check the rule list with the rulecat command.
$ mmpdb rulecat cdk2.mmpdb
id from_smiles to_smiles
1 [*:1]c1[nH]cc2c1CCOC2=O [*:1]c1cnc[nH]1
2 [*:1]c1c(F)cc(Br)cc1F [*:1]c1nccs1
3 [*:1]Oc1nc([*:2])nc(N)c1N=O [*:1]Oc1nc([*:2])nc2[nH]cnc12
4 [*:1]Oc1nc([*:2])nc2[nH]cnc12 [*:1]Oc1nc(N)nc([*:2])c1N=O
5 [*:1]C1CC1 [*:1]c1ccccc1
6 [*:1]C(=O)C(C)C [*:1][C@@H]1CCCO1
7 [*:1]C(=O)C(C)C [*:1][C@H]1CCC(=O)N1
8 [*:1]C(=O)C(C)C [*:1]C1CCCCC1
9 [*:1]C(=O)C(C)C [*:1][C@@H]1CC=CCC1
10 [*:1][C@@H]1CCCO1 [*:1][C@H]1CCC(=O)N1
11 [*:1]C1CCCCC1 [*:1][C@@H]1CCCO1
12 [*:1][C@@H]1CC=CCC1 [*:1][C@@H]1CCCO1
13 [*:1]C1CCCCC1 [*:1][C@H]1CCC(=O)N1
14 [*:1][C@@H]1CC=CCC1 [*:1][C@H]1CCC(=O)N1
15 [*:1]C1CCCCC1 [*:1][C@@H]1CC=CCC1
16 [*:1]c1ccc(C(=O)[O-])c([*:2])c1 [*:1]c1cccc([*:2])c1
17 [*:1]Nc1ccc(C(=O)[O-])c([*:2])c1 [*:1]Nc1cccc([*:2])c1
18 [*:1]c1nc([*:2])c(N=O)c(N)n1 [*:1]c1nc([*:2])c2nc[nH]c2n1
19 [*:1]c1nc([*:2])c2nc[nH]c2n1 [*:1]c1nc(N)nc([*:2])c1N=O
20 [*:1]C [*:1]c1nccs1
21 [*:1]c1ccc(C(N)=O)cc1 [*:1][H]
22 [*:1]c1ccc(C(=O)[O-])c(Cl)c1 [*:1]c1cccc(Cl)c1
23 [*:1]CC1CC1 [*:1]Cc1ccccc1
24 [*:1]c1ccc(S(N)(=O)=O)cc1 [*:1]c1ccccc1
25 [*:1]c1ccccc1 [*:1][H]
26 [*:1]c1ccc(S(N)(=O)=O)cc1 [*:1][H]
27 [*:1]c1nc(N)nc(N)c1N=O [*:1]c1nc(N)nc2[nH]cnc12
28 [*:1]CC(=O)C(C)C [*:1]C[C@@H]1CCCO1
29 [*:1]CC(=O)C(C)C [*:1]C[C@H]1CCC(=O)N1
30 [*:1]CC(=O)C(C)C [*:1]CC1CCCCC1
31 [*:1]CC(=O)C(C)C [*:1]C[C@@H]1CC=CCC1
32 [*:1]C[C@@H]1CCCO1 [*:1]C[C@H]1CCC(=O)N1
33 [*:1]CC1CCCCC1 [*:1]C[C@@H]1CCCO1
34 [*:1]C[C@@H]1CC=CCC1 [*:1]C[C@@H]1CCCO1
35 [*:1]CC1CCCCC1 [*:1]C[C@H]1CCC(=O)N1
36 [*:1]C[C@@H]1CC=CCC1 [*:1]C[C@H]1CCC(=O)N1
37 [*:1]CC1CCCCC1 [*:1]C[C@@H]1CC=CCC1
38 [*:1]c1ccc(OC)cc1 [*:1]c1cccs1
39 [*:1]N1CCC[C@H](C(N)=O)C1 [*:1][H]
40 [*:1]OC [*:1]SCC[NH3+]
41 [*:1]S(=O)(=O)NC [*:1]S(=O)(=O)NC(=N)N
42 [*:1]S(=O)(=O)NC [*:1]S(=O)(=O)Nc1nccs1
43 [*:1]S(=O)(=O)NC(=N)N [*:1]S(=O)(=O)Nc1nccs1
44 [*:1]C(=O)[O-] [*:1][H]
45 [*:1]S(N)(=O)=O [*:1][H]
46 [*:1]OC[C@@H](O)C[NH+](C)C [*:1][H]
47 [*:1]NC(=O)NN(C)C [*:1]NC(N)=O
48 [*:1]N [*:1]Nc1ccc(C(N)=O)cc1
49 [*:1]OCC(=O)C(C)C [*:1]OC[C@@H]1CCCO1
50 [*:1]OCC(=O)C(C)C [*:1]OC[C@H]1CCC(=O)N1
51 [*:1]OCC(=O)C(C)C [*:1]OCC1CCCCC1
52 [*:1]OCC(=O)C(C)C [*:1]OC[C@@H]1CC=CCC1
53 [*:1]OC[C@@H]1CCCO1 [*:1]OC[C@H]1CCC(=O)N1
54 [*:1]OCC1CCCCC1 [*:1]OC[C@@H]1CCCO1
55 [*:1]OC[C@@H]1CC=CCC1 [*:1]OC[C@@H]1CCCO1
56 [*:1]OCC1CCCCC1 [*:1]OC[C@H]1CCC(=O)N1
57 [*:1]OC[C@@H]1CC=CCC1 [*:1]OC[C@H]1CCC(=O)N1
58 [*:1]OCC1CCCCC1 [*:1]OC[C@@H]1CC=CCC1
59 [*:1]NCC1CC1 [*:1]NCc1ccccc1
60 [*:1]N [*:1]Nc1ccccc1
60 rules are generated from cdk2.smi, which contains 47 molecules.
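With a longer rule list it is handy to search the rulecat output from a script. The following is a minimal sketch, not part of mmpdb itself: it assumes the output has been captured as text with whitespace-separated columns (id, from_smiles, to_smiles) as printed above. The helper names parse_rules and rules_from are hypothetical, and the sample text is a few rows copied from the listing.

```python
# A few rows copied from the rulecat listing above (whitespace-separated).
RULECAT_TEXT = """\
id from_smiles to_smiles
5 [*:1]C1CC1 [*:1]c1ccccc1
20 [*:1]C [*:1]c1nccs1
25 [*:1]c1ccccc1 [*:1][H]
60 [*:1]N [*:1]Nc1ccccc1
"""


def parse_rules(text):
    """Return a list of (rule_id, from_smiles, to_smiles) tuples."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first field is not a number).
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        rules.append((int(fields[0]), fields[1], fields[2]))
    return rules


def rules_from(rules, from_smiles):
    """Select rules whose left-hand side is exactly the given SMILES string."""
    return [rule for rule in rules if rule[1] == from_smiles]


rules = parse_rules(RULECAT_TEXT)
methyl_rules = rules_from(rules, "[*:1]C")
print(methyl_rules)
```

Note that the lookup is a plain string comparison, so it only finds rules written with exactly the same SMILES as in the rulecat output; it does not test chemical equivalence.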
One of the interesting features of mmpdb v3 is the implementation of the generate command. It lets the user generate new molecules, using a user-defined substructure as the query.
cdk2.mmpdb contains a methyl-to-thiazole rule ([*:1]C>>[*:1]c1nccs1, rule 20 above), so I tried the generate command on a simple molecule.
$ mmpdb generate --smiles 'Cc1ccccc1' --query '*C' cdk2.mmpdb
start constant from_smiles to_smiles r pseudosmiles final heavies_diff #pairs pair_from_id pair_from_smiles pair_to_id pair_to_smiles
GENERATE:
EXEC: *C (22, 22, 1, 1)
Cc1ccccc1 *c1ccccc1 [*:1]C [*:1]c1nccs1 0 [*:1](~*) c1ccc(-c2nccs2)cc1 4 1 ZINC03814443 CNS(=O)(=O)c1ccc(N/C=C2\C(=O)Nc3ccccc32)cc1 ZINC03814447 O=C1Nc2ccccc2/C1=C/Nc1ccc(S(=O)(=O)Nc2nccs2)cc1
The molecule generated from 'Cc1ccccc1' (toluene) is 'c1ccc(-c2nccs2)cc1' (2-phenylthiazole), and the image is shown below.
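The generate output is easier to read once each row is paired with its column names. Here is a small sketch that does this for the header and result row printed above. It assumes that no field contains whitespace, so a plain split is enough; the function name parse_generate_row is hypothetical and not part of mmpdb.

```python
# Header and result row copied from the generate output above.
HEADER = ("start constant from_smiles to_smiles r pseudosmiles final "
          "heavies_diff #pairs pair_from_id pair_from_smiles "
          "pair_to_id pair_to_smiles")
ROW = ("Cc1ccccc1 *c1ccccc1 [*:1]C [*:1]c1nccs1 0 [*:1](~*) "
       "c1ccc(-c2nccs2)cc1 4 1 ZINC03814443 "
       "CNS(=O)(=O)c1ccc(N/C=C2\\C(=O)Nc3ccccc32)cc1 ZINC03814447 "
       "O=C1Nc2ccccc2/C1=C/Nc1ccc(S(=O)(=O)Nc2nccs2)cc1")


def parse_generate_row(header, row):
    """Pair each column name with its value and convert the integer columns."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    record = dict(zip(names, values))
    # These three columns hold integers in the output shown above.
    for key in ("r", "heavies_diff", "#pairs"):
        record[key] = int(record[key])
    return record


record = parse_generate_row(HEADER, ROW)
print(record["final"], record["heavies_diff"], record["#pairs"])
```

Read this way, the row says that the product has 4 more heavy atoms than the starting molecule and that the rule is supported by a single pair, ZINC03814443 and ZINC03814447.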

The generate command also accepts a radius option, which gives the user more flexibility in defining the query.
The new features of mmpdb v3 are not limited to the generate command shown here. You should read Andrew's presentation and check his repo!
Thanks for reading ;)