Matched molecular pairs (MMPs) are a popular approach for transforming molecules using prior medicinal chemistry knowledge. MMPDB is a useful open source package for managing MMP datasets. It originated at GSK and is released under the 3-clause BSD license. It is easy to use and works really fast. I love the package. However, the current version of MMPDB supports only SQLite3. SQLite3 is easy to use and works without a server.
I would like to build an MMPDB database in PostgreSQL because the RDKit community provides a chemical cartridge for PostgreSQL. So it would be useful if MMPDB could write to PostgreSQL.
Fortunately, Andrew Dalke recently shared a new version of MMPDB which supports PostgreSQL!
This version isn’t in the main branch yet, but you can get it from the v3-dev branch. https://github.com/adalke/mmpdb/tree/v3-dev/mmpdblib/cli
OK, let’s use it.
First, clone mmpdb and install it.
$ gh repo clone adalke/mmpdb -- -b v3-dev
$ cd mmpdb
$ pip install -e .
Then modify mmpdblib/index_writers.py.
# From (line 436 onward)
'''
def add_rule_environment(self, rule_env_idx, rule_idx, env_fp_idx, radius):
    self._rule_environment_values.append(
        (rule_env_idx, rule_idx, env_fp_idx, radius))
    if next(self._check_flush):
        self.flush()
'''
# To (added num_pairs)
def add_rule_environment(self, rule_env_idx, rule_idx, env_fp_idx, radius, num_pairs=0):
    self._rule_environment_values.append(
        (rule_env_idx, rule_idx, env_fp_idx, radius, num_pairs))
    if next(self._check_flush):
        self.flush()
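To see what the edit changes, here is a minimal sketch with a toy writer. ToyIndexWriter, flush_every and the flushed list are hypothetical names made up for illustration, and the way _check_flush is built here is an assumption, not mmpdb's actual code. The point is only that the default num_pairs=0 keeps four-argument calls working while every buffered row now has five fields.

```python
import itertools


class ToyIndexWriter:
    """Hypothetical stand-in for the real writer; not mmpdb's actual class."""

    def __init__(self, flush_every=2):
        self._rule_environment_values = []
        self.flushed = []  # rows that would have been sent to the database
        # Assumption: an iterator that yields True once every `flush_every` calls.
        self._check_flush = itertools.cycle([False] * (flush_every - 1) + [True])

    def add_rule_environment(self, rule_env_idx, rule_idx, env_fp_idx, radius, num_pairs=0):
        # Each buffered row now carries five values, including num_pairs.
        self._rule_environment_values.append(
            (rule_env_idx, rule_idx, env_fp_idx, radius, num_pairs))
        if next(self._check_flush):
            self.flush()

    def flush(self):
        # A real writer would run a bulk INSERT here; the toy just moves rows.
        self.flushed.extend(self._rule_environment_values)
        self._rule_environment_values.clear()


writer = ToyIndexWriter(flush_every=2)
writer.add_rule_environment(1, 1, 1, 0)               # old-style call, num_pairs falls back to 0
writer.add_rule_environment(2, 1, 1, 1, num_pairs=5)  # new-style call, triggers the flush
print(writer.flushed)
```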
Then create a test database for mmpdb.
$ createdb mmpdbtest
Next, make a fragment database with the mmpdb fragment command.
$ head -n 10 herg_data.txt
canonical_smiles chembl_id molregno activity_id standard_value standard_units
O=S(=O)(c1ccccc1)C1(F)CCN(CCc2ccc(F)cc2F)CC1 CHEMBL175586 296708 1403965 2446.0 nM
N[C@H](C(=O)N1CC[C@H](F)C1)[C@H]1CC[C@H](NS(=O)(=O)c2ccc(F)cc2F)CC1 CHEMBL22310 29272 671631 49000.0 nM
N[C@H](C(=O)N1CCSC1)C1CCCCC1 CHEMBL23223 29758 674222 28000.0 nM
N[C@H](C(=O)N1CCSC1)[C@H]1CC[C@H](NC(=O)c2ccc(F)c(F)c2)CC1 CHEMBL22359 29449 675583 5900.0 nM
N[C@H](C(=O)N1CCCC1)[C@H]1CC[C@H](NS(=O)(=O)c2ccc(F)cc2F)CC1 CHEMBL25437 29244 675588 35000.0 nM
N[C@H](C(=O)N1CC[C@@H](F)C1)[C@H]1CC[C@H](NS(=O)(=O)c2ccc(OC(F)(F)F)cc2)CC1 CHEMBL281561 29265 679299 6000.0 nM
N[C@H](C(=O)N1CC[C@@H](F)C1)[C@H]1CC[C@H](NS(=O)(=O)c2ccc(F)cc2F)CC1 CHEMBL283309 29253 679302 52000.0 nM
N[C@H](C(=O)N1CCCC1)[C@H]1CC[C@H](NC(=O)c2ccc(F)c(F)c2)CC1 CHEMBL278558 29482 683566 29000.0 nM
N[C@H](C(=O)N1CCSC1)[C@H]1CC[C@H](NC(=O)c2ccccc2C(F)(F)F)CC1 CHEMBL283368 29340 685042 39000.0 nM
$ mmpdb fragment herg_data.txt -o herg_dataset.fragdb
Now I have herg_dataset.fragdb. To build the MMP database in PostgreSQL, type the following command. I run PostgreSQL on localhost and write the tables into the mmpdbtest database.
$ mmpdb index herg_dataset.fragdb -o postgres://localhost/mmpdbtest
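The output argument is an ordinary database URL. As a small sketch of how its parts break down, the standard library can split it. How mmpdb itself parses the URL is not shown in this post, so treat this only as an illustration of the URL layout.

```python
from urllib.parse import urlparse

# Same URL as in the mmpdb index command above.
url = urlparse("postgres://localhost/mmpdbtest")

scheme = url.scheme             # which database backend is requested
host = url.hostname             # where the PostgreSQL server runs
port = url.port                 # None here, so the client default applies
dbname = url.path.lstrip("/")   # database created earlier with createdb

print(scheme, host, port, dbname)
```

A URL for a remote server would carry the user, host and port in the same positions, for example in the generic form postgres://user@host:port/dbname.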
The command above creates the mmpdb tables in the PostgreSQL database mmpdbtest. Let’s check the PostgreSQL database.
$ psql mmpdbtest
psql (12.9, server 12.2)
Type "help" for help.
mmpdbtest=# select * from rule_environment where num_pairs >1;
  id   | rule_id | environment_fingerprint_id | radius | num_pairs
-------+---------+----------------------------+--------+-----------
    43 |       8 |                          7 |      0 |         2
    49 |       9 |                          1 |      0 |         2
    55 |      10 |                          7 |      0 |         5
    61 |      11 |                          7 |      0 |         2
    67 |      12 |                          1 |      0 |         2
   127 |      22 |                          1 |      0 |         8
   133 |      23 |                          1 |      0 |         3
   139 |      24 |                          1 |      0 |         3
   187 |      32 |                          1 |      0 |         4
   193 |      33 |                          1 |      0 |         6
   211 |      36 |                          1 |      0 |         6
   217 |      37 |                          1 |      0 |         3
   229 |      39 |                          1 |      0 |         3
   253 |      43 |                          7 |      0 |         3
   259 |      44 |                          7 |      0 |         2
   265 |      45 |                          7 |      0 |         8
   343 |      58 |                         57 |      0 |         2
   361 |      61 |                         57 |      0 |         2
   439 |      74 |                          7 |      0 |         3
--More--
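The same kind of query can be run from Python. The sketch below uses an in-memory SQLite3 database so that it runs without a server. The table definition is an assumption reconstructed from the column names in the psql output, the first three rows are copied from that output, and the last row is made up to show the filter. Pointing the same SQL at PostgreSQL would need a PostgreSQL driver instead of sqlite3, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only the columns visible in the psql output above.
conn.execute(
    """
    CREATE TABLE rule_environment (
        id INTEGER PRIMARY KEY,
        rule_id INTEGER,
        environment_fingerprint_id INTEGER,
        radius INTEGER,
        num_pairs INTEGER
    )
    """
)

rows = [
    (43, 8, 7, 0, 2),    # copied from the psql output
    (55, 10, 7, 0, 5),   # copied from the psql output
    (127, 22, 1, 0, 8),  # copied from the psql output
    (1, 1, 1, 0, 1),     # made-up row that the filter should drop
]
conn.executemany("INSERT INTO rule_environment VALUES (?, ?, ?, ?, ?)", rows)

# Rule environments supported by more than one pair, best supported first.
result = conn.execute(
    "SELECT id, rule_id, num_pairs FROM rule_environment "
    "WHERE num_pairs > 1 ORDER BY num_pairs DESC, id"
).fetchall()
conn.close()

for row in result:
    print(row)
```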
Next, run a transformation with mmpdb.
$ mmpdb transform --smiles 'c1cccnc1O' postgres://localhost/mmpdbtest --max-variable-size 5
ID SMILES
1 CN(C)c1ccccn1
2 CNc1ccccn1
3 COC(=O)c1ccccn1
4 COCCNc1ccccn1
5 COc1ccccn1
6 CS(=O)(=O)Nc1ccccn1
7 Cc1ccccn1
8 Clc1ccccn1
9 Fc1ccccn1
...
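If you want to use the products in a script, the ID/SMILES listing is easy to parse. This is a minimal sketch that takes the nine rows shown above as a string. The parse_transform_output helper is a hypothetical name, and splitting on any whitespace is an assumption that covers both tab- and space-separated output.

```python
# The nine rows shown above, pasted as text (the real output continues after them).
OUTPUT = """\
ID SMILES
1 CN(C)c1ccccn1
2 CNc1ccccn1
3 COC(=O)c1ccccn1
4 COCCNc1ccccn1
5 COc1ccccn1
6 CS(=O)(=O)Nc1ccccn1
7 Cc1ccccn1
8 Clc1ccccn1
9 Fc1ccccn1
"""


def parse_transform_output(text):
    """Return (id, smiles) tuples; hypothetical helper, not part of mmpdb."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[:2] != ["ID", "SMILES"]:
        raise ValueError(f"unexpected header: {lines[0]!r}")
    products = []
    for line in lines[1:]:
        fields = line.split()  # works for tabs and for spaces
        products.append((int(fields[0]), fields[1]))
    return products


products = parse_transform_output(OUTPUT)
halogenated = [smi for _, smi in products if smi.startswith(("Cl", "F"))]
print(len(products), halogenated)
```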
It works fine.
In summary, the new version of mmpdb works with PostgreSQL. This makes it possible to integrate with the RDKit chemical cartridge and to move from SQLite3 to PostgreSQL.
Thanks for the continuous development of mmpdb!
If you are interested in the package, give it a try ;)