I introduced MMPDB v3 few days ago. The package is CLI tool. So user can do MMPA by typing command line. However the tool doesn’t provide API interface.
I think, if we can use MMPDB on jupyter-notebook it’ll be more useful because output data can be used directly in coding process.
MMPDB Cli interface is implemented with Click which is python command line option parser. And click has CliRunner for testing. I found that CliRunner is nice solution to do that.
Following example is my first trial to call mmpdb on jupyter notebook. And most of SQL part is bollowed from RDKit-Blog post.
At first, connect postgresql chembl30 and retrive data.
import psycopg2
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
from rdkit import Chem
from rdkit.Chem import PandasTools
import requests
import pandas as pd
from mmpdblib import cli
from click.testing import CliRunner
from IPython.display import HTML
%load_ext sql
print(rdBase.rdkitVersion)
data = %sql select canonical_smiles,molregno,activity_id,standard_value,standard_units from activities \
join assays using (assay_id) \
join compound_structures using (molregno) \
where tid=165 and standard_type='Ki' and standard_value is not null and standard_relation='=' \
and canonical_smiles not like '%.%';
df = data.DataFrame()
Next, calculate pKi and rename column for next data processing. And save compound data and property data separated as CSV format. It’s worth to know that properties data should be tab separated format.
import numpy as np
def get_pki(val):
val = float(val)+0.001
return 9.-np.log10(val)
df['pKi_hERG'] = df.standard_value.apply(get_pki)
df.rename(columns={'molregno':'ID'}, inplace=True)
cpddf = df[['canonical_smiles','ID']].copy()
cpddf.drop_duplicates().to_csv('chembl_herg.smi', sep=' ', index=False, header=False)
propdf = df[['ID', 'pKi_hERG']].copy()
propdf.drop_duplicates().to_csv('chembl_herg_prop.csv', sep='\t', index=False, columns=['ID', 'pKi_hERG'])
OK, ready to run MMPDBlib. It is easy to use CliRunner, I show examples.
# make fragmentdb
runner = CliRunner(mix_stderr=False)
args = ("fragment", "chembl_herg.smi", "-o", "chembl_herg.fragmentdb")
runner.invoke(cli.main, args=args)
# do indexing
args = ("index", "chembl_herg.fragmentdb", "-o", "chembl_herg.mmpdb")
runner.invoke(cli.main, args=args)
# check data
!mmpdb rulecat chembl_herg.mmpdb
id from_smiles to_smiles
4 [*:1]c1ccc(F)cc1Cl [*:1]c1ccc(F)cc1O
1 [*:1]c1ccc(F)cc1O [*:1]c1ccccc1O
2 [*:1]c1ccc(F)cc1Cl [*:1]c1ccccc1O
4183 [*:1]c1ccc(F)cc1Br [*:1]c1ccc(F)cc1Cl
4184 [*:1]c1c(Cl)cccc1Cl [*:1]c1ccc(F)cc1Cl
.....
# load properties
args = ("loadprops", "-p", "chembl_herg_prop.csv", "chembl_herg.mmpdb")
runner.invoke(cli.main, args=args)
It works fine. OK let’s use generate command.
# I used droperidol as an example molecule
smi = 'C1CN(CC=C1N2C3=CC=CC=C3NC2=O)CCCC(=O)C4=CC=C(C=C4)F'
droperidol = Chem.MolFromSmiles(smi)
smi = Chem.MolToSmiles(droperidol)
droperidol

args = ('generate', "--smiles", smi, "--query", "c1c(F)ccc(*)c1", "--radius", "1", "chembl_herg.mmpdb")
res = runner.invoke(cli.main, args)
Result object named ‘res’ has stdout as text format. I used StringIO and convert the data as pandas dataframe ;)
from io import StringIO
sio = StringIO()
sio.write(res.output)
sio.seek(0)
mmpdf = pd.read_csv(sio, sep='\t', skiprows=[1,2])
mmpdf.head(3)
#
start constant from_smiles to_smiles r pseudosmiles final heavies_diff #pairs pair_from_id pair_from_smiles pair_to_id pair_to_smiles
0 O=C(CCCN1CC=C(n2c(=O)[nH]c3ccccc32)CC1)c1ccc(F... *C(=O)CCCN1CC=C(n2c(=O)[nH]c3ccccc32)CC1 [*:1]c1ccc(F)cc1 [*:1]C 1 [*:1]-[C](~*)(~*) CC(=O)CCCN1CC=C(n2c(=O)[nH]c3ccccc32)CC1 -6 2 423277 O=C(NC1CCc2cc(CCN3CCN(c4nsc5ccccc45)CC3)ccc21)... 423269 CC(=O)NC1CCc2cc(CCN3CCN(c4nsc5ccccc45)CC3)ccc21
1 O=C(CCCN1CC=C(n2c(=O)[nH]c3ccccc32)CC1)c1ccc(F... *C(=O)CCCN1CC=C(n2c(=O)[nH]c3ccccc32)CC1 [*:1]c1ccc(F)cc1 [*:1]C(C)C 1 [*:1]-[C](~*)(~*) CC(C)C(=O)CCCN1CC=C(n2c(=O)[nH]c3ccccc32)CC1 -4 2 423277 O=C(NC1CCc2cc(CCN3CCN(c4nsc5ccccc45)CC3)ccc21)... 423275 CC(C)C(=O)NC1CCc2cc(CCN3CCN(c4nsc5ccccc45)CC3)...
Finally, convert smiles to rdkit molobjects with Chem.PandasTools. I used changeMoleculeRendering method because before using the method molecules in dataframe are not rendered.
PandasTools.AddMoleculeColumnToFrame(mmpdf, smilesCol="start", molCol="start")
PandasTools.AddMoleculeColumnToFrame(mmpdf, smilesCol="final", molCol="final")
from rdkit.Chem import PandasPatcher
PandasPatcher.changeMoleculeRendering(mmpdf)

Worked fine ;)
By using the approach, we can call mmpdb function from python code without subprocess module. I think it is usful for calling function from the code with is click based CLI tool.
I uploaded whole code of the post on my gist.
Thanks for reading.