Predict pKa values with ML & QM #memo #cheminformatics #RDKit

The pKa value is one of the important parameters in drug design. It describes the acidity and basicity of molecules, so there are many tools to predict pKa values, such as ACD/Labs and Marvin. Compared to commercial packages, there are few solutions in the open science field.

Paul Czodrowski’s group released code for pKa prediction in a GitHub repo, and it uses a machine learning approach. The work was also presented at RDKit UGM 2019! You can get the PDF poster from the following URL.
https://github.com/czodrowskilab/pka/blob/master/RDKit_UGM_2019/poster.pdf

I’ve been interested in the integration of QM and ML recently, and I found new code that uses both of them to predict micro-pKa values.

The article, ‘QupKake: Integrating Machine Learning and Quantum Chemistry for Micro-pKa Predictions’, was published in ACS JCTC.

The article is open access, so interested readers can read it freely.

QupKake uses xtb to compute semiempirical features and to determine a suitable tautomer of the input molecule. It then predicts the reactive sites, i.e. where the molecule is protonated or deprotonated, and their pKa values. To predict the reactive sites, QupKake uses two GNN models (one for deprotonation sites and one for protonation sites).
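I haven't checked exactly how QupKake picks the "suitable" tautomer, so the sketch below assumes the usual approach of favoring the lowest-energy candidate and shows how relative energies translate into Boltzmann populations. The tautomer names and energies are hypothetical placeholders, not xtb output, and the code does not call xtb or reproduce QupKake's implementation.

```python
import math

# Boltzmann constant in kcal/(mol*K)
K_B = 0.0019872041


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Relative populations for a dict of name -> energy (kcal/mol)."""
    e_min = min(energies_kcal.values())
    # Shift by the minimum energy to avoid overflow in exp()
    weights = {
        name: math.exp(-(e - e_min) / (K_B * temperature))
        for name, e in energies_kcal.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_tautomer(energies_kcal):
    """Assumed selection rule: keep the lowest-energy (most populated) tautomer."""
    return min(energies_kcal, key=energies_kcal.get)


# Hypothetical relative energies, NOT real xtb results
energies = {"tautomer_A": 0.0, "tautomer_B": 2.5, "tautomer_C": 5.0}
pops = boltzmann_populations(energies)
best = pick_tautomer(energies)
print(best, {name: round(p, 3) for name, p in pops.items()})
```

With a few kcal/mol separating candidates, the lowest-energy tautomer dominates the population, which is why picking a single tautomer is often a reasonable simplification.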

The authors evaluated the performance on several datasets, and QupKake shows good predictive performance (some of the data are shown below).

It looked interesting, and the code is shared in a GitHub repo, so I tried QupKake today ;)

OK, let’s write some code after setting up the experimental environment.

(base) $ git clone https://github.com/Shualdon/QupKake.git
(base) $ cd QupKake
(base) $ mamba env create -f environment.yml
(base) $ mamba activate qupkake
(qupkake) $ pip install .

After installation, the qupkake command is available.

qupkake accepts not only a single molecule as SMILES but also multiple molecules in CSV or SDF files. To use a CSV file, the file extension must be .csv.

I predicted the pKa values of m-aminobenzoic acid from its SMILES.
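For batch runs, a CSV input file can be prepared with the standard library. This is a minimal sketch: the `name` and `smiles` column names and the helper name `write_smiles_csv` are my assumptions, so please check the QupKake README for the columns and command-line options the tool actually expects. The only rule taken from above is that the extension must be .csv.

```python
import csv
from pathlib import Path


def write_smiles_csv(path, records):
    """Write (name, smiles) pairs to a CSV file with a header row.

    The header names "name" and "smiles" are assumptions; check the
    QupKake README for the column names the CLI actually reads.
    """
    path = Path(path)
    # qupkake decides the input type from the extension, so insist on .csv
    if path.suffix != ".csv":
        raise ValueError("CSV input for qupkake must use the .csv extension")
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "smiles"])
        writer.writerows(records)
    return path
```

For example, `write_smiles_csv("input.csv", [("m_aminobenzoic_acid", "Nc1cc(C(=O)O)ccc1")])` writes a two-line file that can then be passed to qupkake.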

(qupkake) $ qupkake smi 'Nc1cc(C(=O)O)ccc1'
/home/iwatobipen/miniforge3/envs/qupkake/lib/python3.9/site-packages/qupkake/xtb-641/bin/xtb
Processing...
Processing molecule: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  4.51it/s]
Done!
Processing...
Processing molecule: 100%|████████████████████████████████████████████| 4/4 [00:02<00:00,  1.46it/s]
Done!
Predictions saved to data/output/qupkake_output.sdf
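If you want to drive the same run from Python instead of the shell, a thin subprocess wrapper is enough. This is a hedged sketch: it only reuses the `qupkake smi '<SMILES>'` form and the "Predictions saved to ..." log line seen above; the helper names are hypothetical, and since I'm not sure whether that message goes to stdout or stderr, the wrapper searches both.

```python
import re
import shutil
import subprocess

# Matches the log line printed by qupkake in the run above
SAVED_RE = re.compile(r"Predictions saved to (.+?)\s*$", re.MULTILINE)


def build_command(smiles):
    """Same command line as above: qupkake smi '<SMILES>'."""
    return ["qupkake", "smi", smiles]


def find_output_path(log_text):
    """Return the SDF path reported in the log, or None if it is absent."""
    match = SAVED_RE.search(log_text)
    return match.group(1) if match else None


def run_qupkake(smiles):
    """Run qupkake on one SMILES and return the reported output path."""
    if shutil.which("qupkake") is None:
        raise RuntimeError("qupkake is not on PATH; activate the qupkake env first")
    proc = subprocess.run(
        build_command(smiles), capture_output=True, text=True, check=True
    )
    # Not certain which stream carries the message, so search both
    return find_output_path(proc.stdout + "\n" + proc.stderr)
```

Passing the arguments as a list avoids shell quoting problems with characters such as `(` and `=` in SMILES.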

After the calculation, a data folder is generated with the default settings. OK, let’s check the calculated values in a Jupyter notebook. RDKit can render atom information if the atom has an atomNote property.

The program returns the molecule as separate records, one per predicted site, each with a pKa type (acidic/basic) and a pKa value.
I merged the values and rendered them on a single molecule. The code is shown below.

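from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import IPythonConsole

# Show atom indices so the 'idx' property can be matched to atoms visually
IPythonConsole.drawOptions.addAtomIndices = True

# Each record in the output SDF holds one predicted site (idx, pka_type, pka)
mols = [m for m in Chem.SDMolSupplier('./data/output/qupkake_output.sdf') if m is not None]

# Compute 2D coordinates for a clean depiction
for m in mols:
    rdDepictor.Compute2DCoords(m)
Draw.MolsToGridImage(mols, molsPerRow=5)  # rendered when it is the last expression in a notebook cell

def get_pka_info(mols):
    """Merge per-site predictions into atomNote labels on one molecule."""
    # Copy the first record and strip its SD properties to get a clean template
    res_mol = Chem.Mol(mols[0])
    for k in res_mol.GetPropNames():
        res_mol.ClearProp(k)
    res = defaultdict(list)
    for mol in mols:
        data = mol.GetPropsAsDict()
        # Label such as "a:4.99" for an acidic site or "b:4.01" for a basic site
        res[int(data['idx'])].append(f"{data['pka_type'][:1]}:{float(data['pka']):.2f}")
    for k, v in res.items():
        # RDKit draws the atomNote property next to the atom
        res_mol.GetAtomWithIdx(k).SetProp('atomNote', ' '.join(v))
    return res, res_mol


res, resmol = get_pka_info(mols)
resmol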


The experimental pKa values from Wikipedia are 3.07 (carboxyl) and 4.79 (amino), while the predicted values are 4.99 (carboxyl) and 4.01 (amino).

The model assigned the pKa types ‘acidic’ and ‘basic’ to the molecule.

In summary, QupKake is an easy-to-use and useful package for predicting pKa values. I would like to test more drug-like molecules later.
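To put the deviation into numbers, here is a quick calculation using only the four values quoted above. Two sites are of course far too few to judge the model; this just makes the comparison explicit.

```python
# Values quoted above: experimental (Wikipedia) vs. QupKake prediction
experimental = {"carboxyl": 3.07, "amino": 4.79}
predicted = {"carboxyl": 4.99, "amino": 4.01}

# Signed error per site and mean absolute error (MAE) over both sites
errors = {site: predicted[site] - experimental[site] for site in experimental}
mae = sum(abs(e) for e in errors.values()) / len(errors)

for site, err in errors.items():
    print(f"{site}: predicted - experimental = {err:+.2f}")
print(f"MAE = {mae:.2f} pKa units")
# carboxyl: predicted - experimental = +1.92
# amino: predicted - experimental = -0.78
# MAE = 1.35 pKa units
```

The carboxyl pKa is overestimated by about 2 units, while the amino value is within 1 unit.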
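The acidic/basic label tells you which direction the equilibrium goes. As a rough illustration, the Henderson-Hasselbalch equation converts the predicted values into the charged fraction of each site at a given pH. This sketch treats each site independently and ignores coupling between the micro-states (for example the zwitterion of m-aminobenzoic acid), so it is an approximation rather than what QupKake computes; pH 7.4 is just a typical physiological value I picked.

```python
def ionized_fraction(pka, ph, site_type):
    """Fraction of a single site in its charged form at a given pH.

    Henderson-Hasselbalch for one independent site:
      acidic site (HA <-> A- + H+):  f = 1 / (1 + 10**(pKa - pH))
      basic site (BH+ <-> B + H+):   f = 1 / (1 + 10**(pH - pKa))
    For a basic site, pKa refers to the conjugate acid BH+.
    """
    if site_type == "acidic":
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if site_type == "basic":
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError(f"unknown site type: {site_type!r}")


# Predicted values from the QupKake run above, each site treated independently
sites = {"carboxyl": (4.99, "acidic"), "amino": (4.01, "basic")}
for name, (pka, kind) in sites.items():
    print(f"{name}: {ionized_fraction(pka, 7.4, kind):.4f} charged at pH 7.4")
```

At pH 7.4 the carboxyl group is almost fully deprotonated, while the weakly basic aromatic amine is almost entirely neutral.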

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
