Recently importance of 3D character of molecules are increasing.
If I design a libraries, I want to estimate not only 2D, but also 3D diversity.
Fortunately RDKit implemented function for characterize the 3D character of molecules named ‘plane of best fit’ (PBF).
You can call this function from rdkit/Contrib/PBF folder. ;-) Great!!!
And reference of PBF is following.
Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules
http://pubs.acs.org/doi/abs/10.1021/ci300293f
In the paper, the authors compare with relationship between PBFscore and PMI( principal moment of inertia ). I’m interested in the algorithm, so I calculated two scores.
At first, I can calculate PBF by using RDKit Contrib/PBF code. But code for calculate PMI is not available. So, I write calculator.
My code is following. calc_pmi.py.
import sys from rdkit import Chem from rdkit.Chem import Descriptors from rdkit.Chem import AllChem from numpy import linalg from sklearn.preprocessing import normalize import numpy as np def inertia( mol,confId=-1, principal = True, moments_only = True ): numatoms = mol.GetNumAtoms() conf = mol.GetConformer( confId ) if not conf.Is3D(): return 0 #get cordinate of each atoms pts = np.array( [ list( conf.GetAtomPosition(atmidx) ) for atmidx in range( numatoms ) ] ) atoms = [ atom for atom in mol.GetAtoms() ] mass = Descriptors.MolWt( mol ) #get center of mass center_of_mass = np.array(np.sum( atoms[i].GetMass() * pts[i] for i in range( numatoms ))) /mass inertia = np.zeros( (3,3) ) for atmidx in range( numatoms ): cmx,cmy,cmz = pts[ atmidx ] - center_of_mass inertia += -atoms[ atmidx ].GetMass() / mass * np.matrix( [[ 0, -cmz, cmy ], [ cmz, 0, -cmx ], [ -cmy, cmx, 0 ]] ) ** 2 if principal: if moments_only: return np.linalg.eigvalsh( inertia ) else: return np.linalg.eigh( inertia ) else: return inertia if __name__=='__main__': mols = [ mol for mol in Chem.SDMolSupplier( sys.argv[1], removeHs=False ) ] for mol in mols: I = inertia(mol) npr1, npr2 = I[0]/I[2], I[1]/I[2] print( npr1, npr2 )
OK, let’s start calculation!
I picked up sample from one molecule from the reference.
from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole IPythonConsole.ipython_useSVG=True # pbf imported from rdkit/Contrib/PBF import pbf import calc_pmi #(1) mol in the ref. mol = Chem.MolFromSmiles( 'c1cc(O)c(C(=O)O)cc1N=Nc1ccc(O)c(C(=O)O)c1' ) hmol = Chem.AddHs(mol) AllChem.EmbedMolecule(hmol,useExpTorsionAnglePrefs=True ,useBasicKnowledge=True) AllChem.UFFOptimizeMolecule(hmol) rhmol = Chem.RemoveHs(hmol) hmolI = calc_pmi.inertia(hmol) rmolI = calc_pmi.inertia(rhmol)
Compare score with results in the reference.
#PBF score print(pbf.PBFRD(hmol), pbf.PBFRD(rhmol)) #out #0.140618602774 0.0769181645171 #NPR1, NPR2 hmolI = calc_pmi.inertia(hmol) rmolI = calc_pmi.inertia(rhmol) print(hmolI[0]/hmolI[2], hmolI[1]/hmolI[2]) print(rmolI[0]/rmolI[2], rmolI[1]/rmolI[2]) # out #with hydrogen #0.128978367365 0.876583937787 #without hydorgen #0.125631084212 0.879566191929
In the reference data is following.
NPR1 = 0.089 ( 0.128978367365, 0.125631084212 )
NPR2 = 0.910 ( 0.876583937787, 0.879566191929 )
PBF = 0.00313 ( 0.140618602774 , 0.0769181645171)
*data which surrounded by parentheses is calculated data with hydrogen, and without hydrogen in my code.
My data indicate that PMI does not see depend on existence of hydrogen, but PBF score is depended on hydrogen.
And data without hydrogen seems good correlation of literature data. I can’t understand the reason. :-(
I need to study more deep in the method.
BTW RDKit has a lots of function for chemoinformatics. It’s Swiss knife for chemoinformatics!
I referred following URL.
The URL gave good suggestion for me.
https://github.com/wrwrwr/mmoi-calc
Also, You can calculate PMI using Knime vernalis community nodes!!!
Wow! that’s nice for OSS! That’s so cool! ;-)