Scoring 3D diversity using RDKit #RDKit

Recently importance of 3D character of molecules are increasing.
If I design a libraries, I want to estimate not only 2D, but also 3D diversity.
Fortunately RDKit implemented function for characterize the 3D character of molecules named ‘plane of best fit’ (PBF).
You can call this function from rdkit/Contrib/PBF folder. ;-) Great!!!
And reference of PBF is following.
Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules

In the paper, the authors compare with relationship between PBFscore and PMI( principal moment of inertia ). I’m interested in the algorithm, so I calculated two scores.
At first, I can calculate PBF by using RDKit Contrib/PBF code. But code for calculate PMI is not available. So, I write calculator.
My code is following.

import sys
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem import AllChem
from numpy import linalg
from sklearn.preprocessing import normalize
import numpy as np

def inertia( mol,confId=-1,  principal = True, moments_only = True  ):
    numatoms = mol.GetNumAtoms()
    conf = mol.GetConformer( confId )
    if not conf.Is3D():
        return 0
    #get cordinate of each atoms
    pts =  np.array( [ list( conf.GetAtomPosition(atmidx) ) for atmidx in range( numatoms ) ]  )
    atoms = [ atom for atom in mol.GetAtoms() ]
    mass = Descriptors.MolWt( mol )
    #get center of mass
    center_of_mass = np.array(np.sum( atoms[i].GetMass() * pts[i] for i in range( numatoms ))) /mass

    inertia = np.zeros( (3,3) )
    for atmidx in range( numatoms ):
        cmx,cmy,cmz = pts[ atmidx ] - center_of_mass
        inertia += -atoms[ atmidx ].GetMass() / mass * np.matrix(
                [[ 0, -cmz, cmy ],
                 [ cmz, 0, -cmx ],
                 [ -cmy, cmx, 0 ]]
                ) ** 2
    if principal:
        if moments_only:
            return np.linalg.eigvalsh( inertia )
            return np.linalg.eigh( inertia )
        return inertia
if __name__=='__main__':
    mols = [ mol for mol in Chem.SDMolSupplier( sys.argv[1], removeHs=False )  ]
    for mol in mols:
        I = inertia(mol)
        npr1, npr2 = I[0]/I[2], I[1]/I[2]
        print( npr1, npr2 )

OK, let’s start calculation!
I picked up sample from one molecule from the reference.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
# pbf imported from rdkit/Contrib/PBF
import pbf
import calc_pmi

#(1) mol in the ref.
mol = Chem.MolFromSmiles( 'c1cc(O)c(C(=O)O)cc1N=Nc1ccc(O)c(C(=O)O)c1' )
hmol = Chem.AddHs(mol)
AllChem.EmbedMolecule(hmol,useExpTorsionAnglePrefs=True ,useBasicKnowledge=True)
rhmol = Chem.RemoveHs(hmol)

hmolI = calc_pmi.inertia(hmol)
rmolI = calc_pmi.inertia(rhmol)

Compare score with results in the reference.

#PBF score
print(pbf.PBFRD(hmol), pbf.PBFRD(rhmol))
#0.140618602774 0.0769181645171

hmolI = calc_pmi.inertia(hmol)
rmolI = calc_pmi.inertia(rhmol)
print(hmolI[0]/hmolI[2], hmolI[1]/hmolI[2])
print(rmolI[0]/rmolI[2], rmolI[1]/rmolI[2])
# out
#with hydrogen
#0.128978367365 0.876583937787
#without hydorgen
#0.125631084212 0.879566191929

In the reference data is following.

NPR1 = 0.089 ( 0.128978367365, 0.125631084212 )
NPR2 = 0.910 ( 0.876583937787, 0.879566191929 )
PBF = 0.00313 ( 0.140618602774 , 0.0769181645171)
*data which surrounded by parentheses is calculated data with hydrogen, and without hydrogen in my code.

My data indicate that PMI does not see depend on existence of hydrogen, but PBF score is depended on hydrogen.
And data without hydrogen seems good correlation of literature data. I can’t understand the reason. :-(
I need to study more deep in the method.
BTW RDKit has a lots of function for chemoinformatics. It’s Swiss knife for chemoinformatics!

I referred following URL.
The URL gave good suggestion for me.

Also, You can calculate PMI using Knime vernalis community nodes!!!
Wow! that’s nice for OSS! That’s so cool! ;-)

Screen Shot 2016-08-16 at 11.19.13 PM


Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: