Natural Product-likeness Score

Some years ago, a large number of molecules were made with palladium-catalyzed cross-coupling reactions such as the Suzuki-Miyaura, Negishi, and Stille couplings.
These reactions had a great impact on medicinal chemistry, but they tend to produce flat molecules, i.e. molecules with a low Fsp3 score.
Now I often read phrases like ‘Escape from Flatland’, ‘sp3-rich molecules’, ‘3D diversity’ …
Natural products have complex, sp3-rich structures.
So natural product-likeness is one of the scores that can be used to evaluate a compound library.
Fortunately, we can get the NP score using RDKit. 😉
RDKit provides an implementation of the algorithm described in the following paper, and it is easy to use.
Natural Product-likeness Score and Its Application for Prioritization of Compound Libraries
Peter Ertl, Silvio Roggo, and Ansgar Schuffenhauer
Journal of Chemical Information and Modeling, 48, 68-74 (2008)
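
The score can also be computed directly in Python rather than through the command-line script used later in this post. The following is only a minimal sketch: it assumes your RDKit installation ships the Contrib directory containing NP_Score (npscorer.py and its model file), and the two SMILES are illustrative choices of mine, not molecules from the datasets used here.

```python
import os
import sys

from rdkit import Chem
from rdkit.Chem import RDConfig

# Assumption: the Contrib directory (with NP_Score inside) is installed with RDKit.
sys.path.append(os.path.join(RDConfig.RDContribDir, 'NP_Score'))
import npscorer

# Load the fragment-score model that ships next to npscorer.py.
fscore = npscorer.readNPModel()

# Illustrative molecules (my own choice): a flat biaryl and a terpene.
examples = {
    'biphenyl': 'c1ccccc1-c1ccccc1',
    'menthol': 'CC(C)C1CCC(C)CC1O',
}

scores = {}
for name, smi in examples.items():
    mol = Chem.MolFromSmiles(smi)
    # scoreMol returns the NP-likeness score as a float.
    scores[name] = npscorer.scoreMol(mol, fscore)

for name, score in scores.items():
    print(name, round(score, 3))
```

Loading the model once and reusing fscore for every molecule avoids re-reading the model file for each call.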

Let's try it. First, I got the datasets from NCI.
https://wiki.nci.nih.gov/display/NCIDTPdata/Compound+Sets
I downloaded Diversity Set 5 and the Natural Products Set as SDF files, and converted them to SMILES.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase

# Read both NCI sets, skipping records that RDKit could not parse.
div5 = [m for m in Chem.SDMolSupplier('Div5_2DStructures_Oct2014.sdf') if m is not None]
nat = [m for m in Chem.SDMolSupplier('NAtProd4.sdf') if m is not None]

# Write one "SMILES NSC category" line per molecule.
with open('dataset.txt', 'w') as f:
    for label, mols in (('DIV', div5), ('NAT', nat)):
        for m in mols:
            name = m.GetProp('NSC')
            smi = Chem.MolToSmiles(m)
            f.write(smi + ' ' + name + ' ' + label + '\n')

OK, next I ran npscorer.py and merged the results.

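NP_Score iwatobipen$ python npscorer.py dataset.txt > res.txt

import seaborn as sns
import pandas as pd

# dataset.txt is space-separated; res.txt (npscorer.py output) is tab-separated.
df1 = pd.read_table('dataset.txt', sep=' ', names=['smi', 'nsc', 'cat'])
df2 = pd.read_table('res.txt', sep='\t', names=['smi', 'nsc', 'np'])
# Positional join: assumes res.txt has the same rows in the same order as dataset.txt.
df = df1.join(df2.np)
# Overlay the NP score distributions of the two sets
# (distplot is deprecated in seaborn 0.11 and later).
sns.distplot(df[df.cat == 'DIV'].np)
sns.distplot(df[df.cat == 'NAT'].np)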
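
A side note on the merge step: joining the two tables by row position only works if res.txt contains exactly one line per line of dataset.txt, in the same order. If the scoring script skips a line it cannot parse, every later score shifts onto the wrong compound. Merging on the NSC number avoids that. Below is a small sketch with made-up inline data (the SMILES, NSC numbers, and scores are hypothetical, not taken from the NCI sets).

```python
import io

import pandas as pd

# Hypothetical miniature versions of dataset.txt and res.txt (made-up values).
dataset_txt = "CCO 1 DIV\nnot_a_smiles 2 DIV\nCCN 3 NAT\n"
res_txt = "CCO\t1\t-0.500\nCCN\t3\t1.250\n"  # the row for NSC 2 is missing

df1 = pd.read_csv(io.StringIO(dataset_txt), sep=' ', names=['smi', 'nsc', 'cat'])
df2 = pd.read_csv(io.StringIO(res_txt), sep='\t', names=['smi', 'nsc', 'np'])

# Positional join: the score of NSC 3 ends up on the row of NSC 2.
by_position = df1.join(df2['np'])

# Key-based merge: each score follows its NSC number, missing ones become NaN.
by_key = df1.merge(df2[['nsc', 'np']], on='nsc', how='left')

print(by_position)
print(by_key)
```

With how='left' every compound from dataset.txt is kept, so unscored compounds stay visible as NaN instead of silently disappearing.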

I got the following image.
The NP set showed higher scores than the Div5 set.

[Screenshot: distribution plot of NP scores for the DIV and NAT sets]

The summary statistics are as follows. The first block (count 1593) appears to be the DIV set, and the second is the NAT set.

count    1593.000000
mean       -0.654590
std         1.035716
min        -3.258000
25%        -1.325000
50%        -0.753000
75%        -0.162000
max         4.054000
Name: np, dtype: float64

df[df.cat=='NAT'].np.describe()
count    419.000000
mean       1.594697
std        1.012731
min       -1.541000
25%        0.911500
50%        1.485000
75%        2.228000
max        4.054000
Name: np, dtype: float64