Plot data using RDKit PandasTools

I usually read SDF files with SDMolSupplier, but PandasTools is another useful way to read them.
To work with the property data, however, I need to convert the values to float, since PandasTools loads them as strings.
Today I read an SDF from a major supplier and plotted the data using seaborn.
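
As a rough sketch of the float conversion mentioned above, the snippet below uses a small hand-made DataFrame as a stand-in for what PandasTools.LoadSDF returns. The values are invented for illustration, and pd.to_numeric is an alternative to the np.float32 approach used later in this post, not the method the post itself uses. Its advantage is that unparseable entries become NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-in for a frame loaded from an SDF:
# property values arrive as strings, one of them unparseable.
df = pd.DataFrame({
    "MolWeight": ["180.16", "151.16", "n/a"],
    "FSP3": ["0.11", "0.125", "0.5"],
})

numeric_cols = ["MolWeight", "FSP3"]
for col in numeric_cols:
    # errors="coerce" turns entries that cannot be parsed into NaN
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df)
```

After the loop both columns are numeric, so they can be passed directly to plotting functions; rows with NaN can be dropped with df.dropna() if needed.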

%matplotlib inline
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from rdkit.Chem import PandasTools

IPythonConsole.ipython_useSVG=True
# read all molecules with SDMolSupplier and list the property names of the first one
nx = [ mol for mol in Chem.SDMolSupplier( 'dataset.sdf' ) ]
m = nx[0]
print( [ n for n in m.GetPropNames() ] )

[out]
[------------
 'CLogP',
 ------------
 'FSP3',
 -------------
'MolWeight',
 -------------
 'Purity_percent',
 'ROT',
-------------
 'Salt',
 'TPSA']

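# load the same SDF into a pandas DataFrame
nx = PandasTools.LoadSDF( 'dataset.sdf' )
# convert property columns from string to float
nx.FSP3 = nx.FSP3.apply( np.float32 )
nx.ROT = nx.ROT.apply( np.float32 )
nx.TPSA = nx.TPSA.apply( np.float32 )
nx.CLogP = nx.CLogP.apply( np.float32 )
nx.MolWeight = nx.MolWeight.apply( np.float32 )

# pass x and y as keyword arguments; newer seaborn no longer accepts them positionally
sns.lmplot( x='ROT', y='FSP3', data=nx )
# distplot is deprecated in recent seaborn; sns.histplot( nx.MolWeight, kde=True ) is the suggested replacement
sns.distplot( nx.MolWeight )
sns.jointplot( x=nx.MolWeight, y=nx.FSP3, kind='hex' )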

Fig 3 indicates that the dataset contains many Fsp3-rich molecules.
I’m interested in this dataset and will analyse it in more detail.
What is the most important thing in library design? Novelty, synthetic accessibility, cost, diversity, druggability, etc. How do you define the quality of a compound or a library?
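
To put a number on the Fsp3 observation from Fig 3, one option is to compute the share of molecules above some cutoff. The sketch below uses made-up values in place of the converted FSP3 column, and the 0.5 cutoff is an arbitrary choice for illustration, not a value taken from the dataset or from this post.

```python
import pandas as pd

# Made-up FSP3 values standing in for the converted FSP3 column
fsp3 = pd.Series([0.10, 0.35, 0.50, 0.62, 0.80], name="FSP3")

cutoff = 0.5  # arbitrary illustrative threshold (assumption)
# the mean of a boolean mask is the fraction of rows at or above the cutoff
rich_fraction = (fsp3 >= cutoff).mean()

print(fsp3.describe())
print(f"fraction with FSP3 >= {cutoff}: {rich_fraction:.2f}")
```

With the real data, the same two lines can be applied to the FSP3 column of the DataFrame after the float conversion.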

Fig 1: fsp3rot (ROT vs FSP3, lmplot)
Fig 2: mw (MolWeight distribution)
Fig 3: fsp3_molwt (MolWeight vs FSP3, hexbin jointplot)
