How to visualize QSAR model.

I often discuss with other chemist(s) about QSAR.
And sometime they told me …”QSAR is useful tool for drug discovery, but I don’t understand it. Because QSAR model (i.e. ML) is hard to understand why the compound is good ?”
Hmm, I agree his opinion.
SVM, NB, RF etc are very useful but these models are black box. So, it difficult to understand effect of substructures to the moldes.
Jürgen Bajorath et al. challenged to solve the gap and published interesting paper in J. Chem. Inf. Model.

They described in the paper…

understanding why a compound has undesirable ADME cahracterisitcs is just as important as knowing that it(ADME prediction) does.

I like this phrase.

They developed python library named nbvis that depend on scikitlearn and matplotlib.
The library can visualise contribution of each features of vectors.
I think the key point of the method is that the author used MACCSkeys to build model.
Because MACCSkey is easy to understand for chemist.
I wrote demo_code using RDKit.
Sample data was downloaded following ftp.
And added Class properties.(I set active flag “IC50_uM < 0.1 is active”)
At first, I set arguments 'names' and 'groups'.
Then wrote sample script like following.

import nbviz
import numpy as np
import sys
import maccskey
from rdkit import Chem
from rdkit.Chem import MACCSkeys
from sklearn.naive_bayes import BernoulliNB

def calc_MACCS_fp( mol ):
	mol_fp =list( MACCSkeys.GenMACCSKeys( mol ).GetOnBits() )
	mol_fp_vec = np.zeros( 167, )
	mol_fp_vec[ mol_fp ] = 1
	return mol_fp_vec

def make_fp_array( mols ):
	fp_array = [ calc_MACCS_fp( mol ) for mol in mols ]
	return fp_array

mols = [ mol for mol in Chem.SDMolSupplier( sys.argv[1] ) ]
X = make_fp_array( mols )
Y = [ float(mol.GetProp( "Class" )) for mol in mols ]

model = BernoulliNB( alpha=0.1 ) X[1:], Y[1:] )
conditional_probs = np.exp( model.feature_log_prob_ )
prior = np.exp( model.class_log_prior_[1] )
print 'condtional feature prob', conditional_probs
print 'class prior', prior
nbviz.visualize_model( conditional_probs, prior, names=maccskey.names, groups=maccskey.groups )

nbviz.visualize_prediction( X[0], conditional_probs, prior, names=maccskey.names, groups=maccskey.groups )

Let,s run script!

modelviz iwatobipen$ python mol_viz_demo/cox2_test.sdf 

Then two figures generated.
Red and blue colour of circles indicate that positive / negative influence of features and distance indicate that log odds ratio.
The approach is useful for discussion, because the figure provide information to chemists why the model indicate the substructures are effective.
But, it hard for me to visualise each targets….




Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s