How to visualize a QSAR model

I often discuss QSAR with other chemists.
Sometimes they tell me: "QSAR is a useful tool for drug discovery, but I don't understand it. With a QSAR model (i.e. ML), it is hard to understand why a compound is good."
Hmm, I agree with that opinion.
SVM, NB, RF, etc. are very useful, but these models are black boxes. So it is difficult to understand how individual substructures affect a model's predictions.
Jürgen Bajorath et al. set out to close this gap and published an interesting paper in J. Chem. Inf. Model.
http://pubs.acs.org/doi/abs/10.1021/ci500410g

They write in the paper:

understanding why a compound has undesirable ADME characteristics is just as important as knowing that it (ADME prediction) does.

I like this phrase.

They developed a Python library named nbvis that depends on scikit-learn and matplotlib.
The library can visualise the contribution of each feature in the fingerprint vector.
I think the key point of the method is that the authors used MACCS keys to build the model, because MACCS keys are easy for chemists to understand.
I wrote demo code using RDKit.
https://github.com/iwatobipen/chemo_info/tree/master/modelviz
The sample data was downloaded from the following FTP site.
ftp://ftp.ics.uci.edu/pub/baldig/learning/Sutherland/
Then I added a Class property to each molecule (I set the active flag as "IC50_uM < 0.1 is active").
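The labelling rule can be sketched as below. This is only a minimal illustration, assuming the IC50 values are given in micromolar; the function name activity_class and the example values are hypothetical and are not taken from my demo script.

```python
def activity_class(ic50_um, threshold_um=0.1):
    """Return 1.0 (active) if IC50 is below the threshold, else 0.0 (inactive)."""
    return 1.0 if ic50_um < threshold_um else 0.0


# Hypothetical IC50 values in uM, for illustration only.
example_ic50 = [0.02, 0.1, 3.5]
labels = [activity_class(value) for value in example_ic50]
print(labels)
```

With RDKit, such a label could then be stored on each molecule with mol.SetProp("Class", str(label)), which is the property the script below reads back with GetProp.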
First, I defined the 'names' and 'groups' arguments.
Then I wrote a sample script like the following.

from __future__ import print_function  # print() works on both Python 2 and 3

import sys

import numpy as np
from rdkit import Chem
from rdkit.Chem import MACCSkeys
from sklearn.naive_bayes import BernoulliNB

import maccskey  # provides the 'names' and 'groups' used below
import nbviz


def calc_MACCS_fp( mol ):
    # RDKit MACCS fingerprints are 167 bits long (bit 0 is unused).
    on_bits = list( MACCSkeys.GenMACCSKeys( mol ).GetOnBits() )
    mol_fp_vec = np.zeros( 167 )
    mol_fp_vec[ on_bits ] = 1
    return mol_fp_vec

def make_fp_array( mols ):
    fp_array = [ calc_MACCS_fp( mol ) for mol in mols ]
    return fp_array

# SDMolSupplier yields None for records it cannot parse, so skip those.
mols = [ mol for mol in Chem.SDMolSupplier( sys.argv[1] ) if mol is not None ]
X = make_fp_array( mols )
Y = [ float( mol.GetProp( "Class" ) ) for mol in mols ]

# The first molecule is held out of training and used for the prediction plot.
model = BernoulliNB( alpha=0.1 )
model.fit( X[1:], Y[1:] )
conditional_probs = np.exp( model.feature_log_prob_ )
prior = np.exp( model.class_log_prior_[1] )
print( 'conditional feature prob', conditional_probs )
print( 'class prior', prior )
nbviz.visualize_model( conditional_probs, prior, names=maccskey.names, groups=maccskey.groups )

nbviz.visualize_prediction( X[0], conditional_probs, prior, names=maccskey.names, groups=maccskey.groups )
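
For readers who wonder what feature_log_prob_ holds: scikit-learn's BernoulliNB estimates P(bit = 1 | class) with additive smoothing controlled by alpha. The pure-Python sketch below reproduces that estimate on a toy data set. The function name and the toy fingerprints are hypothetical and only serve as an illustration.

```python
def bernoulli_feature_probs(X, y, alpha=0.1):
    """Estimate P(bit_i = 1 | class) with additive smoothing.

    For each class c:
        (number of samples in c with bit_i on + alpha)
        / (number of samples in c + 2 * alpha)
    """
    n_features = len(X[0])
    probs = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        probs[c] = [
            (sum(row[i] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
    return probs


# Toy fingerprints with three bits (made up for illustration).
X_toy = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y_toy = [1, 1, 0, 0]
probs = bernoulli_feature_probs(X_toy, y_toy, alpha=0.1)
for c in sorted(probs):
    print(c, [round(p, 3) for p in probs[c]])
```

A small alpha such as 0.1 keeps the probabilities away from exactly 0 or 1, so bits that never occur in one class do not produce an infinite log probability.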

Let's run the script!

modelviz iwatobipen$ python view_model_demo.py mol_viz_demo/cox2_test.sdf 

Then two figures are generated.
The red and blue circles indicate the positive and negative influence of features, and the distance indicates the log odds ratio.
The approach is useful for discussion because the figures show chemists why the model considers particular substructures important.
But it is hard for me to visualise each target….
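
To make the log odds ratio more concrete, here is a minimal sketch of one common way to score a single bit in a Bernoulli naive Bayes model: the log of the ratio between the bit's conditional probability in the active class and in the inactive class. This is my assumption about the general idea, not a description of how the library computes its plot coordinates. The function name and the probabilities are made up.

```python
import math


def log_odds_contributions(p_active, p_inactive, fingerprint):
    """Per-bit log odds ratio for one fingerprint.

    p_active[i]   = P(bit_i = 1 | active)
    p_inactive[i] = P(bit_i = 1 | inactive)
    A positive value pushes the prediction toward 'active',
    a negative value pushes it toward 'inactive'.
    """
    contributions = []
    for pa, pi, bit in zip(p_active, p_inactive, fingerprint):
        if bit:
            contributions.append(math.log(pa / pi))
        else:
            # In a Bernoulli model an absent bit is evidence too.
            contributions.append(math.log((1.0 - pa) / (1.0 - pi)))
    return contributions


# Made-up conditional probabilities for three bits.
p_active = [0.8, 0.5, 0.1]
p_inactive = [0.2, 0.5, 0.4]
scores = log_odds_contributions(p_active, p_inactive, [1, 1, 0])
print([round(s, 3) for s in scores])
```

Summing these contributions and adding the log prior odds gives the log posterior odds of the active class, which is why a per-feature view like this is a natural way to explain a naive Bayes prediction.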

figure_0

figure_1
