Tags: RDKit

Installing TensorFlow on Mac OS X with GPU support

Yesterday, I tried to install tensorflow-gpu on my Mac.
My PC is a MacBook Pro (Retina, 15-inch, Mid 2014) with an NVIDIA GPU.
The OS is Sierra.
Details are described at the following URL.
https://www.tensorflow.org/install/install_mac

I installed TensorFlow directly by using the pip command.

 $ pip install --upgrade tensorflow-gpu   # for Python 2.7 and GPU
 $ pip3 install --upgrade tensorflow-gpu  # for Python 3.n and GPU

It was almost done, but not finished yet.
To finish the installation, I needed to disable System Integrity Protection (SIP).
To do that, I followed these steps.

Restart the Mac.
Before OS X starts up, hold down Command-R and keep it held down until you see an Apple icon and a progress bar. ...
From the Utilities menu, select Terminal.
At the prompt, type exactly the following and then press Return: csrutil disable.
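
After rebooting back into normal mode, you can verify that SIP is actually off with the stock csrutil status command (just a quick sanity check, not part of the install itself):

 $ csrutil status
 System Integrity Protection status: disabled.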

I tested the following code.

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op.
print(sess.run(c))

And the results show that TensorFlow can use the GPU.

iwatobipen$ python testcode.py
2017-06-13 22:24:28.952288: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:28.952314: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:28.952319: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:28.952323: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:29.469570: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] OS X does not support NUMA - returning NUMA node zero
2017-06-13 22:24:29.470683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.80GiB
2017-06-13 22:24:29.470713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-13 22:24:29.470720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y
2017-06-13 22:24:29.470731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0
2017-06-13 22:24:29.490805: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2017-06-13 22:24:29.495363: I tensorflow/core/common_runtime/simple_placer.cc:841] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-06-13 22:24:29.495384: I tensorflow/core/common_runtime/simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-06-13 22:24:29.495395: I tensorflow/core/common_runtime/simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
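
If you want to be explicit about device placement instead of relying on the default, tf.device can pin the op to the GPU. A minimal sketch using the same TF 1.x API (allow_soft_placement makes it fall back to the CPU if the GPU is unavailable):

import tensorflow as tf

# Pin the graph to the first GPU explicitly.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Fall back to the CPU if '/gpu:0' is missing, and keep logging placements.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
print(sess.run(c))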

Reference URL:
https://github.com/tensorflow/tensorflow/issues/3723

Graph convolution classification with deepchem

I posted about graph convolution regression using deepchem, and today I tried graph convolution classification with it.
The code is almost the same as the regression model; the only difference is using dc.models.MultitaskGraphClassifier instead of dc.models.MultitaskGraphRegressor.
I got sample (JAK3 inhibitor) data from ChEMBL and tried to build a model.

At first, I used pandas to convert the activity class (active, not active) to a 0/1 bit. It is easy to do.

import pandas as pd

df = pd.read_table('jak3_chembl.txt', header=0)
# pd.factorize returns (codes, uniques); the integer codes become the class labels.
df['activity_class'] = pd.factorize( df.ACTIVITY_COMMENT )[0]

df.to_csv('./preprocessed_jak3.csv', index=False)
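
As a side note (my addition, reusing df and pd from above): factorize also returns the unique labels, so the integer codes can be mapped back to the original class names if needed.

codes, uniques = pd.factorize( df.ACTIVITY_COMMENT )
# e.g. {0: 'active', 1: 'Not Active'} -- the exact labels depend on the data.
print( dict( enumerate( uniques ) ) )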

Next, I wrote the model and tested it.

import tensorflow as tf
import deepchem as dc
import numpy as np
import pandas as pd

# ConvMolFeaturizer converts each molecule into a graph representation.
graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader( tasks=['activity_class'], smiles_field="CANONICAL_SMILES", id_field="CMPD_CHEMBLID", featurizer=graph_featurizer )
dataset = loader.featurize( './preprocessed_jak3.csv' )

# Random split into training and test sets.
splitter = dc.splits.splitters.RandomSplitter()
trainset,testset = splitter.train_test_split( dataset )

hp = dc.molnet.preset_hyper_parameters
param = hp.hps[ 'graphconv' ]
print(param['batch_size'])
g = tf.Graph()
graph_model = dc.nn.SequentialGraph( 75 )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.Dense( int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphGather( 10 , activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphClassifier( graph_model,
                                                      1,
                                                      75,
                                                     batch_size=10,
                                                     learning_rate = param['learning_rate'],
                                                     optimizer_type = 'adam',
                                                     beta1=.9,beta2=.999)
    model_graphconv.fit( trainset, nb_epoch=50 )

train_scores = {}
#regression_metric = dc.metrics.Metric( dc.metrics.pearson_r2_score, np.mean )
classification_metric = dc.metrics.Metric( dc.metrics.roc_auc_score, np.mean )
train_scores['graphconvreg'] = model_graphconv.evaluate( trainset,[ classification_metric ]  )
p=model_graphconv.predict( testset )

for i in range( len(p )):
    print( p[i], testset.y[i] )

print(train_scores) 

root@08d8f729f78b:/deepchem/pen_test# python graphconv_jak3.py

And the output log is as follows.

Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./preprocessed_jak3.csv
Loading shard 1 of size 8192.
Featurizing sample 0
TIMING: featurizing shard 0 took 2.023 s
TIMING: dataset construction took 3.830 s
Loading dataset from disk.
TIMING: dataset construction took 2.263 s
Loading dataset from disk.
TIMING: dataset construction took 1.147 s
Loading dataset from disk.
50
Training for 50 epochs
Starting epoch 0
On batch 0
...............
On batch 0
On batch 50
computed_metrics: [0.97176380945032259]
{'graphconvreg': {'mean-roc_auc_score': 0.97176380945032259}}

Not so bad.
The classification model gives a better result than the regression model.
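
Note that the script above only evaluates the training set. A fairer check would also score the held-out test set with the same metric; a minimal sketch, reusing model_graphconv, testset, and classification_metric from the script above:

test_scores = model_graphconv.evaluate( testset, [ classification_metric ] )
print( test_scores )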
All the code is pushed to my GitHub repository.
https://github.com/iwatobipen/deeplearning

Graph convolution regression with deepchem

Some days ago, I posted about deepchem, and I am still playing with it. Today I tried the graph convolution regression model.
Deepchem provides a graph convolution regressor. Cool.
I used solubility data provided by AstraZeneca. https://www.ebi.ac.uk/chembl/assay/inspect/CHEMBL3301364
My test code follows; it is almost the same as deepchem's example code.
The CSVLoader method is very useful because it not only reads the data but also calculates the graph features of each molecule.
Next, define the graph convolution network.

import tensorflow as tf
import deepchem as dc
import numpy as np
graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader( tasks=['LogS'], smiles_field="CANONICAL_SMILES", id_field="CMPD_CHEMBLID", featurizer=graph_featurizer )
dataset = loader.featurize( './bioactivity.csv' )

splitter = dc.splits.splitters.RandomSplitter()
trainset,testset = splitter.train_test_split( dataset )

hp = dc.molnet.preset_hyper_parameters
param = hp.hps[ 'graphconvreg' ]
print(param['batch_size'])
g = tf.Graph()
graph_model = dc.nn.SequentialGraph( 75 )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.Dense( int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
#graph_model.add( dc.nn.GraphGather(param['batch_size'], activation='tanh'))
graph_model.add( dc.nn.GraphGather( 10 , activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphRegressor( graph_model,
                                                      1,
                                                      75,
                                                     batch_size=10,
                                                     learning_rate = param['learning_rate'],
                                                     optimizer_type = 'adam',
                                                     beta1=.9,beta2=.999)
    model_graphconv.fit( trainset, nb_epoch=30 )

train_scores = {}
regression_metric = dc.metrics.Metric( dc.metrics.pearson_r2_score, np.mean )
train_scores['graphconvreg'] = model_graphconv.evaluate( trainset,[ regression_metric ]  )
p=model_graphconv.predict( testset )

print(train_scores) 

Next, run the code.

root@08d8f729f78b:/deepchem/pen_test# python graphconv_test.py > datalog

And the datalog file contains:

Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./bioactivity.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
...
Starting epoch 29
On batch 0
On batch 50
On batch 100
computed_metrics: [0.52744994044080606]
{'graphconvreg': {'mean-pearson_r2_score': 0.52744994044080606}}

The r2 score is still low, but I think it can be improved by changing nb_epoch.
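
Since the script already computes test-set predictions (p), a quick generalization check is to correlate them with the measured values; a small sketch, reusing p, testset, and np from the script above:

# Pearson r^2 between predicted and measured LogS on the test set.
r = np.corrcoef( np.array( p ).flatten(), testset.y.flatten() )[0, 1]
print( r**2 )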

All the sample code was uploaded to GitHub.
https://github.com/iwatobipen/deeplearning/blob/master/datalog

How to get molecular graph features

Belatedly, I have become interested in deepchem, an open-source deep learning toolkit for drug discovery. Deepchem supports many features for chemoinformatics.
One interesting feature is the calculation of molecular graphs, which are more primitive than hashed fingerprints. I tried to calculate them.

Currently the toolkit supports only Linux, so I installed deepchem via Docker.
The installation was very easy.

iwatobipen$ docker pull deepchemio/deepchem
# wait a moment.... 😉
iwatobipen$ docker run -i -t deepchemio/deepchem
iwatobipen$ pip install jupyter
# the following step is optional.
iwatobipen$ apt-get install vim

That’s all.
Next, I worked inside the Docker environment.

import deepchem as dc
from deepchem.feat import graph_features
from rdkit import Chem
convmol=graph_features.ConvMolFeaturizer()
mol = Chem.MolFromSmiles('c1ccccc1')
# convmol needs list of molecules
fs = convmol.featurize( [mol] )
f = fs[ 0 ]
# check method
dir( f )
Out[41]:
[ .....
 'agglomerate_mols',
 'atom_features',
 'canon_adj_list',
 'deg_adj_lists',
 'deg_block_indices',
 'deg_id_list',
 'deg_list',
 'deg_slice',
 'deg_start',
 'get_adjacency_list',
 'get_atom_features',
 'get_atoms_with_deg',
 'get_deg_adjacency_lists',
 'get_deg_slice',
 'get_null_mol',
 'get_num_atoms',
 'get_num_atoms_with_deg',
 'max_deg',
 'membership',
 'min_deg',
 'n_atoms',
 'n_feat']

To get atom features, use ‘get_atom_features’.
To get edge information, use ‘get_adjacency_list’.

f.get_atom_features()
Out[42]:
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1]])
f.get_adjacency_list()
Out[43]: [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]

Each row of the atom feature array is a one-hot encoding meaning: carbon atom, degree 2, SP2 hybridization, and aromatic.
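
A quick way to see the layout (reusing f from the session above): benzene has six equivalent aromatic carbons, and each atom gets a 75-dimensional feature vector, which matches the input size passed to SequentialGraph( 75 ) in the graph convolution posts above.

feats = f.get_atom_features()
print( feats.shape )  # (6, 75): 6 atoms x 75 features per atom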

As a next step, I will try to build a model using molecular graphs.

Target prediction using local ChEMBL

Yesterday, I posted about target prediction using the ChEMBL web API.
Predicting many molecules that way takes a lot of time, so I changed the code to use a local ChEMBL DB.
I used sqlalchemy, because the library is powerful and flexible enough to work with any RDB.
The test code follows. It takes a SMILES string as input and returns the top 10 predicted targets.
I think Python is a very powerful language. I can do chemical structure handling, RDB searching, data merging, etc. using only Python!

from sqlalchemy import create_engine, MetaData
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from sklearn.externals import joblib

import pandas as pd
import numpy as np
import sys

smiles = sys.argv[ 1 ]
# Load the pre-built ChEMBL 22 multi-target Naive Bayes model.
morgan_nb = joblib.load( 'models_22/10uM/mNB_10uM_all.pkl' )
classes = list( morgan_nb.targets )

mol = Chem.MolFromSmiles( smiles )
fp = AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits = 2048 )
res = np.zeros( len(fp), np.int32 )
DataStructs.ConvertToNumpyArray( fp, res )

probas = list( morgan_nb.predict_proba( res.reshape(1,-1))[0] )
predictions = pd.DataFrame(  list(zip(classes, probas)), columns=[ 'id', 'proba' ])

top10_pred = predictions.sort_values( by = 'proba', ascending = False ).head( 10 )
db = create_engine( 'postgres+psycopg2://<username>:<password>@localhost/chembl_22' )
conn = db.connect()

def getprefname( chemblid ):
    # Look up the preferred name and organism for a ChEMBL target ID.
    res = conn.execute( "select chembl_id, pref_name,organism from target_dictionary where chembl_id='{0}'".format( chemblid ))
    res = res.fetchall()
    return res[0]

plist = []
for i, e in enumerate( top10_pred['id'] ):
    plist.append( list(getprefname(e)) )
conn.close()
target_info = pd.DataFrame( plist, columns = ['id', 'name', 'organism'] )
summary_df = pd.merge( top10_pred, target_info, on='id')

print( summary_df )
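
One note on the raw SQL above: building the query with str.format works, but bound parameters are the safer, more idiomatic sqlalchemy style. A small sketch of the same lookup (my rewrite, not the original code):

from sqlalchemy import text

def getprefname_safe( chemblid ):
    # The driver escapes the bound :cid value, which avoids SQL injection.
    q = text( "select chembl_id, pref_name, organism "
              "from target_dictionary where chembl_id = :cid" )
    return conn.execute( q, {'cid': chemblid} ).fetchall()[0]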

OK, let's check the performance from the shell.

Tofacitinib (Tofa):

iwatobipen$ python targetprediction.py 'CC1CCN(CC1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N'
           id     proba                                   name      organism
0  CHEMBL2835  1.000000           Tyrosine-protein kinase JAK1  Homo sapiens
1  CHEMBL2148  1.000000           Tyrosine-protein kinase JAK3  Homo sapiens
2  CHEMBL2971  1.000000           Tyrosine-protein kinase JAK2  Homo sapiens
3  CHEMBL5073  1.000000                     CaM kinase I delta  Homo sapiens
4  CHEMBL3553  0.999986           Tyrosine-protein kinase TYK2  Homo sapiens
5  CHEMBL4147  0.999966                    CaM kinase II alpha  Homo sapiens
6  CHEMBL4924  0.999896    Ribosomal protein S6 kinase alpha 6  Homo sapiens
7  CHEMBL5698  0.999871         NUAK family SNF1-like kinase 2  Homo sapiens
8  CHEMBL3032  0.999684                      Protein kinase N2  Homo sapiens
9  CHEMBL5683  0.999640  Serine/threonine-protein kinase DCLK1  Homo sapiens

Imatinib:

iwatobipen$ python targetprediction.py 'CN1CCN(CC2=CC=C(C=C2)C(=O)NC2=CC(NC3=NC=CC(=N3)C3=CN=CC=C3)=C(C)C=C2)CC1'
           id     proba                                               name  \
0  CHEMBL1862  1.000000                        Tyrosine-protein kinase ABL   
1  CHEMBL5145  1.000000              Serine/threonine-protein kinase B-raf   
2  CHEMBL1936  1.000000                   Stem cell growth factor receptor   
3  CHEMBL2007  1.000000      Platelet-derived growth factor receptor alpha   
4  CHEMBL5122  1.000000             Discoidin domain-containing receptor 2   
5  CHEMBL1974  0.999999              Tyrosine-protein kinase receptor FLT3   
6  CHEMBL3905  0.999999                        Tyrosine-protein kinase Lyn   
7  CHEMBL4722  0.999994           Serine/threonine-protein kinase Aurora-A   
8   CHEMBL279  0.999991      Vascular endothelial growth factor receptor 2   
9  CHEMBL5319  0.999988  Epithelial discoidin domain-containing receptor 1   

       organism  
0  Homo sapiens  
1  Homo sapiens  
2  Homo sapiens  
3  Homo sapiens  
4  Homo sapiens  
5  Homo sapiens  
6  Homo sapiens  
7  Homo sapiens  
8  Homo sapiens  
9  Homo sapiens  

Known molecules are predicted with high confidence. Next, I want to try unknown molecules. 😉

Target prediction using ChEMBL

As you know, there are several publicly available databases in the chemoinformatics area.
ChEMBL DB is one of the most useful. George Papadatos introduced a useful tool for target prediction using ChEMBL: he provides ChEMBL target prediction models via an FTP server!
So everyone can use the models.
I used the model and tried target prediction.
At first, I got the model from the FTP server and launched a Jupyter notebook. 😉

iwatobipen$ wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/target_predictions/chembl_22_models.tar.gz
iwatobipen$ tar -vzxf chembl_22_models.tar.gz
iwatobipen$ jupyter notebook

It's ready! Go ahead.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import PandasTools
from rdkit import DataStructs
import pandas as pd
from pandas import concat
from collections import OrderedDict
import requests
import numpy
from sklearn.externals import joblib
from rdkit import rdBase
print( rdBase.rdkitVersion )
>2016.09.2

I tried this in a Python 3.5 environment.

morgan_nb = joblib.load( 'models_22/10uM/mNB_10uM_all.pkl' )
classes = list( morgan_nb.targets )
len( classes )
> 1524 # model has 1524 targets ( classes )

I used sitagliptin as the input molecule.

smiles = 'C1CN2C(=NN=C2C(F)(F)F)CN1C(=O)C[C@@H](CC3=CC(=C(C=C3F)F)F)N'
mol = Chem.MolFromSmiles( smiles )
mol

Next, calculate the Morgan fingerprint and convert it to a numpy array.

fp = AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=2048 )
res = numpy.zeros( len(fp), numpy.int32 )
DataStructs.ConvertToNumpyArray( fp, res )

Predict targets and sort the results by probability.

probas = list( morgan_nb.predict_proba( res.reshape(1,-1))[0] )
# Pair each target ChEMBL ID with its predicted probability.
predictions = pd.DataFrame( list(zip(classes, probas)), columns=['id', 'proba'] )
top_pred = predictions.sort_values( by='proba', ascending=False ).head(10)
top_pred

Jupyter notebook version 5 changed the table view!

Then convert the ChEMBL IDs to target names.

def fetch_WS( trgt ):
    re = requests.get( 'https://www.ebi.ac.uk/chembl/api/data/target/{0}.json'.format(trgt) )
    return ( trgt, re.json()['pref_name'], re.json()['organism'] )
plist = []
for i , e in enumerate( top_pred['id'] ):
    plist.append( fetch_WS(e) )
target_info = pd.DataFrame( plist, columns=['id', 'name', 'organism'])
pd.merge( top_pred, target_info )

The model predicted that sitagliptin is a DPP4 modulator! I think this work is interesting. I will try to predict other molecules and integrate a local ChEMBL DB to improve performance.
😉

The original source code is at the following URLs. Thanks for the useful information!
https://github.com/madgpap/notebooks
http://chembl.blogspot.jp/2016/03/target-prediction-models-update.html

Draw molecule with atom index in RDKit

I found an interesting topic on rdkit-discuss: how to draw a molecule with its atom indices.
Greg, the developer of RDKit, answered with a tip: use the molAtomMapNumber property.
https://sourceforge.net/p/rdkit/mailman/message/31663468/
I didn’t know that!
I tried it on my PC. RDKit can draw molecules easily using IPythonConsole.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.ipython_useSVG = True

def mol_with_atom_index( mol ):
    # Set each atom's map number to its index so the index appears in the drawing.
    atoms = mol.GetNumAtoms()
    for idx in range( atoms ):
        mol.GetAtomWithIdx( idx ).SetProp( 'molAtomMapNumber', str( mol.GetAtomWithIdx( idx ).GetIdx() ) )
    return mol

Test with a kinase inhibitor.

mol = Chem.MolFromSmiles( "C1CC2=C3C(=CC=C2)C(=CN3C1)[C@H]4[C@@H](C(=O)NC4=O)C5=CNC6=CC=CC=C65" )

Draw molecule.

#Default
mol
#With index
mol_with_atom_index(mol)
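
As a follow-up note: newer RDKit releases (roughly 2020.03 and later, so this assumes a build newer than the one used in this post) expose a drawing option that does the same thing without touching atom map numbers.

from rdkit.Chem.Draw import IPythonConsole
# Ask the drawer to render atom indices directly.
IPythonConsole.drawOptions.addAtomIndices = True
mol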

https://github.com/iwatobipen/chemo_info/blob/master/rdkit_notebook/drawmol_with%2Bidx.ipynb
