Deep learning with Python

I’m interested in machine learning, and Python is a good tool for it.
Deep learning is one of the hot topics in this area.
There are several deep-learning libraries for Python, such as Theano and Pylearn2.
But these packages were difficult for me ;-( .
So I used nolearn for deep learning instead.
Nolearn is easy to install; if you want, you can install it via pip.
Let’s write some code.
First, I downloaded the sample data from the link
here
The dataset contains 4337 structures with AMES categorisation (mutagen/nonmutagen).
Then I converted the mutagen/nonmutagen tag to 1/0.

from rdkit import Chem
mols = [ mol for mol in Chem.SDMolSupplier("cas_4337.sdf") if mol is not None ]  # skip unparsable records
writer = Chem.SDWriter("ames.sdf")
for mol in mols:
    if mol.GetProp("Ames test categorisation") == "mutagen":
        mol.SetProp("Act", str(1))
    else:
        mol.SetProp("Act",str(0))
    writer.write( mol )
writer.close()
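The tag conversion above boils down to a one-line mapping; here is a standalone sketch (the helper name `to_act` is mine, not part of the script):

```python
def to_act(label):
    # Map the AMES categorisation string to a binary activity flag.
    return 1 if label == "mutagen" else 0

print(to_act("mutagen"), to_act("nonmutagen"))  # 1 0
```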

OK, let’s predict some molecules.
To calculate molecular descriptors, I used RDKit.
1. Calculate descriptors.
2. Split into train/test data sets.
3. Build a model and predict the test set.

# Ames prediction using DBN

import numpy as np
from rdkit import Chem
from rdkit.ML.Descriptors import MoleculeDescriptors
from rdkit.Chem import Descriptors

from sklearn.preprocessing import scale
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report

from nolearn.dbn import DBN

nms = [ x[0] for x in Descriptors._descList ]
calc = MoleculeDescriptors.MolecularDescriptorCalculator( nms )
# define descriptor calculator.
def calc_descs( mol ):
    res = calc.CalcDescriptors( mol )
    return res
# read molecules.
mols = [ mol for mol in Chem.SDMolSupplier("ames.sdf") if mol is not None ]
descs = [ calc_descs( mol ) for mol in mols ]
acts = [ int(mol.GetProp("Act")) for mol in mols ]
# convert lists to arrays, replace NaN with 0, and scale.

descs = scale(np.nan_to_num(np.asarray( descs )))
acts = np.asarray( acts )

# split data.
train_descs, test_descs,  train_acts, test_acts = train_test_split(descs, acts,
                                                                 test_size=0.3,
                                                                 random_state=0)


print train_descs.shape, train_acts.shape
print test_descs.shape, test_acts.shape

# define parameter
# if you increase epochs, training will take longer.
dbn = DBN( [descs.shape[1], descs.shape[1] // 3, 2],
           learn_rates = 0.27,
           minibatch_size = train_descs.shape[0],
           epochs=10,
           verbose=0,)

dbn.fit(train_descs, train_acts)
pred=dbn.predict(test_descs)

print(classification_report( test_acts, pred ))
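The preprocessing step above (nan_to_num followed by scale) can be sketched in plain NumPy; the tiny descriptor matrix below is made-up example data, and the manual standardisation mimics what sklearn.preprocessing.scale does.

```python
import numpy as np

# Replace NaN with 0, then standardise each descriptor column
# to zero mean / unit variance (what sklearn.preprocessing.scale does).
descs = np.array([[1.0, 2.0, np.nan],
                  [3.0, 4.0, 5.0],
                  [5.0, 6.0, 7.0]])

clean = np.nan_to_num(descs)                   # NaN -> 0.0
scaled = (clean - clean.mean(axis=0)) / clean.std(axis=0)

print(np.allclose(scaled.mean(axis=0), 0.0))   # True: zero mean per column
print(np.allclose(scaled.std(axis=0), 1.0))    # True: unit variance per column
```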

Let’s try!

iwatobipen-MacBook-Air:cas_4337 iwatobipen$ python ames_pred.py
(3503, 196) (3503,)
(1502, 196) (1502,)
             precision    recall  f1-score   support

          0       0.84      0.81      0.82       669
          1       0.85      0.87      0.86       833

avg / total       0.84      0.84      0.84      1502
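For reference, the precision/recall/f1 columns in the report are derived from confusion counts; here is a sketch with hypothetical counts for class 1 (not the actual confusion matrix of this run):

```python
# Toy sketch of how classification_report derives its columns.
# tp/fp/fn are hypothetical counts for class 1, not taken from the run above.
tp, fp, fn = 725, 128, 108

precision = tp / float(tp + fp)   # of predicted mutagens, how many were right
recall = tp / float(tp + fn)      # of true mutagens, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.85 0.87 0.86
```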

The results were not so bad 😉 .
Nolearn is powerful yet easy to use.
