Deep learning with Python.

I’m interested in machine learning, and Python is a good tool for it.
Deep learning is one of the hot topics in this area.
There are several deep-learning libraries for Python, such as “Theano” and “Pylearn2”.
But these packages are difficult for me ;-( .
So I used nolearn to do deep learning.
Nolearn is easy to install. If you want, you can install it via pip.
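For example (assuming pip is already set up on your machine):

```shell
pip install nolearn
```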
Let’s write some code.
At first, I got sample data from the following link.
The dataset contains 4337 structures with an AMES categorisation (mutagen/nonmutagen).
First, I converted the mutagen/nonmutagen tag to 1/0.

from rdkit import Chem
mols = [ mol for mol in Chem.SDMolSupplier("cas_4337.sdf") ]
writer = Chem.SDWriter("ames.sdf")
for mol in mols:
    # tag mutagens as 1 and everything else as 0
    if mol.GetProp("Ames test categorisation") == "mutagen":
        mol.SetProp("Act", str(1))
    else:
        mol.SetProp("Act", str(0))
    writer.write( mol )
writer.close()

OK, let’s predict some molecules.
To calculate molecular descriptors I used RDKit.
1. Calc descriptors
2. Split train/test data sets
3. Build a model and predict the test set.

# Ames prediction using DBN

import numpy as np
from rdkit import Chem
from rdkit.ML.Descriptors import MoleculeDescriptors
from rdkit.Chem import Descriptors

from sklearn.preprocessing import scale
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report

from nolearn.dbn import DBN

nms = [ x[0] for x in Descriptors._descList ]
calc = MoleculeDescriptors.MolecularDescriptorCalculator( nms )
# define descriptor calculator.
def calc_descs( mol ):
    res = calc.CalcDescriptors( mol )
    return res
# read molecules.
mols = [ mol for mol in Chem.SDMolSupplier("ames.sdf")  ]
descs = [ calc_descs( mol ) for mol in mols ]
acts = [ int(mol.GetProp("Act")) for mol in mols ]
# convert list to array and convert nan to 0. and scaling.

descs = scale(np.nan_to_num(np.asarray( descs )))
acts = np.asarray( acts )

# split data. test_size matches the train/test shapes printed below.
train_descs, test_descs, train_acts, test_acts = train_test_split(descs, acts, test_size=0.3)

print train_descs.shape, train_acts.shape
print test_descs.shape, test_acts.shape

# define parameters
# if you increase epochs, training will take a long time.
dbn = DBN( [descs.shape[1], descs.shape[1]/3, 2],
           learn_rates = 0.27,
           minibatch_size = train_descs.shape[0],
           verbose = 0 )
dbn.fit( train_descs, train_acts )
pred = dbn.predict( test_descs )

print( classification_report( test_acts, pred ) )
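As a side note, here is a minimal standalone sketch (with toy data, not the AMES descriptors) of what the nan_to_num + scale preprocessing step above does:

```python
import numpy as np
from sklearn.preprocessing import scale

# a tiny descriptor matrix with one missing value
X = np.array([[1.0, np.nan],
              [3.0, 2.0],
              [5.0, 4.0]])
X = np.nan_to_num(X)   # replaces NaN with 0.0
X = scale(X)           # centers each column to mean 0, variance 1
print(X.mean(axis=0))  # approximately [0. 0.]
print(X.std(axis=0))   # approximately [1. 1.]
```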

Let’s try!

iwatobipen-MacBook-Air:cas_4337 iwatobipen$ python
(3503, 196) (3503,)
(1502, 196) (1502,)
             precision    recall  f1-score   support

          0       0.84      0.81      0.82       669
          1       0.85      0.87      0.86       833

avg / total       0.84      0.84      0.84      1502
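The table above comes from scikit-learn’s classification_report, which prints per-class precision, recall, and f1-score. A minimal standalone example (toy labels, not the AMES predictions):

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
print(classification_report(y_true, y_pred))
```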

Results were not so bad 😉 .
Nolearn is powerful but easy to use.


