Lately I have been interested in DeepChem, an open-source deep learning toolkit for drug discovery. DeepChem supports many features for cheminformatics.
One feature that interests me is the calculation of molecular graphs, which are a more primitive representation than hashed fingerprints. I tried calculating one.
Currently the toolkit supports only Linux, so I installed DeepChem via Docker.
The installation was very easy.
iwatobipen$ docker pull deepchemio/deepchem
# wait a moment.... ;-)
iwatobipen$ docker run -i -t deepchemio/deepchem
iwatobipen$ pip install jupyter
# the following is optional
iwatobipen$ apt-get install vim
That’s all.
Next, I worked inside the Docker environment.
import deepchem as dc
from deepchem.feat import graph_features
from rdkit import Chem

convmol = graph_features.ConvMolFeaturizer()
mol = Chem.MolFromSmiles('c1ccccc1')
# ConvMolFeaturizer needs a list of molecules
fs = convmol.featurize([mol])
f = fs[0]
# list the available methods
dir(f)
Out[41]:
[.....
 'agglomerate_mols',
 'atom_features',
 'canon_adj_list',
 'deg_adj_lists',
 'deg_block_indices',
 'deg_id_list',
 'deg_list',
 'deg_slice',
 'deg_start',
 'get_adjacency_list',
 'get_atom_features',
 'get_atoms_with_deg',
 'get_deg_adjacency_lists',
 'get_deg_slice',
 'get_null_mol',
 'get_num_atoms',
 'get_num_atoms_with_deg',
 'max_deg',
 'membership',
 'min_deg',
 'n_atoms',
 'n_feat']
To get atom features, use ‘get_atom_features’.
To get edge information (the adjacency list), use ‘get_adjacency_list’.
f.get_atom_features()
Out[42]:
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]])
f.get_adjacency_list()
Out[43]:
[[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
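The adjacency list is easier to read once it is turned into an adjacency matrix or an edge list. Below is a minimal plain-Python sketch that uses only the list printed above (no DeepChem call); the helper name `adjacency_matrix` is hypothetical, not part of DeepChem. Counting the neighbours of each atom also confirms that every benzene carbon has degree 2.

```python
from typing import List, Tuple

def adjacency_matrix(adj_list: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper: build a dense 0/1 adjacency matrix from an adjacency list."""
    n = len(adj_list)
    matrix = [[0] * n for _ in range(n)]
    for i, neighbours in enumerate(adj_list):
        for j in neighbours:
            matrix[i][j] = 1
    return matrix

# Output of f.get_adjacency_list() for benzene, copied from above.
adj = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]

A = adjacency_matrix(adj)
# Degree of each atom = number of bonded neighbours = row sum.
degrees = [sum(row) for row in A]
# Undirected edge list: each bond appears twice in the adjacency list, so deduplicate.
edges: List[Tuple[int, int]] = sorted(
    {tuple(sorted((i, j))) for i, nbrs in enumerate(adj) for j in nbrs}
)

print(degrees)  # [2, 2, 2, 2, 2, 2]
print(edges)    # [(0, 1), (0, 5), (1, 2), (2, 3), (3, 4), (4, 5)]
```

Six undirected edges over six atoms, each with two neighbours, is exactly the benzene ring.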
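Each row of the atom-feature array corresponds to one atom and is built from concatenated one-hot encodings. Here every row describes the same kind of atom: a carbon with degree 2, SP2 hybridized, and aromatic, as expected for benzene. The adjacency list gives, for each atom index, the indices of its bonded neighbours, which here traces the six-membered ring 0-1-2-3-4-5-0.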
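To see which category is switched on in each one-hot block, the row can be split into consecutive named blocks. The sketch below assumes you take the real block names and sizes (element list, degree range, hybridization types and so on) from the atom-featurization code in `deepchem.feat.graph_features` of your installed version; the layout used in the example is a small hypothetical one, not DeepChem's actual layout, and `decode_blocks` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

def decode_blocks(row: Sequence[int], blocks: Dict[str, int]) -> Dict[str, List[int]]:
    """Hypothetical helper: split a concatenated feature row into named blocks
    (in the given order) and return, per block, the offsets whose value is non-zero."""
    result: Dict[str, List[int]] = {}
    start = 0
    for name, size in blocks.items():
        segment = row[start:start + size]
        result[name] = [i for i, v in enumerate(segment) if v]
        start += size
    if start != len(row):
        raise ValueError(f"block sizes sum to {start}, but the row has {len(row)} entries")
    return result

# Toy layout (hypothetical): 3 element types, degree 0..3, one aromatic flag.
toy_layout = {"symbol": 3, "degree": 4, "aromatic": 1}
toy_row = [1, 0, 0,  0, 0, 1, 0,  1]
print(decode_blocks(toy_row, toy_layout))
# {'symbol': [0], 'degree': [2], 'aromatic': [0]}
```

With the real layout, passing one row of `f.get_atom_features()` returns the active index in each block, which you can then look up in the corresponding category list.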
As a next step, I will try to build a model using molecular graphs.