Tag: chemoinfo

GET TID and PREF_NAME from CHEMBL

I want to retrieve the relationship between TID and PREF_NAME for a specific case from the ChEMBL DB.
The SQL query is as follows.

COPY (
    SELECT DISTINCT TID, PREF_NAME FROM ACTIVITIES
        JOIN ASSAYS USING (ASSAY_ID)
        JOIN TARGET_DICTIONARY USING (TID)
        WHERE STANDARD_TYPE = 'Ki'
        AND STANDARD_VALUE IS NOT NULL
        AND STANDARD_RELATION = '='
)
TO '/path/td.csv'
( FORMAT CSV )

I ran the SQL and got results like the following.

1,Maltase-glucoamylase
3,Phosphodiesterase 5A
6,Dihydrofolate reductase
7,Dihydrofolate reductase
8,Tyrosine-protein kinase ABL
9,Epidermal growth factor receptor erbB1
11,Thrombin
12,Plasminogen
13,Beta-lactamase TEM
14,Adenosine deaminase
15,Carbonic anhydrase II
19,Estrogen receptor alpha
21,Neuraminidase
23,Plasma kallikrein
24,HMG-CoA reductase
25,Glucocorticoid receptor
28,Thymidylate synthase
30,Aldehyde dehydrogenase
35,Insulin receptor
36,Progesterone receptor
41,Alcohol dehydrogenase alpha c
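The exported file can then be loaded back in Python. Below is a small sketch using the standard csv module, inlining the first rows shown above (the real output path was elided as /path/td.csv, so the data is embedded here for illustration):

```python
import csv
import io

# A few rows from the exported file, inlined for illustration;
# COPY ... (FORMAT CSV) writes no header line.
csv_text = """1,Maltase-glucoamylase
3,Phosphodiesterase 5A
6,Dihydrofolate reductase
"""

# Build a TID -> PREF_NAME lookup table.
tid2name = {int(tid): name for tid, name in csv.reader(io.StringIO(csv_text))}
print(tid2name[3])  # Phosphodiesterase 5A
```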

🙂

 

Installing TensorFlow on Mac OS X with GPU support

Yesterday, I tried to install tensorflow-gpu on my Mac.
My machine is a MacBook Pro (Retina, 15-inch, Mid 2014), which has an NVIDIA GPU.
The OS is Sierra.
Details are described at the following URL.
https://www.tensorflow.org/install/install_mac

I installed TensorFlow directly using the pip command.

 $ pip install --upgrade tensorflow-gpu  # for Python 2.7 and GPU
 $ pip3 install --upgrade tensorflow-gpu # for Python 3.n and GPU

Almost done, but not finished yet.
To finish the installation, I needed to disable System Integrity Protection (SIP).
To do that, follow these steps.

1. Restart the Mac.
2. Before OS X starts up, hold down Command-R and keep it held down until you see the Apple icon and a progress bar.
3. From the Utilities menu, select Terminal.
4. At the prompt, type exactly the following and then press Return: csrutil disable.

I tested the following code.

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op.
print(sess.run(c))

And the results show that TensorFlow can use the GPU.

iwatobipen$ python testcode.py
2017-06-13 22:24:28.952288: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:28.952314: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:28.952319: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:28.952323: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-13 22:24:29.469570: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] OS X does not support NUMA - returning NUMA node zero
2017-06-13 22:24:29.470683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.80GiB
2017-06-13 22:24:29.470713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-13 22:24:29.470720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y
2017-06-13 22:24:29.470731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0
2017-06-13 22:24:29.490805: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2017-06-13 22:24:29.495363: I tensorflow/core/common_runtime/simple_placer.cc:841] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-06-13 22:24:29.495384: I tensorflow/core/common_runtime/simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-06-13 22:24:29.495395: I tensorflow/core/common_runtime/simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
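As a sanity check, the printed product matches a plain NumPy computation over the same matrices:

```python
import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])    # shape [2, 3], same values as above
b = np.array([[1., 2.], [3., 4.], [5., 6.]])  # shape [3, 2]
c = a @ b
# c == [[22., 28.], [49., 64.]], matching the TensorFlow output
```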

ref URL
https://github.com/tensorflow/tensorflow/issues/3723

Open drug discovery toolkit for python

Recently, there are lots of Python libraries for chemoinformatics and machine learning. One of my favorites is RDKit. 😉
This area is still active, and today I tried a new library named “ODDT” (Open Drug Discovery Toolkit).
The reference URL is
https://jcheminf.springeropen.com/articles/10.1186/s13321-015-0078-2.
ODDT is well documented at http://oddt.readthedocs.io/en/latest/index.html?highlight=InteractionFingerprint. ⭐️
ODDT implements shape and electronic similarities!! I had never come across an open-source library implementing electronic similarity. The library also implements functions that can detect protein–ligand interactions.
So, I tried to use ODDT.
First, code for calculating some similarities is below.
To calculate electroshape, just use the shape.electroshape method.

from oddt import toolkit
from oddt import shape
from oddt import fingerprints
from rdkit.Chem import Draw
mols = toolkit.readfile( 'sdf', 'cdk2.sdf' )
mols = [ m for m in mols ]
print(len( mols ))
[out] 47
e_shapes = [ shape.electroshape( mol ) for mol in mols ]
usrcats = [ shape.usr_cat( mol ) for mol in mols ]
usrs = [ shape.usr( mol ) for mol in mols ]

To calculate similarity, just use the usr_similarity method, as follows.

for i in range( len( mols[ :5 ] ) ):
    for j in range( i ):
        e_sim = shape.usr_similarity( e_shapes[i], e_shapes[j] )
        usrcat_sim = shape.usr_similarity( usrcats[i], usrcats[j] )
        usr_sim = shape.usr_similarity( usrs[i], usrs[j])
        print( i, j, "e_shim", e_sim, 'usrcat_sim', usrcat_sim,'usr_sim',usr_sim )
1 0 e_shim 0.879372074943 usrcat_sim 0.742055515733 usr_sim 0.676152090576
2 0 e_shim 0.865690221755 usrcat_sim 0.428271350002 usr_sim 0.686898339111
2 1 e_shim 0.896725884564 usrcat_sim 0.481233989554 usr_sim 0.763231432529
3 0 e_shim 0.766813506629 usrcat_sim 0.609482600031 usr_sim 0.463058006246
3 1 e_shim 0.7349875959 usrcat_sim 0.548950403001 usr_sim 0.459194544856
3 2 e_shim 0.715411936912 usrcat_sim 0.360330544106 usr_sim 0.424537194619
4 0 e_shim 0.810683079155 usrcat_sim 0.62174869307 usr_sim 0.61705827303
4 1 e_shim 0.774077718141 usrcat_sim 0.635441642096 usr_sim 0.694498992613
4 2 e_shim 0.755174336047 usrcat_sim 0.394074936141 usr_sim 0.618174238781
4 3 e_shim 0.931446873697 usrcat_sim 0.780733001638 usr_sim 0.562721912484
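For intuition, USR-style similarity is an inverse-Manhattan score over descriptor vectors. Below is a minimal NumPy sketch of the idea, not ODDT's actual implementation (which may weight terms differently):

```python
import numpy as np

def usr_like_similarity(a, b):
    """Inverse mean-absolute-difference similarity, as in USR-style methods:
    identical descriptor vectors score 1.0; diverging vectors approach 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.abs(a - b).mean())
```

For example, `usr_like_similarity([1., 2., 3.], [1., 2., 3.])` gives 1.0.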

OK, next check protein–ligand contacts. To do that, I prepared protein and ligand files from the PDB.
Then I read each file and performed the calculation.

from oddt import interactions
pdb1 = next(toolkit.readfile( 'pdb', '1atp_apo.pdb'))
pdb1.protein = True
ligand = next( toolkit.readfile('sdf', 'atp.sdf'))
proteinatoms, ligandatoms, strict = interactions.hbonds( pdb1, ligand )
proteinatoms['resname']
[out]
array(['GLU', 'GLU', 'GLU', 'GLU', 'HOH', 'HOH', 'ARG', 'VAL', 'SER',
       'ALA', 'HOH', 'PHE', 'GLY', 'LYS', 'HOH', 'HOH', 'THR'], 
      dtype='<U3')

ODDT can also calculate protein–ligand interaction fingerprints.

IFP = fingerprints.InteractionFingerprint( ligand, pdb1)
print( IFP )
[out] array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
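The interaction fingerprint is a plain count vector (dtype uint8), so generic vector similarities apply to it. Here is a hedged sketch of a count-based Tanimoto (the min/max variant); ODDT may also provide its own similarity helpers, so check its docs first:

```python
import numpy as np

def count_tanimoto(fp1, fp2):
    """Tanimoto similarity generalized to count vectors: sum of element-wise
    minima over sum of element-wise maxima. Two all-zero vectors score 1.0."""
    fp1 = np.asarray(fp1, dtype=float)
    fp2 = np.asarray(fp2, dtype=float)
    denom = np.maximum(fp1, fp2).sum()
    return np.minimum(fp1, fp2).sum() / denom if denom > 0 else 1.0
```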

I think ODDT is a very nice toolkit for chemoinformatics.
I uploaded my code to my GitHub repo.

https://github.com/iwatobipen/oddt_test

Graph convolution classification with deepchem

I posted about graph convolution regression using DeepChem. Today, I tried graph convolution classification with DeepChem.
The code is almost the same as the regression model; the only difference is using dc.models.MultitaskGraphClassifier instead of dc.models.MultitaskGraphRegressor.
I got sample (JAK3 inhibitor) data from ChEMBL and tried to build a model.

First, I used pandas to convert the activity class (active, non-active) into a 0/1 label. Easy to do.

import pandas as pd
df = pd.read_table('jak3_chembl.txt', header=0)
# pd.factorize returns (codes, uniques); keep only the integer codes
df['activity_class'] = pd.factorize( df.ACTIVITY_COMMENT )[0]

df.to_csv('./preprocessed_jak3.csv', index=False)
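As a reminder of why the [0] index appears above: pd.factorize returns a (codes, uniques) pair. On a toy series:

```python
import pandas as pd

codes, uniques = pd.factorize(pd.Series(['active', 'inactive', 'active', 'inactive']))
# codes   -> [0, 1, 0, 1]
# uniques -> ['active', 'inactive']
```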

Next, I wrote the model and tested it.

import tensorflow as tf
import deepchem as dc
import numpy as np
import pandas as pd

graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader( tasks=['activity_class'], smiles_field="CANONICAL_SMILES", id_field="CMPD_CHEMBLID", featurizer=graph_featurizer )
dataset = loader.featurize( './preprocessed_jak3.csv' )

splitter = dc.splits.splitters.RandomSplitter()
trainset,testset = splitter.train_test_split( dataset )

hp = dc.molnet.preset_hyper_parameters
param = hp.hps[ 'graphconv' ]
print(param['batch_size'])
g = tf.Graph()
graph_model = dc.nn.SequentialGraph( 75 )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.Dense( int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphGather( 10 , activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphClassifier( graph_model,
                                                      1,
                                                      75,
                                                     batch_size=10,
                                                     learning_rate = param['learning_rate'],
                                                     optimizer_type = 'adam',
                                                     beta1=.9,beta2=.999)
    model_graphconv.fit( trainset, nb_epoch=50 )

train_scores = {}
#regression_metric = dc.metrics.Metric( dc.metrics.pearson_r2_score, np.mean )
classification_metric = dc.metrics.Metric( dc.metrics.roc_auc_score, np.mean )
train_scores['graphconvreg'] = model_graphconv.evaluate( trainset,[ classification_metric ]  )
p=model_graphconv.predict( testset )

for i in range( len(p )):
    print( p[i], testset.y[i] )

print(train_scores) 

root@08d8f729f78b:/deepchem/pen_test# python graphconv_jak3.py

And the log output is….

Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./preprocessed_jak3.csv
Loading shard 1 of size 8192.
Featurizing sample 0
TIMING: featurizing shard 0 took 2.023 s
TIMING: dataset construction took 3.830 s
Loading dataset from disk.
TIMING: dataset construction took 2.263 s
Loading dataset from disk.
TIMING: dataset construction took 1.147 s
Loading dataset from disk.
50
Training for 50 epochs
Starting epoch 0
On batch 0
...............
On batch 0
On batch 50
computed_metrics: [0.97176380945032259]
{'graphconvreg': {'mean-roc_auc_score': 0.97176380945032259}}

Not so bad.
The classification model gives a better result than the regression model.
All the code is pushed to my GitHub repository.
https://github.com/iwatobipen/deeplearning

Graph convolution regression with deepchem

Some days ago, I posted a blog about DeepChem, and I am still playing with it. Today I tried the graph convolution regression model.
DeepChem provides a graph convolution regressor. Cool.
I used solubility data provided by AstraZeneca. https://www.ebi.ac.uk/chembl/assay/inspect/CHEMBL3301364
My test code is below, almost the same as DeepChem's example code.
The CSVLoader method is very useful because it not only reads the data but also calculates the graph features of each molecule.
Next, define the graph convolution network.

import tensorflow as tf
import deepchem as dc
import numpy as np
graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader( tasks=['LogS'], smiles_field="CANONICAL_SMILES", id_field="CMPD_CHEMBLID", featurizer=graph_featurizer )
dataset = loader.featurize( './bioactivity.csv' )

splitter = dc.splits.splitters.RandomSplitter()
trainset,testset = splitter.train_test_split( dataset )

hp = dc.molnet.preset_hyper_parameters
param = hp.hps[ 'graphconvreg' ]
print(param['batch_size'])
g = tf.Graph()
graph_model = dc.nn.SequentialGraph( 75 )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.Dense( int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
#graph_model.add( dc.nn.GraphGather(param['batch_size'], activation='tanh'))
graph_model.add( dc.nn.GraphGather( 10 , activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphRegressor( graph_model,
                                                      1,
                                                      75,
                                                     batch_size=10,
                                                     learning_rate = param['learning_rate'],
                                                     optimizer_type = 'adam',
                                                     beta1=.9,beta2=.999)
    model_graphconv.fit( trainset, nb_epoch=30 )

train_scores = {}
regression_metric = dc.metrics.Metric( dc.metrics.pearson_r2_score, np.mean )
train_scores['graphconvreg'] = model_graphconv.evaluate( trainset,[ regression_metric ]  )
p=model_graphconv.predict( testset )

print(train_scores) 

Next, run the code.

root@08d8f729f78b:/deepchem/pen_test# python graphconv_test.py > datalog

And the datalog file is….

Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./bioactivity.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
...
Starting epoch 29
On batch 0
On batch 50
On batch 100
computed_metrics: [0.52744994044080606]
{'graphconvreg': {'mean-pearson_r2_score': 0.52744994044080606}}

The r2 score is still low, but I think it can be improved by changing nb_epoch.
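For reference, dc.metrics.pearson_r2_score is (as I understand it) the squared Pearson correlation, which can be sketched with NumPy alone:

```python
import numpy as np

def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(np.asarray(y_true, float), np.asarray(y_pred, float))[0, 1]
    return r ** 2
```

Perfectly linear predictions score 1.0, regardless of slope or offset.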

All sample code was uploaded to github.
https://github.com/iwatobipen/deeplearning/blob/master/datalog

integration of spotfire and pdb viewer

Some years ago, at JCUP, I heard a presentation about implementing a PDB viewer in Spotfire. It was really impressive, because Spotfire cannot handle PDB files natively. You know, Spotfire is one of the popular tools for data visualization. I like the tool.

Recently I found a unique library for Spotfire named ‘JSViz’. It is not a native library, but users can get it from the community site. With JSViz, Spotfire can communicate with JS libraries such as D3.js, highcharts.js, etc. 😉

Lots of examples are provided on the site.
I thought, “Hmm… if there is a PDB viewer written in JavaScript, I can implement a PDB viewer in Spotfire.”

So, I tried it.
Install JSViz first.
Then I wrote a PDB loader script using the template. I used pv.js to load and render the PDB.
JSViz passes data from Spotfire as sfdata, which is in JSON format. Readers who need more details on the data structure should read the original documentation (or leave a comment on this post).
My data format is as follows.
#, pdb_id, ligandname
1,1ATP, ANP

And I used pdb_id and ligandname in sfdata.
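Based on how the JSViz renderer code indexes sfdata (chartdata[0].items[...]), each data row arrives roughly in the following shape. This is a Python mock for orientation only; the real JSViz schema carries more fields:

```python
# Hypothetical mock of one sfdata row as consumed by the renderer
row = {"items": ["1ATP", "ANP"], "hints": {"marked": False}}
pdb_id = row["items"][0]       # used to build the PDB URL
ligand_name = row["items"][1]  # used to select the ligand residue
print(pdb_id, ligand_name)     # 1ATP ANP
```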
My strategy is….
1. Build external pdb supply server. ( simple http server written in python )
2. Access the url and get pdb file from the server and render it ( using jsviz ).

The following is my JSViz code. It renders the protein as a cartoon and the ligand as balls and sticks.

/*
 Copyright (c) 2016 TIBCO Software Inc

 THIS SOFTWARE IS PROVIDED BY TIBCO SOFTWARE INC. ''AS IS'' AND ANY EXPRESS OR
 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
 SHALL TIBCO SOFTWARE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

//////////////////////////////////////////////////////////////////////////////
// #region Drawing Code

var pv = require("bio-pv");

//
//
// Main Drawing Method
//

function renderCore(sfdata)
{
    if (resizing) {
        return;
    }

    // Log entering renderCore
    log ( "Entering renderCore" );

    // Extract the columns
    var columns = sfdata.columns;
    // Extract the data array section
	var chartdata = sfdata.data;

    // count the marked rows in the data set, needed later for marking rendering logic
    var markedRows = 0;
    for (var i = 0; i < chartdata.length; i++) {
        if (chartdata[i].hints.marked) {
            markedRows = markedRows + 1;
        }
    }
    var width = window.innerWidth;
    var height = window.innerHeight;

    //
    // Replace the following code with actual Visualization code
    // This code just displays a summary of the data passed in to renderCore
    //
    //displayWelcomeMessage ( document.getElementById ( "viewer" ), sfdata );

    displaypdb(document.getElementById('js_chart'), chartdata);
    wait ( sfdata.wait, sfdata.static );
};

//
// #endregion Drawing Code
//////////////////////////////////////////////////////////////////////////////

//////////////////////////////////////////////////////////////////////////////
// #region Marking Code
//

//
// This method receives the marking mode and marking rectangle coordinates
// on mouse-up when drawing a marking rectangle
//
function markModel(markMode, rectangle)
{
	// Implementation of logic to call markIndices or markIndices2 goes here
}

//
// Legacy mark function 2014 HF2
//
function mark(event)
{
}

//
// #endregion Marking Code
//////////////////////////////////////////////////////////////////////////////

//////////////////////////////////////////////////////////////////////////////
// #region Resizing Code
//

var resizing = false;

window.onresize = function (event) {
    resizing = true;
    if ($("#js_chart")) {
    }
    resizing = false;
};

//
// #endregion Resizing Code
//////////////////////////////////////////////////////////////////////////////

//
// This is a sample visualization that indicates that JSViz is installed
// and configured correctly.  It is an example of how to draw standard
// HTML objects based on the data sent from JSViz.
//

function displaypdb( div, chartdata ){
    div.innerHTML = "<div id='viewer'>pdb</div>";
    var options = {
        background: 'lightgrey',
        width: 800,
        height: 600,
        antialias: true,
        quality : 'medium'
    };
    // insert the viewer under the DOM element with id 'viewer'.
    var viewer = pv.Viewer(document.getElementById('viewer'), options);
    var pdb_id = chartdata[0].items[0];
    var ligand_name = chartdata[0].items[1];
    var url = 'http://localhost:9000/' + pdb_id + '.pdb';
    $.ajax( url )
    .done(function(data) {
        var structure = pv.io.pdb(data);
        var ligand = structure.select({rnames : [ ligand_name ]});
        viewer.ballsAndSticks('ligand', ligand);
        viewer.cartoon('protein', structure, { color : pv.color.ssSuccession() });
        viewer.centerOn(structure);
    });
};

Next, the following code is a simple HTTP server. Place the PDB files to serve in the same folder.

import os
import sys
import http.server
import socketserver
PORT = 9000
class HTTPRequestHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header('Access-Control-Allow-Origin', '*')
        http.server.SimpleHTTPRequestHandler.end_headers(self)

def server(port):
    httpd = socketserver.TCPServer(('', port), HTTPRequestHandler)
    return httpd

if __name__ == "__main__":
    port = PORT
    httpd = server(port)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("\n...shutting down http server")
        httpd.shutdown()
        sys.exit()

This is a very brief introduction. With JSViz, you can also capture user events, like clicking a ligand, a residue, etc.
It seems very interesting. But do I need to develop new visualizations in Spotfire? ;-p

ref
https://community.tibco.com/wiki/javascript-visualization-framework-jsviz-and-tibco-spotfire
https://biasmv.github.io/pv/

how to get molecular graph features

Belatedly, I have become interested in DeepChem, an open-source deep learning toolkit for drug discovery. DeepChem supports many features for chemoinformatics.
One of its interesting features is the calculation of molecular graphs, which are more primitive than hashed fingerprints. I tried to calculate them.

Currently the toolkit supports only Linux, so I installed DeepChem via Docker.
The installation was very easy.

iwatobipen$ docker pull deepchemio/deepchem
# wait a moment.... 😉
iwatobipen$ docker run -i -t deepchemio/deepchem
iwatobipen$ pip install jupyter
# the next line is optional
iwatobipen$ apt-get install vim

That’s all.
Next, I worked in the Docker environment.

import deepchem as dc
from deepchem.feat import graph_features
from rdkit import Chem
convmol=graph_features.ConvMolFeaturizer()
mol = Chem.MolFromSmiles('c1ccccc1')
# convmol needs list of molecules
fs = convmol.featurize( [mol] )
f = fs[ 0 ]
# check method
dir( f )
Out[41]:
[ .....
 'agglomerate_mols',
 'atom_features',
 'canon_adj_list',
 'deg_adj_lists',
 'deg_block_indices',
 'deg_id_list',
 'deg_list',
 'deg_slice',
 'deg_start',
 'get_adjacency_list',
 'get_atom_features',
 'get_atoms_with_deg',
 'get_deg_adjacency_lists',
 'get_deg_slice',
 'get_null_mol',
 'get_num_atoms',
 'get_num_atoms_with_deg',
 'max_deg',
 'membership',
 'min_deg',
 'n_atoms',
 'n_feat']

To get atom features, use ‘get_atom_features’.
To get edge information, use ‘get_adjacency_list’.

f.get_atom_features()
Out[42]:
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 1, 0, 0, 0, 1]])
f.get_adjacency_list()
Out[43]: [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]

The atom feature array encodes, as a one-hot vector, that each atom is a carbon, has degree 2, is SP2 hybridized, and is aromatic.
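As a pure-Python sanity check, the degree of each atom can be read straight off the adjacency list returned above:

```python
# Adjacency list for benzene, as returned by f.get_adjacency_list()
adj = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]

# Heavy-atom degree of each atom = length of its neighbor list
degrees = [len(neighbors) for neighbors in adj]
print(degrees)  # [2, 2, 2, 2, 2, 2]
```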

Next step, I will try to build model by using molecular graph.