Access the ChEMBL DB with Python

ChEMBL is one of the big public databases, and it holds a lot of useful data.
If you are comfortable with Python, pychembldb is a good tool for working with it.

I like that package.
I also found another ChEMBL package, “chembl_webresource_client”.
It can be installed with pip.

iwatobipen$ sudo pip install chembl_webresource_client

Yes, it’s easy. OK, let’s get data from Python!
For example, the following code gets bioactivity data for inhibitors of the tyrosine kinase c-Met.
The ChEMBL assay ID used here is CHEMBL1003887.

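from chembl_webresource_client import *

# One resource object per ChEMBL entity type
assay = AssayResource()
comp = CompoundResource()
target = TargetResource()

# Fetch the bioactivities recorded for the assay and show the first record
bio_act = assay.bioactivities("CHEMBL1003887")
print bio_act[0]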
{u'activity_comment': u'Unspecified',
 u'assay_chemblid': u'CHEMBL1003887',
 u'assay_description': u'Inhibition of MET',
 u'assay_type': u'B',
 u'bioactivity_type': u'IC50',
 u'ingredient_cmpd_chemblid': u'CHEMBL509101',
 u'name_in_reference': u'2',
 u'operator': u'=',
 u'organism': u'Homo sapiens',
 u'parent_cmpd_chemblid': u'CHEMBL509101',
 u'reference': u'J. Med. Chem., (2008) 51:17:5330',
 u'target_chemblid': u'CHEMBL3717',
 u'target_confidence': 8,
 u'target_name': u'Hepatocyte growth factor receptor',
 u'units': u'nM',
 u'value': u'1.8'}

The result comes back as a list of Python dicts. If you use pandas, the list can be converted into a DataFrame.

import pandas as pd
df = pd.DataFrame(bio_act)
In [15]: df.head()
Out[15]: 
  activity_comment assay_chemblid  assay_description assay_type  \
0      Unspecified  CHEMBL1003887  Inhibition of MET          B   
1      Unspecified  CHEMBL1003887  Inhibition of MET          B   
2      Unspecified  CHEMBL1003887  Inhibition of MET          B   
3      Unspecified  CHEMBL1003887  Inhibition of MET          B   
4      Unspecified  CHEMBL1003887  Inhibition of MET          B   

  bioactivity_type ingredient_cmpd_chemblid name_in_reference operator  \
0             IC50             CHEMBL509101                 2        =   
1             IC50             CHEMBL459876                49        =   
2             IC50             CHEMBL459875                 4        =   
3             IC50             CHEMBL508403                47        =   
4             IC50             CHEMBL451789                40        =   

       organism parent_cmpd_chemblid                         reference  \
0  Homo sapiens         CHEMBL509101  J. Med. Chem., (2008) 51:17:5330   
1  Homo sapiens         CHEMBL459876  J. Med. Chem., (2008) 51:17:5330   
2  Homo sapiens         CHEMBL459875  J. Med. Chem., (2008) 51:17:5330   
3  Homo sapiens         CHEMBL508403  J. Med. Chem., (2008) 51:17:5330   
4  Homo sapiens         CHEMBL451789  J. Med. Chem., (2008) 51:17:5330   

  target_chemblid  target_confidence                        target_name units  \
0      CHEMBL3717                  8  Hepatocyte growth factor receptor    nM   
1      CHEMBL3717                  8  Hepatocyte growth factor receptor    nM   
2      CHEMBL3717                  8  Hepatocyte growth factor receptor    nM   
3      CHEMBL3717                  8  Hepatocyte growth factor receptor    nM   
4      CHEMBL3717                  8  Hepatocyte growth factor receptor    nM   

  value  
0   1.8  
1   1.8  
2   1.3  
3   470  
4  1400  
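
One thing to watch in the table above: the values in the value column are strings, so sorting them as they are would put "1400" before "470". Here is a minimal sketch of converting them to numbers before ranking. It uses only the five rows shown above, assumes every record is in nM, and the helper name rank_by_potency is my own, not part of the ChEMBL client.

```python
# First five records from the table above, trimmed to the relevant keys.
bio_act = [
    {"ingredient_cmpd_chemblid": "CHEMBL509101", "units": "nM", "value": "1.8"},
    {"ingredient_cmpd_chemblid": "CHEMBL459876", "units": "nM", "value": "1.8"},
    {"ingredient_cmpd_chemblid": "CHEMBL459875", "units": "nM", "value": "1.3"},
    {"ingredient_cmpd_chemblid": "CHEMBL508403", "units": "nM", "value": "470"},
    {"ingredient_cmpd_chemblid": "CHEMBL451789", "units": "nM", "value": "1400"},
]


def rank_by_potency(records):
    """Return (compound id, IC50 as float) pairs, lowest IC50 first.

    Hypothetical helper; assumes all records share the same unit (nM here).
    """
    pairs = [(r["ingredient_cmpd_chemblid"], float(r["value"])) for r in records]
    return sorted(pairs, key=lambda pair: pair[1])


ranked = rank_by_potency(bio_act)
for chembl_id, ic50 in ranked:
    print("%s  %g nM" % (chembl_id, ic50))
```

The print call is written with a single formatted string so that it behaves the same under Python 2 and Python 3.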

It is also easy to get compound data.

c = comp.get(df.ingredient_cmpd_chemblid[0])
print c

{u'smiles': u'Fc1ccc(cc1)N2C=CC=C(C(=O)Nc3ccc(Oc4ccnc5[nH]ccc45)c(F)c3)C2=O', u'chemblId': u'CHEMBL509101', u'passesRuleOfThree': u'No', u'molecularWeight': 458.42, u'molecularFormula': u'C25H16F2N4O3', u'acdLogp': 1.92, u'stdInChiKey': u'OBSFXHDOLBYWRJ-UHFFFAOYSA-N', u'acdLogd': 1.91, u'knownDrug': u'No', u'medChemFriendly': u'No', u'rotatableBonds': 5, u'acdAcidicPka': 10.72, u'alogp': 3.52, u'numRo5Violations': 0, u'species': u'NEUTRAL', u'acdBasicPka': 5.46}
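
The compound record has many keys, and often only a few are needed. A small sketch of picking them out follows. The record is copied from the output above and trimmed, and the summarize helper and its default key list are my own choices for illustration, not part of the ChEMBL client.

```python
# Compound record copied from the output above, trimmed to a few keys.
compound = {
    "chemblId": "CHEMBL509101",
    "smiles": "Fc1ccc(cc1)N2C=CC=C(C(=O)Nc3ccc(Oc4ccnc5[nH]ccc45)c(F)c3)C2=O",
    "molecularWeight": 458.42,
    "alogp": 3.52,
    "rotatableBonds": 5,
    "numRo5Violations": 0,
}


def summarize(record, keys=("chemblId", "molecularWeight", "alogp", "numRo5Violations")):
    """Return only the requested keys; keys missing from the record become None.

    Hypothetical helper for illustration.
    """
    return dict((key, record.get(key)) for key in keys)


summary = summarize(compound)
print(summary)
```

Using record.get means a compound that lacks one of the properties still yields a row, with None in the missing slot.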

Easy to use. 😉

Make a graph with Python

Visualizing data is very important.
Some years ago, I was interested in using Cytoscape to visualize molecular networks (e.g. similarity, MMP, etc.).
But it was difficult to integrate with Python scripts.
Today, I found a cool Python library, “d3py”.
You can get it from GitHub ;-): https://github.com/mikedewar/d3py
Like ggplot and vincent, this library makes it easy to draw many kinds of graphs, and it can also draw networks using networkx.
I found a good example.
The code is as follows.

import d3py
import networkx as nx
G=nx.Graph()
G.add_edge(1,2)
G.add_edge(1,3)
G.add_edge(3,2)
G.add_edge(3,4)
G.add_edge(4,2)
p = d3py.NetworkXFigure(G, name="graph", width=200, height=200)
p += d3py.ForceLayout()
p.css['.node'] = {'fill': 'blue', 'stroke': 'magenta'}
p.show()

When you call p.show(), or run the script, a local server starts and you get an interactive graph.

That’s nice.
So, how about building an MMP network with this library and supplying the data to medicinal chemists?
Each node represents a molecule, and each edge represents a transformation. It was very easy to code.
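
As a rough sketch of that idea, the snippet below turns a list of matched pairs into the nodes/links structure commonly used with D3 force layouts. The compound IDs are borrowed from the ChEMBL result earlier, but the pairings and the transform labels are made-up placeholders, not real MMP results, and build_network is a hypothetical helper rather than anything provided by d3py.

```python
import json

# Hypothetical matched pairs: (molecule A, molecule B, transform label).
# The pairings and labels are placeholders for illustration only.
pairs = [
    ("CHEMBL509101", "CHEMBL459876", "transform_1"),
    ("CHEMBL509101", "CHEMBL459875", "transform_2"),
    ("CHEMBL459875", "CHEMBL508403", "transform_3"),
]


def build_network(pairs):
    """Build a dict with 'nodes' (molecules) and 'links' (transforms).

    Each link refers to its two molecules by their position in 'nodes'.
    """
    index = {}   # molecule id -> position in the nodes list
    nodes = []
    links = []
    for mol_a, mol_b, transform in pairs:
        for mol in (mol_a, mol_b):
            if mol not in index:
                index[mol] = len(nodes)
                nodes.append({"name": mol})
        links.append({"source": index[mol_a],
                      "target": index[mol_b],
                      "transform": transform})
    return {"nodes": nodes, "links": links}


network = build_network(pairs)
print(json.dumps(network, indent=2))
```

With networkx, the same pairs could instead be added one at a time with G.add_edge(mol_a, mol_b, transform=transform), which stores the label as an edge attribute.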

I think it is very interesting, but for other chemists, hmm 😦.