Make Drug central ER diagram with python #chemoinfo

Recently I knew useful database “DrugCentral“.
From About.
DrugCentral provides information on active ingredients chemical entities, pharmaceutical products, drug mode of action, indications, pharmacologic action. We monitor FDA, EMA, and PMDA for new drug approval on regular basis to ensure currency of the resource.

By using the site, user can search many information on web browser. And also the site provides posgresql dump file with all data of DrugCentral.
I had interest the data so I got dump file and use it.
Afte download the dump file, I made local db in my postgres env and install the db.

iwatobipen$ psql -U postgres
postgres=# create database drugcentral;
postgres=# \q
iwatobipen$ psql drugcental < drugcentral.dump.08292017.sql

OK now I made local drugcentral db.
Next, I would like to know the structure of the database, schema. I could not find the schema in the site but I found good library named "eralchemy".
ERAlchemy generates Entity Relation (ER) diagram (like the one below) from databases or from SQLAlchemy models. I installed the package via pip and made ER diagram. ;-)

iwatobipen$ pip install eralchemy
iwatobipen$ eralchemy -i 'postgresql+psycopg2://postgres@127.0.0.1:5432/drugcentral' -o er.pdf

Second code of above generates ER diagram as PDF format. Let's check it.
er
Good! ;-)

First extract smiles and DDI risk.

iwatobipen$ psql -U postgres -D drugcentral
drugcentral=# SELECT STRUCTURES.SMILES, DDI.DESCRIPTION, DDI.DDI_RISK FROM STRUCTURES, DDI WHERE STRUCTURES.ID = DDI.ID LIMIT 10;
                                                 smiles                                                 |                    description                     |  ddi_risk
--------------------------------------------------------------------------------------------------------+----------------------------------------------------+-------------
 OC[C@H]1N[C@H]([C@H](O)[C@@H]1O)C1=CNC2=C1NC=NC2=O                                                     | FLUCONAZOLE/OSPEMIFENE [VA Drug Interaction]       | Significant
 CO[C@@H]1[C@@H](C[C@H]2O[C@]1(C)N1C3=CC=CC=C3C3=C4CNC(=O)C4=C4C5=CC=CC=C5N2C4=C13)N(C)C(=O)C1=CC=CC=C1 | MERCAPTOPURINE/TOFACITINIB [VA Drug Interaction]   | Critical
 CCS(=O)(=O)N1CC(CC#N)(C1)N1C=C(C=N1)C1=NC=NC2=C1C=CN2                                                  | FLUDROCORTISONE/RISPERIDONE [VA Drug Interaction]  | Significant
 CNCC1=CC=C(C=C1)C1=C2CCNC(=O)C3=CC(F)=CC(N1)=C23                                                       | ARIPIPRAZOLE/HYDROCORTISONE [VA Drug Interaction]  | Significant
 OC(CNC1=CC=CC=N1)C1=CC=CC=C1                                                                           | CISAPRIDE/ZIPRASIDONE [VA Drug Interaction]        | Significant
 C#CC1=CC=CC(NC2=NC=NC3=C2C=C2OCCOCCOCCOC2=C3)=C1                                                       | ARIPIPRAZOLE/PHENYTOIN [VA Drug Interaction]       | Significant
 CN1CCN(CC1)C1=CC=C(NC2=NC3=C(SC=C3)C(OC3=CC=CC(NC(=O)C=C)=C3)=N2)C=C1                                  | ARIPIPRAZOLE/PREDNISOLONE [VA Drug Interaction]    | Significant
 OB1OCC2=CC(OC3=CC=C(C=C3)C#N)=CC=C12                                                                   | ARIPIPRAZOLE/FLUDROCORTISONE [VA Drug Interaction] | Significant
 CC1=CN(C=N1)C1=CC(NC(=O)C2=CC=C(C)C(NC3=NC=CC(=N3)C3=CN=CC=N3)=C2)=CC(=C1)C(F)(F)F                     | AMOBARBITAL/RISPERIDONE [VA Drug Interaction]      | Significant

Second extract smiles and mode of action.

drugcentral=# SELECT ST.SMILES, AC.ACTION_TYPE, AC.DESCRIPTION FROM STRUCTURES AS ST, ACTION_TYPE AS AC WHERE ST.ID = AC.ID LIMIT 10;
                                     smiles                                     |           action_type            |                                                                          description
--------------------------------------------------------------------------------+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------
 COC1=C2OC=CC2=CC2=C1OC(=O)C=C2                                                 | PHARMACOLOGICAL CHAPERONE        | Pharmaceutical chaperones may help stabilize the protein structure thereby restoring folding and/or preventing misfolding of the protein
 FC1=CNC(=O)NC1=O                                                               | MINIMUM INHIBITORY CONCENTRATION | The lowest concentration of an antimicrobial that will inhibit the visible growth of a microorganism
 CC(=O)OC[C@H]1O[C@H]([C@H](OC(C)=O)[C@@H]1OC(C)=O)N1N=CC(=O)NC1=O              | ANTIBODY BINDING                 | Antibody binding activity
 CCCCN1CCCC[C@H]1C(=O)NC1=C(C)C=CC=C1C                                          | ANTAGONIST                       | Binds to a receptor and prevents activation by an agonist through competing for the binding site
 COC(=O)C1=C(C)NC(C)=C([C@H]1C1=CC(=CC=C1)[N+]([O-])=O)C(=O)OCCN(C)CC1=CC=CC=C1 | ANTISENSE INHIBITOR              | Prevents translation of a complementary mRNA sequence through binding to it
 CCOC(=O)C1=C(C)NC(C)=C([C@@H]1C1=CC(=CC=C1)[N+]([O-])=O)C(=O)OC                | BINDING AGENT                    | Binds to a substance such as a cell surface antigen, targetting a drug to that location, but not necessarily affecting the functioning of the substance itself
 C[C@@H](CCC1=CC=C(O)C=C1)NCCC1=CC=C(O)C(O)=C1                                  | MODULATOR                        | Effects the normal functioning of a protein in some way e.g., mixed agonist/antagonist or unclear whether action is positive or negative
 NC1=NC2=NC=C(CNC3=CC=C(C=C3)C(=O)N[C@@H](CCC(O)=O)C(O)=O)N=C2C(N)=N1           | POSITIVE MODULATOR               | Positively effects the normal functioning of a protein e.g., receptor agonist, positive allosteric modulator, ion channel activator
 NCC1=CC=C(C=C1)C(O)=O                                                          | PROTEOLYTIC ENZYME               | Hydrolyses a protein substrate through enzymatic reaction
 OC(=O)CCCC1=CC=CC=C1                                                           | SUBSTRATE                        | Carried by a transporter, possibly competing with the natural substrate for transport
(10 rows)

This is very limited example of the DB. If reader who interested in the DB how about play and analyze with the DB? And ERAlchemy is very useful!!!
Enjoy!
* To use ERAlchemy with postgresql, you need to install psycopg2 at first.

Advertisements

RDKit / PostgresSQL

RDKitはPostgresSQLと連携させて構造情報を色々と扱うことができます。
ChemBL DBはPostgresSQL用のファイルが利用できますし、
ローカルにDBを作って作業すると、自分の幅が少し広がりそうです。
SQLなんかの勉強にもなる?
取りあえずやってみました。
まずpostgresql9.2.4をビルドしました。
マニュアルに従って、、、、

./configure
make
su
make install
adduser postgres
mkdir /usr/local/pgsql/data
chown postgres /usr/local/pgsql/data
su - postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
pg_ctl start

でOK
続いてRDKitの設定

cd $RDBASE/Code/PgSQL/rdkit
make
make install
make installcheck

でエラーが出なかったら成功。
途中で何個かwarningでましたが、一応自分の環境では8個のテスト全部パスしました。
ちなみに
RDKit2013_03_2
PostgreSQL 9.2.4
BOOST 1.53.0でビルドしました。
で実際にデーターベースを作ってみます。
先ほどのフォルダの直下にdataという1000件のsmilesが入ったファイルがあったのでこれを使ってみます。

cd ./data
createdb testdb
psql testdb

これでテスト用のDBを作りました。
続いてファイルを読み込みます。

psql testdb
testdb=# create table moltable (id integer primary key, smiles text);
testdb=# copy moltable from filedir/data;

copy xxx from yyy;でファイルを取り込みます。デフォルトの区切りはタブなのでそのまま。
続いてrdkitの機能を入れる(という表現がいいのか不明ですが)

testdb=# create extension if not exists rdkit;
testdb=# create shcema rdk;
testdb=# select * into mols from (select id,mol_from_smiles(smiles::cstring) m from moltable) tmp where m is not null;
testdb=# create index molidx on mols using gist(m);

でmolsテーブルができました。

testdb=# select * from mols limit 10;
   id    |                                     m                                      
---------+----------------------------------------------------------------------------
 6308684 | O=C1C=Cc2ccccc2/C1=N/Nc1nccs1
 6061070 | O=C1C=CC(=Nc2cc(Cl)c3c(c2[N+](=O)[O-])NON3)C=C1
 5485201 | Cl.O=C1CC[C@@]2(O)[C@H]3Cc4ccc(O)c5c4[C@@]2(CCN3CC2CC2)[C@H]1O5
 5307588 | O=C(C1CCN(S(=O)(=O)c2cccc3cccnc32)CC1)N1CCn2c3ccccc3nc21
 5243548 | CC1(C)CC(=O)C(=CNc2cccnc2NC=C2C(=O)CC(C)(C)CC2=O)C(=O)C1
 4209216 | O=C(NN=Cc1cccc2cccnc12)c1ccccc1O
 3746037 | C[N+]1=C2C=CC=CC=C2SC12C([N+](=O)[O-])=CC(=[N+]([O-])[O-])C=C2[N+](=O)[O-]
 3240530 | O=C(C1CCN(S(=O)(=O)c2cccc3nsnc32)CC1)N1CCn2c3ccccc3nc21
 3128208 | O=[N+]([O-])c1c2nonc2c(Cl)cc1Nc1cccc(O)c1
 2772102 | O=[N+]([O-])c1c2nonc2c(Cl)cc1Nc1ccccn1
(10 rows)

Time: 10.815 ms

次にプロパティの計算をしてみます。

testdb=# select id, mol_amw(m) as amw , mol_logp(m) as logp into props from mols;
SELECT 1000
Time: 1699.063 ms
testdb=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner   
--------+----------+-------+----------
 public | data     | table | postgres
 public | fps      | table | postgres
 public | mols     | table | postgres
 public | moltable | table | postgres
 public | props    | table | postgres
(5 rows)

testdb=# select * from props limit 100;
    id    |   amw   |  logp   
----------+---------+---------
  6308684 | 255.302 |  2.5552
  6061070 | 306.665 |  2.6998
  5485201 | 377.868 |  1.9472
  5307588 | 461.547 |   3.032
  5243548 | 409.486 |  3.5896
  4209216 |  291.31 |  2.7043
  3746037 | 362.323 |  1.3066
  3240530 | 468.564 |  2.4885
  3128208 | 306.665 |  3.2336
  2772102 | 291.654 |   2.923
  2104451 | 339.298 |  4.0692
  2082098 | 304.396 | 3.40552
   972880 | 268.345 | 3.65212
   852940 | 257.318 |  2.1271
  5224323 | 345.402 |  3.6678
  3712088 | 402.473 |  4.8402
  3623511 | 329.743 |  3.6509
  2193022 | 363.156 |   4.669
   571783 | 273.723 |   2.889
   161167 |  284.34 |  3.0356
 16219722 | 529.611 |  2.5168
 16129550 | 726.194 |   3.201
  5719581 | 306.665 |  2.6998
Time: 1.419 ms

1000件の分子量とlogpの計算が2秒弱なので早いですね。(と思うんですがどうでしょう。)