Quantum chemistry calculation with python.

In this weekend, I played molecular design toolkit.
https://bionano.autodesk.com/MolecularDesignToolkit/
This is very nice open source tool kit for pythonist I think. At first, I tried to install the TK in OSx directly, but I had some trouble to run the code. So I installed my linux virtual PC. It is not so difficult to install molecular design TK in linux environment.
At frist mdt can install by using pip.

pip install moldesign

Easy!
And next, I installed some packages by using conda.
I recommend to install pyscf ( quantum chemistry package for python ) by using conda. Because I had some troubles when I build pyscf from source code.

conda install -c moldesign nbmolviz 
conda install -c conda-forge widgetsnbextension
conda install -c pyqc pyscf 
conda install -c omnia openmm
conda install -c omnia pdbfixer 

And also I installed openbabel by apt-get.

apt-get install openbabel python-openbabel

OK, Now ready!
Let’s start coding.
Today, I tried to simple example.
Read string, generate 3D, calculate orbitals and visualize it.
Following code is running on jupyter notebook. 😉

import moldesign as mdt
import moldesign.units as u
import pybel
# read molecule from smiles and generate 3D conf and save sdf.
mol = pybel.readstring( "smi","C1=NC2=C(N1)C(=NC=N2)N" )
mol.addh()
mol.make3D()
mol.write("sdf",'adenine.sdf')

Read and draw

mol=mdt.read("adenine.sdf")
mol.draw()


Draw function draws 2D and 3D structure.

Next, calculate energy and draw molecular orbital.

mol.set_energy_model( mdt.models.RHF, basis='sto-3g')
prop = mol.calculate()
print( prop.keys() )
print( "Energy: ", prop['potential_energy'])
['positions', 'mulliken', 'wfn', 'potential_energy', 'dipole_moment']
('Energy: ', <Quantity(-12479.0741253, 'eV')>)
mol.draw_orbitals()


Works fine. draw_orbitals() can draw some orbitals like HOMO, LUMO, and any other energy level orbitals.

Finally, minimize it.
And draw orbitals.

mintraj = mol.minimize()

Starting geometry optimization: built-in gradient descent
Starting geometry optimization: SciPy/bfgs with analytical gradients
Step 2/20, ΔE=-1.858e-01 eV, RMS ∇E=4.161e-01, max ∇E=1.305e+00 eV / ang
Step 4/20, ΔE=-2.445e-01 eV, RMS ∇E=1.818e-01, max ∇E=6.069e-01 eV / ang
Step 6/20, ΔE=-2.589e-01 eV, RMS ∇E=1.359e-01, max ∇E=4.905e-01 eV / ang
Step 8/20, ΔE=-2.620e-01 eV, RMS ∇E=1.250e-01, max ∇E=5.032e-01 eV / ang
Step 10/20, ΔE=-2.660e-01 eV, RMS ∇E=1.264e-01, max ∇E=3.384e-01 eV / ang
Step 12/20, ΔE=-2.751e-01 eV, RMS ∇E=1.125e-01, max ∇E=2.966e-01 eV / ang
Step 14/20, ΔE=-2.915e-01 eV, RMS ∇E=2.315e-01, max ∇E=5.801e-01 eV / ang
Step 16/20, ΔE=-2.942e-01 eV, RMS ∇E=2.492e-01, max ∇E=6.325e-01 eV / ang
Step 18/20, ΔE=-2.978e-01 eV, RMS ∇E=2.712e-01, max ∇E=7.771e-01 eV / ang
Step 20/20, ΔE=-3.016e-01 eV, RMS ∇E=2.639e-01, max ∇E=7.127e-01 eV / ang
Warning: Maximum number of iterations has been exceeded.
         Current function value: -12479.375700
         Iterations: 19
         Function evaluations: 26
         Gradient evaluations: 26
Reduced energy from -12479.0741253 eV to -12479.3757001 eV
mintraj.draw_orbitals()


mintraj object has each steps energy state. And it can be shown as movie.

Also lots of functions are implemented in molecular design tool kit.
I will play with package more and more. 😉
Today’s code was pushed my repository.
https://github.com/iwatobipen/moldesigntk/blob/master/test1.ipynb

Advertisements

Fragmentation of molecules.

I often use molecular fragmentation for getting small drug like fragments from any data sources.
RECAP, BRICS, or another algorithms are used to do it.
RDKit is suite tool to do that. I often use the toolkit.
Also I was interested in a tool that named “molBLOCKS” and tried to use it.
If reader who is interested in it, more details are described in this paper.
You can get source code from following url.
http://compbio.cs.princeton.edu/molblocks/
And install is easy.

iwatobipen$ wget http://compbio.cs.princeton.edu/molblocks/molblocks.tar.gz
iwatobipen$ tar xzvf molblocks.tar.gz
iwatobipen$ cd molbcloks
iwatobipen$ make

That’s all.
===========================================
molBLOCKS depend on openbabel.
So, I installed openbabel before installing molBLOCKS.
===========================================
Ready to fragmentation.
I tested molBLOCKS using drugbank data set.

Over 6000 mols set.
And structure data is provided as smiles

iwatobipen$ wc drugbank/all_drugs.txt 
    6813   10647  379523 drugbank/all_drugs.txt
iwatobipen$ head -n 5 drugbank/all_drugs.txt 
O=C(N1[C@@H](CCC1)C(=O)NNC(=O)N)[C@@H](NC(=O)[C@@H](NC(=O)[C@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@H]1NC(=O)CC1)Cc1[nH]cnc1)Cc1c2c([nH]c1)cccc2)CO)Cc1ccc(O)cc1)COC(C)(C)C)CC(C)C)CCCN=C(N)N	[NO NAME]
NC(=O)CNC(=O)[C@@H](NC(=O)[C@@H]1CCCN1C(=O)[C@H]1NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@@H](NC(=O)CCSSC1)Cc1ccc(cc1)O)CC(=O)N)CCCNC(=N)N	[NO NAME]
c12c(cccc1)ccc(c2)C[C@@H](NC(=O)C)C(=O)N[C@@H](C(=O)N[C@H](Cc1cccnc1)C(=O)N[C@H](C(=O)N[C@@H](Cc1ccc(cc1)O)C(=O)N[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(=O)N1[C@H](C(=O)N[C@@H](C(=O)N)C)CCC1)CCCNC(=N)N)CCCNC(=O)N)CO)Cc1ccc(cc1)Cl	[NO NAME]
C/C=C/CC(C)C(C1C(=NC(CC)C(=O)N(C)CC(=O)N(C)C(CC(C)C)C(=NC(C(C)C)C(=O)N(C)C(CC(C)C)C(=NC(C)C(=NC(C)C(=O)N(C)C(CC(C)C)C(=O)N(C)C(CC(C)C)C(=O)N(C)C(C(C)C)C(=O)N1C)O)O)O)O)O	
NC(=O)CNC(=O)[C@@H](NC(=O)[C@@H]1CCCN1C(=O)[C@H]1NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)N)NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@@H](NC(=O)[C@H](CSSC1)N)Cc1ccccc1)CC(=O)N)CCCCN	

Here we go!
I ran fragment command with following option.
-i input file / drugbank/all_drugs.txt
-r rules.txt / RECAP.txt
-n min. number of atoms in fragment / 4
-o out put / drugbank/fragment_drugs.txt

iwatobipen$ time ./fragment -i drugbank/all_drugs.txt -r ./RECAP.txt -n 4 -o drugbank/fragment_drugs.txt 
[==================================================] 100%     


real	0m17.283s
user	0m17.184s
sys	0m0.081s

iwatobipen$ head -n 5 drugbank/fragment_drugs.txt 
CC(O)(C)C.NC(=O)NNC(=O)[C@@H]1CCCN1.O=C[C@@H]1CCC(=O)N1.O=C[C@H](Cc1c[nH]c2c1cccc2)N.N[C@H](C=O)Cc1cnc[nH]1.O=C[C@H](Cc1ccc(cc1)O)N.O=C[C@H](CC(C)C)N.O=C[C@H](CCCN=C(N)N)N.N[C@H](C=O)CO.C[C@H](C=O)N	[NO
O=C[C@@H]1CSSCCC(=O)N[C@@H](Cc2ccc(cc2)O)C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CC(=O)N)CCC(=O)N)Cc1ccccc1.O=C[C@@H]1CCCN1.O=C[C@H](CCCNC(=N)N)N.NCC(=O)N	[NO
NC(=O)[C@H](N)C.O=C[C@@H](Cc1ccc2c(c1)cccc2)NC(=O)C.O=C[C@@H]1CCCN1.O=C[C@@H](Cc1cccnc1)N.O=C[C@H](Cc1ccc(cc1)O)N.O=C[C@@H](Cc1ccc(cc1)Cl)N.O=C[C@H](CC(C)C)N.O=C[C@H](CCCNC(=N)N)N.O=C[C@@H](CCCNC(=O)N)N.N[C@H](C=O)CO	[NO
C/C=C/CC(C(C1C(=NC(CC)C(=O)N(C)CC(=O)N(C)C(CC(C)C)C(=NC(C(C)C)C(=O)N(C)C(CC(C)C)C(=NC(C(=NC(C(=O)N(C(C(=O)N(C(C(=O)N(C(C(=O)N1C)C(C)C)C)CC(C)C)C)CC(C)C)C)C)O)C)O)O)O)O)C
O=C[C@@H]1CSSC[C@H](N)C(=O)N[C@@H](Cc2ccccc2)C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CC(=O)N)CCC(=O)N)Cc1ccccc1.O=C[C@@H]1CCCN1.NCC(=O)N.NCCCC[C@@H](C=O)N

I got period separated data set.
This program also can analyse frequency of fragments.
Like this.

iwatobipen$ time ./analyze -i drugbank/all_drug_frag.txt -o drugbank/all_drug_analyse.txt

Storing the main fragment set

[==================================================] 100%     


real	0m0.348s
user	0m0.307s
sys	0m0.023s
iwatobipen$ head -n 5 drugbank/all_drug_analyse.txt 
208	c1ccccc1	177 274 349 166548 384 483 678 712 713 796 805 812 872 991 1029 1075 1138 1139 1148 1167 1263 1342 1349 1424 1429 1435 1580 4575 4842 65781 71388 5335 2200 [NO [NO [NO 1495 580 3585 4817 4826 11047 3516 4468 4468 4468 4468 4468 220 110221-27-7.mol TNL L10 A45 BFL BIR HTQ IOE 669 INV BPG LY2 PFE IOF NFP BPY R18 OPA 696 BSI TSX PF3 FXN S58 EQU DFA IPC PTU ISF D1L LVA 132 CL3 977 KMB FTA 656 2AN 580 201 824 BIF HYQ UHD 2361 11F 11X 130 199 19A 251 2A6 2PD 2SC 37A 3DE 509 521 53R 547 5B2 5MS 642 6C3 6NH 6SC 783 839 979 A51 AAX AAZ ARH B08 B28 B2Y B70 B76 BCE BDL BHF BP4 BP7 BPS BWP BWP C00 C4E CMF CP7 CY0 D25 D26 DF1 DF2 DFW DFY DFZ DMZ FL9 FRZ G14 HA3 HDY HS4 IHI IOK J88 JNH JT5 JT6 KJ2 KOM KSF KSL L5G LJ3 LKG LRG LZL MD7 MMG N4D NN3 OA4 P19 P29 P44 P45 P49 P4T P55 P63 PB2 PFD PFQ PIQ PXB QPP RBS RF2 RIV SCJ SCQ SCW SCZ T1D TF5 TFG TFK TR1 TUO VII VRV VX3 X98 ZY6 ZZA 1548992
162	Nc1ncnc2c1nc[nH]2	118 131 170 32326 464205 640 718 2703 65781 157 1892 4468 929 537 5RM 2HC [NO [NO I84 SUB NLA NLA LI6 LI6 DFN DFN DMA PIL 186 BEK 446202 FL8 FL8 FSP INT INT A45 WAC WAC SU2 FR2 HTQ OFL OFL HBO DST ROI 669 669 669 669 6JZ 3IP KTP PHN BYS BPG FOH CP5 FLV FLV 655 TQT ISA 113 207 LHY FIL Structure NU1 EMT PVB 772 PLO CAH NBF CXM FCD CK4 K21 K21 6IN R18 B1V CXA 4PN 7I2 219 TRR BH7 696 696 IH5 PFA 5DE 5DE AHC AL8 SN2 P28 INQ TSX TSX U55 U55 46936647 LIH LIH TYI PPT I06 AI1 NBL NBL S58 S58 AL9 AL9 BRN BRN MBP MXA PTU TPI ISF BLN LVA F6B PGX 616 PU4 PU4 PU4 PY1 SDS MF2 CB4 194 HC4 DEB AL3 CK3 HUX CL3 QUE LUM LUM PHC SG1 MBS MBS 2AN TCO BTQ 103 3DH 5F1 6IA AD5 B32 EH9 ZIP
98	Cc1ccccc1	214 274 298 425 482 524 612 622 678 692 743 796 843 863 925 966 1029 1116 1244 1349 1626 4842 5105 [NO 4833 11047 4468 4468 4468 TNL 16A WSK PP1 INT INT INT A45 OLO CP8 4BT 669 3IP INV 8IN TAQ 6IN 1PB IOA OPA I3N SN2 TSX 2BF DFA SCL PTU BBH AFI 580 008 201 047 22M 23M 2SK 4BR 53S 575 5SC 812 8AP 8CA 910 A17 B34 CP9 E20 F1H G6A GK3 GK4 GN8 II4 IX1 J67 JEN L05 L13 LJ1 LJ2 LZQ MI2 OST PB2 SC3 SP6 SPB TR1
82	Oc1ccccc1	251 481 573 600 887 925 944 1010 1025 1149 1400 1608 2703 4842 [NO [NO 3585 [NO GEN 929 WSK KMP DZN 5353306 IOE 122 124 BPY FRM GP8 DTQ HQQ 9HP 340 132 LIM 194 MAX CK6 AB3 87Y CA2 134 166 1HP 244 342 397 3B9 3GV 4MR 4NA 555 5PP 608 694 697 7MR 7NH 859 92G ABJ HMDB02124.mol ALH B65 BI5 C00 EES EI1 FMX IHX INK LJ4 MSI MSR N20 OA5 OBP PDZ POT RS2 ZTW
66	c1cccnc1	235 51-83-2 469 619 705 792 817 976 1072 1427 4842 [NO 5282204 25295 [NO 4819 4468 FYX 446202 IPO MNX FPH Structure NHB PF3 LIG PRC WAI PY1 D4M M1N 255 2EA 2PY 325 6EA 7PY 855 896 8IP AAX AAZ D13 D1G D2G D3G GAX GG5 HM2 L20 LL1 MOJ MUH PF8 PF9 PPX RC8 SB2 SB6 SDZ SM5 SS3 SS4 TAK XMJ XMK

Format of out put is separated following data; frequency of fragment, fragment, molecules ids.

There are additional options in this program.
Also nice document is provided developer’s site.
http://compbio.cs.princeton.edu/molblocks/

align-itを使ってみた

以前からファーマコフォアベースで分子をアライメントするツールをささっとかけると
良いな〜って思ってました。
社内リソースを使おうと思うと、ライセンスの限りがあるのでオンデマンドに解析できなかったりするためです。
RDKitでやっている事例があったのでそれを参考に書いてみたのですが、途中でだんだん複雑になってしまい、ちょっとそのまま放置しています。
そこで、もう少し簡単にということでalign-itを使ってみました。
これは前にも紹介したsilicos-itから公開されているオープンソースです。
Openbabelがカバーするフォーマットを入出力に使えるので、特定のフォーマットに変換するという作業から解放されてよいと思います。
インストールはマニュアルに従って

> cd /usr/local/src
> sudo tar -xvf ~/Downloads/align-it-1.0.1.tar.gz
> cd align-it-1.0.1
> sudo mkdir build
> cd build
> sudo cmake ..
> sudo make
> sudo make install

でよいと思います。ubuntuにパッケージマネージャーでopenbabel入れてある状態でやると
openbabel2 無い。とエラーがでました。どうもパッケージマネージャーではな入らないみたいですのでこの場合は、自分でopenbabelを最初にビルドしてやるといいかと思います。

続いてテスト用のファイルの準備です。
今回は面倒なんでpubchemでEGFR阻害剤のデータ21化合物をとってきました。
二個目の分子をリファレンス(ref.sdf)としてアライメントします。

$ align-it -r ref.sdf -d pubmed_egfr.sdf -p out.phar -o out.sdf -s score.tab --rankBy TANIMOTO
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  Align-it v1.0.1 | Apr  1 2012 11:40:30

  -> GCC:         4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)
  -> Open Babel:  2.3.1

  Copyright 2012 by Silicos-it, a division of Imacosi bvba

  Align-it is free software: you can redistribute it and/or modify
  it under the terms of the GNU Lesser General Public License as published
  by the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  Align-it is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU Lesser General Public License for more details.

  Align-it is linked against OpenBabel version 2.
  OpenBabel is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation version 2 of the License.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

COMMAND_LINE OPTIONS:

  -> Reference file:    ref.sdf
  -> Reference type:    MOL
  -> Database file:     pubmed_egfr.sdf
  -> Database type:     MOL
  -> Mol output file:   out.sdf
  -> Pharm output file: out.phar
  -> Scores file:       score.tab
  -> Cutoff:            no
  -> Best hits:         no
  -> Rank by:           0
  -> Functional groups: AROM HDON HACC LIPO NEGC POSC HYBH HYBL
  -> Hybrids:           yes
  -> Epsilon:           0.5
  -> Merge pharm:       no
  -> Include normals:   yes
  -> With exclusion:    no
  -> Scores only:       no
  -> Quied mode:        no

Reference pharmacophore 5328042
   number of points:            8
   number of exclusion spheres: 0
   totalvolume:                 192.851
..
Processed 21 molecules
21 alignments in 0.849194 seconds (24.7293 alignments per second).

結果

$ more out.phar
NAME
HYBL    -1.24093        1.32284 0.407397        0.7     0       -0.120027       -0.867182       0.289608
HYBL    -3.03966        -0.157844       -0.170924       0.7     0       -0.120027       -0.867182       0.289608
HDON    0.564533        -0.760158       0.474673        1       1       0.236509        -1.68345        0.274832
HDON    -3.99885        -2.74223        -0.736352       1       1       -4.34047        -3.66037        -0.937119
HYBL    -1.01379        2.35455 0.601621        0.7     0       -0.120027       -0.867182       0.289608
HYBL    -3.67946        0.960047        -0.140495       0.7     0       -0.120027       -0.867182       0.289608
$$$$
NAME
HYBL    -1.23497        1.32782 0.409317        0.7     0       0       0       -1.11022e-16
HYBL    -3.03463        -0.151717       -0.169017       0.7     0       0       -1.66533e-16    -2.77556e-17
HYBL    3.35087 -0.639483       0.171317        0.7     0       0       0       0
HDON    0.5692  -0.7563 0.4762  1       1       0.208101        -1.67633        0.628354
HDON    -3.9954 -2.7354 -0.7347 1       1       -4.33756        -3.65332        -0.935583
HYBL    -1.04641        2.3288  0.59128 0.7     0       2.22045e-16     -4.44089e-16    -1.11022e-16
HYBL    -3.66258        0.968428        -0.135842       0.7     0       -4.44089e-16    0       2.77556e-17
HYBL    4.39451 -0.954693       0.404542        0.7     0       8.88178e-16     0       -5.55112e-17
$$$$

.pharにはフィーチャー ファーマコフォアの三次元座標 ガウシアン体積のα値 方向性があるかないかのブール値 そのベクトル
のタブ区切りがかえってきます。このデータって何で見れるんだろう???

またオプションの -sはスコアを返すので

lion:pubmed_egf iwatobipen $ cat score.tab
5328042	192.851	5328028	165.959	138.741	0	138.741	6	0.6304	0.7194	0.836
5328042	192.851	5328042	192.851	192.851	0	192.851	8	1	1	1
5328042	192.851	9818251	165.959	150.395	0	150.395	7	0.7216	0.7798	0.9062
5328042	192.851	9882519	192.851	163.02	0	163.02	7	0.7321	0.8453	0.8453
5328042	192.851	9885081	165.959	150.406	0	150.406	7	0.7217	0.7799	0.9063
5328042	192.851	10220590	219.743	191.497	0	191.497	8	0.8661	0.993	0.8715
5328042	192.851	10222656	192.851	178.099	0	178.099	8	0.8579	0.9235	0.9235
5328042	192.851	10245856	150.209	149.599	0	149.599	6	0.7733	0.7757	0.9959
5328042	192.851	10276061	192.851	163.018	0	163.018	7	0.7321	0.8453	0.8453
5328042	192.851	10868706	123.318	94.8864	0	94.8864	4	0.4288	0.492	0.7694
5328042	192.851	10870494	177.101	136.714	0	136.714	6	0.5862	0.7089	0.772
5328042	192.851	10902421	139.067	110.037	0	110.037	5	0.4959	0.5706	0.7912
5328042	192.851	10905749	165.959	158.397	0	158.397	7	0.7904	0.8213	0.9544
5328042	192.851	10926736	150.209	94.7072	0	94.7072	4	0.3813	0.4911	0.6305
5328042	192.851	10938977	192.851	150.2	0	150.2	7	0.6378	0.7788	0.7788
5328042	192.851	10992313	150.209	149.602	0	149.602	6	0.7733	0.7757	0.996
5328042	192.851	11012217	80.6759	79.1852	0	79.1852	3	0.4075	0.4106	0.9815
5328042	192.851	11055442	123.318	94.7115	0	94.7115	4	0.4277	0.4911	0.768
5328042	192.851	11080556	192.851	162.84	0	162.84	7	0.7307	0.8444	0.8444
5328042	192.851	11807783	165.959	163.157	0	163.157	7	0.8339	0.846	0.9831
5328042	192.851	11822927	123.318	109.048	0	109.048	5	0.5265	0.5654	0.8843

となりそれぞれの列はマニュアルにあるように
column Content
—— ———————————————————————
1 Id of the reference structure
2 Maximum volume of the reference structure
3 Id of the database structure
4 Maximum volume of the database structure
5 Maximum volume overlap of the two structures
6 Overlap between pharmacophore and exclusion spheres in the reference
7 Corrected volume overlap between database pharmacophore and reference
8 Number of pharmacophore points in the processed pharmacophore
9 TANIMOTO score
10 TVERSKY_REF score
11 TVERSKY_DB score

となってます。
2列目と5列目が近い程よく重なってるということになります。
また -o 指定をすることでアライメント後の分子を書き出すことにしています。
CUIしか無いですが、結構有用なツールと思いました。