Use ORBKIT for rendering MO #orbkit #rdkit #psikit #quantum_chemistry

As you know, there many packages for quantum chemistry not only commercial software but also free tools. And each soft has own output format. So user need to understand how to use it. But it is very time consuming step I think.

ORBIKIT is a modular python toolbox for cross-platform post processing of quantum chemical wavefunction data.

The reference URL is below.
https://arxiv.org/abs/1601.03069

It seems nice isn’t it? Fortunately the source code is disclosed on github.

So I tried to use orbkit with psikit. At first, I installed orbkit while referring the original site.

At first, I installed cython, numpy, scipy, h5py with conda and mayavi installed with pip (mayavi couldn’t install with conda due to package conflicts).

Then install orbkit.

$ cd $HOME
$ git clone https://github.com/orbkit/orbkit.git
$ export ORBKITPATH=$HOME/orbkit
$ cd $ORBKITPATH
$ python setup.py build_ext --inplace clean
$ export PYTHONPATH=$PYHONPATH:$ORBKITPATH

And also I added environment variables to my .bashrc.

Now ready, let’s use orbkit. I used psikit for making fchk file for orbkit. orbkit supports many kinds of input format. If reader has interest the package, please check the original document.

Following code shows how to use psikit and visualize MO with orbkit.

After running the code, mayavi viewer launched and could get MO view, the image is shown below.

Of course current psikit has some functions for communicate pymol for rendering MO but orbkit is also useful package for quantum chemistry.

I uploaded today’s code on my github repo.

https://github.com/iwatobipen/playground/blob/master/orbkit_psikit_test.ipynb

Make input file for geomeTRIC #openbabel #psi4 #geomeTRIC

geomeTRIC is code for molecular structures geometry optimization. TRIC means translation-rotation-internal coordinate (TRIC) system. As described in their publication, geomeTRIC can optimize molecular geometry rapidly and few iteration numbers.

It seems useful for Quantum Chemistry. And geomeTRIC supports many QC engines Q-Chem, TeraChem, Psi4 andMolpro. I like psi4 so I tried to use geomeTRIC with psi4. However to use the package native psi4 input file is required, so I can’t use psikit. So I make input generator for geomeTRIC.

To do it, I use pybel and jinja because pybel can calculate not only charge of molecule but also spin.

And jinja2 is used for template enjine.

My script is very simple as below.

#psi4inpgen.py
import pybel
from jinja2 import Template
from argparse import ArgumentParser

tmpl = Template("""molecule {
{{FC}} {{MP}}
{{XYZ}}
}
set basis {{BASIS}}
gradient('{{METHOD}}')
""")

def getArgs():
    parser = ArgumentParser()
    parser.add_argument("smi", type=str, help="SMILES strings of inputmolecule")
    parser.add_argument("--basis", type=str, help='BASIS SETS', default="sto-3g")
    parser.add_argument("--method", type=str, help="DFT method", default="hf")
    parser.add_argument("--output", type=str, help="outputfile name", default="input")
    return parser

if __name__=="__main__":
    parser = getArgs()
    args = parser.parse_args()
    obmol = pybel.readstring("smi", args.smi)
    obmol.make3D()
    xyz = obmol.write('xyz').split("n")[2:]

    xyz = "n".join(xyz)
    output = tmpl.render(FC=obmol.charge,
                     MP=obmol.spin,
                     XYZ=xyz,
                     BASIS=args.basis,
                     METHOD=args.method)
    with open(f"{args.output}.in", "w") as f:
        for line in output:
            f.write(line)

input.in file is generated just typing following command from bash.

$python psi4inpgen.py "CO" # methanol as an example
cat input.in
molecule {
0 1
C          0.93502       -0.05907        0.03690
O          2.35068       -0.04556        0.04204
H          0.57490       -0.85612        0.69201
H          0.57490       -0.21982       -0.98222
H          0.56919        0.90323        0.40263
H          2.64600       -0.91154       -0.28708

}

set basis sto-3g

gradient('hf')

Now ready to geometory-optimization. I used –psi4 option for calculation. –verbose and –qdata are optional arguments, they aren’t necessary parameters.

After running code, I could get optimized structure as XYZ format file and it can read from pymol.

$geometic-optimize --psi4 input.in --verbose --qdata
<pre><code>                                    ())))))))))))))))/                     
                                ())))))))))))))))))))))))),                
                            *)))))))))))))))))))))))))))))))))             
                    #,    ()))))))))/                .)))))))))),          
                  #%%%%,  ())))))                        .))))))))*        
                  *%%%%%%,  ))              ..              ,))))))).      
                    *%%%%%%,         ***************/.        .)))))))     
            #%%/      (%%%%%%,    /*********************.       )))))))    
          .%%%%%%#      *%%%%%%,  *******/,     **********,      .))))))   
            .%%%%%%/      *%%%%%%,  **              ********      .))))))  
      ##      .%%%%%%/      (%%%%%%,                  ,******      /)))))  
    %%%%%%      .%%%%%%#      *%%%%%%,    ,/////.       ******      )))))) 
  #%      %%      .%%%%%%/      *%%%%%%,  ////////,      *****/     ,))))) 
#%%  %%%  %%%#      .%%%%%%/      (%%%%%%,  ///////.     /*****      ))))).</code></pre>
#%%%%.      %%%%%#      /%%%%%%*      #%%%%%%   /////)     <strong><em><strong>*      ))))),
#%%%%##%  %%%#      .%%%%%%/      (%%%%%%,  ///////.     /</em></strong></strong>      ))))).
##     %%%      .%%%%%%/      <em>%%%%%%,  ////////.      <strong></em></strong>/     ,)))))
#%%%%#      /%%%%%%/      (%%%%%%      /)/)//       <strong><em></strong></em>      ))))))
##      .%%%%%%/      (%%%%%%,                  <strong><em>*</em></strong>      ))))))
.%%%%%%/      *%%%%%%,  <strong>.             /<strong><em>*</em></strong>      .))))))
*%%%%%%/      (%%%%%%   <strong><em><em></em></strong></em>/<em>..,</em>/<strong><em>*</em></strong></strong>       *))))))
#%%/      (%%%%%%,    <strong><em><em></em></strong><strong><em></em></em></strong><strong><em>*</em></strong>/        )))))))
*%%%%%%,         ,<strong><em><em></em></strong><strong><em></em></em></strong>/         ,))))))/
(%%%%%%   ()                              ))))))))
#%%%%,  ())))))                        ,)))))))),
#,    ())))))))))                ,)))))))))).
()))))))))))))))))))))))))))))))/
())))))))))))))))))))))))).
())))))))))))))),

-=#  geomeTRIC started. Version: 0.9.7.2  #=-
geometric-optimize called with the following command line:
/home/takayuki/anaconda3/envs/chemo37/bin/geometric-optimize --psi4 input.in --verbose --qdata
18 internal coordinates being used (instead of 18 Cartesians)
Internal coordinate system (atoms numbered from 1):
Distance 1-2
Distance 1-3
Distance 1-4
----snip---
Iter: 1 Err-dQ = 1.93889e-07 RMSD: 2.85573e-04 Damp: 1.00000e+00
Cartesian coordinates obtained after 1 microiterations (rmsd = 2.856e-04 |dQ| = 1.939e-07)
dy(i): 0.0009 dy(c) -&gt; target: 0.0003 -&gt; 0.2000
Step    5 : Displace = 2.617e-04/3.489e-04 (rms/max) Trust = 2.000e-01 (+) Grad = 4.562e-05/7.685e-05 (rms/max) E (change) = -113.5493994088 (-8.227e-08) Quality = 0.730
Converged! =D
<pre><code>#==========================================================================#
#| If this code has benefited your research, please support us by citing: |#
#|                                                                        |#
#| Wang, L.-P.; Song, C.C. (2016) "Geometry optimization made simple with |#
#| translation and rotation coordinates", J. Chem, Phys. 144, 214108.     |#
#| http://dx.doi.org/10.1063/1.4952956                                    |#
#==========================================================================#</code></pre>
Time elapsed since start of run_optimizer: 4.998 seconds

$ cat input_optim.xyz
6
Iteration 0 Energy -113.54722876
C        0.9350200000   -0.0590700000    0.0369000000
O        2.3506800000   -0.0455600000    0.0420400000
H        0.5749000000   -0.8561200000    0.6920100000
H        0.5749000000   -0.2198200000   -0.9822200000
---snip---
H        0.5804872645    0.8928197117    0.3986730191
H        2.6495111809   -0.9134121048   -0.2877923937

6 iteration data is shown as 6 states in pymol.

This image has an empty alt attribute; its file name is fig.png

To use the output file I need to parse the file. At first I search code for using geomeTRIC and how to parse or handle the data.

I uploaded today’s code on my repository.

https://github.com/iwatobipen/qchem

Quantum Chemistry data of drug bank #QCportal #Quantum_Chemistry

I’m still learning QCArchive. I posted qcportal with reaction dataset. And today I tried to retrieve of drug bank from qcportal. QCportal provides not only calculated numeric data but also 3D mol view by using py3Dmol.

OK let’s go to code. get_molecule method provides many data from qcportal web server.

import qcportal as ptl
client = ptl.FractalClient()
ds = client.get_collection("Dataset", "COMP6 DrugBank")
mols = ds.get_molecules()
mols.shape
> (13379, 1)

What kinds of data in the dataset? It is easy to do it, just call some methods.

ds.list_values().reset_index()['method'].unique()
> array(['ωB97x', 'b3lyp', 'b3lyp-d3m(bj)', 'hf', 'pbe', 'pbe-d3m(bj)',
       'svwn', 'wb97m', 'wb97m-d3(bj)'], dtype=object)
ds.list_values().reset_index()['basis'].unique()
> array(['6-31g*', 'def2-tzvp'], dtype=object)

ds.list_values()

This dataset has not only data from psi4 but also gaussian!.

I got data from method=’wB97x’

val = ds.get_values(method='ωB97x')
val.columns
> Index(['CM5 Charges', 'Hirshfeld Charges', 'Energy', 'Gradient',
       'Hirshfeld Dipole', 'Spin Density'],
      dtype='object')

I got energy from the data and visualize molecules.

energy = val['Energy']
mols['molecule'][0].show()
energy[0]
> -636107.9519541461

Py3Dmol works very well. I could get QC energy of molecule in drug bank and could render molecule as 3D object.

It is very cool!

My whole code is uploaded following URL.

Have a nice week end! ;)

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/drug_bank.ipynb

Open data source of Quantum chemistry! #qcportal #rdkit #cheminformatics #quantum_chemisry

In RDKit UGM 2019, I had interest about QCArchive. QCArchive is MolSSI quantum chemistry archive. It provides useful data and python packages.

By using one package named qcportal, we can access huge data source of quantum chemistry. It is very useful because QC calculation is useful but it requires computational cost. QC data is useful for drug design and machine learning (i.e. building machine learning based force field etc…..).

I used the package. At first I installed qcportal via conda in my env. It isn’t good choice because I couldn’t install new version of the package. Old version of qcportal causes error. So I installed via pip. It worked fine.

Following code is almost same as original document. But I tried it for my memorandum. At first import packages and make client object. I used datasource from MolSSI.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
import qcportal as ptl
client = ptl.FractalClient()

Then checked the list of torsion drive dataset. There are many dataset is available.

client.list_collections("TorsionDriveDataset")

>Fragment Stability Benchmark	None
>OpenFF Fragmenter Phenyl Benchmark	Phenyl substituent torsional barrier heights.
>OpenFF Full TorsionDrive Benchmark 1	None
>OpenFF Group1 Torsions	None
>OpenFF Primary TorsionDrive Benchmark 1	None
>OpenFF Substituted Phenyl Set 1	None
>Pfizer Discrepancy Torsion Dataset 1	None
>SMIRNOFF Coverage Torsion Set 1	None
>TorsionDrive Paper	None

ds = client.get_collection("TorsionDriveDataset", "OpenFF Fragmenter Phenyl Benchmark")
ds.df.head()

>c1c[cH:1][c:2](cc1)[C:3](=[O:4])O
>c1[cH:1][c:2](cnc1)[C:3](=[O:4])O
>[cH:1]1cncc[c:2]1[C:3](=[O:4])O
>[cH:1]1cc(nc[c:2]1[C:3](=[O:4])O)[O-]
>Cc1c[cH:1][c:2](cn1)[C:3](=[O:4])O

OK I succeeded to loading data. Let’s visualize some completed dataset. RDKit is very useful package for drawing molecules!!!!!

complete_data = ds.status(["b3lyp-d3"], collapse=False, status="COMPLETE")
Draw.MolsToGridImage([Chem.MolFromSmiles(complete_data['B3LYP-D3'].index[i]) for i in range(10)],
                    molsPerRow=5)

Finally visualize torsion energy!

ds.visualize([complete_data['B3LYP-D3'].index[i] for i in range(10)],"B3LYP-D3", units="kJ / mol")

Purple line (4th structure) has highest torsion energy at -90, 90 degree.
The molecule is 5-Hydroxynicotinic acid. Hydroxyl group is located para-positon of carboxylic group. So conjugation effect to make relative energy higher than other structures.

The package is useful for not only data source of QC but also visualization and analysis of molecules.

I uploaded today’s code on my gist.

Call psi4 from web app! #Psi4 #QuantumChemistry

Psi4 is a python package for quantum chemistry. I like it. Native psi4 has some difficulties for preparing input file. So psikit is useful I think. And I think webmo is useful package for calling psi4. I’ve always been interested webmo. So I tried to use it today.
Basic function of webmo is provided free licence.

At first I registered webmo for getting licence code. https://www.webmo.net/index.html

After registration, I could get webmo download link, licence number, and password.

Second, I installed several packages with apt command.

apt install apache2 apache2-suexec-custom libcgi-pm-perl

Then create directory structure.

$ cd ~
$ mkdir public_html
$ chmod 755 public_html
$ cd public_html
$ mkdir cgi-bin
$ chmod 755 cgi-bin
$ exit
Start apache
# a2enmod userdir
# a2enmod suexec
# a2enmod cgid
# service apache2 start

Then eddit /etc/apache2/mods-available/userdir.conf, to enable cgi script.


        UserDir public_html
        UserDir disabled root

        
                AllowOverride FileInfo AuthConfig Limit Indexes
                Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec ExecCGI #ExecCGI added
                AddHAndler cgi-script .cgi #Added
                Require method GET POST OPTIONS
        


Then restart apache ;)

# service apache2 restart

Then install webmo with perl script.

$ tar xzf WebMO.9.x.xxx.tar.gz
$ cd WebMO.install
$ perl setup.pl

Now ready, after restarting apache, I could access local webmo HP via web browser. Default setting is username is admin, password is blank.

After login as admin, system asked to change admin password. So I changed and the create user for calling Psi4.

sample URL is below
http://localhost/~yourusername/cgi-bin/webmo/login.cgi

You can get more details about installation is following URL.
https://www.webmo.net/support/installation.html

Then from admin top page, I set psi4 as quantum chemistry engine. Click interface tab, change psi4 from disable to enable. Then click edit button and set path for psi4 execution.

Tips, If reader installed psi4 via conda, the path schould be ‘/home/***/anaconda/envs/yourenv’. In the case, psi4 path is /home/***/anaconda/envs/yourenv/bin/psi4. bin/psi4 should not be included setting path!!!

After completing the configuration, I made user for QM calculation and login the system as a user.

Making input file is very easy, import file or draw molecule with implemented molecular editor. I used editor for the test.

After drawing the molecule, minimize energy and symmetrize(if possible) is done from selecting upper menu bar. It’s very easy isn’t it?

Then set job options which defines calculation method, basis function, reference and several QM related options. Also the setting can do from GUI.

After setting the options, input data can check from preview tab.

All setting is done, click submit job to run the calculation.

Following screenshot shows water as an example because benzoic acid required looooooooooooooong calculation time in my hobby PC.

Result can see jobmanager. I calculated water molecule vibration. So I could see calculated IR values and spectra.

The spectrum is not so accurate, the reason is that I used low level basis set I think.

Of course psi4 raw output file can download from webserver.

With free charge licence, it is not able to draw MO. But with commercial licence, rendering MO of molecule is available.

WebMO is very useful tool for education, and wet researchers. Because it is easy to use and rich GUI.

If reader who has interested in WebMO, try it! ;)

Electrostatic Potential Surface(ESP) calculation with GCNN #RDKit #chemoinformatics

ESP is a key feature in Drug Discovery. There are many publications discussing ESP in Drug Design. However getting accurate ESP is time-consuming because it needs high level QM calculations. To reduce the calculation cost of QM, predict quantum nature by using Deep learning is researched.

And some days ago, I found interesting article published by reseachers in Astex. URL is below.
https://pubs.acs.org/doi/10.1021/acs.jmedchem.9b01129

They build GCNN model with over than 100,000 diverse drug like molecules QM calculation (B3LYP/6-31G* basis set) data set. And their model showed good performance in ESP calculation, pKa prediction and binding affinity prediction.

And also GCNN based calculation is faster than QM based calculation, Fig 3 shows that their approach is 4 order faster than conventional DFT based calculation.

Luckly, author disclosed the code on github. ;)
https://github.com/AstexUK/ESP_DNN

….But,,,,,, the code is written for python 2.x.
Hmm…. I would like to use the code in python 3.x.

So I forked ESP_DNN and modify the original code. You can find the code here.
https://github.com/iwatobipen/ESP_DNN

My env for the ESP_DNN is below.
Python 3.7
numpy 1.17.2
rdkit 2019.03.4
xarray 0.14.0
tensorflow-gpu 1.14.0
keras-base 2.2.4

Install is very easy.

$ git clone https://github.com/iwatobipen/ESP_DNN.git
$ cd ESP_DNN
$ pip install -e . # or/ pip install .

Then run the example code which is introduced in README.md

ESP_DNN$ python -m esp_dnn.predict -m ligand -i examples/ligands/

After running the code forementioned, pqr files are generated in -i folder.
It can be visualized in http://nglviewer.org/ngl/ where is recommend in the document.

Open the URL with web browser and read a pqr file.
Then change and set visualize settings according to the document.
Finally I could get image with ESP. Following ESP is generated from provided GCNN model.

QM based approach is powerful for drug discovery however the calculation cost is high. On the other side, deep learning based approach is very fast but it requires high quality training data.

So I think the authors work is very useful because they provided not only their code but also high quality trained model. Awesome work!

If I have to say something, in the original repo seems that training dataset is not provided. I would like to check chemical space of training data.

Anyway, calculation of accurate ESP in short time is very useful for me. I would like to apply some prediction tasks ASAP.

Calculate solvent effect in Psi4 #psi4 #quantumchemistry

Recently I use not only chemoinformatics tools but also quantum chemistry tool, my favorite is Psi4. Psi4 has many options and plug-ins for quantum calculation. Most setting of calculation is vacuum, but it actually true. So considering the solvent around the molecules is important.

Can psi4 perform calculation with solvent effect?

Yes!

PCMSolver is plugin for PCM calculation. It can be installed via conda.

$ conda install -c psi4 pcmsolver

PCM means ‘polarizable continuum model’. PCM describes solute-solvent interactions only by means of electrostatics and polarization between the solute molecule and solvent. The solvent is modeled as a dielectric continuum with a given permittivity ε.

To use PCMsolver in psi4, user need to make specific input file for the calculation which should has kind of solvent, basis function, CPCM or IEFPCM, etc.

I made to input file from toulene.mol file which is retrieved from ChEMBL.

input files are below. PCMsolver can not call from python interpreter so I need to make input.dat.

# toluene in water
#! pcm

molecule mol {

0 1
C -2.0705902285428452 0.09342117730391691 0.21438842743524625
C 0.06881622580040792 1.2107631870687914 0.004377262725162135
C -0.043709196272760036 -1.211640074147377 -0.026831077837745607
C -1.3202142335884217 1.271231179761799 0.15648583022799556
C 0.714364020910022 -0.03184758851751915 -0.10370200526939809
C 2.2076605753403253 -0.0996279156897954 -0.22684462930707974
C -1.4325455046382927 -1.1469904802166477 0.1253309895373383
H -3.1452716192643337 0.14180771794289201 0.3335057008466673
H 0.6396267675023236 2.1305648085368674 -0.0295934787763336
H 0.4394292831589224 -2.1792111515476216 -0.08511724686650844
H -1.8144108252945523 2.2311166905139106 0.23329105314598383
H 2.59680405039162 0.7889905119305699 -0.7681174009541791
H 2.5134086150024317 -1.0062726983460601 -0.7912601501934461
H 2.660308829389096 -0.1337122631434785 0.78606242092797
H -2.013676759894258 -2.058593101450173 0.17802430435839692
no_reorient
no_com
units angstrom

}

set {
basis 6-31G**
scf_type pk
pcm true
pcm_scf_type total
}

pcm = {
Units = Angstrom
Medium {
SolverType = IEFPCM
Solvent = Water
}

Cavity {
RadiiSet = UFF
Type = GePol
Scaling = false
Area = 0.3
Mode = Implicit
}
}

energy_pte, wfn = energy(‘scf’, return_wfn=True)

# toluene in benzene
#! pcm

molecule mol {

0 1
C -2.0705902285428452 0.09342117730391691 0.21438842743524625
C 0.06881622580040792 1.2107631870687914 0.004377262725162135
C -0.043709196272760036 -1.211640074147377 -0.026831077837745607
C -1.3202142335884217 1.271231179761799 0.15648583022799556
C 0.714364020910022 -0.03184758851751915 -0.10370200526939809
C 2.2076605753403253 -0.0996279156897954 -0.22684462930707974
C -1.4325455046382927 -1.1469904802166477 0.1253309895373383
H -3.1452716192643337 0.14180771794289201 0.3335057008466673
H 0.6396267675023236 2.1305648085368674 -0.0295934787763336
H 0.4394292831589224 -2.1792111515476216 -0.08511724686650844
H -1.8144108252945523 2.2311166905139106 0.23329105314598383
H 2.59680405039162 0.7889905119305699 -0.7681174009541791
H 2.5134086150024317 -1.0062726983460601 -0.7912601501934461
H 2.660308829389096 -0.1337122631434785 0.78606242092797
H -2.013676759894258 -2.058593101450173 0.17802430435839692
no_reorient
no_com
units angstrom

}

set {
basis 6-31G**
scf_type pk
pcm true
pcm_scf_type total
}

pcm = {
Units = Angstrom
Medium {
SolverType = IEFPCM
Solvent = Benzene
}

Cavity {
RadiiSet = UFF
Type = GePol
Scaling = false
Area = 0.3
Mode = Implicit
}
}

energy_pte, wfn = energy(‘scf’, return_wfn=True)

PCMsolver can has many kinds of solvents listed below.

    1:'Water',
    2:'Propylene Carbonate',
    3:'Dimethylsulfoxide',
    4:'Nitromethane',
    5:'Acetonitrile',
    6:'Methanol',
    7:'Ethanol',
    8:'Acetone',
    9:'1,2-Dichloroethane',
    10:'Methylenechloride',
    11:'Tetrahydrofurane',
    12:'Aniline',
    13:'Chlorobenzene',
    14:'Chloroform',
    15:'Toluene',
    16:'1,4-Dioxane',
    17:'Benzene',
    18:'Carbon Tetrachloride',
    19:'Cyclohexane',
    20:'N-heptane',
    21:'Explicit' 

After making the input files, following step is as same as default psi4 calculation.

$ psi4 input.dat -o output.dat

After waiting few seconds, I could get two output results. Picked up line 355-363.

## toulene in water
@RHF Final Energy: -269.75544369636918

=> Energetics Energetics <=

Nuclear Repulsion Energy = 268.6733324038011688
One-Electron Energy = -895.7789138809855558
Two-Electron Energy = 357.3548941675961714
PCM Polarization Energy = -0.0019460266422257
Total Energy = -269.7526333362304172

Computation Completed

There is difference in PCM Polarization Energy between in water and in toluene, in water it is -0.0052558561183017A.U.(-3.298kcal/mol) and in tolunene, it is -0.0019460266422257A.U.(-1.22kcal/mol)
*1 a.u. =627.50 kcal/mol

I would like to test some molecules. Are there freely available experimental data set? I’ll try to find them.