Use ORBKIT for rendering MO #orbkit #rdkit #psikit #quantum_chemistry

As you know, there many packages for quantum chemistry not only commercial software but also free tools. And each soft has own output format. So user need to understand how to use it. But it is very time consuming step I think.

ORBIKIT is a modular python toolbox for cross-platform post processing of quantum chemical wavefunction data.

The reference URL is below.
https://arxiv.org/abs/1601.03069

It seems nice isn’t it? Fortunately the source code is disclosed on github.

So I tried to use orbkit with psikit. At first, I installed orbkit while referring the original site.

At first, I installed cython, numpy, scipy, h5py with conda and mayavi installed with pip (mayavi couldn’t install with conda due to package conflicts).

Then install orbkit.

$ cd $HOME
$ git clone https://github.com/orbkit/orbkit.git
$ export ORBKITPATH=$HOME/orbkit
$ cd $ORBKITPATH
$ python setup.py build_ext --inplace clean
$ export PYTHONPATH=$PYHONPATH:$ORBKITPATH

And also I added environment variables to my .bashrc.

Now ready, let’s use orbkit. I used psikit for making fchk file for orbkit. orbkit supports many kinds of input format. If reader has interest the package, please check the original document.

Following code shows how to use psikit and visualize MO with orbkit.

After running the code, mayavi viewer launched and could get MO view, the image is shown below.

Of course current psikit has some functions for communicate pymol for rendering MO but orbkit is also useful package for quantum chemistry.

I uploaded today’s code on my github repo.

https://github.com/iwatobipen/playground/blob/master/orbkit_psikit_test.ipynb

Rendering molecular orbital on Jupyter notebook #psikit #py3dmol #rdkit #memo

@fmkz___ and I( @iwatobipen ) are developing psikit which is a thin wrapper of psi4 and rdkit. I hope the package integrates quantum chemistry (Psi4) and chemoinformatics (RDKit).

By using psikit, user can make molecular orbital data very convinienlry.

Rendering MO is useful for understanding molecular electrostatic shape and nature, but sometime it is difficult to medicinal chemist because it requires CADD skills for rendering it.

Psikit has useful method named ‘create_cube_files’, the method make cube files of HOMO, LUMO and ESP. And also psikit can call pymol to rendering them. However it can’t render MO directory on jupyter notebook so I would like to show you how to render MO on jupyter notebook today.

Fortunately, py3Dmol is easy tool for rendering CUBE file!

Sample code is below. I used psikit for quantum calculation and used py3dmol for rendering. First, calculate energy and create cube files. Default gridspace value is set 0.3 but I recommend it should be changed more large r value. Because cube file with small gridspace value will become huge size file.

import py3Dmol
from psikit import Psikit
from rdkit import Chem
pk = Psikit()
pk.read_from_smiles('O')
pk.energy()
pk.create_cube_files(gridspace=0.5)

After running the method, cube files are generated. So read the file.

homo_voldata = open("Psi_a_5_1-A\"_HOMO.cube", "r").read()
lumo_voldata = open("Psi_a_6_5-A\'_LUMO.cube", "r").read()

Now ready to render. Make view object and attach some dataset. addVolumetricData method loads cube data with options. Then call addModel with molblock data. RDKit can convert molobject to molblock by using MolToMolBlock method.

v = py3Dmol.view()
v.addVolumetricData(homo_voldata, "cube", {'isoval': -0.03, 'color': "red", 'opacity': 0.75})
v.addVolumetricData(homo_voldata, "cube", {'isoval': 0.03, 'color': "blue", 'opacity': 0.75})
v.addModel(Chem.MolToMolBlock(pk.mol), 'mol')
v.setStyle({'stick':{}})
v.zoomTo()
v.show()

Works fine. The molecule object is rotatable. LUMO can be rendered with same manner.

In summary, psi4, rdkit and py3dmol integration can render molecule with MO at the touch of a button. I uploaded today’s code on my repository.

https://github.com/iwatobipen/playground/blob/master/mo/RenderMO.ipynb

Example usage of psi4-openmm-interface #Psi4 #OpenMM #RDKit

Molecular dynamics and Quantum Chemistry are important tools for CADD. I have interested in these topics and OpenMM and Psi4 are nice tool to handing MD and QM.

Today I tried to use psi4-openmm-interface which allows passing of molecular systems between each program.

I reviewed test script and found that the package pass the molecule which optimized with openmm to psi4 as xyz format. And also the package can conduct MD calculation very simply.

I used it, following code is almost same as original repo.

At first, I install the package you can find how to install it in following URL.
https://github.com/mzott/Psi4-OpenMM-Interface

Install psi4, openmm and set PYTHONPASS. It was simple. And I need to modify some code to import the package correctly.

  1. It uses PeriodicTable class of qcelemental, but the module can’t import _temp_symbol attribute from periodictable.py, so I added _temp_symbol line to qcelemental/periodic_table.py
# qcelemental/periodic_table.py 
class PeriodicTable:
    """Nuclear and mass data about chemical elements from NIST.

    Parameters
    ----------
    None

    Attributes
    ----------
    A : list of int
        Mass number, number of protons and neutrons, starting with 0 for dummies.
    Z : list of int
        Atomic number, number of protons, starting with 0 for dummies.
    E : list of str
        Element symbol from periodic table, starting with "X" for dummies. "Fe" capitalization.
    EA : list of str
        Nuclide symbol in E + A form, e.g., "Li6".
        List `EA` is a superset of `E`; that is, both "Li6" and "Li" present.
        For hydrogen, "D" and "T" also included.
    mass : list of :py:class:`decimal.Decimal`
        Atomic mass [u].
        For nuclides (e.g., "Li6"), the reported mass.
        For stable elements (e.g., "Li"), the mass of the most abundant isotope ("Li7").
        For unstable elements (e.g., "Pu"), the mass of the longest-lived isotope ("Pu244").
    name : list of str
        Element name from periodic table, starting with "Dummy". "Iron" capitalization.

    """

    def __init__(self):

        from . import data

        # Of length number of elements
        self.Z = data.nist_2011_atomic_weights["Z"]
        self.E = data.nist_2011_atomic_weights["E"]
        self.name = data.nist_2011_atomic_weights["name"]

        self._el2z = dict(zip(self.E, self.Z))
        self._z2el = collections.OrderedDict(zip(self.Z, self.E))
        self._element2el = dict(zip(self.name, self.E))
        self._el2element = dict(zip(self.E, self.name))

        # Of length number of isotopes
        self._EE = data.nist_2011_atomic_weights["_EE"]
        self.EA = data.nist_2011_atomic_weights["EA"]
        self.A = data.nist_2011_atomic_weights["A"]
        self.mass = data.nist_2011_atomic_weights["mass"]

        self._eliso2mass = dict(zip(self.EA, self.mass))
        self._eliso2el = dict(zip(self.EA, self._EE))
        self._eliso2a = dict(zip(self.EA, self.A))
        self._el2a2mass = collections.defaultdict(dict)
        for EE, m, A in zip(self._EE, self.mass, self.A):
            self._el2a2mass[EE][A] = float(m)
       # following code was added
        self._temp_symbol =  [
    "X", "H", "HE", "LI", "BE", "B", "C", "N", "O", "F", "NE", "NA", "MG", "AL", "SI", "P", "S", "CL", "AR", "K", "CA",
    "SC", "TI", "V", "CR", "MN", "FE", "CO", "NI", "CU", "ZN", "GA", "GE", "AS", "SE", "BR", "KR", "RB", "SR", "Y",
    "ZR", "NB", "MO", "TC", "RU", "RH", "PD", "AG", "CD", "IN", "SN", "SB", "TE", "I", "XE", "CS", "BA", "LA", "CE",
    "PR", "ND", "PM", "SM", "EU", "GD", "TB", "DY", "HO", "ER", "TM", "YB", "LU", "HF", "TA", "W", "RE", "OS", "IR",
    "PT", "AU", "HG", "TL", "PB", "BI", "PO", "AT", "RN", "FR", "RA", "AC", "TH", "PA", "U", "NP", "PU", "AM", "CM",
    "BK", "CF", "ES", "FM", "MD", "NO", "LR", "RF", "DB", "SG", "BH", "HS", "MT", "DS", "RG", "UUB", "UUT", "UUQ",
    "UUP", "UUH", "UUS", "UUO"
]
.....snip...

2. Line 81 of psiomm/molecule.py should be change below.

            # from
            #z_vals.append(periodictable.el2z[Zxyz_list[0].upper()])
            #To 
            z_vals.append(periodictable._el2z[Zxyz_list[0].upper()])

3. gaff2.xml which is forcefield parameters is placed in openmm installed path openmm/app/data/. Because for my env, gaff2.xml file couldn’t find in the folder.

4. Got cov_radii.py from qcdb/qcdb and placed in psi4/driver/qcdb folder.

Now ready. Let’s write code.

Import some packages at first I used psikit for molecular geometry generation.

import numpy as np

from psikit import Psikit
from psiomm import molecule
from psiomm import psi_omm as po

from simtk.openmm.app import Simulation
from simtk.openmm.app import ForceField
from simtk.openmm.app import Modeller
from simtk.openmm.app import PDBReporter

from simtk.openmm import Platform
from simtk.openmm import LangevinIntegrator

from simtk.openmm import Vec3

from simtk.unit import picosecond
from simtk.unit import kelvin
from simtk.unit import kilocalorie_per_mole
from simtk.unit import angstrom

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole

Work flow

  • make molecule from xyz string => solute

  • generate bonds

  • generate atom types

  • generate charges

  • make topology

  • define forcefield

  • generate template with FF, topology and solute

  • make modeller

  • perform simulation to get trajectory

pk = Psikit()
pk.read_from_smiles('CCOc1cccnc1')
pk.rdkit_optimize()

Psikit generate xyz strings with charge and multiplicity information but psiomm can’t read the line so I remove them and passed the string to psiomm. Then generate some molecular properties.

# remove mutiplicity, formal charge
xyzstring = '\n'.join(pk.mol2xyz().split('\n')[2:])
solute = molecule.Molecule.from_xyz_string(xyzstring)
solute.generate_bonds()
solute.generate_atom_types()
solute.generate_charges()
# check generated atoms and charges
for i, at in enumerate(solute.atom_types):
    print(i, at, solute.charges[i])

0 c3 -0.16240415191041357
1 c3 0.03679384783580364
2 os -0.263056064493707
   ---snip---
17 h4 0.07500144690962385

Following workflow is based on openmm. GAFF2 is implemented in AMBER and TIP3P is water model which is implemented in CHARMM.

topology = po.make_topology(solute)
forcefield = ForceField('gaff2.xml', 'tip3p.xml')
po.generateTemplate(forcefield, topology, solute)
coords = []
for i in range(len(solute.xyz)):
    print(solute.xyz[i])
    coords.append(Vec3(solute.xyz[i][0], solute.xyz[i][1], solute.xyz[i][2]))
positions = coords * angstrom

Make modeller with topology and positions. I added 50 water molecules to the model. And defined the simulation system.

modeller = Modeller(topology, positions)
modeller.addSolvent(forcefield, numAdded=50)
omm_sys = forcefield.createSystem(modeller.topology)
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picosecond)

simulation = Simulation(modeller.topology,
                       omm_sys,
                       integrator,
                       platform=Platform.getPlatformByName('Reference'))

I used PDBReporter module to store the simulation. Simulation step set 400 and reporter object save data each 10 step. So I could get output pdb with 40 (400 /10 ) states.

simulation.context.setPositions(modeller.positions)
simulation.reporters.append(PDBReporter('output.pdb', 10))
simulation.step(400)
simulation.minimizeEnergy(tolerance=2*kilocalorie_per_mole)
MD movie

It worked fine. following code didn’t use psi4 based calculation, to do it I need to modify input file because last data has not only target molecule but also 50 water molecules. I’ll check the package and psi4 more.

Today’s code can find from my gist/github repo.

https://github.com/iwatobipen/playground/blob/master/openmm-psikit.ipynb

Partial charge visualization with RDKit #RDKit #QuantumChemistry #psikit

Last post, I showed how to visualize each fingerprint contribution for predictive model results with RDKit.

The function also can visualize atomic properties such as a partial charge of atoms.

Dr. Thomas Evangelidis shared me an example of visualization of quantum chemistry based atom property. He showed an example of conformation effect to the QM calculation.

He tried to calculate PM-6 semi empirical partial charge and draw it with rdkit. He used MOPAC as QM engine. Following figure shows partial charge of same molecule with different conformation. Red arrow shows difference of each molecule.

And also by using partial change from 100 conformers data, he showed how to visualize std and mean of the partial charge.

Mean of partial charge
std of two conformers partial charge

From above image, conjugated aromatic system shows large difference between two conformers. It seems reasonable I think.
You can check whole his code on following URL.
https://github.com/tevang/tutorials/tree/master/compare_atomic_properties

Back to the latest rdkit blog post, Greg showed an example of visualization of partial charge with Extended Huckel method which is newly implemented on RDKit!

I had interest the approach so I tried to use psikit for partial charge calculation.

Following code is almost same as original blog post but little difference, just using psikit.

Current version of psikit can calculate mulliken charge, esp, resp charge and lowdin charge very conveniently. But it took longer time for calculation compared to rdEHTools method.

code on my gist

psikit seems more sensitive to aromatic atoms. I think it is needed to select method in case by case. But rdkit’s rdEHTools works very fast so it is useful tool of light weight QM calc.

Finally thanks Dr. Thomas again for sharing nice code example!

Added Notebook Example for psikit #psi4 #RDKit

Some days ago. I updated psikit, added new function, SAPT.

The method can calculate inter-intra molecular interaction with psi4. Psikit can call the method very easily.

And I added sample notebook in the repository. Following codes are very simple example of SATP for water dimer and FSAPT for phenol dimer.

SAPT for water dimer.



from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from psikit import Sapt

w1 = Chem.MolFromMolFile('water1.mol', removeHs=False)
w2 = Chem.MolFromMolFile('water2.mol', removeHs=False)
Draw.MolsToGridImage([w1,w2])
sapt = Sapt()
sapt.monomer1_from_molfile('water1.mol')
sapt.monomer2_from_molfile('water2.mol')
sapt.make_dimer()
res = sapt.run_sapt()
---stdout is below
Initializing SAPT object...

RHF for monomer A finished in 0.55 seconds.
RHF for monomer B finished in 0.52 seconds.
Building ERI tensor...
...built ERI tensor in 3.180 seconds.
Size of the ERI tensor is 0.36 GB, 82 basis functions.

...finished initializing SAPT object in  4.49 seconds.

Starting electrostatics...
...electrostatics took a total of  0.17 seconds.

Starting exchange...
...exchange took a total of  0.63 seconds.

Starting dispersion...
...dispersion took a total of  8.92 seconds.

Starting induction...
Ind20,r (AB)           -1.37852223 mH       -0.86503511 kcal/mol
Exch-Ind20,r (AB)       0.88580457 mH        0.55585034 kcal/mol
...induction took a total of  15.90 seconds.

SAPT0 Results
----------------------------------------------------------------------
Exch10 (S^2)             10.53844851 mH        6.61297129 kcal/mol
Elst10                  -13.02830646 mH       -8.17537956 kcal/mol
Disp20                   -3.42996225 mH       -2.15233218 kcal/mol
Exch-Disp20               0.61399531 mH        0.38528758 kcal/mol
Ind20,r                  -4.33068313 mH       -2.71754264 kcal/mol
Exch-Ind20,r              2.30125737 mH        1.44405971 kcal/mol
----------------------------------------------------------------------
Total SAPT0              -7.33525065 mH       -4.60293580 kcal/mol

F-SAPT for phenol dimer

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from psikit import Sapt

p1 = Chem.MolFromMolFile('phenol1.mol', removeHs=False)
p2 = Chem.MolFromMolFile('phenol2.mol', removeHs=False)
Draw.MolsToGridImage([p1,p2])
sapt = Sapt()
sapt.monomer1_from_molfile('phenol1.mol')
sapt.monomer2_from_molfile('phenol2.mol')
sapt.make_dimer()
res = sapt.run_fisapt()

--fA.dat and fB.dat files are generated in fsapt folder the  call fsapt.py from terminal, it generates FSAPT results which is shown below.
   => Full Analysis  Reduced Analysis  F-ISAPT: Links 50-50  Full Analysis  Reduced Analysis <=

Frag1     Frag2         Elst     Exch    IndAB    IndBA     Disp    Total 
c1ccccc1_0 c1ccccc1_0    0.686    2.197    0.007   -0.208   -2.403    0.278 
c1ccccc1_0 O_0         -2.751    0.733   -0.147   -0.227   -0.675   -3.067 
O_0       c1ccccc1_0    1.392    0.720    0.222   -0.347   -0.793    1.194 
O_0       O_0         -8.421    6.218   -0.584   -1.512   -1.250   -5.549 
c1ccccc1_0 All         -2.065    2.930   -0.140   -0.435   -3.078   -2.789 
O_0       All         -7.030    6.938   -0.362   -1.859   -2.043   -4.356 
All       c1ccccc1_0    2.078    2.917    0.229   -0.556   -3.196    1.472 
All       O_0        -11.173    6.951   -0.731   -1.739   -1.925   -8.617 
All       All         -9.095    9.868   -0.502   -2.295   -5.121   -7.145 

Psikit can use SAPT, FSAPT efficiently I think. There is room for improvement for FSAPT fragmentation method. Any comments and advice will be greatly appreciated.

URL is below.

GITHUB
https://github.com/Mishima-syk/psikit
FSAPT
https://nbviewer.jupyter.org/github/Mishima-syk/psikit/blob/master/examples/example_sapt/fsapt_ex.ipynb
SAPT
https://nbviewer.jupyter.org/github/Mishima-syk/psikit/blob/master/examples/example_sapt/sapt_ex.ipynb

Coloring molecule with RESP charge on pymol #Pymol

To visualize 3D structure of molecule, PyMol is nice tool. What I would like to write on the post is how to visualize calculated RESP charge on pymol ;)
One idea is embed calculated RESP charge to b_factor of molecule pdb. PDB file can make easily from rdkit mol object.
A problem for me is how to edit b_factor of PDB file. I found good tool to solve it today, python package ‘BioPandas’!
http://rasbt.github.io/biopandas/ It can install from pypi or anaconda.
By using biopandas, user can handle pdb file like pandas dataframe.

At first I would like to show examples of biopandas. If you often working with PDB you feel it is very useful I think.
Following code is an example for loading PDB and retrieve protein and ligand data as pandas dataframe.

from biopandas.pdb import PandasPdb
pbdobj = PandasPdb('1atp.pdb')
#Check Atom data
pbdobj.df['ATOM'].head(3)
#Check ligand data
pbdobj.df['HETATM'].head(3)

Now I can access pdb file as pandas data frame, it means that it is easy to edit and analyze pdb data! More examples are described in official site. So I move to next.

Following example is …
Calculate RESP charge with Psikit and then make pdf file from rdkit mol object and finally replace the b_factor to calculated RESP charge.
At frist load packages.

import os
import sys
from biopandas.pdb import PandasPdb
import pandas as pd
sys.path.append('path for /psikit')
from psikit import Psikit

Then define the function which make pdb file.

def make_pdb_with_Resp(smi, fname=None):
    pk = Psikit()
    pk.read_from_smiles(smi)
    pk.optimize()
    charge = pk.resp_charge
    RESP = charge['Restrained Electrostatic Potential Charges']
    if fname == None:
        fname = 'mol'
    else:
        fname = fname
    Chem.MolToPDBFile(pk.mol, '{}.pdb'.format(fname))
    pdbobj = PandasPdb().read_pdb('{}.pdb'.format(fname))
   # Following step is editing state, you can see the approach like a pandas method ;)
    pdbobj.df['HETATM']['b_factor'] = RESP
    pdbobj.to_pdb('{}.pdb'.format(fname))

Run the function! After running the code, I could get two files named aceticacid.pdb and tetrazole.pdb

It can load from pymol.

make_pdb_with_Resp('CC(=O)O','acetic_acid')
make_pdb_with_Resp('C1=NNN=N1','tetrazole')

Now I would like to color the molecule with edited b_factor. So I use spectrum command on pymol command line.

spectrum b, blue_red, minimum=-1., maximum=1.
Acetic acid with RESP charge

Acetic acid has negative charge on two oxygen atoms. And carbonyl carbon has the most positive charge.

Tetrazole with RESP charge

The result of tetrazole is strange for me because the nitrogen at position 2 which has hydrogen shows positive. And the nitrogen at position 1 is the most negative. I investigated two protonated state of the tetrazole, 1H-tetrazole and 2H-tetrazole.
But both results shows the nitrogen atom which has hydrogen has positive charge than other nitrogen atoms. I found that BioPandas is useful. But I could not get solution for the last problem. Hmm…

Draw HOMO LUMO with psikit-2 #chemoinformatics

Yesterday I wrote a post about psikit function for HOMO LUMO drawing. And @fmkz__ gave me very suggestive comment.

He performed QM calculation about tetrazole and disclosed the its distribution of negative charge.

I had interested in his suggestion and I tried it. By using psikit, I calculated energy of acetic acid and 5-methyl-1H-tetrazole and got HOMO LUMO cube files from the results.

Results is below. At first HOMO and LUMO of acetic acid.

HOMO of acetic acid
LUMO of acetic acid

These results shows that carbonyl C=O has large MO and it means carbonyl is most reactive position of the acetic acid. And also LUMO around the alpha carbon is large. It indicates that the site is reactive with electrophile I think.

Next, let’s see result of tetrazole. Tetrazole is often used in medicinal chemistry as a bioisoster of carboxylic acid.

HOMO of tetrazole
LUMO of tetrazole

These results shows 3-N position has large MO compared to other nitrogen atoms. I think it indicates that 3-N position is more reactive than other nitrogen. And I found good example to explain the result in slide share. URL is below.

https://www.slideshare.net/ShrikantPharande/tetrazole-and-triazole-67637231
Page 21 of the slide shows example of 5-sub tetrazoles alkylation.
The page shows example of reaction between 5-Me-1H-tetrazole and trityl-Cl. And most of alkylation occurred at n-3 position.

Borrowed from slide share

It is very exciting for me because I could know that QM is very useful tool for reactivity prediction. The result motivates me to enhance of psikit. ;-)

Any suggestion and advices are highly appreciated.