Partial charge visualization with RDKit #RDKit #QuantumChemistry #psikit

Last post, I showed how to visualize each fingerprint contribution for predictive model results with RDKit.

The function also can visualize atomic properties such as a partial charge of atoms.

Dr. Thomas Evangelidis shared me an example of visualization of quantum chemistry based atom property. He showed an example of conformation effect to the QM calculation.

He tried to calculate PM-6 semi empirical partial charge and draw it with rdkit. He used MOPAC as QM engine. Following figure shows partial charge of same molecule with different conformation. Red arrow shows difference of each molecule.

And also by using partial change from 100 conformers data, he showed how to visualize std and mean of the partial charge.

Mean of partial charge
std of two conformers partial charge

From above image, conjugated aromatic system shows large difference between two conformers. It seems reasonable I think.
You can check whole his code on following URL.

Back to the latest rdkit blog post, Greg showed an example of visualization of partial charge with Extended Huckel method which is newly implemented on RDKit!

I had interest the approach so I tried to use psikit for partial charge calculation.

Following code is almost same as original blog post but little difference, just using psikit.

Current version of psikit can calculate mulliken charge, esp, resp charge and lowdin charge very conveniently. But it took longer time for calculation compared to rdEHTools method.

code on my gist

psikit seems more sensitive to aromatic atoms. I think it is needed to select method in case by case. But rdkit’s rdEHTools works very fast so it is useful tool of light weight QM calc.

Finally thanks Dr. Thomas again for sharing nice code example!

Added Notebook Example for psikit #psi4 #RDKit

Some days ago. I updated psikit, added new function, SAPT.

The method can calculate inter-intra molecular interaction with psi4. Psikit can call the method very easily.

And I added sample notebook in the repository. Following codes are very simple example of SATP for water dimer and FSAPT for phenol dimer.

SAPT for water dimer.

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from psikit import Sapt

w1 = Chem.MolFromMolFile('water1.mol', removeHs=False)
w2 = Chem.MolFromMolFile('water2.mol', removeHs=False)
sapt = Sapt()
res = sapt.run_sapt()
---stdout is below
Initializing SAPT object...

RHF for monomer A finished in 0.55 seconds.
RHF for monomer B finished in 0.52 seconds.
Building ERI tensor...
...built ERI tensor in 3.180 seconds.
Size of the ERI tensor is 0.36 GB, 82 basis functions.

...finished initializing SAPT object in  4.49 seconds.

Starting electrostatics...
...electrostatics took a total of  0.17 seconds.

Starting exchange... took a total of  0.63 seconds.

Starting dispersion...
...dispersion took a total of  8.92 seconds.

Starting induction...
Ind20,r (AB)           -1.37852223 mH       -0.86503511 kcal/mol
Exch-Ind20,r (AB)       0.88580457 mH        0.55585034 kcal/mol
...induction took a total of  15.90 seconds.

SAPT0 Results
Exch10 (S^2)             10.53844851 mH        6.61297129 kcal/mol
Elst10                  -13.02830646 mH       -8.17537956 kcal/mol
Disp20                   -3.42996225 mH       -2.15233218 kcal/mol
Exch-Disp20               0.61399531 mH        0.38528758 kcal/mol
Ind20,r                  -4.33068313 mH       -2.71754264 kcal/mol
Exch-Ind20,r              2.30125737 mH        1.44405971 kcal/mol
Total SAPT0              -7.33525065 mH       -4.60293580 kcal/mol

F-SAPT for phenol dimer

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from psikit import Sapt

p1 = Chem.MolFromMolFile('phenol1.mol', removeHs=False)
p2 = Chem.MolFromMolFile('phenol2.mol', removeHs=False)
sapt = Sapt()
res = sapt.run_fisapt()

--fA.dat and fB.dat files are generated in fsapt folder the  call from terminal, it generates FSAPT results which is shown below.
   => Full Analysis  Reduced Analysis  F-ISAPT: Links 50-50  Full Analysis  Reduced Analysis <=

Frag1     Frag2         Elst     Exch    IndAB    IndBA     Disp    Total 
c1ccccc1_0 c1ccccc1_0    0.686    2.197    0.007   -0.208   -2.403    0.278 
c1ccccc1_0 O_0         -2.751    0.733   -0.147   -0.227   -0.675   -3.067 
O_0       c1ccccc1_0    1.392    0.720    0.222   -0.347   -0.793    1.194 
O_0       O_0         -8.421    6.218   -0.584   -1.512   -1.250   -5.549 
c1ccccc1_0 All         -2.065    2.930   -0.140   -0.435   -3.078   -2.789 
O_0       All         -7.030    6.938   -0.362   -1.859   -2.043   -4.356 
All       c1ccccc1_0    2.078    2.917    0.229   -0.556   -3.196    1.472 
All       O_0        -11.173    6.951   -0.731   -1.739   -1.925   -8.617 
All       All         -9.095    9.868   -0.502   -2.295   -5.121   -7.145 

Psikit can use SAPT, FSAPT efficiently I think. There is room for improvement for FSAPT fragmentation method. Any comments and advice will be greatly appreciated.

URL is below.


Coloring molecule with RESP charge on pymol #Pymol

To visualize 3D structure of molecule, PyMol is nice tool. What I would like to write on the post is how to visualize calculated RESP charge on pymol ;)
One idea is embed calculated RESP charge to b_factor of molecule pdb. PDB file can make easily from rdkit mol object.
A problem for me is how to edit b_factor of PDB file. I found good tool to solve it today, python package ‘BioPandas’! It can install from pypi or anaconda.
By using biopandas, user can handle pdb file like pandas dataframe.

At first I would like to show examples of biopandas. If you often working with PDB you feel it is very useful I think.
Following code is an example for loading PDB and retrieve protein and ligand data as pandas dataframe.

from biopandas.pdb import PandasPdb
pbdobj = PandasPdb('1atp.pdb')
#Check Atom data
#Check ligand data

Now I can access pdb file as pandas data frame, it means that it is easy to edit and analyze pdb data! More examples are described in official site. So I move to next.

Following example is …
Calculate RESP charge with Psikit and then make pdf file from rdkit mol object and finally replace the b_factor to calculated RESP charge.
At frist load packages.

import os
import sys
from biopandas.pdb import PandasPdb
import pandas as pd
sys.path.append('path for /psikit')
from psikit import Psikit

Then define the function which make pdb file.

def make_pdb_with_Resp(smi, fname=None):
    pk = Psikit()
    charge = pk.resp_charge
    RESP = charge['Restrained Electrostatic Potential Charges']
    if fname == None:
        fname = 'mol'
        fname = fname
    Chem.MolToPDBFile(pk.mol, '{}.pdb'.format(fname))
    pdbobj = PandasPdb().read_pdb('{}.pdb'.format(fname))
   # Following step is editing state, you can see the approach like a pandas method ;)
    pdbobj.df['HETATM']['b_factor'] = RESP

Run the function! After running the code, I could get two files named aceticacid.pdb and tetrazole.pdb

It can load from pymol.


Now I would like to color the molecule with edited b_factor. So I use spectrum command on pymol command line.

spectrum b, blue_red, minimum=-1., maximum=1.
Acetic acid with RESP charge

Acetic acid has negative charge on two oxygen atoms. And carbonyl carbon has the most positive charge.

Tetrazole with RESP charge

The result of tetrazole is strange for me because the nitrogen at position 2 which has hydrogen shows positive. And the nitrogen at position 1 is the most negative. I investigated two protonated state of the tetrazole, 1H-tetrazole and 2H-tetrazole.
But both results shows the nitrogen atom which has hydrogen has positive charge than other nitrogen atoms. I found that BioPandas is useful. But I could not get solution for the last problem. Hmm…

Draw HOMO LUMO with psikit-2 #chemoinformatics

Yesterday I wrote a post about psikit function for HOMO LUMO drawing. And @fmkz__ gave me very suggestive comment.

He performed QM calculation about tetrazole and disclosed the its distribution of negative charge.

I had interested in his suggestion and I tried it. By using psikit, I calculated energy of acetic acid and 5-methyl-1H-tetrazole and got HOMO LUMO cube files from the results.

Results is below. At first HOMO and LUMO of acetic acid.

HOMO of acetic acid
LUMO of acetic acid

These results shows that carbonyl C=O has large MO and it means carbonyl is most reactive position of the acetic acid. And also LUMO around the alpha carbon is large. It indicates that the site is reactive with electrophile I think.

Next, let’s see result of tetrazole. Tetrazole is often used in medicinal chemistry as a bioisoster of carboxylic acid.

HOMO of tetrazole
LUMO of tetrazole

These results shows 3-N position has large MO compared to other nitrogen atoms. I think it indicates that 3-N position is more reactive than other nitrogen. And I found good example to explain the result in slide share. URL is below.
Page 21 of the slide shows example of 5-sub tetrazoles alkylation.
The page shows example of reaction between 5-Me-1H-tetrazole and trityl-Cl. And most of alkylation occurred at n-3 position.

Borrowed from slide share

It is very exciting for me because I could know that QM is very useful tool for reactivity prediction. The result motivates me to enhance of psikit. ;-)

Any suggestion and advices are highly appreciated.

Draw HOMO LUMO with psikit #RDKit #Psi4 #PyMol

Now I and @fmkz___ are developing a thin wrapper library for Psi4 and RDKit named psikit.
Today I added new function for viewing molecular orbital HOMO and LUMO with pymol. Psi4 has the function which can output cube format file of MO named ‘cubeprop’. So I try to implement the function in to psikit.
By using the library, you can get HOMO/LUMO cube file easily.
Example is below.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
from psikit import Psikit
smi = 'COc1cccnc1'
mol = Chem.MolFromSmiles(smi)

Just call ‘getMOview’ after optimization. It took a little bit long time on my PC.

pk = Psikit()
%time pk.optimize()
>Optimizer: Optimization complete!
>CPU times: user 1min 54s, sys: 3.97 s, total: 1min 58s
>Wall time: 31.5 s
# get MO view!

After executing the method, psikit makes 5 files named target.mol, Psi_a_n_n-A.cube, Psi_b_n_n-A.cube, Psi_a_n+1_n+1-A.cube, Psi_b_n+1_n+1-A.cube. First file is mol file. Second and third is information of its HOMO and forth, fifth is information of LUMO.

Then load all files from pymol. Change HOMO/LUMO object show setting as dot. I could get following image.

Optimized molecule
Molecule and HOMO
Molecule and LUMO
Molecule and HOMO/LUMO

The function and pymol seem to work fine. But I want to compare data between this result and another QM’s result.
Current version of psikit is still under development and some bugs. I would like to fix and tune the code.