Visualize HOMO LUMO with psi4 #RDKit #psi4 #psikit

Now thin wrapper of psi4 named psikit can generate cube file which has frontier orbital information. After calling getMOview, I would like to check the orbital shape.

Psi4 provides script which generates cool view on VMD. To run the script on python3, it is needed to change line 332 from ‘for k,v in options.iteritems():’ to ‘for k,v in options.items():’.

After changed the code. Ready to visualize molecular orbital! Let’s check it.
I run the following code on jupyter notebook.

import os
import sys
import pandas as pd
import numpy as np
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from psikit import Psikit
pk = Psikit()
>Optimizer: Optimization complete!

Then call getMOview() to generate MO cube files.

Next, run the The script has many options. To check the option, -h gives help.

I run the code interactive mode. After run the code, I could see MO on VMD.

To run the code, reader need install VMD in your PC. Of course the script generates image of each MO.

I think it is interesting that visualize MO of protein-ligand complex because it will show interaction of protein and ligand. It is very useful for understanding of the interaction. Any comments or suggestions are greatly appreciated. ;)


Calculate atomic charges with psikit #RDKit #psi4

Recently I implemented new function in psikit for atomic charge calculation. Now user can get mulliken charges, RESP charges and Lowdin charges very easily. There is quick example below.

At first import libraries.

import os
import sys
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
import pandas as pd
import numpy as np
from collections import defaultdict
from psikit import Psikit

I used methanol and imidazole for demo. And psikit can generate 3D structure from SMILES and optimize the conformer with RDKit. Then optimize geometry with psi4.

First example is methanol

pk = Psikit()
methanol = 'CO'
imidazole = 'C1=CN=CN1'
# from literature 6-31G*
# C 0.21 0.04
# O 0.61 0.52
# H 0.13-0.16  0.04
# H 0.39 0.35
resp = pk.resp_charges
mulliken = pk.mulliken_charges
res = defaultdict(list)
for i, atom in enumerate(pk.mol.GetAtoms()):
    #print('{:.3f} {:.3f} {}'.format(mulliken[i], resp[i], atom.GetSymbol()))
methanol_df = pd.DataFrame(res)

Oxygen atom shows negative charge. It seems reasonable I think but the values are far from the reference values.

Second example is imidazole.


resp = pk.resp_charges
mulliken = pk.mulliken_charges
# from literature 6-31G*
# C 0.02 0.24 
# H 0.15 0.16 
# N 0.55 0.10
# C 0.21 0.06
# H 0.15 0.12
# C 0.03 0.05
# H 0.14 0.11
# N 0.42 0.42
# H 0.33 0.26

#Basis set 6-31G**
res = defaultdict(list)
for i, atom in enumerate(pk.mol.GetAtoms()):
    #print('{:.3f} {:.3f} {}'.format(mulliken[i], resp[i], atom.GetSymbol()))
imidazole_df = pd.DataFrame(res)

Two nitrogen atoms shows negative values in both Mulliken/RESP methods.

Reference data came from the URL. I searched mulliken charges of both molecules in google. I found some reports about that but all values are slightly different. I would like to know the accurate value of atomic charges of these molecules. And would like to know how to calculate spin multiplicity with RDKit. Any advices are greatly appreciated. ;)

Draw HOMO LUMO with psikit-2 #chemoinformatics

Yesterday I wrote a post about psikit function for HOMO LUMO drawing. And @fmkz__ gave me very suggestive comment.

He performed QM calculation about tetrazole and disclosed the its distribution of negative charge.

I had interested in his suggestion and I tried it. By using psikit, I calculated energy of acetic acid and 5-methyl-1H-tetrazole and got HOMO LUMO cube files from the results.

Results is below. At first HOMO and LUMO of acetic acid.

HOMO of acetic acid
LUMO of acetic acid

These results shows that carbonyl C=O has large MO and it means carbonyl is most reactive position of the acetic acid. And also LUMO around the alpha carbon is large. It indicates that the site is reactive with electrophile I think.

Next, let’s see result of tetrazole. Tetrazole is often used in medicinal chemistry as a bioisoster of carboxylic acid.

HOMO of tetrazole
LUMO of tetrazole

These results shows 3-N position has large MO compared to other nitrogen atoms. I think it indicates that 3-N position is more reactive than other nitrogen. And I found good example to explain the result in slide share. URL is below.
Page 21 of the slide shows example of 5-sub tetrazoles alkylation.
The page shows example of reaction between 5-Me-1H-tetrazole and trityl-Cl. And most of alkylation occurred at n-3 position.

Borrowed from slide share

It is very exciting for me because I could know that QM is very useful tool for reactivity prediction. The result motivates me to enhance of psikit. ;-)

Any suggestion and advices are highly appreciated.

Draw HOMO LUMO with psikit #RDKit #Psi4 #PyMol

Now I and @fmkz___ are developing a thin wrapper library for Psi4 and RDKit named psikit.
Today I added new function for viewing molecular orbital HOMO and LUMO with pymol. Psi4 has the function which can output cube format file of MO named ‘cubeprop’. So I try to implement the function in to psikit.
By using the library, you can get HOMO/LUMO cube file easily.
Example is below.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
from psikit import Psikit
smi = 'COc1cccnc1'
mol = Chem.MolFromSmiles(smi)

Just call ‘getMOview’ after optimization. It took a little bit long time on my PC.

pk = Psikit()
%time pk.optimize()
>Optimizer: Optimization complete!
>CPU times: user 1min 54s, sys: 3.97 s, total: 1min 58s
>Wall time: 31.5 s
# get MO view!

After executing the method, psikit makes 5 files named target.mol, Psi_a_n_n-A.cube, Psi_b_n_n-A.cube, Psi_a_n+1_n+1-A.cube, Psi_b_n+1_n+1-A.cube. First file is mol file. Second and third is information of its HOMO and forth, fifth is information of LUMO.

Then load all files from pymol. Change HOMO/LUMO object show setting as dot. I could get following image.

Optimized molecule
Molecule and HOMO
Molecule and LUMO
Molecule and HOMO/LUMO

The function and pymol seem to work fine. But I want to compare data between this result and another QM’s result.
Current version of psikit is still under development and some bugs. I would like to fix and tune the code.

New functionality of psikit #Chemoinformatics #RDKit #Psi4

Happy new year! ‘Akemashite Omedetou Gozaimasu’ in Japanese!

I and @kzfm_ -san are developing a library which uses a rdkit & psi4 named psikit. You can find brief introduction the concept following URL.

Today I added new function that can calculate RESP charge of give molecule. RESP charge means ‘Restraints for Deriving
Atomic Charges’. Details can read from this link.

OK, I would like to show how to call new function. It is easy to use but it needs long time for large molecule. I used acetic acid as example. To calculate RESP Charge, user need to run optimization.

# This code came from RESP branch, not master branch.

from psikit import Psikit
smi = 'CC(=O)O'
# load smiles and perform optimzation
pk = Psikit()
%time pk.optimize()
>Optimizer: Optimization complete!
>CPU times: user 21.6 s, sys: 920 ms, total: 22.5 s
>Wall time: 6.23 s

Now I can get HOMO, LUMO and RESP Charge. RESP charges are set as atomic properties of pk.mol object. Let’s check before calculation and after calculation.

# Get Atoms from pk.mol object and check atom properties
atoms = pk.mol.GetAtoms()
> []
# run the calculation
charges = pk.resp_charge
atoms = pk.mol.GetAtoms()
> ['EP_C', 'RESP_C']  # EP_C means electro static potential, RESP_C means RESP charge.

Finally draw molecule with RESP charges.

from rdkit.Chem.Draw import MolDrawOptions
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
view = rdMolDraw2D.MolDraw2DSVG(600,800)
op = view.drawOptions()
for idx, atom in enumerate(pk.mol.GetAtoms()):
    num = float(atom.GetProp('RESP_C'))
    op.atomLabels[idx] = "{}({:,.2f})".format(atom.GetSymbol(), num)
svg = view.GetDrawingText().replace('svg:', '')
acetic acid with RESP charge 6-31G** basis

The results indicates Oxygen atom has the most negative charge and alpha carbon shows more negative than carbonyl carbon.
The library is now under development. We would like to release the package near the feature.

All code of the post can get from the URL below.

Draw similarity network #RDKit #Cyjupyter

Recently Kei Ono who is developer of cytoscape developed cyjupyter.
It seems attractive for me because the library can draw network diagram on jupyter notebook.
There are many network structured data in chemoinformatics. For example molecule, molecular similarity map and MMP etc… I used the library to draw similarity map of molecules today.
I am newbie of the library, so following code is very simple but there are several useful examples are provided in official repository.
At first, load modules.

import os
import numpy as np
import igraph
from py2cytoscape import util
from cyjupyter import Cytoscape
from rdkit import Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import AllChem
from rdkit import RDConfig
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

I used CDK2.sdf as a sample dataset

filedir = os.path.join(RDConfig.RDDocsDir,'Book/data/cdk2.sdf')
mols = [mol for mol in Chem.SDMolSupplier(filedir) if mol != None]
for mol in mols:
fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]
smiles_list = [Chem.MolToSmiles(mol) for mol in mols]

Then make graph object, node as an each molecule and make edge if tanimoto similarity more 0.5.

g = igraph.Graph()
for smiles in smiles_list:
for i in range(len(mols)):
    for j in range(i):
        tc = DataStructs.TanimotoSimilarity(fps[i], fps[j])
        if tc >= 0.5:
            g.add_edge(smiles_list[i], smiles_list[j])

Finally convert graph object to json by using py2cytoscape and Draw graph with default settings.

graph_data = util.from_igraph(g)

I could get following image.

And check network

Cyjupyter is useful network drawing tool for jupyter notebook user. I would like to check the way to control visualization.
My example code is uploaded my repository. URL is below.

Calculate HOMO and LUMO with Psi4 reviced #RDKit #Psi4

Yesterday, I got comments from reader.
Regarding the comment, to calculate HOMO LUMO with psi4 correct way is below.

import psi4
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
psi4.core.set_output_file("output1.dat", True)
def mol2xyz(mol):
    mol = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol, useExpTorsionAnglePrefs=True,useBasicKnowledge=True)
    atoms = mol.GetAtoms()
    string = string = "\n"
    for i, atom in enumerate(atoms):
        pos = mol.GetConformer().GetAtomPosition(atom.GetIdx())
        string += "{} {} {} {}\n".format(atom.GetSymbol(), pos.x, pos.y, pos.z)
    string += "units angstrom\n"
    return string, mol

Next, calculate HOMO-LUMO of benzene with the function and psi4.

mol = Chem.MolFromSmiles("c1ccccc1")
xyz, mol=mol2xyz(mol)
psi4.set_memory('4 GB')
benz = psi4.geometry(xyz)
%time scf_e, scf_wfn ="B3LYP/cc-pVDZ", return_wfn=True)
> CPU times: user 13.8 s, sys: 226 ms, total: 14 s
> Wall time: 4.51 s

After the calculation, I could access HOMO-LUMO, the code is below.

# HOMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha()]
# LUMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha() + 1]
HOMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha()-1]
LUMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha()]
print(HOMO, LUMO, scf_e)
> -0.2529021800443842 -0.006506519238935586 -232.26252757817264

Check log file

!cat out2.dat

  Memory set to   3.725 GiB by Python driver.
  Threads set to 4 by Python driver.

*** tstart() called on takayukis-MacBook-Pro.local
*** at Mon Aug 27 22:26:56 2018

   => Loading Basis Set <=

    Name: CC-PVDZ
    Role: ORBITAL
    Keyword: BASIS
    atoms 1-6  entry C          line   138 file /Users/iwatobipen/.pyenv/versions/anaconda3-4.2.0/share/psi4/basis/cc-pvdz.gbs 
    atoms 7-12 entry H          line    22 file /Users/iwatobipen/.pyenv/versions/anaconda3-4.2.0/share/psi4/basis/cc-pvdz.gbs 

    There are an even number of electrons - assuming singlet.
    Specify the multiplicity in the molecule input block.

            by Justin Turney, Rob Parrish, Andy Simmonett
                             and Daniel Smith
                              RKS Reference
                        4 Threads,   3814 MiB Core

  ==> Geometry <==

    Molecular point group: c1
    Full point group: C1

    Geometry (in Angstrom), charge = 0, multiplicity = 1:

       Center              X                  Y                   Z               Mass       
    ------------   -----------------  -----------------  -----------------  -----------------
         C            1.160791903590    -0.776928934439    -0.079150803890    12.000000000000
         C            1.255739388827     0.616792238521    -0.002681541758    12.000000000000
         C            0.094947479782     1.393721178119     0.076469246393    12.000000000000
         C           -1.160791914824     0.776928940041     0.079150771790    12.000000000000
         C           -1.255739387065    -0.616792248432     0.002681608472    12.000000000000
         C           -0.094947472815    -1.393721173263    -0.076469222838    12.000000000000
         H            2.058813166717    -1.377983059255    -0.140384154914     1.007825032070
         H            2.227214722117     1.093960079368    -0.004756245235     1.007825032070
         H            0.168401517636     2.471943131887     0.135627614553     1.007825032070
         H           -2.058813208566     1.377983069413     0.140383740077     1.007825032070
         H           -2.227214704360    -1.093960120700     0.004756292428     1.007825032070
         H           -0.168401463717    -2.471943107238    -0.135627939511     1.007825032070

  Running in c1 symmetry.

  Rotational constants: A =      0.18924  B =      0.18924  C =      0.09462 [cm^-1]
  Rotational constants: A =   5673.32397  B =   5673.32388  C =   2836.66196 [MHz]
  Nuclear repulsion =  203.019334728971273

  Charge       = 0
  Multiplicity = 1
  Electrons    = 42
  Nalpha       = 21
  Nbeta        = 21

  ==> Algorithm <==

  SCF Algorithm Type is DF.
  DIIS enabled.
  MOM disabled.
  Fractional occupation disabled.
  Guess Type is SAD.
  Energy threshold   = 1.00e-06
  Density threshold  = 1.00e-06
  Integral threshold = 0.00e+00

  ==> Primary Basis <==

  Basis Set: CC-PVDZ
    Blend: CC-PVDZ
    Number of shells: 54
    Number of basis function: 114
    Number of Cartesian functions: 120
    Spherical Harmonics?: true
    Max angular momentum: 2

  ==> DFT Potential <==

   => Composite Functional: B3LYP <= 

    B3LYP Hyb-GGA Exchange-Correlation Functional

    P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, J. Phys. Chem. 98, 11623 (1994)

    Deriv               =              1
    GGA                 =           TRUE
    Meta                =          FALSE

    Exchange Hybrid     =           TRUE
    MP2 Hybrid          =          FALSE

   => Exchange Functionals <=

    0.0800   Slater exchange
    0.7200         Becke 88

   => Exact (HF) Exchange <=

    0.2000               HF 

   => Correlation Functionals <=

    0.1900   Vosko, Wilk & Nusair (VWN5_RPA)
    0.8100   Lee, Yang & Parr

   => Molecular Quadrature <=

    Radial Scheme       =       TREUTLER
    Pruning Scheme      =           FLAT
    Nuclear Scheme      =       TREUTLER

    BS radius alpha     =              1
    Pruning alpha       =              1
    Radial Points       =             75
    Spherical Points    =            302
    Total Points        =         266172
    Total Blocks        =           2084
    Max Points          =            255
    Max Functions       =            114

   => Loading Basis Set <=

    Name: (CC-PVDZ AUX)
    Role: JKFIT
    Keyword: DF_BASIS_SCF
    atoms 1-6  entry C          line   121 file /Users/iwatobipen/.pyenv/versions/anaconda3-4.2.0/share/psi4/basis/cc-pvdz-jkfit.gbs 
    atoms 7-12 entry H          line    51 file /Users/iwatobipen/.pyenv/versions/anaconda3-4.2.0/share/psi4/basis/cc-pvdz-jkfit.gbs 

  ==> Pre-Iterations <==

    Irrep   Nso     Nmo     Nalpha   Nbeta   Ndocc  Nsocc
     A        114     114       0       0       0       0
    Total     114     114      21      21      21       0

  ==> Integral Setup <==

  DFHelper Memory: AOs need 0.070 [GiB]; user supplied 2.794 [GiB]. Using in-core AOs.

  ==> MemDFJK: Density-Fitted J/K Matrices <==

    J tasked:                   Yes
    K tasked:                   Yes
    wK tasked:                   No
    OpenMP threads:               4
    Memory (MB):               2861
    Algorithm:                 Core
    Schwarz Cutoff:           1E-12
    Mask sparsity (%):       0.3693
    Fitting Condition:        1E-12

   => Auxiliary Basis Set <=

  Basis Set: (CC-PVDZ AUX)
    Blend: CC-PVDZ-JKFIT
    Number of shells: 198
    Number of basis function: 558
    Number of Cartesian functions: 636
    Spherical Harmonics?: true
    Max angular momentum: 3

  Minimum eigenvalue in the overlap matrix is 3.7184237071E-04.
  Using Symmetric Orthogonalization.

  SCF Guess: Superposition of Atomic Densities via on-the-fly atomic UHF.

  ==> Iterations <==

                           Total Energy        Delta E     RMS |[F,P]|

   @DF-RKS iter   0:  -232.99504225236745   -2.32995e+02   7.10044e-02 
   @DF-RKS iter   1:  -232.10322360969730    8.91819e-01   8.92096e-03 
   @DF-RKS iter   2:  -232.08890690303650    1.43167e-02   9.91157e-03 DIIS
   @DF-RKS iter   3:  -232.26148696855563   -1.72580e-01   7.72966e-04 DIIS
   @DF-RKS iter   4:  -232.26244386501853   -9.56896e-04   1.55832e-04 DIIS
   @DF-RKS iter   5:  -232.26249567324737   -5.18082e-05   1.40053e-04 DIIS
   @DF-RKS iter   6:  -232.26252736183829   -3.16886e-05   1.12370e-05 DIIS
   @DF-RKS iter   7:  -232.26252757817264   -2.16334e-07   5.75991e-07 DIIS

  ==> Post-Iterations <==

    Orbital Energies [Eh]

    Doubly Occupied:                                                      

       1A    -10.190567     2A    -10.190357     3A    -10.190356  
       4A    -10.189878     5A    -10.189877     6A    -10.189659  
       7A     -0.851906     8A     -0.745429     9A     -0.745428  
      10A     -0.602584    11A     -0.602584    12A     -0.521871  
      13A     -0.463024    14A     -0.444273    15A     -0.421160  
      16A     -0.421159    17A     -0.365359    18A     -0.344262  
      19A     -0.344262    20A     -0.252905    21A     -0.252902  


      22A     -0.006507    23A     -0.006506    24A      0.059666  
      25A      0.099451    26A      0.099452    27A      0.133761  
      28A      0.133764    29A      0.150228    30A      0.153828  
      31A      0.277077    32A      0.277080    33A      0.295653  
      34A      0.295656    35A      0.405118    36A      0.408148  
      37A      0.464413    38A      0.465367    39A      0.512461  
      40A      0.512464    41A      0.517424    42A      0.517430  
      43A      0.519856    44A      0.519868    45A      0.526820  
      46A      0.540745    47A      0.592924    48A      0.592924  
      49A      0.647786    50A      0.647787    51A      0.650076  
      52A      0.650083    53A      0.667968    54A      0.720592  
      55A      0.788201    56A      0.828354    57A      0.857690  
      58A      0.883159    59A      0.883161    60A      0.924558  
      61A      0.992120    62A      0.992122    63A      1.005964  
      64A      1.005973    65A      1.011555    66A      1.011587  
      67A      1.059132    68A      1.071624    69A      1.071625  
      70A      1.216812    71A      1.280232    72A      1.280245  
      73A      1.456675    74A      1.476446    75A      1.476447  
      76A      1.510782    77A      1.538630    78A      1.598096  
      79A      1.598099    80A      1.606414    81A      1.606425  
      82A      1.633651    83A      1.682795    84A      1.682801  
      85A      1.692381    86A      1.692388    87A      1.725702  
      88A      1.795113    89A      1.795115    90A      1.839139  
      91A      1.839159    92A      1.844378    93A      1.848385  
      94A      1.848386    95A      1.883194    96A      1.981556  
      97A      1.981559    98A      1.989993    99A      1.990001  
     100A      2.021139   101A      2.191006   102A      2.261460  
     103A      2.371980   104A      2.426824   105A      2.426829  
     106A      2.435274   107A      2.435279   108A      2.641049  
     109A      2.641053   110A      2.695891   111A      2.795913  
     112A      2.914328   113A      2.914351   114A      3.646462  

    Final Occupation by Irrep:
    DOCC [    21 ]

  Energy converged.

  @DF-RKS Final Energy:  -232.26252757817264

   => Energetics <=

    Nuclear Repulsion Energy =            203.0193347289712733
    One-Electron Energy =                -713.2777384427657807
    Two-Electron Energy =                 306.1488686089047064
    DFT Exchange-Correlation Energy =     -28.1529924732828292
    Empirical Dispersion Energy =           0.0000000000000000
    VV10 Nonlocal Energy =                  0.0000000000000000
    Total Energy =                       -232.2625275781726373

Properties will be evaluated at   0.000000,   0.000000,   0.000000 [a0]

Properties computed using the SCF density matrix

  Nuclear Dipole Moment: [e a0]
     X:     0.0000      Y:    -0.0000      Z:    -0.0000

  Electronic Dipole Moment: [e a0]
     X:     0.0000      Y:    -0.0000      Z:     0.0000

  Dipole Moment: [e a0]
     X:     0.0000      Y:    -0.0000      Z:    -0.0000     Total:     0.0000

  Dipole Moment: [D]
     X:     0.0000      Y:    -0.0000      Z:    -0.0000     Total:     0.0000

*** tstop() called on takayukis-MacBook-Pro.local at Mon Aug 27 22:27:00 2018
Module time:
	user time   =      13.71 seconds =       0.23 minutes
	system time =       0.22 seconds =       0.00 minutes
	total time  =          4 seconds =       0.07 minutes
Total time:
	user time   =      27.58 seconds =       0.46 minutes
	system time =       0.47 seconds =       0.01 minutes
	total time  =         32 seconds =       0.53 minutes

That’s all. I am happy because I can get many response through the my blog post and I can have many opportunity to learn many things.