Calculate HOMO and LUMO with Psi4 #RDKit #Psi4

You know Psi4 is an open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties. It is very easy to use and has an optional Python interface.
It is useful for us I think. Because Psi4 can use in python, it means we can integrate many libraries in python!  And it is worth to know that, to communicate numpy and psi4 is very easy.

Today, I conducted HOMO LUMO calculation with Psi4 and RDKit.

HOMO-LUMO gap is often used to estimate risk of drug-induced phototoxicity.

Let's start! ;-)

At first, import libraries and define the mol2xyz function. The function generates a conformation and converts RDKit mol object to xyz format.

psi4.core.set_output_file(“output1.dat”, True) is used to logging.

import psi4
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
psi4.core.set_output_file("output1.dat", True)
def mol2xyz(mol):
    mol = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol, useExpTorsionAnglePrefs=True,useBasicKnowledge=True)
    AllChem.UFFOptimizeMolecule(mol)
    atoms = mol.GetAtoms()
    string = "\n"
    for i, atom in enumerate(atoms):
        pos = mol.GetConformer().GetAtomPosition(atom.GetIdx())
        string += "{} {} {} {}\n".format(atom.GetSymbol(), pos.x, pos.y, pos.z)
    string += "units angstrom\n"
    return string, mol

Next, calculate HOMO-LUMO of benzene with the function and psi4.

mol = Chem.MolFromSmiles("c1ccccc1")
xyz, mol=mol2xyz(mol)
psi4.set_memory('4 GB')
psi4.set_num_threads(4)
benz = psi4.geometry(xyz)
%time scf_e, scf_wfn = psi4.energy("B3LYP/cc-pVDZ", return_wfn=True)
>CPU times: user 13.5 s, sys: 207 ms, total: 13.7 s
>Wall time: 4.46 s

I set return_wfn argument is True because I want to get wave function information.
Energy calculation need many CPU time. To perform many compound calculation, I need more machine power!

After the calculation, I could access HOMO-LUMO, the code is below.

HOMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha()]
LUMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha() + 1]
print(HOMO, LUMO, scf_e)
>-0.006507529999155065 -0.006506586740442874 -232.26253075556204

Yah, easy… But the value is quite different from some literatures. Hmm Am I wrong ? Maybe yes…
Any advice and suggestions will be greatly appreciated.

It is very easy to check log.
I am reading API and reference of Psi4 now!

!cat out.dat

 Memory set to   3.725 GiB by Python driver.
  Threads set to 4 by Python driver.

  Memory set to   3.725 GiB by Python driver.
  Threads set to 4 by Python driver.

*** tstart() called on ********
*** at Fri Aug 24 22:21:33 2018

   => Loading Basis Set <=

    Name: CC-PVDZ
    Role: ORBITAL
    Keyword: BASIS
---------snip------------

    There are an even number of electrons - assuming singlet.
    Specify the multiplicity in the molecule input block.


         ---------------------------------------------------------
                                   SCF
            by Justin Turney, Rob Parrish, Andy Simmonett
                             and Daniel Smith
                              RKS Reference
                        4 Threads,   3814 MiB Core
         ---------------------------------------------------------

  ==> Geometry <==

    Molecular point group: c1
    Full point group: C1

    Geometry (in Angstrom), charge = 0, multiplicity = 1:

       Center              X                  Y                   Z               Mass       
    ------------   -----------------  -----------------  -----------------  -----------------
         C            0.806497703012    -1.143092245995     0.014915282772    12.000000000000
         C            1.393280339340     0.126835003153    -0.002136692164    12.000000000000
         C            0.586782120312     1.269928194219    -0.017051393298    12.000000000000
         C           -0.806497629817     1.143092609376    -0.014913861108    12.000000000000
         C           -1.393280087652    -0.126835526704     0.002137132311    12.000000000000
         C           -0.586782336814    -1.269927969465     0.017053614836    12.000000000000
         H            1.430428517155    -2.027419681898     0.026439293008     1.007825032070
         H            2.471161063731     0.224955697903    -0.003795091078     1.007825032070
         H            1.040731718362     2.252381167101    -0.030251698606     1.007825032070
         H           -1.430425978428     2.027420563468    -0.026458153907     1.007825032070
         H           -2.471160894666    -0.224959372018     0.003781521713     1.007825032070
         H           -1.040735716617    -2.252379143548     0.030235509138     1.007825032070

  Running in c1 symmetry.

  Rotational constants: A =      0.18924  B =      0.18924  C =      0.09462 [cm^-1]
  Rotational constants: A =   5673.32470  B =   5673.32274  C =   2836.66186 [MHz]
  Nuclear repulsion =  203.019333245138824

  Charge       = 0
  Multiplicity = 1
  Electrons    = 42
  Nalpha       = 21
  Nbeta        = 21

  ==> Algorithm <==

  SCF Algorithm Type is DF.
  DIIS enabled.
  MOM disabled.
  Fractional occupation disabled.
  Guess Type is SAD.
  Energy threshold   = 1.00e-06
  Density threshold  = 1.00e-06
  Integral threshold = 0.00e+00

  ==> Primary Basis <==

  Basis Set: CC-PVDZ
    Blend: CC-PVDZ
    Number of shells: 54
    Number of basis function: 114
    Number of Cartesian functions: 120
    Spherical Harmonics?: true
    Max angular momentum: 2

  ==> DFT Potential <==

   => Composite Functional: B3LYP <= 

    B3LYP Hyb-GGA Exchange-Correlation Functional

    P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, J. Phys. Chem. 98, 11623 (1994)

    Deriv               =              1
    GGA                 =           TRUE
    Meta                =          FALSE

    Exchange Hybrid     =           TRUE
    MP2 Hybrid          =          FALSE

   => Exchange Functionals <=

    0.0800   Slater exchange
    0.7200         Becke 88

   => Exact (HF) Exchange <=

    0.2000               HF 

   => Correlation Functionals <=

    0.1900   Vosko, Wilk & Nusair (VWN5_RPA)
    0.8100   Lee, Yang & Parr

   => Molecular Quadrature <=

    Radial Scheme       =       TREUTLER
    Pruning Scheme      =           FLAT
    Nuclear Scheme      =       TREUTLER

    BS radius alpha     =              1
    Pruning alpha       =              1
    Radial Points       =             75
    Spherical Points    =            302
    Total Points        =         266204
    Total Blocks        =           2062
    Max Points          =            256
    Max Functions       =            114

   => Loading Basis Set <=

    Name: (CC-PVDZ AUX)
    Role: JKFIT
    Keyword: DF_BASIS_SCF
-----snip--------
  ==> Pre-Iterations <==

   -------------------------------------------------------
    Irrep   Nso     Nmo     Nalpha   Nbeta   Ndocc  Nsocc
   -------------------------------------------------------
     A        114     114       0       0       0       0
   -------------------------------------------------------
    Total     114     114      21      21      21       0
   -------------------------------------------------------

  ==> Integral Setup <==

  DFHelper Memory: AOs need 0.070 [GiB]; user supplied 2.794 [GiB]. Using in-core AOs.

  ==> MemDFJK: Density-Fitted J/K Matrices <==

    J tasked:                   Yes
    K tasked:                   Yes
    wK tasked:                   No
    OpenMP threads:               4
    Memory (MB):               2861
    Algorithm:                 Core
    Schwarz Cutoff:           1E-12
    Mask sparsity (%):       0.3693
    Fitting Condition:        1E-12

   => Auxiliary Basis Set <=

  Basis Set: (CC-PVDZ AUX)
    Blend: CC-PVDZ-JKFIT
    Number of shells: 198
    Number of basis function: 558
    Number of Cartesian functions: 636
    Spherical Harmonics?: true
    Max angular momentum: 3

  Minimum eigenvalue in the overlap matrix is 3.7184246400E-04.
  Using Symmetric Orthogonalization.

  SCF Guess: Superposition of Atomic Densities via on-the-fly atomic UHF.

  ==> Iterations <==

                           Total Energy        Delta E     RMS |[F,P]|

   @DF-RKS iter   0:  -232.98648910745848   -2.32986e+02   7.09616e-02 
   @DF-RKS iter   1:  -232.10265029904261    8.83839e-01   8.93885e-03 
   @DF-RKS iter   2:  -232.08812748497797    1.45228e-02   9.76151e-03 DIIS
   @DF-RKS iter   3:  -232.26148987002321   -1.73362e-01   7.73194e-04 DIIS
   @DF-RKS iter   4:  -232.26244676165251   -9.56892e-04   2.20816e-04 DIIS
   @DF-RKS iter   5:  -232.26249884952190   -5.20879e-05   1.40071e-04 DIIS
   @DF-RKS iter   6:  -232.26253054068945   -3.16912e-05   1.12245e-05 DIIS
   @DF-RKS iter   7:  -232.26253075556204   -2.14873e-07   5.66817e-07 DIIS

  ==> Post-Iterations <==

    Orbital Energies [Eh]
    ---------------------

    Doubly Occupied:                                                      

       1A    -10.190570     2A    -10.190360     3A    -10.190358  
       4A    -10.189881     5A    -10.189879     6A    -10.189662  
       7A     -0.851912     8A     -0.745431     9A     -0.745426  
      10A     -0.602585    11A     -0.602585    12A     -0.521876  
      13A     -0.463024    14A     -0.444275    15A     -0.421161  
      16A     -0.421155    17A     -0.365362    18A     -0.344264  
      19A     -0.344262    20A     -0.252907    21A     -0.252901  

    Virtual:                                                              

      22A     -0.006508    23A     -0.006507    24A      0.059658  
      25A      0.099450    26A      0.099455    27A      0.133761  
      28A      0.133763    29A      0.150228    30A      0.153828  
      31A      0.277076    32A      0.277078    33A      0.295654  
      34A      0.295658    35A      0.405112    36A      0.408146  
      37A      0.464412    38A      0.465356    39A      0.512461  
      40A      0.512461    41A      0.517423    42A      0.517433  
      43A      0.519852    44A      0.519872    45A      0.526750  
      46A      0.540745    47A      0.592922    48A      0.592924  
      49A      0.647783    50A      0.647787    51A      0.650079  
      52A      0.650080    53A      0.667968    54A      0.720591  
      55A      0.788173    56A      0.828353    57A      0.857688  
      58A      0.883160    59A      0.883161    60A      0.924558  
      61A      0.992121    62A      0.992123    63A      1.005962  
      64A      1.005979    65A      1.011551    66A      1.011617  
      67A      1.059133    68A      1.071623    69A      1.071625  
      70A      1.216802    71A      1.280227    72A      1.280249  
      73A      1.456674    74A      1.476443    75A      1.476447  
      76A      1.510781    77A      1.538598    78A      1.598094  
      79A      1.598098    80A      1.606419    81A      1.606454  
      82A      1.633651    83A      1.682793    84A      1.682805  
      85A      1.692380    86A      1.692389    87A      1.725701  
      88A      1.795112    89A      1.795115    90A      1.839139  
      91A      1.839180    92A      1.844376    93A      1.848382  
      94A      1.848384    95A      1.883194    96A      1.981552  
      97A      1.981560    98A      1.989989    99A      1.990002  
     100A      2.021138   101A      2.191006   102A      2.261459  
     103A      2.371976   104A      2.426825   105A      2.426828  
     106A      2.435277   107A      2.435291   108A      2.641052  
     109A      2.641053   110A      2.695891   111A      2.795915  
     112A      2.914318   113A      2.914347   114A      3.646453  

    Final Occupation by Irrep:
              A 
    DOCC [    21 ]

  Energy converged.

  @DF-RKS Final Energy:  -232.26253075556204

   => Energetics <=

    Nuclear Repulsion Energy =            203.0193332451388244
    One-Electron Energy =                -713.2776809379226961
    Two-Electron Energy =                 306.1488028199291875
    DFT Exchange-Correlation Energy =     -28.1529858827073518
    Empirical Dispersion Energy =           0.0000000000000000
    VV10 Nonlocal Energy =                  0.0000000000000000
    Total Energy =                       -232.2625307555620680



Properties will be evaluated at   0.000000,   0.000000,   0.000000 [a0]

Properties computed using the SCF density matrix

  Nuclear Dipole Moment: [e a0]
     X:    -0.0000      Y:    -0.0000      Z:    -0.0000

  Electronic Dipole Moment: [e a0]
     X:     0.0000      Y:    -0.0000      Z:     0.0000

  Dipole Moment: [e a0]
     X:    -0.0000      Y:    -0.0000      Z:    -0.0000     Total:     0.0000

  Dipole Moment: [D]
     X:    -0.0000      Y:    -0.0000      Z:    -0.0000     Total:     0.0000


*** tstop() ********l at Fri Aug 24 22:21:37 2018
Module time:
	user time   =      13.48 seconds =       0.22 minutes
	system time =       0.21 seconds =       0.00 minutes
	total time  =          4 seconds =       0.07 minutes
Total time:
	user time   =      13.48 seconds =       0.22 minutes
	system time =       0.21 seconds =       0.00 minutes
	total time  =          4 seconds =       0.07 minutes

​

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

7 thoughts on “Calculate HOMO and LUMO with Psi4 #RDKit #Psi4

  1. This is interesting since I get the same result. At first I thought perhaps the geometry should be optimized under more rigorous conditions but that made only a minor difference to the result.

    Looking at the result with methane, the numbers are also wrong compared to literature so I went digging through the log file. The highest doubly occupied (HOMO) orbital is 5A at -0.388416 hartrees. The lowest virtual orbital (LUMO) is 6A at +0.072883 hartrees. Converting these into eV:

    HOMO = -0.388416 hartrees = -10.569426 eV (literature is -12.7 eV)
    LUMO = +072883 hartrees = +1.983250 eV (literature is +1.9 eV)

    The HOMO (5A) corresponds to ‘scf_wfn.nalpha() – 1’ in the vector that Psi4 returns, and the LUMO (6A) corresponds to ‘scf_wfn.nalpha()’ in the same vector.

    Using these lines in the code:
    HOMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() – 1]
    LUMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() ]

    Gives me the following for your benzene example:
    HOMO = -0.254908 hartrees = -6.936444 eV (literature values vary between -9 and -4 eV)
    LUMO = -0.003152 hartrees = -0.085761 eV (literature values vary between -0.7 and +6 eV)
    https://www.researchgate.net/figure/HOMO-and-LUMO-level-positions-and-energy-gap-in-eV-for-the-benzene-and-PTCDA_tbl1_44579330

    The energy gap for benzene is around 6 eV so this falls in line with that too.

    In the methane case the gap is 12.3 eV from the Psi4 calculation and literature shows that DFT for this molecule is around this figure with various basis sets.

    1. Hi Andy,
      Thank you for your comments.
      Sure, I agree that my calculation is based on roughly optimized geometry and it has still room for improve.

      Benzene is very simple structure for organic and medicinal chemist however there are many HOMO – LUMO values are reported… It’s interesting for me.

      I would like to ask you why you use following command for HOMO/LUMO calculation.

      HOMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() – 1]
      LUMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() ]

      By my understanding, scf_wfn.nalpha() returns number of Returns the number alpha electrons, so highest occupied molecule orbital should be “scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha()]”. Or do I need to convert number of electrons to python index?

      Any comments and/or advices are appreciated.
      Thanks

      1. Your understanding is correct regarding what scf_wfn.nalpha() is supposed to give you. However, I think this is a bug in the code.

        Looking at the section of your logfile that deals with orbitals we see the following (some lines removed for clarity):

        Doubly Occupied:                                                      
          19A     -0.344262    20A     -0.252907    21A     -0.252901 
        
        Virtual:                                                              
          22A     -0.006508    23A     -0.006507    24A      0.059658 
        

        In other words, the HOMO is 21A, and the LUMO is 22A.

        If we ask Psi4 for the value of scf_wfn.nalpha() and scf_wfn.nalpha() + 1:

        scf_wfn.nalpha()
        21L
        scf_wfn.nalpha() + 1
        22L

        So this is correct.

        However, if we now ask for the energy with the commands you used:

        HOMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha()]
        LUMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() + 1]
        print(HOMO)
        -0.006507359007532365
        print(LUMO)
        -0.006505885540737663

        This is incorrect, since this is the value for 22A (LUMO) and 23A.

        Running a short python loop to check the values of each:

        for x in range(scf_wfn.nalpha() – 5, scf_wfn.nalpha() + 5):
        … print (x,scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[x])

        (16, ‘-0.36536139997427’)
        (17, ‘-0.3442637226030818’)
        (18, ‘-0.3442627202770589’)
        (19, ‘-0.2529058168266043’)
        (20, ‘-0.25290025580303666’)
        (21, ‘-0.006507359007532365’)
        (22, ‘-0.006505885540737663’)
        (23, ‘0.059658198769279545’)
        (24, ‘0.09945063457011293’)
        (25, ‘0.09945379639349274’)

        Note that the correct energy for 21 should be -0.252901 (according to the log file) but here it’s 20.

        Based upon this, I think there is a bug in the way that Psi4 interfaces with numpy. I note that the numpy/Psi4 interface was changed between versions 1.1 and 1.2.

        Converting the Psi4 core.Vector into a numpy array and indexing this:

        b = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’)
        np.asarray(b)[21]
        ‘-0.006507359007532365’

        We see that this is what is causing the issue, and I think it’s due to numpy numbering from zero whereas the electrons number from one. In other words the first (zero) element of the array shouldn’t have a value but does:

        np.asarray(b)[0]
        ‘-10.190569457449559’

      2. Thanks for your prompt reply! That’s make sense. To access the data I need to use python indexing. :-)
        I will revive my code ASAP.

      3. I still think it’s a bug in Psi4 and will report it to the group.

        You have a great blog – keep up the good work!

  2. It is quite interesting to discuss the issue here. I am wondering whether this has been fixed or not? I mean if we stick to your code, whether the energy difference between LUMO and HOMO is similar to the reference?

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.