```
You know Psi4 is an open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties. It is very easy to use and has an optional Python interface.
It is useful for us I think. Because Psi4 can use in python, it means we can integrate many libraries in python! And it is worth to know that, to communicate numpy and psi4 is very easy.
Today, I conducted HOMO LUMO calculation with Psi4 and RDKit.
```

HOMO-LUMO gap is often used to estimate risk of drug-induced phototoxicity.

```
Let's start! ;-)
```

At first, import libraries and define the mol2xyz function. The function generates a conformation and converts RDKit mol object to xyz format.

psi4.core.set_output_file(“output1.dat”, True) is used to logging.

import psi4 import numpy as np from rdkit import Chem from rdkit.Chem import AllChem from rdkit.Chem.Draw import IPythonConsole psi4.core.set_output_file("output1.dat", True) def mol2xyz(mol): mol = Chem.AddHs(mol) AllChem.EmbedMolecule(mol, useExpTorsionAnglePrefs=True,useBasicKnowledge=True) AllChem.UFFOptimizeMolecule(mol) atoms = mol.GetAtoms() string = "\n" for i, atom in enumerate(atoms): pos = mol.GetConformer().GetAtomPosition(atom.GetIdx()) string += "{} {} {} {}\n".format(atom.GetSymbol(), pos.x, pos.y, pos.z) string += "units angstrom\n" return string, mol

Next, calculate HOMO-LUMO of benzene with the function and psi4.

mol = Chem.MolFromSmiles("c1ccccc1") xyz, mol=mol2xyz(mol) psi4.set_memory('4 GB') psi4.set_num_threads(4) benz = psi4.geometry(xyz) %time scf_e, scf_wfn = psi4.energy("B3LYP/cc-pVDZ", return_wfn=True) >CPU times: user 13.5 s, sys: 207 ms, total: 13.7 s >Wall time: 4.46 s

I set return_wfn argument is True because I want to get wave function information.

Energy calculation need many CPU time. To perform many compound calculation, I need more machine power!

After the calculation, I could access HOMO-LUMO, the code is below.

HOMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha()] LUMO = scf_wfn.epsilon_a_subset('AO', 'ALL').np[scf_wfn.nalpha() + 1] print(HOMO, LUMO, scf_e) >-0.006507529999155065 -0.006506586740442874 -232.26253075556204

Yah, easy… But the value is quite different from some literatures. Hmm Am I wrong ? Maybe yes…

Any advice and suggestions will be greatly appreciated.

It is very easy to check log.

I am reading API and reference of Psi4 now!

!cat out.dat Memory set to 3.725 GiB by Python driver. Threads set to 4 by Python driver. Memory set to 3.725 GiB by Python driver. Threads set to 4 by Python driver. *** tstart() called on ******** *** at Fri Aug 24 22:21:33 2018 => Loading Basis Set <= Name: CC-PVDZ Role: ORBITAL Keyword: BASIS ---------snip------------ There are an even number of electrons - assuming singlet. Specify the multiplicity in the molecule input block. --------------------------------------------------------- SCF by Justin Turney, Rob Parrish, Andy Simmonett and Daniel Smith RKS Reference 4 Threads, 3814 MiB Core --------------------------------------------------------- ==> Geometry <== Molecular point group: c1 Full point group: C1 Geometry (in Angstrom), charge = 0, multiplicity = 1: Center X Y Z Mass ------------ ----------------- ----------------- ----------------- ----------------- C 0.806497703012 -1.143092245995 0.014915282772 12.000000000000 C 1.393280339340 0.126835003153 -0.002136692164 12.000000000000 C 0.586782120312 1.269928194219 -0.017051393298 12.000000000000 C -0.806497629817 1.143092609376 -0.014913861108 12.000000000000 C -1.393280087652 -0.126835526704 0.002137132311 12.000000000000 C -0.586782336814 -1.269927969465 0.017053614836 12.000000000000 H 1.430428517155 -2.027419681898 0.026439293008 1.007825032070 H 2.471161063731 0.224955697903 -0.003795091078 1.007825032070 H 1.040731718362 2.252381167101 -0.030251698606 1.007825032070 H -1.430425978428 2.027420563468 -0.026458153907 1.007825032070 H -2.471160894666 -0.224959372018 0.003781521713 1.007825032070 H -1.040735716617 -2.252379143548 0.030235509138 1.007825032070 Running in c1 symmetry. Rotational constants: A = 0.18924 B = 0.18924 C = 0.09462 [cm^-1] Rotational constants: A = 5673.32470 B = 5673.32274 C = 2836.66186 [MHz] Nuclear repulsion = 203.019333245138824 Charge = 0 Multiplicity = 1 Electrons = 42 Nalpha = 21 Nbeta = 21 ==> Algorithm <== SCF Algorithm Type is DF. DIIS enabled. MOM disabled. Fractional occupation disabled. Guess Type is SAD. Energy threshold = 1.00e-06 Density threshold = 1.00e-06 Integral threshold = 0.00e+00 ==> Primary Basis <== Basis Set: CC-PVDZ Blend: CC-PVDZ Number of shells: 54 Number of basis function: 114 Number of Cartesian functions: 120 Spherical Harmonics?: true Max angular momentum: 2 ==> DFT Potential <== => Composite Functional: B3LYP <= B3LYP Hyb-GGA Exchange-Correlation Functional P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, J. Phys. Chem. 98, 11623 (1994) Deriv = 1 GGA = TRUE Meta = FALSE Exchange Hybrid = TRUE MP2 Hybrid = FALSE => Exchange Functionals <= 0.0800 Slater exchange 0.7200 Becke 88 => Exact (HF) Exchange <= 0.2000 HF => Correlation Functionals <= 0.1900 Vosko, Wilk & Nusair (VWN5_RPA) 0.8100 Lee, Yang & Parr => Molecular Quadrature <= Radial Scheme = TREUTLER Pruning Scheme = FLAT Nuclear Scheme = TREUTLER BS radius alpha = 1 Pruning alpha = 1 Radial Points = 75 Spherical Points = 302 Total Points = 266204 Total Blocks = 2062 Max Points = 256 Max Functions = 114 => Loading Basis Set <= Name: (CC-PVDZ AUX) Role: JKFIT Keyword: DF_BASIS_SCF -----snip-------- ==> Pre-Iterations <== ------------------------------------------------------- Irrep Nso Nmo Nalpha Nbeta Ndocc Nsocc ------------------------------------------------------- A 114 114 0 0 0 0 ------------------------------------------------------- Total 114 114 21 21 21 0 ------------------------------------------------------- ==> Integral Setup <== DFHelper Memory: AOs need 0.070 [GiB]; user supplied 2.794 [GiB]. Using in-core AOs. ==> MemDFJK: Density-Fitted J/K Matrices <== J tasked: Yes K tasked: Yes wK tasked: No OpenMP threads: 4 Memory (MB): 2861 Algorithm: Core Schwarz Cutoff: 1E-12 Mask sparsity (%): 0.3693 Fitting Condition: 1E-12 => Auxiliary Basis Set <= Basis Set: (CC-PVDZ AUX) Blend: CC-PVDZ-JKFIT Number of shells: 198 Number of basis function: 558 Number of Cartesian functions: 636 Spherical Harmonics?: true Max angular momentum: 3 Minimum eigenvalue in the overlap matrix is 3.7184246400E-04. Using Symmetric Orthogonalization. SCF Guess: Superposition of Atomic Densities via on-the-fly atomic UHF. ==> Iterations <== Total Energy Delta E RMS |[F,P]| @DF-RKS iter 0: -232.98648910745848 -2.32986e+02 7.09616e-02 @DF-RKS iter 1: -232.10265029904261 8.83839e-01 8.93885e-03 @DF-RKS iter 2: -232.08812748497797 1.45228e-02 9.76151e-03 DIIS @DF-RKS iter 3: -232.26148987002321 -1.73362e-01 7.73194e-04 DIIS @DF-RKS iter 4: -232.26244676165251 -9.56892e-04 2.20816e-04 DIIS @DF-RKS iter 5: -232.26249884952190 -5.20879e-05 1.40071e-04 DIIS @DF-RKS iter 6: -232.26253054068945 -3.16912e-05 1.12245e-05 DIIS @DF-RKS iter 7: -232.26253075556204 -2.14873e-07 5.66817e-07 DIIS ==> Post-Iterations <== Orbital Energies [Eh] --------------------- Doubly Occupied: 1A -10.190570 2A -10.190360 3A -10.190358 4A -10.189881 5A -10.189879 6A -10.189662 7A -0.851912 8A -0.745431 9A -0.745426 10A -0.602585 11A -0.602585 12A -0.521876 13A -0.463024 14A -0.444275 15A -0.421161 16A -0.421155 17A -0.365362 18A -0.344264 19A -0.344262 20A -0.252907 21A -0.252901 Virtual: 22A -0.006508 23A -0.006507 24A 0.059658 25A 0.099450 26A 0.099455 27A 0.133761 28A 0.133763 29A 0.150228 30A 0.153828 31A 0.277076 32A 0.277078 33A 0.295654 34A 0.295658 35A 0.405112 36A 0.408146 37A 0.464412 38A 0.465356 39A 0.512461 40A 0.512461 41A 0.517423 42A 0.517433 43A 0.519852 44A 0.519872 45A 0.526750 46A 0.540745 47A 0.592922 48A 0.592924 49A 0.647783 50A 0.647787 51A 0.650079 52A 0.650080 53A 0.667968 54A 0.720591 55A 0.788173 56A 0.828353 57A 0.857688 58A 0.883160 59A 0.883161 60A 0.924558 61A 0.992121 62A 0.992123 63A 1.005962 64A 1.005979 65A 1.011551 66A 1.011617 67A 1.059133 68A 1.071623 69A 1.071625 70A 1.216802 71A 1.280227 72A 1.280249 73A 1.456674 74A 1.476443 75A 1.476447 76A 1.510781 77A 1.538598 78A 1.598094 79A 1.598098 80A 1.606419 81A 1.606454 82A 1.633651 83A 1.682793 84A 1.682805 85A 1.692380 86A 1.692389 87A 1.725701 88A 1.795112 89A 1.795115 90A 1.839139 91A 1.839180 92A 1.844376 93A 1.848382 94A 1.848384 95A 1.883194 96A 1.981552 97A 1.981560 98A 1.989989 99A 1.990002 100A 2.021138 101A 2.191006 102A 2.261459 103A 2.371976 104A 2.426825 105A 2.426828 106A 2.435277 107A 2.435291 108A 2.641052 109A 2.641053 110A 2.695891 111A 2.795915 112A 2.914318 113A 2.914347 114A 3.646453 Final Occupation by Irrep: A DOCC [ 21 ] Energy converged. @DF-RKS Final Energy: -232.26253075556204 => Energetics <= Nuclear Repulsion Energy = 203.0193332451388244 One-Electron Energy = -713.2776809379226961 Two-Electron Energy = 306.1488028199291875 DFT Exchange-Correlation Energy = -28.1529858827073518 Empirical Dispersion Energy = 0.0000000000000000 VV10 Nonlocal Energy = 0.0000000000000000 Total Energy = -232.2625307555620680 Properties will be evaluated at 0.000000, 0.000000, 0.000000 [a0] Properties computed using the SCF density matrix Nuclear Dipole Moment: [e a0] X: -0.0000 Y: -0.0000 Z: -0.0000 Electronic Dipole Moment: [e a0] X: 0.0000 Y: -0.0000 Z: 0.0000 Dipole Moment: [e a0] X: -0.0000 Y: -0.0000 Z: -0.0000 Total: 0.0000 Dipole Moment: [D] X: -0.0000 Y: -0.0000 Z: -0.0000 Total: 0.0000 *** tstop() ********l at Fri Aug 24 22:21:37 2018 Module time: user time = 13.48 seconds = 0.22 minutes system time = 0.21 seconds = 0.00 minutes total time = 4 seconds = 0.07 minutes Total time: user time = 13.48 seconds = 0.22 minutes system time = 0.21 seconds = 0.00 minutes total time = 4 seconds = 0.07 minutes

This is interesting since I get the same result. At first I thought perhaps the geometry should be optimized under more rigorous conditions but that made only a minor difference to the result.

Looking at the result with methane, the numbers are also wrong compared to literature so I went digging through the log file. The highest doubly occupied (HOMO) orbital is 5A at -0.388416 hartrees. The lowest virtual orbital (LUMO) is 6A at +0.072883 hartrees. Converting these into eV:

HOMO = -0.388416 hartrees = -10.569426 eV (literature is -12.7 eV)

LUMO = +072883 hartrees = +1.983250 eV (literature is +1.9 eV)

The HOMO (5A) corresponds to ‘scf_wfn.nalpha() – 1’ in the vector that Psi4 returns, and the LUMO (6A) corresponds to ‘scf_wfn.nalpha()’ in the same vector.

Using these lines in the code:

HOMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() – 1]

LUMO = scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha() ]

Gives me the following for your benzene example:

HOMO = -0.254908 hartrees = -6.936444 eV (literature values vary between -9 and -4 eV)

LUMO = -0.003152 hartrees = -0.085761 eV (literature values vary between -0.7 and +6 eV)

https://www.researchgate.net/figure/HOMO-and-LUMO-level-positions-and-energy-gap-in-eV-for-the-benzene-and-PTCDA_tbl1_44579330

The energy gap for benzene is around 6 eV so this falls in line with that too.

In the methane case the gap is 12.3 eV from the Psi4 calculation and literature shows that DFT for this molecule is around this figure with various basis sets.

Hi Andy,

Thank you for your comments.

Sure, I agree that my calculation is based on roughly optimized geometry and it has still room for improve.

Benzene is very simple structure for organic and medicinal chemist however there are many HOMO – LUMO values are reported… It’s interesting for me.

I would like to ask you why you use following command for HOMO/LUMO calculation.

By my understanding, scf_wfn.nalpha() returns number of Returns the number alpha electrons, so highest occupied molecule orbital should be “scf_wfn.epsilon_a_subset(‘AO’, ‘ALL’).np[scf_wfn.nalpha()]”. Or do I need to convert number of electrons to python index?

Any comments and/or advices are appreciated.

Thanks

Your understanding is correct regarding what scf_wfn.nalpha() is

supposedto give you. However, I think this is a bug in the code.Looking at the section of your logfile that deals with orbitals we see the following (some lines removed for clarity):

In other words, the HOMO is 21A, and the LUMO is 22A.

If we ask Psi4 for the value of scf_wfn.nalpha() and scf_wfn.nalpha() + 1:

So this is correct.

However, if we now ask for the energy with the commands you used:

This is incorrect, since this is the value for 22A (LUMO) and 23A.

Running a short python loop to check the values of each:

Note that the correct energy for 21 should be -0.252901 (according to the log file) but here it’s 20.

Based upon this, I think there is a bug in the way that Psi4 interfaces with numpy. I note that the numpy/Psi4 interface was changed between versions 1.1 and 1.2.

Converting the Psi4 core.Vector into a numpy array and indexing this:

We see that this is what is causing the issue, and I think it’s due to numpy numbering from zero whereas the electrons number from one. In other words the first (zero) element of the array shouldn’t have a value but does:

Thanks for your prompt reply! That’s make sense. To access the data I need to use python indexing. :-)

I will revive my code ASAP.

I still think it’s a bug in Psi4 and will report it to the group.

You have a great blog – keep up the good work!

Thank you for your comment again.

I keep my eyes open for checking psi4 community!

It is quite interesting to discuss the issue here. I am wondering whether this has been fixed or not? I mean if we stick to your code, whether the energy difference between LUMO and HOMO is similar to the reference?