Quantum Chemistry data of drug bank #QCportal #Quantum_Chemistry

I’m still learning QCArchive. I previously posted about using qcportal with a reaction dataset, and today I tried to retrieve the DrugBank data from qcportal. QCPortal provides not only calculated numeric data but also a 3D molecule view via py3Dmol.

OK, let’s go to the code. The get_molecules method fetches a lot of data from the qcportal web server.

import qcportal as ptl
client = ptl.FractalClient()
ds = client.get_collection("Dataset", "COMP6 DrugBank")
mols = ds.get_molecules()
mols.shape
> (13379, 1)

What kinds of data are in the dataset? It is easy to find out, just call a few methods.

ds.list_values().reset_index()['method'].unique()
> array(['ωB97x', 'b3lyp', 'b3lyp-d3m(bj)', 'hf', 'pbe', 'pbe-d3m(bj)',
       'svwn', 'wb97m', 'wb97m-d3(bj)'], dtype=object)
ds.list_values().reset_index()['basis'].unique()
> array(['6-31g*', 'def2-tzvp'], dtype=object)

ds.list_values()

This dataset contains data not only from psi4 but also from Gaussian!

I retrieved the values for method=’ωB97x’.

val = ds.get_values(method='ωB97x')
val.columns
> Index(['CM5 Charges', 'Hirshfeld Charges', 'Energy', 'Gradient',
       'Hirshfeld Dipole', 'Spin Density'],
      dtype='object')

I extracted the energy from the data and visualized a molecule.

energy = val['Energy']
mols['molecule'][0].show()
energy[0]
> -636107.9519541461
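The number above looks huge because it is a total electronic energy rather than a relative one. Below is a minimal sketch of a unit conversion, assuming the dataset reports energies in kcal/mol (check the units setting of the dataset in your own session, this is an assumption and not something shown in the output). The helper names `kcal_to_hartree` and `kcal_to_kj` are made up for this example and are not part of qcportal.

```python
# Conversion factors (rounded).
KCAL_PER_MOL_PER_HARTREE = 627.509474
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_hartree(energy_kcal_per_mol: float) -> float:
    """Convert an energy from kcal/mol to Hartree."""
    return energy_kcal_per_mol / KCAL_PER_MOL_PER_HARTREE


def kcal_to_kj(energy_kcal_per_mol: float) -> float:
    """Convert an energy from kcal/mol to kJ/mol."""
    return energy_kcal_per_mol * KJ_PER_KCAL


# Value printed above; the kcal/mol unit is an assumption.
energy_kcal = -636107.9519541461
energy_hartree = kcal_to_hartree(energy_kcal)
print(f"{energy_hartree:.2f} Hartree")
```

Under that assumption the energy comes out at roughly a thousand Hartree below zero, which is a plausible magnitude for a drug-sized molecule.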

py3Dmol works very well. I could get the QC energy of a molecule in DrugBank and render the molecule as a 3D object.

It is very cool!

My whole code is uploaded at the following URL.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/drug_bank.ipynb

Have a nice weekend! ;)

Open data source of quantum chemistry! #qcportal #rdkit #cheminformatics #quantum_chemistry

At RDKit UGM 2019, I became interested in QCArchive. QCArchive is the MolSSI quantum chemistry archive. It provides useful data and Python packages.

Using one of these packages, qcportal, we can access a huge source of quantum chemistry data. This is valuable because QC calculations are useful but computationally expensive. QC data is useful for drug design and machine learning (e.g. building machine-learning-based force fields).

I tried the package. At first I installed qcportal via conda in my environment. That wasn’t a good choice because I couldn’t get a recent version of the package, and the old version of qcportal raised errors. So I installed it via pip instead, and it worked fine.
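Because the installed version mattered here, it can help to confirm which one actually ended up in the environment. This is a small sketch using only the standard library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when qcportal is not installed.
print("qcportal:", installed_version("qcportal"))
```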

The following code is almost the same as the original documentation, but I ran it myself as a memorandum. First, import the packages and create a client object. I used the MolSSI data source.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
import qcportal as ptl
client = ptl.FractalClient()

Then I checked the list of torsion drive datasets. Many datasets are available.

client.list_collections("TorsionDriveDataset")

>Fragment Stability Benchmark	None
>OpenFF Fragmenter Phenyl Benchmark	Phenyl substituent torsional barrier heights.
>OpenFF Full TorsionDrive Benchmark 1	None
>OpenFF Group1 Torsions	None
>OpenFF Primary TorsionDrive Benchmark 1	None
>OpenFF Substituted Phenyl Set 1	None
>Pfizer Discrepancy Torsion Dataset 1	None
>SMIRNOFF Coverage Torsion Set 1	None
>TorsionDrive Paper	None

ds = client.get_collection("TorsionDriveDataset", "OpenFF Fragmenter Phenyl Benchmark")
ds.df.head()

>c1c[cH:1][c:2](cc1)[C:3](=[O:4])O
>c1[cH:1][c:2](cnc1)[C:3](=[O:4])O
>[cH:1]1cncc[c:2]1[C:3](=[O:4])O
>[cH:1]1cc(nc[c:2]1[C:3](=[O:4])O)[O-]
>Cc1c[cH:1][c:2](cn1)[C:3](=[O:4])O

OK, I succeeded in loading the data. Let’s visualize some of the completed entries. RDKit is a very useful package for drawing molecules!!!!!

complete_data = ds.status(["b3lyp-d3"], collapse=False, status="COMPLETE")
Draw.MolsToGridImage([Chem.MolFromSmiles(complete_data['B3LYP-D3'].index[i]) for i in range(10)],
                    molsPerRow=5)

Finally, visualize the torsion energy!

ds.visualize([complete_data['B3LYP-D3'].index[i] for i in range(10)],"B3LYP-D3", units="kJ / mol")

The purple line (4th structure) has the highest torsion energy at -90 and 90 degrees.
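Going by the 4th SMILES listed above, the molecule is 6-hydroxynicotinic acid (written in its deprotonated, oxido form). The oxygen substituent sits para to the carboxylic acid group, so conjugation between the two groups through the ring likely makes the relative energy of the twisted conformations higher than in the other structures.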
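A torsion scan like this can be summarized by the angles where the relative energy peaks. The sketch below uses a made-up two-fold cosine profile rather than data from the dataset, and both function names and the 40 kJ/mol barrier are hypothetical. It only illustrates how a peak at -90 and 90 degrees would be read off a scan.

```python
import math


def synthetic_profile(barrier: float, step: int = 15) -> dict:
    """Toy two-fold torsion profile: E(theta) = barrier * (1 - cos(2*theta)) / 2."""
    return {
        angle: barrier * (1.0 - math.cos(2.0 * math.radians(angle))) / 2.0
        for angle in range(-165, 181, step)
    }


def peak_angles(profile: dict, tol: float = 1e-9) -> list:
    """Return the sorted angles whose relative energy equals the maximum (within tol)."""
    lowest = min(profile.values())
    relative = {angle: energy - lowest for angle, energy in profile.items()}
    highest = max(relative.values())
    return sorted(a for a, e in relative.items() if highest - e <= tol)


profile = synthetic_profile(barrier=40.0)  # made-up barrier in kJ/mol
print(peak_angles(profile))
```

With real data, the same `peak_angles` helper would take a mapping of scan angle to energy for one molecule.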

The package is useful not only as a QC data source but also for visualizing and analyzing molecules.

I uploaded today’s code to my gist.