In RDKit UGM 2019, I had interest about QCArchive. QCArchive is MolSSI quantum chemistry archive. It provides useful data and python packages.
By using one package named qcportal, we can access huge data source of quantum chemistry. It is very useful because QC calculation is useful but it requires computational cost. QC data is useful for drug design and machine learning (i.e. building machine learning based force field etc…..).
I used the package. At first I installed qcportal via conda in my env. It isn’t good choice because I couldn’t install new version of the package. Old version of qcportal causes error. So I installed via pip. It worked fine.
Following code is almost same as original document. But I tried it for my memorandum. At first import packages and make client object. I used datasource from MolSSI.
from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole import qcportal as ptl client = ptl.FractalClient()
Then checked the list of torsion drive dataset. There are many dataset is available.
client.list_collections("TorsionDriveDataset") >Fragment Stability Benchmark None >OpenFF Fragmenter Phenyl Benchmark Phenyl substituent torsional barrier heights. >OpenFF Full TorsionDrive Benchmark 1 None >OpenFF Group1 Torsions None >OpenFF Primary TorsionDrive Benchmark 1 None >OpenFF Substituted Phenyl Set 1 None >Pfizer Discrepancy Torsion Dataset 1 None >SMIRNOFF Coverage Torsion Set 1 None >TorsionDrive Paper None ds = client.get_collection("TorsionDriveDataset", "OpenFF Fragmenter Phenyl Benchmark") ds.df.head() >c1c[cH:1][c:2](cc1)[C:3](=[O:4])O >c1[cH:1][c:2](cnc1)[C:3](=[O:4])O >[cH:1]1cncc[c:2]1[C:3](=[O:4])O >[cH:1]1cc(nc[c:2]1[C:3](=[O:4])O)[O-] >Cc1c[cH:1][c:2](cn1)[C:3](=[O:4])O
OK I succeeded to loading data. Let’s visualize some completed dataset. RDKit is very useful package for drawing molecules!!!!!
complete_data = ds.status(["b3lyp-d3"], collapse=False, status="COMPLETE") Draw.MolsToGridImage([Chem.MolFromSmiles(complete_data['B3LYP-D3'].index[i]) for i in range(10)], molsPerRow=5)
Finally visualize torsion energy!
ds.visualize([complete_data['B3LYP-D3'].index[i] for i in range(10)],"B3LYP-D3", units="kJ / mol")
Purple line (4th structure) has highest torsion energy at -90, 90 degree.
The molecule is 5-Hydroxynicotinic acid. Hydroxyl group is located para-positon of carboxylic group. So conjugation effect to make relative energy higher than other structures.
The package is useful for not only data source of QC but also visualization and analysis of molecules.
I uploaded today’s code on my gist.