Build an accurate model with small training data and quantum chemistry #memo #from_ChemRxiv

Recently I read a nice article on ChemRxiv.
Here is the link ;)

The title is ‘Machine Learning Meets Mechanistic Modelling for Accurate Prediction of Experimental Activation Energies’.

I don’t have experience in this area, but I have found and read publications that use mechanistic DFT. The authors mention that DFT-based approaches have difficulty with some reactions, such as ionic reactions in solution. On the other hand, a machine learning approach can handle this issue (solvation), but it needs a lot of training data to build an accurate model.

So the article proposes hybrid models, which combine mechanistic DFT with QSPR. The article focuses on the SNAr reaction, which is one of the most popular reactions in pharma.

To build the predictive model, they take reaction SMILES as input, optimize the ground state and transition state, and then calculate reaction features. Details are shown in Fig. 5.
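As a rough illustration of the very first step (reading a reaction SMILES), here is a minimal sketch in plain Python. The function name `split_reaction_smiles` and the example reaction are my own assumptions for illustration, not taken from the article, and the DFT optimization and feature calculation that follow are not shown.

```python
def split_reaction_smiles(rxn_smiles):
    """Split 'reactants>agents>products' into three lists of SMILES strings."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn_smiles!r}")
    # Molecules within each role are separated by '.'; an empty role gives [].
    return tuple([s for s in part.split(".") if s] for part in parts)


# Hypothetical SNAr example (not from the article):
# 4-fluoronitrobenzene + methoxide -> 4-nitroanisole + fluoride
rxn = "O=[N+]([O-])c1ccc(F)cc1.C[O-]>>O=[N+]([O-])c1ccc(OC)cc1.[F-]"
reactants, agents, products = split_reaction_smiles(rxn)
print("reactants:", reactants)
print("agents:   ", agents)
print("products: ", products)
```

In a real workflow each SMILES string would then be converted to a 3D structure and passed to a quantum chemistry package. This sketch only separates the roles.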

They compared the performance of several models and found that a Gaussian process regressor with the full descriptor set works very well. A model trained on descriptors without transition state (TS) data also works well. This is interesting to me because I thought the TS was the key to reaction prediction, but the TS is difficult to guess.
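To make the idea of Gaussian process regression a bit more concrete, here is a toy sketch of the posterior mean in plain Python. This is not the authors' implementation: the RBF kernel, the `length` and `noise` values, and the tiny data set are all assumptions for illustration. In practice a library would be used instead of hand-written linear algebra.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit_predict(X_train, y_train, X_test, length=1.0, noise=1e-6):
    """GP posterior mean at X_test, using a constant prior mean (the training mean)."""
    mean_y = sum(y_train) / len(y_train)
    n = len(X_train)
    # Kernel matrix with a small noise term on the diagonal for stability.
    K = [[rbf(X_train[i], X_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_chol(cholesky(K), [y - mean_y for y in y_train])
    return [mean_y + sum(rbf(x, xt, length) * a for xt, a in zip(X_train, alpha))
            for x in X_test]


# Made-up data: one descriptor per reaction, activation energy as the target.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [20.1, 18.7, 17.9, 16.2]
print(gp_fit_predict(X, y, [[1.5], [2.5]]))
```

Each row of `X` stands in for a descriptor vector. A "noTS" model in this picture would simply use rows that leave out the TS-derived columns.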

In the research, they revealed the key features of the SNAr reaction from the noTS features.

I think Fig. 8(a) shows the power of the proposed approach. The figure shows learning curves giving the MAE as a function of the number of reactions in the training set. The MAE of the hybrid models is below chemical accuracy even when the training set is small (up to about 150 reactions). The deep learning based model (BERT) also works well when more data is available (350 or more reactions).
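A learning curve of this kind can be sketched as follows: train on the first n samples, measure the MAE on a fixed test set, and repeat for larger n. Everything here is an assumption for illustration. The data is synthetic, the model is a one-feature least-squares line rather than the article's models, and the 1 kcal/mol value is the conventional threshold for chemical accuracy.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def learning_curve(x_train, y_train, x_test, y_test, sizes):
    """(n, test MAE) after training on the first n training samples, for each n."""
    curve = []
    for n in sizes:
        a, b = fit_line(x_train[:n], y_train[:n])
        curve.append((n, mae(y_test, [a * x + b for x in x_test])))
    return curve


def make_data(rng, n):
    """Synthetic stand-in data: linear trend plus Gaussian noise (not real energies)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 15.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)
x_tr, y_tr = make_data(rng, 200)
x_te, y_te = make_data(rng, 100)
CHEMICAL_ACCURACY = 1.0  # kcal/mol, conventional threshold

for n, err in learning_curve(x_tr, y_tr, x_te, y_te, [5, 20, 50, 100, 200]):
    flag = "below chemical accuracy" if err < CHEMICAL_ACCURACY else ""
    print(n, round(err, 3), flag)
```

Plotting `n` against the MAE for each model gives the kind of curve described above, where a data-efficient model reaches the threshold with fewer reactions.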

The article describes not only the model performance but also the applicability domain (AD), and shows that the model has good accuracy not only for interpolation but also for extrapolation.
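One simple way to think about an applicability domain is a distance check against the training set. The sketch below is only an assumption for illustration, since I am not reproducing the AD definition used in the article. A query counts as "in domain" when its nearest training sample is no farther away than the largest nearest-neighbour distance found inside the training set itself. The function names are hypothetical.

```python
import math


def nearest_distance(x, X_train):
    """Euclidean distance from x to its closest training sample."""
    return min(math.dist(x, xt) for xt in X_train)


def loo_threshold(X_train):
    """Largest leave-one-out nearest-neighbour distance within the training set."""
    return max(
        nearest_distance(x, X_train[:i] + X_train[i + 1:])
        for i, x in enumerate(X_train)
    )


def in_domain(x, X_train, threshold):
    """True if x is at least as close to the training data as the threshold allows."""
    return nearest_distance(x, X_train) <= threshold


# Made-up 2D descriptor vectors for illustration.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
thr = loo_threshold(X_train)
print(thr)
print(in_domain([0.5, 0.5], X_train, thr))  # close to the training data
print(in_domain([5.0, 5.0], X_train, thr))  # far from the training data
```

Predictions for queries outside the domain are extrapolations, so their errors are worth checking separately from the in-domain ones.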

I learned a lot from the publication. It was really helpful for me.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
