I read an interesting article in JCIM.
Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure−Activity Relationship Models Based on Deep Neural Networks?
The URL is below.
One advantage of DNNs is that they extract features automatically, and many articles use DNNs for molecular activity prediction. But is it really true that DNNs outperform all other machine learning methods?
The authors analyzed the performance of DNNs. They used ECFP4 fingerprints as input features and predicted biological activities extracted from the ChEMBL database.
Their approach was reasonable: they built models on a training set, checked performance on test data, and evaluated the RMSE in several layers defined by molecular similarity to the training set. Layer 1 contains test compounds that are similar to the training data, and Layer 6 contains compounds that are not similar to the training set.
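To make the layer idea concrete, here is a minimal sketch of how one could bin test compounds by their similarity to the training set and compute RMSE per bin. It assumes fingerprints are represented as sets of "on" bit indices, that similarity is the Tanimoto coefficient to the nearest training compound, and that the cut-offs in `THRESHOLDS` are purely hypothetical; the paper's exact layer definitions may differ.

```python
from math import sqrt

# Hypothetical similarity cut-offs (NOT taken from the paper):
# Layer 1 = max similarity >= 0.8, ..., Layer 6 = max similarity < 0.4.
THRESHOLDS = (0.8, 0.7, 0.6, 0.5, 0.4)


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def max_sim_to_train(query: frozenset, train_fps: list) -> float:
    """Similarity of a query compound to its nearest neighbour in the training set."""
    return max(tanimoto(query, fp) for fp in train_fps)


def assign_layer(sim: float, thresholds=THRESHOLDS) -> int:
    """Map a similarity value to a layer number (1 = most similar)."""
    for layer, cutoff in enumerate(thresholds, start=1):
        if sim >= cutoff:
            return layer
    return len(thresholds) + 1


def rmse_by_layer(test_fps, y_true, y_pred, train_fps) -> dict:
    """Group squared errors by layer and return {layer: RMSE}."""
    sq_errors = {}
    for fp, yt, yp in zip(test_fps, y_true, y_pred):
        layer = assign_layer(max_sim_to_train(fp, train_fps))
        sq_errors.setdefault(layer, []).append((yt - yp) ** 2)
    return {layer: sqrt(sum(errs) / len(errs))
            for layer, errs in sorted(sq_errors.items())}
```

With toy fingerprints, a test compound identical to a training compound lands in Layer 1, while compounds sharing few or no bits with the training set fall into Layer 6, so their errors are reported separately instead of being averaged away.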
They compared predictive methods such as KNN, RF, and DNN, and their analysis revealed that DNN performed similarly to RF and KNN. Fig. 5 also shows that DNN cannot predict the target value accurately when the query molecule is not similar to the training set.
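KNN is the simplest of the compared methods, and a sketch of it helps explain why predictions degrade for dissimilar queries. The following is a minimal similarity-weighted KNN regressor on fingerprint bit sets, assuming Tanimoto similarity and a hypothetical `knn_predict` helper; it is an illustration, not the authors' implementation.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def knn_predict(query: frozenset, train_fps: list, train_y: list, k: int = 3) -> float:
    """Predict activity as the similarity-weighted mean of the k most similar training compounds."""
    # Rank training compounds by similarity to the query (stable sort keeps ties in input order).
    scored = sorted(
        ((tanimoto(query, fp), y) for fp, y in zip(train_fps, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0.0:
        # The query shares no bits with any selected neighbour, so the "neighbours"
        # carry no information; fall back to their plain mean (an arbitrary choice).
        return sum(y for _, y in scored) / len(scored)
    return sum(sim * y for sim, y in scored) / total
```

For a query close to the training data, the prediction is dominated by its nearest neighbours. For a query that shares no bits with the training set, every similarity is zero and the output is effectively an arbitrary average, which mirrors the Layer 6 behaviour described above.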
This indicates that the DNN does not learn features of the molecule from the fingerprint, but rather learns patterns in the fingerprint itself.
More details are described in this article.
In real drug discovery projects, medicinal chemists sometimes design compounds that are not similar to the seed compounds. For chemists this is nothing special, but it is difficult for AI to learn this kind of MedChem intuition.
I think biological activity prediction is still a challenging area, and ECFP is still the de facto standard in chemoinformatics. I would like to develop a new concept of molecular descriptors.