Which is better for QSAR prediction: a graph-based or a descriptor-based model? #journal #memo #chemoinformatics

Many graph convolutional network (GCN) models are now applied to QSAR tasks instead of traditional descriptor-based models. I think the interesting point of GCNs is that they don't need feature engineering: during training, a GCN learns molecular features directly from the given molecular graph. A descriptor-based model, on the other hand, learns compound properties from precomputed descriptors or fingerprints such as MW, logP, NumRotBond, ECFP4, etc.
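To make the difference concrete, here is a toy sketch in plain Python using ethanol (SMILES: CCO). Everything in it is my own illustration and not from any paper or library: the function names, the three-element vocabulary, and the "own feature plus sum of neighbours" update are assumptions. A real GCN layer would also apply learned weights, normalization, and a non-linearity.

```python
# Toy contrast between the two input styles, using ethanol (SMILES: CCO).
# Hydrogens are left implicit; all names and numbers are illustrative only.

atoms = ["C", "C", "O"]   # heavy atoms are the graph nodes
bonds = [(0, 1), (1, 2)]  # bonds are undirected edges

# Descriptor view: a fixed-length summary designed by hand in advance.
def simple_descriptors(atoms, bonds):
    return {
        "n_heavy_atoms": len(atoms),
        "n_bonds": len(bonds),
        "n_oxygen": atoms.count("O"),
    }

# Graph view, step 1: one-hot node features over a small element vocabulary.
ELEMENTS = ["C", "N", "O"]

def one_hot(symbol):
    return [1.0 if symbol == element else 0.0 for element in ELEMENTS]

def build_adjacency(n_atoms, bonds):
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency

# Graph view, step 2: one round of message passing.
# Each atom's new feature is its own feature plus the sum of its neighbours'.
# (No learned weights here; this only shows how information flows.)
def message_pass(features, adjacency):
    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)
        for j in adjacency[i]:
            aggregated = [x + y for x, y in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated

features = [one_hot(symbol) for symbol in atoms]
adjacency = build_adjacency(len(atoms), bonds)
hidden = message_pass(features, adjacency)

# Sum-pool the atom features into one molecule-level vector.
molecule_vector = [sum(column) for column in zip(*hidden)]

print(simple_descriptors(atoms, bonds))
print(hidden)
print(molecule_vector)
```

In the descriptor view I decide up front what to count. In the graph view the model only receives atoms and bonds, and the useful features have to come out of training.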

Many publications report that GCNs work better than descriptor-based models. In my experience, GCNs are very useful because of their flexibility, but they do not often outperform other models.

Yesterday I read an interesting article in the Journal of Cheminformatics. The title is “Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models”.

The journal is open access!!! You can read the article at the following URL.

The authors compared descriptor-based and graph-based models on 11 public data sets.

The compared models are XGBoost, RF, and SVM (descriptor-based) and GCN, GAT, AttentiveFP, and MPNN (graph-based). MOE molecular descriptors were used to train the descriptor-based models.

It's worth noting that they ran 50 independent training and validation runs for each model to avoid bias from a single random data split. Sometimes model performance is affected by the random seed.
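The idea of repeating the split with different seeds can be sketched with the standard library alone. This is only a minimal illustration under my own assumptions, not the authors' protocol: the 80/20 split, the made-up activity values, and the mean-predictor stand-in model are all hypothetical. In practice you would plug in XGBoost, a GCN, or whatever model you are comparing.

```python
import random
import statistics

def random_split(n_samples, test_fraction, seed):
    """Shuffle the indices with the given seed and split into train/test."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return indices[n_test:], indices[:n_test]

def mean_baseline_rmse(y, train_idx, test_idx):
    """Hypothetical stand-in model: always predict the training-set mean."""
    prediction = statistics.mean([y[i] for i in train_idx])
    squared_errors = [(y[i] - prediction) ** 2 for i in test_idx]
    return statistics.mean(squared_errors) ** 0.5

def repeated_evaluation(y, n_repeats=50, test_fraction=0.2):
    """Score the model on n_repeats random splits, one seed per split."""
    scores = []
    for seed in range(n_repeats):
        train_idx, test_idx = random_split(len(y), test_fraction, seed)
        scores.append(mean_baseline_rmse(y, train_idx, test_idx))
    return scores

# Made-up activity values, for illustration only.
y = [5.1, 6.3, 4.8, 7.2, 5.9, 6.7, 4.2, 5.5, 6.1, 7.0]
scores = repeated_evaluation(y)
print(f"RMSE: {statistics.mean(scores):.2f} +/- {statistics.stdev(scores):.2f}")
print(f"best split: {min(scores):.2f}, worst split: {max(scores):.2f}")
```

Reporting the mean and spread over all splits, rather than one lucky or unlucky seed, makes the comparison between models much fairer.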

Their results are summarized in Tables 6 to 9.

In many cases the descriptor-based models work better than the graph-based models, and their computational cost is also lower. For multi-task problems or tasks with a large data set (over 1000 data points), the graph-based models outperformed the descriptor-based ones.

In the early stage of a drug discovery project, a chemoinformatician often has to build a model from only a few data points, I think. So according to the article, a descriptor-based model should be the first thing to try.

I think graph-based models are still an attractive and useful approach, but they are not always the most effective choice for molecular property prediction.

Which model we should use is always a difficult and important problem ;)

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.

2 thoughts on “Which is better Graph based or descriptor based model for QSAR prediction? #journal #memo #chemoinformatics”

  1. Hello Taka, I recently added AttentiveFP to the PyTorch Geometric Python package. DimeNet++ is coming soon.

