Trying exmol to explain why a model makes its predictions #chemoinformatics #RDKit #exmol

One of the difficult points of ML predictive models for chemoinformatics tasks is explainability: why does the model assign these molecules to this class? This matters especially for nonlinear models such as SVM, RF, and NN, because when we discuss results with chemists, they want to know why they should synthesize the proposed molecules.

Many approaches have been developed to tackle this problem. @rkakamilan wrote a really informative blog post about explainable models. The URL is below. The post is written in Japanese but includes documented code, so I think it is worth reading ;)

However, these approaches depend on the type of predictive model being used, so they are not model agnostic. Recently I found a really cool package named ‘exmol’, which uses model-agnostic explanations to help users understand why a molecule is predicted to have a property. It sounds cool, so I tried it in my environment.

First I installed exmol with pip. By default the package installs RDKit through rdkit-pypi, but I wanted to use the RDKit already in my conda environment, so I removed ‘rdkit-pypi’ from the package's dependencies and then installed exmol from the cloned repository.

$ gh repo clone ur-whitelab/exmol
$ cd exmol
# modify: remove rdkit-pypi from the package's dependencies before installing
$ pip install -e .

Now exmol is ready to use. I used a random forest classifier for my test. The original repository provides several examples as Jupyter notebooks, so if you are interested in exmol, I recommend visiting it.
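As a rough sketch of how a scikit-learn style classifier can be handed to exmol: exmol calls the model through a plain function that receives SMILES and SELFIES strings, so the featurization has to happen inside that function. The names `make_model_fn`, `featurize`, and `explain` below are hypothetical helpers, not part of exmol. The exmol calls (`sample_space`, `cf_explain`, `plot_cf`) follow the examples in the exmol repository, but please check them against the version you installed.

```python
from typing import Callable, List, Sequence


def make_model_fn(
    predict: Callable[[List[list]], Sequence[float]],
    featurize: Callable[[str], list],
) -> Callable[[Sequence[str], Sequence[str]], List[float]]:
    """Build a batched f(smiles, selfies) function from a predictor.

    predict   -- e.g. the predict method of an already fitted classifier
    featurize -- hypothetical helper turning one SMILES into a feature vector
    """

    def model_fn(smiles: Sequence[str], selfies: Sequence[str]) -> List[float]:
        # Only the SMILES strings are used here; the SELFIES are ignored.
        features = [featurize(s) for s in smiles]
        return [float(p) for p in predict(features)]

    return model_fn


def explain(base_smiles: str, model_fn, nmols: int = 3):
    """Sample molecules around base_smiles and plot counterfactuals."""
    import exmol  # imported lazily so the wrapper above works without exmol

    samples = exmol.sample_space(base_smiles, model_fn, batched=True)
    counterfactuals = exmol.cf_explain(samples, nmols=nmols)
    exmol.plot_cf(counterfactuals)
    return samples, counterfactuals
```

With a fitted random forest `clf` and a fingerprint function of your own, this would be used as `explain(smiles, make_model_fn(clf.predict, featurize))`.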

In the following example I used solubility data, and I uploaded my code to my gist. The interesting points of exmol are…

First, it generates similar molecules with STONED, so exmol can generate molecules very quickly without GPUs.

Second, it can generate counterfactual molecules and map them, so the user can easily understand why the model made its prediction.

Third, it can sample molecules and project them into chemical space; the visualization is helpful for analysis.
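To make the counterfactual idea in the second point concrete, here is a conceptual sketch in plain Python. It is not exmol's implementation: the `Sample` type, the feature sets standing in for fingerprint bits, and `pick_counterfactuals` are all hypothetical. The idea is simply to take the sampled molecules whose predicted class differs from the input and keep the ones most similar to it.

```python
from typing import List, NamedTuple, Set


class Sample(NamedTuple):
    smiles: str
    features: Set[str]  # stand-in for the on-bits of a fingerprint
    label: int          # class predicted by the model


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_counterfactuals(
    base: Sample, samples: List[Sample], n: int = 3
) -> List[Sample]:
    """Return the n samples most similar to base whose label differs."""
    flipped = [s for s in samples if s.label != base.label]
    flipped.sort(
        key=lambda s: tanimoto(base.features, s.features), reverse=True
    )
    return flipped[:n]
```

Because the selected molecules are close to the input but fall in a different class, the structural difference between them and the input hints at what the model is sensitive to.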

Here is my example; most of the code is borrowed from the original repo. I used the molecule on the left as the input (low solubility), and the other three molecules are counterfactual molecules (medium or high solubility). These three sampled molecules have heteroatoms and/or rotatable bonds, so the prediction seems reasonable.

result from sampling

The plot_space function is also useful: it plots the chemical space of the sampled molecules using ECFP (Morgan fingerprints), adds cluster information from DBSCAN, and returns the image in high-quality SVG format, like below.

It’s cool!!!!! Now we have many useful open source packages and can get useful information not only from publications but also from SNS. I really appreciate the developers’ effort. If you are interested in exmol, let’s install it and try it!


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
