Can machine learn important feature from SMILES?

Today I found challenging article in arxiv.
It describes about SMILES2Vec. https://arxiv.org/pdf/1712.02034.pdf
You know word2vec is very attractive and major application for ML area and SMILES2Vec has same concept.
It converts smiles to vector and learn which character is important. The author use “black box” models for building model. I am not sure about “black box” model but I think it likes leave one out. The method masks some features, builds model and finds important features.

To use the method, SMILES2Vec can find important characters in the given smiles.
They found CNN-GRU model gives best result for solubility prediction. My question is … Why convolution of SMILES work fine???
My opinion is that solubility or logP depends on the presence or absence of substituents such as hydroxyl or amino groups, they do not strongly depend on the position some case. So I think the method is interesting but difficult to predict biological affinity.

SMILES strings is major input format for deep learning area. Also I often use SMILES. ;-) But I want to find another format for DL.

ref for black box model
https://arxiv.org/pdf/1602.07043.pdf

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.