Recently, graph-based predictive and generative models have become an attractive topic in the chemoinformatics area, because a graph-based model does not need to learn a grammar the way a SMILES-based model does. A graph seems to be a more primitive representation of a molecule. Of course, to use a graph-based model, the user needs to convert molecules to graph objects.
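To make "molecule as a graph" concrete, here is a minimal, dependency-free sketch. The atom and bond lists for ethanol are written by hand for illustration, and `to_adjacency` is a hypothetical helper name; in practice a toolkit would derive atoms and bonds from the molecule for you.

```python
# Toy, hand-written graph for ethanol (SMILES: CCO), heavy atoms only.
# Nodes carry element symbols; edges are (begin_atom, end_atom, bond_order).
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]


def to_adjacency(num_atoms, bonds):
    """Build an undirected adjacency list from a list of bonds."""
    adjacency = {i: [] for i in range(num_atoms)}
    for begin, end, _order in bonds:
        adjacency[begin].append(end)
        adjacency[end].append(begin)
    return adjacency


adjacency = to_adjacency(len(atoms), bonds)
for idx, neighbors in adjacency.items():
    # Print each atom with the element symbols of its neighbors.
    print(idx, atoms[idx], [atoms[n] for n in neighbors])
```

No SMILES grammar is involved here: the model only sees which atoms exist and which pairs are bonded.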
PyTorch Geometric (PyG) and Deep Graph Library (DGL) are very useful packages for graph-based deep learning. Today, I got a comment on my post from a DGL developer. It is an honor for me to get such a comment. From it I learned that the new version of DGL supports many methods for chemistry. It's awesome work, isn't it!
So I tried to use it. If you can read Japanese (or can use a translation tool), there is a nice article. The URL is below.
That post describes the Junction Tree VAE.
So today, I used a different model from DGL.
The following example is molecular generation with DGMG.
Fortunately, DGL provides pre-trained models, so users can use the generative model without training it themselves. Let's start!
First, import several packages and define the splitsmi function. The generative model sometimes generates SMILES that contain '.', so I would like to retrieve the longest fragment string from each generated SMILES.
from rdkit import Chem
from rdkit.Chem import QED
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from dgl.model_zoo.chem import load_pretrained
import os
import math
import numpy as np

def splitsmi(smiles):
    # Split a multi-fragment SMILES on '.' and return the longest fragment string.
    smiles_list = smiles.split('.')
    length = [len(s) for s in smiles_list]
    return smiles_list[np.argmax(length)]
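As a quick check of what splitsmi does, here is a small sketch of the same idea without numpy. The name `largest_fragment` and the sodium acetate input are only for illustration.

```python
def largest_fragment(smiles):
    """Return the longest '.'-separated fragment of a SMILES string."""
    # max() with key=len returns the first fragment on ties, like np.argmax.
    return max(smiles.split('.'), key=len)


# Sodium acetate: the counter ion [Na+] is dropped, the acetate part is kept.
print(largest_fragment('CC(=O)[O-].[Na+]'))  # prints CC(=O)[O-]
```

Note that string length is only a rough proxy for fragment size. Comparing heavy atom counts would be stricter, but for removing small counter ions the simple version is usually enough.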
Then load the pre-trained models. It is very easy: just call the load_pretrained function! The following code loads two models, one trained with ChEMBL and the other trained with ZINC. From each model I picked up 30 molecules that can be sanitized and have a QED over 0.6.
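chembl_model = load_pretrained('DGMG_ChEMBL_canonical')
chembl_model.eval()
chembl_mols = []
chembl_qeds = []
# Keep sampling until 30 molecules pass the QED filter.
while len(chembl_mols) < 30:
    try:
        # Generate one molecule as a SMILES string
        # (this call is assumed from the DGL model zoo usage).
        smiles = chembl_model(rdkit_mol=True)
        smiles = splitsmi(smiles)
        mol = Chem.MolFromSmiles(smiles)
        qed = QED.qed(mol)
        if qed > 0.6:
            chembl_mols.append(mol)
            chembl_qeds.append(str(np.round(qed, 2)))
    except Exception:
        # Skip molecules that cannot be parsed or scored.
        pass
Draw.MolsToGridImage(chembl_mols, legends=chembl_qeds, molsPerRow=5)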
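zinc_model = load_pretrained('DGMG_ZINC_canonical')
zinc_model.eval()
zinc_mols = []
zinc_qeds = []
# Same procedure as for the ChEMBL model.
while len(zinc_mols) < 30:
    try:
        # Generation call assumed from the DGL model zoo usage.
        smiles = zinc_model(rdkit_mol=True)
        smiles = splitsmi(smiles)
        mol = Chem.MolFromSmiles(smiles)
        qed = QED.qed(mol)
        if qed > 0.6:
            zinc_mols.append(mol)
            zinc_qeds.append(str(np.round(qed, 2)))
    except Exception:
        pass
Draw.MolsToGridImage(zinc_mols, legends=zinc_qeds, molsPerRow=5)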
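The sampling loops above follow a simple "generate, score, keep if good enough" pattern. Below is a generic sketch of that pattern with a cap on the number of attempts, so the loop cannot run forever if the model rarely produces a passing molecule. `collect_samples` and `max_attempts` are hypothetical names, and random numbers stand in for the molecules and for the QED score.

```python
import random


def collect_samples(generate, score, n_wanted, threshold, max_attempts=10000):
    """Draw from generate() until n_wanted samples score above threshold."""
    kept = []
    for _ in range(max_attempts):
        if len(kept) >= n_wanted:
            break
        try:
            sample = generate()
            value = score(sample)
        except Exception:
            # Mirrors skipping molecules that fail parsing or scoring.
            continue
        if value > threshold:
            kept.append((sample, round(value, 2)))
    return kept


# Stand-ins: a random number plays the molecule, the identity plays QED.
rng = random.Random(0)
picked = collect_samples(rng.random, lambda x: x, n_wanted=30, threshold=0.6)
print(len(picked))
```

With the real models, `generate` would wrap the model call plus splitsmi, and `score` would wrap the SMILES parsing and the QED calculation.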
The generated molecules are diverse, and I think most of them do not look like undruggable structures.
Users can also build their own generative model from their own data set.
Recently we can access lots of information from many sources: Twitter, GitHub, arXiv, blogs, etc.
Many codes are freely available. This is valuable because you can evaluate the code yourself if you want, and you may get a chance to make new findings.
I really respect all the developers and feel that I have to learn more and more…
Anyway, I think DGL is a very useful package for chemoinformaticians who are interested in graph-based DL. ;-)
Today's code can be checked at the following URL.