Structure generation from a query molecule using RDKit

I'm thinking about making an app that automatically generates compounds from a query molecule (query_mol) and a fragment list.

The query molecule is the molecule to be modified, and the fragment list is the stock of fragments used to build new structures.
I wrote a simple test script.
The following code is not elegant; I'll refine it ASAP.
(In its current form it is difficult to handle large datasets.)

My strategy is to define a reaction that uses "Xe" as the bond-formation point,
and to analyse molecules with RDKit's FragmentOnBRICSBonds function.
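
To make the bookkeeping behind this strategy concrete, here is a minimal sketch in plain Python (no RDKit needed). It swaps the dummy atoms for "Xe" and groups fragments by their number of attachment points. The helper names to_xe and group_by_attachment_points are made up for this illustration, and the fragment SMILES are hand-written in the style of BRICS output, not actual RDKit results.

```python
from collections import defaultdict


def to_xe(smiles):
    """Swap dummy atoms ('*') for 'Xe', the same string trick the post uses."""
    return smiles.replace("*", "Xe")


def group_by_attachment_points(fragment_smiles):
    """Map number of Xe attachment points -> sorted list of unique fragments."""
    groups = defaultdict(set)
    for smi in fragment_smiles:
        # Each 'Xe' marks one position where a new bond can be formed.
        groups[smi.count("Xe")].add(smi)
    return {n: sorted(smis) for n, smis in groups.items()}


# Hand-written, illustrative fragments (an assumption, not computed by RDKit).
raw = "[16*]c1ccccc1.[5*]N[12*].[16*]c1ccccc1"
frags = [to_xe(s) for s in raw.split(".")]
groups = group_by_attachment_points(frags)
print(groups)
```

Fragments with one attachment point act as end caps, while fragments with two or more can sit in the middle of a generated molecule, which is why the count is a convenient dictionary key.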

from rdkit import Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import AllChem
from rdkit.Chem import Descriptors
import numpy as np
from rdkit.Chem.Draw import IPythonConsole

rxn = AllChem.ReactionFromSmarts('[*:1]-[Xe].[*:2]-[Xe]>>[*:1]-[*:2]')
# Tc (Tanimoto coefficient) calculator. Temporary function; I plan to use a Reduced Graph fingerprint later.
def calc_tanimoto(m1,m2):
    fp1 = AllChem.GetMorganFingerprintAsBitVect( m1,2 )
    fp2 = AllChem.GetMorganFingerprintAsBitVect( m2,2 )
    tc = DataStructs.TanimotoSimilarity( fp1, fp2 )
    return tc

# fragment a molecule on its BRICS bonds and return the fragments as a list of SMILES
def fragmenter( mol ):
    fgs = AllChem.FragmentOnBRICSBonds( mol )
    fgs_smi = Chem.MolToSmiles( fgs ).replace( "*", "Xe" ).split( "." )
    return fgs_smi

# count the attachment points (Xe) in each fragment
def check_querymol( fgs_smi ):
    res = [ smi.count("Xe") for smi in fgs_smi ]
    return res 

# generate fragment dictionary for design.
def gen_frag_dict( mol_list ):
    frag_dict = {}
    for mol in mol_list:
        fgs = fragmenter( mol )
        qmol = check_querymol( fgs )
        for i, j in enumerate( qmol ):
            if j in frag_dict.keys():
                frag_dict[ j ].add( str(fgs[i]) )
            else:
                # first fragment with this number of attachment points
                frag_dict[ j ] = set( [str(fgs[i])] )
    keys = frag_dict.keys()
    for key in keys:
        frag_dict[ key ] = list( frag_dict[key] )
    return frag_dict

# generate molecules
def struct_gen( query_mol, frag_dict ):
    q_frgs = fragmenter( query_mol )
    # get the query molecule's information (attachment points per fragment).
    q_des = check_querymol( q_frgs )
    q_des.sort( reverse=True )
    # select the starting fragment at random
    print( frag_dict[ q_des[0] ] )
    ps =  frag_dict[ q_des[0] ][ np.random.randint( len( frag_dict[ q_des[0] ] ) ) ] 
    ps = [ Chem.MolFromSmiles( ps ) ]
    for i in range( 1,len( q_des ) ):
        print( str(i)+" STEP" )
        #print(frag_dict[ q_des[i] ])
        ps = AllChem.EnumerateLibraryFromReaction( rxn, (ps, [Chem.MolFromSmiles(smi) for smi in frag_dict[ q_des[i] ]] ) )
        res = set()
        for p in ps:
            m = p[0]
            s = Chem.MolToSmiles(m)
            res.add( s )
        ps = [ Chem.MolFromSmiles( smi ) for smi in res ][:20]
    ps = [ mol for mol in ps if calc_tanimoto(query_mol,mol) <= 0.6 and Descriptors.MolWt(mol) <= 500 ]
    return ps
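
The last line of struct_gen keeps only products whose Tanimoto coefficient (Tc) to the query is at most 0.6 and whose molecular weight is at most 500. As a minimal sketch of what Tc measures, assume a fingerprint is just a set of on-bit indices. The function name tanimoto and the example bit sets are invented for illustration; RDKit's DataStructs.TanimotoSimilarity works on its own bit-vector type.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Made-up fingerprints: 2 shared bits out of 6 distinct bits.
query_bits = {1, 2, 3, 4}
candidate_bits = {3, 4, 5, 6}
tc = tanimoto(query_bits, candidate_bits)

# Same cutoff as in struct_gen: keep candidates that are not too similar.
keep = tc <= 0.6
```

A cutoff of 0.6 therefore discards products that share most of their fingerprint bits with the query, pushing the output toward structures that differ from the starting molecule.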

Then I tested the code.

from rdkit.Chem import Draw
mol = Chem.MolFromSmiles('CCCS(=O)(=O)Nc1ccc(F)c(c1F)C(=O)c2c[nH]c3c2cc(cn3)c4ccc(Cl)cc4')
mol2 = Chem.MolFromSmiles('Cc1ccccc1c1ccncc1')
mol3 = Chem.MolFromSmiles('CN1CCN(CC2=CC=C(C=C2)C(=O)NC2=CC(NC3=NC=CC(=N3)C3=CN=CC=C3)=C(C)C=C2)CC1')
a = gen_frag_dict([mol,mol2,mol3])
p = struct_gen(mol,a)

I got the following image.

The test code seems to work well.
As a next step, I'll implement the following:

1) Random selection of fragments. (The current code selects molecules by indexing, which is not good for diversity.)
2) A fragment database builder instead of a dictionary.
3) A suitable filter and optimiser for novel molecule generation.
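
Items 1) and 2) could be prototyped together with the standard library alone. The following is only a sketch under assumptions: the table layout, the function names build_fragment_db and sample_fragments, and the demo fragments are all hypothetical, and SQLite is just one possible backend.

```python
import random
import sqlite3


def build_fragment_db(frag_dict, path=":memory:"):
    """Store {n_attachment_points: [smiles, ...]} in a SQLite table."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS fragments ("
        "smiles TEXT PRIMARY KEY, n_points INTEGER NOT NULL)"
    )
    rows = [(smi, n) for n, smis in frag_dict.items() for smi in smis]
    # INSERT OR IGNORE silently skips SMILES that are already stored.
    con.executemany("INSERT OR IGNORE INTO fragments VALUES (?, ?)", rows)
    con.commit()
    return con


def sample_fragments(con, n_points, k, rng=None):
    """Return up to k fragments with n_points attachment points, chosen at random."""
    if rng is None:
        rng = random.Random()
    cur = con.execute(
        "SELECT smiles FROM fragments WHERE n_points = ? ORDER BY smiles",
        (n_points,),
    )
    pool = [row[0] for row in cur]
    # Sampling without replacement, instead of always taking the first k.
    return rng.sample(pool, min(k, len(pool)))


# Hand-written demo fragments (illustrative only).
demo = {1: ["[Xe]c1ccccc1", "C[Xe]", "CC[Xe]"], 2: ["[Xe]C(=O)[Xe]"]}
con = build_fragment_db(demo)
picked = sample_fragments(con, n_points=1, k=2, rng=random.Random(0))
```

Passing a seeded random.Random makes a run reproducible, and sampling from the whole pool avoids the bias of slicing with [:20].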

