Structure generation from query molecule using rdkit

I’m thinking about making an app for automatic structure generation that builds compounds from a query molecule (query_mol) and a fragment list.

The query mol is the molecule that will be modified, and the fragment list is the stock of fragment molecules.
I wrote a simple test script.
The following code is not smart; I’ll refine it ASAP.
(For now, it is difficult to handle a large dataset.)

My strategy is to define the reaction with “Xe” atoms marking the bond formation points,
and to fragment molecules with the FragmentOnBRICSBonds function of rdkit.
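
To make the string-handling side of this strategy concrete, here is a minimal sketch in plain Python (no rdkit needed). It assumes fragment SMILES of the shape that BRICS fragmentation writes, with isotope-labelled dummy atoms such as [16*]. The example string and the helper names tag_attachment_points and group_by_attachment_count are made up for illustration and are not part of rdkit.

```python
from collections import defaultdict

def tag_attachment_points(fragmented_smiles):
    """Swap dummy atoms (*) for Xe and split a dot-separated SMILES into fragments."""
    return fragmented_smiles.replace("*", "Xe").split(".")

def group_by_attachment_count(fragments):
    """Map the number of Xe attachment points to a sorted list of unique fragments."""
    groups = defaultdict(set)
    for smi in fragments:
        groups[smi.count("Xe")].add(smi)
    return {n: sorted(smis) for n, smis in groups.items()}

# Hypothetical BRICS-style string for a three-ring molecule (illustrative only,
# not copied from real rdkit output).
example = "[16*]c1ccccc1.[16*]c1ccc([16*])cc1.[16*]c1ccncc1"
frags = tag_attachment_points(example)
groups = group_by_attachment_count(frags)
print(groups)
```

Fragments with one Xe can only cap a chain, while fragments with two or more can sit in the middle, which is why the count is a useful key.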

from rdkit import Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import AllChem
from rdkit.Chem import Descriptors
import numpy as np
from rdkit.Chem.Draw import IPythonConsole

rxn = AllChem.ReactionFromSmarts('[*:1]-[Xe].[*:2]-[Xe]>>[*:1]-[*:2]')
# Tanimoto coefficient (Tc) calculator. Temporary function; I plan to use Reduced Graph Fingerprints later.
def calc_tanimoto(m1,m2):
    fp1 = AllChem.GetMorganFingerprintAsBitVect( m1,2 )
    fp2 = AllChem.GetMorganFingerprintAsBitVect( m2,2 )
    tc = DataStructs.TanimotoSimilarity( fp1, fp2 )
    return tc

#fragment mol using BRICS and return smiles list
def fragmenter( mol ):
    fgs = AllChem.FragmentOnBRICSBonds( mol )
    fgs_smi = Chem.MolToSmiles( fgs ).replace( "*", "Xe" ).split( "." )
    return fgs_smi

# check structure of start molecule
def check_querymol( fgs_smi ):
    res = [ smi.count("Xe") for smi in fgs_smi ]
    return res 

# generate fragment dictionary for design.
def gen_frag_dict( mol_list ):
    frag_dict = {}
    for mol in mol_list:
        fgs = fragmenter( mol )
        qmol = check_querymol( fgs )
        for i, j in enumerate( qmol ):
            if j in frag_dict.keys():
                frag_dict[ j ].add( str(fgs[i]) )
            else:
                # first fragment seen with j attachment points
                frag_dict[ j ] = set( [str(fgs[i])] )
    keys = frag_dict.keys()
    for key in keys:
        frag_dict[ key ] = list( frag_dict[key] )
    return frag_dict

# generate molecules
def struct_gen( query_mol, frag_dict ):
    q_frgs = fragmenter( query_mol )
    # get the query molecule's information (number of attachment points per fragment).
    q_des = check_querymol( q_frgs )
    q_des.sort( reverse=True )
    # pick a random starting fragment
    print( frag_dict[ q_des[0] ] )
    ps =  frag_dict[ q_des[0] ][ np.random.randint( len( frag_dict[ q_des[0] ] ) ) ] 
    ps = [ Chem.MolFromSmiles( ps ) ]
    for i in range( 1,len( q_des ) ):
        print( str(i)+" STEP" )
        #print(frag_dict[ q_des[i] ])
        ps = AllChem.EnumerateLibraryFromReaction( rxn, (ps, [Chem.MolFromSmiles(smi) for smi in frag_dict[ q_des[i] ]] ) )
        res = set()
        for p in ps:
            m = p[0]
            s = Chem.MolToSmiles(m)
            res.add( s )
        ps = [ Chem.MolFromSmiles( smi ) for smi in res ][:20]
    ps = [ mol for mol in ps if calc_tanimoto(query_mol,mol) <= 0.6 and Descriptors.MolWt(mol) <= 500 ]
    return ps

Then I tested the code.

from rdkit.Chem import Draw
mol = Chem.MolFromSmiles('CCCS(=O)(=O)Nc1ccc(F)c(c1F)C(=O)c2c[nH]c3c2cc(cn3)c4ccc(Cl)cc4')
mol2 = Chem.MolFromSmiles('Cc1ccccc1c1ccncc1')
mol3 = Chem.MolFromSmiles('CN1CCN(CC2=CC=C(C=C2)C(=O)NC2=CC(NC3=NC=CC(=N3)C3=CN=CC=C3)=C(C)C=C2)CC1')
a = gen_frag_dict([mol,mol2,mol3])
p = struct_gen(mol,a)

I got the following image.

The test code seems to work well.
As a next step, I’ll implement some more functions.

1) Random selection of fragments. (The current code selects molecules by indexing, which is not good for diversity.)
2) Fragment database builder instead of dictionary.
3) Suitable filter and optimiser for novel molecule generation.
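
Items 1 and 2 could be combined along these lines. This is only a sketch under assumptions: it uses the standard-library sqlite3 module as the fragment store, and the table name fragments, its column names, and the helpers build_fragment_db and sample_fragments are hypothetical, not part of rdkit or of the script above. Fragments are keyed by their number of Xe attachment points, and random sampling replaces taking a fixed slice.

```python
import random
import sqlite3

def build_fragment_db(fragment_smiles, path=":memory:"):
    """Store unique Xe-tagged fragment SMILES with their attachment-point count."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fragments ("
        "smiles TEXT PRIMARY KEY, n_attach INTEGER NOT NULL)"
    )
    # INSERT OR IGNORE silently skips fragments that are already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO fragments VALUES (?, ?)",
        [(smi, smi.count("Xe")) for smi in fragment_smiles],
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_n_attach ON fragments (n_attach)")
    conn.commit()
    return conn

def sample_fragments(conn, n_attach, k, rng=random):
    """Return up to k fragments with n_attach attachment points, chosen at random."""
    rows = conn.execute(
        "SELECT smiles FROM fragments WHERE n_attach = ? ORDER BY smiles",
        (n_attach,),
    ).fetchall()
    pool = [row[0] for row in rows]
    return rng.sample(pool, min(k, len(pool)))

conn = build_fragment_db(
    ["[16Xe]c1ccccc1", "[16Xe]c1ccncc1", "[16Xe]c1ccccc1", "[16Xe]c1ccc([16Xe])cc1"]
)
# A seeded generator makes a run repeatable; omit rng to use the global one.
picked = sample_fragments(conn, n_attach=1, k=2, rng=random.Random(0))
print(picked)
```

This version loads every matching fragment into memory before sampling. For a very large table, sampling inside SQLite with ORDER BY RANDOM() LIMIT ? would avoid that, at the cost of losing the seedable generator.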


Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.

4 thoughts on “Structure generation from query molecule using rdkit”

  1. Hi Pen,

    Very impressive blog, although it was written almost 4 years ago. I wonder if you have any updates for this type of molecule generator, such as handling a larger fragment library or using new functions in rdkit.


    1. Hi,

      I see! Sounds great. I will give it a try on my end with BRICS. BTW, I enjoy and benefit a lot from your blog. Thank you for the impressive work!

