Create matched molecular series with RDKit.

Recent versions of RDKit include a new module for matched molecular pair (MMP) analysis, named rdMMPA.
The module has a FragmentMol function that returns the fragments used for MMP analysis.
The function lets you set the maximum number of cuts and also the cutting rules.
That means RDKit gives chemoinformaticians a lot of flexibility.
I’m now trying to develop a web app for MMPA and MMPS.
So I tested the new function.
The following code was written in Python 3. (Dictionary objects do not have a has_key method there.)

from rdkit import Chem
from rdkit.Chem import rdMMPA
# I used the sample file cdk2.sdf that ships with RDKit.
# SDMolSupplier yields None for records it cannot parse, so skip those.
mols = [ mol for mol in Chem.SDMolSupplier("cdk2.sdf") if mol is not None ]
# Generate the fragment list using the rdMMPA module.
# Conditions: 1) single cut, 2) return the results as SMILES.
fragmentlist = [ rdMMPA.FragmentMol( mol, maxCuts=1, resultsAsMols=False ) for mol in mols ]

Now I have fragmentlist.
Let’s check it.

In [43]:fragmentlist[1]
Out[43]:(('', 'C1COC(C1)CO[*:1].Nc1nc(c2nc[nH]c2n1)[*:1]'),
 ('', 'N[*:1].c1nc2c(nc(nc2[nH]1)[*:1])OCC1CCCO1'),
 ('', 'C1COC(C1)C[*:1].Nc1nc(O[*:1])c2nc[nH]c2n1'),
 ('', 'C1COC(C1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'))

OK!
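
Each entry in the output above is a 2-tuple: the first element is an empty string and the second holds the two fragments joined by '.', each carrying the attachment point [*:1]. Here is a minimal sketch of pulling the two pieces apart, using the first entry shown above. The name split_single_cut is a hypothetical helper of mine, not part of RDKit, and it assumes the SMILES pair contains exactly one '.'.

```python
def split_single_cut(fragment):
    """Split one single-cut result tuple into its two fragment SMILES.

    Assumes the second element contains exactly one '.' separator,
    as in the single-cut output shown in this post.
    """
    _, smiles_pair = fragment
    first, second = smiles_pair.split('.')
    return first, second


# First entry of fragmentlist[1], copied from the output above.
entry = ('', 'C1COC(C1)CO[*:1].Nc1nc(c2nc[nH]c2n1)[*:1]')
first, second = split_single_cut(entry)
print(first)
print(second)
```

A molecule with several disconnected components (a salt, for example) would probably give more than one '.', so real data may need a length check before unpacking.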

Next, I built the MMS (matched molecular series) as a Python dictionary object.

fragdict = dict()
for fragments in fragmentlist:
    for fragment in fragments:
        # With a single cut, fragment[1] holds the two pieces joined by '.'.
        parts = fragment[1].split('.')
        chain, core = parts[0], parts[1]
        # Group the chains that share the same core.
        fragdict.setdefault( core, [] ).append( chain )
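
The dictionary above keys only on the second fragment of each pair. As a variation (not what I ran for this post), the sketch below indexes both orientations, so either piece can act as the constant part of a series. The function name build_series is hypothetical, and the sample tuples are rebuilt from fragment strings that appear in the results listed in this post.

```python
from collections import defaultdict


def build_series(fragmentlist):
    """Group single-cut fragments so that each piece maps to its partners."""
    series = defaultdict(list)
    for fragments in fragmentlist:
        for _, smiles_pair in fragments:
            parts = smiles_pair.split('.')
            if len(parts) != 2:
                # Skip anything that is not a clean two-piece cut.
                continue
            first, second = parts
            # Index both orientations of the pair.
            series[second].append(first)
            series[first].append(second)
    return series


# One single-cut entry per molecule, for three molecules.
sample = [
    (('', 'C1=CCC(CC1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'),),
    (('', 'C1CCC(CC1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'),),
    (('', 'C1COC(C1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'),),
]
series = build_series(sample)
for constant, variable in series.items():
    if len(variable) >= 3:
        print(constant, variable)
```

Indexing both ways doubles the size of the dictionary, so for a large data set it may be better to pick one orientation, for example by always treating the larger fragment as the constant part.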

The results were:

In[70]:
# Print the entries that have 3 or more fragments.
for k,v in fragdict.items():
    if len(v) >= 3:
        print(k, v)
        print( "="*20 )


OC[*:1] ['CC(C)C(Nc1nc(Nc2ccc(C(=O)[O-])c(Cl)c2)c2ncn(c2n1)C(C)C)[*:1]', 'CC(C)C(Nc1nc(Nc2cccc(Cl)c2)c2ncn(c2n1)C(C)C)[*:1]', 'CCC(Nc1nc(NCc2ccccc2)c2ncn(c2n1)C(C)C)[*:1]', 'COc1ccc(cc1)CNc1nc(nc2c1ncn2C(C)C)N(CCO)C[*:1]', 'Cn1cnc2c(nc(nc21)NC[*:1])NCc1ccccc1']
====================
Nc1nc(OC[*:1])c2nc[nH]c2n1 ['C1=CCC(CC1)[*:1]', 'C1CCC(CC1)[*:1]', 'C1COC(C1)[*:1]', 'CC(C)C(=O)[*:1]']
====================
c1ccc(cc1)C[*:1] ['CCC(CO)Nc1nc(N[*:1])c2ncn(c2n1)C(C)C', 'Cn1cnc2c(nc(nc21)NCCO)N[*:1]', 'O=C(c1ccccc1)c1cnc2n[nH]cc2c1O[*:1]', '[NH3+]C1CCC(CC1)Nc1nc(N[*:1])c2ncn(c2n1)C1CCCC1']
====================
C[*:1] ['CC(C(=O)COc1nc(N)nc2[nH]cnc12)[*:1]', 'CC(C)C(=O)Nc1ncc(SCc2ncc(C[*:1])o2)s1', 'CC(C)C(CO)Nc1nc(Nc2ccc(C(=O)[O-])c(Cl)c2)c2ncn(c2n1)C(C)[*:1]', 'CC(C)C(CO)Nc1nc(Nc2cccc(Cl)c2)c2ncn(c2n1)C(C)[*:1]', 'CC(C)n1cnc2c(nc(nc21)N(CCO)CCO)NCc1ccc(cc1)O[*:1]', 'CC(C)n1cnc2c(nc(nc21)NC(CO)C(C)[*:1])Nc1ccc(C(=O)[O-])c(Cl)c1', 'CC(C)n1cnc2c(nc(nc21)NC(CO)C(C)[*:1])Nc1cccc(Cl)c1', 'CC(C)n1cnc2c(nc(nc21)NC(CO)C[*:1])NCc1ccccc1', 'CCC(CO)Nc1nc(NCc2ccccc2)c2ncn(c2n1)C(C)[*:1]', 'CCc1cnc(CSc2cnc(NC(=O)C(C)[*:1])s2)o1', 'CN(C)NC(=O)Nc1cccc2-c3n[nH]c(-c4ccc(cc4)O[*:1])c3C(=O)c12', 'COc1ccc(cc1)-c1[nH]nc2-c3cccc(NC(=O)NN(C)[*:1])c3C(=O)c21', 'COc1ccc(cc1)CNc1nc(nc2c1ncn2C(C)[*:1])N(CCO)CCO']
====================
Cl[*:1] ['CC(C)C(CO)Nc1nc(Nc2ccc(C(=O)[O-])c(c2)[*:1])c2ncn(c2n1)C(C)C', 'CC(C)C(CO)Nc1nc(Nc2cccc(c2)[*:1])c2ncn(c2n1)C(C)C', 'C[NH+]1CCC(c2c(O)cc(O)c3c(=O)cc(oc23)-c2ccccc2[*:1])C(O)C1']
====================
c1ccc(cc1)[*:1] ['CCC(CO)Nc1nc(NC[*:1])c2ncn(c2n1)C(C)C', 'Cc1nc2ccccn2c1-c1ccnc(n1)N[*:1]', 'Cn1cnc2c(nc(nc21)NCCO)NC[*:1]', 'O=C(c1ccccc1)c1cnc2n[nH]cc2c1OC[*:1]', 'O=C(c1cnc2n[nH]cc2c1OCc1ccccc1)[*:1]', '[NH3+]C1CCC(CC1)Nc1nc(NC[*:1])c2ncn(c2n1)C1CCCC1']
====================
c1ccc(cc1)CN[*:1] ['CCC(CO)Nc1nc(c2ncn(c2n1)C(C)C)[*:1]', 'Cn1cnc2c(nc(nc21)NCCO)[*:1]', '[NH3+]C1CCC(CC1)Nc1nc(c2ncn(c2n1)C1CCCC1)[*:1]']
====================
F[*:1] ['CCCCOc1c(cnc2[nH]ncc12)C(=O)c1c(F)cc(Br)cc1[*:1]', 'COc1cc(-c2ccc[nH]2)c2C(=O)Nc3ccc(c1c32)[*:1]', 'Cc1ccc(c(c1)Nc1ccnc(n1)Nc1ccc(cc1)S(N)(=O)=O)[*:1]']
====================
N[*:1] ['C1=CCC(CC1)COc1nc(nc2[nH]cnc12)[*:1]', 'CC(C)C(=O)COc1nc(nc2[nH]cnc12)[*:1]', 'NC(=O)c1ccc(cc1)Nc1nc(OCC2CCCCC2)c(N=O)c(n1)[*:1]']
====================
O=[N+]([O-])[*:1] ['COc1cc[nH]c1/C=C1\\C(=O)Nc2ccc(c(c21)N1CCCC(C1)C(N)=O)[*:1]', 'COc1cc[nH]c1/C=C1\\C(=O)Nc2ccc(cc21)[*:1]', 'NS(=O)(=O)c1ccc(cc1)Nc1cc([nH]n1)-c1ccc(cc1)[*:1]']
====================

It seems to work fine.
This is a very simple example. Next, I’ll build MMS with assay data to make medchem data analysis easier.
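
One possible way to attach assay data is to keep the molecule index next to each fragment and look the value up afterwards. The sketch below is only an assumption about how that could look: build_series_with_ids and rank_series are hypothetical names, and the activity numbers are placeholders, not measured values for the cdk2 set.

```python
from collections import defaultdict


def build_series_with_ids(fragmentlist):
    """Map each core to a list of (chain, molecule index) pairs."""
    series = defaultdict(list)
    for mol_idx, fragments in enumerate(fragmentlist):
        for _, smiles_pair in fragments:
            parts = smiles_pair.split('.')
            if len(parts) != 2:
                continue
            chain, core = parts
            series[core].append((chain, mol_idx))
    return series


def rank_series(members, activity):
    """Sort series members by activity, highest first.

    Molecules without an activity value are skipped.
    """
    scored = [(chain, activity[idx]) for chain, idx in members if idx in activity]
    return sorted(scored, key=lambda item: item[1], reverse=True)


sample = [
    (('', 'C1=CCC(CC1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'),),
    (('', 'C1CCC(CC1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'),),
    (('', 'C1COC(C1)[*:1].Nc1nc(OC[*:1])c2nc[nH]c2n1'),),
]
# PLACEHOLDER activity values keyed by molecule index (not real assay data).
activity = {0: 5.0, 1: 6.0, 2: 7.0}

series = build_series_with_ids(sample)
ranked = rank_series(series['Nc1nc(OC[*:1])c2nc[nH]c2n1'], activity)
for chain, value in ranked:
    print(chain, value)
```

With real data, the index would need to match the order of the molecules used to build fragmentlist, so any filtering of the input has to happen before both steps.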

Reference:
https://www.nextmovesoftware.com/matsy.html
I think MATSY is interesting and should feel intuitive to medicinal chemists.
