Get bit information with RDKit

RDKit is updated constantly not only function but also document.
Today I read “Getting started with python” and found interesting code.
http://www.rdkit.org/docs/GettingStartedInPython.html
Now I can get bit information as smiles strings. Sample code is following.

FindAtomEnvironmentOfRadiusN finds the bonds within a certain radius of an atom in a molecule.
PathToSubmol returns molecule around environment.

from rdkit import Chem
from rdkit.Chem import AllChem
from collections import defaultdict
# define a function that return bit information as smiles.
def circularfrag( mol, radius=2,nBits = 1024, useFeatures = True ):
    bitInfo = {}
    fragmol = defaultdict( list )
    fp = AllChem.GetMorganFingerprintAsBitVect( mol,
                                                                                     radius=radius,
                                                                                     nBits=nBits,
                                                                                     bitInfo=bitInfo,
                                                                                     useFeatures = useFeatures
                                                                                     )
    for bit, info in bitInfo.items():
        for atmidx, rad in info:
            env = Chem.FindAtomEnvironmentOfRadiusN( mol, atmidx, rad )
            amap = {}
            submol = Chem.PathToSubmol( mol, env, atomMap = amap  )
            smi = Chem.MolToSmiles( submol )
            if  smi != '':
                fragmol[ bit ].append( smi )
    return fragmol

Test the code. I used Sitagliptin as an example.

mol = Chem.MolFromSmiles( "Fc1cc(c(F)cc1F)C[C@@H](N)CC(=O)N3Cc2nnc(n2CC3)C(F)(F)F" )
frag=circularfrag(mol)

frag is dictionary with bit as key.

frag
Out [ ]:

defaultdict(list,
            {0: ['CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F',
              'cCN(CC)C(=O)CC(N)Cc1cc(F)c(F)cc1F',
              'cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F',
              'Cc1nnc2CN(CCn12)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             2: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             4: ['cF',
              'cc(c)F',
              'ccc(F)c(c)F',
              'Cc1ccc(F)c(F)c1',
              'CC(N)Cc1cc(F)c(F)cc1F',
              'CCC(N)Cc1cc(F)c(F)cc1F'],
             8: ['[CH]Cc1cc(F)c(F)cc1F', 'NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             19: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             152: ['cc(C)cc(c)F'],
             349: ['[CH]Cc1cc(F)c(F)cc1F',
              'cc(c)F',
              'NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             428: ['Cc1ccc(F)c(F)c1'],
             495: ['NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             615: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             671: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             699: ['CC(N)Cc1cc(F)c(F)cc1F', 'CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             708: ['CC(N)Cc1cc(F)c(F)cc1F'],
             713: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             758: ['ccc', 'CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             792: ['Cc1nnc2CN(CCn12)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             801: ['[CH]Cc1cc(F)c(F)cc1F'],
             804: ['cCN(CC)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             901: ['cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             970: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             993: ['ccc(F)c(c)F', 'CCC(N)Cc1cc(F)c(F)cc1F'],
             1005: ['cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F']})

https://github.com/iwatobipen/ipynotebooks/blob/master/morgan_fragmentation.ipynb

Advertisement

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: