Get bit information with RDKit

RDKit is updated constantly not only function but also document.
Today I read “Getting started with python” and found interesting code.
Now I can get bit information as smiles strings. Sample code is following.

FindAtomEnvironmentOfRadiusN finds the bonds within a certain radius of an atom in a molecule.
PathToSubmol returns molecule around environment.

from rdkit import Chem
from rdkit.Chem import AllChem
from collections import defaultdict
# define a function that return bit information as smiles.
def circularfrag( mol, radius=2,nBits = 1024, useFeatures = True ):
    bitInfo = {}
    fragmol = defaultdict( list )
    fp = AllChem.GetMorganFingerprintAsBitVect( mol,
                                                                                     useFeatures = useFeatures
    for bit, info in bitInfo.items():
        for atmidx, rad in info:
            env = Chem.FindAtomEnvironmentOfRadiusN( mol, atmidx, rad )
            amap = {}
            submol = Chem.PathToSubmol( mol, env, atomMap = amap  )
            smi = Chem.MolToSmiles( submol )
            if  smi != '':
                fragmol[ bit ].append( smi )
    return fragmol

Test the code. I used Sitagliptin as an example.

mol = Chem.MolFromSmiles( "Fc1cc(c(F)cc1F)C[C@@H](N)CC(=O)N3Cc2nnc(n2CC3)C(F)(F)F" )

frag is dictionary with bit as key.

Out [ ]:

            {0: ['CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F',
             2: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             4: ['cF',
             8: ['[CH]Cc1cc(F)c(F)cc1F', 'NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             19: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             152: ['cc(C)cc(c)F'],
             349: ['[CH]Cc1cc(F)c(F)cc1F',
             428: ['Cc1ccc(F)c(F)c1'],
             495: ['NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             615: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             671: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             699: ['CC(N)Cc1cc(F)c(F)cc1F', 'CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             708: ['CC(N)Cc1cc(F)c(F)cc1F'],
             713: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             758: ['ccc', 'CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             792: ['Cc1nnc2CN(CCn12)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             801: ['[CH]Cc1cc(F)c(F)cc1F'],
             804: ['cCN(CC)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             901: ['cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             970: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             993: ['ccc(F)c(c)F', 'CCC(N)Cc1cc(F)c(F)cc1F'],
             1005: ['cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F']})


