RDKit is updated constantly, not only its functionality but also its documentation.
Today I read “Getting Started with the RDKit in Python” and found some interesting code.
http://www.rdkit.org/docs/GettingStartedInPython.html
With it, I can get the substructure behind each Morgan fingerprint bit as a SMILES string. Sample code follows.
FindAtomEnvironmentOfRadiusN returns the indices of the bonds within a given radius of an atom in a molecule.
PathToSubmol then extracts those bonds as a new molecule, i.e. the substructure around that atom.
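To make the radius idea concrete, here is a plain-Python sketch that collects bonds by breadth-first search from a root atom over a toy bond list: each layer adds every bond touching an atom reached in the previous layer. It then lists the atoms those bonds span, roughly what a submolecule built from them would keep. The functions atom_environment_bonds and env_atoms and the n-butanol graph are hypothetical illustrations, not RDKit code, and the sketch ignores the extra options the real function accepts.

```python
def atom_environment_bonds(bonds, root, radius):
    """Hypothetical sketch: indices of bonds collected in `radius`
    breadth-first layers starting at atom `root` (not RDKit's code).

    bonds: list of (atom_a, atom_b) pairs; a bond's index is its position.
    """
    # Adjacency list: atom -> [(neighbour atom, bond index), ...]
    adj = {}
    for idx, (a, b) in enumerate(bonds):
        adj.setdefault(a, []).append((b, idx))
        adj.setdefault(b, []).append((a, idx))

    env = set()            # bond indices collected so far
    seen_atoms = {root}
    frontier = [root]      # atoms whose bonds are added in this layer
    for _ in range(radius):
        next_frontier = []
        for atom in frontier:
            for nbr, bidx in adj.get(atom, []):
                env.add(bidx)
                if nbr not in seen_atoms:
                    seen_atoms.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return sorted(env)


def env_atoms(bonds, env):
    """Atoms spanned by the environment bonds (what a submolecule would keep)."""
    atoms = set()
    for idx in env:
        atoms.update(bonds[idx])
    return sorted(atoms)


# Toy heavy-atom graph of n-butanol: C0-C1-C2-C3-O4
butanol = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(atom_environment_bonds(butanol, 2, 1))   # [1, 2]
print(atom_environment_bonds(butanol, 2, 2))   # [0, 1, 2, 3]
print(env_atoms(butanol, atom_environment_bonds(butanol, 0, 2)))  # [0, 1, 2]
print(atom_environment_bonds(butanol, 2, 0))   # []
```

With radius 0 the search never leaves the root atom and collects no bonds, which is why radius-0 entries produce an empty SMILES and are filtered out by the `if smi != ''` check in circularfrag.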
from rdkit import Chem
from rdkit.Chem import AllChem
from collections import defaultdict

# Return the substructure behind each fingerprint bit as SMILES.
def circularfrag(mol, radius=2, nBits=1024, useFeatures=True):
    bitInfo = {}
    fragmol = defaultdict(list)
    # bitInfo is filled as {bit: ((atom_idx, radius), ...)}
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=radius, nBits=nBits,
                                               bitInfo=bitInfo, useFeatures=useFeatures)
    for bit, info in bitInfo.items():
        for atmidx, rad in info:
            # Argument order is (mol, radius, rootedAtAtom)
            env = Chem.FindAtomEnvironmentOfRadiusN(mol, rad, atmidx)
            amap = {}
            submol = Chem.PathToSubmol(mol, env, atomMap=amap)
            smi = Chem.MolToSmiles(submol)
            # Radius-0 environments contain no bonds, so their SMILES is empty
            if smi != '':
                fragmol[bit].append(smi)
    return fragmol
Let's test the code, using sitagliptin as an example.
mol = Chem.MolFromSmiles("Fc1cc(c(F)cc1F)C[C@@H](N)CC(=O)N3Cc2nnc(n2CC3)C(F)(F)F")
frag = circularfrag(mol)
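frag is a dictionary keyed by bit number; each value lists the SMILES of the fragments that set that bit. Note that the output below was generated with the atom index passed as the radius argument of FindAtomEnvironmentOfRadiusN (its signature is mol, radius, rootedAtAtom), so many fragments are much larger than a true radius-2 environment.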
frag
defaultdict(list,
            {0: ['CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F', 'cCN(CC)C(=O)CC(N)Cc1cc(F)c(F)cc1F', 'cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F', 'Cc1nnc2CN(CCn12)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             2: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             4: ['cF', 'cc(c)F', 'ccc(F)c(c)F', 'Cc1ccc(F)c(F)c1', 'CC(N)Cc1cc(F)c(F)cc1F', 'CCC(N)Cc1cc(F)c(F)cc1F'],
             8: ['[CH]Cc1cc(F)c(F)cc1F', 'NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             19: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             152: ['cc(C)cc(c)F'],
             349: ['[CH]Cc1cc(F)c(F)cc1F', 'cc(c)F', 'NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             428: ['Cc1ccc(F)c(F)c1'],
             495: ['NC(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             615: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             671: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             699: ['CC(N)Cc1cc(F)c(F)cc1F', 'CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             708: ['CC(N)Cc1cc(F)c(F)cc1F'],
             713: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             758: ['ccc', 'CN(C)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             792: ['Cc1nnc2CN(CCn12)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             801: ['[CH]Cc1cc(F)c(F)cc1F'],
             804: ['cCN(CC)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             901: ['cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
             970: ['NC(CC(=O)N1CCn2c(C1)nnc2C(F)(F)F)Cc1cc(F)c(F)cc1F'],
             993: ['ccc(F)c(c)F', 'CCC(N)Cc1cc(F)c(F)cc1F'],
             1005: ['cn1CCN(Cc1nn)C(=O)CC(N)Cc1cc(F)c(F)cc1F']})
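Because frag is keyed by bit, one way to explore it is to invert it, mapping each SMILES to the bits it appears under. This quickly shows fragments that occur under several bits, as 'nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F' does for bits 19, 615 and 671 above. The helper bits_by_fragment is a hypothetical name, and the dictionary below is just a subset of the output copied verbatim; in practice you would pass the frag returned by circularfrag.

```python
from collections import defaultdict


def bits_by_fragment(frag):
    """Hypothetical helper: invert {bit: [smiles, ...]} into {smiles: sorted bits}."""
    inverted = defaultdict(set)
    for bit, smiles_list in frag.items():
        for smi in smiles_list:
            inverted[smi].add(bit)
    return {smi: sorted(bits) for smi, bits in inverted.items()}


# A subset of the output above, copied verbatim
frag = {
    4: ['cF', 'cc(c)F', 'ccc(F)c(c)F', 'Cc1ccc(F)c(F)c1',
        'CC(N)Cc1cc(F)c(F)cc1F', 'CCC(N)Cc1cc(F)c(F)cc1F'],
    19: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
    428: ['Cc1ccc(F)c(F)c1'],
    615: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
    671: ['nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F'],
}

inverted = bits_by_fragment(frag)
# Keep only fragments that occur under more than one bit
shared = {smi: bits for smi, bits in inverted.items() if len(bits) > 1}
for smi, bits in shared.items():
    print(bits, smi)
# [4, 428] Cc1ccc(F)c(F)c1
# [19, 615, 671] nc1CN(CCn1)C(=O)CC(N)Cc1cc(F)c(F)cc1F
```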
https://github.com/iwatobipen/ipynotebooks/blob/master/morgan_fragmentation.ipynb