Call CDK from Python

A few days ago, I wanted to use molecular signature descriptors, but RDKit can’t calculate them.

They are implemented in CDK.
Hmm, CDK... a Java library, and I’m not good at Java :-/.
I found old but useful information in CDK News and Noel’s blog!
The URL is as follows.
Jython is one solution, so I tested it.
I installed Jython using Homebrew and downloaded cdk-1.4.19.jar. ;-)
Then I set CLASSPATH to include the cdk-*.jar file (cdk-1.5.*.jar did not work).
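If you prefer not to touch your shell settings, the CLASSPATH step can be sketched as a small launcher helper. This is only a minimal sketch: the function name `jython_env` and the script name `calc_signature.py` are hypothetical, and I assume the jar sits in the current directory.

```python
import os


def jython_env(jar_path, base_env=None):
    """Return a copy of the environment with jar_path prepended to CLASSPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("CLASSPATH", "")
    if existing:
        # Keep whatever was already on the class path after the CDK jar
        env["CLASSPATH"] = jar_path + os.pathsep + existing
    else:
        env["CLASSPATH"] = jar_path
    return env


# Hypothetical usage (needs Jython on PATH, so it is left commented out):
# import subprocess
# subprocess.call(["jython", "calc_signature.py"], env=jython_env("cdk-1.4.19.jar"))
```

The helper only builds the environment, so it is safe to call even on a machine without Jython.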
Let’s start.

from org.openscience.cdk.io.iterator import IteratingMDLReader
from org.openscience.cdk import DefaultChemObjectBuilder
# Read every molecule in the SD file into a list
f = open('cdk_2.sdf','r')
mols = []
for mol in IteratingMDLReader( f, DefaultChemObjectBuilder.getInstance()):
    mols.append( mol )
I got 47 molecules.
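A quick way to cross-check that number without CDK is to count the `$$$$` lines that terminate each record in an SD file. The following is a minimal sketch in plain Python; the function name `count_sdf_records` and the tiny placeholder text are my own assumptions, not real molecule data.

```python
def count_sdf_records(text):
    """Count SD file records by counting the '$$$$' terminator lines."""
    return sum(1 for line in text.splitlines() if line.strip() == "$$$$")


# Placeholder text with two dummy records (not valid molfile blocks)
sample = "mol1\n...\nM  END\n$$$$\nmol2\n...\nM  END\n$$$$\n"
print(count_sdf_records(sample))
```

For the file used here, `count_sdf_records(open('cdk_2.sdf').read())` should agree with `len(mols)` as long as the reader did not skip any record.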

Next, I calculated the molecular signatures (MolSig).

from org.openscience.cdk import signature
# AtomSignature(2, m): signature rooted at the atom with index 2 in each molecule
for i, m in enumerate(mols):
    print str(i), signature.AtomSignature( 2, m )

0 [C]([C]([C]([H][H][H])[C]([C]([H][H][O]([C]([C](=[C]([N]([C][H])[N])[N](=[C]([H])))=[N]([C](=[N][N]([H][H]))))))=[O])[H])[H][H][H])
1 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C][H][H])[H][H])[H][O]([C]([H][H])))[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
2 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C][H][H])[H][H])[H][N]([C](=[O])[H]))[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
3 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C]([H][H])[H][H])[H][H])[C]([C]([C][H][H])[H][H])[H])[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
4 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C]([H])[H][H])[H][H])[C]([C](=[C][H])[H][H])[H])[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
5 [C]([H][N]([C]([H][H][H])[C]([C]=[N]([C](=[N][N]([C]([C]([H][H][O]([H]))[H][H])[H])))))=[N]([C](=[C]([N][N]([C]([C]([C](=[C]([C]([H])[H])[H])=[C]([C](=[C][H])[H]))[H][H])[H])))))
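To get a feeling for how to read these nested strings, here is a toy sketch in plain Python that builds a similar-looking tree string for ethanol. It is a simplified illustration only: it ignores bond orders and does not canonicalize the way CDK does, so it is not CDK's algorithm. The function name `atom_signature` and the atom and bond lists are my own assumptions.

```python
def atom_signature(atoms, bonds, root, height):
    """Toy signature: nested string of the atoms reachable from `root` within `height` bonds."""
    neighbours = dict((i, []) for i in range(len(atoms)))
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    def walk(atom, parent, depth):
        label = "[%s]" % atoms[atom]
        if depth == 0:
            return label
        # Recurse into every neighbour except the atom we came from,
        # then sort the child strings so the result is order independent
        children = sorted(walk(n, atom, depth - 1)
                          for n in neighbours[atom] if n != parent)
        if not children:
            return label
        return label + "(" + "".join(children) + ")"

    return walk(root, None, height)


# Ethanol with explicit hydrogens: C0-C1-O2, H3..H5 on C0, H6..H7 on C1, H8 on O2
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

print(atom_signature(atoms, bonds, 0, 1))  # [C]([C][H][H][H])
print(atom_signature(atoms, bonds, 0, 2))  # [C]([C]([H][H][O])[H][H][H])
```

Each atom is followed by a parenthesized list of its neighbours, which is the same nesting pattern as in the CDK output above.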

In the end, though, I used ECFP instead of MolSig for some reasons. ;-)

Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.

