A few days ago, I wanted to use molecular signature descriptors, but RDKit can’t calculate them.
ref.
http://pubs.acs.org/doi/abs/10.1021/ci020345w
The descriptor is implemented in CDK.
Hmm, CDK... It’s a Java library, and I’m not good at Java :-/.
I found some old but good information in CDK News and on Noel’s blog!
The URL is the following:
https://www.redbrick.dcu.ie/~noel/CDKJython.html
Jython is one solution, so I tested it.
I installed Jython using Homebrew and downloaded cdk-1.4.19.jar. ;-)
Then I set CLASSPATH to the cdk-*.jar file (cdk-1.5.*.jar did not work).
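Before starting Jython, it can help to check that a CDK jar is really listed in CLASSPATH. The following is only a rough sketch in plain Python; the helper name `find_cdk_jars` is something I made up for this post and is not part of CDK or Jython.

```python
import os

def find_cdk_jars(classpath, sep=os.pathsep):
    """Return the CLASSPATH entries whose file name looks like a CDK jar.

    Hypothetical helper for this post, not a CDK or Jython API.
    """
    jars = []
    for entry in classpath.split(sep):
        entry = entry.strip()
        name = os.path.basename(entry)
        if name.startswith("cdk-") and name.endswith(".jar"):
            jars.append(entry)
    return jars

if __name__ == "__main__":
    # Reads the real environment variable; prints [] if CLASSPATH is unset.
    print(find_cdk_jars(os.environ.get("CLASSPATH", "")))
```

An empty list means the jar is not on the CLASSPATH, so the CDK imports in Jython would fail.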
Ready!
Let’s start.
    from org.openscience.cdk.io.iterator import IteratingMDLReader
    from org.openscience.cdk import DefaultChemObjectBuilder

    f = open('cdk_2.sdf','r')
    mols = []
    for mol in IteratingMDLReader( f, DefaultChemObjectBuilder.getInstance()):
        mols.append( mol )
    """
    len(mols)
    I got 47
    """
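As a cross-check on the number of molecules read, the records in an SDF file can also be counted without CDK, because each record ends with a line containing only `$$$$`. A minimal sketch in plain Python (the function name `count_sdf_records` and the toy input are made up for this post):

```python
def count_sdf_records(lines):
    """Count SDF records by their `$$$$` terminator lines."""
    return sum(1 for line in lines if line.strip() == "$$$$")

# Tiny made-up two-record input, only to show the call;
# a real SDF record has a full molfile block before each "$$$$".
example = ["mol A", "...", "$$$$", "mol B", "...", "$$$$"]
print(count_sdf_records(example))
```

For a real file, passing the open file object, e.g. `count_sdf_records(open('cdk_2.sdf'))`, should give the same number as `len(mols)`.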
Next, calculate the MolSig.
    from org.openscience.cdk import signature

    for i, m in enumerate(mols):
        # the first argument (2) is the index of the atom the signature is rooted at
        print str(i), signature.AtomSignature( 2, m )

Output:

    0 [C]([C]([C]([H][H][H])[C]([C]([H][H][O]([C]([C](=[C]([N]([C][H])[N])[N](=[C]([H])))=[N]([C](=[N][N]([H][H]))))))=[O])[H])[H][H][H])
    1 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C][H][H])[H][H])[H][O]([C]([H][H])))[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
    2 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C][H][H])[H][H])[H][N]([C](=[O])[H]))[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
    3 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C]([H][H])[H][H])[H][H])[C]([C]([C][H][H])[H][H])[H])[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
    4 [C]([C](=[C]([N][O]([C]([C]([C]([C]([C]([H])[H][H])[H][H])[C]([C](=[C][H])[H][H])[H])[H][H])))[N](=[C]))[N]([C]([H])[H])=[N]([C](=[N][N]([H][H]))))
    5 [C]([H][N]([C]([H][H][H])[C]([C]=[N]([C](=[N][N]([C]([C]([H][H][O]([H]))[H][H])[H])))))=[N]([C](=[C]([N][N]([C]([C]([C](=[C]([C]([H])[H])[H])=[C]([C](=[C][H])[H]))[H][H])[H])))))
    ...
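The loop above prints one atom signature per molecule. As far as I understand the reference, a molecular signature descriptor is built from the atom signatures of all atoms in a molecule by counting how often each distinct signature occurs. Below is a rough sketch of that counting step in plain Python. The helper `signature_counts` and the toy strings are my own illustration and were not produced by CDK.

```python
def signature_counts(atom_signatures):
    """Count how often each atom signature string occurs.

    Hypothetical helper: expects one signature string per atom.
    """
    counts = {}
    for sig in atom_signatures:
        counts[sig] = counts.get(sig, 0) + 1
    return counts

# Made-up signature-like strings for a methanol-like molecule,
# for illustration only; they are not real CDK output.
toy = [
    "[C]([H][H][H][O])",
    "[O]([C][H])",
    "[H]([C])",
    "[H]([C])",
    "[H]([C])",
    "[H]([O])",
]
counts = signature_counts(toy)
for sig in sorted(counts):
    print(sig + " " + str(counts[sig]))
```

In Jython, the input list could probably come from something like `[str(signature.AtomSignature(i, 1, m)) for i in range(m.getAtomCount())]`, assuming the three-argument constructor (atom index, height, molecule) behaves as I read it in the CDK API docs. I have not tested that part.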
It worked.
But in the end, I used ECFP instead of MolSig for some reasons. ;-)