Calculate descriptors using propbox

I found some useful code via the RDKit mailing list.
Propbox is a Python-based tool written by Andrew Dalke.
It is available from the following URL:
https://bitbucket.org/dalke/propbox

It is easy to run: just download and extract the zip archive, and it is ready to use.

iwatobipen$ pwd
/Users/iwatobipen/Desktop/dalke-propbox-f06a4a4b688e

iwatobipen$ tree
.
├── COPYING
├── COPYING.pylru
├── README
├── propbox
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── nci.py
│   ├── nci.pyc
│   ├── pylru.py
│   ├── pylru.pyc
│   ├── rdkit_descriptors.py
│   ├── rdkit_descriptors.pyc
│   ├── rdkit_toolkit.py
│   ├── rdkit_toolkit.pyc
│   ├── rdprops.py
│   ├── rdprops.pyc
│   ├── simple_futures.py
│   └── simple_futures.pyc
├── rdprops
└── tests
    ├── CHEMBL11862.sdf
    ├── benzodiazepine.smi
    ├── drugs.smi
    └── test_api.py

2 directories, 22 files 

I used the sample file ‘drugs.smi’ in the ./tests directory.

iwatobipen$ cat tests/drugs.smi 
N12CCC36C1CC(C(C2)=CCOC4CC5=O)C4C3N5c7ccccc76 Strychnine
c1ccccc1C(=O)OC2CC(N3C)CCC3C2C(=O)OC cocaine
COc1cc2c(ccnc2cc1)C(O)C4CC(CC3)C(C=C)CN34 quinine
OC(=O)C1CN(C)C2CC3=CCNc(ccc4)c3c4C2=C1 lyseric acid
CCN(CC)C(=O)C1CN(C)C2CC3=CNc(ccc4)c3c4C2=C1 LSD
C123C5C(O)C=CC2C(N(C)CC1)Cc(ccc4O)c3c4O5 morphine
C123C5C(OC(=O)C)C=CC2C(N(C)CC1)Cc(ccc4OC(=O)C)c3c4O5 heroin
c1ncccc1C1CCCN1C nicotine
CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12 caffeine
C1C(C)=C(C=CC(C)=CC=CC(C)=CCO)C(C)(C)C1 vitamin a
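Each line of the file is a SMILES string followed by a name, and the name can contain spaces (e.g. ‘lyseric acid’). Here is a minimal Python sketch for reading such a file. It assumes the first whitespace-delimited token is the SMILES and the rest of the line is the name. The helper names are hypothetical and not part of propbox, and propbox's own parser may work differently.

```python
def parse_smiles_line(line):
    """Split one .smi line into (smiles, name); the name may contain spaces."""
    # Assumption: SMILES comes first, everything after the first run of
    # whitespace is the name.
    parts = line.strip().split(None, 1)
    if not parts:
        return None  # blank line
    smiles = parts[0]
    name = parts[1] if len(parts) > 1 else ""
    return smiles, name


def read_smiles_file(path):
    """Return a list of (smiles, name) tuples, skipping blank lines."""
    records = []
    with open(path) as handle:
        for line in handle:
            record = parse_smiles_line(line)
            if record is not None:
                records.append(record)
    return records
```

For example, read_smiles_file("tests/drugs.smi") should give one (smiles, name) tuple per line shown above.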

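Run the script.
The --columns argument accepts the names of descriptors available in RDKit.
As an example I chose HeavyAtomCount, TPSA and MolLogP (HeavyAtomCount appears twice in the command, so it is printed in two columns).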

iwatobipen$ time ./rdprops tests/drugs.smi --columns id,smiles,HeavyAtomCount,TPSA,MolLogP,HeavyAtomCount
id	smiles	HeavyAtomCount	TPSA	MolLogP	HeavyAtomCount
Strychnine	O=C1CC2OCC=C3CN4CCC56c7ccccc7N1C5C2C3CC46	25	32.78	2.0925	25
cocaine	COC(=O)C1C(OC(=O)c2ccccc2)CC2CCC1N2C	22	55.84	1.8677	22
quinine	C=CC1CN2CCC1CC2C(O)c1ccnc2ccc(OC)cc12	24	45.59	3.1732	24
lyseric acid	CN1CC(C(=O)O)C=C2c3cccc4c3C(=CCN4)CC21	21	52.57	2.2974	21
LSD	CCN(CC)C(=O)C1C=C2c3cccc4[nH]cc(c34)CC2N(C)C1	24	39.34	2.906	24
morphine	CN1CCC23c4c5ccc(O)c4OC2C(O)C=CC3C1C5	21	52.93	1.1981	21
heroin	CC(=O)Oc1ccc2c3c1OC1C(OC(C)=O)C=CC4C(C2)N(C)CCC341	27	65.07	1.9886	27
nicotine	CN1CCCC1c1cccnc1	12	16.13	1.8483	12
caffeine	Cn1c(=O)c2c(ncn2C)n(C)c1=O	14	61.82	-1.0293	14
vitamin a	CC(C=CC1=C(C)CCC1(C)C)=CC=CC(C)=CCO	20	20.23	5.1202	20

real	0m0.306s
user	0m0.198s
sys	0m0.098s
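The output is a plain table with a header row, so it is easy to load into Python for further work. Here is a minimal sketch, assuming the columns are tab-separated as in the output above. The function names are hypothetical and not part of propbox. I use csv.reader rather than csv.DictReader because a column name can appear more than once, as HeavyAtomCount does here.

```python
import csv
import io


def parse_rdprops_table(text):
    """Parse tab-separated text with a header row into (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [row for row in reader if row]  # skip empty lines
    return header, rows


def column(header, rows, name, convert=str):
    """Return the first column called `name`, converted with `convert`."""
    index = header.index(name)
    return [convert(row[index]) for row in rows]


# Two rows copied from the output above, for illustration.
sample = (
    "id\tsmiles\tHeavyAtomCount\tTPSA\tMolLogP\tHeavyAtomCount\n"
    "nicotine\tCN1CCCC1c1cccnc1\t12\t16.13\t1.8483\t12\n"
    "caffeine\tCn1c(=O)c2c(ncn2C)n(C)c1=O\t14\t61.82\t-1.0293\t14\n"
)
header, rows = parse_rdprops_table(sample)
tpsa = column(header, rows, "TPSA", float)
print(tpsa)
```

In practice the text would come from saving the rdprops output to a file and reading it back.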

Hmm, it’s easy to calculate descriptors.
The tool lists 124 column names, mostly descriptors plus a few utility columns such as id and smiles.

iwatobipen$ ./rdprops --list | wc
     124     124    1457

The full list is:

iwatobipen$ ./rdprops --list
_chargeDescriptors
BalabanJ
BertzCT
cansmiles
chargeDescriptorVersion
Chi0
Chi0n
Chi0v
Chi1
Chi1n
Chi1v
Chi2n
Chi2v
Chi3n
Chi3v
Chi4n
Chi4v
EState_VSA1
EState_VSA10
EState_VSA11
EState_VSA2
EState_VSA3
EState_VSA4
EState_VSA5
EState_VSA6
EState_VSA7
EState_VSA8
EState_VSA9
ExactMolWt
FractionCSP3
HallKierAlpha
HeavyAtomCount
HeavyAtomMolWt
id
input_format
input_mol
input_record
Ipc
Kappa1
Kappa2
Kappa3
LabuteASA
MaxAbsEStateIndex
MaxAbsPartialCharge
MaxEStateIndex
MaxPartialCharge
MinAbsEStateIndex
MinAbsPartialCharge
MinEStateIndex
MinPartialCharge
mol
MolLogP
MolMR
MolWt
MolWt_version
nci_iupac_name
nci_names
NHOHCount
NOCount
NumAliphaticCarbocycles
NumAliphaticHeterocycles
NumAliphaticRings
NumAromaticCarbocycles
NumAromaticHeterocycles
NumAromaticRings
NumHAcceptors
NumHDonors
NumHeteroatoms
NumRadicalElectrons
NumRotatableBonds
NumSaturatedCarbocycles
NumSaturatedHeterocycles
NumSaturatedRings
NumValenceElectrons
PEOE_VSA1
PEOE_VSA10
PEOE_VSA11
PEOE_VSA12
PEOE_VSA13
PEOE_VSA14
PEOE_VSA2
PEOE_VSA3
PEOE_VSA4
PEOE_VSA5
PEOE_VSA6
PEOE_VSA7
PEOE_VSA8
PEOE_VSA9
RingCount
SlogP_VSA1
SlogP_VSA10
SlogP_VSA11
SlogP_VSA12
SlogP_VSA2
SlogP_VSA3
SlogP_VSA4
SlogP_VSA5
SlogP_VSA6
SlogP_VSA7
SlogP_VSA8
SlogP_VSA9
smiles
SMR_VSA1
SMR_VSA10
SMR_VSA2
SMR_VSA3
SMR_VSA4
SMR_VSA5
SMR_VSA6
SMR_VSA7
SMR_VSA8
SMR_VSA9
TPSA
usmsmiles
VSA_EState1
VSA_EState10
VSA_EState2
VSA_EState3
VSA_EState4
VSA_EState5
VSA_EState6
VSA_EState7
VSA_EState8
VSA_EState9
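With this many names, typing a long --columns value by hand gets tedious. Here is a small Python sketch that picks names by prefix and joins them with commas, the same form used after --columns in the command above. The helper names are hypothetical, and I am assuming every name printed by --list is accepted by --columns.

```python
def select_columns(names, prefixes):
    """Keep names that start with any of the given prefixes, in input order."""
    return [name for name in names if name.startswith(tuple(prefixes))]


def columns_argument(names):
    """Join column names with commas, as in the --columns value."""
    return ",".join(names)


# A few names from the list above, abbreviated for illustration.
listed = ["MolLogP", "PEOE_VSA1", "PEOE_VSA10", "PEOE_VSA2", "TPSA"]
chosen = ["id", "smiles"] + select_columns(listed, ["PEOE_VSA"])
argument = columns_argument(chosen)
print(argument)
```

The printed string can then be pasted after --columns on the command line.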

I think this tool is useful for generating descriptors for machine learning / QSAR.
