Estimation of Synthetic Accessibility Score

RDKit 2013_09 was released on 2013-11-02.
Lots of features were implemented. 🙂
I’m interested in “Ertl and Schuffenhauer’s Synthetic Accessibility score”.
The script is in the “rdkit/Contrib/SA_Score/” folder.

First, I got some data from ChEMBL.
These molecules were downloaded as an SDF file.
Then I changed line 150 of the script from “suppl = Chem.SmilesMolSupplier(sys.argv[1])” to “suppl = Chem.SDMolSupplier(sys.argv[1])”.
I did this because a SMILES file sometimes does not have the _Name property, but an SDF does.
Now I can use an SD file as the query.
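
To show where the name in an SDF comes from, here is a minimal sketch using only the standard library. It assumes the usual SDF layout, where each record ends with a "$$$$" line and the first line of each record is the title; RDKit exposes that title line as the _Name property. The helper name `sdf_titles` and the two sample records are hypothetical, made up just for illustration.

```python
def sdf_titles(sdf_text):
    """Return the title (first) line of every record in an SDF string."""
    titles = []
    at_record_start = True
    # rstrip() avoids counting trailing blank lines as an extra record
    for line in sdf_text.rstrip().splitlines():
        if at_record_start:
            titles.append(line.strip())
            at_record_start = False
        elif line.strip() == "$$$$":
            # "$$$$" closes a record, so the next line is a new title
            at_record_start = True
    return titles


# Two hypothetical single-atom records, only to exercise the function
SAMPLE_SDF = """\
CHEMBL_A
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$
CHEMBL_B
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$
"""

titles = sdf_titles(SAMPLE_SDF)
print(titles)
```

If the title line of a record is blank, the name comes back as an empty string, so it is worth checking the SDF itself when names look empty.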
OK, let’s calculate synthetic accessibility score.
I ran it from the terminal.
I calculated the scores of 272 molecules.

iwatobipen$ python sascorer.py data/cmpd_download_1.sdf > result.txt
Reading took 7.07 seconds. Calculating took 0.72 seconds
iwatobipen$ wc result.txt
     272     545   15318 result.txt 
iwatobipen$ head -n 20 result.txt 
smiles	Name	sa_score
CCN=C(NCC)NCCCCC(NC(=O)C(Cc1ccc(O)cc1)NC(=O)C(CO)NC(=O)C(Cc1cccnc1)NC(=O)C(Cc1ccc(Cl)cc1)NC(=O)C(Cc1ccc2ccccc2c1)NC(C)=O)C(=O)NC(CC(C)C)C(=O)NC(CCCCNC(=NCC)NCC)C(=O)N1CCCC1C(=O)NC(C)C(N)=O		6.830780
CC(=O)NC(Cc1ccc2ccccc2c1)C(=O)NC(Cc1ccc(Cl)cc1)C(=O)NC(Cc1cccnc1)C(=O)NC(CO)C(=O)N(C)C(Cc1ccc(O)cc1)C(=O)NC(CC(N)=O)C(=O)NC(CC(C)C)C(=O)NC(CCCCNC(C)C)C(=O)N1CCCC1C(=O)NC(C)C(N)=O		6.301321
OCc1cc(C(O)CNCCCCCCOCCCCc2ccccc2)ccc1O		2.719981
c1ccc2c(c1)CCCC2C1=NCCN1		2.979165
CNS(=O)(=O)CCc1ccc2[nH]cc(C3CCN(C)CC3)c2c1		2.494668
CC(C)(C)NCC(O)COc1ccccc1C1CCCC1		2.566913
CC(=O)Oc1c(C)cc(OCC(O)CNC(C)C)c(C)c1C		2.837756
NC(=O)C1CCCN1C(=O)C(Cc1cnc[nH]1)NC(=O)C1CCC(=O)N1		3.498752
CC(C[N+](C)(C)C)OC(N)=O		3.513955
CN(C)CCN(Cc1ccccc1)c1ccccn1		1.946921
CCCN(CCc1cccs1)C1CCc2c(O)cccc2C1		2.923692
c1ccc(CN(CC2=NCCN2)c2ccccc2)cc1		2.238013
CN(C)CCc1c[nH]c2ccc(CC3COC(=O)N3)cc12		3.055917
CSCCC(NC(=O)C(Cc1c[nH]c2ccccc12)NC(=O)CCNC(=O)OC(C)(C)C)C(=O)NC(CC(=O)O)C(=O)NC(Cc1ccccc1)C(N)=O		4.013013
COc1ccc(CC(C)NCC(O)c2ccc(O)c(NC=O)c2)cc1		3.010938
Cc1ccc(O)c(C(CCN(C(C)C)C(C)C)c2ccccc2)c1		2.647818
CCSc1ccc2c(c1)N(CCCN1CCN(C)CC1)c1ccccc1S2		2.432617
O=c1[nH]c2ccccc2n1C1CCN(CCCC(c2ccc(F)cc2)c2ccc(F)cc2)CC1		2.366820
N=C(N)NCCCC(NC(=O)C1CCCN1C(=O)C1CSSCCC(=O)NC(Cc2ccc(O)cc2)C(=O)NC(Cc2ccccc2)C(=O)NC(CCC(N)=O)C(=O)NC(CC(N)=O)C(=O)N1)C(=O)NCC(N)=O		5.924429
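
The result file is tab-separated with a "smiles / Name / sa_score" header, so it is easy to summarize afterwards. This is a minimal sketch with the standard csv module; the helper name `load_scores` is my own, and the sample rows are three lines copied from the output above rather than the whole file.

```python
import csv
import io


def load_scores(handle):
    """Read tab-separated rows with a smiles / Name / sa_score header."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["smiles"], row["Name"], float(row["sa_score"])) for row in reader]


# Three rows taken from the output shown above
SAMPLE = (
    "smiles\tName\tsa_score\n"
    "CN(C)CCN(Cc1ccccc1)c1ccccn1\t\t1.946921\n"
    "OCc1cc(C(O)CNCCCCCCOCCCCc2ccccc2)ccc1O\t\t2.719981\n"
    "CC(C[N+](C)(C)C)OC(N)=O\t\t3.513955\n"
)

rows = load_scores(io.StringIO(SAMPLE))
easiest = min(rows, key=lambda r: r[2])   # lowest score = easiest to make
hardest = max(rows, key=lambda r: r[2])   # highest score = hardest to make
mean_score = sum(r[2] for r in rows) / len(rows)

print("easiest:", easiest[0], easiest[2])
print("hardest:", hardest[0], hardest[2])
print("mean:", round(mean_score, 3))
```

For the real file, `load_scores(open("result.txt"))` should work the same way, assuming the header line is present as shown.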

The method, which characterizes a molecule’s synthetic accessibility as a score between 1 (easy to make) and 10 (very difficult to make), is described in the ref.
Hmm, it’s interesting.
The score is calculated from fragmentScore and complexityPenalty.
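
As a rough illustration of how the two terms might combine, here is a minimal sketch. It assumes the raw value is fragmentScore minus complexityPenalty, as in Ertl and Schuffenhauer’s paper, and that the raw value is then mapped linearly onto the 1 to 10 scale. The function name `sa_score_sketch` and the bounds `raw_min` / `raw_max` are illustrative assumptions, not values from this post, and the actual script may scale and smooth the value differently.

```python
def sa_score_sketch(fragment_score, complexity_penalty,
                    raw_min=-4.0, raw_max=2.5):
    """Map (fragment_score - complexity_penalty) onto a 1..10 scale.

    raw_min and raw_max are assumed bounds for the raw value, chosen
    only for illustration.
    """
    raw = fragment_score - complexity_penalty
    # Keep the raw value inside the assumed bounds
    raw = max(raw_min, min(raw_max, raw))
    # High raw value (common fragments, low complexity) -> 1 (easy)
    # Low raw value (rare fragments, high complexity)   -> 10 (hard)
    return 10.0 - 9.0 * (raw - raw_min) / (raw_max - raw_min)


# A larger complexity penalty pushes the score toward "difficult"
print(sa_score_sketch(0.0, 0.5))
print(sa_score_sketch(0.0, 1.0))
```

The point is only the direction of each term: common fragments pull the score down toward 1, while complexity pushes it up toward 10.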
It was interesting that the score correlated with the intuition of highly experienced medicinal chemists.
I’ll try to get more data!

Please see details here.

