As many RDKit users know, rdSubstructLibrary is one of the cool tools for conducting substructure searches. Greg Landrum introduced how to use it in his great blog post.
I love this method because it makes substructure searching very fast, so I wanted to make a CLI tool for building a substructure library database.
To do that, I used click, which is a useful package for making CLI tools.
Here is how to install the code.
$ gh repo clone iwatobipen/rdsss
$ cd rdsss
$ pip install -e .
After installing the package, three commands will be available.
1. make_rdssslib, which makes an sslib from an sdf.gz file
2. update_rdssslib, which updates an sslib with a new sdf.gz file
3. run_rdsss, which runs SSS with a given SMARTS query
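For readers who have not used click, here is a minimal sketch of how a command like run_rdsss could be wired up. It is not the actual rdsss source: the function name run_search, the --output option, and the echo-only body are assumptions made for illustration.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("query")
@click.argument("sslib", type=click.Path())
@click.option("--output", default="hits.csv", show_default=True,
              help="CSV file to write hits to (hypothetical option).")
def run_search(query, sslib, output):
    """Search SSLIB with the SMARTS QUERY (illustrative stub only)."""
    # A real command would load the library and run the search here;
    # this stub only reports the arguments it received.
    click.echo(f"query={query} sslib={sslib} output={output}")


# Invoke the command in-process, the same way the shell would call it.
result = CliRunner().invoke(run_search, ["c1ccccc1", "cdk2.sslib.pkl"])
print(result.output)
```

The two positional arguments mirror the query and library path that run_rdsss takes on the command line.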
The example is shown below.
# make ssslib from sdf.gz
$ make_rdssslib cdk2.sdf.gz cdk2.sslib.pkl
# search with ssslib from CLI
$ run_rdsss 'c1ccccc1' cdk2.sslib.pkl
After running run_rdsss, a hits.csv file will be generated.
$ cat hits.csv
Cn1cnc2c(NCc3ccccc3)nc(NCCO)nc21,ZINC01641925
CC[C@H](CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1,ZINC01649340
COc1ccc(CNc2nc(N(CCO)CCO)nc3c2ncn3C(C)C)cc1,ZINC01487345
COc1ccc2c(c1)/C(=C/c1cnc[nH]1)C(=O)N2,ZINC03814467
COc1cc[nH]c1/C=C1\C(=O)Nc2ccc([N+](=O)[O-])cc21,ZINC03814470
COc1cc(-c2ccc[nH]2)c2c3c(ccc(F)c13)NC2=O,ZINC00003491
[NH3+]CCSc1cc(-c2ccc[nH]2)c2c3c(ccc(F)c13)NC2=O,ZINC03814473
NC(=O)Nc1cccc2c1C(=O)c1c-2n[nH]c1-c1cccs1,ZINC03814477
The CSV file contains the SMILES and the _Name property of each hit.
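Since the file is plain "SMILES,name" rows without a header, it is easy to read back with the standard csv module. The snippet below is a small sketch that uses two rows copied from the output above in place of the real file.

```python
import csv
import io

# Two rows copied from the hits.csv output; a real script would use
# open("hits.csv", newline="") instead of this in-memory sample.
sample = io.StringIO(
    "Cn1cnc2c(NCc3ccccc3)nc(NCCO)nc21,ZINC01641925\n"
    "CC[C@H](CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1,ZINC01649340\n"
)

# Each row holds a SMILES string followed by the molecule name.
hits = [(smiles, name) for smiles, name in csv.reader(sample)]

for smiles, name in hits:
    print(f"{name}\t{smiles}")
```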
The whole process can be done from the CLI with this code. However, to handle a large sslib, I think users should run SSS in the interpreter, because the I/O for loading the sslib becomes the bottleneck when every query starts a new process.
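As a rough sketch of what running SSS in the interpreter could look like, the snippet below keeps one SubstructLibrary in memory and reuses it for several queries. It builds a toy library from three SMILES so that it is self-contained; with a real sslib you would load the saved library once instead. The molecule names and the search helper are made up for illustration and are not part of rdsss.

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

# Toy molecules standing in for a real SDF; names are hypothetical.
records = [
    ("c1ccccc1O", "phenol"),
    ("CCO", "ethanol"),
    ("c1ccncc1", "pyridine"),
]

# Build the library once. With a large sslib this step would be
# replaced by loading the saved library, which is the slow part.
library = rdSubstructLibrary.SubstructLibrary(
    rdSubstructLibrary.MolHolder(), rdSubstructLibrary.PatternHolder()
)
names = []
for smiles, name in records:
    library.AddMol(Chem.MolFromSmiles(smiles))
    names.append(name)


def search(smarts):
    """Return the names of library molecules matching a SMARTS query."""
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"invalid SMARTS: {smarts}")
    # Sort the indices so the result order does not depend on threading.
    indices = sorted(library.GetMatches(query))
    return [names[i] for i in indices]


# Several queries reuse the same in-memory library.
benzene_hits = search("c1ccccc1")
hydroxyl_hits = search("[OX2H]")
print(benzene_hits, hydroxyl_hits)
```

The loading cost is paid only once this way, and each additional query only costs the search itself.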
This code is still under development. Any advice or suggestions will be greatly appreciated.