Open Source Lilly’s Chemoinformatics Package

In 2012, Lilly’s researchers published the Lilly MedChem Rules in J. Med. Chem. and released their code on GitHub. Since then, the rules have been used in many papers and chemoinformatics applications. This open source tool made a big impact on chemoinformatics. Several hours ago I found an interesting tweet from @jcheminf.

They reported a retrosynthesis algorithm. I think data-driven retrosynthetic analysis is a hot topic in chemoinformatics.

The article is from Lilly, and the authors uploaded the source code to GitHub. The URL is below.

Their implementation is different from Segler’s approach in ‘Learning to Plan Chemical Syntheses’. They do not use machine learning but a Reverse Reaction Template (RRT) based approach. An RRT defines a reaction rule, and it is extracted from mapped reaction data such as Lowe’s US patent dataset.

First they built an RRT repository and then used it for analysis. Once the repository exists, a researcher inputs a query structure, and the system searches for RRTs that are applicable to the query and records each one that matches.
I think the key point is how the RRTs are made. More details are described in the article.
They benchmarked their system with 919 known drug structures from DrugBank. The results seem to depend on the RRT settings, radius and support. A radius-0 RRT seems more general than a radius-2 RRT, much like the radius in ECFP (Table 2).
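
To get a feel for why a radius-0 template is more general than a radius-2 one, here is a minimal sketch in plain Python. It is not LillyMol's code: the toy graph, the atom numbering, and the function name `environment` are all made up for illustration. It only collects the atoms within `radius` bonds of a reaction center, which is the same idea as the radius in ECFP.

```python
from collections import deque


def environment(adjacency, centers, radius):
    """Return the set of atoms within `radius` bonds of any center atom."""
    distance = {atom: 0 for atom in centers}
    queue = deque(centers)
    while queue:
        atom = queue.popleft()
        if distance[atom] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical toy molecule, N-propylbutanamide CCCC(=O)NCCC, as an adjacency list.
# Atom indices: 0-2 = propyl carbons, 3 = carbonyl C, 4 = O, 5 = N, 6-8 = propyl carbons.
amide = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5], 4: [3],
    5: [3, 6], 6: [5, 7], 7: [6, 8], 8: [7],
}
# Assumed reaction center: the amide C-N bond (atoms 3 and 5).
center = [3, 5]

for r in (0, 1, 2):
    atoms = environment(amide, center, r)
    print(f"radius {r}: {len(atoms)} atoms -> {sorted(atoms)}")
```

The larger the radius, the more neighboring atoms a template has to match, so fewer query molecules satisfy it. With radius 0 only the two center atoms are constrained.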

After reading the article, I would like to use the code. OK let’s try it!

I checked the repo last year, but at that time the code supported Linux only. Now it supports not only Linux but also OS X. ;-)
The toolkit is easy to install. OK, let’s install it and use it.
Installation requires gcc >= 6.2.0 and zlib >= 1.2.11, so I installed them with Homebrew.

iwatobipen$ brew install zlib
iwatobipen$ brew install gcc

Then clone the repository and change the ZLIB line in makefile.public.OSX-gcc-8. I installed zlib via Homebrew, so I changed ZLIB to ‘/usr/local/Cellar/zlib….’.

All of the code is implemented in C++, and it does not use any chemoinformatics packages such as RDKit, Open Babel, or CDK!! @_@

iwatobipen$ git clone
iwatobipen$ cd LillyMol
iwatobipen$ vim makefile.public.OSX-gcc-8
-- ZLIB =  /usr/local/opt/zlib/lib
++ ZLIB =  /usr/local/Cellar/zlib/1.2.11/lib
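
If you prefer not to edit the makefile by hand, the same change can be scripted. The following is only a sketch: `set_zlib_path` is a hypothetical helper of my own, and the sample text contains just the one `ZLIB` line shown above, not the real makefile.

```python
import re


def set_zlib_path(makefile_text, zlib_lib_dir):
    """Return the makefile text with its `ZLIB = ...` line pointing at zlib_lib_dir."""
    pattern = re.compile(r"^ZLIB\s*=.*$", flags=re.MULTILINE)
    # A function is used as the replacement so the path is inserted literally.
    new_text, count = pattern.subn(lambda m: f"ZLIB = {zlib_lib_dir}", makefile_text)
    if count == 0:
        raise ValueError("no ZLIB assignment found in the makefile text")
    return new_text


# Hypothetical excerpt: only the ZLIB line is taken from the post.
sample = "# excerpt for illustration\nZLIB =  /usr/local/opt/zlib/lib\n"
updated = set_zlib_path(sample, "/usr/local/Cellar/zlib/1.2.11/lib")
print(updated)
```

To apply it to the real file, read makefile.public.OSX-gcc-8 into a string, call the helper, and write the result back.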

Now ready! After changing the makefile, run the build. After waiting several minutes, the installation will finish. All commands are generated in ./bin/OSX-gcc-8/. Many commands are provided.

iwatobipen$ cd bin/OSX-gcc-8/
iwatobipen$ ls
activity_consistency	iwcut			msort			ring_extraction		rxn_substructure_search	trxn
common_names		iwdemerit		preferred_smiles	ring_trimming		smiles_mutation		tsubstructure
concat_files		mol2qry			random_smiles		rotatable_bonds		sp3_filter		unique_molecules
fetch_smiles_quick	molecular_scaffold	retrosynthesis		rxn_signature		tautomer_generation	unique_rows
fileconv		molecule_subset		rgroup			rxn_standardize		tp_first_pass

Details of the commands are described in the wiki page.

I checked the retrosynthesis code with the example data. Setting the options was a little difficult for me.

iwatobipen$ cd ./example/retrosynthesis
iwatobipen$ cat 1Cmpds.smi 
> C(=O)(C)NC1=CC(=C(O)C=C1)CN1CCC(NC(=O)C2=CC=CC=C2)CC1.O
iwatobipen$  ../../bin/OSX-gcc-8/retrosynthesis -Y all -X kg -X kekule -X ersfrm -a 2 -q f -v -R 1 -I CentroidRxnSmi_1 -P UST:AZUCORS 1Cmpds.smi >log.txt 2>err.txt
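
Because the option list is long, it may be convenient to wrap the call in Python when running many input files. This is a sketch under assumptions: `build_command` and `run_retrosynthesis` are hypothetical helper names, and the options are copied verbatim from the command above without interpreting them.

```python
import subprocess

# Options copied as-is from the command line in this post.
RETRO_OPTIONS = [
    "-Y", "all", "-X", "kg", "-X", "kekule", "-X", "ersfrm",
    "-a", "2", "-q", "f", "-v", "-R", "1",
    "-I", "CentroidRxnSmi_1", "-P", "UST:AZUCORS",
]


def build_command(binary, smiles_file):
    """Assemble the argument list: binary, fixed options, then the input file."""
    return [str(binary), *RETRO_OPTIONS, str(smiles_file)]


def run_retrosynthesis(binary, smiles_file, log_path="log.txt", err_path="err.txt"):
    """Run the tool, sending stdout to log_path and stderr to err_path."""
    with open(log_path, "w") as log, open(err_path, "w") as err:
        result = subprocess.run(build_command(binary, smiles_file), stdout=log, stderr=err)
    return result.returncode


cmd = build_command("../../bin/OSX-gcc-8/retrosynthesis", "1Cmpds.smi")
print(" ".join(cmd))
```

Calling `run_retrosynthesis("../../bin/OSX-gcc-8/retrosynthesis", "1Cmpds.smi")` from the example directory should then produce the same two files as the shell redirection.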

Check log.txt and err.txt.

iwatobipen$ cat log.txt
O.Oc1ccc(NC(=O)C)cc1.C=O.O=C(NC1CCNCC1)c1ccccc1  via US03992389_NA CentroidRxnSmi_1 R 1 ALL
Oc1ccc(NC(=O)C)cc1.C=O.O=C(NC1CCNCC1)c1ccccc1  via US03992389_NA CentroidRxnSmi_1 R 1 SPFRM.1
Oc1ccc(NC(=O)C)cc1  via US03992389_NA CentroidRxnSmi_1 R 1
O=C  via US03992389_NA CentroidRxnSmi_1 R 1
O=C(NC1CCNCC1)c1ccccc1  via US03992389_NA CentroidRxnSmi_1 R 1
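
The log lines are regular enough to parse with plain Python. The sketch below is based only on the five lines above. The field layout (reaction name, reaction set, `R`, radius, optional tag) is my guess from this output, not a documented format, and `RetroRecord` and `parse_log_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class RetroRecord(NamedTuple):
    fragments: list        # dot-separated SMILES split into components
    reaction: str          # e.g. US03992389_NA
    reaction_set: str      # e.g. CentroidRxnSmi_1
    radius: int
    tag: Optional[str]     # e.g. ALL or SPFRM.1, absent on some lines


def parse_log_line(line):
    """Parse one line, assuming '<smiles>  via <reaction> <set> R <radius> [tag]'."""
    smiles, _, rest = line.partition(" via ")
    fields = rest.split()
    if len(fields) < 4 or fields[2] != "R":
        raise ValueError(f"unexpected log line: {line!r}")
    tag = fields[4] if len(fields) > 4 else None
    return RetroRecord(smiles.strip().split("."), fields[0], fields[1], int(fields[3]), tag)


log_text = """\
O.Oc1ccc(NC(=O)C)cc1.C=O.O=C(NC1CCNCC1)c1ccccc1  via US03992389_NA CentroidRxnSmi_1 R 1 ALL
Oc1ccc(NC(=O)C)cc1.C=O.O=C(NC1CCNCC1)c1ccccc1  via US03992389_NA CentroidRxnSmi_1 R 1 SPFRM.1
Oc1ccc(NC(=O)C)cc1  via US03992389_NA CentroidRxnSmi_1 R 1
O=C  via US03992389_NA CentroidRxnSmi_1 R 1
O=C(NC1CCNCC1)c1ccccc1  via US03992389_NA CentroidRxnSmi_1 R 1
"""

records = [parse_log_line(line) for line in log_text.splitlines() if line.strip()]
for rec in records:
    print(rec.reaction, rec.radius, rec.tag, rec.fragments)
```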

iwatobipen$ cat err.txt
Will not write product fragments with fewer than 2 atoms
Will keep going after an individual test failure
Will preserve Kekule forms
Will use the reaction file name as the reaction name
Reading reactions took 0 seconds
read mol smi eof
Read 1 molecules, 1 deconstructed
1 molecules deconstructed at radius 1
0 deconstructions done at radius 0
1 deconstructions done at radius 1
Set_of_Reactions: CentroidRxnSmi_1 with 164 reactions
2 molecules deconstructed at radius 1
2 molecules deconstructed
Set_of_Reactions: CentroidRxnSmi_1 with 164 reactions
 1 US03947458_NA 1 searches, 0 matches found
 1 US03947473_NA 1 searches, 0 matches found
 1 US03989717_NA 1 searches, 0 matches found
 ----- snip ;-) ------
 1 US20160002218A1_0322 1 searches, 0 matches found
 1 US20160200725A1_0864 1 searches, 0 matches found
2 molecules deconstructed at radius 1
2 molecules deconstructed
163 reactions had 0 hits
1 reactions had 1 hits
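
err.txt is long, so a quick sanity check can be scripted. This sketch reads only two kinds of lines that appear above, the `Set_of_Reactions` line and the `N reactions had M hits` lines, and checks that the hit counts add up to the size of the reaction set. `summarize` is a hypothetical helper, and the patterns are inferred from this single output.

```python
import re

HITS_RE = re.compile(r"^(\d+) reactions had (\d+) hits$")
SET_RE = re.compile(r"^Set_of_Reactions: (\S+) with (\d+) reactions$")


def summarize(err_text):
    """Return ({hits: number_of_reactions}, {set_name: set_size}) from err.txt text."""
    hits, set_sizes = {}, {}
    for raw in err_text.splitlines():
        line = raw.strip()
        match = HITS_RE.match(line)
        if match:
            hits[int(match.group(2))] = int(match.group(1))
            continue
        match = SET_RE.match(line)
        if match:
            set_sizes[match.group(1)] = int(match.group(2))
    return hits, set_sizes


# Lines copied from the err.txt output above (per-reaction lines omitted).
err_text = """\
Read 1 molecules, 1 deconstructed
Set_of_Reactions: CentroidRxnSmi_1 with 164 reactions
2 molecules deconstructed
163 reactions had 0 hits
1 reactions had 1 hits
"""

hits, set_sizes = summarize(err_text)
print(hits, set_sizes)
print("consistent:", sum(hits.values()) == set_sizes["CentroidRxnSmi_1"])
```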

It is difficult for me to understand SMILES strings directly. OK, let’s visualize them with RDKit!

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole  # enables inline drawing in Jupyter

# The query molecule first, then the five entries from log.txt in order
smiles = [
    'C(=O)(C)NC1=CC(=C(O)C=C1)CN1CCC(NC(=O)C2=CC=CC=C2)CC1.O',
    'O.Oc1ccc(NC(=O)C)cc1.C=O.O=C(NC1CCNCC1)c1ccccc1',
    'Oc1ccc(NC(=O)C)cc1.C=O.O=C(NC1CCNCC1)c1ccccc1',
    'Oc1ccc(NC(=O)C)cc1',
    'O=C',
    'O=C(NC1CCNCC1)c1ccccc1',
]
# Labels drawn under each structure (my own naming, taken from the log tags)
legends = ['parent', 'ALL', 'SPFRM.1', 'fragment 1', 'fragment 2', 'fragment 3']
mols = [Chem.MolFromSmiles(smi) for smi in smiles]
Draw.MolsToGridImage(mols, legends=legends)
Single-step retrosynthesis with Lilly’s TK

The example code worked without any problems, but it failed when I used my own molecule as a query. One reason is that I used very limited training data: I used the default data for my test, which has only 164 reactions.

I would like to try making RRTs with a larger data set.

