Lilly’s Chemoinformatics Tool Kit #memo #chemoinformatics

Almost 7 years ago, I posted about Lilly’s MedChem filters, which were released on GitHub.

This caught my interest because most of the code in Lilly’s GitHub repositories is written not in Python but in C++, R, and so on. One of the interesting projects is LillyMol, which is licensed under Apache 2.0. The installation process seemed a little tricky, so I wrote this memo for myself.

LillyMol provides lots of command-line tools for chemoinformatics, and I would like to test them. First, I installed LillyMol on my PC. The code supports Linux and OSX. Building it requires GCC >= 7.2, zlib, and Eigen. My environment is Ubuntu 18.04 with GCC 7.5, and I installed zlib and Eigen with conda. Then I cloned the LillyMol repo.

$ conda install -c conda-forge zlib
$ conda install -c conda-forge eigen3
$ git clone
$ cd LillyMol

The repo provides several makefiles, but none for my compiler: there is no makefile.public.Linux-gcc-7.5.0. So I renamed makefile.public.Linux-gcc-7.2.1 to makefile.public.Linux-gcc-7.5.0 and edited it to set the zlib library path and the Eigen include path. My edits were as follows.

# Change from
# To (this part depends on your environment)
ZLIB = /home/iwatobipen/miniconda3/pkgs/zlib-1.2.11-h7b6447c_3/lib

# Change from

# To (this part depends on your environment)
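The makefile name encodes the OS and the compiler version, which is why a GCC 7.5.0 machine needs its own makefile.public.Linux-gcc-7.5.0. As a rough sketch, this check can be scripted in Python. The helpers below are hypothetical (they are not part of LillyMol), assume the makefile.public.<OS>-gcc-<version> naming seen in the repo, and read the version with `gcc -dumpfullversion`, which GCC 7 and later support. They copy the template instead of renaming it, and the zlib and Eigen paths still have to be edited by hand. On OSX the OS part of the name may differ from what `platform.system()` reports, so pass `system` explicitly there.

```python
import platform
import shutil
import subprocess
from pathlib import Path
from typing import Tuple


def gcc_full_version() -> str:
    """Return the full GCC version string, e.g. '7.5.0'."""
    # -dumpversion may print only the major version on GCC >= 7
    out = subprocess.run(["gcc", "-dumpfullversion"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


def meets_minimum(version: str, minimum: Tuple[int, ...] = (7, 2)) -> bool:
    """True if a dotted version such as '7.5.0' is >= `minimum` (GCC 7.2 here)."""
    nums = [int(p) for p in version.split(".")]
    nums += [0] * (len(minimum) - len(nums))  # pad '7' to [7, 0]
    return tuple(nums[: len(minimum)]) >= minimum


def expected_makefile(version: str, system: str = "") -> str:
    """Assumed naming pattern: makefile.public.<OS>-gcc-<version>."""
    system = system or platform.system()  # 'Linux' on my Ubuntu machine
    return f"makefile.public.{system}-gcc-{version}"


def ensure_makefile(version: str, template: str, repo: Path = Path(".")) -> Path:
    """Copy `template` to the expected makefile name if it does not exist yet."""
    target = repo / expected_makefile(version)
    if not target.exists():
        shutil.copyfile(repo / template, target)
    return target
```

Run from the repository root, `ensure_makefile(gcc_full_version(), "makefile.public.Linux-gcc-7.2.1")` would create the 7.5.0 file on a machine like mine, ideally after `meets_minimum(gcc_full_version())` has confirmed the compiler is new enough.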

Some source files also had to be edited before building.

# src/Foundational/cmdline/
- extern int optind;
- extern char *optarg;
- extern int opterr;

+ extern "C" int optind;
+ extern "C" char *optarg;
+ extern "C" int opterr;


# src/Molecule_Tools/

- #include "Eigen/Dense"
- #include "Eigen/Core"

+ #include "/home/iwatobipen/miniconda3/pkgs/eigen-3.3.7-hc9558a2_1001/include/eigen3/Eigen/Dense"
+ #include "/home/iwatobipen/miniconda3/pkgs/eigen-3.3.7-hc9558a2_1001/include/eigen3/Eigen/Core"
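The zlib and Eigen paths above point into conda’s package cache, so they will differ from machine to machine. A small Python sketch can look them up instead. The helpers are hypothetical and assume that a conda environment is active (so `CONDA_PREFIX` is set), that Eigen’s headers sit under a directory containing `Eigen/Dense`, and that the zlib library file is named `libz.so` (on OSX it would be `libz.dylib`).

```python
import os
from pathlib import Path
from typing import Optional


def containing_dir(hit: Path, relative: str) -> Path:
    """Strip `relative` from the end of `hit`.

    containing_dir(Path('/x/include/eigen3/Eigen/Dense'), 'Eigen/Dense')
    gives Path('/x/include/eigen3'), the directory to use as include path.
    """
    depth = len(Path(relative).parts)
    return hit.parents[depth - 1]


def find_dir_with(root: Path, relative: str) -> Optional[Path]:
    """First directory (in sorted order) under `root` that contains `relative`."""
    hits = sorted(root.rglob(relative))
    return containing_dir(hits[0], relative) if hits else None


def main() -> None:
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("CONDA_PREFIX is not set; activate the conda environment first")
        return
    root = Path(prefix)
    print("ZLIB dir:     ", find_dir_with(root, "libz.so"))
    print("Eigen include:", find_dir_with(root, "Eigen/Dense"))
```

Calling `main()` prints the two directories to copy into the makefile and the include lines; searching a large environment with `rglob` can take a few seconds.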

Then, run the build in the root folder. After installation, I could run LillyMol commands.

$ cd bin/Linux-gcc-7.5.0/
$./ compiled Oct  9 2020 22:17:04
Gather all the names duplicate structures together
  -a             compare graph forms - add 2nd -a option to include H count
  -c             exclude chirality information
  -x             exclude directional bonds
  -l             strip to largest fragment
  -I             remove isotopes before storing
  -D <separator> separator for when storing duplicate entries
  -f             single pass operation, smiles output only
  -s <size>      maximum number of molecules to process
  -r <number>    report progress every <number> molecules processed
  -y             write first name and count of smiles only
  -i <type>      specify input file type
  -S <name>      specify name for output
  -o <type>      specify output file type
  -T ...         standard element transformation options, enter '-T help'
  -E ...         standard element options
  -A <qualifier> Aromaticity, enter "-A help" for options
  -g <qualifier> chemical standardisations, enter "-g help" for usage
  -v             verbose output

# Many more commands are available...
$ ls
abraham                             gfp_to_descriptors             retrosynthesis
activity_consistency                gfp_to_descriptors_multiple    rgroup
average                             hydrophobic_sections           ring_extraction
common_names                        iwcut                          ring_fingerprint
concat_files                        iwdemerit                      ring_replacement
dbf                                 iwdescr                        ring_substitution
descriptor_file_to_01_fingerprints  iwecfp                         ring_trimming
descriptors_to_fingerprint          iwecfp_intermolecular          rotatable_bonds
dfilefilter                         iwfp                           rule_of_five
dicer                               iwsplit                        rxn_fingerprint
distribution                        iwstats                        rxn_signature
fetch_sdf_quick                     jwsadb                         rxn_standardize
fetch_smiles_quick                  maccskeys                      rxn_substructure_search
fileconv                            make_these_molecules           smiles_mutation
gfp_add_descriptors                 mol2qry                        sp3_filter
gfp_distance_filter                 molecular_abstraction          substitutions
gfp_distance_matrix                 molecular_scaffold             tautomer_generation
gfp_distance_matrix_iwdm            molecular_transformations      tcount
gfp_leader_standard                 molecules_from_reagents        tdt_join
gfp_leader_v2                       molecule_subset                tdt_sort
gfp_lnearneighbours                 msort                          temperature
gfp_lnearneighbours_standard        nn_leader_and_jp               
gfp_naive_bayesian                  normalise                      tnass
gfp_nearneighbours                  notenoughvariance              tp_first_pass
gfp_nearneighbours_single_file      nplotnn                        trxn
gfp_pairwise_distances              preferred_smiles               tshadow
gfp_profile_activity_by_bits        random_molecular_permutations  tsubstructure
gfp_single_linkage                  random_records                 unique_molecules
gfp_sparse_to_fixed                 random_smiles                  unique_rows
gfp_spread_buckets_v2               rearrange_columns              whatsmissing
gfp_spread_standard                 remove_and_label
gfp_spread_v2                       remove_matched_atoms

fileconv looks similar to Open Babel, but it has an interesting feature: it can convert several kinds of structure files to SMILES that carry 3D information. As you can see, each atom in the string is followed by its 3D coordinates in {{x,y,z}}. RDKit cannot read these SMILES directly, so a parsing step is required.

$ ./fileconv -i mdl -o smi3d test/cdk2.sdf 
$ head -n 2 test/cdk2.smi
C{{5.423,-0.4412,0.7616}}([H]{{5.7118,-1.2538,0.0931}})([H]{{6.2974,0.1913,0.9136}})([H]{{5.1671,-0.8852,1.7247}})C{{4.2434,0.3667,0.188}}([H]{{4.0364,1.1881,0.8743}})(C{{4.5978,0.963,-1.1852}}([H]{{5.4832,1.5956,-1.1194}})([H]{{4.8059,0.1785,-1.9146}})[H]{{3.7887,1.5777,-1.581}})C{{2.9575,-0.4703,0.1074}}(=O{{2.9988,-1.6999,0.058}})C{{1.6357,0.2975,0.0804}}([H]{{1.6085,0.9288,-0.808}})([H]{{1.5821,0.9425,0.9579}})O{{0.5374,-0.6063,0.0692}}C{{-0.7229,-0.0532,0.031}}1=N{{-0.8677,1.299,0.035}}C{{-2.0919,1.8123,-0.0064}}(=N{{-3.2721,1.2054,-0.0433}}C{{-3.1098,-0.1432,-0.0466}}2=C{{-1.8848,-0.8592,-0.0106}}1N{{-2.1041,-2.231,-0.0241}}=C{{-3.433,-2.2959,-0.0687}}([H]{{-3.9506,-3.2459,-0.0915}})N{{-4.0854,-1.1212,-0.0831}}2[H]{{-5.0816,-0.99,-0.1173}})N{{-2.1448,3.1672,-0.0074}}([H]{{-3.038,3.6039,0.1519}})[H]{{-1.3036,3.6737,0.2145}} ZINC03814457
C{{3.2069,2.4332,0.1683}}1(=N{{1.8933,2.2332,0.2504}}C{{1.808,0.8502,0.1384}}2=C{{0.7321,-0.0685,0.1386}}(N{{1.0086,-1.394,0.0148}}=C{{2.2734,-1.7772,-0.1068}}(N{{3.3866,-1.0536,-0.1367}}=C{{3.0936,0.2661,-0.0051}}2N{{3.968,1.3361,0.0191}}1[H]{{4.9699,1.307,-0.0571}})N{{2.4572,-3.1158,-0.2204}}([H]{{3.36,-3.4373,-0.5285}})[H]{{1.646,-3.6786,-0.418}})O{{-0.5735,0.3558,0.2686}}C{{-1.5971,-0.631,0.2422}}([H]{{-1.4556,-1.3123,1.0829}})([H]{{-1.5595,-1.2189,-0.6765}})[C@@]{{-2.9575,0.065,0.3723}}1([H]{{-2.9193,0.8386,1.1419}})O{{-3.3578,0.6367,-0.8635}}C{{-4.7364,0.9594,-0.7584}}([H]{{-5.2028,1.0183,-1.7422}})([H]{{-4.8426,1.9307,-0.2727}})C{{-5.3322,-0.1496,0.113}}([H]{{-5.9676,-0.8188,-0.4679}})([H]{{-5.9361,0.2761,0.9151}})C{{-4.1165,-0.8888,0.6548}}1([H]{{-4.2043,-1.1389,1.7124}})[H]{{-3.9777,-1.8177,0.0998}})[H]{{3.63,3.4278,0.2204}} ZINC03814459

I used Python’s re module to do it.

import re
from rdkit import Chem

smi3d = 'C{{5.423,-0.4412,0.7616}}([H]{{5.7118,-1.2538,0.0931}})([H]{{6.2974,0.1913,0.9136}})([H]{{5.1671,-0.8852,1.7247}})C{{4.2434,0.3667,0.188}}([H]{{4.0364,1.1881,0.8743}})(C{{4.5978,0.963,-1.1852}}([H]{{5.4832,1.5956,-1.1194}})([H]{{4.8059,0.1785,-1.9146}})[H]{{3.7887,1.5777,-1.581}})C{{2.9575,-0.4703,0.1074}}(=O{{2.9988,-1.6999,0.058}})C{{1.6357,0.2975,0.0804}}([H]{{1.6085,0.9288,-0.808}})([H]{{1.5821,0.9425,0.9579}})O{{0.5374,-0.6063,0.0692}}C{{-0.7229,-0.0532,0.031}}1=N{{-0.8677,1.299,0.035}}C{{-2.0919,1.8123,-0.0064}}(=N{{-3.2721,1.2054,-0.0433}}C{{-3.1098,-0.1432,-0.0466}}2=C{{-1.8848,-0.8592,-0.0106}}1N{{-2.1041,-2.231,-0.0241}}=C{{-3.433,-2.2959,-0.0687}}([H]{{-3.9506,-3.2459,-0.0915}})N{{-4.0854,-1.1212,-0.0831}}2[H]{{-5.0816,-0.99,-0.1173}})N{{-2.1448,3.1672,-0.0074}}([H]{{-3.038,3.6039,0.1519}})[H]{{-1.3036,3.6737,0.2145}}'

# Each atom is followed by {{x,y,z}}; capture the three coordinates
pat = re.compile(r'\{\{(-?\d*\.\d+),(-?\d*\.\d+),(-?\d*\.\d+)\}\}')

coords = pat.findall(smi3d)
# coords ->
# [('5.423', '-0.4412', '0.7616'),
#  ('5.7118', '-1.2538', '0.0931'),
#  ('6.2974', '0.1913', '0.9136'),
#  ('5.1671', '-0.8852', '1.7247'),
#  ('4.2434', '0.3667', '0.188'),
#  ('4.0364', '1.1881', '0.8743'),
#  ('4.5978', '0.963', '-1.1852'),
#  ('5.4832', '1.5956', '-1.1194'),
#  ('4.8059', '0.1785', '-1.9146'),
#  ('3.7887', '1.5777', '-1.581'),
#  ('2.9575', '-0.4703', '0.1074'),
#  ('2.9988', '-1.6999', '0.058'),
#  ('1.6357', '0.2975', '0.0804'),
#  ('1.6085', '0.9288', '-0.808'),
#  ('1.5821', '0.9425', '0.9579'),
#  ('0.5374', '-0.6063', '0.0692'),
#  ('-0.7229', '-0.0532', '0.031'),
#  ('-0.8677', '1.299', '0.035'),
#  ('-2.0919', '1.8123', '-0.0064'),
#  ('-3.2721', '1.2054', '-0.0433'),
#  ('-3.1098', '-0.1432', '-0.0466'),
#  ('-1.8848', '-0.8592', '-0.0106'),
#  ('-2.1041', '-2.231', '-0.0241'),
#  ('-3.433', '-2.2959', '-0.0687'),
#  ('-3.9506', '-3.2459', '-0.0915'),
#  ('-4.0854', '-1.1212', '-0.0831'),
#  ('-5.0816', '-0.99', '-0.1173'),
#  ('-2.1448', '3.1672', '-0.0074'),
#  ('-3.038', '3.6039', '0.1519'),
#  ('-1.3036', '3.6737', '0.2145')]

# Remove the coordinate blocks to get a plain SMILES string
smilesstrings = re.sub(pat, '', smi3d)
print(smilesstrings)
# C([H])([H])([H])C([H])(C([H])([H])[H])C(=O)C([H])([H])OC1=NC(=NC2=C1N=C([H])N2[H])N([H])[H]

# RDKit removes the explicit [H] atoms by default
rdkmol = Chem.MolFromSmiles(smilesstrings)
print(Chem.MolToSmiles(rdkmol))
# CC(C)C(=O)COc1nc(N)nc2[nH]cnc12
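The snippet above throws the coordinates away. If the 3D information should survive, the two pieces can be combined into an RDKit molecule with a conformer. This is a minimal sketch under two assumptions I have not checked against LillyMol’s documentation: each {{x,y,z}} block belongs to the atom written just before it, and a file line has the form “SMILES name”. Parsing with `removeHs = False` keeps the explicit [H] atoms, so RDKit’s atom indices, which follow the order in which atoms appear in the SMILES, line up with the coordinate list.

```python
import re

from rdkit import Chem
from rdkit.Geometry import Point3D

# {{x,y,z}} block as written by `fileconv -o smi3d` (format inferred from its output)
COORD = re.compile(r"\{\{(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+)\}\}")


def mol_from_smi3d(line: str) -> Chem.Mol:
    """Build an RDKit molecule with a 3D conformer from one smi3d line."""
    fields = line.split(maxsplit=1)
    smi3d = fields[0]
    coords = [tuple(map(float, xyz)) for xyz in COORD.findall(smi3d)]

    # Keep explicit [H] atoms so that atom i matches the i-th coordinate block
    params = Chem.SmilesParserParams()
    params.removeHs = False
    mol = Chem.MolFromSmiles(COORD.sub("", smi3d), params)
    if mol is None or mol.GetNumAtoms() != len(coords):
        raise ValueError("could not match coordinates to atoms")

    conf = Chem.Conformer(mol.GetNumAtoms())
    for idx, (x, y, z) in enumerate(coords):
        conf.SetAtomPosition(idx, Point3D(x, y, z))
    conf.Set3D(True)
    mol.AddConformer(conf, assignId=True)

    if len(fields) > 1:
        mol.SetProp("_Name", fields[1].strip())
    return mol
```

For example, passing the first line of test/cdk2.smi should give a 30-atom molecule named ZINC03814457, which could then be written out with `Chem.MolToMolBlock`.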

Of course, converting SDF to SMILES without 3D information can be done with the ‘-o smi’ option.

$ head -n 2 test/cdk2.smi
C([H])([H])([H])C([H])(C([H])([H])[H])C(=O)C([H])([H])OC1=NC(=NC2=C1N=C([H])N2[H])N([H])[H] ZINC03814457
C1(=NC2=C(N=C(N=C2N1[H])N([H])[H])OC([H])([H])[C@@]1([H])OC([H])([H])C([H])([H])C1([H])[H])[H] ZINC03814459

Next, check the rule of five.

$ ./rule_of_five test/cdk2.smi
read mol smi eof
Name ro5_nhoh ro5_no natoms amw clogp violations
ZINC03814457 2 7 30 235.2427 . 0
ZINC03814459 2 7 30 235.2427 . 0
ZINC03814460 3 8 30 248.2415 . 0
ZINC00023543 2 6 35 247.2965 . 0
ZINC03814458 2 6 33 245.2806 . 0
ZINC01641925 3 7 40 298.3433 . 0
ZINC01649340 3 7 52 354.4496 . 0
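The rule_of_five output is a whitespace-separated table, so it is easy to post-process. Below is a minimal Python sketch. It assumes the output has been saved to a text file, and that ‘.’ marks a value the tool did not compute (the clogp column here). The parser skips anything before the Name header, such as the “read mol smi eof” line.

```python
from typing import Dict, List, Optional, Union

Value = Union[str, float, None]


def parse_ro5(text: str) -> List[Dict[str, Value]]:
    """Parse the table printed by rule_of_five into one dict per molecule."""
    rows: List[Dict[str, Value]] = []
    header: Optional[List[str]] = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if header is None:
            # Skip progress messages until the 'Name ...' header appears
            if tokens[0] == "Name":
                header = tokens
            continue
        row: Dict[str, Value] = {"Name": tokens[0]}
        for key, tok in zip(header[1:], tokens[1:]):
            # Assumption: '.' means the value was not computed
            row[key] = None if tok == "." else float(tok)
        rows.append(row)
    return rows


def failures(rows: List[Dict[str, Value]]) -> List[str]:
    """Names of molecules with at least one reported violation."""
    return [str(r["Name"]) for r in rows if (r.get("violations") or 0) > 0]
```

With the output saved to a file (ro5.txt is just a hypothetical name), `failures(parse_ro5(open("ro5.txt").read()))` would return an empty list for the seven CDK2 ligands above, since every one reports 0 violations.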

LillyMol can calculate not only ligand-based parameters but also parameters based on protein-ligand complexes.

More details are described in the wiki. I’ll test more commands later ;)

This week, I enjoyed the RDKit UGM 2020. Lots of the materials are available on GitHub. It was a great success for open science. OSS is becoming an ever more important tool for science, and I would like to contribute to it as much as I can.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
