Similarity search with chemical cartridge for SQLite3 #rdkit #sqlite3 #chemicalite

A few days ago I posted about ‘chemicalite’, a chemical cartridge for SQLite.

In that post I described how to install chemicalite and how to run a substructure search, but I didn’t write about similarity search. The original documentation describes how to build a fingerprint table and use it; however, I couldn’t reproduce it with the same code, so I tried a different query. It may not be as efficient as the approach in the original documentation, but it worked in my environment.

First, load the ChEMBL db and the chemicalite plugin, and then create the fingerprint data.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Draw import rdDepictor

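chemicalitepath = '/home/iwatobipen/src/chemicalite/build/src/'
import time
import apsw
# following code was borrowed from original documentation
# the extension is usually loaded right after the connection to the
# database
connection = apsw.Connection('chembldb.sql')
# assumption: the built shared library is named 'chemicalite' under chemicalitepath
connection.enableloadextension(True)
connection.loadextension(chemicalitepath + 'chemicalite')
connection.enableloadextension(False)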

cursor = connection.cursor()

# create a virtual table to be filled with morgan bfp data
cursor.execute("CREATE VIRTUAL TABLE morgan USING\n" +
               "rdtree(id, bfp bytes(64))")

# compute and insert the fingerprints
cursor.execute("INSERT INTO morgan(id, bfp)\n" +
               "SELECT id, mol_morgan_bfp(molecule, 2) FROM chembl")

Building the fingerprint table took 10 minutes or more; after that, similarity search is available. rdtree_tanimoto filters rows by Tanimoto similarity to the query fingerprint, and the WHERE clause is used to get the ids that meet the similarity threshold.

s = time.time()
count = cursor.execute("SELECT count(*) FROM "
                  "morgan as idx WHERE "
                  "idx.id match rdtree_tanimoto(mol_morgan_bfp(?, 2), ?)",
                  ('Cc1ccc2nc(-c3ccc(NC(C4N(C(c5cccs5)=O)CCC4)=O)cc3)sc2c1', 0.6)).fetchone()[0]
f = time.time()
print(f - s)   # elapsed time in seconds
print(count)   # number of hits meeting the threshold
>> 0.27
>> 9
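To make the measure behind the threshold concrete, here is a minimal sketch of Tanimoto similarity on bit fingerprints in plain Python. It does not use chemicalite or RDKit; the fingerprints are toy 8-bit integers and the names (tanimoto, candidates, hits) are hypothetical, chosen only for illustration.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity of two bit fingerprints stored as Python ints."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in at least one fingerprint
    return common / total if total else 0.0


# Toy 8-bit fingerprints (hypothetical values, not real Morgan fingerprints)
query = 0b10110110
candidates = {"mol_a": 0b10110100, "mol_b": 0b00000110, "mol_c": 0b10110110}
threshold = 0.6

# Keep only the candidates whose similarity to the query meets the threshold
hits = {}
for name, fp in candidates.items():
    tc = tanimoto(query, fp)
    if tc >= threshold:
        hits[name] = tc

print(hits)
```

In this toy data mol_b shares only two of the five bits set in either fingerprint, so it falls below the threshold and is dropped, which is the same kind of filtering the WHERE clause asks for.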

The following example is a two-step similarity search: 1) retrieve the ids that meet the criterion, 2) retrieve the SMILES and calculate the Tanimoto similarity in an SQL query.

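# mols is a list of RDKit molecules prepared beforehand (not shown here)
target = Chem.MolToSmiles(mols[1])
threshold = 0.7
s = time.time()
# step 1: ids whose fingerprint passes the rdtree similarity filter
res1 = cursor.execute("select id from morgan where id match rdtree_tanimoto(mol_morgan_bfp(?, 2), ?)", (target, threshold)).fetchall()
res1 = [r[0] for r in res1]
res2 = []
# step 2: fetch smiles and chembl_id and compute the similarity for each hit
for morganid in res1:
    res = cursor.execute("select smiles, chembl_id, bfp_tanimoto(mol_morgan_bfp(molecule, 2), mol_morgan_bfp(?, 2)) "
                         "from chembl "
                         "where id == ? "
                         , (target, morganid)).fetchall()
    res2.extend(res)
f = time.time()
print(f - s)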
>> 0.1522815227508545
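The two-step search above runs one query per hit. As a rough sketch of how the same filter-then-annotate pattern could be written as a single JOIN, the example below uses Python's standard sqlite3 module with a user-defined function instead of chemicalite. Everything in it is an assumption for illustration: toy_tanimoto, the integer fingerprints, and the toy rows are hypothetical stand-ins, and I have not checked how a join like this behaves with the rdtree virtual table.

```python
import sqlite3


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity of two bit fingerprints stored as ints."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


conn = sqlite3.connect(":memory:")
# Register the Python function as an SQL function (stand-in for a cartridge function)
conn.create_function("toy_tanimoto", 2, tanimoto)

# Toy tables that mimic the layout used in this post: compounds plus a fingerprint table
conn.execute("CREATE TABLE chembl (id INTEGER PRIMARY KEY, chembl_id TEXT, smiles TEXT)")
conn.execute("CREATE TABLE morgan (id INTEGER PRIMARY KEY, bfp INTEGER)")
conn.executemany(
    "INSERT INTO chembl VALUES (?, ?, ?)",
    [(1, "TOY1", "CCO"), (2, "TOY2", "CCN"), (3, "TOY3", "c1ccccc1")],
)
conn.executemany(
    "INSERT INTO morgan VALUES (?, ?)",
    [(1, 0b10110100), (2, 0b00000110), (3, 0b10110110)],
)

query_fp = 0b10110110  # hypothetical query fingerprint
threshold = 0.7

# Filter by similarity and fetch the annotation columns in one query
rows = conn.execute(
    "SELECT c.smiles, c.chembl_id, toy_tanimoto(m.bfp, ?) AS tc "
    "FROM morgan AS m JOIN chembl AS c ON c.id = m.id "
    "WHERE toy_tanimoto(m.bfp, ?) >= ? "
    "ORDER BY tc DESC",
    (query_fp, query_fp, threshold),
).fetchall()

for smiles, chembl_id, tc in rows:
    print(chembl_id, smiles, f"{tc:.2f}")
```

The rows come back in the same (smiles, chembl_id, similarity) shape as res2, so the drawing code that follows would not need to change.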

The similarity search finished very quickly ;) And an SQLite DB doesn’t require setting up a server. It is useful for ad hoc analysis or test cases.

Finally, check the search results with RDKit functions.

targetmol = [Chem.MolFromSmiles(target)]
getmols = targetmol + [Chem.MolFromSmiles(row[0]) for row in res2]
legends = ['query'] + [f"{row[1]} TC {row[2]:.2f}" for row in res2]
Draw.MolsToGridImage(getmols[:20], legends=legends[:20], molsPerRow=5)

It seems that chemicalite works well for chemoinformatics tasks. RDKit has MMPDB, which is implemented with sqlite and apsw, but it doesn’t have a structure search engine, so integrating chemicalite with mmpdb seems interesting.

I would like to make a web service or other tools with chemicalite.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
