RDKit based CATS Descriptor #RDKit #Chemoinformatics

Chemically Advanced Template Search (CATS) is developed by Prof. Gisbert Schneider. CATS descriptor is ligand pharmacophore based fuzzy descriptor. So it is suitable for Scaffold hopping of virtual screening. Last week, I attended his lecture and had interest the descriptor again. Fortunately I could find some implementation of the CATS2D descriptor in github repo.

Arthuc’s work (original work is Rajarshi Guha) is nice because the code is written with python and rdkit is used in chemical engine.

I modified the work and made package for CATS2D descriptor calculation with RDKit.

Let’s see an example for distance calculation. At first import packages for calculation.

from rdkit import Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import AllChem
from scipy.spatial.distance import euclidean
from cats2d.rd_cats2d import CATS2D
import numpy as np

from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw

Then load two test molecules. These molecules are well known drugs.

mol1 = Chem.MolFromSmiles('CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C')
mol2 = Chem.MolFromSmiles('CCCC1=NC(=C2N1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)CC)OCC)C')

OK let’s calculate distance of these molecules with Morgan FP and CASTS2D descriptors. CATS2D descriptor is not bit fingerprint, so I used eucdean distance. Surprisingly Tanimoto distance (1.0 – Tanimoto similarity) is very low even if these molecules looks similar.

fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, 2)
fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, 2)
dist = 1.0 - DataStructs.TanimotoSimilarity(fp1, fp2)
print(dist)
> 0.41025641025641024

On the other hand, cats2d descriptor based distance is 0.0. It indicates that the two molecules are almost same based on their pharmacophore features.

cats = CATS2D()
cats1 = cats.getCATs2D(mol1)
cats2 = cats.getCATs2D(mol2)
euclidean(cats1, cats2)
> 0.0

Also the package can provide information of pharmacophore.

print(cats.getPcoreGroups(mol1))
>
['', ['L'], '', '', ['A'], '', '', '', ['A'], '', ['D'], '', ['A'], '', '', '', '', '', '', '', ['A'], ['A'], ['A'], '', '', ['A'], '', '', '', ['A'], '', '', '']

print(cats.getPcoreGroups(mol2))
>
['', ['L'], '', '', ['A'], '', '', '', ['A'], '', ['D'], '', ['A'], '', '', '', '', '', '', '', ['A'], ['A'], ['A'], '', '', ['A'], '', '', '', '', ['A'], '', '', '']

Scaffold hopping is very useful strategy of drug discovery for not only improving compound properties but also expanding IP space.

I would like to improve the package because the package is still under development.

Any comments a/o suggestions are greatly appreciated.

My code can be found following URL.
https://github.com/iwatobipen/CATS2D

Thanks for developing and sharing CATS2D descriptor implementation!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.