RDKit based CATS Descriptor #RDKit #Chemoinformatics

Chemically Advanced Template Search (CATS) was developed by Prof. Gisbert Schneider. The CATS descriptor is a ligand-based, fuzzy pharmacophore descriptor, so it is well suited to scaffold hopping in virtual screening. Last week I attended his lecture and became interested in the descriptor again. Fortunately, I found some implementations of the CATS2D descriptor on GitHub.

Arthuc’s work (based on the original work by Rajarshi Guha) is nice because the code is written in Python and uses RDKit as the chemistry engine.

I modified that work and made a package for CATS2D descriptor calculation with RDKit.

Let’s look at an example of distance calculation. First, import the packages needed for the calculation.

from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem import AllChem
from scipy.spatial.distance import euclidean
from cats2d.rd_cats2d import CATS2D
import numpy as np

from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw

Then load two test molecules. Both are well-known drugs (sildenafil and vardenafil).

mol1 = Chem.MolFromSmiles('CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C')
mol2 = Chem.MolFromSmiles('CCCC1=NC(=C2N1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)CC)OCC)C')

OK, let’s calculate the distance between these molecules with Morgan fingerprints and with CATS2D descriptors. The CATS2D descriptor is not a bit fingerprint, so I used Euclidean distance for it. Surprisingly, the Tanimoto distance (1.0 - Tanimoto similarity) from the Morgan fingerprints is fairly large, even though the molecules look similar.

fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, 2)
fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, 2)
dist = 1.0 - DataStructs.TanimotoSimilarity(fp1, fp2)
print(dist)
> 0.41025641025641024
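As a reminder of what this number measures, here is a minimal pure-Python sketch of the Tanimoto distance on sets of on-bit indices. The function name `tanimoto_distance` and the toy bit sets are made up for illustration; they are not RDKit objects and not the bits of the two molecules above.

```python
def tanimoto_distance(bits_a, bits_b):
    """Return 1 - |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Toy on-bit indices (hypothetical, for illustration only)
fp_a = {1, 5, 9, 12}
fp_b = {1, 5, 9, 30}
d = tanimoto_distance(fp_a, fp_b)
print(d)
```

Only shared and unshared bits matter here, so a small change in the ring system that alters many atom environments can move the distance a lot.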

On the other hand, the CATS2D descriptor based distance is 0.0. This indicates that the two molecules are essentially the same in terms of their pharmacophore features.

cats = CATS2D()
cats1 = cats.getCATs2D(mol1)
cats2 = cats.getCATs2D(mol2)
euclidean(cats1, cats2)
> 0.0
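To give a feeling for why such a descriptor ignores the exact scaffold, here is a rough pure-Python sketch of the general CATS idea: count pairs of pharmacophore types by their shortest-path distance in bonds. This is not the code of the cats2d package. The names `bond_distances` and `cats_like_counts`, the toy molecule, and the choice to count an atom paired with itself at distance 0 are my assumptions for illustration, and the scaling step used in published CATS variants is left out.

```python
from collections import deque
from itertools import combinations_with_replacement

TYPES = ["D", "A", "P", "N", "L"]  # donor, acceptor, positive, negative, lipophilic
MAX_BONDS = 10                      # distance bins for 0..9 bonds
PAIRS = list(combinations_with_replacement(TYPES, 2))  # 15 unordered type pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}

def bond_distances(adjacency, start):
    """Breadth-first search: shortest path (in bonds) from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def cats_like_counts(atom_types, adjacency):
    """Count type pairs per bond distance; returns a flat list of 15 * 10 = 150 values."""
    counts = [0] * (len(PAIRS) * MAX_BONDS)
    for i in range(len(atom_types)):
        dist = bond_distances(adjacency, i)
        for j in range(i, len(atom_types)):
            d = dist.get(j)
            if d is None or d >= MAX_BONDS:
                continue  # disconnected or too far apart
            for ti in atom_types[i]:
                for tj in atom_types[j]:
                    a, b = sorted((ti, tj), key=TYPES.index)
                    if i == j and (ti, tj) != (a, b):
                        continue  # same atom: count each unordered pair once
                    counts[PAIR_INDEX[(a, b)] * MAX_BONDS + d] += 1
    return counts

# Toy 4-atom chain (hypothetical): donor, untyped atom, lipophilic atom, acceptor
toy_types = [["D"], "", ["L"], ["A"]]
toy_bonds = [[1], [0, 2], [1, 3], [2]]
vec = cats_like_counts(toy_types, toy_bonds)
print(len(vec), sum(vec))
```

Because only typed atoms and the bond distances between them enter the counts, two molecules with different ring systems can end up with very similar vectors.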

The package can also report the pharmacophore type assigned to each atom.

print(cats.getPcoreGroups(mol1))
>
['', ['L'], '', '', ['A'], '', '', '', ['A'], '', ['D'], '', ['A'], '', '', '', '', '', '', '', ['A'], ['A'], ['A'], '', '', ['A'], '', '', '', ['A'], '', '', '']

print(cats.getPcoreGroups(mol2))
>
['', ['L'], '', '', ['A'], '', '', '', ['A'], '', ['D'], '', ['A'], '', '', '', '', '', '', '', ['A'], ['A'], ['A'], '', '', ['A'], '', '', '', '', ['A'], '', '', '']
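These lists are easier to compare after a quick tally. The sketch below copies the two outputs above as plain Python lists and counts the labels with `collections.Counter`; the helper name `tally_types` is made up for this example and is not part of the package.

```python
from collections import Counter

def tally_types(pcore_groups):
    """Count how many atoms carry each pharmacophore type label."""
    counts = Counter()
    for labels in pcore_groups:
        counts.update(labels)  # '' adds nothing; ['A'] adds one 'A'
    return counts

# Outputs of getPcoreGroups copied from above
groups1 = ['', ['L'], '', '', ['A'], '', '', '', ['A'], '', ['D'], '', ['A'],
           '', '', '', '', '', '', '', ['A'], ['A'], ['A'], '', '', ['A'],
           '', '', '', ['A'], '', '', '']
groups2 = ['', ['L'], '', '', ['A'], '', '', '', ['A'], '', ['D'], '', ['A'],
           '', '', '', '', '', '', '', ['A'], ['A'], ['A'], '', '', ['A'],
           '', '', '', '', ['A'], '', '', '']

print(len(groups1), tally_types(groups1))
print(len(groups2), tally_types(groups2))
```

Both molecules get the same tally (eight acceptors, one donor, one lipophilic atom), although the second list has one more atom.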

Scaffold hopping is a very useful strategy in drug discovery, not only for improving compound properties but also for expanding IP space.

The package is still under development, and I would like to keep improving it.

Any comments and/or suggestions are greatly appreciated.

My code can be found at the following URL.
https://github.com/iwatobipen/CATS2D

Thanks to the original authors for developing and sharing the CATS2D descriptor implementation!