Molecular Similarity

Molecular similarity comes up in discussion from time to time.
I think the meaning of "similar" depends on the situation.
For example, if the aromatic pharmacophore is what matters, phenyl and pyridyl may be similar.
But if molecular charge is what matters, phenyl and pyridyl may be dissimilar.
So it is useful to have several similarity metrics available.
RDKit has an interesting fragment-based similarity method called “Fraggle”.
It’s easy to use.
Let’s code.

from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs
from rdkit.Chem.Fraggle import FraggleSim
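# define a Tanimoto similarity helper for convenience.
def calctc(mol1, mol2):
    # assumed: Morgan fingerprints, radius 2 (ECFP4-like), as bit vectors
    fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, 2)
    fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, 2)
    return DataStructs.TanimotoSimilarity(fp1, fp2)
# make the molecules (mol, mol2) from SMILES.
# calculate an ECFP4-like molecular similarity.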
In [26]: calctc(mol,mol2)
Out[26]: 0.3333333333333333
# only a C/N difference, but low similarity!

# calculate the Fraggle similarity.
In [27]: FraggleSim.GetFraggleSimilarity(mol,mol2)
Out[27]: (1.0, '[*]c1ccccc1.[*]c1ccccc1')
# closer to my intuition.

The Fraggle similarity method is a fuzzier calculation, but its results are acceptable to medicinal chemists.
I think the method is close to comparing molecular frameworks.
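
To see why a single-atom swap can pull the fingerprint score down so far, it may help to recall that the Tanimoto coefficient is just the number of shared on-bits divided by the number of on-bits in either fingerprint. Below is a minimal pure-Python sketch. The bit sets are made-up toy values, not the real Morgan bits of the molecules above, and `tanimoto` is a hypothetical helper, not an RDKit function.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Toy on-bits (hypothetical): two bits shared, two unique to each molecule.
toy_fp1 = {1, 2, 3, 4}
toy_fp2 = {1, 2, 5, 6}

toy_score = tanimoto(toy_fp1, toy_fp2)
print(toy_score)  # 2 shared bits / 6 bits in the union
```

In an atom-environment fingerprint, every environment that includes the swapped atom changes its bit, so even a C/N exchange can leave only a small shared fraction.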

If you are interested in this function,
you can get a more useful PDF from here.



