Somedays ago, I posted the topics about ‘exit vector’.
The way to represent chemical space give me new insight for molecular design.
I think it’s useful for replacement of linkage or scaffold.
It’s important for drug discovery to design new molecules that similar ligand for target, but not similar for competitor’s one.
So, I added some function to my code. New code calculate exit vector and tanimoto similarity of each molecule.
All code is following.
from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole from rdkit.Chem import AllChem from rdkit.Chem import rdMolTransforms from rdkit.Chem import DataStructs import numpy as np from rdkit.Chem import Draw import seaborn as sns import pandas as pd import matplotlib.pyplot as plt sdf = Chem.SDMolSupplier( "3d_amine.sdf", removeHs=False ) mols = [ m for m in sdf ] fps = [ AllChem.GetMorganFingerprintAsBitVect( mol,2 ) for mol in mols ] mol = mols[1] mol.GetSubstructMatch( Chem.MolFromSmarts( "N[H]" ) ) atoms = [atom for atom in mol.GetAtoms()] def getev( mol ): if mol.GetNumConformers() >= 1: matches = mol.GetSubstructMatches( Chem.MolFromSmarts( "N[H]" ) ) conf = mol.GetConformer() theta = rdMolTransforms.GetDihedralDeg( conf, matches[0][1], matches[0][0], matches[1][0], matches[1][1] ) temp_phi1 = 180 - rdMolTransforms.GetAngleDeg(conf, matches[1][0], matches[0][0], matches[0][1] ) temp_phi2 = 180 - rdMolTransforms.GetAngleDeg(conf, matches[0][0], matches[1][0], matches[1][1] ) if temp_phi1 >= temp_phi2: phi1 = temp_phi1 phi2 = temp_phi2 else: phi1 = temp_phi2 phi2 = temp_phi1 r = rdMolTransforms.GetBondLength( conf, matches[0][0], matches[1][0] ) return theta, phi1, phi2, r else: print( "No conformer!" ) def transform_cartegian( theta, phi1, phi2, r ): theta = np.deg2rad( theta ) phi1 = np.deg2rad( phi1 ) phi2 = np.deg2rad( phi2 ) x = np.sin( theta ) * np.sin( phi1 ) * np.sin( phi2 ) *r y = np.sin( theta ) * np.sin( phi1 ) * np.cos( phi2 ) *r z = np.sin( theta ) * np.cos( phi1 ) *r t = np.cos( theta ) *r return x, y, z, t def get_dist(v1,v2): v1 = np.asarray( v1 ) v2 = np.asarray( v2 ) delta = v1 - v2 d = np.linalg.norm( delta ) return d def calc_distance( mol1, mol2 ): theta1, phi11, phi21, r1 = getev( mol1 ) theta2, phi12, phi22, r2 = getev( mol2 ) cart1 = transform_cartegian( theta1, phi11, phi21, r1 ) cart2 = transform_cartegian( theta2, phi12, phi22, r2 ) d = get_dist( cart1, cart2 ) return d f = open( 'ditance_sim.txt','w' ) f.write( 'idx1,idx2,distance,ECFP4sim\n' ) dataset = [] for i, mol in enumerate( mols ): d = calc_distance(mols[3],mol) sim = DataStructs.TanimotoSimilarity( fps[3], fps[i] ) if d < 2: print( "idx_"+str(3), "idx_"+str(i), "distance_"+str(d), "tanimoto"+str(sim) ) else: pass f.write( '{},{},{},{}\n'.format( 3,i,d,sim ) ) dataset.append([d,sim]) f.close()
Out put is following.
idx_3 idx_1 distance_1.48017700543 tanimoto0.47058823529411764 idx_3 idx_3 distance_0.0 tanimoto1.0 idx_3 idx_6 distance_1.40146560569 tanimoto0.6666666666666666 idx_3 idx_7 distance_0.0619944778129 tanimoto0.6 idx_3 idx_13 distance_1.85179121146 tanimoto0.5 idx_3 idx_14 distance_1.80235655185 tanimoto0.5
OK, plot these results.
df = pd.DataFrame(dataset, columns=['dist','sim']) g=sns.lmplot('dist','sim',df, fit_reg=False) plt.show(g)
Finally, I got the image.
Some molecules showed similarity scores are low ( ~0.5) but high distance similarity ( 1.~2.).
Is it useful for hopping ?
I want to try more large data sets ASAP.
*All code is written in Python, and pushed my code to my github repo!
https://github.com/iwatobipen/rdkit_evp