Some days ago, I participated a seminar about drug discovery.
All presentations were nice, and I interested in one of unique method to representation chemical space, named “exit vector plot”.
ref is following
Following Ramachandran: exit vector plots (EVP) as a tool to navigate chemical space covered by 3D bifunctional scaffolds. The case of cycloalkanes
The concept is, chemical space is represented as two vectors n1, n2 with 4 parameters phi1, phi2, theta( dihedral angle ), r.
It’s like a ramachandran plot that is used to plot structure of protein.
In this work, the author calculate exit vector about several ring systems and clustering them.
I think this concept is very useful because if chemist want to replace a ring system to another one, it’s difficult to search molecules that shows low structure similarity but has high exit vector similarity.
So, if we can search molecules use exit vector similarity, it will be new way to expand chemical space.
In the paper, the author clustering molecules using Euclidean distance of exit vector. Distance is almost dissimilarity.
Now let’s try to write exit vector calculator using RDKit!
In following code, I focused in 2nd-diamine set because it was easy to define two vectors(n1, n2).
First I prepare sample sdf that has energy minimized 3D structure.(there is not in the code)
Then write code!
RDKit has cool function to calculate distance and angle in rdMolTransform class.
from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole from rdkit.Chem import AllChem from rdkit.Chem import rdMolTransforms import numpy as np from rdkit.Chem import Draw sdf = Chem.SDMolSupplier( "3d_amine.sdf", removeHs=False ) mols = [ m for m in sdf ] mol = mols mol.GetSubstructMatch( Chem.MolFromSmarts( "N[H]" ) ) atoms = [atom for atom in mol.GetAtoms()] def getev( mol ): if mol.GetNumConformers() &gt;= 1: matches = mol.GetSubstructMatches( Chem.MolFromSmarts( "N[H]" ) ) conf = mol.GetConformer() theta = rdMolTransforms.GetDihedralDeg( conf, matches, matches, matches, matches ) temp_phi1 = 180 - rdMolTransforms.GetAngleDeg(conf, matches, matches, matches ) temp_phi2 = 180 - rdMolTransforms.GetAngleDeg(conf, matches, matches, matches ) if temp_phi1 &gt;= temp_phi2: phi1 = temp_phi1 phi2 = temp_phi2 else: phi1 = temp_phi2 phi2 = temp_phi1 r = rdMolTransforms.GetBondLength( conf, matches, matches ) return theata, phi1, phi2, r else: print( "No conformer!" ) def transform_cartegian( theta, phi1, phi2, r ): theta = np.deg2rad( theta ) phi1 = np.deg2rad( phi1 ) phi2 = np.deg2rad( phi2 ) x = np.sin( theta ) * np.sin( phi1 ) * np.sin( phi2 ) *r y = np.sin( theta ) * np.sin( phi1 ) * np.cos( phi2 ) *r z = np.sin( theta ) * np.cos( phi1 ) *r t = np.cos( theta ) *r return x, y, z, t def get_dist(v1,v2): v1 = np.asarray( v1 ) v2 = np.asarray( v2 ) delta = v1 - v2 d = np.linalg.norm( delta ) return d def calc_distance( mol1, mol2 ): theta1, phi11, phi21, r1 = getev( mol1 ) theta2, phi12, phi22, r2 = getev( mol2 ) cart1 = transform_cartegian( theta1, phi11, phi21, r1 ) cart2 = transform_cartegian( theta2, phi12, phi22, r2 ) d = get_dist( cart1, cart2 ) return d
Then run code.
for i, mol in enumerate( mols ): d = calc_distance(mols,mol) print( 3, i, d )
I got following response.
idx1 idx2 distace
3 0 4.90798883048
3 1 1.48017700543
3 2 6.08814365316
3 3 0.0
3 4 3.93873942603
3 5 5.45068906229
3 6 1.40146560569
3 7 0.0619944778129
3 8 6.92182054937
3 9 6.22560061852
3 10 6.19913412456
3 11 5.75093874617
3 12 5.38531808804
3 13 1.85179121146
3 14 1.80235655185
idx1, 6, 7 are near to idx4.
Draw.MolsToGridImage( [mols,mols, mols, mols] ) Draw.MolsToGridImage( mols )
I got following images.
It’s seems works fine.
I uploaded the code to github.