Visualize chemical space as a grid #chemoinformatics #rdkit

Visualizing chemical space is important for medicinal chemists, I think. Recently, Prof. Bajorath's group published a nice article on this topic.

The authors described a new approach that combines SARMatrix and Molecular Grid Maps. SARMatrix is one of the methods for SAR analysis, like Free-Wilson analysis.

I was interested in their approach because they use molecular grid maps. I often use PCA and/or t-SNE for chemical space mapping, but those methods place molecules at free continuous coordinates, not on a grid.

Molecular grid maps are similar to SOMs (self-organizing maps). To make the maps, they used the Jonker-Volgenant (JV) algorithm. The details are described in the original reference.
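The JV algorithm solves the linear assignment problem: given a cost matrix, give every row a distinct column so that the total cost is minimal. The toy example below is my own illustration, not code from the paper. It uses SciPy's `linear_sum_assignment` instead of lapjv (recent SciPy documentation describes it as a modified Jonker-Volgenant algorithm), and the four points are made up. It shows why an assignment is needed at all: simply snapping each point to its nearest cell puts two points on the same cell.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Four made-up 2D points; points 0 and 1 are both closest to the corner (0, 0)
points = np.array([[0.1, 0.1], [0.2, 0.05], [0.1, 0.9], [0.9, 0.9]])
# A 2 x 2 grid: cells (0, 0), (1, 0), (0, 1), (1, 1)
grid = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Cost = squared Euclidean distance from each point (row) to each grid cell (column)
cost = cdist(points, grid, "sqeuclidean")

# Naive nearest-cell snapping: points 0 and 1 collide on cell 0
print(cost.argmin(axis=1))  # → [0 0 2 3]

# Linear assignment: every point gets its own cell and the summed cost is minimal
row_ind, col_ind = linear_sum_assignment(cost)
print(col_ind)  # → [0 1 2 3]
```

Point 1 is moved to cell (1, 0) because that is cheaper overall than displacing point 0.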

I would like to try the mapping method.

Fortunately, a Python package for the JV algorithm is available on GitHub! It is called lapjv. I installed it and tried it out.
My code is below.

First, import the packages and load the data. The sample data came from ChEMBL.

%matplotlib inline
import matplotlib.pyplot as plt
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
df = pd.read_csv('CHEMBL3888506.tsv', sep='\t', header=0)

To make a grid map, the number of samples must equal the number of grid cells, i.e. N^2 for an N x N grid. The dataset has 467 molecules, so I used 400 of them for this test, which means embedding them into a 20 x 20 grid.
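If you want to choose the grid size automatically, the largest N with N^2 not exceeding the dataset size is the integer square root of the dataset size. The helper below is hypothetical (`square_subsample` is my own name, not from the post) and is only a sketch. For 467 molecules it picks a 21 x 21 grid (441 molecules), while the post simply uses 20 x 20.

```python
import math
import numpy as np

def square_subsample(n_items, side=None, seed=0):
    """Hypothetical helper: pick a grid side and a random subset that exactly fills side x side cells.

    If side is None, use the largest side with side * side <= n_items.
    """
    if side is None:
        side = math.isqrt(n_items)
    if side * side > n_items:
        raise ValueError(f"a {side} x {side} grid needs more than {n_items} items")
    rng = np.random.default_rng(seed)
    # Sample without replacement so every molecule appears at most once
    idx = rng.choice(n_items, size=side * side, replace=False)
    return side, idx

side, idx = square_subsample(467)  # automatic choice
print(side, len(idx))  # → 21 441
side20, idx20 = square_subsample(467, side=20)  # the 20 x 20 grid used in the post
print(side20, len(idx20))  # → 20 400
```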

mols = [Chem.MolFromSmiles(smi) for smi in df.Smiles]
sampleidx = np.random.choice(list(range(len(mols))), size=400, replace=False)
samplemols = [mols[i] for i in sampleidx]
sampleact = [9-np.log10(df['Standard Value'][idx]) for idx in sampleidx]
fps = [AllChem.GetMorganFingerprintAsBitVect(m,2) for m in samplemols]
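def fp2arr(fp):
    # Convert an RDKit bit vector into a NumPy array of 0/1 values
    arr = np.zeros((0,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr
X = np.asarray([fp2arr(fp) for fp in fps])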
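The expression `9 - np.log10(...)` in `sampleact` converts an activity value to a pIC50-style value. This only holds if `Standard Value` is in nM, so it is worth checking the `Standard Units` column of the ChEMBL export first; that unit is an assumption here. Since 1 nM = 1e-9 M, -log10(value_nM * 1e-9) = 9 - log10(value_nM). The function name below is hypothetical.

```python
import numpy as np

def nm_to_pactivity(value_nm):
    """Hypothetical helper: convert activities in nM (e.g. IC50) to -log10(molar)."""
    value_nm = np.asarray(value_nm, dtype=float)
    # -log10(value_nm * 1e-9) == 9 - log10(value_nm)
    return 9 - np.log10(value_nm)

# 1 nM, 10 nM and 1 uM correspond to pIC50 values of 9, 8 and 6
pvals = nm_to_pactivity([1.0, 10.0, 1000.0])
print(pvals)
```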

Then apply PCA followed by t-SNE to obtain the chemical space, and min-max normalize the coordinates to the range [0, 1].

size = 20
N = size*size
data = PCA(n_components=100).fit_transform(X.astype(np.float32))
embeddings = TSNE(init='pca', random_state=794, verbose=2).fit_transform(data)
embeddings -= embeddings.min(axis=0)
embeddings /= embeddings.max(axis=0)

Check the t-SNE result. Activity is used for the color mapping.

plt.scatter(embeddings[:,0], embeddings[:,1], c=sampleact, cmap='hsv')

Next, let's project the chemical space onto the grid. Using lapjv is very simple. First, calculate a cost matrix (squared Euclidean distances between grid points and embedded molecules) with SciPy's cdist function. Then pass the matrix to lapjv.

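# Regular size x size grid of points in [0, 1]^2
grid = np.dstack(np.meshgrid(np.linspace(0,1,size), np.linspace(0,1,size))).reshape(-1,2)
from lapjv import lapjv
# Cost: squared Euclidean distance from each grid point (row) to each molecule (column)
cost_mat = cdist(grid, embeddings, 'sqeuclidean').astype(np.float32)
# Rescale costs to a fixed range; a positive scale factor does not change the optimal assignment
cost_mat2 = cost_mat * (10000 / cost_mat.max())
row_asses, col_asses, _ = lapjv(cost_mat2)
# col_asses[j] is the grid point assigned to molecule j
grid_lap = grid[col_asses]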
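If lapjv is not installed, the same grid projection can be sketched with SciPy's `linear_sum_assignment` as a stand-in. This is my own substitution, not the post's code, and `make_grid` / `embed_to_grid` are hypothetical names. Here the cost matrix is built with molecules as rows and grid cells as columns, so the returned column indices can index `grid` directly; in the lapjv code above the matrix is the other way round, which is why it uses `col_asses`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def make_grid(size):
    """size x size grid of points in [0, 1]^2, same layout as the meshgrid line above."""
    xs = np.linspace(0, 1, size)
    return np.dstack(np.meshgrid(xs, xs)).reshape(-1, 2)

def embed_to_grid(embeddings, size):
    """Hypothetical helper: give each 2D embedding a unique grid point (linear assignment)."""
    grid = make_grid(size)
    if len(embeddings) != len(grid):
        raise ValueError("need exactly size * size embeddings")
    # Min-max normalize each axis to [0, 1], as done for the t-SNE output above
    emb = embeddings - embeddings.min(axis=0)
    emb = emb / emb.max(axis=0)
    # Rows: molecules, columns: grid cells
    cost = cdist(emb, grid, "sqeuclidean")
    _, cell_for_mol = linear_sum_assignment(cost)
    return grid[cell_for_mol]

rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 2))  # stand-in for the t-SNE coordinates
grid_positions = embed_to_grid(fake_embeddings, 10)
print(len(np.unique(grid_positions, axis=0)))  # → 100
```

Because the assignment is a permutation, every molecule lands on its own cell. Multiplying the cost matrix by a positive constant, as `cost_mat2` does, scales every assignment's total by the same factor, so the optimal assignment is unchanged.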

Now we are ready. Let's plot the grid map.

plt.scatter(grid_lap[:,0], grid_lap[:,1], c=sampleact, cmap='hsv')

It seems to work well. Dots of the same color are located near each other.

A grid plot is useful because it avoids overlapping dots: each molecule occupies its own cell.
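To put numbers on "same-color dots stay close" and "no overlap", two simple checks can be sketched. These metrics are my own assumption, not from the paper, and the helper names are hypothetical: `neighbourhood_preservation` is the mean fraction of each molecule's k nearest neighbours in one layout that are still among its k nearest neighbours in another, and `n_close_pairs` counts pairs of dots closer than a given distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_indices(points, k):
    """Indices of the k nearest neighbours of each point, excluding the point itself."""
    d = cdist(points, points)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    # A stable sort makes tie-breaking deterministic (ties are common on a grid)
    return np.argsort(d, axis=1, kind="stable")[:, :k]

def neighbourhood_preservation(before, after, k=5):
    """Mean fraction of each point's k nearest neighbours in `before` that are kept in `after`."""
    nb_before = knn_indices(before, k)
    nb_after = knn_indices(after, k)
    shared = [len(set(a) & set(b)) / k for a, b in zip(nb_before, nb_after)]
    return float(np.mean(shared))

def n_close_pairs(points, min_dist):
    """Number of point pairs closer than min_dist (a rough proxy for overlapping dots)."""
    d = cdist(points, points)
    iu = np.triu_indices(len(points), k=1)  # each unordered pair counted once
    return int((d[iu] < min_dist).sum())

# Usage on a 3 x 3 grid with spacing 0.5
xs = np.linspace(0, 1, 3)
grid = np.dstack(np.meshgrid(xs, xs)).reshape(-1, 2)
print(n_close_pairs(grid, 0.4))  # → 0
print(n_close_pairs(grid, 0.6))  # → 12 (horizontal and vertical neighbours)
```

With the arrays from the post, `neighbourhood_preservation(embeddings, grid_lap)` compares the two layouts molecule by molecule, since `grid_lap` keeps the molecule order of `embeddings`. On a grid many distances tie, so the neighbour sets there depend on tie-breaking; the score is only a rough indicator.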

The authors developed a more sophisticated tool, but its source code is not disclosed. It looks very attractive for medchem ;-)

Computer-assisted rational drug design is very important. But I think visualization and analysis methods for medicinal chemists are equally important for drug design.

Today's full code was embedded in the original post.


If you are interested in lapjv, please try it.

Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
