Visualize chemical space using RDKit, scikit-learn, and Highcharts

I often use principal component analysis (PCA) to visualize chemical space, because PCA is a useful way to describe chemical diversity. I wondered whether I could project newly designed molecules onto the chemical space defined by a reference set.
I think scikit-learn and RDKit are well suited to that. Recently I have mostly used seaborn for visualization, but today I used Highcharts, because Highcharts can handle data interactively in a web app.
I used Flask as the web-app framework and RDKit for fingerprint calculation.
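The projection idea can be sketched with scikit-learn alone. In this minimal example, random binary matrices stand in for real fingerprints (an assumption made only so the snippet runs without RDKit or the SDF files). PCA is fit on the reference set only, and `transform` then places new rows in that fixed coordinate system, which amounts to centering with the reference mean and multiplying by the reference components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Placeholder "fingerprint" matrices: rows are molecules, columns are bits.
# These random arrays are stand-ins, not real fingerprints.
reference = rng.integers(0, 2, size=(100, 64)).astype(float)
new_mols = rng.integers(0, 2, size=(10, 64)).astype(float)

# Fit the axes on the reference set only.
pca = PCA(n_components=2).fit(reference)

# Project the new molecules onto the reference axes.
projected = pca.transform(new_mols)

# With whiten=False (the default), transform() centers with the reference
# mean and applies the reference components, so it can be reproduced by hand.
manual = (new_mols - pca.mean_) @ pca.components_.T

print(projected.shape)                 # (10, 2)
print(np.allclose(projected, manual))  # True
```

Because the axes come only from the reference set, adding or removing test molecules never moves the reference points on the plot.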

My first example follows. All functions and data are embedded in ‘’.
The file structures.sdf contains structures from DrugBank.
The code below works as follows:
1st- Calculate fingerprints for the reference molecules and the test molecules.
2nd- Fit PCA on the reference molecules.
3rd- Project the test molecules into the reference chemical space.
4th- Convert each molecule to SVG text. (RDKit does this nicely!)
5th- Pass the data (PC1, PC2, SVG) to highcharts.js.
Converting molecules to SVG is what makes it possible to show the structures in the tooltip.

from flask import Flask, render_template
import json
import pickle

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs
from rdkit.Chem.Draw import rdMolDraw2D
from sklearn.decomposition import PCA

app = Flask( __name__ )

# structures from DrugBank (first 500 parsable records) and the test set;
# SDMolSupplier yields None for records RDKit cannot parse, so skip them
drugs = [ mol for mol in Chem.SDMolSupplier( "structures.sdf" ) if mol is not None ][:500]
test = [ mol for mol in Chem.SDMolSupplier( "testset.sdf" ) if mol is not None ]

def calc_fp_arr( mols ):
    """Return an (n_mols, 2048) array of Morgan (radius 2) fingerprint bits."""
    fplist = []
    for mol in mols:
        arr = np.zeros( (1,) )  # ConvertToNumpyArray resizes this to the fingerprint length
        fp = AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=2048 )
        DataStructs.ConvertToNumpyArray( fp, arr )
        fplist.append( arr )
    return np.asarray( fplist )

def getsvgtext( mol ):
    """Draw a molecule as a 200x200 SVG string for the tooltip."""
    d2d = rdMolDraw2D.MolDraw2DSVG( 200, 200 )
    d2d.DrawMolecule( mol )
    d2d.FinishDrawing()  # close the SVG document before reading the text
    svg = d2d.GetDrawingText()
    return svg.replace( "svg:", "" )

drugfparr = calc_fp_arr( drugs )
testfparr = calc_fp_arr( test )

# do PCA: fit the axes on the reference (DrugBank) molecules only
pca = PCA( n_components=2 )
pca.fit( drugfparr )
with open( 'drugpca.pkl', 'wb' ) as f:
    pickle.dump( pca, f )

# project both sets into the reference chemical space
drugsX = pca.transform( drugfparr )
data1 = [ { 'x': float( x ), 'y': float( y ), 'svg': getsvgtext( mol ) } for ( x, y ), mol in zip( drugsX, drugs ) ]
testX = pca.transform( testfparr )
data2 = [ { 'x': float( x ), 'y': float( y ), 'svg': getsvgtext( mol ) } for ( x, y ), mol in zip( testX, test ) ]

@app.route( '/' )
@app.route( '/chart' )
def chart():
    # json.dumps produces a valid JavaScript literal for {{ data1|safe }} in the template
    return render_template( 'chart.html', data1 = json.dumps( data1 ), data2 = json.dumps( data2 ) )

if __name__ == '__main__':
    app.run( debug=True )
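The script pickles the fitted PCA to drugpca.pkl, presumably so the same axes can be reused later to project new molecules without refitting. A minimal sketch of that round trip, assuming random placeholder arrays instead of real fingerprints and an in-memory pickle instead of the file so it runs anywhere: the restored model projects new data exactly as the original did.

```python
import pickle

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
reference = rng.integers(0, 2, size=(50, 32)).astype(float)  # placeholder fingerprints
new_mols = rng.integers(0, 2, size=(5, 32)).astype(float)    # placeholder new designs

pca = PCA(n_components=2).fit(reference)

# Serialize and restore the fitted model (the script above writes it to drugpca.pkl).
restored = pickle.loads(pickle.dumps(pca))

# The restored model carries the stored mean_ and components_, so projections match.
print(np.allclose(pca.transform(new_mols), restored.transform(new_mols)))  # True
```

scikit-learn warns when a pickled estimator is loaded under a different scikit-learn version, so it is safest to load drugpca.pkl in the same environment that created it.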

Next, I wrote the template ‘chart.html’.
It’s important to load jQuery first; if Highcharts is loaded before jQuery, the chart code does not run.
Because I embedded SVG in the tooltip, I set useHTML to true.
The other options are mostly default settings.
Highcharts can access any extra attribute of a data point, like ‘this.point.hogehoge’, so I used this.point.svg to get the SVG text from the dataset.
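On the Python side, each point only needs `x`, `y`, and an extra `svg` key for `this.point.svg` to work. A minimal sketch, assuming placeholder coordinates and SVG strings instead of real PCA output and RDKit drawings: serializing the list with `json.dumps` is one way to make sure the `{{ data1|safe }}` placeholder receives valid JavaScript, and casting to `float` keeps NumPy scalar types out of the payload.

```python
import json

import numpy as np

# Placeholder PCA coordinates for two molecules and placeholder SVG strings.
coords = np.array([[1.5, -0.25], [0.0, 2.0]])
svgs = ["<svg width='200px' height='200px'></svg>",
        "<svg width='200px' height='200px'></svg>"]

# One dict per molecule; Highcharts keeps extra keys such as 'svg' on the
# point object, where a tooltip formatter can read them as this.point.svg.
points = [{"x": float(x), "y": float(y), "svg": svg}
          for (x, y), svg in zip(coords, svgs)]

payload = json.dumps(points)  # a JSON array, which is also a valid JavaScript literal
print(payload)
```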

<!DOCTYPE html>
<html>
  <head>
    <title> test </title>
    <!-- jQuery must be loaded before Highcharts -->
    <script type='text/javascript' src ="{{ url_for('static', filename='jquery-2.2.4.min.js') }}"></script>
    <script type='text/javascript' src = "{{ url_for( 'static', filename='highcharts/js/highcharts.js' ) }}"  ></script>
    <script type='text/javascript' src = "{{ url_for( 'static', filename='highcharts/js/modules/exporting.js' ) }}"  ></script>
    <script type='text/javascript'>
    $(function () {
      $('#container').highcharts({
        chart : {
          type : 'scatter',
          zoomType : 'xy'
        },
        title : {
          text : 'chemical space mapping'
        },
        xAxis : {
          title : { text : 'PC1' },
          gridLineWidth : 2
        },
        yAxis : {
          title : { text : 'PC2' }
        },
        tooltip : {
          useHTML : true,
          formatter : function () {
            // show the structure SVG stored on each data point
            return this.point.svg;
          }
        },
        series : [{
          name : 'drugs',
          color : 'rgba( 223, 83, 83, .3 )',
          data : {{ data1|safe }}
        }, {
          name : 'testmol',
          color : 'rgba( 119, 152, 191, .8 )',
          data : {{ data2|safe }}
        }]
      });
    });
    </script>
  </head>
  <body>
    <p> scatter plot </p><br>
    <div id = "container" style = "width:500px; height:500px;"></div>
  </body>
</html>


Then run the code.


I got an interactive scatter plot.
It is easy to zoom!
It works fine. I pushed all the code to my GitHub repo.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
