Tag: highcharts

Visualize chemical space using RDKit, scikit-learn and Highcharts

I often use principal component analysis (PCA) to visualize chemical space, because it is a useful way to describe chemical diversity. I wondered whether I could project newly designed molecules onto the chemical space of a reference set.
I think scikit-learn and RDKit are well suited to that. Recently I have mostly used seaborn for visualization, but today I used Highcharts, because Highcharts can handle data interactively in a web app.
I used Flask as the web-app framework and RDKit for the fingerprint calculation.

My first example is below. All functions and data are embedded in 'app.py'.
structures.sdf contains the DrugBank structures.
The code does the following:
1st- calculate fingerprints of the reference molecules and the test molecules.
2nd- fit PCA on the reference molecules.
3rd- project the reference and test molecules onto the reference molecules' chemical space.
4th- convert each molecule to SVG text. ( Nice work by RDKit! )
5th- pass the data ( PC1, PC2, SVG ) to highcharts.js.
Converting molecules to SVG is what makes it possible to show the structures in the tooltip.
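Steps 1 to 3 can be sketched without RDKit or scikit-learn. This is a minimal NumPy sketch, assuming random 0/1 vectors as stand-ins for 2048-bit Morgan fingerprints; `fit_reference_pca` and `project` are hypothetical helpers, not the code in app.py. The point it illustrates is that the test molecules are centred with the reference mean and projected onto the reference axes rather than refit. scikit-learn's `PCA.transform` performs the same centring and projection (with `whiten=False`), although its axes may differ from these in sign.

```python
import numpy as np


def fit_reference_pca(ref, n_components=2):
    """Fit PCA on the reference set only; return its mean, axes and explained-variance ratio."""
    mean = ref.mean(axis=0)
    # SVD of the centred data: rows of vt are the principal axes, largest variance first.
    _, s, vt = np.linalg.svd(ref - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:n_components], var[:n_components] / var.sum()


def project(x, mean, components):
    """Place any molecules in the reference space, centring with the *reference* mean."""
    return (x - mean) @ components.T


rng = np.random.default_rng(0)
# Hypothetical stand-ins for 2048-bit Morgan fingerprints: 500 reference, 10 test molecules.
ref_fps = (rng.random((500, 2048)) < 0.05).astype(float)
test_fps = (rng.random((10, 2048)) < 0.05).astype(float)

mean, components, ratio = fit_reference_pca(ref_fps)
ref_xy = project(ref_fps, mean, components)    # PC1/PC2 of the reference set
test_xy = project(test_fps, mean, components)  # test set in the same coordinates
print(ref_xy.shape, test_xy.shape, ratio)
```

The explained-variance ratio is a quick check of how much of the fingerprint diversity the two plotted axes actually capture.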

from flask import Flask, render_template
app = Flask( __name__ )

from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs
from rdkit.Chem.Draw import rdMolDraw2D
import numpy as np
from sklearn.decomposition import PCA
import pickle
# structures from DrugBank (first 500 that RDKit can parse)
drugs = [ mol for mol in Chem.SDMolSupplier( "structures.sdf" ) if mol is not None ][:500]
# test set (molecules to project onto the DrugBank space)
test = [ mol for mol in Chem.SDMolSupplier( "testset.sdf" ) if mol is not None ]

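def calc_fp_arr( mols ):
    # Morgan fingerprint (radius 2, 2048 bits) of each molecule as a dense 0/1 numpy row
    fplist = []
    for mol in mols:
        arr = np.zeros( (1,) )  # ConvertToNumpyArray resizes this to the fingerprint length
        fp = AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=2048 )
        DataStructs.ConvertToNumpyArray( fp, arr )
        fplist.append( arr )
    return np.asarray( fplist )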

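def getsvgtext( mol ):
    # draw a 200x200 px depiction and return it as SVG text for the tooltip
    d2d = rdMolDraw2D.MolDraw2DSVG(200,200)
    d2d.DrawMolecule( mol )
    d2d.FinishDrawing()
    svg = d2d.GetDrawingText()
    # older RDKit releases prefix the tags with "svg:"; strip it so the markup renders inline in HTML
    return svg.replace( "svg:","" )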

drugfparr = calc_fp_arr( drugs )
testfparr = calc_fp_arr( test )

# do PCA on the reference (DrugBank) molecules only, and save the fitted model
pca = PCA( n_components=2 )
pca.fit( drugfparr )
with open( 'drugpca.pkl', 'wb' ) as f:
    pickle.dump( pca, f )

# project both sets with the same fitted PCA; float() gives the template plain numbers, not numpy scalars
drugsX = pca.transform( drugfparr )
data1 = [ { 'x': float( drugsX[i][0] ), 'y': float( drugsX[i][1] ), 'svg': getsvgtext( drugs[i] ) } for i in range(len(drugsX)) ]
testX = pca.transform( testfparr )
data2 = [ { 'x': float( testX[i][0] ), 'y': float( testX[i][1] ), 'svg': getsvgtext( test[i] ) } for i in range(len(testX)) ]

@app.route( '/' )
@app.route( '/chart' )
def chart():
    return render_template( 'chart.html', data1 = data1, data2 = data2 )

if __name__ == '__main__':
    app.run( debug=True )
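app.py pickles the fitted PCA to drugpca.pkl but never reads it back; presumably the idea is to reuse the same axes later, so newly designed molecules land in the existing DrugBank map without refitting. The sketch below is an assumption, not part of the original app: it uses a plain dict of NumPy arrays as a stand-in for the pickled scikit-learn `PCA` object, and pickles to bytes instead of a file so it runs anywhere.

```python
import pickle

import numpy as np

rng = np.random.default_rng(1)
ref = rng.random((100, 16))  # hypothetical reference descriptors

# Fit once: keep only what is needed to project (reference mean and first two axes).
mean = ref.mean(axis=0)
_, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
model = {"mean": mean, "components": vt[:2]}

# Save (app.py does the equivalent with pickle.dump(pca, f) into drugpca.pkl) ...
blob = pickle.dumps(model)

# ... and in a later session reload it and project new designs without refitting.
restored = pickle.loads(blob)
new_designs = rng.random((3, 16))
new_xy = (new_designs - restored["mean"]) @ restored["components"].T
print(new_xy)
```

With the real pickle, the last step would be `pickle.load` on drugpca.pkl followed by `pca.transform` on the new fingerprints.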

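Next, I wrote the template 'chart.html' (Flask's render_template looks for it in the templates/ folder).
It's important to load jQuery first: if highcharts.js is loaded before jQuery, the chart code below does not run, presumably because Highcharts only registers its $('#container').highcharts(...) jQuery plugin when jQuery is already present.
I embedded SVG in the tooltip, so useHTML is set to true.
The other options are mostly default settings.
Highcharts exposes any extra attribute of a data point as 'this.point.<attribute>',
so I used this.point.svg to get the SVG text from the dataset.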
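The SVG reaches the tooltip as an ordinary extra key on each point object. This stdlib sketch, with a hypothetical helper `to_highcharts_points`, shows the shape of the point objects handed to Highcharts, serialized with `json.dumps`; every key other than x and y (here svg) is what the formatter reads back as this.point.svg. The placeholder SVG strings are illustrative only.

```python
import json


def to_highcharts_points(coords, svgs):
    """Build Highcharts point objects; extra keys become this.point.<key> in the formatter."""
    return [
        {"x": float(x), "y": float(y), "svg": svg}
        for (x, y), svg in zip(coords, svgs)
    ]


points = to_highcharts_points([(0.5, -1.25), (2.0, 0.75)], ["<svg>...</svg>", "<svg>...</svg>"])
payload = json.dumps(points)
print(payload)
# prints [{"x": 0.5, "y": -1.25, "svg": "<svg>...</svg>"}, {"x": 2.0, "y": 0.75, "svg": "<svg>...</svg>"}]
```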

<!DOCTYPE html>
<html>
<head>

    <title> test </title>
    <script type='text/javascript' src ="{{ url_for('static', filename='jquery-2.2.4.min.js') }}"></script>
    <script type='text/javascript' src = "{{ url_for( 'static', filename='highcharts/js/highcharts.js' ) }}"  ></script>
    <script type='text/javascript' src = "{{ url_for( 'static', filename='highcharts/js/modules/exporting.js' ) }}"  ></script>

    <script>
    $(function(){
    $('#container').highcharts({
      chart :{
        type : 'scatter',
        zoomType : 'xy'
      },
      title : {
        text : 'chemical space mapping'
      },
      xAxis : {
        title : { text : 'PC1'},
        gridLineWidth : 2,
      },
      yAxis : {
        title : { text : 'PC2'}
      },
      tooltip :{
        useHTML : true,
        formatter : function(){
          return this.point.svg
        }
      },
      series : [{
        turboThreshold:0,
        name : 'drugs',
        color : 'rgba( 223, 83, 83, .3 )',
        data : {{ data1|tojson }}
      },{
        name : 'testmol',
        color : 'rgba( 119, 152, 191, .8 )',
        data : {{ data2|tojson }}
      }]
    });
    });
    </script>

  </head>
  <body>
    <p> scatter plot </p><br>
    <div id = "container" style = "width:500px; height:500px;"></div>

  </body>
</html>

Then run the code.

python app.py

I got an interactive scatter plot.
[Figure: chemicalspace_pca, PCA scatter plot of the DrugBank and test molecules]
It's easy to zoom in!
[Figure: zoomed-in view of the scatter plot]
It works fine. I pushed all the code to my GitHub repo.
https://github.com/iwatobipen/chemo_info/tree/master/highcharts_app