Making a report with RDKit and matplotlib #RDKit #memo #chemoinformatics

Recently, Jupyter Notebook has become a very powerful and useful tool for chemoinformaticians. It can be used not only to analyze data but also to visualize it. Of course, I love it.

However, I think it is sometimes not such a friendly tool for medicinal chemists. So it would be nice to have a reporting function for them.

I found some useful code on GitHub that uses matplotlib to make PDFs. And RDKit has drawing functions such as MolToImage. Hmm, that sounds fun! So I tried to build a PDF report maker with RDKit and matplotlib.

The following code is just a simple example, but with the same approach we can make reports containing many kinds of plots, tables, and compound structures.

OK, let's dive into the code! First, import the libraries.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Draw import SimilarityMaps
from rdkit.Chem import RDConfig
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import os
from PIL import Image
import io
import numpy as np

Then turn off matplotlib's interactive plotting and load the sample molecules. Next, define a simmap function, which draws the similarity map between refmol and probmol and returns it as a PIL image object together with the weight value computed by RDKit.

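plt.ioff()  # turn off interactive plotting
mols = [m for m in Chem.SDMolSupplier(os.path.join(RDConfig.RDDocsDir, 'Book/data/cdk2.sdf'))]
for m in mols:
    # The loop body is missing from the source; computing 2D coordinates is an assumption.
    AllChem.Compute2DCoords(m)
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]
refmol = mols[0]
probmol = mols[1]
def simmap(refmol, probmol):
    # The call is cut off in the source; SimilarityMaps.GetMorganFingerprint is an assumed
    # fingerprint function. This form returns a matplotlib figure, which the code below
    # relies on; newer RDKit releases may expect an extra draw2d argument.
    im, score = SimilarityMaps.GetSimilarityMapForFingerprint(refmol, probmol,
                                                              SimilarityMaps.GetMorganFingerprint)
    bio = io.BytesIO()
    im.savefig(bio, bbox_inches='tight', dpi=200)
    im = Image.open(bio)  # reopen the rendered PNG as a PIL image
    return im, score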
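
The trick inside simmap is rendering a matplotlib figure into an in-memory PNG and reopening it with PIL. Here is a minimal sketch of just that step, without RDKit. The helper name fig_to_pil and the sine-curve figure are made up for illustration and are not part of the original code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def fig_to_pil(fig, dpi=100):
    """Render a matplotlib figure to PNG in memory and return a PIL image.

    fig_to_pil is a hypothetical helper name, not an RDKit or matplotlib API.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight", dpi=dpi)
    plt.close(fig)  # release the figure so pyplot state does not pile up
    buf.seek(0)     # rewind before reading the PNG back
    img = Image.open(buf)
    img.load()      # decode now instead of lazily
    return img


# Placeholder figure standing in for a similarity map
x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots(figsize=(2, 2))
ax.plot(x, np.sin(x))
img = fig_to_pil(fig)
print(img.format, img.size)
```

Closing the figure inside the helper keeps each rendered image independent of whatever is drawn with pyplot afterwards.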

Next, get the similarity map images. I use the subplot2grid function later, so I need to generate the image objects with GetSimilarityMapForFingerprint first. That function draws through matplotlib.pyplot, so the plt state is reset every time it is called.

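im1, _ = simmap(mols[0], mols[0])
im2s, scores = [], []
for i, m in enumerate(mols[:5]):
    im2, score = simmap(mols[0], m)
    # collect the results; the report code below indexes im2s[i] and scores[i]
    im2s.append(im2)
    scores.append(score)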

Almost there, let's make the report! The following example builds a report with the similarity maps and a simple property table, and saves it as a PDF.

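fig = plt.figure(figsize=(8.27, 11.69))  # A4 size in inches
pdf_pages = PdfPages('report.pdf')

im1 = im1.resize((400, 400))  # the reference image is the same in every row
for i, m in enumerate(mols[:5]):
    # column 0: reference molecule
    ax1 = plt.subplot2grid((5, 3), (i, 0))
    ax1.imshow(im1, interpolation="catrom")
    # column 1: similarity map of molecule i against the reference
    ax2 = plt.subplot2grid((5, 3), (i, 1))
    ax2.imshow(im2s[i], interpolation="catrom")
    # column 2: one-row table with the score
    ax3 = plt.subplot2grid((5, 3), (i, 2))
    ax3.table(cellText=[["SimScore", f"{scores[i]:.2}"]], bbox=(0, 0, 1, 1), cellLoc="center")
pdf_pages.savefig(fig, dpi=200)
pdf_pages.close()  # the PDF is only finalized when the PdfPages object is closed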
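
One detail worth noting: a PdfPages file is only finalized when it is closed. The sketch below, assuming matplotlib's backend_pdf version of PdfPages and using random-noise placeholder images instead of similarity maps, shows the context-manager form. It closes the file automatically and also makes multi-page reports easy. The helper make_report and the output file name are illustrative only.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def make_report(path, images, rows_per_page=5):
    """Write images to a PDF, rows_per_page images per A4 page.

    make_report is a hypothetical helper; it returns the number of pages written.
    """
    n_pages = 0
    with PdfPages(path) as pdf:  # the file is finalized when the block exits
        for start in range(0, len(images), rows_per_page):
            fig = plt.figure(figsize=(8.27, 11.69))  # A4 in inches
            for row, img in enumerate(images[start:start + rows_per_page]):
                ax = plt.subplot2grid((rows_per_page, 1), (row, 0), fig=fig)
                ax.imshow(img)
                ax.axis("off")
            pdf.savefig(fig)
            plt.close(fig)
            n_pages += 1
    return n_pages


# Placeholder images (random noise) standing in for molecule pictures
rng = np.random.default_rng(0)
images = [rng.random((20, 20, 3)) for _ in range(7)]
out_path = os.path.join(tempfile.gettempdir(), "report_sketch.pdf")
n_pages = make_report(out_path, images)
print(n_pages, os.path.getsize(out_path))
```

Creating one figure per page and closing it after saving keeps memory use flat, however many compounds go into the report.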

In the same manner, it is easy to make a table with more molecular information, such as biological activity or physicochemical properties. It is also easy to add other types of plots, such as scatter, bar, or radar charts.
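
As a rough sketch of that idea, the following puts a table of calculated properties and a scatter plot on one page. The three molecules, the choice of descriptors (MolWt, MolLogP, TPSA from rdkit.Chem.Descriptors), and the output file name are my own picks for illustration, not part of the code above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from rdkit import Chem
from rdkit.Chem import Descriptors

# Illustrative molecules, not the cdk2 set used above
smiles = {
    "benzene": "c1ccccc1",
    "ethanol": "CCO",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

# Calculate a few physicochemical properties per molecule
props = []
for name, smi in smiles.items():
    mol = Chem.MolFromSmiles(smi)
    props.append((name, Descriptors.MolWt(mol), Descriptors.MolLogP(mol), Descriptors.TPSA(mol)))

rows = [[name, f"{mw:.1f}", f"{logp:.2f}", f"{tpsa:.1f}"] for name, mw, logp, tpsa in props]

fig, (ax_table, ax_scatter) = plt.subplots(2, 1, figsize=(8.27, 11.69))  # A4 in inches

# Upper half: property table
ax_table.axis("off")
ax_table.table(cellText=rows, colLabels=["Name", "MolWt", "LogP", "TPSA"],
               loc="center", cellLoc="center")

# Lower half: scatter plot of LogP against molecular weight
ax_scatter.scatter([p[1] for p in props], [p[2] for p in props])
for name, mw, logp, _ in props:
    ax_scatter.annotate(name, (mw, logp))
ax_scatter.set_xlabel("MolWt")
ax_scatter.set_ylabel("LogP")

out_path = os.path.join(tempfile.gettempdir(), "property_report_sketch.pdf")
with PdfPages(out_path) as pdf:
    pdf.savefig(fig)
plt.close(fig)
```

Measured values such as biological activity could go into the same table as extra columns, read from wherever the assay data lives.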

RDKit and matplotlib make a nice combination for chemoinformatics.

I pushed today's code to the following URL ;).