Make molecule mesh data #RDKit #chemoinformatics #meshlab

I have an interest to predictive model build with 3D compound information. Pytorch3d and open3d seems attractive package for me. However, to use the package, I need to convert molecular information to 3D data such as pointcloud etc.

At first I tried it to use openbabel because recent version of openbabel can convert molecule from MDL molformat to VDW pointcloud xyz format. It worked well but it was difficult to add color information such as atom kind.

So I gave up to use openbabel and shifted pymol.

I installed open source pymol from source code to my python3.7 env. And tried it. Fortunately pymol can save surface 3D data as wrl format. OK sounds good ….. Wait! Unfortunately open3d can’t read wrl format file so I need to convert wrl file to pyl (point cloud format) format.

Finally I found the solution. Use meshlab! It is easy to install meshlab to linux. Just use apt. ;)

$sudo apt -yV install meshlab

Type meshlab in the terminal meshlab application will launch. And the app can convert file format. But I would like to it from python. So I used meshlabserver from subprocess.

meshlabserver is useful for batch process. Following code is an example for reading 3d mesh data generated from rdkit.

At first, load molecule from SMILES and generate 3D conformer. The save it as MDL mol format. Then load file from pymol from library mode.

Then call ‘show surface’ and change quality of surface.

Finally save surface as wrl format. It is easy. Just call ‘save hogehoge.wrl’ in pymol. ;)

import subprocess
import open3d as o3d
from rdkit import Chem
from rdkit.Chem import AllChem
import pymol
mol = Chem.MolFromSmiles('c1ccncc1C')
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=794) # to generate same coordination of molecule
Chem.MolToMolFile(mol, 'mol.mol')
pymol.cmd.set('surface_quality', '3')'surface.wrl')

Now I could get molecular surface data as wrl. Then convert the file format. Call meshserver command with -i inputfile -o outputfile and -om vc option.

-om vc means add vertex color information to the outputfile. Following code I would like to get ply format. So -o is surface.ply.'meshlabserver -i surface.wrl -o surface.ply -om vc'.split())

Readly. Open3d can read ply format. (pytorch3d can read the file format too!)'surface.ply')
> True

Open3d has visualization method.


After calling the command above, another window will open and I can see interactive 3d pointcloud image of query molecule.

I tried several condition of surface quality settings. By using default pymol setting, very sparse data is generated.

Now ready to dive into 3D information based deep learning. ;)

3D Alignment function of RDKit #RDKit

During the UGM, I was interested in Ben Tehan & Rob Smith’s great work.
They showed me a nice example of molecular alignment with RDKit.
RDKit has several function to perform 3D alignment. In the Drug Discovery 3D alignment of ligands is important not only Comp Chem but also Med Chem. After their presentation, I talked them and they told me that GetCrippenO3A is useful for 3D alignment.
Hmm, that’s sounds interesting.
I tried to use the function.
My example code is below. Following code run on ipython notebook. To visualize 3D structure of molecules, I used py3Dmol. It can visualize multiple 3D molecules and easy to use!

import py3Dmol
import copy
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
from rdkit.Chem import rdBase
from rdkit.Chem import rdMolAlign
from rdkit.Chem import rdMolDescriptors
import numpy as np
p = AllChem.ETKDGv2()
p.verbose = True

If p.verbose set True, user can get SMARTS patterns which hit the definition of ETKDG like below.

[cH0:1]c:2!@;-[O:3][C:4]: 8 7 6 5, (100, 0, 0, 0, 0, 0)
[C:1][CX4H2:2]!@;-[OX2:3][c:4]: 3 5 6 7, (0, 0, 4, 0, 0, 0)
[O:1][CX4:2]!@;-[CX3:3]=[O:4]: 6 5 3 4, (0, 3, 0, 0, 0, 0)
[C:1][CX4:2]!@;-[CX3:3]=[O:4]: 0 1 3 4, (0, 0, 1, 0, 0, 0)
[cH0:1]c:2!@;-[O:3][C:4]: 3 5 10 11, (100, 0, 0, 0, 0, 0)
[C:1][CX4H2:2]!@;-[OX2:3][c:4]: 12 11 10 5, (0, 0, 4, 0, 0, 0)
[OX2:1][CX4:2]!@;-[CX4:3][OX2:4]: 10 11 12 16, (0, 0, 3, 0, 0, 0)
[cH0:1]c:2!@;-[O:3][C:4]: 3 5 10 11, (100, 0, 0, 0, 0, 0)
[C:1][CX4H2:2]!@;-[OX2:3][c:4]: 12 11 10 5, (0, 0, 4, 0, 0, 0)
[OX2:1][CX4:2]!@;-[CX4:3][N:4]: 10 11 12 17, (0, 0, 4, 0, 0, 0)
[cH0:1]c:2!@;-[O:3][C:4]: 3 5 10 11, (100, 0, 0, 0, 0, 0)

Next load molecules and generate conformers. I used cdk2.sdf which is provided in rdkit as sample.

mols = [m for m in Chem.SDMolSupplier('cdk2.sdf') if m != None][:6]
for mol in mols:
hmols_1 = [Chem.AddHs(m) for m in mols]
hmols_2 = copy.deepcopy(hmols_1)
# Generate 100 conformers per each molecule
for mol in hmols_1:
    AllChem.EmbedMultipleConfs(mol, 100, p)

for mol in hmols_2:
    AllChem.EmbedMultipleConfs(mol, 100, p)
# for Ipython notebook

To conduct GetCrippenO3A and GetO3A, I calculate crippen_contrib of each atom and MMFF params of molecules.

crippen_contribs = [rdMolDescriptors._CalcCrippenContribs(mol) for mol in hmols_1]
crippen_ref_contrib = crippen_contribs[0]
crippen_prob_contribs = crippen_contribs[1:]
ref_mol1 = hmols_1[0]
prob_mols_1 = hmols_1[1:]

mmff_params = [AllChem.MMFFGetMoleculeProperties(mol) for mol in hmols_2]
mmff_ref_param = mmff_params[0]
mmff_prob_params = mmff_params[1:]
ref_mol2 = hmols_2[0]
prob_mols_2 = hmols_2[1:]

OK Let’s align molecules and visualize them!
I retrieved the best score index from multi conformers of each molecule and added viewer.
For crippenO3A…

p_crippen = py3Dmol.view(width=600, height=400)
p_crippen.addModel(Chem.MolToMolBlock(ref_mol1), 'sdf')
crippen_score = []
for idx, mol in enumerate(prob_mols_1):
    tempscore = []
    for cid in range(100):
        crippenO3A = rdMolAlign.GetCrippenO3A(mol, ref_mol1, crippen_prob_contribs[idx], crippen_ref_contrib, cid, 0)
    best = np.argmax(tempscore)
    p_crippen.addModel(Chem.MolToMolBlock(mol, confId=int(best)), 'sdf')

For O3A…

p_O3A = py3Dmol.view(width=600, height=400)
p_O3A.addModel(Chem.MolToMolBlock(ref_mol2), 'sdf')
pyO3A_score = []
for idx, mol in enumerate(prob_mols_2):
    tempscore = []
    for cid in range(100):
        pyO3A = rdMolAlign.GetO3A(mol, ref_mol2, mmff_prob_params[idx], mmff_ref_param, cid, 0)
    best = np.argmax(tempscore)
    p_O3A.addModel(Chem.MolToMolBlock(mol, confId=int(best)), 'sdf')

In my example, both methods shows good results. To check the details, I will calculate ShapeTanimoto and/or RMSD etc.

In summary, rdkit has many useful functions not only 2D but also 3D. I would like to use this function in my project.

All code of the post, I uploaded my github repo.