Cut molecule to ring and linker with RDKit #RDKit #chemoinformatics #memo

Sometime chemists analyze molecule as small parts such as core, linker and substituents.

RDKit has functions for molecular decomposition RECAP, BRICS and rdMMPA. It’s useful functions but these functions can’t extract directly extract core and linker from molecule.

I had interested how to do it and tried it.

Following code, Core is defined Rings in the molecule and Linker is defined chain which connects two rings.

I defined main function named ‘getLinkerbond’ which extracts bonds that connects atom in ring and atom not in ring.

Then call Chem.FragmentOnBonds function to get fragments by cutting linker bond.

Let’s see the code. First, import some libraries. And define test molecules.

import copy
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import display

mol1 = Chem.MolFromSmiles("C1CCNCC1")
mol2 = Chem.MolFromSmiles("c1ccccc1")
mol3 = Chem.MolFromSmiles("C1CC1CC")
mol4 = Chem.MolFromSmiles("C1CC1CCC2CCOC2")
mol5 = Chem.MolFromSmiles("C1CCC2CCCCC2C1")
mol6 = Chem.MolFromSmiles("OC1CC2(C1)CCCC2")
mol7 = Chem.MolFromSmiles("CC1CCN(CC1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N")
mol8 = Chem.MolFromSmiles("c1ccccc1c2ccccc2OC")
mol9 = Chem.MolFromSmiles("CC1=C(C=C(C=C1)NC(=O)C2=CC=C(C=C2)CN3CCN(CC3)C)NC4=NC=CC(=N4)C5=CN=CC=C5")
mol10 = Chem.MolFromSmiles("C1CCCC1CCCOCOCOCOCOCc1ccccc1")
mols = [mol1,mol2,mol3,mol4,mol5,mol6,mol7,mol8,mol9,mol10]

Then define main function. I wrote helper function for the main function. is_in_samering function check the atoms which construct a bond are in same ring or not.

def is_in_samering(idx1, idx2, bond_rings):
    for bond_ring in bond_rings:
        if idx1 in bond_ring and idx2 in bond_ring:
            return True
    return False

Following code, I set atom prop at first to keep original atom index. Because GetScaffoldForMol function returns new molecule with new atom index. I would like to use original atom idex in this function.

def getLinkerbond(mol, useScaffold=True):
    res = []
    for atom in mol.GetAtoms():
        atom.SetIntProp("orig_idx", atom.GetIdx())
    for bond in mol.GetBonds():
        bond.SetIntProp("orig_idx", bond.GetIdx())
    
    if useScaffold:
        mol = MurckoScaffold.GetScaffoldForMol(mol)
        
    ring_info = mol.GetRingInfo()
    bond_rings = ring_info.BondRings()
    ring_bonds = set()
    for ring_bond_idxs in bond_rings:
        for idx in ring_bond_idxs:
            ring_bonds.add(idx)
    all_bonds_idx = [bond.GetIdx() for bond in mol.GetBonds()]
    none_ring_bonds = set(all_bonds_idx) - ring_bonds
    for bond_idx in none_ring_bonds:
        bgn_idx = mol.GetBondWithIdx(bond_idx).GetBeginAtomIdx()
        end_idx = mol.GetBondWithIdx(bond_idx).GetEndAtomIdx()
        if mol.GetBondWithIdx(bond_idx).GetBondTypeAsDouble() == 1.0:
            if mol.GetAtomWithIdx(bgn_idx).IsInRing()+mol.GetAtomWithIdx(end_idx).IsInRing() == 1:
                bond = mol.GetBondWithIdx(bond_idx)
                orig_idx = bond.GetIntProp("orig_idx")
                res.append(orig_idx)
            elif not is_in_samering(bgn_idx, end_idx, bond_rings) and mol.GetAtomWithIdx(bgn_idx).IsInRing()+mol.GetAtomWithIdx(end_idx).IsInRing() == 2:
                bond = mol.GetBondWithIdx(bond_idx)
                orig_idx = bond.GetIntProp("orig_idx")
                res.append(orig_idx)
    return res

Check the function.


for mol in mols:
    bonds = getLinkerbond(mol)
    if bonds:
        res = Chem.FragmentOnBonds(mol, bonds)
        display(res)
    else:
        display(mol)

The function cut murckoscaffold structure as default. So bonds not connect rings is not cut.

Next, case of useScaffold False. In this case all bonds outside of rings are cut.

for mol in mols:
    bonds = getLinkerbond(mol, False)
    if bonds:
        res = Chem.FragmentOnBonds(mol, bonds)
        display(res)
    else:
        display(mol)

Works fine.

It is useful for molecular decomposition at linker site I think.

Today’s whole code is uploaded my gist. Any comments and advices will be high appreciated. ;)

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
view raw cutbond.ipynb hosted with ❤ by GitHub

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

One thought on “Cut molecule to ring and linker with RDKit #RDKit #chemoinformatics #memo

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.