Comparison of rdMMPA cut rules #RDKit #Chemoinformatics #memo

RDKit has code for making matched molecular pairs (MMPs) in its Contrib folder. RDKit also provides the rdMMPA module, which fragments molecules for MMP analysis based on user-defined cutting rules.

Today I checked the default rule with GetSubstructMatches and modified it.

The default cutting rule is described in the rdMMPA documentation and is defined as a SMARTS pattern.

pattern = '[#6+0;!$(*=,#[!#6])]!@!=!#[*]'

It means a carbon atom (formal charge 0, and not double- or triple-bonded to a non-carbon atom) joined to any atom by a bond that is not a ring bond, a double bond, or a triple bond.
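As a rough sketch of what this pattern selects, it can be run through GetSubstructMatches on a small molecule. The molecule below (N-benzylacetamide) and the names `query` and `cut_bonds` are my own choices for illustration, and the printed indices assume atoms are numbered in SMILES order.

```python
from rdkit import Chem

default_patt = '[#6+0;!$(*=,#[!#6])]!@!=!#[*]'
query = Chem.MolFromSmarts(default_patt)

# Hypothetical example molecule: N-benzylacetamide, CH3-C(=O)-NH-CH2-Ph
amide = Chem.MolFromSmiles('CC(=O)NCc1ccccc1')

# Each match is a pair of atom indices. Sorting every pair and collecting
# them in a set gives exactly one entry per matched bond.
cut_bonds = sorted({tuple(sorted(m)) for m in amide.GetSubstructMatches(query)})

for a, b in cut_bonds:
    print(a, amide.GetAtomWithIdx(a).GetSymbol(),
          b, amide.GetAtomWithIdx(b).GetSymbol())
```

The amide C-N bond (atoms 1 and 3) should be absent from the list: the carbonyl carbon is double-bonded to oxygen, so it fails the `!$(*=,#[!#6])` test, and the nitrogen is not a carbon.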

OK, let’s take an interesting molecule and check which bonds will be cut.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem import rdDepictor
from rdkit.Chem import rdMMPA
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import display

# Default cutting rule used by rdMMPA.FragmentMol
default_patt = '[#6+0;!$(*=,#[!#6])]!@!=!#[*]'
# Piperazine and benzene rings joined by a six-carbon linker
mol = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccccc2)')

The default setting cuts the molecule at every methylene bond, so it generates lots of fragments.

frag1 = rdMMPA.FragmentMol(mol)
len(frag1)
> 28

I would like to cut only the bonds between the linker (aliphatic chain) and the rings. So let’s modify the cutting rule.

# Only match acyclic single bonds that start from a ring carbon
patt2 = '[#6+0;R;!$(*=,#[!#6])]!@!=!#[*]'

Hmm, it seems to work better, but the bond between the piperazine and the methylene isn’t cut. That is because patt2 only describes a bond between a ring carbon and any atom, and the piperazine atom on that bond is a nitrogen.
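To see why the piperazine bond is missed, here is a small check with GetSubstructMatches. It is only a sketch: the variable names are mine, and atom index 5 for the substituted piperazine nitrogen assumes SMILES-order numbering.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccccc2)')
patt2 = '[#6+0;R;!$(*=,#[!#6])]!@!=!#[*]'

# Each match is (ring atom index, neighbor atom index), in pattern order.
matches = mol.GetSubstructMatches(Chem.MolFromSmarts(patt2))
for ring_idx, other_idx in matches:
    ring_atom = mol.GetAtomWithIdx(ring_idx)
    print(ring_idx, ring_atom.GetSymbol(), 'aromatic:', ring_atom.GetIsAromatic(),
          '->', other_idx)

# The ring atom on the piperazine side of the linker is a nitrogen,
# so the first atom of patt2 ([#6...]) cannot match it.
n_atom = mol.GetAtomWithIdx(5)
print(n_atom.GetSymbol(), 'in ring:', n_atom.IsInRing())
```

Only the benzene-CH2 bond is reported, which matches what the picture shows.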

So I added an additional option to the rule.

# Allow the ring atom to be either carbon or nitrogen
patt3 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*]'

Yeah, now the rule detects only the ring-linker bonds.

frag3 = rdMMPA.FragmentMol(mol, pattern=patt3)
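If you prefer to inspect the result as text rather than pictures, FragmentMol accepts `resultsAsMols=False` and then returns SMILES strings. The snippet below is a minimal sketch of that; `frags_smi` is my own name, and the remark about empty cores describes what I have seen for single cuts rather than a documented guarantee.

```python
from rdkit import Chem
from rdkit.Chem import rdMMPA

mol = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccccc2)')
patt3 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*]'

# resultsAsMols=False gives (core, chains) pairs of SMILES strings
# instead of Mol objects, which is easier to print and compare.
frags_smi = rdMMPA.FragmentMol(mol, pattern=patt3, resultsAsMols=False)

print(len(frags_smi), 'fragmentations')
for core, chains in frags_smi:
    # In my experience the core is an empty string for single cuts,
    # and both pieces then appear in `chains`.
    print(repr(core), chains)
```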

It’s still not enough. If the molecule has an alkyl side chain on a ring, such as an ethyl group, the rule also detects the bond between the ring and the side chain.

# Same rule, but the molecule now has an ethyl group on the benzene ring
mol2 = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccc(CC)cc2)')
patt3 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*]'

To prevent this, I added a condition to the "any atom" side of the bond: that atom must be the start of a chain of three consecutive aliphatic, non-ring atoms (the atom itself plus two more).

patt4 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*;$([A!R][A!R][A!R])]'
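The effect of the extra condition can be checked by listing the bonds that patt3 and patt4 match on mol2. This is a sketch only: `matched_bonds` is a hypothetical helper of mine, not an RDKit function, and the indices assume SMILES-order atom numbering.

```python
from rdkit import Chem

mol2 = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccc(CC)cc2)')
patt3 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*]'
patt4 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*;$([A!R][A!R][A!R])]'

def matched_bonds(mol, smarts):
    """Return sorted (i, j) atom-index pairs for bonds matched by a two-atom SMARTS."""
    query = Chem.MolFromSmarts(smarts)
    return sorted({tuple(sorted(m)) for m in mol.GetSubstructMatches(query)})

bonds3 = matched_bonds(mol2, patt3)
bonds4 = matched_bonds(mol2, patt4)

print('patt3:', bonds3)
print('patt4:', bonds4)
# The bond that only patt3 matches is the ring-ethyl bond.
print('dropped by patt4:', sorted(set(bonds3) - set(bonds4)))
```

The ethyl CH2 has only one further chain atom behind it, so it cannot start a three-atom chain and the ring-ethyl bond drops out.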

Wow, the final pattern worked fine in this case ;) But it’s still not enough, because if the molecule has a longer side chain (three or more chain atoms), the pattern will detect that bond too. Any comments, advice, and suggestions will be greatly appreciated.
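The limitation is easy to reproduce by swapping the ethyl group for a propyl group. The propyl analogue below is a hypothetical molecule I made up for this check, and the variable names are mine.

```python
from rdkit import Chem

patt4 = '[#6+0,#7+0;R;!$(*=,#[!#6])]!@!=!#[*;$([A!R][A!R][A!R])]'
query = Chem.MolFromSmarts(patt4)

# Same scaffold with an ethyl or a propyl group on the benzene ring
ethyl = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccc(CC)cc2)')
propyl = Chem.MolFromSmiles('C1CNCCN1(CCCCCCc2ccc(CCC)cc2)')

# Number of bonds that patt4 would allow FragmentMol to cut
n_ethyl = len(ethyl.GetSubstructMatches(query))
n_propyl = len(propyl.GetSubstructMatches(query))
print('ethyl:', n_ethyl, 'propyl:', n_propyl)
```

With propyl, the side chain itself satisfies the three-atom chain condition, so the ring-propyl bond is matched again in addition to the two linker bonds.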

In summary, I think defining an appropriate SMARTS pattern for the dataset you are interested in is very useful for building your own MMP dataset.

I uploaded today’s code to my gist.

Thanks for reading. Have a nice weekend ;)


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
