Build peptides from the ChEMBL monomer library #RDKit #ChEMBL #Chemoinformatics

ChEMBL ver. 30 was released recently. I've installed it on my PC and added the rdkit schema ;) The current ChEMBL FTP site also provides the HELM monomer library.

@magattaca posted a really useful blog post in Japanese. The post describes HELM and its monomers, and shows how to render these monomers.

I'm interested in how to build peptides from these monomers. Recently, @dr_greg_landrum showed how to build a molecule from fragments with the molzip function.

So I thought that, by combining the molzip function with the monomer library provided by ChEMBL, it would be easy to build peptides from monomers.
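As a minimal sketch of the idea, molzip joins two fragments at dummy atoms that share the same atom map number. The two fragments below are hypothetical examples written by hand as SMILES; they are not taken from the ChEMBL monomer library.

```python
from rdkit import Chem

# Hypothetical fragments: both dummy atoms carry atom map number 1,
# which tells molzip where to form the new bond.
acyl = Chem.MolFromSmiles('CC(=O)[*:1]')  # acyl side of the amide
amine = Chem.MolFromSmiles('[*:1]NC')     # amine side of the amide

# The dummy atoms are removed and their neighbors are bonded together
amide = Chem.molzip(acyl, amine)
amide_smiles = Chem.MolToSmiles(amide)
print(amide_smiles)
```

The monomers in the library mark their attachment points with labels rather than map numbers, so the labels have to be turned into map numbers before zipping.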

To do that, I defined three functions:
1) combine_fragments, which joins two monomers through an amide bond between the C-terminus of the first and the N-terminus of the second.
2) make_peptide, which builds a peptide from a list of monomers.
3) cap_terminal, which caps both termini of the peptide.

The main functions are shown below.

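import copy

from rdkit import Chem
from rdkit.Chem import molzip

def combine_fragments(m1, m2):
    # Work on copies so the input monomers are left untouched
    m1 = Chem.Mol(m1)
    m2 = Chem.Mol(m2)
    # Tag the C-terminal attachment point (_R2) of m1 ...
    for atm in m1.GetAtoms():
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R2':
            atm.SetAtomMapNum(1)
    # ... and the N-terminal attachment point (_R1) of m2 with the same map number
    for atm in m2.GetAtoms():
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R1':
            atm.SetAtomMapNum(1)
    # molzip bonds the neighbors of dummy atoms that share a map number
    return molzip(m1, m2)

def make_peptide(monomerlist):
    monomerlist = copy.deepcopy(monomerlist)
    res = None
    for monomer in monomerlist:
        # Skip monomers with a single attachment point; they cannot extend the chain
        if Chem.MolToSmiles(monomer).count("*") == 1:
            continue
        if res is None:
            # The first usable monomer starts the chain
            res = monomer
        else:
            res = combine_fragments(res, monomer)
    return res

def cap_terminal(m):
    m = Chem.Mol(m)
    n_term = Chem.MolFromSmiles('CC(=O)[*:1]')  # acetyl cap for the N-terminus
    c_term = Chem.MolFromSmiles('CO[*:2]')      # methoxy cap for the C-terminus
    for atm in m.GetAtoms():
        # Map number 1 pairs _R1 with n_term, map number 2 pairs _R2 with c_term
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R1':
            atm.SetAtomMapNum(1)
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R2':
            atm.SetAtomMapNum(2)
    res = molzip(m, n_term)
    res = molzip(res, c_term)
    return res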

Here is an example of a molecule built from two monomers.


And here is an example of a peptide built from more than two monomers. The bottom-right molecule is the final product. My function skips monomers that have only one "*" (attachment point).
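The post counts "*" characters in the SMILES string to find these monomers. As a hedged alternative sketch, the same check can be done by counting dummy atoms (atomic number 0) directly. The helper name count_attachment_points and the two example molecules are my own assumptions for illustration; they are stand-ins written as SMILES, not entries from the monomer library.

```python
from rdkit import Chem

def count_attachment_points(mol):
    """Count dummy atoms (atomic number 0), i.e. the '*' attachment points."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 0)

# Hypothetical stand-ins for library monomers
glycine_like = Chem.MolFromSmiles('[*:1]NCC(=O)[*:2]')  # two attachment points
cap_like = Chem.MolFromSmiles('CC(=O)[*:1]')            # one attachment point

counts = [count_attachment_points(m) for m in (glycine_like, cap_like)]
print(counts)
```

Counting atoms avoids generating a SMILES string for every monomer, and a monomer with a count of 1 would be skipped in the same way as before.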

It works fine. Next, I'll add a ring closure function for making macrocyclic peptides.

The monomer library contains lots of amino acids, so it's an interesting data set for chemoinformatics.

The whole code is uploaded to my gist.

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
