ChEMBL version 30 was released recently. I've installed it on my PC and added the RDKit schema ;) The current ChEMBL FTP site also provides a HELM monomer library.
@magattaca posted a really useful blog post in Japanese: https://magattaca.hatenablog.com/entry/2020/11/25/004829. The post describes HELM and its monomers, and shows how to render these monomers.
I'm interested in how to build peptides from these monomers. Recently, @dr_greg_landrum showed how to build a molecule from fragments with the molzip function.
So I thought that, by combining the molzip function with the monomer library provided by ChEMBL, it would be easy to build peptides from monomers.
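To do that, I defined three functions:
1) combine_fragments, which joins two monomers through an amide bond between the C-terminal attachment point (_R2) of the first and the N-terminal attachment point (_R1) of the second.
2) make_peptide, which builds a peptide from a list of monomers.
3) cap_terminal, which caps both termini of the peptide (acetyl on the N-terminal, methyl ester on the C-terminal).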
The main functions are shown below.
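import copy
from rdkit import Chem
from rdkit.Chem import molzip

def combine_fragments(m1, m2):
    # work on copies so the input monomers are not modified
    m1 = Chem.Mol(m1)
    m2 = Chem.Mol(m2)
    # tag the C-terminal attachment point (_R2) of the first monomer
    for atm in m1.GetAtoms():
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R2':
            atm.SetAtomMapNum(1)
    # tag the N-terminal attachment point (_R1) of the second monomer
    for atm in m2.GetAtoms():
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R1':
            atm.SetAtomMapNum(1)
    # molzip bonds the neighbors of the two dummies sharing a map number
    return molzip(m1, m2)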
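def make_peptide(monomerlist):
    monomerlist = copy.deepcopy(monomerlist)
    res = None
    for monomer in monomerlist:
        # skip monomers that have only one attachment point
        if Chem.MolToSmiles(monomer).count("*") == 1:
            continue
        # the first usable monomer starts the chain, even if earlier ones were skipped
        if res is None:
            res = monomer
        else:
            res = combine_fragments(res, monomer)
    return res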
def cap_terminal(m):
    m = Chem.Mol(m)
    n_term = Chem.MolFromSmiles('CC(=O)[*:1]')
    c_term = Chem.MolFromSmiles('CO[*:2]')
    for atm in m.GetAtoms():
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R1':
            atm.SetAtomMapNum(1)
        if atm.HasProp('atomLabel') and atm.GetProp('atomLabel') == '_R2':
            atm.SetAtomMapNum(2)
    res = molzip(m, n_term)
    res = molzip(res, c_term)
    return res
Here is an example of a molecule built from two monomers.
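For readers who want to try the two-monomer case as text, here is a minimal sketch that joins alanine and glycine. It makes some assumptions: the two CXSMILES strings are written by hand in the style of the HELM monomer library (they are not read from the ChEMBL file), and label_and_zip is a hypothetical helper that mirrors the idea of combine_fragments but also clears any other atom map numbers first.

```python
from rdkit import Chem

# Hand-written CXSMILES (an assumption, not read from the ChEMBL file).
# The $...$ block attaches the atom labels _R1 (on the N side) and _R2 (on the C side).
ALA = "C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|"
GLY = "[*]NCC([*])=O |$_R1;;;;_R2;$|"


def label_and_zip(m1, m2):
    """Hypothetical helper: join _R2 of m1 to _R1 of m2 with molzip."""
    m1, m2 = Chem.Mol(m1), Chem.Mol(m2)
    for mol, label in ((m1, "_R2"), (m2, "_R1")):
        for atom in mol.GetAtoms():
            is_target = (atom.HasProp("atomLabel")
                         and atom.GetProp("atomLabel") == label)
            # map number 1 marks the pair to zip; 0 clears any other mapping
            atom.SetAtomMapNum(1 if is_target else 0)
    return Chem.molzip(m1, m2)


ala = Chem.MolFromSmiles(ALA)
gly = Chem.MolFromSmiles(GLY)
dipeptide = label_and_zip(ala, gly)
Chem.SanitizeMol(dipeptide)

# Two attachment points should remain: _R1 from Ala and _R2 from Gly
n_dummies = sum(1 for a in dipeptide.GetAtoms() if a.GetAtomicNum() == 0)
amide = Chem.MolFromSmarts("C(=O)N")
has_amide = dipeptide.HasSubstructMatch(amide)
print(Chem.MolToSmiles(dipeptide), n_dummies, has_amide)
```

The remaining two dummy atoms are what make_peptide keeps extending and what cap_terminal finally replaces.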

And here is an example of a peptide built from more than two monomers. The bottom-right molecule is the final product. My function skips monomers that have only one "*" (attachment point).
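The "only one attachment point" check can also be done on the atoms instead of the SMILES string. The sketch below is only an illustration under assumptions: the three CXSMILES strings are hand-written in HELM style, and attachment_labels is a hypothetical helper, not part of my gist. It shows that counting "*" in the SMILES agrees with counting labelled dummy atoms, and that the labels additionally tell which side is open.

```python
from rdkit import Chem

# Hand-written, HELM-style CXSMILES used only for illustration
# (assumed, not taken from the ChEMBL monomer file)
EXAMPLES = {
    "Ac": "CC(=O)[*] |$;;;_R2$|",
    "Ala": "C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|",
    "Cys": "[*]N[C@@H](CS[*])C([*])=O |$_R1;;;;;_R3;;_R2;$|",
}


def attachment_labels(mol):
    """Hypothetical helper: sorted atomLabel values found on dummy atoms."""
    return sorted(
        atom.GetProp("atomLabel")
        for atom in mol.GetAtoms()
        if atom.GetAtomicNum() == 0 and atom.HasProp("atomLabel")
    )


labels_by_name = {}
stars_by_name = {}
for name, smi in EXAMPLES.items():
    mol = Chem.MolFromSmiles(smi)
    labels_by_name[name] = attachment_labels(mol)
    # same test as in make_peptide: count "*" in the SMILES string
    stars_by_name[name] = Chem.MolToSmiles(mol).count("*")
    print(name, labels_by_name[name], stars_by_name[name])
```

In this toy set only "Ac" has a single attachment point, so it is the one make_peptide would skip.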

It works fine. I'll add a ring-closure function for making macrocyclic peptides.
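I haven't written the ring closure yet, but a head-to-tail version could look roughly like the sketch below. This is just one possible approach under assumptions, not the final function: cyclize_head_to_tail is a hypothetical name, it edits the molecule with RWMol instead of molzip, it assumes the linear peptide has exactly one free _R1 and one free _R2, and the tri-glycine input is built by hand for the example.

```python
from rdkit import Chem


def cyclize_head_to_tail(m):
    """Hypothetical helper: bond the neighbors of the _R1 and _R2 dummies,
    then delete the dummies. Assumes one free _R1 and one free _R2."""
    rw = Chem.RWMol(m)
    ends = {}
    for atom in rw.GetAtoms():
        if atom.HasProp("atomLabel") and atom.GetProp("atomLabel") in ("_R1", "_R2"):
            ends[atom.GetProp("atomLabel")] = atom.GetIdx()
    if set(ends) != {"_R1", "_R2"}:
        raise ValueError("need both a _R1 and a _R2 attachment point")
    # the real atoms that carry the two open ends
    n_idx = rw.GetAtomWithIdx(ends["_R1"]).GetNeighbors()[0].GetIdx()
    c_idx = rw.GetAtomWithIdx(ends["_R2"]).GetNeighbors()[0].GetIdx()
    rw.AddBond(n_idx, c_idx, Chem.BondType.SINGLE)
    # remove the higher index first so the other index stays valid
    for idx in sorted(ends.values(), reverse=True):
        rw.RemoveAtom(idx)
    mol = rw.GetMol()
    Chem.SanitizeMol(mol)
    return mol


# Hand-built linear Gly-Gly-Gly with free _R1 (atom 0) and _R2 (atom 12)
labels = ["_R1"] + [""] * 11 + ["_R2", ""]
linear = Chem.MolFromSmiles(
    f"[*]NCC(=O)NCC(=O)NCC([*])=O |${';'.join(labels)}$|"
)
cyclic = cyclize_head_to_tail(linear)
print(Chem.MolToSmiles(cyclic))
```

Side-chain cyclization (for example through _R3) would need a way to choose which pair of attachment points to close, which this sketch does not handle.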
The monomer library contains lots of amino acids, so it's an interesting data set for chemoinformatics.
The whole code is uploaded to my gist.