Peptide design x Deep learning

As you know, recurrent neural networks (RNNs) are widely used in machine learning for natural language, handwriting and speech, and also in chemistry.
Recently there have been many reports that apply RNNs to SMILES strings to solve chemoinformatics problems. Today I read a short article published by Prof. Gisbert Schneider’s group.
The URL is below.
https://pubs.acs.org/doi/10.1021/acs.jcim.7b00414

They applied an RNN (LSTM) to the design of antimicrobial peptides (AMPs). The strategy is basic. First they added a tag to each peptide sequence and padded it to a fixed length. Then they encoded it as one-hot vectors.
I think the key point of their method is the selection of training peptides. They removed the sequences containing Cys because Cys residues can potentially form S-S bridges, which would complicate the problem.
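The preprocessing described above can be sketched in a few lines of plain Python. This is only my minimal illustration, not the authors' code: the start tag "B", the space used for padding, the helper names and the toy sequences are all assumptions.

```python
# Minimal sketch of the preprocessing: drop Cys-containing peptides,
# add a start tag, pad to a fixed length, then one-hot encode.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START_TAG = "B"   # assumed start token (not a standard residue code)
PAD = " "         # assumed padding character
VOCAB = START_TAG + AMINO_ACIDS + PAD
INDEX = {ch: i for i, ch in enumerate(VOCAB)}

def prepare(seqs, max_len):
    """Remove sequences containing Cys, prepend the tag, pad to max_len."""
    kept = [s for s in seqs if "C" not in s and len(s) + 1 <= max_len]
    return [(START_TAG + s).ljust(max_len, PAD) for s in kept]

def one_hot(seq):
    """Encode each character as a vector with a single 1."""
    rows = []
    for ch in seq:
        row = [0] * len(VOCAB)
        row[INDEX[ch]] = 1
        rows.append(row)
    return rows

# Toy sequences made up for this example; the last one contains Cys.
peptides = ["GLFDIVKKVV", "KWKLFKKIGA", "ACYCRIPACI"]
padded = prepare(peptides, max_len=12)
encoded = [one_hot(p) for p in padded]
print(len(padded), "peptides kept, each encoded as",
      len(encoded[0]), "x", len(VOCAB))
```

Each peptide becomes a matrix of shape (fixed length) x (vocabulary size), which is the kind of input an LSTM layer usually takes.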

Finally, they evaluated the trained model, and the model generated novel peptides that have suitable hydrophobicity and length.
I think their strategy (removing Cys-containing sequences) is nice and fits RNNs well.
BTW, with this method the machine learns peptides as a bunch of strings but does not learn the features of each amino acid. This is the same as with SMILES in chemoinformatics.
I have no answer about it.

If you are interested in the approach, you can get the source code from the following URL.
https://github.com/alexarnimueller/LSTM_peptides

Applications of Fluorine atom in Drug discovery

Some years ago, there was a good review about the applications of fluorine in drug discovery. The perspective was published by researchers at BMS. It contained a lot of information about fluorine based on their experience and published data. I think it is still useful for medicinal chemists.
It was published in 2015 by ACS.
https://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.5b00258

And recently the same author, a researcher at BMS, reported a new review that follows up on it!
https://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.7b01788
I skimmed the review today (59 pages! Too long for me ;-)). Some examples were already reported in the previous review, but there are lots of new insights and examples of fluorine. The article mainly focuses on bioisosteric replacement with fluorine. Bioisosteric replacement is often used in drug discovery not only to maintain potency but also to improve metabolic stability, solubility or other parameters.
For example, the author describes the replacement of a tert-butyl group with a trifluoro cyclopropane analogue in “Table 3”. It was interesting for me because it is not the simple replacement of tert-Bu with a trifluoro-dimethyl group. There are also examples of the same replacement in different protein targets.

The strategy of blocking metabolism with fluorine atoms is often easy to understand, so medicinal chemists try to introduce fluorine atoms into their compounds. But the application of fluorine is not limited to that strategy. Some interesting examples are described in the article.
Introducing a fluorine atom into the aromatic ring of Compound 174 improved the solubility of the compound from 15 mg/mL to >500 mg/mL. The effect of the fluorine is not clear, but the author suggests that the fluorine atom polarizes the adjacent N-H, which affects the solubility.

And I was surprised that the fluorine strategy is also effective for peptide drug discovery. In Table 29, fluoro derivatives of a 36-residue peptide derived from the amino terminus of human parathyroid hormone (hPTH) show binding potency for the PTH receptor. If lots of cost-effective fluorinated amino acids were available, could we design more potent peptide derivatives?

New findings about fluorine effects create new strategies for drug design. And sometimes new chemistry is needed to make fluorinated building blocks or to conduct fluorination reactions.

I think medicinal chemists need to keep up with both new strategies for drug design and synthetic chemistry.

Rational design of GPCR biased Ligand

GPCRs are one of the druggable target classes. GPCR activation controls many networks of signaling pathways, which for most receptors are mediated by both G proteins and beta-arrestins. Different signaling pathways give different effects. To avoid side effects from G protein signals, designing beta-arrestin-selective ligands is a useful strategy for drug discovery. There have been lots of reports about biased ligands over the last few years.
e.g.
https://www.sciencedirect.com/science/article/pii/S0165614714000698

I am interested in this area and found the following article.
“Structure-inspired design of β-arrestin-biased ligands for aminergic GPCRs”
https://www.nature.com/articles/nchembio.2527

The authors designed selective biased ligands of the D2 receptor by using homology modeling/SBDD and MD.

First they focused on the TM5 and EL2 regions, which are important for G protein/beta-arrestin selectivity. They designed a new molecule from aripiprazole by replacing the dichloro moiety with an indole moiety (compound 1). Compound 1 was biased.
Next they designed substituted analogues of compound 1 and got clear SAR for the substituents. They also performed MD simulations of the indole motif and revealed the effect of the substituents.

Finally, they could rationally design a biased ligand that is more selective than the initial compound 1 (Fig. 5). The bias index is 20 vs 2 (compound 7 vs compound 1).

It was interesting for me because all the molecules have quite similar structures, yet small differences affect the protein-ligand contacts and can control the signaling pathway!

Computational approaches help the rational design of biased drugs. I feel small-molecule drug discovery is still an exciting area of science.

BTW, in the article aripiprazole is used as the starting point.
Aripiprazole (Abilify) is one of the major drugs for schizophrenia and bipolar disorder. Rexulti (brexpiprazole) is also an approved drug for schizophrenia and major depression. The structural difference between these molecules is the tail part: dichlorobenzene or benzothiophene.
https://en.wikipedia.org/wiki/Aripiprazole
https://en.wikipedia.org/wiki/Brexpiprazole

These compounds show different pharmacological profiles.
There are also differences in their metabolic profiles.

Receptor   Rexulti (Ki, nM)   Abilify (Ki, nM)
5-HT1A     0.12               5.6
5-HT2A     0.47               8.7
5-HT2B     1.9                0.4
D2         0.3                1.6
D3         1.1                5.4
H1         19                 27.9
a1b        0.17               34.4
a2c        0.59               37.6
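To compare the two profiles more easily, the fold difference in Ki can be calculated for each receptor. The values below are copied from the table above, and the variable names are mine. A lower Ki means tighter binding, so a ratio above 1 means Rexulti binds more tightly than Abilify at that receptor.

```python
# Ki values in nM, copied from the table above: (Rexulti, Abilify).
ki_nm = {
    "5-HT1A": (0.12, 5.6),
    "5-HT2A": (0.47, 8.7),
    "5-HT2B": (1.9, 0.4),
    "D2": (0.3, 1.6),
    "D3": (1.1, 5.4),
    "H1": (19.0, 27.9),
    "a1b": (0.17, 34.4),
    "a2c": (0.59, 37.6),
}

# Fold difference = Ki(Abilify) / Ki(Rexulti).
fold = {rec: abilify / rexulti for rec, (rexulti, abilify) in ki_nm.items()}

# Print receptors sorted from the largest to the smallest fold difference.
for rec, ratio in sorted(fold.items(), key=lambda kv: kv[1], reverse=True):
    print("{:<7s} {:6.1f}-fold".format(rec, ratio))
```

In this table only 5-HT2B gives a ratio below 1, i.e. it is the only receptor where Abilify shows the lower Ki.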

I am now interested in the patent strategy. I will check it.

PKPD in R.

As you know, understanding PKPD is important for drug development.
I’m not in the DMPK dept., but I think it’s better to know basic PKPD theory.
There are some packages for PKPD analysis in R.
And I found a cool library named “PKPDsim”, developed by Ron Keizer.
http://ronkeizer.github.io/PKPDsim/
This library can be integrated with shiny, so the user can calculate PKPD on the fly!
I used the library today.
The following code is almost the same as the documentation.
It simulates a 1-compartment model with oral dosing.
pk_1cmt_oral is defined in the source code of the library.

library("PKPDsim")
library("ggplot2")
p <- list(CL=1, V=10, KA=0.5)
pk1 <- new_ode_model("pk_1cmt_oral")
r1 <- new_regimen( amt=100,
                   times=c(0,12,24,36)
                   )
dat <- sim_ode(ode = "pk1",
               par = p,
               regimen=r1
               )
plt <- ggplot( dat, aes(x=t, y=y) ) + geom_line() + facet_wrap(~comp)
print(plt)
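To check my understanding of the model, the same regimen can also be calculated by hand. The sketch below is not part of PKPDsim. It uses the textbook closed-form solution for a 1-compartment model with first-order absorption (Bateman equation) and sums the contribution of each dose. I assume complete bioavailability, KA different from CL/V, and consistent units (for example L/h for CL, L for V, h for time). The function name is my own.

```python
import math

def amount_central(t, dose, dose_times, CL, V, KA):
    """Amount in the central compartment at time t (superposition of doses)."""
    ke = CL / V  # first-order elimination rate constant
    total = 0.0
    for t_dose in dose_times:
        if t >= t_dose:
            dt = t - t_dose
            total += dose * KA / (KA - ke) * (math.exp(-ke * dt) - math.exp(-KA * dt))
    return total

# Same parameters and regimen as the R example above.
params = dict(CL=1.0, V=10.0, KA=0.5)
dose_times = [0, 12, 24, 36]

# Concentration = amount / V, evaluated every hour up to 48 h.
profile = [(t, amount_central(t, 100.0, dose_times, **params) / params["V"])
           for t in range(0, 49)]
for t, conc in profile[::6]:
    print("t = {:2d}  conc = {:.2f}".format(t, conc))
```

Because the later doses are given before the earlier ones are eliminated, the curve accumulates from dose to dose, which is also what the simulated plot should show.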

Now I got the following image.
1 means elimination phase, 2 means absorption phase.
[Figure: Rplot]

Then I ran shiny to simulate on the fly.

sim_ode_shiny(ode='pk1', par=p, regimen=r1)

Now the web browser launched…
[Screenshot: Screen Shot 2015-10-21 at 11.13.04 PM]

I think it’s a good library for studying or simulating PKPD.

Visualizing the process of lead optimization

Sometimes we set milestones to manage a portfolio and/or to check the progress of projects.
These data are reported in documents, PowerPoint slides, etc., so it’s difficult to grasp the situation of LO in a timely manner.
Researchers at GSK published a solution for visualizing the LO process.
It was impressive for me.
The link is here.
http://www.ncbi.nlm.nih.gov/pubmed/26262898

They call it “LO telemetry”; it shows the time course of the total risk of compounds.
Total risk is calculated based on potency for each target, ADME, Tox and physchem profiles.
Ideally, total risk decreases as the project progresses. But there are a lot of problems in drug discovery projects (at least for me! 😉 ).
Fig. 5 shows one example.
The figure shows the progress of lead optimization and the design entropy (chemical diversity).
Design entropy suddenly increases because of Tox problems. PhysChem property risk also increases slightly.
To avoid a tox problem (adverse effect), chemists think about changing the chemical series or making a drastic change to the structure. That risks a loss of potency, but Fig. 5 shows that their strategy kept the pharmacological risk score low.
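To make the idea concrete for myself, I wrote a toy version of the two quantities. This is NOT the scoring used in the paper. Here I simply assume that each category risk is a number between 0 and 1, that total risk is a weighted mean, and that "design entropy" can be approximated by the Shannon entropy of chemical series labels. All names and numbers are hypothetical.

```python
import math
from collections import Counter

def total_risk(risks, weights=None):
    """Weighted mean of per-category risks (each assumed to lie in [0, 1])."""
    if weights is None:
        weights = {name: 1.0 for name in risks}
    weighted = sum(risks[name] * weights[name] for name in risks)
    return weighted / sum(weights[name] for name in risks)

def design_entropy(series_labels):
    """Shannon entropy (bits) of the chemical series made in one period."""
    counts = Counter(series_labels)
    n = len(series_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical compound: good potency, but a Tox liability.
compound = {"potency": 0.2, "adme": 0.4, "tox": 0.9, "physchem": 0.1}
print("total risk:", round(total_risk(compound), 2))

# One series only vs. four different series explored in the same period.
print("entropy, focused:", design_entropy(["A", "A", "A", "A"]))
print("entropy, diverse:", design_entropy(["A", "B", "C", "D"]))
```

Plotting such numbers per week or per month would give a crude telemetry-like curve: risk should go down over time, and entropy should jump when the team starts exploring new series.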

The paper reports that an LO project team can check the telemetry. It tells the team about the bottlenecks and progress of their project.
The system can also be used for portfolio management.
It is useful for decision making and for motivating the team.
On the other hand, the telemetry provides a vivid description of each project.
What do you think about metrics for lead optimization?

Passport for compound.

I was interested in the title.
“Compound Passport Service”
http://dx.doi.org/10.1016/j.drudis.2015.06.011
AZ made a passport for compounds to track compound rights.

The system can manage the status of compounds, such as ownership, permissions and whether the structure is shared.
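Just to think about what such a record might hold, here is a tiny data structure. It is purely my own illustration and has nothing to do with the actual AZ implementation. The field names and the example IDs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CompoundPassport:
    """Hypothetical record of the rights attached to one compound."""
    compound_id: str
    owner: str
    structure_shared: bool = False                 # may partners see the structure?
    permissions: set = field(default_factory=set)  # organisations allowed to use it

    def grant(self, organisation):
        self.permissions.add(organisation)

    def can_use(self, organisation):
        return organisation == self.owner or organisation in self.permissions

passport = CompoundPassport(compound_id="CPD-0001", owner="CompanyA")
passport.grant("PartnerB")
print(passport.can_use("PartnerB"), passport.can_use("PartnerC"))
```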

I was really impressed with the concept and the system because I think that management of compound (and rights) logistics is a key factor in drug discovery.

I want to develop a seamless compound logistics system and a tracking system for medicinal chemistry…

How to visualize QSAR model.

I often discuss QSAR with other chemists.
And sometimes they tell me… ”QSAR is a useful tool for drug discovery, but I don’t understand it, because with a QSAR model (i.e. ML) it is hard to understand why a compound is good.”
Hmm, I agree with that opinion.
SVM, NB, RF etc. are very useful, but these models are black boxes. So it is difficult to understand the effect of substructures on the models.
Jürgen Bajorath et al. took on the challenge of closing this gap and published an interesting paper in J. Chem. Inf. Model.
http://pubs.acs.org/doi/abs/10.1021/ci500410g

They described in the paper…

understanding why a compound has undesirable ADME characteristics is just as important as knowing that it (ADME prediction) does.

I like this phrase.

They developed a Python library named nbviz that depends on scikit-learn and matplotlib.
The library can visualise the contribution of each feature of the vectors.
I think the key point of the method is that the authors used MACCS keys to build the model,
because MACCS keys are easy for chemists to understand.
I wrote demo code using RDKit.
https://github.com/iwatobipen/chemo_info/tree/master/modelviz
Sample data was downloaded from the following ftp site.
ftp://ftp.ics.uci.edu/pub/baldig/learning/Sutherland/
I added a Class property (I set the active flag as “IC50_uM < 0.1 is active”).
First, I set the arguments 'names' and 'groups'.
Then I wrote a sample script like the following.

import sys

import numpy as np
from rdkit import Chem
from rdkit.Chem import MACCSkeys
from sklearn.naive_bayes import BernoulliNB

import maccskey  # provides the 'names' and 'groups' arguments mentioned above
import nbviz


def calc_MACCS_fp( mol ):
    # Turn the on-bits of the 167-bit MACCS fingerprint into a 0/1 vector.
    on_bits = list( MACCSkeys.GenMACCSKeys( mol ).GetOnBits() )
    mol_fp_vec = np.zeros( 167 )
    mol_fp_vec[ on_bits ] = 1
    return mol_fp_vec

def make_fp_array( mols ):
    fp_array = [ calc_MACCS_fp( mol ) for mol in mols ]
    return fp_array

# SDMolSupplier yields None for molecules it cannot parse, so skip them.
mols = [ mol for mol in Chem.SDMolSupplier( sys.argv[1] ) if mol is not None ]
X = make_fp_array( mols )
Y = [ float( mol.GetProp( "Class" ) ) for mol in mols ]

# Train on everything except the first molecule, which is kept for the prediction plot.
model = BernoulliNB( alpha=0.1 )
model.fit( X[1:], Y[1:] )
conditional_probs = np.exp( model.feature_log_prob_ )
prior = np.exp( model.class_log_prior_[1] )
print( 'conditional feature prob: {}'.format( conditional_probs ) )
print( 'class prior: {}'.format( prior ) )

# First figure: the model as a whole.
nbviz.visualize_model( conditional_probs, prior, names=maccskey.names, groups=maccskey.groups )

# Second figure: the prediction for the held-out first molecule.
nbviz.visualize_prediction( X[0], conditional_probs, prior, names=maccskey.names, groups=maccskey.groups )

Let’s run the script!

modelviz iwatobipen$ python view_model_demo.py mol_viz_demo/cox2_test.sdf 

Then two figures were generated.
The red and blue colours of the circles indicate a positive / negative influence of the features, and the distance indicates the log odds ratio.
The approach is useful for discussion, because the figures tell chemists why the model considers particular substructures to be effective.
But it is hard for me to visualise each target…
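For reference, the log odds ratio of each feature in a Bernoulli naive Bayes model can be computed directly from the conditional probabilities. The sketch below uses only the standard library and toy numbers. It shows the general naive Bayes arithmetic, and I am not claiming this is exactly how nbviz computes its plot. The function name is mine.

```python
import math

def log_odds_contributions(x, p_active, p_inactive):
    """Per-feature log odds (active vs inactive) for a 0/1 feature vector x.

    p_active[i] and p_inactive[i] are P(bit i is on | class).
    """
    contributions = []
    for bit, pa, pi in zip(x, p_active, p_inactive):
        if bit:
            contributions.append(math.log(pa / pi))
        else:
            contributions.append(math.log((1 - pa) / (1 - pi)))
    return contributions

# Toy example with three keys (numbers invented for illustration).
p_active = [0.8, 0.5, 0.1]
p_inactive = [0.2, 0.5, 0.4]
prior_active = 0.3

x = [1, 0, 1]
contrib = log_odds_contributions(x, p_active, p_inactive)
prior_log_odds = math.log(prior_active / (1 - prior_active))
score = prior_log_odds + sum(contrib)

for i, c in enumerate(contrib):
    print("key {}: {:+.2f}".format(i, c))
print("total log odds: {:+.2f}".format(score))
```

A positive contribution pushes the prediction towards "active" and a negative one towards "inactive", which corresponds to the red and blue circles in the figures.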

[Figure: figure_0]

[Figure: figure_1]