Is VR useful for drug discovery?

I read an article about a VR application for drug discovery. I found it a very interesting approach because it lets chemists see molecules directly.

There are also many tools and services that use the same approach. Chemists can now dive into a protein pocket and look closely around the site.

Some articles say that VR makes the drug design process more intuitive. Hmm… is that true? Of course, the approach provides a new way to view the binding site or anything else. It is exciting and makes 3D structures easier to understand. Just like “Don’t think, feel”…. That is the opposite of AI (machine learning) driven drug discovery. I think most of the drug design process is not intuitive. Also, current VR systems lack a sense of touch: the user cannot directly feel the repulsion or attraction between ligand and protein.
If VR drug design works very well, I think it would indicate that there are many elements that are not yet captured as descriptors or energy terms. (It is just my opinion…)

From the AI side, I found a very interesting article about making sake with AI.
The article describes a toji’s challenge to make sake with AI. A toji is the chief brewer at a sake brewery.
They collected a lot of data from their brewing process and trained an AI on it.

These technologies are progressing rapidly, so I may need to change my opinion soon.
VR and AI are very attractive and interesting areas for me. I would like to keep up with the trend.


Do we need to measure metabolic stability in chiral form?

Recently the importance of the sp3 ratio (Fsp3) has been increasing, because it helps to access new design space and to improve physicochemical properties such as solubility.
However, chiral compounds are hard to obtain because of synthetic difficulty or the lack of chiral separation conditions.
As you know, biological activity is sometimes quite different between enantiomers. The same is true of ADMET properties. So how important is chirality for metabolic stability?
Today I found a short letter from Merck researchers.
“Interpretation of in vitro metabolic stability studies for racemic mixtures”
They analyzed in-house DMPK data and conducted simulations. They concluded that the risk of misinforming project teams through generation of metabolic stability data on racemic mixtures is low.

Figure 4 of the article shows the frequency of compounds with a 10-fold difference in metabolic stability between the R and S enantiomers!

For QSAR modelers, this data is worth knowing. BTW, for project members even a small difference in molecular properties (2-fold) can be very important. I think a drug designer needs balance: seeing things from a wide point of view and also from a specific one. I need to improve myself more and more…
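
To get a feel for what a racemate measurement reports, here is a minimal sketch. It assumes each enantiomer disappears by independent first-order kinetics from a 50:50 mixture and that the half-life is read from a log-linear fit. That is my own simplification, not necessarily the simulation used in the letter, and the rate constants and sampling times are made up.

```python
import math

def half_life(k):
    """Half-life (min) for a first-order rate constant k (1/min)."""
    return math.log(2) / k

def racemate_remaining(k_r, k_s, t):
    """Fraction of a 50:50 racemate remaining at time t (min)."""
    return 0.5 * math.exp(-k_r * t) + 0.5 * math.exp(-k_s * t)

def apparent_half_life(k_r, k_s, times=(0, 5, 10, 20, 30)):
    """Half-life from a least-squares line through ln(remaining) vs time."""
    ys = [math.log(racemate_remaining(k_r, k_s, t)) for t in times]
    mean_t = sum(times) / len(times)
    mean_y = sum(ys) / len(ys)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return math.log(2) / -(sxy / sxx)

# Hypothetical slow enantiomer (k = 0.05 / min) paired with a faster one.
k_slow = 0.05
for fold in (1, 2, 10):
    k_fast = k_slow * fold
    print(fold,
          round(half_life(k_fast), 1),
          round(apparent_half_life(k_fast, k_slow), 1),
          round(half_life(k_slow), 1))
```

Under these assumptions the racemate value always falls between the half-lives of the two enantiomers, so it only becomes misleading when the enantiomers differ a lot.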

Label-free ubiquitin assay system

Here is a new report about the ubiquitin pathway.
Recently the ubiquitin system has become an attractive source of drug targets. Known assay systems such as ELISA and SDS-PAGE are limited in throughput, and FRET depends on fluorescent labels.
The authors developed and reported a new HTS system to overcome these issues.

They use MALDI-TOF (rapifleX) with 15N-labeled ubiquitin, which serves as the internal standard.
The assay directly detects the consumption of mono-ubiquitin.
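
The readout can be sketched as the ratio of the unlabeled mono-ubiquitin peak to the 15N internal standard peak. The scaling to percent inhibition between two controls is a common plate-assay convention that I am assuming here, not a formula taken from the paper, and the function names and numbers are hypothetical.

```python
def ub_ratio(light_intensity, heavy_intensity):
    """Unlabeled mono-ubiquitin signal relative to the 15N internal standard."""
    if heavy_intensity <= 0:
        raise ValueError("internal standard signal must be positive")
    return light_intensity / heavy_intensity

def percent_inhibition(sample, uninhibited, no_enzyme):
    """Scale a sample ratio between two control ratios (assumed convention).

    uninhibited: ratio when the cascade runs freely (ubiquitin consumed, low)
    no_enzyme:   ratio when no ubiquitin is consumed (high)
    """
    window = no_enzyme - uninhibited
    if window <= 0:
        raise ValueError("controls do not give a usable assay window")
    return 100.0 * (sample - uninhibited) / window

# Made-up peak intensities for one well and its two controls.
sample = ub_ratio(6000.0, 10000.0)
low = ub_ratio(2000.0, 10000.0)
high = ub_ratio(10000.0, 10000.0)
print(round(percent_inhibition(sample, low, high), 1))
```

Dividing by the internal standard is what makes the wells comparable, because the absolute MALDI signal varies from spot to spot.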

To detect covalent binders that bind to Cys residues in the ubiquitin cascade, they recommend using tris(2-carboxyethyl)phosphine (TCEP) instead of DTT or beta-mercaptoethanol (BME) for the enzymatic reaction. They also run the assay at a high ATP concentration to reduce the likelihood of identifying ATP analogs as inhibitors.
A well-designed system.
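
Why does high ATP help? For a simple ATP-competitive inhibitor, the Cheng-Prusoff relationship IC50 = Ki * (1 + [ATP] / Km) says the apparent potency drops as the ATP concentration rises. The sketch below assumes purely competitive inhibition, and the Ki and Km values are invented for illustration.

```python
def apparent_ic50(ki, atp_conc, km_atp):
    """Cheng-Prusoff IC50 for a competitive inhibitor (same units throughout)."""
    return ki * (1.0 + atp_conc / km_atp)

# Hypothetical ATP-competitive compound: Ki = 1 uM, Km(ATP) = 10 uM.
for atp in (10.0, 100.0, 1000.0):
    print(atp, apparent_ic50(1.0, atp, 10.0))
```

With these made-up values the apparent IC50 goes from 2 uM at 10 uM ATP to 101 uM at 1 mM ATP, so such a compound would drop out of the hit list at a typical screening concentration.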

They performed an HTS assay with a library of 1430 FDA-approved compounds, using MDM2, ITCH and HOIP as E3 ligases.

Finally they got some hit compounds.
Resveratrol showed weak UBE1/UBE2L3/HOIP pathway inhibition. Hmm…..
Bendamustine showed high potency.
It seems to be an irreversible binder, I think.

Label-free high-throughput assays are useful for reducing the risk of false positives and for analyzing ligand-protein interactions directly.
Technology and science are moving so fast….

The power of synthesis robots in organic chemistry

Here is an amazing paper from Leroy Cronin’s group.
I read the article this morning and was very excited. I know how organic synthesis works, of course:
run the reaction, work up, purify, and analyze the product with NMR, IR, MS, etc…

The approach in the article is very different!
You can see an image at the following link.

Their platform seems to be all-in-one: the system performs the whole PDCA cycle in one place, even the NMR analysis. The authors optimized reaction conditions with the automation system and machine learning.
They built a reactivity classification model, and it could predict reactivity with high accuracy. I think this technology is effective not only for reaction condition optimization but also for compound optimization in drug discovery.

They reported more than 5000 experimental results. I would like to know how long it took to collect the data. Ultra-high-throughput synthesis is a very attractive area for me.

The group also published automated synthesis with a 3D printer!
They made reaction vessels with a 3D printer and ran a multi-step synthesis of ibuprofen. The article was published in 2016.

Hmm… Organic synthesis will be like Lego in the near future. ;-)

AMES classification with WL graph kernel #RDKit

I often find it difficult to implement an algorithm from scratch… I need more practice. ;-)
BTW, recently I have been finding many articles about applying graph theory to chemoinformatics.
I found some interesting ones, and they provide useful packages on GitHub!

Today, I tried a library named Grakel.
You can find the original article at the URL below.

I used the package and compared its performance with a traditional SVC. The open Ames dataset is used for the following test.
My code is below. The Grakel package has many algorithms and makes graph kernel calculation easy. I calculated the WL graph kernel from the RDKit adjacency matrix and built a predictive model. At the same time, I built a traditional SVC model with ECFP4 (Morgan fingerprint, radius=2).

Comparing the results, I find it interesting that the WL graph kernel worked well even though it has no detailed information about the molecules, such as charge or number of hydrogens.
Does this mean that graph-based models are powerful? This is only one experiment with one descriptor.
I would like to try other datasets.
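
To see what the WL subtree kernel actually counts, here is a from-scratch sketch in plain Python. It is not Grakel's implementation: the function names are my own, bond orders are ignored, and the toy graphs are heavy-atom skeletons typed by hand. In each round every atom gets a new label built from its own label plus the sorted labels of its neighbours, and the kernel is the dot product of the label counts from all rounds.

```python
from collections import Counter

def wl_features(adjacency, labels, n_iter=3):
    """Count node labels over n_iter rounds of Weisfeiler-Lehman relabeling.

    adjacency: dict node -> list of neighbouring nodes
    labels:    dict node -> initial label (here the element symbol)
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(n_iter):
        # New label = own label followed by the sorted neighbour labels.
        current = {
            node: current[node] + "(" + ",".join(
                sorted(current[nb] for nb in adjacency[node])) + ")"
            for node in adjacency
        }
        counts.update(current.values())
    return counts

def wl_kernel(g1, g2, n_iter=3):
    """Unnormalized kernel: dot product of the two label-count vectors."""
    f1 = wl_features(g1[0], g1[1], n_iter)
    f2 = wl_features(g2[0], g2[1], n_iter)
    return sum(f1[label] * f2[label] for label in f1)

def normalized_wl_kernel(g1, g2, n_iter=3):
    """Cosine-style normalization so that k(g, g) is 1."""
    denom = (wl_kernel(g1, g1, n_iter) * wl_kernel(g2, g2, n_iter)) ** 0.5
    return wl_kernel(g1, g2, n_iter) / denom

# Heavy-atom graphs: ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
ether = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
print(round(normalized_wl_kernel(ethanol, ether), 3))
```

The two molecules share their atom counts but none of the neighbourhood labels, so the similarity is well below 1 even though only element symbols and connectivity are used.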

These models are based on ligand features and do not include protein information. In the real world, the drug discovery process needs a lot of information, not only about ligands but also about proteins.

I would like to explore the possibilities of graph-based approaches for chemoinformatics.

from grakel import GraphKernel
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
import numpy as np
import argparse
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def getparser():
    parser = argparse.ArgumentParser('argparser')
    parser.add_argument('input', help='file path and name of input')
    parser.add_argument('prop', help='properties for predict')
    return parser.parse_args()

def molg_from_smi(smiles):
    # Build a Grakel-style graph [adjacency matrix, node labels] from a SMILES string.
    mol = Chem.MolFromSmiles(smiles)
    atom_with_idx = { i:atom.GetSymbol() for i, atom in enumerate(mol.GetAtoms())}
    adj_m = Chem.GetAdjacencyMatrix(mol, useBO=True).tolist()
    return [adj_m, atom_with_idx]

def molg_from_rdkit(mol):
    # Same as above, starting from an RDKit Mol; node labels are element symbols.
    atom_with_idx = { i:atom.GetSymbol() for i, atom in enumerate(mol.GetAtoms())}
    adj_m = Chem.GetAdjacencyMatrix(mol, useBO=True).tolist()
    return [adj_m, atom_with_idx]

def mol2fp(mol):
    # Morgan fingerprint with radius 2 (ECFP4-like) as a numpy array.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2)
    arr = np.zeros((1,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

if __name__=='__main__':
    args = getparser()
    # Skip records that RDKit could not parse.
    mols = [mol for mol in Chem.SDMolSupplier(args.input) if mol is not None]
    X = [molg_from_rdkit(mol) for mol in mols]
    Ames_dict = {'mutagen':1, 'nonmutagen':0}
    Y = [ Ames_dict[mol.GetProp('Ames test categorisation')] for mol in mols]
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y)

    # WL subtree kernel: kernel matrices for train/train and test/train.
    gk = GraphKernel(kernel=[{"name": "weisfeiler_lehman", "niter": 5},{"name":"subtree_wl"}], normalize=True)
    K_train = gk.fit_transform(X_train)
    K_test = gk.transform(X_test)

    # SVC on the precomputed kernel matrix.
    gclf = SVC(kernel='precomputed')
    gclf.fit(K_train, Y_train)
    y_pred_g = gclf.predict(K_test)

    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    rep = classification_report(Y_test, y_pred_g)
    print('WL graph kernel')
    print(confusion_matrix(Y_test, y_pred_g))
    print(rep)

    # Baseline: SVC with the default kernel on ECFP4 bit vectors.
    # This second call to train_test_split draws a new random split.
    X = [mol2fp(mol) for mol in mols]
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
    clf = SVC(C=20.)
    clf.fit(X_train, Y_train)
    y_pred = clf.predict(X_test)
    rep = classification_report(Y_test, y_pred)
    print(confusion_matrix(Y_test, y_pred))
    print(rep)

WL graph kernel
[[381 125]
 [ 85 494]]
             precision    recall  f1-score   support

          0       0.82      0.75      0.78       506
          1       0.80      0.85      0.82       579

avg / total       0.81      0.81      0.81      1085

[[381 110]
 [111 483]]
             precision    recall  f1-score   support

          0       0.77      0.78      0.78       491
          1       0.81      0.81      0.81       594

avg / total       0.80      0.80      0.80      1085

real 0m40.446s
user 1m49.922s
sys 0m3.074s
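
As a sanity check, the per-class numbers can be re-derived from the two confusion matrices alone. This is a small sketch with a helper name of my own; it assumes the scikit-learn layout, where rows are the true classes and columns are the predicted classes.

```python
def metrics_from_confusion(matrix):
    """Return (accuracy, {class: (precision, recall, f1)}).

    matrix[i][j] = number of compounds of true class i predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    per_class = {}
    for c in range(n_classes):
        predicted_c = sum(matrix[i][c] for i in range(n_classes))
        actual_c = sum(matrix[c])
        precision = matrix[c][c] / predicted_c
        recall = matrix[c][c] / actual_c
        f1 = 2 * precision * recall / (precision + recall)
        per_class[c] = (precision, recall, f1)
    return correct / total, per_class

wl = [[381, 125], [85, 494]]      # WL graph kernel
ecfp = [[381, 110], [111, 483]]   # SVC on ECFP4

for name, cm in (("WL", wl), ("ECFP4", ecfp)):
    accuracy, per_class = metrics_from_confusion(cm)
    print(name, round(accuracy, 3),
          {c: tuple(round(v, 2) for v in vals) for c, vals in per_class.items()})
```

The accuracies come out at about 0.806 for the WL kernel and 0.796 for ECFP4, a gap of 11 compounds out of 1085. The two models were also scored on different random test splits (the class supports differ), so I would not read much into a difference of this size.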