Virtual Screening (VS) with over a hundred million compounds in a few days! #Chemoinformatics

Recently, virtual screening has often been used as the first screen in drug discovery projects, because it can screen a huge number of compounds much faster than wet screening. I used to think that docking scores do not reflect the binding affinity between a ligand and its target protein. But today I read an interesting article and changed my mind.
The title is ‘Ultra-large library docking for discovering new chemotypes’; the URL is below.

The authors focused on a make-on-demand library, a compound library designed from 70,000 building blocks from Enamine and 130 well-characterized reactions. So if a hit turns up in the virtual screen, the hit compound can be synthesized with high probability.
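To get a feel for how a make-on-demand library grows combinatorially, here is a minimal toy sketch in plain Python. It assumes a single hypothetical amide-coupling "reaction" written as a SMILES string template and a few made-up building blocks; a real enumeration would use proper reaction SMARTS (for example with RDKit) and the actual Enamine building blocks and reactions, which are not reproduced here.

```python
from itertools import product

# HYPOTHETICAL building blocks: R groups of carboxylic acids and primary amines.
# Toy fragments for illustration only, not Enamine's actual building blocks.
ACID_R_GROUPS = ["c1ccccc1", "C1CC1", "CC(C)"]
AMINE_R_GROUPS = ["CC", "c1ccncc1", "CCO"]


def amide_coupling(acid_r: str, amine_r: str) -> str:
    """Toy 'reaction': join an acid R group and an amine R group via an amide bond.

    This is plain string templating (R-C(=O)N-R'), not real reaction handling.
    """
    return f"{acid_r}C(=O)N{amine_r}"


def enumerate_library(acids, amines):
    """Enumerate every acid x amine combination of a two-component reaction."""
    return [amide_coupling(a, b) for a, b in product(acids, amines)]


library = enumerate_library(ACID_R_GROUPS, AMINE_R_GROUPS)
print(len(library))   # 3 acids x 3 amines
print(library[0])     # benzoic acid R group + ethylamine R group
```

For a two-component reaction the number of products is the product of the two building-block counts, and the totals add up across reactions, which is how a finite set of building blocks and reactions can reach hundreds of millions of compounds.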

The authors ran VS against two targets, AmpC and the dopamine D4 receptor, with over 100 million compounds! For the D4 receptor, they docked 138 million molecules, and it took only 1.2 calendar days! Of course they used many cores, but it is still exciting for me. Their analysis suggests that virtual screening performance improves as the number of docked compounds grows, so bigger is better.
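A quick back-of-the-envelope check of the D4 numbers: 138 million molecules in 1.2 calendar days is roughly 1,300 molecules per second of wall-clock time. The sketch below does that arithmetic; the core count in the second calculation is a HYPOTHETICAL value chosen only to show how per-core cost is derived, not the number of cores used in the paper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docking_throughput(n_molecules: float, calendar_days: float) -> float:
    """Average number of molecules docked per second of wall-clock time."""
    return n_molecules / (calendar_days * SECONDS_PER_DAY)


def seconds_per_molecule_per_core(throughput: float, n_cores: int) -> float:
    """Average CPU time per molecule if the work were spread evenly over n_cores."""
    return n_cores / throughput


rate = docking_throughput(138e6, 1.2)  # numbers from the D4 screen
print(f"{rate:.0f} molecules/s")

# HYPOTHETICAL core count, for illustration only (not taken from the paper).
print(f"{seconds_per_molecule_per_core(rate, 1000):.2f} s/molecule/core")
```

The point is that the wall-clock time is set by the per-molecule docking cost divided by the number of cores, so a fast docking program plus a large cluster is what makes this scale feasible.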

After the VS, they selected 589 molecules by docking score, and 549 of them were successfully synthesized! That is a very high success rate (about 93%), amazing!!! ;)
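At its core, picking compounds "by docking score" is a top-k selection over the ranked list. Below is a minimal sketch, assuming energy-like docking scores where lower (more negative) is better; the compound IDs and scores are made up for illustration, and a real pipeline usually applies further filters on top of this.

```python
import heapq
from typing import Dict, List, Tuple


def select_top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k compounds with the lowest (best) docking scores, best first.

    Assumes lower = better. heapq.nsmallest avoids sorting the whole
    collection, which matters with hundreds of millions of scored molecules.
    """
    return heapq.nsmallest(k, scores.items(), key=lambda item: item[1])


# Made-up IDs and scores, for illustration only.
toy_scores = {"cpd_1": -42.1, "cpd_2": -55.3, "cpd_3": -38.7, "cpd_4": -60.2, "cpd_5": -51.0}
for cpd_id, score in select_top_k(toy_scores, 3):
    print(cpd_id, score)
```

Since `heapq.nsmallest` accepts any iterable, the same idea works on a generator of `(id, score)` pairs streamed from docking output files instead of an in-memory dict.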
Interestingly, the authors compared the performance of compound selection by humans and by machine.
Extended Data Fig. 7 compares the hit rates achieved by combining docking score with human prioritization (visual inspection) against the rates achieved by docking score alone.
It is very interesting to me that the hit rate from machine selection seems slightly higher than that from human selection, while the compounds selected by humans had better binding affinity. I was surprised that selecting compounds by docking score alone is not so bad.

In the end, they found new chemotypes with high potency. It is worth knowing that very large-scale VS gives an opportunity to access new chemical space and a chance to find high-quality hit compounds.
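One common way to judge whether a hit is a "new chemotype" is to check that its fingerprint is dissimilar to every known ligand. Here is a minimal sketch in plain Python, assuming fingerprints are already available as sets of on-bit indices (in practice you would compute, e.g., Morgan/ECFP fingerprints with RDKit) and using a HYPOTHETICAL similarity cutoff of 0.35. This is not necessarily the novelty criterion the authors used.

```python
from typing import Iterable, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def is_novel(query: Set[int], known_ligands: Iterable[Set[int]], cutoff: float = 0.35) -> bool:
    """True if the query's maximum Tanimoto similarity to all known ligands is below cutoff.

    The default cutoff of 0.35 is a HYPOTHETICAL threshold for illustration.
    """
    max_sim = max((tanimoto(query, ref) for ref in known_ligands), default=0.0)
    return max_sim < cutoff


# Toy bit sets, for illustration only.
known = [{1, 2, 3, 4, 5}, {2, 3, 7, 8}]
hit_a = {1, 2, 3, 4, 6}   # shares most bits with the first known ligand
hit_b = {10, 11, 12, 13}  # shares no bits with either known ligand
print(is_novel(hit_a, known), is_novel(hit_b, known))
```

Using the maximum similarity over all known ligands means a hit only counts as novel if it is far from every reference compound, not just on average.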

I was also amazed that the delivery time for over 500 compounds was only six weeks. And of course, all compounds had high purity (>90%).

Massive computing power is useful for drug discovery.

BTW, Mishima.syk #13 will be held this weekend. If you will participate in the meeting and would like to enjoy the hands-on session, I recommend building a conda Python 3.6 environment with PyTorch 0.3 and RDKit ;). If that is difficult, I will provide a Google Colab environment.
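Before the hands-on session, a small script can report the Python version and whether PyTorch and RDKit import cleanly. This is only a sketch: it reads each package's `__version__` attribute if one exists and does not rely on any other package API.

```python
import importlib
import sys
from typing import Dict, Optional


def package_version(name: str) -> Optional[str]:
    """Return the package's __version__ if it can be imported, else None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def check_env() -> Dict[str, Optional[str]]:
    """Report the Python version plus torch and rdkit versions (None = not installed)."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": package_version("torch"),
        "rdkit": package_version("rdkit"),
    }


if __name__ == "__main__":
    report = check_env()
    for key, value in report.items():
        print(f"{key}: {value}")
    # Recommended for the session: Python 3.6, PyTorch 0.3, RDKit.
    if report["python"] != "3.6":
        print("Note: the hands-on material targets Python 3.6.")
```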


Switch Xcode version

Today I updated Xcode from 7.x to 8.x. After the upgrade, importing Keras caused the following error.

iwatobipen$ python
Python 3.5.2 |Anaconda 2.4.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
th=79 :
nvcc fatal   : The version ('80000') of the host compiler ('Apple clang') is not supported


Oops! This is because CUDA 8 does not support the Apple clang that ships with Xcode 8.x as its host compiler.
So, to fix the error, I switched the active Xcode version back.

iwatobipen$ sudo xcode-select --switch /Applications/
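To confirm which Xcode (and therefore which Apple clang) is active before importing Keras, you can query `xcode-select -p` and `clang --version`. The helper below is a sketch that simply wraps those two commands with `subprocess`; it returns None where a command is missing or fails (for example on a non-macOS machine). It avoids f-strings and `text=` so it also runs on Python 3.5.

```python
import subprocess
from typing import List, Optional


def run_command(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stripped output, or None if it fails or is absent."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                universal_newlines=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def active_toolchain() -> dict:
    """Report the active developer directory and the first line of clang's version."""
    clang = run_command(["clang", "--version"])
    return {
        "developer_dir": run_command(["xcode-select", "-p"]),
        "clang": clang.splitlines()[0] if clang else None,
    }


if __name__ == "__main__":
    print(active_toolchain())
```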


iwatobipen$ ipython
Python 3.5.2 |Anaconda 2.4.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import keras
Using Theano backend.

In [2]: 

It worked fine!