Useful package for virtual screening #chemoinformatics #RDKit

Virtual screening is an important task in drug discovery projects. There are many approaches, for example fingerprint-based, substructure-based, and shape-based screening. The approaches listed above are used not only in SBDD but also in LBDD.

There are also many applications for these tasks. I wrote my own scripts for these tasks and use them. But recently I found a nice package for VS named VSFlow, which is developed by Paul Czodrowski’s group.

It seemed interesting, so I tried it. First, I prepared a conda env and installed the package.

$ gh repo clone czodrowskilab/VSFlow
$ cd VSFlow
$ conda env create --quiet --force --file environment.yml
$ conda activate vsflow
$ pip install .

After running the commands above, I could use the vsflow command.

Next, I prepared a database. A database can be made from any kind of dataset, but I used a default set. The ‘-d pdb’ option prepares a database from the SMILES that come from Ligand Expo.

$ time vsflow preparedb -d pdb -o pdb_ligs -np 6
**************************

 VV        VV  SSSSSSS             VSFlow
  VV      VV  SSS    SS       Virtual Screening
   VV    VV    SSSS               Workflow
    VV  VV       SSSS
     VVVV     SS    SSS
      VV       SSSSSSS

**************************

Start: 06/21/2022, 21:22:37
Running in parallel mode on 6 threads
Downloading database pdb ...
Finished downloading database
Generating database file ...
Finished in 168 seconds

real	2m48.611s
user	0m6.480s
sys	0m1.619s
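
As a rough illustration of what a pickled molecule database could look like, here is a minimal sketch that parses a few SMILES with RDKit and pickles the result. The dictionary layout, the `build_database` helper, and the input molecules are assumptions made for this example; the real vsdb file may be organized differently.

```python
import pickle

from rdkit import Chem

# Hypothetical input: a few names and SMILES (not taken from Ligand Expo).
smiles_by_id = {
    "pyridazine": "c1ccnnc1",
    "benzene": "c1ccccc1",
    "broken": "c1ccc",  # invalid SMILES (unclosed ring), kept to show error handling
}


def build_database(smiles_by_id):
    """Parse each SMILES and keep only the molecules RDKit can read."""
    db = {}
    for name, smi in smiles_by_id.items():
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # RDKit returns None for unparsable input
            continue
        # Store the molecule and its canonical SMILES (assumed layout).
        db[name] = {"mol": mol, "smiles": Chem.MolToSmiles(mol)}
    return db


db = build_database(smiles_by_id)

# RDKit molecules can be pickled, so the whole dictionary round-trips.
blob = pickle.dumps(db)
restored = pickle.loads(blob)
print(sorted(restored))
```

Parsing every molecule once and storing the parsed objects is what makes later searches fast, because the SMILES do not have to be parsed again for each query.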

Now I had ‘pdb_ligs.vsdb’, which is pickled data for VSFlow. Next, I tried substructure and fingerprint similarity searches. I used SMILES as the query. The substructure search finished in about a second.

$ vsflow substructure -smi 'c1ccnnc1' -d pdb_ligs.vsdb -o smi_sub_pdb.sdf
**************************

 VV        VV  SSSSSSS             VSFlow
  VV      VV  SSS    SS       Virtual Screening
   VV    VV    SSSS               Workflow
    VV  VV       SSSS
     VVVV     SS    SSS
      VV       SSSSSSS

**************************

Start: 06/21/2022, 21:28:50
Running in single core mode
Loading database pdb_ligs.vsdb ...
Reading query ...
Finished substructure search in 0.83285 seconds
Generating output file(s) ...
313 matches found
Finished: 06/21/2022, 21:28:51
Finished in 0.91936 seconds

The SSS hit compounds are shown below.
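
For readers who want to see the idea behind a substructure search, the following is a minimal RDKit sketch using the same pyridazine query. The four library molecules are hypothetical examples chosen for this sketch, and this is not necessarily how VSFlow implements the search internally.

```python
from rdkit import Chem

# Same query as in the vsflow command above: pyridazine.
query = Chem.MolFromSmiles("c1ccnnc1")

# Hypothetical mini library (names and SMILES chosen for illustration).
library = {
    "pyridazine": "c1ccnnc1",
    "3-methylpyridazine": "Cc1cccnn1",
    "pyrimidine": "c1cncnc1",
    "pyridine": "c1ccncc1",
}

hits = []
for name, smi in library.items():
    mol = Chem.MolFromSmiles(smi)
    # HasSubstructMatch is True when the query graph is found in the molecule.
    if mol is not None and mol.HasSubstructMatch(query):
        hits.append(name)

print(hits)
```

Only the molecules that contain two adjacent aromatic nitrogens in a six-membered ring match, so pyrimidine and pyridine are rejected even though they look similar.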

The following example is a similarity search, for which I also generated similarity maps as a PDF.

$ vsflow fpsim -d pdb_ligs.vsdb -smi "CC1CCN(C(=O)CC#N)CC1N(C)c1ncnc2[nH]ccc12" -o sim.sdf --pdf --simmap
**************************

 VV        VV  SSSSSSS             VSFlow
  VV      VV  SSS    SS       Virtual Screening
   VV    VV    SSSS               Workflow
    VV  VV       SSSS
     VVVV     SS    SSS
      VV       SSSSSSS

**************************

Start: 06/21/2022, 22:06:41
Running in single core mode
Loading database pdb_ligs.vsdb ...
Reading query input ...
Calculating fingerprints ...
Finished fingerprint generation in 6.04996 seconds
Calculating similarities ...
Finished calculating similarities in 0.08398 seconds
Writing 10 molecules to output file(s)
Generating output file(s) ...
Generating PDF file(s) ...
Calculating similarity maps for 10 matches ...
Finished: 06/21/2022, 22:06:56
Finished in 14.63942 seconds

A similarity map is a nice way to visualize the similarity between the query (tofacitinib) and the hit compounds. This example used FCFP4 as the fingerprint, but users can choose other RDKit-supported fingerprints such as ECFP, RDKit, Atom, etc.
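
The ranking step of a fingerprint similarity search can be sketched with RDKit as follows. This assumes an FCFP4-like fingerprint built as a feature-based Morgan fingerprint with radius 2 and 2048 bits, scored with Tanimoto similarity. The small library and the `fingerprint` helper are hypothetical, and VSFlow's exact settings may differ.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Feature-based Morgan fingerprint with radius 2 (an FCFP4-like setup).
gen = rdFingerprintGenerator.GetMorganGenerator(
    radius=2,
    fpSize=2048,
    atomInvariantsGenerator=rdFingerprintGenerator.GetMorganFeatureAtomInvGen(),
)


def fingerprint(smiles):
    """Return the bit-vector fingerprint for a SMILES string."""
    return gen.GetFingerprint(Chem.MolFromSmiles(smiles))


# Query from the vsflow command above (tofacitinib).
query_smiles = "CC1CCN(C(=O)CC#N)CC1N(C)c1ncnc2[nH]ccc12"

# Hypothetical mini library for illustration.
library = {
    "tofacitinib": query_smiles,
    "pyrrolopyrimidine": "c1ncnc2[nH]ccc12",
    "benzene": "c1ccccc1",
}

query_fp = fingerprint(query_smiles)
scores = {
    name: DataStructs.TanimotoSimilarity(query_fp, fingerprint(smi))
    for name, smi in library.items()
}

# Sort from most to least similar, as a screening tool would report hits.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Switching to an ECFP-like fingerprint in this sketch only requires dropping the `atomInvariantsGenerator` argument.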

The final example is shape similarity. For this, the vsdb must contain 3D structure information, so I downloaded 3D data from Ligand Expo and built a vsdb from it.

The data link is below.
http://ligand-expo.rcsb.org/dictionaries/Components-pub.sdf.gz

Then I ran the shape similarity search. It took a long time (about 3.5 hours on 6 threads) compared to a commercial package such as ROCS, but it generated nice output.

$ vsflow shape -smi "CC1CCN(C(=O)CC#N)CC1N(C)c1ncnc2[nH]ccc12" -d pdb_ligs3d.vsdb -o shapesmi -np 6 --pymol
**************************

 VV        VV  SSSSSSS             VSFlow
  VV      VV  SSS    SS       Virtual Screening
   VV    VV    SSSS               Workflow
    VV  VV       SSSS
     VVVV     SS    SSS
      VV       SSSSSSS

**************************

Start: 06/21/2022, 22:43:11
Running in parallel mode on 6 threads
Reading database ...
Reading query ...
Performing shape screening ...
Generating 3D conformer(s) for 1 query molecule(s)
Generating PyMOl file ...
Finished: 06/22/2022, 02:16:54
Finished in 12822.98383 seconds

Here is an example output of the shape similarity screening. The query molecule is shown in green. As you can see, vsflow found molecules that have a similar 3D shape.
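
To show what shape similarity means in practice, here is a small RDKit sketch that embeds two molecules in 3D, aligns them with Open3DAlign, and computes a shape Tanimoto score. The `embed` helper and the two small test molecules are assumptions made for this sketch, and VSFlow's own alignment and scoring may differ.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign, rdShapeHelpers


def embed(smiles, seed=42):
    """Build a single 3D conformer with hydrogens (hypothetical helper)."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise ValueError(f"3D embedding failed for {smiles}")
    AllChem.MMFFOptimizeMolecule(mol)  # quick force-field cleanup
    return mol


query = embed("Cc1ccccc1")  # toluene as a stand-in query
probe = embed("Oc1ccccc1")  # phenol as a stand-in database molecule

# Align the probe onto the query; Align() moves the probe's coordinates.
o3a = rdMolAlign.GetO3A(probe, query)
o3a.Align()

# ShapeTanimotoDist returns a distance, so similarity is 1 - distance.
similarity = 1.0 - rdShapeHelpers.ShapeTanimotoDist(probe, query)
self_dist = rdShapeHelpers.ShapeTanimotoDist(query, query)
print(f"shape similarity: {similarity:.3f}")
```

A real screen repeats the embedding and alignment for every database molecule and usually for several conformers each, which helps explain why the shape search is much slower than the 2D searches.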

In summary, vsflow is a useful package for chemoinformatics.

More details are described in the ChemRxiv preprint and the repository’s wiki.

https://chemrxiv.org/engage/chemrxiv/article-details/628c60215d9485a206cc8ecc


Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.

One thought on “Useful package for virtual screening #chemoinformatics #RDKit”

  1. Hi there,
    I converted the files from SDF to PDB (generating the conformers), and they are now in vsdb format.
    Is there a way I can split the vsdb into individual PDB files that are ready to dock?
