A Task-Based Parallelization Framework

Jug is a Python library for task-based parallelization.
The library is easy to install with pip.
Today I wrote an example that uses it.
First, I got sample data from the ZINC database.

iwatobipen$ wget http://zinc.docking.org/db/bysubset/11/11_p0.smi.gz
iwatobipen$ gzip -d 11_p0.smi.gz 
iwatobipen$ wc -l 11_p0.smi 
 4591276 11_p0.smi

That is about 4.6 million molecules, far too many for a quick test,
so I kept only the first 10,000:

iwatobipen$ head -n 10000 11_p0.smi > zinc.smi
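If you prefer to stay in Python, the same sample can be taken straight from the compressed download using only the standard library. This is a minimal sketch; the function name head_smiles is my own and is not part of the original workflow.

```python
import gzip
from itertools import islice


def head_smiles(gz_path, out_path, n=10000):
    """Copy the first n lines of a gzipped SMILES file to a plain-text file.

    Similar in effect to `gzip -d` followed by `head -n`, but it streams
    the archive, so the full decompressed file is never written to disk.
    Returns the number of lines written.
    """
    count = 0
    with gzip.open(gz_path, "rt") as src, open(out_path, "w") as dst:
        # islice stops after n lines without reading the rest of the archive
        for line in islice(src, n):
            dst.write(line)
            count += 1
    return count
```

Calling head_smiles("11_p0.smi.gz", "zinc.smi") should give a zinc.smi containing the same first 10,000 lines as the head command above.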

Next, I made a sample script named jug_rdk.py:

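import csv
from rdkit import Chem
from rdkit.Chem import Descriptors
from jug import TaskGenerator

# Read the SMILES column (first field) of the space-delimited file
with open("zinc.smi", "r") as inf:
    reader = csv.reader(inf, delimiter=" ")
    mols = [Chem.MolFromSmiles(line[0]) for line in reader]
# MolFromSmiles returns None for SMILES RDKit cannot parse; skip those
mols = [mol for mol in mols if mol is not None]

@TaskGenerator
def calc(mol):
    # Each call becomes one jug task: molecular weight and canonical SMILES
    molwt = Descriptors.MolWt(mol)
    smi = Chem.MolToSmiles(mol)
    return smi, str(molwt)

# One task per molecule. A list comprehension is used because map() is
# lazy in Python 3 and would not create any tasks.
res = [calc(mol) for mol in mols]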
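Decorating calc with @TaskGenerator means that calling it does not compute anything right away: jug records a task, and the work happens later under jug execute. Jug identifies each task by a hash of the function and its arguments. The toy decorator below is my own simplified model of that idea, not jug's code; the names task_generator, Task and run_all are hypothetical, and jug's on-disk store, locking and dependency tracking are left out.

```python
import hashlib
import pickle


class Task:
    """A deferred function call: stores the function and its arguments."""

    def __init__(self, func, args):
        self.func = func
        self.args = args
        # Name the task by hashing the function name plus the pickled
        # arguments, so the same call always maps to the same key
        payload = func.__name__.encode() + pickle.dumps(args)
        self.key = hashlib.sha1(payload).hexdigest()

    def run(self):
        return self.func(*self.args)


TASKS = []  # every task created while the script is loaded


def task_generator(func):
    """Toy stand-in for a decorator like jug's TaskGenerator."""

    def wrapper(*args):
        task = Task(func, args)
        TASKS.append(task)  # record the call instead of running it
        return task

    return wrapper


def run_all(store):
    """Execute every recorded task whose result is not yet in store."""
    for task in TASKS:
        if task.key not in store:
            store[task.key] = task.run()
    return store


@task_generator
def square(x):
    return x * x


# Building the list only creates tasks; nothing has been computed yet
res = [square(i) for i in range(5)]
```

With a plain dict standing in for the store, results = run_all({}) computes each task once, and [results[t.key] for t in res] gives [0, 1, 4, 9, 16]. A second run_all(results) recomputes nothing, which is the behaviour behind the Finished column in jug status.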

Once jug is installed, the jug command is available. jug status lists the tasks the script defines and their state:

iwatobipen$ jug status jug_rdk.py 
     Waiting       Ready    Finished     Running  Task name                     
--------------------------------------------------------------------------------
           0       10000           0           0  jug_rdk.calc                  
................................................................................
           0       10000           0           0  Total        

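All 10,000 tasks are Ready rather than Waiting because no calc task depends on the result of another.
Then I ran the script with jug execute, starting two processes in the background: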

iwatobipen$ jug execute jug_rdk.py &
[1] 4681
iwatobipen$ jug execute jug_rdk.py &
[2] 4682
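Two jug execute processes can work on the same script because they coordinate through a shared store (by default a jug_rdk.jugdata directory next to the script) instead of talking to each other directly. The sketch below models that idea with lock files: a worker claims a task by atomically creating its lock file, skips tasks that are locked or already have a result, and writes the result before releasing the lock. It is an illustration only; the function names and file layout are hypothetical, and jug's real store differs in detail.

```python
import json
import os


def try_lock(store_dir, key):
    """Atomically claim a task; returns False if another worker holds it."""
    path = os.path.join(store_dir, key + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # worker can create the lock file for a given task
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(store_dir, key):
    os.remove(os.path.join(store_dir, key + ".lock"))


def worker(store_dir, tasks, func):
    """Run every task that is neither finished nor locked by another worker.

    tasks maps a task key to its argument; func's return value must be
    JSON-serialisable. Returns the keys this worker computed.
    """
    done = []
    for key, arg in tasks.items():
        result_path = os.path.join(store_dir, key + ".json")
        if os.path.exists(result_path):
            continue  # finished by an earlier or concurrent run
        if not try_lock(store_dir, key):
            continue  # another worker is running this task
        try:
            # Re-check after locking: the task may have finished meanwhile
            if not os.path.exists(result_path):
                tmp_path = result_path + ".tmp"
                with open(tmp_path, "w") as fh:
                    json.dump(func(arg), fh)
                # Atomic rename, so no one ever sees a half-written result
                os.replace(tmp_path, result_path)
                done.append(key)
        finally:
            release(store_dir, key)
    return done
```

In a real setting each worker would be its own process pointed at the same directory, much like running jug execute twice. Calling worker again after everything is done returns an empty list, since every result file already exists.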

While they were running, I checked the status again with jug status jug_rdk.py:

     Waiting       Ready    Finished     Running  Task name                     
--------------------------------------------------------------------------------
           0        2205        7793           2  jug_rdk.calc                  
................................................................................
           0        2205        7793           2  Total      

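The Running column shows 2 because each of the two jug execute processes is working on one task at a time, so two cores are in use.
After the jobs finished, I checked the status once more: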

iwatobipen$ jug status jug_rdk.py
     Waiting       Ready    Finished     Running  Task name                                                          
---------------------------------------------------------------------------------------------------------------------
           0           0       10000           0  jug_rdk.calc                                                       
.....................................................................................................................
           0           0       10000           0  Total  

All jobs finished ;-).
Jug is an interesting tool for multiprocessing.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis and my family.
