Jug is a Python library for task-based parallelization.
The library is easy to install with pip.
Today I wrote an example that uses it.
First, I downloaded sample data from the ZINC database.
iwatobipen$ wget http://zinc.docking.org/db/bysubset/11/11_p0.smi.gz
iwatobipen$ gzip -d 11_p0.smi.gz
iwatobipen$ wc -l 11_p0.smi
 4591276 11_p0.smi
That is about 4.6 million molecules, too many for a test, so I used only the first 10,000:
iwatobipen$ head -n 10000 11_p0.smi > zinc.smi
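If you prefer to stay in Python, the same subset can be taken without head. This is a minimal sketch, assuming each line is "SMILES ID" separated by a single space (the layout the script below relies on); take_smiles and its limit parameter are hypothetical names, not part of any library.

```python
import csv
from itertools import islice


def take_smiles(lines, limit=10000):
    """Return up to `limit` (smiles, zinc_id) pairs from space-delimited SMILES lines.

    `lines` can be an open file or any iterable of strings.
    """
    reader = csv.reader(lines, delimiter=" ")
    pairs = []
    for row in islice(reader, limit):  # like `head -n limit`
        if not row:
            continue  # skip blank lines
        zinc_id = row[1] if len(row) > 1 else ""
        pairs.append((row[0], zinc_id))
    return pairs
```

For example, `with open("11_p0.smi") as handle: subset = take_smiles(handle)` yields roughly the same molecules as the head command above, already split into SMILES and ID.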
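Next, I wrote a sample script named jug_rdk.py. It reads the SMILES, wraps the descriptor calculation in a jug task, and creates one task per molecule.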
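import csv

from jug import TaskGenerator
from rdkit import Chem
from rdkit.Chem import Descriptors

# Read SMILES from the first column of the space-delimited file.
with open("zinc.smi", "r") as inf:
    reader = csv.reader(inf, delimiter=" ")
    mols = [Chem.MolFromSmiles(line[0]) for line in reader]

# Skip any SMILES that RDKit could not parse (MolFromSmiles returns None).
mols = [mol for mol in mols if mol is not None]

@TaskGenerator
def calc(mol):
    # Each call becomes one jug task: canonical SMILES plus molecular weight.
    molwt = Descriptors.MolWt(mol)
    smi = Chem.MolToSmiles(mol)
    return smi, str(molwt)

# Build the task list eagerly; in Python 3 a bare map() is lazy and would create no tasks.
res = [calc(mol) for mol in mols]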
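To see why loading the script only declares work instead of computing anything, here is a toy model of the idea behind TaskGenerator. It is a simplified sketch, not jug's implementation: ToyTask, toy_task_generator, REGISTRY and the SHA-1 naming are assumptions made for illustration. Calling the decorated function returns a task object with a stable ID derived from the function name and arguments, and the real work happens only when a task is run.

```python
import hashlib
from functools import wraps


class ToyTask:
    """A deferred call: stores the function and its arguments and runs only on demand."""

    def __init__(self, func, args):
        self.func = func
        self.args = args
        # Stable ID from the function name and argument reprs (a stand-in for jug's hashing).
        key = func.__name__ + repr(args)
        self.hash = hashlib.sha1(key.encode("utf-8")).hexdigest()
        self.result = None
        self.finished = False

    def run(self):
        """Compute the result once and cache it."""
        if not self.finished:
            self.result = self.func(*self.args)
            self.finished = True
        return self.result


REGISTRY = []  # every task created while the script is being loaded


def toy_task_generator(func):
    """Decorator: calling the function registers a ToyTask instead of computing."""
    @wraps(func)
    def wrapper(*args):
        task = ToyTask(func, args)
        REGISTRY.append(task)
        return task
    return wrapper


@toy_task_generator
def square(x):
    return x * x


lazy = map(square, [1, 2, 3])             # Python 3: map is lazy, nothing is called yet
tasks_after_map = len(REGISTRY)           # 0
tasks = [square(x) for x in [1, 2, 3]]    # three tasks registered, none computed
results = [task.run() for task in tasks]  # [1, 4, 9]
```

The last lines also show the Python 3 pitfall: a map object registers nothing until it is iterated, which is why a list comprehension is the safer way to generate tasks.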
After installing jug, the jug command is available. Before anything has run, jug status shows all 10,000 tasks as Ready:
iwatobipen$ jug status jug_rdk.py
     Waiting       Ready    Finished     Running  Task name
--------------------------------------------------------------------------------
           0       10000           0           0  jug_rdk.calc
................................................................................
           0       10000           0           0  Total
Then I ran the script with jug execute, starting two processes in the background:
iwatobipen$ jug execute jug_rdk.py &
[1] 4681
iwatobipen$ jug execute jug_rdk.py &
[2] 4682
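Two independent jug execute processes can share the work because jug keeps task state in a shared store (by default a jug_rdk.jugdata directory next to the script) and a process claims a task before running it. The sketch below models only that claim step with atomic lock files; it is a simplified illustration under that assumption, not jug's actual code, and try_claim and release are hypothetical names.

```python
import os


def try_claim(lock_dir, task_id):
    """Atomically create a lock file for task_id; return True if this worker got it."""
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, task_id + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one worker wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another worker is already running this task
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record which process holds the lock
    return True


def release(lock_dir, task_id):
    """Remove the lock once the task's result has been saved."""
    os.remove(os.path.join(lock_dir, task_id + ".lock"))
```

A worker loop would call try_claim on each ready task, skip the ones that return False, and call release after storing the result; this is what lets two processes split 10,000 tasks without running any of them twice.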
While they were running, I checked the progress with jug status jug_rdk.py:
     Waiting       Ready    Finished     Running  Task name
--------------------------------------------------------------------------------
           0        2205        7793           2  jug_rdk.calc
................................................................................
           0        2205        7793           2  Total
Running 2 indicates that two tasks were being executed at the same time, one per jug execute process, so two cores were in use.
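The four counts always add up to the number of tasks: in the snapshot above, 0 + 2205 + 7793 + 2 = 10000. Here is a small sketch that reads a Total row copied from jug status and reports progress; parse_total_row and progress are hypothetical helpers, and they assume the column order Waiting, Ready, Finished, Running shown above.

```python
def parse_total_row(row):
    """Parse a 'Total' row from `jug status` into a dict of counts.

    Assumes the column order Waiting, Ready, Finished, Running.
    """
    fields = row.split()
    if not fields or fields[-1] != "Total":
        raise ValueError("expected a row ending in 'Total'")
    waiting, ready, finished, running = (int(x) for x in fields[:4])
    return {"waiting": waiting, "ready": ready, "finished": finished, "running": running}


def progress(counts):
    """Fraction of all tasks that have finished."""
    total = sum(counts.values())
    return counts["finished"] / total if total else 0.0


counts = parse_total_row("0 2205 7793 2 Total")
fraction_done = progress(counts)  # 7793 / 10000
```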
After the jobs finished, I checked the status again:
iwatobipen$ jug status jug_rdk.py
     Waiting       Ready    Finished     Running  Task name
--------------------------------------------------------------------------------
           0           0       10000           0  jug_rdk.calc
................................................................................
           0           0       10000           0  Total
All jobs were finished ;-).
Jug is an interesting tool for running Python tasks in parallel across multiple processes.