Speed up of array calculation.

A colleague on the CADD team told me that for calculations on massive data in Python, he recommends numpy.
He also showed me some very nice code. (Nice stuff.)
Numpy is the fundamental package for scientific computing with Python. It can handle data as vectors and arrays.

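That means native Python processes a list of N items with a loop of N iterations, so the cost is O(N) at interpreter speed.
Numpy can operate on the whole array directly. The work is still O(N), but the loop runs in compiled code, so the user doesn't need to write it. ;-)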
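
As a minimal sketch of that difference (the variable names and the tiny data are my own, for illustration only), the same element-wise product can be written with an explicit loop and with numpy:

```python
import numpy as np

# Small illustrative data; xs and ys are hypothetical names.
xs = list(range(5))
ys = list(range(5))

# Native Python: an explicit loop over the N items.
loop_result = [x * y for x, y in zip(xs, ys)]

# Numpy: the same element-wise product with no Python-level loop.
a = np.array(xs)
b = np.array(ys)
vec_result = a * b

print(loop_result)
print(vec_result.tolist())
```

Both lines print the same values; only the place where the loop runs differs.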

A few days ago, I found numexpr on PyPI.
The library can boost calculation in some cases.
Numexpr can be installed with pip or conda.

1st try.
Vector handling

import numpy as np
import numexpr as ne
a = np.arange(1e6)
b = np.arange(1e6)

#Use numpy
%timeit a*b-4.1*a > 2.5*b
#100 loops, best of 3: 8.51 ms per loop

#Use numexpr
%timeit ne.evaluate( "a*b-4.1*a > 2.5*b" )
#1000 loops, best of 3: 1.12 ms per loop

2nd try.
Array handling

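#shape must be given as integers; a float shape such as (1e3,1e3) raises TypeError in current numpy
ar1 = np.random.random((1000, 1000))
ar2 = np.random.random((1000, 1000))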
#use numpy
%timeit ar1 * ar2
#100 loops, best of 3: 1.95 ms per loop

#use numexpr
%timeit ne.evaluate("ar1*ar2")
#1000 loops, best of 3: 673 µs per loop

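In both examples numexpr was faster than plain numpy: about 7.6x for the vector expression (8.51 ms vs 1.12 ms) and about 2.9x for the array product (1.95 ms vs 673 µs).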
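
The timings in this post use IPython's %timeit magic. To repeat the vector comparison in a plain script, something like the following might work. It is only a sketch: the helper best_ms and the names dictionary are my own, numexpr is treated as optional, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is optional in this sketch
    ne = None

a = np.arange(1e6)
b = np.arange(1e6)


def best_ms(func, repeat=3, number=10):
    """Best per-call time in milliseconds (hypothetical helper)."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1e3


# Plain numpy: each operator allocates a temporary array.
numpy_result = a * b - 4.1 * a > 2.5 * b
numpy_ms = best_ms(lambda: a * b - 4.1 * a > 2.5 * b)
print(f"numpy:   {numpy_ms:.2f} ms per loop")

if ne is not None:
    # Pass the arrays explicitly instead of relying on frame lookup.
    names = {"a": a, "b": b}
    expr = "a*b-4.1*a > 2.5*b"
    numexpr_result = ne.evaluate(expr, local_dict=names)
    numexpr_ms = best_ms(lambda: ne.evaluate(expr, local_dict=names))
    print(f"numexpr: {numexpr_ms:.2f} ms per loop")
    print("same result:", np.array_equal(numpy_result, numexpr_result))
```

Checking that both versions return the same boolean array is a cheap sanity test before trusting any speed comparison.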


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
