ParallelAccelerator


Leopold Haimberger

May 10, 2017, 9:00:12 AM
to Numba Public Discussion - Public
Hi there, 

I just want to thank you for merging in the ParallelAccelerator branch, it is awesome!
Several functions got substantial speedups (up to 5x) on our 28-core server after simply adding "parallel=True"
to the jit decorator, without any further modifications. However, I suspect more speedups could be realized if I could tell the
Accelerator what can be parallelized. Does it parallelize explicit loops as well?
I found a little documentation in the developer docs on GitHub, but I believe there is more to say.

Leo 

Stanley Seibert

May 11, 2017, 5:27:50 PM
to Numba Public Discussion - Public
Hi, we're glad you are trying it out.  Currently, the ParallelAccelerator optimization passes only apply to array operations, and not explicit loops.  We have had some conversations with the Intel developers working on ParallelAccelerator, and they are interested in using this technology to bring back Numba's old "prange" feature, where you could mark a loop as safe to execute in parallel.

We're also working with the Intel developers to expand the documentation of the feature to say more about what it can and cannot do.  The overall goal is to keep expanding it, so this PR merger is just the first step.
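For readers unfamiliar with the old feature, here is a minimal sketch of what marking a loop as parallel-safe looks like. This uses the `numba.prange` spelling as it later shipped; the try/except fallback is only there so the sketch also runs without Numba installed, in which case it degrades to an ordinary serial loop.

```python
import math

try:
    # With Numba available, the loop body is compiled and the
    # prange-marked iterations may run on multiple threads.
    from numba import njit, prange
except ImportError:
    # Fallback: prange degrades to a plain range and njit to a
    # no-op decorator, so the sketch still runs serially.
    prange = range
    def njit(*args, **kwargs):
        def wrap(f):
            return f
        return args[0] if (args and callable(args[0])) else wrap

@njit(parallel=True)
def sum_of_squares(n):
    total = 0.0
    # Each iteration is independent (a simple reduction), which is
    # what makes it safe to mark the loop as parallel.
    for i in prange(n):
        total += i * i
    return total

print(sum_of_squares(10))  # 285.0
```

The key point is that `prange` is a promise from the programmer that iterations do not depend on one another; the compiler does not verify this.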

--
You received this message because you are subscribed to the Google Groups "Numba Public Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numba-users+unsubscribe@continuum.io.
To post to this group, send email to numba...@continuum.io.
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/4cd5b9ad-e21d-467a-b24f-8f5d0b355716%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Ehsan Totoni

May 11, 2017, 8:02:13 PM
to Numba Public Discussion - Public
Hi Leo,

Thanks for letting us know. Yes, explicit loops are on our todo list as Stan mentioned. There is also a lot to be done in the documentation. Could you please open an issue and post some code segments to help us understand your workload? Motivating examples can help accelerate development significantly.

Best,
Ehsan



Leopold Haimberger

May 12, 2017, 5:08:08 AM
to Numba Public Discussion - Public
Hi Ehsan,

Please find below some code that calculates the spherical distance between random points on a sphere. It typically gives me a 5x speedup on our 28-core machine.

To get even more speedup, I would love to parallelize the outer loop in the sdist function, which essentially fills a triangular matrix.

Cheers
Leo

from numba import njit
import numpy
import math
import time
import os

@njit(cache=False, parallel=False)
def tdist(dists, lats, lons, weight):
    # Serial baseline: convert lat/lon in degrees to unit vectors.
    x = numpy.cos(lats*math.pi/180.)*numpy.cos(lons*math.pi/180.)
    y = numpy.cos(lats*math.pi/180.)*numpy.sin(lons*math.pi/180.)
    z = numpy.sin(lats*math.pi/180.)

    sdist(dists, x, y, z)
    # Scale by 0.999999 to keep the dot products strictly inside
    # arccos's [-1, 1] domain despite rounding error.
    dists[:] = numpy.arccos(dists*0.999999)

    return

@njit(cache=False, parallel=True)
def pdist(dists, lats, lons, weight):
    # Same computation as tdist, but with parallel=True so the
    # array expressions are parallelized by ParallelAccelerator.
    x = numpy.cos(lats*math.pi/180.)*numpy.cos(lons*math.pi/180.)
    y = numpy.cos(lats*math.pi/180.)*numpy.sin(lons*math.pi/180.)
    z = numpy.sin(lats*math.pi/180.)

    sdist(dists, x, y, z)
    dists[:] = numpy.arccos(dists*0.999999)

    return

@njit()
def sdist(dists, x, y, z):
    # Fill the packed upper triangle of the pairwise dot-product
    # matrix; idx is a running index into the flat output array.
    # (Renamed from "id", which shadows the Python builtin.)
    idx = 0
    for l in range(x.shape[0]):
        for k in range(l, x.shape[0]):
            dists[idx] = x[l]*x[k] + y[l]*y[k] + z[l]*z[k]
            idx += 1

    return


n = 3000
lats = numpy.random.rand(n)*180. - 90.
lons = numpy.random.rand(n)*360.

# Integer division: numpy.empty needs an integer size in Python 3.
dists = numpy.empty((n+1)*n//2, numpy.float64)
dists2=dists.copy()
for r in range(20):
    t=time.time()
    tdist(dists,lats,lons,1)
    t1=time.time()
    pdist(dists2,lats,lons,1)
    print ('serial: {:5.4f}s parallel: {:5.4f}s speedup: {:5.2f}'.format(t1-t,time.time()-t1,(t1-t)/(time.time()-t1)))
    # allclose rather than ==: exact float equality across serial and
    # parallel runs is fragile.
    assert numpy.allclose(numpy.sum(dists), numpy.sum(dists2))
    
t=time.time()
for r in range(20):
    tdist(dists,lats,lons,1)
t1=time.time()
for r in range(20):
    pdist(dists2,lats,lons,1)

print ('sequential serial: {:5.4f}s parallel: {:5.4f}s speedup: {:5.2f}'.format(t1-t,time.time()-t1,(t1-t)/(time.time()-t1)))
assert numpy.allclose(numpy.sum(dists), numpy.sum(dists2))
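One obstacle to parallelizing the outer loop above is the running counter (`id += 1`), which makes each iteration depend on the previous one. If the write position for row l is instead computed in closed form, every outer iteration writes a disjoint slice of the output and the iterations become independent, which is the usual prerequisite for a parallel loop. A sketch in plain Python (function names here are illustrative, not part of any API):

```python
def row_offset(l, n):
    # Number of entries stored before row l in the packed upper
    # triangle: row j contributes n - j entries, so
    # sum_{j < l} (n - j) = l*n - l*(l-1)//2.
    return l * n - l * (l - 1) // 2

def sdist_indexed(dists, x, y, z):
    n = len(x)
    # Each outer iteration now writes only dists[base : base + n - l],
    # a slice disjoint from every other iteration's slice, so the
    # outer loop could be marked parallel once that is supported.
    for l in range(n):
        base = row_offset(l, n)
        for k in range(l, n):
            dists[base + (k - l)] = x[l]*x[k] + y[l]*y[k] + z[l]*z[k]

# Sanity check: the closed-form offset matches a running counter.
n = 5
idx = 0
for l in range(n):
    assert idx == row_offset(l, n)
    idx += n - l
assert idx == n * (n + 1) // 2
```

The transformed loop computes exactly the same packed triangle as the original, just without the sequential index.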

Ehsan Totoni

May 12, 2017, 10:07:07 AM
to numba...@continuum.io
Hi Leo,

Thank you for sending the code. This is a great motivating example for our parallel loop development; I will let you know when we have it ready (we will prioritize it). By the way, I think this code has a lot of potential for other compiler optimizations, and I hope it will be much faster with our upcoming developments.

Best,
Ehsan
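One example of the kind of redundancy such optimizations target: the posted code converts each latitude and longitude from degrees to radians twice, once inside cos() and once inside sin(). A minimal hand-hoisted sketch in plain Python (the helper name is illustrative only):

```python
import math

DEG2RAD = math.pi / 180.0  # hoist the repeated conversion factor

def unit_vector(lat_deg, lon_deg):
    # Convert once and reuse, instead of recomputing the
    # degree-to-radian product inside every trig call.
    lat = lat_deg * DEG2RAD
    lon = lon_deg * DEG2RAD
    clat = math.cos(lat)
    return (clat * math.cos(lon), clat * math.sin(lon), math.sin(lat))

# The dot product of two unit vectors is the cosine of the central
# angle between the two points on the sphere, as in sdist above.
a = unit_vector(0.0, 0.0)
b = unit_vector(0.0, 90.0)
dot = sum(p * q for p, q in zip(a, b))
print(math.acos(dot))  # ~pi/2: a quarter turn around the equator
```

A compiler performing common-subexpression elimination and loop-invariant hoisting can do this rewrite automatically; doing it by hand simply makes the saving explicit.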
