Is there any way to use jitclasses with multiprocessing? (and 2 other quick things)

ja...@troutnut.com

Apr 22, 2016, 6:27:22 PM
to Numba Public Discussion - Public
I'm using Numba with several jitclasses to speed up a fairly complex biophysics model, which must be solved with a computationally intensive but nicely parallelizable genetic algorithm. Before I started compiling things with Numba, I was getting a ~4x speed improvement from Python's multiprocessing module by splitting the job across my four cores. Compiling with Numba gives an even bigger speed improvement, but I can no longer parallelize the job because jitclasses can't be pickled. Ideally I could get both speed benefits at the same time.

I've tried using the pathos version of multiprocessing instead of the standard multiprocessing module, since it is supposed to use the more flexible 'dill' for serialization, but I still get the following error thrown by the standard multiprocessing library's pool.pyc:

PicklingError: Can't pickle <class 'numba.jitclass.base.MyJitclassName'>: it's not found as numba.jitclass.base.MyJitclassName
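
For readers unfamiliar with this message: pickle stores a class by reference (module name plus class name) and fails when that name cannot be looked up again. The sketch below reproduces the same kind of failure with a plain Python class built at runtime. It is only an illustration of the general mechanism, not Numba's code; make_record_class and HypotheticalRecord are made-up names.

```python
import pickle


def make_record_class(name):
    """Create a class at runtime, loosely like a decorator that builds a wrapper class."""
    return type(name, (object,), {})


# The class is named "HypotheticalRecord" but bound to a different variable,
# so no module attribute called "HypotheticalRecord" exists for pickle to find.
DynamicRecord = make_record_class("HypotheticalRecord")


def try_pickle(obj):
    """Return (True, None) on success or (False, error message) on failure."""
    try:
        pickle.dumps(obj)
        return True, None
    except (pickle.PicklingError, AttributeError) as exc:
        return False, str(exc)


# Pickling the class fails, and so does pickling an instance, because the
# instance's pickle must also record a reference to its class.
class_ok, class_err = try_pickle(DynamicRecord)
instance_ok, instance_err = try_pickle(DynamicRecord())
print("class picklable:", class_ok, "-", class_err)
print("instance picklable:", instance_ok, "-", instance_err)
```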

Is there any hope of using jitclasses in any sort of parallel way in the near future? Or is there an alternative, comprehensibly structured way to pass a complex data structure in and out of jitted functions? Or is the only parallelizable option to pass everything around in simple numpy arrays?

--------------

Two other quick questions/comments:
  • I figured out that we can use jitclasses as attributes of other jitclasses by defining their types like so: my_jitclass_type = MyJitclassName.class_type.instance_type. This doesn't seem to be anywhere in the documentation, and it would be really helpful to others if it were.
  • Is there any way to print() something more complex than a single number at a time from inside a jitted class or function? Diagnosing bugs in my jitted code has been a nightmare without a good way to view intermediate values.
Thanks in advance for any tips.

Stephan Hoyer

Apr 22, 2016, 6:32:39 PM
to numba...@continuum.io
Numba may do this already (I haven't checked), but in an ideal world, operations on jitclasses would release the GIL. Then you could use multithreading (e.g., via concurrent.futures) instead of multiprocessing, which can actually be significantly faster because you don't need to copy data between processes.
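
A minimal sketch of the thread-based pattern, using only the standard library. evaluate_fitness and population are hypothetical stand-ins for the model evaluation and the genetic algorithm's candidates. As written, the function is pure Python and holds the GIL, so this shows the structure only; a real speedup would need the work to happen in code that releases the GIL (for plain jitted functions, Numba's jit decorator accepts nogil=True).

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_fitness(candidate):
    """Hypothetical stand-in for an expensive model evaluation."""
    return sum(x * x for x in candidate)


# Hypothetical population: eight candidates with four parameters each.
population = [[float(i + j) for j in range(4)] for i in range(8)]

# Threads share memory, so candidates are passed by reference:
# nothing is pickled or copied between workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate_fitness, population))

# Index of the candidate with the lowest score.
best = min(range(len(population)), key=scores.__getitem__)
print(best, scores[best])
```

Because pool.map returns results in input order, scores[i] always corresponds to population[i].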

Stanley Seibert

Apr 22, 2016, 6:34:20 PM
to Numba Public Discussion - Public
We do not release the GIL currently, but we will add that once we add the ability to control @jit flags on a per-method basis. (This will also allow object-mode methods for when you don't care about performance and need to do something Numba can't specialize.)

Siu Kwan Lam

Apr 22, 2016, 6:59:18 PM
to Numba Public Discussion - Public
Pickling a jitclass can get tricky when there are cyclic references or when multiple references point to the same instance (e.g. a diamond shape). If the jitclass is just a simple record of scalar or array data, it shouldn't be hard to implement pickling for it.
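
To make the diamond and cycle cases concrete, here is a small sketch with ordinary Python dicts standing in for records (this is not jitclass code). It contrasts a naive copy that follows each reference independently with pickle, which remembers objects it has already written. naive_copy is a hypothetical helper written for this illustration.

```python
import pickle

# Diamond: two parents reference the same child object.
shared = {"name": "shared", "refs": []}
left = {"name": "left", "refs": [shared]}
right = {"name": "right", "refs": [shared]}
root = {"name": "root", "refs": [left, right]}


def naive_copy(node):
    """Copy each reference independently, with no memo of visited objects."""
    return {"name": node["name"], "refs": [naive_copy(r) for r in node["refs"]]}


# The naive copy duplicates the shared child, so the diamond is lost.
naive = naive_copy(root)
naive_shares = naive["refs"][0]["refs"][0] is naive["refs"][1]["refs"][0]

# pickle keeps a memo of objects already written, so identity survives.
clone = pickle.loads(pickle.dumps(root))
pickle_shares = clone["refs"][0]["refs"][0] is clone["refs"][1]["refs"][0]

# Cycle: the shared child now points back at the root. naive_copy would
# recurse forever on this graph, so only pickle is exercised here.
shared["refs"].append(root)
cyclic = pickle.loads(pickle.dumps(root))
cycle_kept = cyclic["refs"][0]["refs"][0]["refs"][0] is cyclic

print(naive_shares, pickle_shares, cycle_kept)  # False True True
```

Any serializer written for natively stored data would need some equivalent of that memo to round-trip such structures correctly.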

Can you tell us about your use case? In particular, what is the structure of the jitclass? I have created an issue here: https://github.com/numba/numba/issues/1846. Feel free to comment on it.

> Is there any way to print() something more complex than a single number at a time from inside a jitted class or function? Diagnosing bugs in my jitted code has been a nightmare without a good way to view intermediate values.

Can you submit an issue on the kind of values you want print() to support?

Joshua Adelman

Apr 22, 2016, 8:32:45 PM
to Numba Public Discussion - Public
I've opened up an issue for a more flexible print capability in nopython mode here:

Thanks,
Josh

Brian Merchant

Apr 28, 2016, 3:39:52 PM
to Numba Public Discussion - Public
Is pickling a jitclass also difficult with a package like `dill`? https://pypi.python.org/pypi/dill

I have been able to successfully use dill in order to pickle normal Python classes for storage purposes. Is it possible to make multiprocessing use dill as the pickling package, rather than cPickle?

Stanley Seibert

Apr 29, 2016, 12:27:42 PM
to Numba Public Discussion - Public
What makes pickling JIT classes hard is that a data structure composed of JIT classes lives partially outside the Python object system, as pointers to native data structures. To serialize that data structure (without duplicating a bunch of code already implemented by Python), we need to make it visible to pickle (or dill) as Python objects. Unfortunately, the possibility of multiple references to the same object creates some problems.
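
One way to picture "making it visible as Python objects" while coping with multiple references is to flatten the structure into a table of plain values, replacing each reference with an integer index. The sketch below does this for an ordinary Python class; Record, export_graph, and import_graph are hypothetical names, and this is an illustration of the idea under those assumptions, not how Numba implements or plans to implement it.

```python
import pickle


class Record:
    """Hypothetical stand-in for a natively stored record."""

    def __init__(self, value, links=None):
        self.value = value
        self.links = links if links is not None else []


def export_graph(root):
    """Flatten all records reachable from root into plain, picklable data."""
    index = {}  # id(record) -> row number
    rows = []
    stack = [root]
    while stack:
        rec = stack.pop()
        if id(rec) in index:
            continue  # already exported; this also stops cycles
        index[id(rec)] = len(rows)
        rows.append(rec)
        stack.extend(rec.links)
    # Each row holds the value plus the row numbers of the linked records.
    return [(rec.value, [index[id(r)] for r in rec.links]) for rec in rows]


def import_graph(table):
    """Rebuild the records; row 0 is always the root."""
    records = [Record(value) for value, _ in table]
    for rec, (_, link_ids) in zip(records, table):
        rec.links = [records[i] for i in link_ids]
    return records[0]


# Diamond: both a and b link to the same shared record.
shared = Record(1.5)
a, b = Record(2.0, [shared]), Record(3.0, [shared])
root = Record(0.0, [a, b])

# Only tuples, lists, floats, and ints are pickled, never Record itself.
payload = pickle.dumps(export_graph(root))
restored = import_graph(pickle.loads(payload))
still_shared = restored.links[0].links[0] is restored.links[1].links[0]
print(still_shared, restored.links[0].value, restored.links[1].value)
```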

