Multiprocessing hurdles

Andrew Nelson

Sep 20, 2015, 9:23:08 PM

to lmfit-py

There is a PR currently in progress for scipy that aims to introduce multiprocessing into scipy.optimize.differential_evolution. For problems whose objective functions take a long time to calculate, this could speed up optimization significantly.

I am currently writing another PR for lmfit, called emcee, that samples the posterior probability distribution function using Markov Chain Monte Carlo. This allows one to obtain robust probability distributions for parameters, marginalise nuisance parameters and also perform model selection. The PR requires that the objective function be called a few orders of magnitude more often, which can make the whole process very slow. However, the emcee package that is used to do the sampling can sample in parallel. Given that most people have multi-core computers (or possibly clusters), I would like to try to make the emcee method use this option. The parallelisation is done using multiprocessing, which means that everything used for the parallelisation needs to be picklable via the pickle module. This would include:

1) the objective function

2) Minimizer._residual

3) Parameters

And everything contained in them. I know that at the moment Parameters objects are not picklable. (For example, lambda functions and inline functions can't be used; all functions have to be declared at the top level.) Does anyone know the scope of work for these to become picklable?
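To illustrate the top-level requirement, here is a minimal sketch using only the standard library. The names `residual`, `inline_residual` and `can_pickle` are made up for this example and are not part of lmfit; the plain dict merely stands in for a set of parameter values.

```python
import pickle


def residual(params, x, data):
    """Top-level objective function: pickle stores it by its qualified name."""
    return [d - (params["a"] * xi + params["b"]) for xi, d in zip(x, data)]


def can_pickle(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle has nothing to refer to.
inline_residual = lambda params, x, data: residual(params, x, data)

top_level_ok = can_pickle(residual)           # function defined at module level
lambda_ok = can_pickle(inline_residual)       # lambda wrapping the same function
plain_params_ok = can_pickle({"a": 2.0, "b": 1.0})  # plain data

print("top-level function:", top_level_ok)
print("lambda:", lambda_ok)
print("plain dict of values:", plain_params_ok)
```

When run as a script, the top-level function and the plain dict should round-trip while the lambda is rejected, which is why every object handed to a multiprocessing worker has to be checked in this way.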

--

_____________________________________
Dr. Andrew Nelson

_____________________________________

Andrew Nelson

Sep 20, 2015, 9:26:18 PM

to lmfit-py

It looks as if individual Parameter objects are pickleable.

Matt Newville

Sep 20, 2015, 11:17:06 PM

to Andrew Nelson, lmfit-py

Hi Andrew,

I think there are many reasons to want to use multiple cores in optimization. So I applaud the effort.

But... I also think the basic issue is with pickling itself (or with multiprocessing relying on pickling). Trying to adapt lmfit -- or any other library -- to provide only objects that are easily picklable seems like the wrong approach. We will be continually fighting pickle, losing often, and it may not be worth that effort. For sure the "asteval" code will make this challenging, and it's reasonable to ask whether scaling back the dynamic nature of asteval is acceptable for lmfit. But it's also reasonable to say that pickling is crippled ;).

I recommend looking into dill (https://pypi.python.org/pypi/dill), and working on getting that used with scipy.optimize to allow multiprocessing. As I understand it, dill was designed more or less for exactly this reason. I have not tried this myself.

I'm willing to say that I would probably be OK with making changes necessary to make objects Picklable with dill.
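As a rough sketch of what trying dill could look like: dill is a third-party package whose `dumps` and `loads` mirror the pickle interface. The `objective` lambda and the `round_trip` helper below are invented for illustration, and the import is guarded so the script still runs where dill is not installed.

```python
import pickle

try:
    import dill  # third-party package: pip install dill
except ImportError:
    dill = None


def round_trip(serializer, obj):
    """Serialize and restore obj; return the copy, or None on any failure."""
    try:
        return serializer.loads(serializer.dumps(obj))
    except Exception:
        return None


# A lambda objective, the kind of object the standard pickle module rejects.
objective = lambda x: (x - 3.0) ** 2

pickle_copy = round_trip(pickle, objective)
dill_copy = round_trip(dill, objective) if dill is not None else None

print("pickle round trip worked:", pickle_copy is not None)
print("dill available:", dill is not None)
print("dill round trip worked:", dill_copy is not None)
```

Whether this carries over to lmfit objects or to scipy.optimize is exactly the open question in this thread; the sketch only shows the basic call pattern.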

--Matt

--Matt Newville <newville at cars.uchicago.edu> 630-252-0431

Andrew Nelson

Sep 21, 2015, 2:07:49 AM

to Matt Newville, lmfit-py

I think I've managed to pickle the Parameters object... A PR will be submitted forthwith.

Peter Metz

Jan 27, 2016, 2:16:48 PM

to lmfit-py, newv...@cars.uchicago.edu

Hi all,

I wanted to use cPickle to create save states for a refinement method I've developed around lmfit. I run into a problem when I try to unpickle. In particular, it looks like pickle isn't playing nicely with asteval, resulting in errors like:

[...]

  File "build\bdist.win-amd64\egg\lmfit\parameter.py", line 559, in value
    return self._getval()
  File "build\bdist.win-amd64\egg\lmfit\parameter.py", line 535, in _getval
    check_ast_errors(self._expr_eval)
  File "build\bdist.win-amd64\egg\lmfit\parameter.py", line 22, in check_ast_errors
    expr_eval.raise_exception(None)
  File "build\bdist.win-amd64\egg\lmfit\asteval.py", line 167, in raise_exception
    raise exc(self.error_msg)
NameError: name 'phase_1_transition_1_alp11' is not defined in expr='<_ast.Module object at 0x000000002032A128>'

It looks like this is a known problem, so my question is: has there been any progress on this since September? Are there any workarounds?

Thanks for the help,
Peter Metz

Matt Newville

Jan 27, 2016, 2:40:18 PM

to Peter Metz, lmfit-py

Hi Peter,

On Wed, Jan 27, 2016 at 1:16 PM, Peter Metz <pet...@gmail.com> wrote:

It looks like this is a known problem, so my question is: has there been any progress on this since September? Are there any workarounds?

Not that I am aware of. As far as I know, Pickle and cPickle from the standard library are still unable to adequately serialize many Python objects, notably methods. Consequently, using multiprocessing in Python for anything other than trivial problems is hopelessly crippled. Lmfit's use of callback functions and asteval would certainly qualify as non-trivial for any serialization, and Pickle absolutely cannot do it. But this is inherent to pickling (or, perhaps, to multiprocessing relying on pickling). It's certainly not something we can solve here.

The dill module might be adequate, or at least provide hope that something can be made to work around this inherent flaw in multiprocessing.
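For save states specifically, one generic pattern that may be worth trying is to leave the unpicklable evaluator out of the pickled state and rebuild it on load. The sketch below is not lmfit code: `ParamSet`, its attributes and the toy `eval`-based evaluator are hypothetical stand-ins for a parameter container and for asteval.

```python
import pickle


class ParamSet:
    """Hypothetical parameter container (a stand-in, not lmfit's Parameters)."""

    def __init__(self, values, exprs):
        self.values = dict(values)   # plain numbers: picklable
        self.exprs = dict(exprs)     # constraint expressions kept as strings
        self._evaluator = self._build_evaluator()

    def _build_evaluator(self):
        # Toy replacement for asteval: a local lambda, which pickle rejects.
        return lambda expr: eval(expr, {"__builtins__": {}}, dict(self.values))

    def value(self, name):
        """Return a constrained value via its expression, else the stored value."""
        if name in self.exprs:
            return self._evaluator(self.exprs[name])
        return self.values[name]

    def __getstate__(self):
        # Save only the plain data; drop the evaluator.
        state = self.__dict__.copy()
        del state["_evaluator"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then rebuild the evaluator around it.
        self.__dict__.update(state)
        self._evaluator = self._build_evaluator()


params = ParamSet({"a": 2.0, "b": 3.0}, {"c": "a * b + 1"})
restored = pickle.loads(pickle.dumps(params))
restored_c = restored.value("c")
print("c after round trip:", restored_c)
```

The point of the design is that the expression strings travel with the pickle and the evaluator is reconstructed around the restored values, so no expression is evaluated before the names it depends on exist again.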

If you figure it out, many of us would definitely be interested!

--Matt
