Emcee Multiprocessing


Abolfazl Ziaeemehr

May 7, 2021, 5:34:57 AM
to lmfit-py
Hi everybody,
I am fairly new to lmfit, so my apologies if this seems simple.

I am trying to use multiple workers in Minimizer.emcee, but I always get a single process.
Here is a simplified part of the code:


fitter = lmfit.Minimizer(residual, params,
    fcn_args=(simulation_params,),
    max_nfev=2000)
    
mcmc = fitter.emcee(steps=100,
    workers=8,
    burn=0,
    nwalkers=10,
    thin=1,
    is_weighted=False)

# -----------------------------------------------------------------
def residual(params, simulation_params):
    A = simulator(params)
    distances = measure_distances(A, observed_data)  # The distance which need to be minimized
    return distances
# -----------------------------------------------------------------
The simulator just integrates a system of ODEs and returns a 2D NumPy array.

I don't know what I am doing wrong.
best,
Abolfazl

Matt Newville

May 7, 2021, 8:09:47 AM
to lmfit-py
I think that should work, but you will need to make sure that `emcee` and `dill` are both installed:
    pip install emcee dill
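
A quick way to confirm that both packages are visible to the interpreter that actually runs the fit is sketched below. It uses only the standard library; the helper name `missing_packages` is made up for this illustration. Going by this thread, a missing `dill` does not seem to raise an error (the sampler simply runs in one process), so an explicit check can be useful.

```python
from importlib.util import find_spec


def missing_packages(names=("emcee", "dill")):
    """Return the names in `names` that cannot be imported."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("emcee and dill are both importable")
```

Run it with the same Python executable (or virtual environment) that runs the fit, since packages installed for a different interpreter will not be found.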

--
You received this message because you are subscribed to the Google Groups "lmfit-py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lmfit-py+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lmfit-py/fd973fa7-b140-4a97-99d1-9ec2f78995c1n%40googlegroups.com.


--
--Matt Newville <newville at cars.uchicago.edu> 630-327-7411

Abolfazl Ziaeemehr

May 7, 2021, 8:54:38 AM
to lmfi...@googlegroups.com
Yeah, the problem was dill.
I couldn't find that in the documentation, or maybe it's mentioned somewhere unrelated.
best,
Abolfazl



--
Abolfazl Ziaeemehr
PhD in Computational Neuroscience
Institute for Advanced Studies in 
Basic Sciences (IASBS)

Page: https://ziaeemehr.github.io/

Matt Newville

May 7, 2021, 9:58:32 AM
to lmfit-py
Hi

On Fri, May 7, 2021 at 7:54 AM Abolfazl Ziaeemehr <a.zia...@gmail.com> wrote:
> Yeah the problem was dill.
> I couldn't find that in the documentation, or maybe it's somewhere irrelevant.

I do think this is under-documented. It might be reasonable for us to make `dill` a required dependency: it is small and pure Python, so it doesn't add much overhead, and it is more or less necessary for using lmfit with multiprocessing.

--Matt

Abolfazl Ziaeemehr

May 7, 2021, 3:17:37 PM
to lmfi...@googlegroups.com
Something is still odd, or maybe I am using the wrong settings.
The simulation starts with multiple workers and, after a while (less than a minute),
becomes serial with a single process.

mcmc_result = lmfit.minimize(lib.residual,
                             params=params,
                             args=(simulation_params,),
                             workers=8,
                             burn=1,
                             steps=100,
                             thin=1,
                             nwalkers=20,
                             method='emcee',
                             nan_policy='omit',
                             is_weighted=False)

I have 3 free parameters and use these settings just to check that the program runs without errors.
I will then use more samples for the final simulations.
I hope that is clear.
The full code is quite long, and I don't want to make you dig through hundreds of lines of code.




Abolfazl Ziaeemehr

May 8, 2021, 9:02:09 AM
to lmfi...@googlegroups.com
Hi,

This is a simple example to parallelize emcee:
I set workers=4 but the code runs in serial.
Do you know where the problem is?

import matplotlib.pyplot as plt
from time import time
import numpy as np
import lmfit


def residual(p):
    # Model minus data; x and y are the module-level arrays defined below.
    v = p.valuesdict()
    return (v['a1'] * np.exp(-x / v['t1'])
            + v['a2'] * np.exp(-(x - 0.1) / v['t2']) - y)


# Synthetic data: two exponentials plus Gaussian noise.
x = np.linspace(1, 10, 250)
np.random.seed(0)
y = (3.0 * np.exp(-x / 2) - 5.0 * np.exp(-(x - 0.1) / 10.)
     + 0.1 * np.random.randn(x.size))

# Initial Nelder-Mead fit to get starting values for the sampler.
p = lmfit.Parameters()
p.add_many(('a1', 4.), ('a2', 4.), ('t1', 3.), ('t2', 3., True))
mi = lmfit.minimize(residual, p, method='nelder', nan_policy='omit')
lmfit.printfuncs.report_fit(mi.params, min_correl=0.5)

# Noise-scale parameter, since the residual is not weighted.
mi.params.add('__lnsigma', value=np.log(0.1), min=np.log(0.001), max=np.log(2))

start = time()
res = lmfit.minimize(residual, method='emcee',
                     nan_policy='omit',
                     burn=300,
                     steps=1000, thin=20,
                     params=mi.params,
                     workers=8,
                     is_weighted=False,
                     progress=True)

print("Done in {} seconds".format(time()-start))

I made sure that emcee and dill are installed.

Best,
Abolfazl

Matt Newville

May 9, 2021, 10:28:35 AM
to lmfit-py
Hm,

I'm not sure what could be going wrong. The `emcee` "fitting method" (I use quotes because it is the only fitting method in lmfit that does not change parameter values in order to try to improve the quality of the fit) is a pretty thin wrapper around `emcee.EnsembleSampler`. You might try testing whether that sampler is correctly using multiprocessing. Lmfit should "just" be passing it a multiprocessing pool.
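
One way to run that test, bypassing lmfit entirely, is sketched below. The target density `log_prob` and the helper `sample` are made up for this illustration (a unit Gaussian, not the model from this thread); the `pool=` argument of `emcee.EnsembleSampler`, `run_mcmc`, and `get_chain` are part of emcee 3.

```python
from multiprocessing import Pool


def log_prob(theta):
    """Log-density of a unit Gaussian in len(theta) dimensions (toy target)."""
    return -0.5 * sum(float(t) ** 2 for t in theta)


def sample(workers=4, nwalkers=20, ndim=3, nsteps=100):
    """Run emcee.EnsembleSampler on the toy target with a process pool."""
    # Imported here so log_prob can be used on its own without emcee/numpy.
    import emcee
    import numpy as np

    start = np.random.default_rng(0).normal(size=(nwalkers, ndim))
    with Pool(workers) as pool:
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
        sampler.run_mcmc(start, nsteps, progress=False)
    return sampler.get_chain()  # shape: (nsteps, nwalkers, ndim)
```

Call `sample()` from under an `if __name__ == "__main__":` guard and watch the process list while it runs. If several worker processes are busy here but not in the lmfit run, the problem is more likely in how the pool is set up on the lmfit side; if not, it is more likely in emcee or the multiprocessing setup.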

For sure, doing anything remotely complex with multiprocessing in Python is perilous. Expecting it to speed up any calculation requires sort of a lot of understanding of what the code (including any libraries) does during the calculation.

FWIW, `emcee` is a terrible fitting strategy: It will explore parameter space with a random walk, prioritizing "likely parameter values" but not updating them, and mostly ignoring any smoothness in the probability distribution even when it is obvious. Just to be clear: MCMC is definitely useful (and really important) for exploring ensembles of discrete states. But for the kind of fitting one would do with lmfit? No, not so much.

Unless you have a lot of variables or a huge amount of data with few "events", I doubt it is going to tell you anything that you cannot learn from doing a non-linear least-squares fit with `lmfit.minimize()` (using the `leastsq` or `least_squares` method) and then exploring the confidence intervals with `conf_interval`. And emcee will definitely be really, really, really slow.
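
A minimal sketch of that alternative, using the two-exponential model from the example earlier in the thread, might look like this. The helper names `double_exponential` and `fit_and_confidence_intervals` are made up for the illustration; `lmfit.Minimizer`, `Minimizer.minimize`, and `lmfit.conf_interval` are existing lmfit calls. The Nelder-Mead pre-fit mirrors the earlier example and is an assumption about what works well for this model, not a requirement.

```python
import numpy as np


def double_exponential(x, a1, a2, t1, t2):
    """Two-exponential model from the example earlier in the thread."""
    return a1 * np.exp(-x / t1) + a2 * np.exp(-(x - 0.1) / t2)


def fit_and_confidence_intervals(x, y):
    """Least-squares fit, then explicit confidence intervals."""
    # Imported here so the model above can be reused without lmfit.
    import lmfit

    def residual(params):
        v = params.valuesdict()
        return double_exponential(x, v["a1"], v["a2"], v["t1"], v["t2"]) - y

    params = lmfit.Parameters()
    params.add_many(("a1", 4.0), ("a2", 4.0), ("t1", 3.0), ("t2", 3.0))

    mini = lmfit.Minimizer(residual, params)
    # Coarse Nelder-Mead pass first, then refine with Levenberg-Marquardt.
    coarse = mini.minimize(method="nelder")
    result = mini.minimize(method="leastsq", params=coarse.params)
    # Confidence intervals from explicit exploration around the best fit.
    ci = lmfit.conf_interval(mini, result)
    return result, ci
```

The returned objects can be printed with `lmfit.report_fit(result)` and `lmfit.report_ci(ci)`.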

That said, it is in lmfit and should "work" in the sense of "do the thing it advertises it does". But I'm not sure we ever said "multiprocessing will definitely make it faster".




Renee Otten

May 9, 2021, 1:05:26 PM
to lmfi...@googlegroups.com
I will take a look later today or next week to make sure we are actually passing the arguments correctly to emcee. 

Can you please provide some information on what version(s) you are using? Please open your Python shell and get the output of the following:

import sys, lmfit, numpy, scipy, asteval, uncertainties, emcee
print('Python: {}\n\nlmfit: {}, scipy: {}, numpy: {}, asteval: {}, uncertainties: {}, emcee: {}'\
      .format(sys.version, lmfit.__version__, scipy.__version__, numpy.__version__, \
      asteval.__version__, uncertainties.__version__, emcee.__version__))

Secondly, how did you verify that the code runs in serial? For a simple example like the one you showed, timing alone can be misleading: the run time might not improve when you change the number of workers.
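
To illustrate why timing alone can mislead, the sketch below times the same trivially cheap function with the built-in `map` and with `multiprocessing.Pool.map`. All names (`cheap`, `timed_map`, `compare`) are made up for the illustration, and the numbers will depend on the machine. The point is only that, for cheap tasks, sending work to other processes can cost more than the work itself.

```python
import time
from multiprocessing import Pool


def cheap(x):
    """A trivially cheap task: per-call overhead dominates the work."""
    return x * x


def timed_map(map_fn, func, items):
    """Apply `func` to `items` with `map_fn`; return (seconds, results)."""
    start = time.perf_counter()
    results = list(map_fn(func, items))
    return time.perf_counter() - start, results


def compare(workers=4, n_items=10_000):
    """Time the built-in map against Pool.map for the same cheap task."""
    items = range(n_items)
    serial_time, serial = timed_map(map, cheap, items)
    with Pool(workers) as pool:
        # Pool start-up is not included; only the map call is timed.
        pool_time, parallel = timed_map(pool.map, cheap, items)
    assert serial == parallel  # same answers either way
    return serial_time, pool_time
```

Call `compare()` from under an `if __name__ == "__main__":` guard. A pool that is no faster (or even slower) than the serial run does not by itself show that the workers are idle.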

Renee

Abolfazl Ziaeemehr

May 10, 2021, 9:08:14 AM
to lmfi...@googlegroups.com
Hi, sorry for the late reply,
print('Python: {}\n\nlmfit: {}, scipy: {}, numpy: {}, asteval: {}, uncertainties: {}, emcee: {}'\
      .format(sys.version, lmfit.__version__, scipy.__version__, numpy.__version__, \
      asteval.__version__, uncertainties.__version__, emcee.__version__))

Python: 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
lmfit: 1.0.2, scipy: 1.6.1, numpy: 1.19.5, asteval: 0.9.23, uncertainties: 3.1.5, emcee: 3.0.2

I am using Ubuntu 20.04 LTS.

I check the number of active processes in htop. I also print os.getpid() in the residual function to see how many processes are active.
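
The `os.getpid()` check can also be done outside the fit, which separates "does a process pool work on this machine at all" from "is lmfit using it". This is a minimal sketch with made-up helper names (`report_pid`, `pid_counts`, `pid_counts_with_pool`), using only the standard library.

```python
import os
import time
from collections import Counter
from multiprocessing import Pool


def report_pid(_task):
    """Do a little work, then return the PID of the process that ran it."""
    time.sleep(0.01)  # keeps one worker from grabbing every task
    return os.getpid()


def pid_counts(map_fn, n_tasks=40):
    """Count how many tasks each process handled under `map_fn`."""
    return Counter(map_fn(report_pid, range(n_tasks)))


def pid_counts_with_pool(workers=4, n_tasks=40):
    """Same count, using a multiprocessing pool of `workers` processes."""
    with Pool(workers) as pool:
        return pid_counts(pool.map, n_tasks)
```

With the built-in `map`, `pid_counts(map)` reports a single PID (the current process). Calling `pid_counts_with_pool()` from under an `if __name__ == "__main__":` guard should report up to `workers` distinct PIDs if the pool is working.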

best,
Abolfazl

