Use of multiprocessing to speed up minimize()?


Jeremy DeJournett

Feb 9, 2024, 4:43:18 PM
to lmfit-py
Hi all,

I'm trying to speed up some parameter fitting for a simulation application, and noticed there aren't really any options for multiprocessing. Is this an architectural limitation within lmfit, or just something nobody has built yet?

I understand you'd probably need a cooperative pool of workers (regularly submitting best results to a central repository that occasionally updates the workers with an Event), but I would think that by chunking up the task you could see big time savings.

Right now, fitting for some data takes 20-30 minutes, but the fitting is not even maxing out a single core, much less using the available cores on my machine.

Here's how I'm using lmfit:

result = minimize(
    model_resid,
    params,
    args=(t, *curves, w),
)

def model_resid(params, t, g_curve, i_curve, d_curve, w):
    x_0 = params["x_0"].value
    i1_0 = params["i1_0"].value
    i2_0 = params["i2_0"].value
    x0 = g_curve(0), x_0, i1_0, i2_0

    integrated = odeint(model_rhs, t, x0, params, i_curve, d_curve, w)

    observed_g = np.array([g_curve(time) for time in t])
    model_g = np.array([row[0] for row in integrated])

    return (model_g - observed_g).ravel()

Let me know what you think, or if there's some bit of documentation I missed!

Jeremy

Matt Newville

Feb 10, 2024, 1:37:14 PM
to lmfi...@googlegroups.com
On Fri, Feb 9, 2024 at 10:43 PM Jeremy DeJournett <jcdejo...@gmail.com> wrote:
Hi all,

I'm trying to speed up some parameter fitting for a simulation application, and noticed there aren't really any options for multiprocessing. Is this an architectural limitation within lmfit, or just something nobody has built yet?


I would describe it as an implementation limit in the libraries that lmfit uses.  Your use of "just" with "something nobody has built yet" sort of implies that it would not be hard, if only someone had thought to do it.  The truth is more like "it is known to be very hard and no one has figured out how to do it".

A fit is an iterative process.  The results of one call of the objective function influence the values used in later calls.  It is sort of true that, for most of the methods (and the default one), you could imagine a supervisory process that, at each "iteration" (we can leave that definition vague), calls the objective function in parallel for each variable parameter to construct the Jacobian matrix, sends the results back to a supervisor task that decides whether to continue and picks the next set of parameter values, and then starts over.  That would, at best, do N_variables function calls "at the same time".  Currently, the best (and default) solvers we use are implemented in Fortran and do not have an iteration loop in Python.  So that parallelization would have to be done in Fortran, or the code translated to Python, which is likely to come with a big performance hit that would need to be overcome.
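To make that concrete, here is a very rough sketch (pure Python, and not anything lmfit actually does) of what farming out the finite-difference Jacobian columns to a process pool might look like, with a toy residual standing in for the real objective function:

import numpy as np
from multiprocessing import Pool

def residual(x):
    # toy objective: residuals of a 2-parameter exponential model
    t = np.linspace(0, 1, 50)
    return x[0] * np.exp(-x[1] * t) - np.exp(-2.5 * t)

def jac_column(args):
    # one forward-difference column of the Jacobian, for one variable
    x, i, step = args
    xp = x.copy()
    xp[i] += step
    return (residual(xp) - residual(x)) / step

def parallel_jacobian(x, step=1.0e-8):
    # each column (one per variable parameter) is evaluated in a separate process
    with Pool() as pool:
        cols = pool.map(jac_column, [(x, i, step) for i in range(len(x))])
    return np.column_stack(cols)

if __name__ == '__main__':
    jac = parallel_jacobian(np.array([1.0, 2.0]))
    print(jac.shape)   # (n_data, n_variables)

The supervisory step -- deciding whether to stop and what parameter values to try next -- still has to happen serially after every one of those parallel evaluations, and that is exactly the part that lives inside the Fortran solvers today.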

There are other complications and overheads involved.  For example, although Python multiprocessing promises that it can share state between processes, this is only true in a pretty limited sense and for simple types.  This might be surmountable, but it would add overhead.   

I understand you'd probably need a cooperative pool of workers (regularly submitting best results to a central repository that occasionally updates the workers with an Event), but I would think that by chunking up the task you could see big time savings.

Right now, fitting for some data takes 20-30 minutes, but the fitting is not even maxing out a single core, much less using the available cores on my machine.

Here's how I'm using lmfit:

result = minimize(
    model_resid,
    params,
    args=(t, *curves, w),
)

def model_resid(params, t, g_curve, i_curve, d_curve, w):
    x_0 = params["x_0"].value
    i1_0 = params["i1_0"].value
    i2_0 = params["i2_0"].value
    x0 = g_curve(0), x_0, i1_0, i2_0

    integrated = odeint(model_rhs, t, x0, params, i_curve, d_curve, w)

    observed_g = np.array([g_curve(time) for time in t])
    model_g = np.array([row[0] for row in integrated])

    return (model_g - observed_g).ravel()

Let me know what you think, or if there's some bit of documentation I missed!

We always recommend providing a complete, minimal working example of real code and posting a fit report.  We don't know anything about your code (data size, for example) or how many iterations were used.
With only 3 variables, I would not expect any theoretical benefit of parallelization to help you much.  

It seems suspicious that you loop over `t` just to call `g_curve()` with a single value and then turn the list of results into an array, and that you repeat this in every call of your objective function. It sure looks like that could be done exactly once, before you start the fit.  I would also imagine that if your `g_curve()` worked on numpy arrays, the loop in the list comprehension could be eliminated.
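That is, something roughly along these lines (untested, just to show the shape of it -- the names are yours, and I am only moving the g_curve evaluations out of the objective function):

# evaluate the observed curve once, before the fit starts
observed_g = np.array([g_curve(time) for time in t])
g0 = g_curve(0)

def model_resid(params, t, observed_g, g0, i_curve, d_curve, w):
    x_0 = params["x_0"].value
    i1_0 = params["i1_0"].value
    i2_0 = params["i2_0"].value
    x0 = g0, x_0, i1_0, i2_0

    integrated = odeint(model_rhs, t, x0, params, i_curve, d_curve, w)
    model_g = np.array([row[0] for row in integrated])
    return model_g - observed_g

result = minimize(
    model_resid,
    params,
    args=(t, observed_g, g0, i_curve, d_curve, w),
)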

--Matt

Jeremy DeJournett

Feb 11, 2024, 4:21:13 PM
to lmfit-py
Thanks for illuminating that for me! I was thinking that you could parallelize the "brute force" part of the search and periodically have a supervisor pick the best result to inform the next phase of the search. The fact that this would have to be built much deeper in the stack makes that approach less feasible, of course.
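In the meantime, I may just parallelize at the level of whole fits: run several independent minimize() calls from different starting values in a process pool and keep the best one. A rough sketch of what I have in mind (make_params here is a placeholder for my actual Parameters setup, not real code from my project):

from multiprocessing import Pool
from lmfit import minimize

def run_one_fit(start_values):
    # make_params is a placeholder: build a Parameters object
    # from a dict of starting values
    params = make_params(start_values)
    return minimize(model_resid, params, args=(t, *curves, w))

if __name__ == '__main__':
    starts = [
        {"x_0": 0.01, "i1_0": 10, "i2_0": 0},
        {"x_0": 0.10, "i1_0": 5, "i2_0": -10},
    ]
    with Pool() as pool:
        results = pool.map(run_one_fit, starts)
    best = min(results, key=lambda res: res.chisqr)

That wouldn't speed up a single fit, but it would at least put the idle cores to work.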

I'll have to think about how to fully extract a working example. This is part of a larger simulation codebase, so the model implementations you would need in order to run it (the model itself, the curve objects, the data loading) would mean extracting a few hundred lines of code, and I didn't want to overwhelm you with information.

To answer your question about "g_curve": in this code, t is a numpy array of time values for which we want to integrate the model. I thought about caching, and indeed I do cache the output of these curves (they're splines/step functions under the hood), but I wasn't sure the *same* time values would be used in different calls to this residual function. I didn't know if there was some kind of adaptive step size where different iterations might use different times to sample, so I keep the (somewhat expensive) calculation in the hot loop to ensure correctness rather than save computation time. Because the same curve object does cache results for different time values, `observed_g` amortizes to a fairly cheap list copy.
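For what it's worth, the caching on the curve objects is just memoization of point evaluations; a simplified stand-in (CachedCurve is not my real class) looks something like:

from functools import lru_cache
from scipy.interpolate import CubicSpline

class CachedCurve:
    def __init__(self, times, values):
        self._spline = CubicSpline(times, values)
        # remember point evaluations so repeated calls with the same
        # time value only hit the spline once
        self._eval = lru_cache(maxsize=None)(lambda time: float(self._spline(time)))

    def __call__(self, time):
        return self._eval(time)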

I'm actually fitting many more parameters than the sample code implies; here's my fit report:

[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 381
    # data points      = 2000
    # variables        = 10
    chi-square         = 3033802.20
    reduced chi-square = 1524.52372
    Akaike info crit   = 14668.8496
    Bayesian info crit = 14724.8586
[[Variables]]
    h:     111.762032 +/- 1508.53479 (1349.77%) (init = 136)
    n:     4.33066010 +/- 30.5492149 (705.42%) (init = 0.13)
    gb:    120.976891 +/- 1.62835388 (1.35%) (init = 120)
    ib:    49.7449559 +/- 1123.46722 (2258.45%) (init = 10)
    p1:   -0.09994449 +/- 0.01059817 (10.60%) (init = -0.0131)
    p2:   -0.09120352 +/- 1.78702475 (1959.38%) (init = -0.0135)
    p3:    2.9656e-06 +/- 6.9307e-05 (2337.06%) (init = 2.9e-06)
    x_0:   1.0143e-04 +/- 0.00511144 (5039.62%) (init = 0.01)
    i1_0:  1.86116644 +/- 2736.46694 (147029.67%) (init = 10)
    i2_0: -99.4873902 +/- 2285.40105 (2297.18%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(gb, p3)     = +0.4926
    C(gb, p2)     = +0.3764
    C(p2, i1_0)   = -0.3742
    C(gb, p1)     = -0.3442
    C(p3, i1_0)   = +0.3419
    C(h, p2)      = +0.3405
    C(p2, p3)     = -0.3273
    C(h, p3)      = -0.2888
    C(ib, i2_0)   = +0.2689
    C(i1_0, i2_0) = -0.2334
    C(n, p3)      = +0.2137
    C(n, ib)      = +0.2101
    C(h, n)       = +0.2012
    C(gb, i1_0)   = +0.1885
    C(n, i2_0)    = -0.1779
    C(h, p1)      = +0.1670
    C(ib, p1)     = +0.1621
    C(p2, i2_0)   = +0.1584
    C(n, p2)      = -0.1556
    C(n, x_0)     = -0.1539
    C(p2, x_0)    = -0.1511
    C(n, gb)      = -0.1487
    C(p3, x_0)    = +0.1471
    C(n, i1_0)    = +0.1293
    C(h, i2_0)    = +0.1207
    C(p1, x_0)    = -0.1170
    C(p1, p2)     = -0.1055

And the actual parameter values from the final fit:

Parameters([
('h', <Parameter 'h', value=111.76203180426232 +/- 1.51e+03, bounds=[80:200]>), 
('n', <Parameter 'n', value=4.330660103060304 +/- 30.5, bounds=[0.01:5.0]>), 
('gb', <Parameter 'gb', value=120.97689107965567 +/- 1.63, bounds=[80:200]>), 
('ib', <Parameter 'ib', value=49.74495592655311 +/- 1.12e+03, bounds=[1:50]>), 
('p1', <Parameter 'p1', value=-0.0999444948667807 +/- 0.0106, bounds=[-0.1:-0.001]>), 
('p2', <Parameter 'p2', value=-0.09120351550969862 +/- 1.79, bounds=[-0.1:-0.001]>), 
('p3', <Parameter 'p3', value=2.9655792256020028e-06 +/- 6.93e-05, bounds=[5e-07:1e-05]>), 
('x_0', <Parameter 'x_0', value=0.00010142508073466625 +/- 0.00511, bounds=[0.0001:0.5]>), 
('i1_0', <Parameter 'i1_0', value=1.8611664435517237 +/- 2.74e+03, bounds=[1:50]>), 
('i2_0', <Parameter 'i2_0', value=-99.48739015035827 +/- 2.29e+03, bounds=[-100:100]>)])

The model is a fairly simple glucose model for patients in the ICU and consists of a system of 4 ODEs. I'm trying to solve for constant parameters that are roughly the same for a given patient, as well as initial conditions for the ODE states that have no direct observation to seed them with.

I'll try to extract a minimal example that demonstrates the slowdown I saw with this model; it may take a bit, though.