hddm can't pickle lambda functions.


Adam Moore

unread,
Jun 1, 2022, 3:54:46 PM6/1/22
to hddm-users
Hi all. I've just gotten hddm running on my Windows 10 machine which was a chore and a half!

Now it runs my model (in test mode) just fine:
----
def run_model(id):
    import hddm
    data1 = hddm.load_csv('C:/path_to_data/St1_MFCT_dropped.csv')
    v_reg = {'model' : "v ~ 1 + C(MFCTType)", 'link_func' : lambda x: x}
    t_reg = {'model' : "t ~ C(Action)", 'link_func' : lambda x: x}
    a_reg = {'model' : 'a ~ C(Valence)', 'link_func' : lambda x: x}
    reg_descr = [t_reg, a_reg, v_reg]
    m = hddm.HDDMRegressor(data1, reg_descr, p_outlier = .05)
    m.find_starting_values()
    m.sample(5, burn = 1, dbname='db%i'%id, db='pickle')
    return m
   
from ipyparallel import Client
import kabuki

v = Client()[:]
jobs = v.map(run_model, range(3))
models = jobs.get()

combined_model2 = kabuki.utils.concat_models(models)
combined_model2.save('C:/path_to_folder/Study1_MFCT_concatenated_models')
combined_model2.print_stats('C:/path_to_folder/Full_stats_report.csv')
----

However, I get an error (identical for every engine):
---
AttributeError                            Traceback (most recent call last)
~\anaconda3\envs\hddmEnv\lib\site-packages\ipyparallel\serialize\serialize.py in serialize_object(obj, buffer_threshold, item_threshold)
    117         buffers.extend(_extract_buffers(cobj, buffer_threshold))
    118
--> 119     buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
    120     return buffers
    121

AttributeError: Can't pickle local object 'run_model.<locals>.<lambda>'
---
After some judicious googling, it appears that Python 3's pickle can't serialize lambda functions. I've tried using dill instead (both plain import dill and import dill as pickle), substituting 'dill' into the database argument, e.g. dbname='db%i'%id, db='dill', but that throws a different error: pymc doesn't recognise 'dill' as a database backend.

Any suggestions or helpful tips on how to fix or work around this so I can save my model?

Thanks,
Adam
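
[For readers hitting the same traceback: the root cause is general Python behaviour, independent of hddm. The standard pickle module serializes a function by its importable name, so a lambda defined inside another function (run_model.<locals>.<lambda>) has no importable name and cannot be pickled, while a module-level named function can. A minimal sketch, using only the stdlib:]
----
import pickle

def module_level_identity(x):
    """A named, module-level function pickles fine: pickle stores its import path."""
    return x

def make_local_lambda():
    # Mirrors the link_func lambdas defined inside run_model().
    return lambda x: x

# Module-level function: picklable, round-trips correctly.
data = pickle.dumps(module_level_identity)
assert pickle.loads(data)(42) == 42

# Local lambda: raises the same "Can't pickle local object" error as the traceback.
try:
    pickle.dumps(make_local_lambda())
    raised = False
except (pickle.PicklingError, AttributeError):
    raised = True
assert raised
----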

Krishn Bera

unread,
Jun 1, 2022, 4:26:11 PM6/1/22
to hddm-users
Hi,

Can you check the kabuki version in your environment? You can use this --
import kabuki
print(kabuki.__version__)

Best,
Krishn

Adam Moore

unread,
Jun 2, 2022, 3:59:58 AM6/2/22
to hddm-users
Hi Krishn,

It's 0.6.4

Best,
A

Adam Moore

unread,
Jun 2, 2022, 4:06:44 AM6/2/22
to hddm-users
Some additional package version info (running on python 3.7.13):

cython        0.29.30
dill          0.3.5.1
hddm          0.9.6
ipyparallel   8.3.0
kabuki        0.6.4
numpy         1.21.0
pandas        1.2.5
pickle-mixin  1.0.2
pymc          2.3.8

Adam Moore

unread,
Jun 2, 2022, 4:12:35 PM6/2/22
to hddm-users
Update to this:

Downgrading to hddm 0.7.7 solved the issue.

Perhaps a bug somewhere in 0.9.6 in conjunction with the other package versions I was running?

Cheers,
Adam

Alexander Fengler

unread,
Jun 2, 2022, 5:16:22 PM6/2/22
to hddm-users
The kabuki version should be 0.6.5

Try installing kabuki with:

This should resolve the issue.
We advise against downgrading to an older version of hddm instead.

Best,
Alex

Adam Moore

unread,
Jun 4, 2022, 8:52:29 PM6/4/22
to hddm-users
Hey Alex,

After upgrading to kabuki 0.6.5 and HDDM 0.9.6, the error still gets thrown after the models successfully complete (and are thus lost). All other package info remains the same.

Any thoughts?
Adam

Fengler, Alexander

unread,
Jun 4, 2022, 9:28:35 PM6/4/22
to hddm-...@googlegroups.com
Hi Adam,

Ah, I overlooked an aspect that is specific to your code:
try saving the models inside your run_model() function directly instead of returning them.

The pickle error you get is actually specific to ipyparallel, which attempts to pickle the model when returning it to your main control flow.
Internally, kabuki now (since 0.6.5) uses cloudpickle instead of pickle to allow saving more complex models, but ipyparallel throws a wrench into that via its own attempt to use plain pickle.

Best,
Alex



Adam Moore

unread,
Jun 5, 2022, 4:04:38 AM6/5/22
to hddm-users
Hey Alex,

Thanks for the suggestion. I updated the code to try it out (not sure if this is the right way to do it; see test code below):
----
def run_model(id):
    import hddm
    data1 = hddm.load_csv('C:/path_to_folder/Study1/St1_MFCT_dropped.csv')
    v_reg = {'model' : "v ~ 1 + C(Valence) * C(MFCTType)", 'link_func' : lambda x: x}

    t_reg = {'model' : "t ~ C(Action)", 'link_func' : lambda x: x}
    reg_descr = [v_reg, t_reg]

    m = hddm.HDDMRegressor(data1, reg_descr, p_outlier = .05)
    m.find_starting_values()
    m.sample(2, burn = 0, dbname='db%i'%id, db='pickle')
    m.save('C:/path_to_folder/Study1/Study1_MFCT_model0%i'%id)
    return m
----

But it throws a very similar error for every engine in the cluster:
---
AttributeError                            Traceback (most recent call last)
~\anaconda3\envs\hddmEnv\lib\site-packages\ipyparallel\serialize\serialize.py in serialize_object(obj, buffer_threshold, item_threshold)
    117         buffers.extend(_extract_buffers(cobj, buffer_threshold))
    118
--> 119     buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
    120     return buffers
    121

AttributeError: Can't pickle local object 'run_model.<locals>.<lambda>'
----
I'm pretty new to Python, but surely others are running models in parallel and successfully saving them, so obviously the error is somewhere on my end. Any help is deeply appreciated!

Adam

Adam Moore

unread,
Jun 5, 2022, 5:43:19 AM6/5/22
to hddm-users
Hi Alex (et al.),

Further update to this:

While the error still gets thrown, it appears that the code is writing each model to the directory. I can then load them individually, concat them, and print stats, etc. Strange. I'll see if this also works to run posterior predictives and post again when I find out.

Adam

Alexander Fengler

unread,
Jun 8, 2022, 6:18:52 PM6/8/22
to hddm-users
If this error still gets thrown, it is likely because your run_model() function still returns the model.
run_model() should not return the hddm model itself; return nothing, or (what I tend to do) an integer signifying whether the process finished successfully.

Best,
Alex
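
[The division of labour Alex describes can be sketched generically. This is a stdlib-only illustration with hypothetical names, where a comment stands in for the hddm fitting and m.save() calls, and a plain loop stands in for v.map():]
----
import os
import tempfile

def run_model(job_id, out_dir):
    """Worker for ipyparallel: do the heavy fitting, save to disk inside the
    worker, and return only a trivially picklable status code -- never the
    model object itself."""
    # ... fit here, e.g. m = hddm.HDDMRegressor(...); m.sample(...) ...
    result_path = os.path.join(out_dir, 'model_%i.db' % job_id)
    # Stand-in for m.save(result_path): persist the fitted model in-process.
    with open(result_path, 'w') as f:
        f.write('fitted model %i' % job_id)
    return 0  # an int pickles fine, so ipyparallel can ship it back

out_dir = tempfile.mkdtemp()
# In the real code this would be v.map(run_model, range(3), ...).
statuses = [run_model(i, out_dir) for i in range(3)]
assert statuses == [0, 0, 0]
assert sorted(os.listdir(out_dir)) == ['model_0.db', 'model_1.db', 'model_2.db']
----
The saved models can then be loaded back in the main process and concatenated there, so no unpicklable object ever crosses the engine boundary.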

Adam Moore

unread,
Jun 9, 2022, 9:48:13 AM6/9/22
to hddm-users
Ah, gotcha. That makes sense, thanks!