hddm can't pickle lambda functions.


Adam Moore

unread,
Jun 1, 2022, 3:54:46 PM6/1/22
to hddm-users
Hi all. I've just gotten hddm running on my Windows 10 machine which was a chore and a half!

Now it runs my model (in test mode) just fine:
----
def run_model(id):
    import hddm
    data1 = hddm.load_csv('C:/path_to_data/St1_MFCT_dropped.csv')
    v_reg = {'model' : "v ~ 1 + C(MFCTType)", 'link_func' : lambda x: x}
    t_reg = {'model' : "t ~ C(Action)", 'link_func' : lambda x: x}
    a_reg = {'model' : 'a ~ C(Valence)', 'link_func' : lambda x: x}
    reg_descr = [t_reg, a_reg, v_reg]
    m = hddm.HDDMRegressor(data1, reg_descr, p_outlier = .05)
    m.find_starting_values()
    m.sample(5, burn = 1, dbname='db%i'%id, db='pickle')
    return m
   
from ipyparallel import Client
import kabuki

v = Client()[:]
jobs = v.map(run_model, range(3))
models = jobs.get()

combined_model2 = kabuki.utils.concat_models(models)
combined_model2.save('C:/path_to_folder/Study1_MFCT_concatenated_models')
combined_model2.print_stats('C:/path_to_folder/Full_stats_report.csv')
----

However, I get an error (identical for every engine):
---
AttributeError                            Traceback (most recent call last)
~\anaconda3\envs\hddmEnv\lib\site-packages\ipyparallel\serialize\serialize.py in serialize_object(obj, buffer_threshold, item_threshold)
    117         buffers.extend(_extract_buffers(cobj, buffer_threshold))
    118
--> 119     buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
    120     return buffers
    121

AttributeError: Can't pickle local object 'run_model.<locals>.<lambda>'
---
After some judicious googling, it appears that Python 3's pickle can't serialize lambda functions. I've tried using dill instead (both plain import dill and import dill as pickle), substituting 'dill' into the database argument, e.g. dbname='db%i'%id, db='dill', but that throws a different error: pymc doesn't recognise 'dill' as a database backend.

Any suggestions or helpful tips on how to fix or work around this so I can save my model?

Thanks,
Adam
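
[For readers hitting the same traceback: the root cause is general Python behaviour, independent of hddm. The standard pickle module serializes a function by its importable name, so a lambda defined inside another function (run_model.<locals>.<lambda>) has no importable name and cannot be pickled, while a module-level named function can. A minimal sketch, using only the stdlib:]
----
import pickle

def module_level_identity(x):
    """A named, module-level function pickles fine: pickle stores its import path."""
    return x

def make_local_lambda():
    # Mirrors the link_func lambdas defined inside run_model().
    return lambda x: x

# Module-level function: picklable, round-trips correctly.
data = pickle.dumps(module_level_identity)
assert pickle.loads(data)(42) == 42

# Local lambda: raises the same "Can't pickle local object" error as the traceback.
try:
    pickle.dumps(make_local_lambda())
    raised = False
except (pickle.PicklingError, AttributeError):
    raised = True
assert raised
----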

Krishn Bera

unread,
Jun 1, 2022, 4:26:11 PM6/1/22
to hddm-users
Hi,

Can you check the kabuki version in your environment? You can use this --
import kabuki
print(kabuki.__version__)

Best,
Krishn

Adam Moore

unread,
Jun 2, 2022, 3:59:58 AM6/2/22
to hddm-users
Hi Krishn,

It's 0.6.4

Best,
A

Adam Moore

unread,
Jun 2, 2022, 4:06:44 AM6/2/22
to hddm-users
Some additional package version info (running on python 3.7.13):

cython        0.29.30
dill          0.3.5.1
hddm          0.9.6
ipyparallel   8.3.0
kabuki        0.6.4
numpy         1.21.0
pandas        1.2.5
pickle-mixin  1.0.2
pymc          2.3.8

Adam Moore

unread,
Jun 2, 2022, 4:12:35 PM6/2/22
to hddm-users
Update to this:

Downgrading to hddm 0.7.7 solved the issue.

Perhaps a bug somewhere in 0.9.6 in conjunction with the other package versions I was running?

Cheers,
Adam

Alexander Fengler

unread,
Jun 2, 2022, 5:16:22 PM6/2/22
to hddm-users
The kabuki version should be 0.6.5

Try installing kabuki with:

This should resolve the issue.
We advise against downgrading to an older version of hddm instead.

Best,
Alex

Adam Moore

unread,
Jun 4, 2022, 8:52:29 PM6/4/22
to hddm-users
Hey Alex,

After upgrading to kabuki 0.6.5 and HDDM 0.9.6, the error still gets thrown after the models successfully complete (and are thus lost). All other package info remains the same.

Any thoughts?
Adam

Fengler, Alexander

unread,
Jun 4, 2022, 9:28:35 PM6/4/22
to hddm-...@googlegroups.com
Hi Adam,

Ah, I overlooked an aspect that is specific to your code:
try saving the models inside your run_model() function directly instead of returning them.

The pickle error you get is actually specific to ipyparallel, which attempts to pickle the model when returning it to your main control flow.
Internally, kabuki now (since 0.6.5) uses cloudpickle instead of pickle to allow saving more complex models, but ipyparallel throws a wrench into that via its own attempt to use plain pickle.

Best,
Alex



Adam Moore

unread,
Jun 5, 2022, 4:04:38 AM6/5/22
to hddm-users
Hey Alex,

Thanks for the suggestion. I updated the code to try it out (not sure if this is the right way to do it; see test code below):
----
def run_model(id):
    import hddm
    data1 = hddm.load_csv('C:/path_to_folder/Study1/St1_MFCT_dropped.csv')
    v_reg = {'model' : "v ~ 1 + C(Valence) * C(MFCTType)", 'link_func' : lambda x: x}

    t_reg = {'model' : "t ~ C(Action)", 'link_func' : lambda x: x}
    reg_descr = [v_reg, t_reg]

    m = hddm.HDDMRegressor(data1, reg_descr, p_outlier = .05)
    m.find_starting_values()
    m.sample(2, burn = 0, dbname='db%i'%id, db='pickle')
    m.save('C:/path_to_folder/Study1/Study1_MFCT_model0%i'%id)
    return m
----

But it throws a very similar error for every engine in the cluster:
---
AttributeError                            Traceback (most recent call last)
~\anaconda3\envs\hddmEnv\lib\site-packages\ipyparallel\serialize\serialize.py in serialize_object(obj, buffer_threshold, item_threshold)
    117         buffers.extend(_extract_buffers(cobj, buffer_threshold))
    118
--> 119     buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
    120     return buffers
    121

AttributeError: Can't pickle local object 'run_model.<locals>.<lambda>'
----
I'm pretty new to Python, but surely others are running models in parallel and successfully saving them, so obviously the error is somewhere on my end. Any help is deeply appreciated!

Adam

Adam Moore

unread,
Jun 5, 2022, 5:43:19 AM6/5/22
to hddm-users
Hi Alex (et al.),

Further update to this:

While the error still gets thrown, it appears that the code is writing each model to the directory. I can then load them individually, concat them, and print stats, etc. Strange. I'll see if this also works to run posterior predictives and post again when I find out.

Adam

Alexander Fengler

unread,
Jun 8, 2022, 6:18:52 PM6/8/22
to hddm-users
If this error still gets thrown, it is likely because your run_model() function still returns the model.
run_model() should not return the hddm model itself; return nothing, or (what I tend to do) an integer signifying whether the process finished successfully.

Best,
Alex
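
[The division of labour Alex describes can be sketched generically. This is a stdlib-only illustration with hypothetical names, where a comment stands in for the hddm fitting and m.save() calls, and a plain loop stands in for v.map():]
----
import os
import tempfile

def run_model(job_id, out_dir):
    """Worker for ipyparallel: do the heavy fitting, save to disk inside the
    worker, and return only a trivially picklable status code -- never the
    model object itself."""
    # ... fit here, e.g. m = hddm.HDDMRegressor(...); m.sample(...) ...
    result_path = os.path.join(out_dir, 'model_%i.db' % job_id)
    # Stand-in for m.save(result_path): persist the fitted model in-process.
    with open(result_path, 'w') as f:
        f.write('fitted model %i' % job_id)
    return 0  # an int pickles fine, so ipyparallel can ship it back

out_dir = tempfile.mkdtemp()
# In the real code this would be v.map(run_model, range(3), ...).
statuses = [run_model(i, out_dir) for i in range(3)]
assert statuses == [0, 0, 0]
assert sorted(os.listdir(out_dir)) == ['model_0.db', 'model_1.db', 'model_2.db']
----
The saved models can then be loaded back in the main process and concatenated there, so no unpicklable object ever crosses the engine boundary.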

Adam Moore

unread,
Jun 9, 2022, 9:48:13 AM6/9/22
to hddm-users
Ah, gotcha. That makes sense, thanks!