save/load models in v0.3

Øystein Sandvik

Sep 15, 2012, 6:49:51 AM

to hddm-...@googlegroups.com

Hi

I have a few questions related to the changes in the new version and to saving and loading models.

Prior to the release of version 0.3 I wrote some code for saving and loading model objects. The script I made saved models by extracting core variables from the model object (e.g. data, include, is_group_model, depends_on, bias, wiener_params, db, dbname, etc.). These variables were saved to file by using the shelve module (http://docs.python.org/library/shelve.html). Hence, the ability to save a model this way depends on whether these variables are picklable. Loading such models from file then simply recreated the models by initializing a new model with the saved variables.

model = hddm.HDDM(data,
    include=include,
    is_group_model=is_group_model,
    depends_on=depends_on,
    bias=bias,
    wiener_params=wiener_params)
model.load_db(dbname=m['dbname'], db=m['db'])

In version 0.2 I was able to implement such a solution. This made it possible to separate the processes of sampling different chains and calculating convergence statistics. An additional benefit of this approach was that we could run more samples on the same chains/models if the chains had not converged.
My first question is then: Do you see any potential problems with such a setup? Are there any other variables that could be critical when saving and loading models this way? (e.g. extracting and restoring the state of the sampler and step methods of the MCMC object)
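
The convergence statistic computed across separately sampled chains could be something like the Gelman-Rubin R-hat. The following is a minimal pure-Python sketch for a single parameter, assuming each chain has already been loaded as a plain list of samples; the function name is illustrative and this is not the implementation shipped with HDDM or kabuki.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Return the Gelman-Rubin R-hat for one parameter.

    chains: list of equally long lists of samples, one list per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: mean of the within-chain sample variances
    within = mean([variance(c) for c in chains])
    # B / n: sample variance of the chain means
    between_over_n = variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)
```

Values close to 1 suggest the chains agree; values well above 1 suggest more samples are needed. The sketch divides by the within-chain variance, so it fails on chains that are constant.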

My next question is related to the new version. In the 0.3 release, some of these variables have been changed. For instance, “depends_on” is no longer a class variable. It is simply used for initializing “depends”, which is a defaultdict. Since “depends_on” is no longer available through the model object and a defaultdict is not picklable, my approach to saving and loading models is no longer as clean and straightforward as it was with the previous version. My question to the developers is then: Could I request that this be changed in a future update? (i.e. self.depends_on = depends_on). I realize that there are theoretical arguments for not making this variable a class variable (e.g. after “depends” has been initialized, accessing “depends_on” will not change anything), but from a practical point of view I do not think this will cause any major problems.

Alternatively, does anyone have any suggestions for how saving and loading models could be done differently?
Kabuki has a (new?) method init_from_existing_model(), but as far as I can tell, this will only be beneficial for within-script/processes. One would still need the pre_model object.

When attempting to adapt my save/load script to version 0.3, I am also having some other problems. When loading the database by using the load_db() method (kabuki/hierarchical.py), I get the error message: 'HDDM' object has no attribute 'param_container'. Where in the source code is param_container initialized? (Even when hacking my way past this, I get the same error for the method create_nodes()). I cannot see how this would be related to my saving and loading procedure, because I still get the same problems if I create a new model from scratch and try to load an existing db. Any suggestions will be greatly appreciated.

Thanks,
Øystein

Thomas Wiecki

Sep 15, 2012, 12:31:12 PM

to hddm-...@googlegroups.com

Hi Øystein,

First of all, I think it's great you are working on this and willing to share your experiences/code. That's definitely a feature I've been wanting for a long time. See responses below.

On Sat, Sep 15, 2012 at 6:49 AM, Øystein Sandvik <oystein...@gmail.com> wrote:

> Hi
>
> I have a few questions related to the changes in the new version and to saving and loading models.
>
> Prior to the release of version 0.3 I wrote some code for saving and loading model objects. The script I made saved models by extracting core variables from the model object (e.g. data, include, is_group_model, depends_on, bias, wiener_params, db, dbname, etc.). These variables were saved to file by using the shelve module (http://docs.python.org/library/shelve.html). Hence, the ability to save a model this way depends on whether these variables are picklable. Loading such models from file then simply recreated the models by initializing a new model with the saved variables.
>
> model = hddm.HDDM(data,
>     include=include,
>     is_group_model=is_group_model,
>     depends_on=depends_on,
>     bias=bias,
>     wiener_params=wiener_params)
> model.load_db(dbname=m['dbname'], db=m['db'])

I think that is in general the right strategy (saving model variables, recreating the object and loading the db). I'm not sure what shelve gives you that the simpler pickle doesn't, though, so you might want to consider using pickle instead. Ideally I think we'll end up with some code that allows saving of arbitrary kabuki models, but it's fine to start with HDDM.
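
A minimal sketch of that strategy with plain pickle might look as follows. The helper names save_model_spec and load_model_spec are hypothetical, and the commented usage only mirrors the constructor and load_db() calls quoted in this thread; it is not an official HDDM API.

```python
import pickle


def save_model_spec(path, **spec):
    """Pickle the keyword arguments needed to rebuild a model."""
    with open(path, "wb") as f:
        pickle.dump(spec, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model_spec(path):
    """Return the dict of keyword arguments written by save_model_spec."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical usage, mirroring the calls quoted above:
# save_model_spec("model.spec", include=include, depends_on=depends_on,
#                 bias=bias, dbname="traces.db", db="pickle")
# spec = load_model_spec("model.spec")
# dbname, db = spec.pop("dbname"), spec.pop("db")
# model = hddm.HDDM(data, **spec)
# model.load_db(dbname=dbname, db=db)
```

Only the specification is stored here; the data and the trace database stay in their own files, so every value passed in has to be picklable.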

> In version 0.2 I was able to implement such a solution. This made it possible to separate the processes of sampling different chains and calculating convergence statistics. An additional benefit of this approach was that we could run more samples on the same chains/models if the chains had not converged.
> My first question is then: Do you see any potential problems with such a setup? Are there any other variables that could be critical when saving and loading models this way? (e.g. extracting and restoring the state of the sampler and step methods of the MCMC object)
>
> My next question is related to the new version. In the 0.3 release, some of these variables have been changed. For instance, “depends_on” is no longer a class variable. It is simply used for initializing “depends”, which is a defaultdict. Since “depends_on” is no longer available through the model object and a defaultdict is not picklable, my approach to saving and loading models is no longer as clean and straightforward as it was with the previous version. My question to the developers is then: Could I request that this be changed in a future update? (i.e. self.depends_on = depends_on). I realize that there are theoretical arguments for not making this variable a class variable (e.g. after “depends” has been initialized, accessing “depends_on” will not change anything), but from a practical point of view I do not think this will cause any major problems.

Right, this was changed, but there is no harm in saving depends_on separately. This is now done in the git develop branch.
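
As a side note on the pickling problem: a defaultdict whose default factory is a picklable callable (such as list) does pickle, so the failure most likely comes from the factory itself. The sketch below uses a stand-in mapping with a lambda factory, which is an assumption about the internal "depends" attribute, and shows that converting to a plain dict sidesteps the issue.

```python
import pickle
from collections import defaultdict

# Stand-in for the model's internal "depends" mapping; the lambda default
# factory is an assumption about why pickling failed.
depends = defaultdict(lambda: (), {"v": ("condition",)})

try:
    pickle.dumps(depends)
    lambda_factory_pickles = True
except (pickle.PicklingError, AttributeError, TypeError):
    # The lambda cannot be looked up by name, so pickling is refused.
    lambda_factory_pickles = False

# Converting to a plain dict drops the factory, so it pickles cleanly.
plain = dict(depends)
restored = pickle.loads(pickle.dumps(plain))
```

The restored object is an ordinary dict, so missing keys raise KeyError instead of returning a default.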

> Alternatively, does anyone have any suggestions for how saving and loading models could be done differently?
> Kabuki has a (new?) method init_from_existing_model(), but as far as I can tell, this will only be beneficial for within-script/processes. One would still need the pre_model object.

Yeah, I think that code is to spawn an identical model so this wouldn't help you here since you have to create a new one from a saved state.

> When attempting to adapt my save/load script to version 0.3, I am also having some other problems. When loading the database by using the load_db() method (kabuki/hierarchical.py), I get the error message: 'HDDM' object has no attribute 'param_container'. Where in the source code is param_container initialized? (Even when hacking my way past this, I get the same error for the method create_nodes()). I cannot see how this would be related to my saving and loading procedure, because I still get the same problems if I create a new model from scratch and try to load an existing db. Any suggestions will be greatly appreciated.

OK that's a bug. param_container is from an older version. Should also be fixed in the develop branch.

Thomas

Øystein Sandvik

Sep 15, 2012, 12:49:23 PM

to hddm-...@googlegroups.com

Great! We'll just work with the development package for now and
hopefully my save/load script will work when the next version is
released. I'll share the complete save/load script after I have
cleaned it up.
Øystein


Philippe Domenech

Oct 26, 2012, 5:40:07 AM

to hddm-...@googlegroups.com

Hi,

I used the following dirty trick to load my traces:

import pymc
# recreate model then
model.sample(1, burn=1)
model.mc.db = pymc.database.pickle.load('traces.db')

Is there any chance that this will cause problems when using the saved traces later on?

Philippe

Thomas Wiecki

Oct 28, 2012, 9:34:22 AM

to hddm-...@googlegroups.com

Hi Philippe,

There is also model.load_db():

"""Load samples from a database created by an earlier model
run (e.g. by calling .mcmc(dbname='test'))

:Arguments:
    dbname : str
        File name of database
    verbose : int <default=0>
        Verbosity level
    db : str <default='sqlite'>
        Which database backend to use, can be
        sqlite, pickle, hdf5, txt.
"""

which does the same thing you are doing. That's perfectly fine and
shouldn't give you any problems. The problem addressed in this thread
is whether you could save the model specification so that you wouldn't
have to remember the parameters you passed to it.

Thomas

Philippe Domenech

Oct 28, 2012, 10:49:05 AM

to hddm-...@googlegroups.com

Hi Thomas,

Thanks for your answer (and for making this very interesting toolbox available).
I am aware that the issue raised here relates to saving previously defined models, not traces.
I should have started a new topic. Sorry if I was unclear.

The issue I encountered with v0.3 is that the load_db() method raises the same error mentioned above: <no attribute 'param_container'>
Hence the need for the workaround.

I was also wondering whether it is still possible to run a proportional model (e.g. one where the decision threshold is proportional to some experimentally manipulated dimension), and how to proceed?

best,

Philippe

Thomas Wiecki

Oct 28, 2012, 11:08:06 AM

to hddm-...@googlegroups.com

On Sun, Oct 28, 2012 at 10:49 AM, Philippe Domenech <tele...@gmail.com> wrote:
> Hi Thomas,
>
> Thanks for your answer (and for making this very interesting toolbox
> available).
> I am aware that the issue raised here relates to saving previously defined
> models, not traces.
> I should have started a new topic. Sorry if I was unclear.
>
> The issue I encountered with v0.3 is that the load_db() method raises the
> same error mentioned above: <no attribute 'param_container'>
> Hence the need for the workaround.

Ah, that's a bug. Thanks for reporting it. This is fixed now in the
development branch (of kabuki) and in the upcoming version if you want
to wait.

> I was also wondering whether it is still possible to run a proportional
> model (e.g. one where the decision threshold is proportional to some
> experimentally manipulated dimension), and how to proceed?

Yes, that's a frequently requested feature. I recently fixed up the
regression model which is required for this and it should be in the
development branch very soon.

Thomas
