released. I'll share the complete save/load script after I have
> Hi Øystein,
> First of all, I think it's great you are working on this and willing to
> share your experiences/code. That's definitely a feature I've been wanting
> for a long time. See responses below.
> On Sat, Sep 15, 2012 at 6:49 AM, Øystein Sandvik
> <oystein.sand...@gmail.com> wrote:
>> Hi
>> I have a few questions related to the changes in the new version and to
>> saving and loading models.
>> Prior to the release of version 0.3 I wrote some code for saving and
>> loading model objects. The script I made saved models by extracting core
>> variables from the model object (e.g. data, include, is_group_model,
>> depends_on, bias, wiener_params, db, dbname, etc.). These variables were
>> saved to file by using the shelve module (
>> http://docs.python.org/library/shelve.html). Hence, the ability to save a
>> model this way depends on whether these variables are picklable. Loading
>> such models from file then simply recreated the models by initializing a
>> new model with the saved variables.
>> model = hddm.HDDM(data,
>>                   include=include,
>>                   is_group_model=is_group_model,
>>                   depends_on=depends_on,
>>                   bias=bias,
>>                   wiener_params=wiener_params)
>> model.load_db(dbname=m['dbname'], db=m['db'])
> I think that is in general the right strategy (saving model variables,
> recreating the object and loading db). I'm not sure what shelve gives you
> that the simpler pickle doesn't, though, so you might want to consider
> using pickle instead. Ideally I think we'll end up with some code that
> allows saving of arbitrary kabuki models, but it's fine to start with HDDM.
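
The strategy discussed here (save the constructor arguments, recreate the object, reload the database) could be sketched with plain pickle roughly as follows. This is only an illustration: the helper names save_model_spec, load_model_spec and rebuild_model and the layout of the spec dict are hypothetical, and the load_db(dbname=..., db=...) call is taken from the snippet quoted earlier in this message rather than checked against any particular kabuki release.

```python
import pickle


def save_model_spec(path, spec):
    """Pickle a plain dict holding constructor arguments and database info."""
    with open(path, "wb") as f:
        pickle.dump(spec, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model_spec(path):
    """Return the dict previously written by save_model_spec."""
    with open(path, "rb") as f:
        return pickle.load(f)


def rebuild_model(spec, model_class, data):
    """Recreate a model from a saved spec and reattach its trace database.

    Assumes model_class accepts the saved keyword arguments and exposes
    load_db(dbname=..., db=...), as in the snippet quoted in this thread.
    """
    model = model_class(data, **spec["init_kwargs"])
    model.load_db(dbname=spec["dbname"], db=spec["db"])
    return model
```

With HDDM, model_class would be hddm.HDDM and spec["init_kwargs"] would hold include, is_group_model, depends_on, bias and wiener_params. Everything in the spec has to be picklable for this to work.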
>> In version 0.2 I was able to implement such a solution. This made it
>> possible to separate the processes of sampling different chains and
>> calculating convergence statistics. An additional benefit of this
>> approach was that we could run more samples on the same chains/models
>> if the chains had not converged.
>> My first question is then: Do you see any potential problems with such a
>> setup? Are there any other variables that could be critical when saving
>> and loading models this way (e.g. extracting and restoring the state of
>> the sampler and step methods of the MCMC object)?
>> My next question is related to the new version. In the 0.3 release, some
>> of these variables have been changed. For instance, “depends_on” is no
>> longer a class variable. It is simply used for initializing “depends”,
>> which is a defaultdict. Since “depends_on” is no longer available through
>> the model object and this defaultdict is not picklable, my approach to
>> saving and loading models is no longer as clean and straightforward as it
>> was with the previous version. My question to the developers is then:
>> Could I request that this be changed in a future update (i.e.
>> self.depends_on = depends_on)? I realize that there are theoretical
>> arguments for not making this variable a class variable (e.g. after
>> “depends” has been initialized, accessing “depends_on” will not change
>> anything), but from a practical point of view I do not think this will
>> cause any major problems.
> Right, this was changed, but there is no harm in saving the depends_on
> separately. In the git develop branch this is now done.
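
For anyone on a release where only “depends” is available, one possible workaround is to copy the defaultdict into a plain dict before pickling and rebuild it afterwards. The sketch below assumes the pickling problem comes from the default factory being a lambda or similar non-importable callable. The actual factory used by kabuki is not confirmed here, and the helper names to_picklable and from_picklable are hypothetical.

```python
import pickle
from collections import defaultdict


def to_picklable(depends):
    """Copy a defaultdict into a plain dict, dropping its default factory."""
    return dict(depends)


def from_picklable(saved, default_factory):
    """Rebuild a defaultdict from a plain dict and a caller-supplied factory."""
    return defaultdict(default_factory, saved)


# Illustrative stand-in for the model's "depends" mapping (assumed shape).
depends = defaultdict(lambda: (), {"v": ("condition",)})

# Pickling fails here because the lambda factory cannot be looked up by name.
try:
    pickle.dumps(depends)
    factory_is_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    factory_is_picklable = False

# The plain-dict copy pickles fine; the factory is supplied again on load.
payload = pickle.dumps(to_picklable(depends))
restored = from_picklable(pickle.loads(payload), lambda: ())
```

A defaultdict whose factory is an importable callable such as list or tuple pickles without this step, so the conversion only matters when the factory itself is the obstacle.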
>> Alternatively, does anyone have any suggestions for how saving and
>> loading models could be done differently?
>> Kabuki has a (new?) method init_from_existing_model(), but as far as I
>> can tell, this will only be beneficial within a single script/process.
>> One would still need the pre_model object.
> Yeah, I think that code is to spawn an identical model so this wouldn't
> help you here since you have to create a new one from a saved state.
>> When attempting to adapt my save/load script to version 0.3, I am also
>> having some other problems. When loading the database by using the
>> load_db() method (kabuki/hierarchical.py), I get the error message:
>> 'HDDM' object has no attribute 'param_container'. Where in the source
>> code is param_container initialized? (Even when hacking my way past this,
>> I get the same error for the method create_nodes().) I cannot see that
>> this should be related to my saving and loading procedure, because I still
>> get the same problems if I create a new model from scratch and try to load
>> an existing db. Any suggestions will be greatly appreciated.
> OK that's a bug. param_container is from an older version. Should also be
> fixed in the develop branch.
> Thomas