Hi
I have a few questions related to the changes in the new version and to saving and loading models.
Prior to the release of version 0.3 I wrote some code for saving and loading model objects. The script I made saved models by extracting core variables from the model object (e.g. data, include, is_group_model, depends_on, bias, wiener_params, db, dbname, etc.). These variables were saved to file by using the shelve module (http://docs.python.org/library/shelve.html). Hence, the ability to save a model this way depends on whether these variables are picklable. Loading a model from file then simply recreated it by initializing a new model with the saved variables:
    model = hddm.HDDM(data,
                      include=include,
                      is_group_model=is_group_model,
                      depends_on=depends_on,
                      bias=bias,
                      wiener_params=wiener_params)
    model.load_db(dbname=m['dbname'], db=m['db'])
In version 0.2 I was able to implement such a solution. This made it possible to separate the processes of sampling different chains and calculating convergence statistics. An additional benefit of this approach was that we could run more samples on the same chains/models if the chains had not converged.
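To make the setup concrete, here is a minimal sketch of the shelve part only, without HDDM itself. The helper names save_model_spec and load_model_spec, the key "model_spec", and all values in the dictionary are hypothetical placeholders, not part of HDDM or kabuki; the only assumption is that every stored value is picklable.

```python
import os
import shelve
import tempfile


def save_model_spec(path, spec):
    """Store the (picklable) constructor arguments under a single shelve key."""
    with shelve.open(path) as shelf:
        shelf["model_spec"] = spec


def load_model_spec(path):
    """Read the constructor arguments back as a plain dict."""
    with shelve.open(path) as shelf:
        return dict(shelf["model_spec"])


# Placeholder values for illustration only; a real script would take these
# from the model object before saving.
spec = {
    "include": ["z"],
    "is_group_model": True,
    "depends_on": {"v": ["condition"]},
    "bias": True,
    "dbname": "traces.db",
    "db": "pickle",
}

path = os.path.join(tempfile.mkdtemp(), "model_spec")
save_model_spec(path, spec)
restored = load_model_spec(path)
```

The restored dictionary would then supply the keyword arguments for the new model and for load_db(), as in the snippet above.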
My first question is then: Do you see any potential problems with such a setup? Are there any other variables that could be critical when saving and loading models this way? (e.g. extracting and restoring the state of the sampler and step methods of the MCMC object)
My next question is related to the new version. In the 0.3 release, some of these variables have been changed. For instance, “depends_on” is no longer a class variable. It is simply used for initializing “depends”, which is a defaultdict. Since “depends_on” is no longer available through the model object and a defaultdict is not picklable, my approach to saving and loading models is no longer as clean and straightforward as it was with the previous version. My question to the developers is then: Could I request that this be changed in a future update (i.e. by setting self.depends_on = depends_on)? I realize that there are theoretical arguments for not making this variable a class variable (e.g. after “depends” has been initialized, changing “depends_on” will not have any effect), but from a practical point of view I do not think this will cause any major problems.
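As a possible workaround on the saving side, here is a small sketch of the pickling issue. In general a defaultdict pickles only if its default factory does, so a factory such as list works while a lambda does not. That the factory is the cause in HDDM 0.3 is an assumption on my part; the lambda factory and the contents of depends below are made up for illustration. Copying into a plain dict drops the factory and gives something that can be stored.

```python
import pickle
from collections import defaultdict


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Hypothetical stand-in for the model's "depends" attribute: a defaultdict
# whose default factory is a lambda, which pickle cannot serialize.
depends = defaultdict(lambda: (), {"v": ("condition",)})

# A plain dict copy keeps the stored items but not the default factory.
plain = dict(depends)
roundtrip = pickle.loads(pickle.dumps(plain))
```

The plain dict could then be saved in place of depends_on, provided the constructor accepts it in the same form as the original argument, which I have not verified for 0.3.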
Alternatively, does anyone have any suggestions for how saving and loading models could be done differently?
Kabuki has a (new?) method init_from_existing_model(), but as far as I can tell, this is only useful within a single script or process. One would still need the pre_model object.
When attempting to adapt my save/load script to version 0.3, I am also having some other problems. When loading the database by using the load_db() method (kabuki/hierarchical.py), I get the error message: 'HDDM' object has no attribute 'param_container'. Where in the source code is param_container initialized? (Even when hacking my way past this, I get the same error for the method create_nodes().) I do not see how this could be related to my saving and loading procedure, because I still get the same problems if I create a new model from scratch and try to load an existing db. Any suggestions will be greatly appreciated.
Thanks,
Øystein