Bug in txt database?


Jeremy Sanders

Feb 27, 2009, 5:23:45 AM
to PyMC
Hi - I tried to load back my chain and it complained about the format
of the text:

  File "/home/jss/lib64/python2.5/pymc/database/txt.py", line 154, in load
    db._state_ = eval(file.read())
  File "<string>", line 19
0.02595237, 0.02296866, 0.02045975]), 'concentration':
19.336739013585699, 'logPouter':-10.823919082349803}, 'step_methods':
{'Metropolis_r200': {'adaptive_scale_factor': 1.0,
'proposal_distribution': 'Normal', 'accepted': 128.0, 'rejected':
872.0, 'proposal_sd': 1.1210113815992098}, 'Metropolis_concentration':
{'adaptive_scale_factor': 1.0, 'proposal_distribution':'Normal',
'accepted': 354.0, 'rejected': 646.0, 'proposal_sd':
5.9827190743764662}, 'Metropolis_logPouter': {'adaptive_scale_factor':
1.0, 'proposal_distribution': 'Normal', 'accepted': 24.0, 'rejected':
976.0, 'proposal_sd': 10.889470505486667}, 'AdaptiveMetropolis_sb':
{'adaptive_scale_factor': 1.0, 'C': array([[ 0.03189851, 0. ,
0. , ..., 0. ,

^
SyntaxError: invalid syntax


It looks to me that the data are being stored using repr on numpy
arrays and loaded back with eval. Unfortunately it looks like numpy
doesn't store a full string representation of the array with repr, but
a truncated one. Surely this is a numpy bug?

In [35]: repr(numpy.arange(10000))
Out[35]: 'array([ 0, 1, 2, ..., 9997, 9998, 9999])'

Jeremy Sanders

Feb 27, 2009, 5:31:12 AM
to PyMC
Actually, reading the documentation suggests that you have to use
numpy.set_printoptions(threshold=xxx) to adjust where numpy does this.
Pymc should probably increase this so that arrays are never truncated
when written to the file (it would probably have to save and restore
the values).
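
A minimal sketch of the effect described here, not PyMC's actual fix. It assumes NumPy 1.15 or later for the `np.printoptions` context manager, and it uses `eval` only to mirror what txt.py does. Note that on Python 3 the `...` in a summarized repr parses as `Ellipsis`, so the failure may not show up as the SyntaxError seen above, but the data are lost either way.

```python
import sys
import numpy as np
from numpy import array  # eval() of a repr string looks up the name `array`

a = np.arange(10000)

# Default settings: arrays above the threshold (1000 elements by default)
# are summarized, so the string no longer contains all the data.
summarized = repr(a)

# Temporarily lift the threshold; np.printoptions restores the previous
# settings when the block exits.
with np.printoptions(threshold=sys.maxsize):
    full = repr(a)

restored = eval(full)  # round trip only works on the untruncated string

print("..." in summarized)            # summarized repr has lost elements
print("..." in full)                  # full repr has not
print(np.array_equal(restored, a))    # data survive the round trip
```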

I find it irritating numpy does this truncation with repr however.

David Huard

Feb 27, 2009, 9:20:09 AM
to py...@googlegroups.com
Hi Jeremy,

Thanks for the report and the solution!

Could you try it now?

Thanks,

David

Andrew Straw

Feb 27, 2009, 7:22:53 PM
to py...@googlegroups.com
David,

It may be best to do this in a state-preserving construct such as:

    oldstate = np.get_printoptions()
    np.set_printoptions(threshold=1e6)
    try:
        blah_blah_blah()
    finally:
        np.set_printoptions(**oldstate)

That way other people's set_printoptions don't get clobbered mysteriously...
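
For reuse, the same save/restore pattern could be wrapped in a context manager. This is only a sketch: `full_array_repr` is a hypothetical helper name, not something in PyMC or NumPy, and `sys.maxsize` is used as the threshold on the assumption that arrays should never be summarized. The demo checks that a caller's own setting survives even when the body raises.

```python
import contextlib
import sys
import numpy as np


@contextlib.contextmanager
def full_array_repr():
    """Hypothetical helper: print arrays in full inside the block only."""
    saved = np.get_printoptions()               # caller's current settings
    np.set_printoptions(threshold=sys.maxsize)  # never summarize with "..."
    try:
        yield
    finally:
        np.set_printoptions(**saved)            # restored even on error


original = np.get_printoptions()
np.set_printoptions(threshold=5)                # pretend the user chose this
before = np.get_printoptions()

try:
    with full_array_repr():
        inside = repr(np.arange(10))            # printed in full here
        raise RuntimeError("simulated failure while writing the file")
except RuntimeError:
    pass

after = np.get_printoptions()                   # user's choice is intact
outside = repr(np.arange(10))                   # summarized again
np.set_printoptions(**original)                 # clean up after the demo
```

Newer NumPy versions (1.15 and later) ship `np.printoptions`, which does the same job without a custom helper.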
--
Andrew D. Straw, Ph.D.
California Institute of Technology
http://www.its.caltech.edu/~astraw/

David Huard

Mar 5, 2009, 9:22:15 AM
to py...@googlegroups.com
Good idea. Will do.


David





David Huard

Mar 5, 2009, 10:27:02 AM
to py...@googlegroups.com
This is fixed. I just want to point out that I consider the txt database a "toy" backend. It's tested, as are the other backends, but it was designed mostly to give newcomers a sense of what pymc can do. As far as I know, none of the developers use it in their daily work. For real applications, I'd encourage users to use one of the other backends, preferably hdf5.

If someone really wishes to use the txt backend in production mode, then I suggest they spend a couple of hours refactoring the brittle parts, such as the savestate and loadstate methods. In particular, the loadstate method uses eval... not a good thing.
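
As one possible direction for such a refactoring (a sketch only, not how PyMC implements savestate or loadstate), the state dictionary could be written as JSON instead of repr text, which avoids both eval and the print-option dependence. The function names `dump_state` and `load_state`, the `"__ndarray__"` tag, and the example values are all made up for illustration; only the key names are taken from the traceback earlier in the thread.

```python
import io
import json
import numpy as np


def dump_state(state, fp):
    """Write a nested dict of floats and NumPy arrays as JSON."""
    def encode(obj):
        if isinstance(obj, np.ndarray):
            # Tag arrays so they can be rebuilt with the same dtype.
            return {"__ndarray__": obj.tolist(), "dtype": str(obj.dtype)}
        if isinstance(obj, np.generic):
            return obj.item()  # NumPy scalar -> plain Python scalar
        raise TypeError("cannot serialize %r" % type(obj))
    json.dump(state, fp, default=encode)


def load_state(fp):
    """Read the JSON back, turning tagged dicts into arrays again."""
    def decode(d):
        if "__ndarray__" in d:
            return np.array(d["__ndarray__"], dtype=d["dtype"])
        return d
    return json.load(fp, object_hook=decode)


# Illustrative state; the numbers are placeholders, not real sampler output.
state = {
    "step_methods": {
        "AdaptiveMetropolis_sb": {
            "adaptive_scale_factor": 1.0,
            "C": np.eye(3) * 0.03,
        }
    }
}

buf = io.StringIO()       # stands in for the state file on disk
dump_state(state, buf)
buf.seek(0)
loaded = load_state(buf)
```

Nothing is executed on load, so a damaged or malicious state file can at worst fail to parse.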

Hope this clarifies things,

David



