Saving Parameters in a text file.


Andrew Nelson

Jan 11, 2015, 8:26:47 PM
to lmfi...@googlegroups.com
Hi all,
I would like some way of saving the Parameters dictionary.  At the moment Parameters is picklable, but a pickle file is neither human-readable nor easily editable.

One could use jsonpickle to get an ASCII representation to save, but that output would still be awkward to edit by hand.

What are people's opinions on adding save and load methods to the Parameters class?  I'd prefer to see a flat text file format, something like:

[a]
value = 2
vary = True
min = -1
max = 2
expr = ''
stderr = 1
correl  = {'b': 0.5, 'c': 0.2}

We could use configparser to read/save the files in a really simple fashion.
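
For illustration, the INI layout above could be written and read back roughly as follows. This is only a minimal sketch: the plain dict `params` is a hypothetical stand-in for a Parameters object, the helper names `save_ini` and `load_ini` are made up, and only a few of the fields are handled.

```python
import configparser
import io

# Hypothetical stand-in for a Parameters object: name -> attribute dict.
params = {
    "a": {"value": 2.0, "vary": True, "min": -1.0, "max": 2.0, "expr": ""},
    "b": {"value": 0.5, "vary": False, "min": 0.0, "max": 1.0, "expr": ""},
}


def save_ini(params, fp):
    """Write one INI section per parameter to the file-like object fp."""
    # interpolation=None so that a '%' in an expression is stored literally
    cfg = configparser.ConfigParser(interpolation=None)
    for name, attrs in params.items():
        # configparser stores every value as a string
        cfg[name] = {key: str(val) for key, val in attrs.items()}
    cfg.write(fp)


def load_ini(fp):
    """Read the sections back, converting each field to its Python type."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_file(fp)
    out = {}
    for name in cfg.sections():
        sec = cfg[name]
        out[name] = {
            "value": sec.getfloat("value"),
            "vary": sec.getboolean("vary"),
            "min": sec.getfloat("min"),
            "max": sec.getfloat("max"),
            "expr": sec.get("expr"),
        }
    return out


buf = io.StringIO()
save_ini(params, buf)
buf.seek(0)
restored = load_ini(buf)
```

Because configparser keeps everything as text, the loader has to know the type of each field; that is the main cost of the INI route compared with JSON.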

Andy

Matt Newville

Jan 11, 2015, 10:11:20 PM
to Andrew Nelson, lmfit-py
I think saving / restoring Parameters to an ASCII representation is a great idea.  I'm +1 on adding "save()" and "restore()" (or "load()"?) methods to Parameters, using either INI or JSON.   My first inclination would probably be to just dump to JSON, because it seems easier than INI, and I would imagine the use case is more about save()/restore() actions than about having a human actually edit the text file.   But I'd certainly have no objection to the more easily edited INI format.

--Matt

Andrew Nelson

Jan 12, 2015, 12:10:20 AM
to Matt Newville, lmfit-py
How about the following?  The drawback is that no metadata is saved with the file.

===============================================
import json
from lmfit import Parameters, Parameter
try:
    from collections import OrderedDict
except ImportError:
    from ordereddict import OrderedDict

def save(params, f):
    p = OrderedDict()
    for k, v in params.items():
        p[k] = {'name': v.name, 'value': v.value, 'min': v.min,
                'max': v.max, 'vary': v.vary, 'expr': v.expr,
                'stderr': v.stderr, 'correl': v.correl}
    s = json.dumps(p)
    f.write(s)

def restore(f):
    s = json.load(f)
    p = Parameters()
    for k, v in s.items():
        p.add(str(k), value=v['value'], vary=v['vary'], min=v['min'],
              max=v['max'], expr=v['expr'])

        p[k].correl = v['correl']
        p[k].stderr = v['stderr']

    return p

if __name__ == '__main__':
    p = Parameters()
    p.add_many(('a', 1, True, 0, 2, None))
    print(p)
    # json.dumps() returns text, so open the file in text mode ('w', not 'wb')
    with open('test.json', 'w') as f:
        save(p, f)

    with open('test.json', 'r') as f:
        p = restore(f)
        print(p)

===============================================
--
_____________________________________
Dr. Andrew Nelson


_____________________________________

Matt Newville

Jan 12, 2015, 12:47:43 AM
to Andrew Nelson, lmfit-py
Hi Andrew,

If you use Parameter.__getstate__() and .__setstate__() (which are there so pickle can work), it might even be easier:

    # save
    >>> import json
    >>> from lmfit import Parameters
    >>> pars = Parameters()
    >>> pars.add('x', value=3, min=0)
    >>> pars.add('y', value=9, min=2)
    >>> pars.add('z', expr='x*sqrt(y)')
    >>> out = json.dumps([val.__getstate__() for val in pars.values()])
    >>> print(out)
    [["x", 3, true, null, 0, Infinity, null, null, 3], ["y", 9, true, null, 2, Infinity, null, null, 9],
    ["z", null, true, "x*sqrt(y)", -Infinity, Infinity, null, null, null]]

where the 'state' for each Parameter is (self.name, self.value, self.vary, self.expr, self.min, self.max, self.stderr, self.correl, self.init_value)
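
To make the round trip concrete without needing lmfit installed, here is a minimal sketch. The class `Param` is a hypothetical, cut-down stand-in for Parameter that uses the same nine-element state tuple described above; it is not the library's implementation.

```python
import json


class Param:
    """Hypothetical stand-in for lmfit's Parameter (tuple-valued state)."""

    def __init__(self, name=None, value=None, vary=True, expr=None,
                 min=float("-inf"), max=float("inf")):
        self.name, self.value, self.vary, self.expr = name, value, vary, expr
        self.min, self.max = min, max
        self.stderr, self.correl, self.init_value = None, None, value

    def __getstate__(self):
        return (self.name, self.value, self.vary, self.expr, self.min,
                self.max, self.stderr, self.correl, self.init_value)

    def __setstate__(self, state):
        (self.name, self.value, self.vary, self.expr, self.min,
         self.max, self.stderr, self.correl, self.init_value) = state


pars = [Param("x", value=3, min=0), Param("y", value=9, min=2)]
text = json.dumps([p.__getstate__() for p in pars])

restored = []
for state in json.loads(text):  # each state comes back as a list, not a tuple
    p = Param()
    p.__setstate__(state)
    restored.append(p)
```

Note that JSON has no tuple type, so `__setstate__` must accept a list as well as a tuple; sequence unpacking as used here handles both.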
 
I would suggest having save() and restore() be methods of the Parameters class.
Should the "save()" method take a file handle or file name, or even just return the JSON string and let the user save it however they like?   I might opt for the last option (because I might want to dump those to HDF5 or an SQLite database rather than to a text file), or perhaps take an *optional* filename *and* return the string too.

Likewise, should restore() take a file handle, file name, or string?   Not sure.

--Matt

Andrew Nelson

Jan 12, 2015, 1:10:09 AM
to Matt Newville, lmfit-py
> If you use Parameter.__getstate__() and .__setstate__() (which are there so pickle can work), it might even be easier:
>
> where the 'state' for each Parameter is (self.name, self.value, self.vary, self.expr, self.min, self.max, self.stderr, self.correl, self.init_value)

Is there a reason why __getstate__ and __setstate__ return tuples and not dict?

def __getstate__(self):
    """get state for pickle"""
    return {'name': self.name, 'value': self.value, 'vary': self.vary,
            'expr': self.expr, 'min': self.min, 'max': self.max,
            'stderr': self.stderr, 'correl': self.correl,
            'init_value': self.init_value}
 
> I would suggest having save() and restore() be methods of the Parameters class.

For restore to be a method, it would have to pop all the existing parameters first:

for name in list(pars.keys()):
    pars.pop(name)

before adding the saved ones back in again.

 
> Should the "save()" method take a file handle or file name, or even just return the JSON string and let the user save it however they like?   I might opt for the last option (because I might want to dump those to HDF5 or an SQLite database rather than to a text file), or perhaps take an *optional* filename *and* return the string too.
>
> Likewise, should restore() take a file handle, file name, or string?   Not sure.

My preference is for save and restore to take a file handle.  Then calling code can deal with all the problems of setting a file up. The file handle doesn't have to be a file; it could be any file-like object, e.g. a StringIO.  save could also return the string.
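
A minimal sketch of that file-handle style, using only the standard library. The function names `save` and `restore` follow the thread, but the `state` list is a hypothetical stand-in for the per-parameter data, not a Parameters object.

```python
import io
import json


def save(state, fp):
    """Write state as JSON to any object with a write() method; return the string."""
    text = json.dumps(state)
    fp.write(text)
    return text


def restore(fp):
    """Read JSON back from any object with a read() method."""
    return json.load(fp)


# Stand-in for one parameter's saved state.
state = [["x", 3, True, None, 0, 10, None, None, 3]]

buf = io.StringIO()      # an in-memory file-like object, no disk involved
text = save(state, buf)
buf.seek(0)
loaded = restore(buf)
```

The same two functions work unchanged with a real file opened in text mode, which is the appeal of leaving the file setup to the caller.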

Matt Newville

Jan 12, 2015, 1:06:17 PM
to Andrew Nelson, lmfit-py
On Mon, Jan 12, 2015 at 12:10 AM, Andrew Nelson <andy...@gmail.com> wrote:
 
>> If you use Parameter.__getstate__() and .__setstate__() (which are there so pickle can work), it might even be easier:
>>
>> where the 'state' for each Parameter is (self.name, self.value, self.vary, self.expr, self.min, self.max, self.stderr, self.correl, self.init_value)
>
> Is there a reason why __getstate__ and __setstate__ return tuples and not dict?

So order is easier to preserve?  To be clear, you can do

    state = one_param.__getstate__() 
    this_param = Parameter()
    this_param.__setstate__(state)

so the contents of 'state' is not as crucial as being internally consistent.   I agree that for human editing, the json dump of this Parameter state is worse than an INI file section.   Again, it's a question of what the intention is.


> def __getstate__(self):
>     """get state for pickle"""
>     return {'name': self.name, 'value': self.value, 'vary': self.vary,
>             'expr': self.expr, 'min': self.min, 'max': self.max,
>             'stderr': self.stderr, 'correl': self.correl,
>             'init_value': self.init_value}

>> I would suggest having save() and restore() be methods of the Parameters class.
>
> For restore to be a method, it would have to pop all the existing parameters first:
>
> for name in list(pars.keys()):
>     pars.pop(name)
>
> before adding the saved ones back in again.

Yeah, I guess that a restore() method is a little trickier than the save() method, as the saved state may not have the same set of parameters as the current parameter list.  So, it's not completely clear whether "restore" means "make the union of" or "completely replace, deleting if necessary".   I think either is fine, as long as it is clearly documented.  
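
The two possible meanings of "restore" can be shown with plain ordered dictionaries. This is only a sketch: `current` and `saved` are hypothetical stand-ins for a live Parameters object and a saved state, holding bare values instead of Parameter objects.

```python
from collections import OrderedDict

current = OrderedDict([("a", 1.0), ("b", 2.0)])   # parameters in memory
saved = OrderedDict([("b", 5.0), ("c", 7.0)])     # parameters in the saved state

# "Make the union of": 'a' survives, 'b' is overwritten, 'c' is added.
union = OrderedDict(current)
union.update(saved)

# "Completely replace": anything not in the saved state is deleted first.
replaced = OrderedDict(current)
replaced.clear()
replaced.update(saved)
```

Only the second variant reproduces exactly what was saved; the first can leave stale parameters behind.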
 
>> Should the "save()" method take a file handle or file name, or even just return the JSON string and let the user save it however they like?   I might opt for the last option (because I might want to dump those to HDF5 or an SQLite database rather than to a text file), or perhaps take an *optional* filename *and* return the string too.
>>
>> Likewise, should restore() take a file handle, file name, or string?   Not sure.
>
> My preference is for save and restore to take a file handle.  Then calling code can deal with all the problems of setting a file up. The file handle doesn't have to be a file; it could be any file-like object, e.g. a StringIO.  save could also return the string.

Sure.  The alternative would be to just deal with strings and have the user handle I/O however they choose (including SQL, HDF5, or sending over a wire).  It's a small amount of data, so I would probably have used strings, but I do not have a strong preference.  I guess the main question is: what is the likely usage for save()/restore()?  If it's a GUI saving state or a long-running app saving multiple "parameter sets", maybe 1 string per fit result is better than 1 file per fit result.

--Matt

Andrew Nelson

Jan 12, 2015, 6:21:17 PM
to Matt Newville, lmfit-py

>     state = one_param.__getstate__()
>     this_param = Parameter()
>     this_param.__setstate__(state)
>
> so the contents of 'state' is not as crucial as being internally consistent.

The consistency is important.  The only reason I commented on that is because it's more usual to return a dict from __getstate__.  Dicts are future-proof, in that they don't mind if you change the ordering or add new values.
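
A small sketch of why a dict-valued state tolerates change. The class `Param` and its `units` attribute are hypothetical, invented purely for illustration; they are not part of lmfit.

```python
class Param:
    """Hypothetical stand-in for a parameter class with dict-valued state."""

    def __init__(self, name=None, value=None, vary=True):
        self.name = name
        self.value = value
        self.vary = vary
        self.units = None   # imagined attribute added in a later release

    def __getstate__(self):
        return {"name": self.name, "value": self.value,
                "vary": self.vary, "units": self.units}

    def __setstate__(self, state):
        # Keys are looked up by name, so ordering is irrelevant, and keys
        # missing from an older save fall back to a default.
        self.name = state["name"]
        self.value = state.get("value")
        self.vary = state.get("vary", True)
        self.units = state.get("units")


# A state written before 'vary' and 'units' existed, in a different key order.
old_state = {"value": 3, "name": "x"}
p = Param()
p.__setstate__(old_state)
```

With a tuple, the same old state would fail to unpack (or silently land in the wrong attributes) as soon as a field was added or reordered.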
 
> I agree that for human editing, the json dump of this Parameter state is worse than an INI file section.   Again, it's a question of what the intention is.

What I'd like to do in my program is save Parameters for a fit, for future reference.  For example, you might be writing a paper and want to retrieve the values.  Pickling is fine for storage, but you have to start Python or your program up to get to the values.  A large proportion of my intended user base would have problems doing that.  A text file is reasonably easy to access; one just has to have Notepad (cringe)/vi/etc.  I want to be able to start an editor and easily figure out what the fitted value was for a given parameter.

Using json has the advantage that it's short and easy to construct the information string.  However, it's not very readable.  With your json example you got a list of lists:

    [["x", 3, true, null, 0, Infinity, null, null, 3], ["y", 9, true, null, 2, Infinity, null, null, 9], 
    ["z", null, true, "x*sqrt(y)", -Infinity, Infinity, null, null, null]]

The second step of a 'save' could be to flatten it, i.e. write:

"x", 3, true, null, 0, Infinity, null, null, 3
"y", 9, true, null, 2, Infinity, null, null, 9 
"z", null, true, "x*sqrt(y)", -Infinity, Infinity, null, null, null

On reading back in, reconstitute to the list of lists so that you can reconstitute to the state required for each parameter.  It is a little more code, but I think it's worth it because it is more readable to write each parameter to a separate line, and we don't need the extra square brackets.
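
One way the flattening could be done, as a rough sketch: it relies on the fact that json.dumps() of a list always starts with "[" and ends with "]", so the brackets can be stripped on writing and put back on reading. The helper names `dumps_lines` and `loads_lines` are made up for this example.

```python
import json

states = [
    ["x", 3, True, None, 0, float("inf"), None, None, 3],
    ["z", None, True, "x*sqrt(y)", float("-inf"), float("inf"), None, None, None],
]


def dumps_lines(states):
    """One parameter per line, without the enclosing square brackets."""
    return "\n".join(json.dumps(state)[1:-1] for state in states)


def loads_lines(text):
    """Put the brackets back on each non-empty line and parse it as JSON."""
    return [json.loads("[" + line + "]")
            for line in text.splitlines() if line.strip()]


flat = dumps_lines(states)
lines = flat.splitlines()
round_trip = loads_lines(flat)
```

Newlines inside string values are escaped by json.dumps(), so each parameter still occupies exactly one line.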

> Yeah, I guess that a restore() method is a little trickier than the save() method, as the saved state may not have the same set of parameters as the current parameter list.  So, it's not completely clear whether "restore" means "make the union of" or "completely replace, deleting if necessary".   I think either is fine, as long as it is clearly documented.

My overwhelming choice here is to completely replace, deleting if necessary.  If you have the union of something you're not reconstructing what you originally had. This is why in my original code I wrote a restore function returning a Parameters instance rather than a method.

  
>>> Should the "save()" method take a file handle or file name, or even just return the JSON string and let the user save it however they like?   I might opt for the last option (because I might want to dump those to HDF5 or an SQLite database rather than to a text file), or perhaps take an *optional* filename *and* return the string too.
>>>
>>> Likewise, should restore() take a file handle, file name, or string?   Not sure.
>>
>> My preference is for save and restore to take a file handle.  Then calling code can deal with all the problems of setting a file up. The file handle doesn't have to be a file; it could be any file-like object, e.g. a StringIO.  save could also return the string.
>
> Sure.  The alternative would be to just deal with strings and have the user handle I/O however they choose (including SQL, HDF5, or sending over a wire).  It's a small amount of data, so I would probably have used strings, but I do not have a strong preference.  I guess the main question is: what is the likely usage for save()/restore()?  If it's a GUI saving state or a long-running app saving multiple "parameter sets", maybe 1 string per fit result is better than 1 file per fit result.

The json and pickle modules have dump and dumps.  The first writes to a file, the second writes to a string.  Perhaps we could have save/saves, or just use dump/dumps as the names, the first writing to file, the second writing to string.
When my GUI saves state it is pickling; when my user saves a parameter set I want to save text, for the reasons outlined above.

A.

Matt Newville

Jan 13, 2015, 1:13:24 PM
to Andrew Nelson, lmfit-py
On Mon, Jan 12, 2015 at 5:21 PM, Andrew Nelson <andy...@gmail.com> wrote:

>>     state = one_param.__getstate__()
>>     this_param = Parameter()
>>     this_param.__setstate__(state)
>>
>> so the contents of 'state' is not as crucial as being internally consistent.
>
> The consistency is important.  The only reason I commented on that is because it's more usual to return a dict from __getstate__.  Dicts are future-proof, in that they don't mind if you change the ordering or add new values.


Dunno.  I offer no excuse for using a tuple vs dict except that the order is explicit.
 
>> I agree that for human editing, the json dump of this Parameter state is worse than an INI file section.   Again, it's a question of what the intention is.
>
> What I'd like to do in my program is save Parameters for a fit, for future reference.  For example, you might be writing a paper and want to retrieve the values.  Pickling is fine for storage, but you have to start Python or your program up to get to the values.  A large proportion of my intended user base would have problems doing that.  A text file is reasonably easy to access; one just has to have Notepad (cringe)/vi/etc.  I want to be able to start an editor and easily figure out what the fitted value was for a given parameter.


There seem to be two separate topics:  1) a readable report for humans, and 2) a method to save and restore Parameter settings.  There is a way to get a readable report already --  why not just have your application write out a fit report for every fit?    Anyway, your **application** should manage that for the users, and if the users need to launch the data analysis application to inspect the values, that seems OK to me -- I have several such apps that I've written, and that others have written, that I use regularly.


> Using json has the advantage that it's short and easy to construct the information string.  However, it's not very readable.  With your json example you got a list of lists:
>
>     [["x", 3, true, null, 0, Infinity, null, null, 3], ["y", 9, true, null, 2, Infinity, null, null, 9],
>     ["z", null, true, "x*sqrt(y)", -Infinity, Infinity, null, null, null]]
 
With JSON you get a string, not a list of lists.   Readability is not critical -- it's like an ASCII version of pickle that allows transport across machines and languages.
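
One caveat for transport across languages: the Infinity and -Infinity tokens in the output above are a Python extension and are not part of the JSON specification, so a strict parser elsewhere may reject them. The sketch below shows Python's own strict mode refusing them, plus one possible workaround; the helper `to_portable` and its "inf"/"-inf" string convention are assumptions made for this example, not anything lmfit does.

```python
import json
import math

state = ["x", 3, True, None, 0, float("inf"), None, None, 3]

lenient = json.dumps(state)   # Python writes the bare token Infinity

try:
    json.dumps(state, allow_nan=False)   # strict mode refuses infinities
    strict_ok = True
except ValueError:
    strict_ok = False


def to_portable(item):
    """Hypothetical convention: spell infinite bounds as "inf" / "-inf" strings."""
    if isinstance(item, float) and math.isinf(item):
        return "inf" if item > 0 else "-inf"
    return item


portable = json.dumps([to_portable(item) for item in state], allow_nan=False)
```

A reader of the portable form would have to convert those strings back for the min and max fields, for example with float("inf").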

 
> The second step of a 'save' could be to flatten it, i.e. write:
>
> "x", 3, true, null, 0, Infinity, null, null, 3
> "y", 9, true, null, 2, Infinity, null, null, 9
> "z", null, true, "x*sqrt(y)", -Infinity, Infinity, null, null, null
>
> On reading back in, reconstitute to the list of lists so that you can reconstitute to the state required for each parameter.  It is a little more code, but I think it's worth it because it is more readable to write each parameter to a separate line, and we don't need the extra square brackets.

This all seems like more work than we really need to or want to have in the Parameters class.
 
>> Yeah, I guess that a restore() method is a little trickier than the save() method, as the saved state may not have the same set of parameters as the current parameter list.  So, it's not completely clear whether "restore" means "make the union of" or "completely replace, deleting if necessary".   I think either is fine, as long as it is clearly documented.
>
> My overwhelming choice here is to completely replace, deleting if necessary.  If you have the union of something you're not reconstructing what you originally had. This is why in my original code I wrote a restore function returning a Parameters instance rather than a method.

I agree that a complete replace (as if making a new Parameters instance) is better.
 
>>>> Should the "save()" method take a file handle or file name, or even just return the JSON string and let the user save it however they like?   I might opt for the last option (because I might want to dump those to HDF5 or an SQLite database rather than to a text file), or perhaps take an *optional* filename *and* return the string too.
>>>>
>>>> Likewise, should restore() take a file handle, file name, or string?   Not sure.
>>>
>>> My preference is for save and restore to take a file handle.  Then calling code can deal with all the problems of setting a file up. The file handle doesn't have to be a file; it could be any file-like object, e.g. a StringIO.  save could also return the string.
>>
>> Sure.  The alternative would be to just deal with strings and have the user handle I/O however they choose (including SQL, HDF5, or sending over a wire).  It's a small amount of data, so I would probably have used strings, but I do not have a strong preference.  I guess the main question is: what is the likely usage for save()/restore()?  If it's a GUI saving state or a long-running app saving multiple "parameter sets", maybe 1 string per fit result is better than 1 file per fit result.
>
> The json and pickle modules have dump and dumps.  The first writes to a file, the second writes to a string.  Perhaps we could have save/saves, or just use dump/dumps as the names, the first writing to file, the second writing to string.
> When my GUI saves state it is pickling; when my user saves a parameter set I want to save text, for the reasons outlined above.


Having dump()/dumps()/load()/loads() would be OK with me.  But making that file something that users are encouraged to edit seems unnecessary.  I'd suggest something simple like (warning -- untested!):

import json

class Parameters(OrderedDict):

    ....
    def dumps(self, **kws):
        # serialize the state of every Parameter to one JSON string
        out = [p.__getstate__() for p in self.values()]
        return json.dumps(out, **kws)

    def loads(self, s, **kws):
        # completely replace the current contents with the saved ones
        self.clear()
        for parstate in json.loads(s, **kws):
            _par = Parameter()
            _par.__setstate__(parstate)
            self.__setitem__(parstate[0], _par)

    def dump(self, fp, **kws):
        return fp.write(self.dumps(**kws))

    def load(self, fp, **kws):
        return self.loads(fp.read(), **kws)
                               
As it is (and even without docstrings), that's a substantial increase in the size of the Parameters class.  I'd be reluctant to make that class mostly an I/O formatting class.

I think this makes what you want for your app easier, though it may not do everything you want your app to do.
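
For anyone wanting to try the dump()/dumps()/load()/loads() idea outside lmfit, here is a self-contained sketch. `Param` and `Params` are hypothetical, reduced stand-ins for Parameter and Parameters (only a name and a value are stored), so this shows the shape of the approach rather than the library's code.

```python
import io
import json
from collections import OrderedDict


class Param:
    """Hypothetical, reduced stand-in for lmfit's Parameter."""

    def __init__(self, name=None, value=None):
        self.name, self.value = name, value

    def __getstate__(self):
        return (self.name, self.value)

    def __setstate__(self, state):
        self.name, self.value = state


class Params(OrderedDict):
    """Hypothetical stand-in for Parameters with the four I/O methods."""

    def dumps(self, **kws):
        return json.dumps([p.__getstate__() for p in self.values()], **kws)

    def loads(self, s, **kws):
        self.clear()                      # complete replace, not a union
        for state in json.loads(s, **kws):
            par = Param()
            par.__setstate__(state)
            self[state[0]] = par          # first element of the state is the name
        return self

    def dump(self, fp, **kws):
        return fp.write(self.dumps(**kws))

    def load(self, fp, **kws):
        return self.loads(fp.read(), **kws)


pars = Params()
pars["x"] = Param("x", 3)
pars["y"] = Param("y", 9)

buf = io.StringIO()
pars.dump(buf)
buf.seek(0)

fresh = Params()
fresh["stale"] = Param("stale", 0)   # removed by load(), since it replaces everything
fresh.load(buf)
```

Unlike the untested sketch above, this version has loads() return the instance, which is a small design choice that lets the call be chained.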

--Matt