Serializing sqlalchemy declarative instances with yaml


Peter Waller

Oct 23, 2014, 7:43:02 AM
to sqlal...@googlegroups.com, David Jones
We would like to freeze the results of a query against our database in a YAML file, so that we can use the results in an app which isn't connected to the database.

It makes sense here to reuse the model classes. Here's an example:

import sqlalchemy as S
from sqlalchemy.ext.declarative import declarative_base

class Foo(declarative_base()):
    __tablename__ = "foo"
    id = S.Column(S.Integer, primary_key=True)

Unfortunately, `yaml.dump(Foo())` gives a surprising result:

/usr/lib/python3/dist-packages/yaml/representer.py in represent_object(self, data)
    311             reduce = copyreg.dispatch_table[cls](data)
    312         elif hasattr(data, '__reduce_ex__'):
--> 313             reduce = data.__reduce_ex__(2)
    314         elif hasattr(data, '__reduce__'):
    315             reduce = data.__reduce__()

/usr/lib/python3.4/copyreg.py in _reduce_ex(self, proto)
     63     else:
     64         if base is self.__class__:
---> 65             raise TypeError("can't pickle %s objects" % base.__name__)
     66         state = base(self)
     67     args = (self.__class__, base, state)

TypeError: can't pickle int objects

It seems that what is happening is that `data` is equal to `Foo`, and `Foo.__reduce_ex__(2)` gives `TypeError: can't pickle int objects`. As does `declarative_base().__reduce_ex__(2)`.

I note that `pickle.dumps` works. But we'd rather use YAML.
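The pickle round trip Peter mentions can be sketched as follows; this is a self-contained toy, with `declarative_base` imported from `sqlalchemy.orm` as in recent releases rather than the 2014-era `sqlalchemy.ext.declarative` location:

```python
import pickle

import sqlalchemy as S
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    id = S.Column(S.Integer, primary_key=True)

# A transient declarative instance round-trips through pickle,
# _sa_instance_state and all, without complaint.
restored = pickle.loads(pickle.dumps(Foo(id=7)))
print(restored.id)
```

This works because SQLAlchemy's instance state implements pickling support; the same object handed to `yaml.dump` is what blows up above.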

Where is the bug? Is it in sqlalchemy, yaml, or python?

Thanks,

- Peter

Michael Bayer

Oct 23, 2014, 8:55:06 AM
to sqlal...@googlegroups.com, David Jones
no clue. ints shouldn’t have an issue with pickle, obviously! for YAML it would be much more appropriate to build custom per-class serialization in any case, since you don’t want all the persistence junk like `_sa_instance_state` carried along.
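A minimal sketch of the per-class serialization Michael suggests, using PyYAML's `add_representer` hook plus SQLAlchemy's runtime inspection API; the toy `Foo` model, the `represent_foo` name, and the `!Foo` tag are illustrative, not anything from the thread:

```python
import yaml

import sqlalchemy as S
from sqlalchemy import inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    id = S.Column(S.Integer, primary_key=True)
    name = S.Column(S.String)

def represent_foo(dumper, obj):
    # Emit only the mapped column values; _sa_instance_state and the
    # rest of the persistence machinery never reach the YAML stream.
    data = {attr.key: getattr(obj, attr.key)
            for attr in inspect(obj).mapper.column_attrs}
    return dumper.represent_mapping("!Foo", data)

yaml.add_representer(Foo, represent_foo)

print(yaml.dump(Foo(id=1, name="bar")))
```

With a representer registered, `yaml.dump` never falls back to the generic `represent_object` path that raises above.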


Jonathan Vanasco

Oct 24, 2014, 10:29:44 AM
to sqlal...@googlegroups.com, d...@scraperwiki.com

Usually for this sort of stuff, I serialize the object's data into a JSON dict (object columns to a JSON dict; object relations to a dict, list of dicts, or reference to another object). Custom dump/load is needed to handle timestamps, floats, etc. You might be able to iterate over the data in YAML and not require custom encoding/decoding. When I need to treat the JSON data as objects, I load it into a custom dict class that treats attributes as keys.

The downside of this is that you don't have all the SQLAlchemy relational stuff or any ancillary methods (though they can be bridged in with more work). The benefit is that you get nearly 1:1 parity with the core needs without much more work. In a "read only" context, you can flip between SQLAlchemy objects and dicts. If you need to use the SQLAlchemy model itself, you could load the column/relationship data into it manually.
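One way to sketch Jonathan's approach; the `AttrDict` class and the `default` hook are illustrative names, not his actual code:

```python
import json
from datetime import datetime, date
from decimal import Decimal

class AttrDict(dict):
    """Dict that also exposes its keys as attributes, mimicking model access."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

def default(obj):
    # Hand-rolled handling for types json can't encode natively.
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError("not JSON serializable: %r" % type(obj))

row = {"id": 1,
       "created": datetime(2014, 10, 24, 10, 29),
       "price": Decimal("9.99")}

payload = json.dumps(row, default=default)
restored = AttrDict(json.loads(payload))
print(restored.id, restored.created)
```

Loading back through `AttrDict` gives the attribute-style access of a model instance while staying a plain dict underneath.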

Peter Waller

Oct 24, 2014, 10:39:43 AM
to sqlal...@googlegroups.com, David Jones
Well, I was hoping to "just use yaml", since YAML understands when two objects refer to the same underlying object. That means you don't have to write any logic to de-duplicate objects shared through relationships, etc.

Since JSON doesn't have a notion of referencing, that doesn't seem straightforward there.

I was also hoping to "just use yaml" to avoid writing custom dumping code, since it seems like a generally useful capability. So I may yet try to find the underlying bug and fix it.
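The referencing behaviour Peter describes can be seen with plain dicts: PyYAML emits an `&id001` anchor for the first occurrence of a shared object and a `*id001` alias for the second, and object identity survives a round trip.

```python
import yaml

shared = {"name": "shared child"}
doc = {"a": shared, "b": shared}  # both keys reference the same object

text = yaml.dump(doc)
print(text)  # the second occurrence is emitted as an alias, not a copy

loaded = yaml.safe_load(text)
# The alias is resolved back to the very same object on load.
assert loaded["a"] is loaded["b"]
```

JSON has no equivalent of anchors/aliases, which is why the de-duplication logic has to be hand-written there.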


Jonathan Vanasco

Oct 24, 2014, 12:55:00 PM
to sqlal...@googlegroups.com, d...@scraperwiki.com

On Friday, October 24, 2014 10:39:43 AM UTC-4, Peter Waller wrote:
I was also hoping to "just use yaml" to avoid writing custom dumping code, since it seems in general like a useful capability. So I may yet try and find the underlying bug and fix it.

It might not be a bug, but rather the effect of an "implementation feature" of SQLAlchemy. I tried (naively) playing around with your example, and thought back to how SQLAlchemy accomplishes much of its magic by creating custom comparators (and other private methods) on the classes and columns.

Playing around with it, the problem seems to be with the SQLAlchemy object's `__reduce_ex__` method. If you simply use `__reduce__` in yaml, it works. I couldn't figure out what Foo inherits `__reduce_ex__` from, or whether any of the columns have it.

Peter Waller

Oct 24, 2014, 1:29:20 PM
to sqlal...@googlegroups.com, David Jones
The oddity is that calling `__reduce_ex__` on the instance is fine, but on the class it is not. When serialising a declarative instance, yaml finds itself serialising the class object, which fails. This actually fails for `object` itself, too (see below).

So I think what's happening is that serialisation fails because `_sa_instance_state` (somewhere inside it) contains a class. This is probably a yaml bug, then.

In [1]: object().__reduce_ex__(2)
Out[1]: (<function copy_reg.__newobj__>, (object,), None, None, None)

In [2]: object.__reduce_ex__(2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-eebec0cadfee> in <module>()
----> 1 object.__reduce_ex__(2)

/usr/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
     68     else:
     69         if base is self.__class__:
---> 70             raise TypeError, "can't pickle %s objects" % base.__name__
     71         state = base(self)
     72     args = (self.__class__, base, state)

TypeError: can't pickle int objects
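A plausible reading of the misleading "int objects" message: looked up on a class rather than an instance, `__reduce_ex__` is object's plain unbound method, so the protocol argument `2` gets bound as `self`, an int. A minimal reproduction needs neither SQLAlchemy nor yaml:

```python
class Plain:
    pass

# On an instance the method is bound, so 2 is the pickle protocol.
instance_result = Plain().__reduce_ex__(2)

# On the class, the lookup returns object's unbound method, so the 2
# is taken as `self`; old interpreters then wandered into copyreg and
# complained they "can't pickle int objects", while newer ones raise a
# different TypeError. The class-level call fails either way.
try:
    Plain.__reduce_ex__(2)
    failed = False
except TypeError:
    failed = True
```

That matches the pasted tracebacks above, where `copy_reg._reduce_ex` receives an int as `self` and names `int` in the error.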


Sergii Nechuiviter

Sep 20, 2016, 12:47:41 PM
to sqlalchemy, d...@scraperwiki.com
This is a PyYAML bug: http://pyyaml.org/ticket/245