sqlalchemy object serialization / deserialization


Matt

Dec 20, 2007, 7:49:10 PM
to sqlalchemy
Hi all,

I'm trying to implement simple object serialization (ala the pickle
module) using JSON. I'm pretty far along, but having one problem --
serializing a sqlalchemy object through the __reduce__ method produces
a circular reference through the InstanceState object.

pprint of the structures in question:

account.__reduce__():
(<function _reconstructor at 0x42530>,
(<class 'model.account.Account'>,
<type 'object'>,
None),
{'_comment': None,
'_description': None,
'_display_name': u'm',
'_email': u'm...@foo.com',
'_email_status': 1L,
'_entity_name': None,
'_instance_key': (<class 'model.account.Account'>,
(5L,),
None),
'_name_graphic': u'm',
'_password_hash': u'bar',
'_sa_session_id': 43005744,
'_state': <sqlalchemy.orm.attributes.InstanceState object at
0x286f978>, <---- InstanceState object
'_status': 1L,
'_uri': None,
'_user_name': u'm',
'account_id': 5L,
'avatar_exists': False,
'creation_time': datetime.datetime(2006, 8, 23, 17, 43, 26),
'modification_time': datetime.datetime(2007, 12, 19, 23, 3, 2),
'newsletter': False,
'reviewed': False,
'site_updates': False})

account._state.__reduce__():
(<function _reconstructor at 0x42530>,
(<class 'sqlalchemy.orm.attributes.InstanceState'>, <type 'object'>,
None),
{'committed_state': {'_comment': None,
'_description': None,
'_display_name': u'm',
'_email': u'm...@foo.com',
'_email_status': 1L,
'_name_graphic': u'm',
'_password_hash': u'bar',
'_status': 1L,
'_uri': None,
'_user_name': u'm',
'account_id': 5L,
'avatar_exists': False,
'creation_time': datetime.datetime(2006, 8, 23,
17, 43, 26),
'modification_time': datetime.datetime(2007, 12,
19, 23, 3, 2),
'newsletter': False,
'reviewed': False,
'site_updates': False},
'instance': <model.account.Account object at 0x2971350>, <---
reference back to Account object
'modified': False,
'parents': {}})

So right now, I just break the circular reference the second time I
see the Account object, but this causes problems deserializing since
the InstanceState object is missing some data...

Any thoughts appreciated here... pickle is able to handle this
structure fine, but I'm not sure exactly what algorithm it uses to do
this...
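For reference, the algorithm pickle uses here is an id()-based memo table: the first time it sees an object it assigns it a number, and on every later sighting it emits a back-reference to that number instead of recursing. A minimal sketch of the same idea (the `serialize` helper and its `__ref__`/`__id__` markers are hypothetical, just to show the memo trick on plain dicts):

```python
def serialize(obj, memo=None):
    """Walk a structure depth-first, breaking cycles with an id()-based
    memo table -- the same basic trick pickle's memo opcodes use."""
    if memo is None:
        memo = {}
    if isinstance(obj, dict):
        oid = id(obj)
        if oid in memo:
            # already seen: emit a back-reference instead of recursing forever
            return {"__ref__": memo[oid]}
        memo[oid] = len(memo)
        return {"__id__": memo[oid],
                "items": {k: serialize(v, memo) for k, v in obj.items()}}
    return obj  # primitives pass through unchanged

# a self-referential structure that would otherwise recurse forever
d = {"name": "account"}
d["self"] = d
flat = serialize(d)
```

A matching deserializer would rebuild objects in the same order and patch back-references from its own memo, which is exactly what pickle's unpickler does.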

m

Rick Morrison

Dec 20, 2007, 8:04:16 PM
to sqlal...@googlegroups.com

You're not going to be able to serialize Python class instances in JSON: JSON strings are simple object literals limited to basic JavaScript types.  Pickle does some pretty heavy lifting to serialize and reconstitute class instances.

The easiest way to store JSON in the database is to limit the data you put in JSON strings to straightforward objects that use only primitive JS types, and then serialize back and forth to Python dictionaries. That's what libraries like SimpleJSON or cJSON do. Using SQLAlchemy, you can then store those JSON strings in VARCHAR, TEXT, and similar database fields.
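A sketch of that round trip with the stdlib json module (simplejson's descendant); the dict contents are made up:

```python
import json

# keep the stored payload limited to JSON-native types
prefs = {"newsletter": False, "site_updates": True, "theme": "plain"}

stored = json.dumps(prefs)     # this string goes in a VARCHAR/TEXT column
restored = json.loads(stored)  # and comes back as a plain Python dict
```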


Matt

Dec 20, 2007, 8:35:56 PM
to sqlalchemy
On Dec 20, 5:04 pm, "Rick Morrison" <rickmorri...@gmail.com> wrote:
> You're not going to be able to serialize Python class instances in JSON:
> json strings are simple object literals limited to basic Javascript types.
> Pickle does some pretty heavy lifting to serialize and reconstitute class
> instances.

I've already figured this out -- you can use class "hinting" as
outlined on this page:

http://json-rpc.org/wiki/specification

So a datetime using this format and converted to JSON might look like:

"modification_time": {"__jsonclass__": ["datetime.datetime", [2007,
12, 19, 23, 3, 2, 2]]}

My deserialization routine loads the datetime module, then gets the
datetime attribute (which in this case is a type, but could be a
function too), and calls that with the arguments in the list. This
sort of thing works for generic classes too using the pickle
__reduce__ hooks. The JSON part I'm actually handling through cjson
or simplejson -- my serialization/deserialization is working all on
python objects.
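A sketch of that hint-producing walk, handling only datetime for brevity; it runs entirely on Python objects, and the resulting structure is what gets handed to cjson or simplejson:

```python
import datetime

def to_jsonable(obj):
    """Recursively replace non-JSON types with __jsonclass__ hints,
    leaving dicts, lists and primitives alone."""
    if isinstance(obj, datetime.datetime):
        return {"__jsonclass__": ["datetime.datetime",
                                  [obj.year, obj.month, obj.day, obj.hour,
                                   obj.minute, obj.second, obj.microsecond]]}
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_jsonable(v) for v in obj]
    return obj
```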

> Easiest way to store JSON in the database is to limit the type of data you
> store in JSON strings to straightforward objects that only use primitive JS
> types, and then serialize back and forth to Python dictionaries. That's what
> libraries like SimpleJSON or cJSON do. Using Sqlalchemy, you can then store
> those JSON strings in database VARCHARS, TEXT and so on fields.

I don't really want to store JSON in the DB, but just use it as a
serialization format for sqlalchemy objects. We want our frontend to
render data from the same type of object with a couple different
possible backend sources. One being the database through the
sqlalchemy ORM and another being some sort of JSON/XML interface that
we can backend from whatever...

Rick Morrison

Dec 20, 2007, 8:45:46 PM
to sqlal...@googlegroups.com

Hey Matt.

Class hinting was a json-rpc 1.0 feature, subsequently dropped in 1.1. But that's a nit. 

The real problem, though, is reconstituting those hints -- the machinery for reinstantiating those objects has to come from somewhere, and that somewhere is going to be your code -- there is no magic "just add water and reconstitute my object" built into Python.

If you really want to get into writing your own deserialization, you're already heading in the right direction: simplejson and cjson (and maybe others) have hooks to plug in your own serialization and deserialization code. I've used them both to do something almost exactly like this, to serialize and deserialize JS dates, which normally can't be sent in JSON. You'll find that the built-in serialization works for *most* Python objects, especially basic types like str(), int(), etc., but most certainly not *all* Python objects. 
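In the stdlib json module those hooks are the default= and object_hook= parameters; a sketch wiring them up for datetimes, using the class-hint format from earlier in the thread:

```python
import datetime
import importlib
import json

def encode_hint(obj):
    # json calls this for any object it can't serialize natively
    if isinstance(obj, datetime.datetime):
        return {"__jsonclass__": ["datetime.datetime",
                                  [obj.year, obj.month, obj.day, obj.hour,
                                   obj.minute, obj.second, obj.microsecond]]}
    raise TypeError("unserializable: %r" % (obj,))

def decode_hint(d):
    # json calls this for every decoded object literal
    if "__jsonclass__" in d:
        name, args = d["__jsonclass__"]
        module, _, attr = name.rpartition(".")
        # resolve "module.attr" and call it with the saved arguments
        return getattr(importlib.import_module(module), attr)(*args)
    return d

s = json.dumps({"when": datetime.datetime(2007, 12, 19, 23, 3, 2)},
               default=encode_hint)
restored = json.loads(s, object_hook=decode_hint)
```

Note that decode_hint imports and calls whatever the hint names, so anything built this way should only ever be fed trusted input.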

> I don't really want to store JSON in the DB, but just use it as a
> serialization format for sqlalchemy objects. We want our frontend to
> render data from the same type of object with a couple different
> possible backend sources. One being the database through the
> sqlalchemy ORM and another being some sort of JSON/XML interface that
> we can backend from whatever...

Your best bet is going to be to deserialize the JSON to some intermediate object and then construct the SQLAlchemy instance. Re-implementing pickle with a JSON storage format is going to be a huge job, and really won't give you anything that you don't already get from pickle -- what other app besides your own is ever going to understand those heavily class-hinted JSON files?
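A sketch of that intermediate-object route; the Account class and its fields here are a hypothetical plain-Python stand-in for the mapped model.account.Account:

```python
import json

class Account(object):
    """Hypothetical stand-in for the mapped Account class."""
    def __init__(self, account_id, user_name):
        self.account_id = account_id
        self.user_name = user_name

payload = '{"account_id": 5, "user_name": "m"}'
data = json.loads(payload)   # intermediate object: a plain dict
account = Account(**data)    # explicit construction, no pickle-style magic
```

The construction step stays in your code, which also means ORM bookkeeping like the InstanceState gets rebuilt normally instead of being smuggled through the wire format.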



Lele Gaifax

Dec 21, 2007, 4:16:39 AM
to sqlal...@googlegroups.com
On Thu, 20 Dec 2007 20:45:46 -0500
"Rick Morrison" <rickmo...@gmail.com> wrote:

> Re-implementing pickle with a JSON storage format is going to be a
> huge job, and really won't give you anything that you don't already
> get from pickle

That's surely right: consider that pickle is not simply a "format",
but really a (little) specialized language able to reconstruct a
Python object, maintaining its relations with the other objects in
the stream.

See this blog:

http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/
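That relation-preserving behavior is easy to see: the memo opcodes in the pickle stream let even a self-referential object round-trip, identity intact.

```python
import pickle

class Node(object):
    pass

a = Node()
a.self_ref = a            # a genuine circular reference

b = pickle.loads(pickle.dumps(a))
# pickle's memo preserved the cycle: the attribute points back
# at the reconstructed object itself
assert b.self_ref is b
```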

ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
le...@nautilus.homeip.net | -- Fortunato Depero, 1929.

Michael Schlenker

Dec 21, 2007, 7:35:53 AM
to sqlal...@googlegroups.com
Hi all,

I have a legacy system which provides some special database views to expose
schema information for an application in a database-independent way
(and with some quirks).

Now I need to port this to SQLAlchemy and want to use it as one source of
schema information instead of the default SQLAlchemy reflection. That works fine;
I just added a derived MetaData class that does reflection via my legacy schema
information system.

BUT, as this is legacy stuff, it misses out on some of the nicer features of SQLAlchemy
and modern databases, like foreign keys, constraints, defaults, and all that.
The legacy app has some of that information available in various layers, so I need to
merge that info into my MetaData, e.g. to add ForeignKeys that are not modelled in the
database layer.

So my basic approach was something like:

# get the table information from the legacy schema info system
meta = db.MyMeta.reflect(only=[...])

Now I have a bunch of Table objects inside the MetaData with my custom datatypes
correctly set and all that, but no foreign key information yet.

Is there any good way to merge extra Constraints and Defaults and ForeignKeys into
an already defined Table object?

My naive approach failed:

>>> fk = ForeignKeyConstraint(['initial_manager'], ['angestellter.personalnummer'])
>>> proj = Table('project', db.meta, fk)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\sdk\win32\1.0\Python25\lib\site-packages\sqlalchemy-0.4.1-py2.5.egg\sqlalchemy\schema.py", line 103, in __call__
    raise exceptions.ArgumentError("Table '%s' is already defined for this MetaData instance." % key)
sqlalchemy.exceptions.ArgumentError: Table 'project' is already defined for this MetaData instance.

Michael

--
Michael Schlenker
Software Engineer

CONTACT Software GmbH Tel.: +49 (421) 20153-80
Wiener Straße 1-3 Fax: +49 (421) 20153-41
28359 Bremen
http://www.contact.de/ E-Mail: m...@contact.de

Sitz der Gesellschaft: Bremen | Geschäftsführer: Karl Heinz Zachries
Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215

Michael Schlenker

Dec 21, 2007, 7:41:25 AM
to sqlal...@googlegroups.com
Michael Schlenker wrote:

>
> Is there any good way to merge extra Constraints and Defaults and ForeignKeys into
> an already defined Table object?
>
I should have read the API docs before posting: Table.append_constraint() does what I need.
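A minimal sketch of that fix, with the table and column names borrowed from the traceback above (the actual schemas are stand-ins):

```python
from sqlalchemy import (Column, ForeignKeyConstraint, Integer,
                        MetaData, Table)

meta = MetaData()
angestellter = Table('angestellter', meta,
                     Column('personalnummer', Integer, primary_key=True))
project = Table('project', meta,
                Column('id', Integer, primary_key=True),
                Column('initial_manager', Integer))

# attach the constraint to the already-defined Table instead of
# passing it to the Table() constructor a second time
project.append_constraint(
    ForeignKeyConstraint(['initial_manager'],
                         ['angestellter.personalnummer']))
```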