Suggestion about how turbojson handles SQLAlchemy object cycles.


一首诗

Jul 24, 2009, 4:44:50 AM
to TurboGears
I'm very glad to know that turbojson can automatically convert
SQLAlchemy objects to JSON. But unfortunately, it cannot handle cycles
between SA objects.

For example, take two SA mapped classes, User and Role. A user has
more than one role, and a role can be given to many users. This is a
very common M:N relationship in relational databases.

But turbojson cannot convert an instance of User to JSON
automatically, because there is a reference loop. What we can do now
is add a "__json__" method to convert an object to JSON manually, which
is very tedious.

So my suggestion is: why not just skip objects that are referenced
more than once? This behavior would be easy to understand and would
save a lot of time writing "__json__" methods.
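To make the setup concrete, here is a minimal sketch of the kind of mapping being described, assuming an ordinary SA declarative setup; the class and column names are illustrative, and the "__json__" methods show the manual workaround:

    from sqlalchemy import Column, ForeignKey, Integer, String, Table
    from sqlalchemy.orm import relation
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    user_roles = Table('user_roles', Base.metadata,
        Column('user_id', Integer, ForeignKey('users.id')),
        Column('role_id', Integer, ForeignKey('roles.id')))

    class User(Base):
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        # M:N relation; User.roles and Role.users point at each other,
        # which is the cycle the default encoder cannot handle.
        roles = relation('Role', secondary=user_roles, backref='users')

        def __json__(self):
            # the tedious manual way: flatten the relation by hand
            return {'id': self.id, 'name': self.name,
                    'roles': [r.name for r in self.roles]}

    class Role(Base):
        __tablename__ = 'roles'
        id = Column(Integer, primary_key=True)
        name = Column(String(50))

        def __json__(self):
            return {'id': self.id, 'name': self.name}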

Diez B. Roggisch

Jul 24, 2009, 5:34:55 AM
to turbo...@googlegroups.com
一首诗 wrote:

There are two big problems with this:

- JSON has no concept of references; it's a purely hierarchical
structure. As a consequence, you can't serialize graph structures with
it. So you need to come up with a special key/value pair that denotes a
reference, and that would be entirely your own design.

- if that behavior were the default, it would mean that e.g. returning
a user in Facebook would return all his 4000 friends. Which is a rather
dumb thing to do if all you want is a user's primary attributes, don't
you agree?

So I'm afraid this can't be implemented the way you wish. I'm not sure
how this is dealt with in TG2, but in TG1 it was very easy to declare a
special rendering for specific objects, based on rule dispatch. You
should try and do that.

Diez

一首诗

Jul 24, 2009, 6:20:21 AM
to TurboGears
That sounds reasonable.

But, for your Facebook example, there is another problem. If we
don't want to pull in those 4000 friends, we have 2 choices:

1. don't use SA relationships, so that the default encoding mechanism of
turbojson still works
2. write a customized "__json__"

I think the 1st choice is not acceptable in most cases, but the 2nd
one is also annoying when we have more than 40 tables.

So, how about this:

Find a way to mark the fields of an object that should be skipped when
encoding it to JSON.

Diez B. Roggisch

Jul 24, 2009, 6:37:23 AM
to turbo...@googlegroups.com
一首诗 wrote:

> That sounds reasonable.
>
> But, for your Facebook example, there is another problem. If we
> don't want to pull in those 4000 friends, we have 2 choices:
>
> 1. don't use SA relationships, so that the default encoding mechanism of
> turbojson still works
> 2. write a customized "__json__"
>
> I think the 1st choice is not acceptable in most cases, but the 2nd
> one is also annoying when we have more than 40 tables.
>
> So, how about this:
>
> Find a way to mark the fields of an object that should be skipped when
> encoding it to JSON.

But the __json__ method is exactly such a specification. And both the
__json__ method and any declarative approach you suggest (which
would be *very* hard to implement for us, because it's
SQLAlchemy/SQLObject that we'd need to change) are IMHO too limited anyway.

Because there might be occasions where I need different JSON representations
for the same classes.

So instead, I suggest you familiarize yourself with the simplejson API.
There you can declare custom serializers and, if you want, write one
that knows how to fully traverse an SA object and its relations.
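A minimal sketch of such a custom serializer, assuming 0.5-era SQLAlchemy where mapped instances carry a _sa_instance_state attribute and private keys start with _sa_; the cycle is handled by simply dropping repeated references, along the lines suggested earlier in the thread:

    import simplejson

    class SATraversingEncoder(simplejson.JSONEncoder):
        """Walks SA-mapped instances, but never visits the same one twice."""

        def __init__(self, *args, **kwargs):
            simplejson.JSONEncoder.__init__(self, *args, **kwargs)
            self._seen = set()

        def default(self, obj):
            if not hasattr(obj, '_sa_instance_state'):
                # not an SA-mapped instance: fall back to the normal behavior
                return simplejson.JSONEncoder.default(self, obj)
            if id(obj) in self._seen:
                return None  # already serialized once, stop the recursion here
            self._seen.add(id(obj))
            # only attributes already loaded on the instance show up here;
            # nested SA objects and collections trigger further default() calls
            return dict((key, value) for key, value in obj.__dict__.items()
                        if not key.startswith('_sa_'))

    # usage: simplejson.dumps(some_user, cls=SATraversingEncoder)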

This could very well be a recipe in the docs - but I don't think it's
justified to include it in the core, as the need for customization is
so high that everybody will have to write JSON encoders anyway.

Diez

David Broudy

Jul 24, 2009, 12:19:32 PM
to turbo...@googlegroups.com
On Jul 24, 2009, at 4:20 AM, 一首诗 wrote:

we have 2 choices:

1. don't use SA relationships, so that the default encoding mechanism of
turbojson still works
2. write a customized "__json__"

I think the 1st choice is not acceptable in most cases, but the 2nd
one is also annoying when we have more than 40 tables.


There's at least one other choice: use the jsonify.when decorator. If you can define your criteria using hasattr instead of isinstance, then this should be less tedious than defining __json__ on every class. Sorry, I don't have an example of this I can post, as it was easier in my case to just use __json__. In fact, I'm not sure where the decorator is documented, but looking at the source it's pretty obvious, and it's just a thin wrapper around prioritized_methods.prioritized_when anyway.
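Roughly, such a rule could look like the sketch below, following the TG1 JsonifyDecorator docs linked later in this thread; the rule expression and attribute names are only illustrative, and the import path may differ between TurboJson versions:

    from turbojson.jsonify import jsonify

    # match any mapped object exposing a 'roles' collection, instead of
    # registering one rule per class
    @jsonify.when("hasattr(obj, 'roles')")
    def jsonify_with_roles(obj):
        return {
            'id': obj.id,
            'name': obj.name,
            # flatten the relation so the encoder never recurses into it
            'roles': [role.name for role in obj.roles],
        }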

Hope that helps.

Dave Broudy
Three Aspen Software
2922 Evergreen Pkwy, Suite 311
Evergreen, CO 80439
Office: 303.278.0908 // Fax: 303.500.3112
AIM/YIM: dbroudy

ob1.data

Aug 13, 2009, 2:25:14 AM
to TurboGears
I'm a newbie, so don't quote me on this, but I think I read somewhere
that JSON does not scale well. So I think the alternative of using
XML encoding to serialize might be better. Also, I think you need
Elixir installed on top of SQLAlchemy to handle M:N relationships in
the traditional fashion. Then maybe it'd be simpler.

Jorge Vargas

Aug 13, 2009, 4:05:29 AM
to turbo...@googlegroups.com
I agree with Diez on this.

Pulling "everything in" is bad, and putting in too little is also bad.
__json__ was invented for these cases. Now could you please explain
why 40 tables = 40 __json__'s? Sorry to say this, but if you have back
and forth relationships to everything, then there is something wrong
with your DB design rather than the JSON layer; all those joins will
be painful, and the JSON payloads you send over the network will be huge.

Jorge Vargas

Aug 13, 2009, 4:12:58 AM
to turbo...@googlegroups.com
On Fri, Jul 24, 2009 at 12:19 PM, David Broudy<dbr...@threeaspen.com> wrote:
> On Jul 24, 2009, at 4:20 AM, 一首诗 wrote:
>
> we have 2 choices:
>
> 1. don't use SA relationships, so that the default encoding mechanism of
> turbojson still works
> 2. write a customized "__json__"
>
> I think the 1st choice is not acceptable in most cases, but the 2nd
> one is also annoying when we have more than 40 tables.
>
>
> There's at least one other choice: use the jsonify.when decorator. If you
> can define your criteria using hasattr instead of isinstance, then this
> should be less tedious than defining __json__ on every class. Sorry, I don't
> have an example of this I can post, as it was easier in my case to just use
> __json__. In fact, I'm not sure where the decorator is documented, but
> looking at the source it's pretty obvious, and it's just a thin wrapper
> around prioritized_methods.prioritized_when anyway.
> Hope that helps.

That behavior is stripped in TG2.1, mainly because no one really used
it, and getting rid of RuleDispatch was a big deal. Currently the code
has no alternative. Your solution for this problem could work with the
"when" decorator, but I don't see it as the best solution. Which
criteria would you use to strip all unwanted attributes from all
objects? It seems to me like a very big set of hasattr checks that will
return false on all objects that don't have the attribute and only true
on a few that do.

Wouldn't it be better to subclass the serializer and do things from
there? Again, what would the criteria be? Can anyone come up with an
example where this is needed?

Back to the original point: I seriously doubt you will need that many
__json__ methods, and I also doubt there is a generic way to serialize
relationships that will make everyone happy.

Jorge Vargas

Aug 13, 2009, 4:18:04 AM
to turbo...@googlegroups.com
On Thu, Aug 13, 2009 at 2:25 AM, ob1.data<ob1....@gmail.com> wrote:
>
> I'm a newbie, so don't quote me on this, but I think I read somewhere
> that JSON does not scale well. So I think the alternative of using
> XML encoding to serialize might be better.

That isn't really true. Almost every JSON string will be smaller than
the XML equivalent. The only exception is a kind of deformation (which, by
the way, somewhat breaks the idea of XML) that some protocols apply; for
example, the web services defined by WSDL have an "optimization" where
you can extract a section of the XML, replace it with a reference, and then
put that section further down in the XML stream, which saves space because
the block appears only once in the stream and you have several
references to it. That in itself has two major problems:
1- It breaks the concept that XML is human readable, because it breaks
the flow, and more importantly
2- It means your data model is broken! Why would you want to store the
same information twice in the same stream?

> Also, I think you need
> Elixir installed on top of SQLAlchemy to handle M:N relationships in
> the traditional fashion. Then maybe it'd be simpler.

Huh? Have you used SA declarative? I think you are confusing active
record (what you call traditional) with data mapper (SA's way).

David Broudy

Aug 13, 2009, 1:28:34 PM
to turbo...@googlegroups.com
On Aug 13, 2009, at 2:12 AM, Jorge Vargas wrote:

Your solution for this problem could work with the
"when" decorator, but I don't see it as the best solution. Which
criteria would you use to strip all unwanted attributes from all
objects? It seems to me like a very big set of hasattr checks that will
return false on all objects that don't have the attribute and only true
on a few that do.

Wouldn't it be better to subclass the serializer and do things from
there? Again, what would the criteria be? Can anyone come up with an
example where this is needed?


The case I was working with was serializing the lists from an AssociationProxy. I'm pretty new to TG, but it seems to me the default behavior isn't great, at least in TurboJson 1.2.1: it skips keys starting with _sa_ but includes the private _AssociationProxy_* collections, which aren't really in a usable format. I think it should either skip the private proxy collection too, or include it in a usable form. Undoubtedly skipping it makes more sense as a default behavior. Maybe this is really an SA issue, because not all of their private keys start with _sa_, but it seems easy enough to fix at the turbojson level.
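For context, a minimal sketch of the kind of mapping involved, assuming the standard association_proxy extension and a declarative setup; the names are illustrative. The per-instance bookkeeping the proxy keeps is what surfaces as the _AssociationProxy_* keys described above:

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.orm import relation
    from sqlalchemy.ext.associationproxy import association_proxy
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Keyword(Base):
        __tablename__ = 'keywords'
        id = Column(Integer, primary_key=True)
        article_id = Column(Integer, ForeignKey('articles.id'))
        word = Column(String(50))

    class Article(Base):
        __tablename__ = 'articles'
        id = Column(Integer, primary_key=True)
        kw = relation(Keyword)
        # exposes the plain strings; the proxy's internal cache is what the
        # default encoder was picking up as _AssociationProxy_* keys
        keywords = association_proxy('kw', 'word')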

My model only had one class that used an association proxy, but if you had a bunch of classes that had a reasonably finite set of objects in an association proxy (say <10 keywords each), I think that's a use case that justifies a global way to do it for a given project, such as the when decorator. And if they were all called "keywords", then it's only one hasattr to check.

I hadn't thought about subclassing the whole serializer. I think that's a little bit more intimidating, but like I said, I'm pretty new to TG, so I'm still trying to absorb it all. In my case, subclassing would have made sense, because I wanted to take out something (_AssociationProxy_* keys) and I couldn't figure out how to do that in a when decorator without basically overriding the whole process anyway.

I think the decorator syntax is nice and clean, and the best examples are in TurboJson itself: datetime, decimal and explicit. Presumably those are handled in TG 2.1 as well, so hopefully there's a nice way to subclass and override similar types of objects, such as a duration or a currency.

Jorge Vargas

Aug 13, 2009, 10:32:47 PM
to turbo...@googlegroups.com
On Thu, Aug 13, 2009 at 1:28 PM, David Broudy<dbr...@threeaspen.com> wrote:
>
> On Aug 13, 2009, at 2:12 AM, Jorge Vargas wrote:
>
> Your solution for this problem could work with the
> "when" decorator, but I don't see it as the best solution. Which
> criteria would you use to strip all unwanted attributes from all
> objects? It seems to me like a very big set of hasattr checks that will
> return false on all objects that don't have the attribute and only true
> on a few that do.
>
> Wouldn't it be better to subclass the serializer and do things from
> there? Again, what would the criteria be? Can anyone come up with an
> example where this is needed?
>
>
> The case I was working with was serializing the lists from an
> AssociationProxy. I'm pretty new to TG, but it seems to me the default
> behavior isn't great, at least in TurboJson 1.2.1: it skips keys starting
> with _sa_ but includes the private _AssociationProxy_* collections, which
> aren't really in a usable format. I think it should either skip the private
> proxy collection too, or include it in a usable form. Undoubtedly skipping
> it makes more sense as a default behavior. Maybe this is really an SA issue,
> because not all of their private keys start with _sa_, but it seems easy
> enough to fix at the turbojson level.

Hmm, this is interesting: AssociationProxy is an extension that doesn't
follow the _sa_ convention. And I agree this is a rather simple patch.
In fact it should be as simple as adding a new function to
http://svn.turbogears.org/projects/TurboJson/trunk/turbojson/jsonify.py
Will you work on a patch + test for it? I don't have any models around
that use that extension.

> My model only had one class that used an association proxy, but if you had a
> bunch of classes that had a reasonably finite set of objects in an
> association proxy (say <10 keywords each), I think that's a use case that
> justifies a global way to do it for a given project, such as the when
> decorator. And if they were all called "keywords", then it's only one
> hasattr to check.
> I hadn't thought about subclassing the whole serializer. I think that's a little bit more intimidating, but like I said, I'm pretty new to TG, so I'm still trying to absorb it all.
> In my case, subclassing would have made sense, because I wanted to take out
> something (_AssociationProxy_* keys) and I couldn't figure out how to do
> that in a when
> decorator without basically overriding the whole process anyway.

In TurboJson this would be done with a simple jsonify.when-decorated
function that would magically get called by TurboJson, as explained
here: http://docs.turbogears.org/1.0/JsonifyDecorator

Now, as I said, that has been removed (although an equivalent
implementation could be put in place for 2.1; the question is, do we
want it?). The proper way now is to subclass the Encoder there and (for
now) monkey patch L39 (_instance = GenericJSON()) to put in your
modified instance. Suggestions on how to make that better are welcome.

So in summary, I believe subclassing GenericJSON and telling TG to use
your version is WAY simpler than having someone go and try to understand
generic functions and then implement one.
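A rough sketch of that approach, assuming the GenericJSON encoder and the module-level _instance mentioned above live in tg/jsonify.py (linked below); the exact module path and the filtering logic are illustrative and should be checked against your TG version:

    import tg.jsonify  # module path assumed from the tg/jsonify.py link below

    class MyJSON(tg.jsonify.GenericJSON):
        def default(self, obj):
            if hasattr(obj, '_sa_instance_state'):
                # serialize the loaded attributes, dropping both SA's own
                # bookkeeping and the AssociationProxy caches
                return dict((k, v) for k, v in obj.__dict__.items()
                            if not k.startswith('_sa_')
                            and not k.startswith('_AssociationProxy_'))
            return tg.jsonify.GenericJSON.default(self, obj)

    # the monkey patch mentioned above: swap in your own instance
    tg.jsonify._instance = MyJSON()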

Back to the specific case: I really believe this should be patched into
TG2.1 and TurboJson, as AssociationProxy needs to be filtered (at least
by default) just like every other SA relation.

>
> I think the decorator syntax is nice and clean, and the best examples are in
> TurboJson itself: datetime, decimal and explicit. Presumably those are
> handled in TG 2.1 as well, so hopefully there's a nice way to subclass and
> override similar types of objects, such as a duration or a currency.

Yes indeed, the functionality is 99% equivalent; the only thing missing
is the "when". The problem is that the old implementation used generic
functions, which created a ton of dependencies and did not bring in a
lot. I know it sounds like a lot, but pretty much all of TurboJson
was replaced by one 50-line file:
http://hg.turbogears.org/tg-21/src/tip/tg/jsonify.py
which is equivalent and I'm pretty sure faster.

David Broudy

Aug 14, 2009, 11:45:45 AM
to turbo...@googlegroups.com

On Aug 13, 2009, at 8:32 PM, Jorge Vargas wrote:

>
> Hmm, this is interesting: AssociationProxy is an extension that doesn't
> follow the _sa_ convention. And I agree this is a rather simple patch.
> In fact it should be as simple as adding a new function to
> http://svn.turbogears.org/projects/TurboJson/trunk/turbojson/jsonify.py
> Will you work on a patch + test for it? I don't have any models around
> that use that extension.

Yeah, I should be able to do that next week. Though looking at 2.1 (or
2.0 for that matter) I don't think it can be done in a function; it
is a one-line change to the existing conditional:

if not key.startswith('_sa_') and not key.startswith('_AssociationProxy_'):

Are there instructions to get a source environment for 2.1 anywhere,
or is it just as simple as replacing subversion with mercurial from
the 2.0 docs?

> Now, as I said, that has been removed (although an equivalent
> implementation could be put in place for 2.1; the question is, do we
> want it?). The proper way now is to subclass the Encoder there and (for
> now) monkey patch L39 (_instance = GenericJSON()) to put in your
> modified instance. Suggestions on how to make that better are welcome.

I'm not familiar with what "monkey patch L39" references. As far as
making it easier to subclass, I think some additional callback to
deal with relationships would be extremely helpful and would save the
inheritor from having to reimplement the entire is_saobject(obj)
condition, including the filtering out of private SA attributes, which may
be subject to change. Just implementing your own default won't do it,
because the key is skipped before the value. Actually, what about
moving the startswith('_sa_') check to its own function that could then
be overridden? That would allow derived classes to exclude at the
field or relationship level, using the key, while still calling
through to the superclass that can implement the private-key
behavior. Is that what you meant by a new function above? If so, I
guess it just took me a little while to get there :)
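A sketch of the refactor being proposed here, with illustrative names (include_key is not an existing TurboJson/tg hook, and the GenericJSON body is simplified):

    import simplejson

    class GenericJSON(simplejson.JSONEncoder):
        def include_key(self, key):
            # the private-key policy lives in one overridable place
            return not key.startswith('_sa_')

        def default(self, obj):
            if hasattr(obj, '_sa_instance_state'):
                return dict((k, v) for k, v in obj.__dict__.items()
                            if self.include_key(k))
            return simplejson.JSONEncoder.default(self, obj)

    class MyJSON(GenericJSON):
        def include_key(self, key):
            # a derived class can exclude fields or relations by key while
            # still delegating the private-key behavior to the base class
            return (GenericJSON.include_key(self, key)
                    and not key.startswith('_AssociationProxy_'))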

Jorge Vargas

Aug 15, 2009, 1:25:10 AM
to turbo...@googlegroups.com
On Fri, Aug 14, 2009 at 11:45 AM, David Broudy<dbr...@threeaspen.com> wrote:
>
>
> On Aug 13, 2009, at 8:32 PM, Jorge Vargas wrote:
>
>>
>> Hmm, this is interesting: AssociationProxy is an extension that doesn't
>> follow the _sa_ convention. And I agree this is a rather simple patch.
>> In fact it should be as simple as adding a new function to
>> http://svn.turbogears.org/projects/TurboJson/trunk/turbojson/jsonify.py
>> Will you work on a patch + test for it? I don't have any models around
>> that use that extension.
>
> Yeah, I should be able to do that next week. Though looking at 2.1 (or
> 2.0 for that matter) I don't think it can be done in a function; it
> is a one-line change to the existing conditional:
>
> if not key.startswith('_sa_') and not key.startswith('_AssociationProxy_'):
>
That would be the fix for TG2.1; for 2.0 you will need to add the
function to TurboJson. I think you confused the two implementations.

> Are there instructions to get a source environment for 2.1 anywhere,
> or is it just as simple as replacing subversion with mercurial from
> the 2.0 docs?
>
Right, that hasn't been updated yet. You could just do the mercurial
checkout, or you can fork and request a pull later.

>> Now, as I said, that has been removed (although an equivalent
>> implementation could be put in place for 2.1; the question is, do we
>> want it?). The proper way now is to subclass the Encoder there and (for
>> now) monkey patch L39 (_instance = GenericJSON()) to put in your
>> modified instance. Suggestions on how to make that better are welcome.
>
> I'm not familiar with what "monkey patch L39" references. As far as
> making it easier to subclass,
you need to tell TG to use your class; currently there is no way of
doing that, so you will have to do something like this in your
code:

tg.json._instance = YourJSONClass() <-- that is referred to as monkey
patching, because you are changing the behavior of a module from
outside the module, which in some cases can lead to very bad things.
In this case it is safe, but not nice coding.



> I think some additional callback to
> deal with relationships would be extremely helpful and would save the
> inheritor from having to reimplement the entire is_saobject(obj)
> condition, including the filtering out of private SA attributes, which may
> be subject to change.

This is correct, it has changed a lot (take a look at TurboJson), but
since TG2 depends on SA<0.5 we can safely ignore those for now. The
check for SA will probably evolve if SA>0.6 changes the way we can
detect its non-column attributes.

> Just implementing your own default won't do it,
> because the key is skipped before the value.

I'm sorry, I don't get this. Do you mean is_saobject returns false for
your extension?

> Actually, what about
> moving the startswith('_sa_') check to its own function that could then
> be overridden? That would allow derived classes to exclude at the
> field or relationship level, using the key, while still calling
> through to the superclass that can implement the private-key
> behavior. Is that what you meant by a new function above? If so, I
> guess it just took me a little while to get there :)
>
Adding a callback could be a good idea. What we need to be careful
about is that overriding that call doesn't override everything else.

I suggest you start by coming up with a check to filter out the
AssociationProxy, then evolve that into something that will also check
for SA; with that we can figure out the best way to integrate it back
into json.py
