#7052 - Fixing serialization for content types and auth

Russell Keith-Magee

unread,

Nov 5, 2009, 10:29:54 AM11/5/09

to Django Developers

Hi all,

Next on my pony list for v1.2: #7052 - fixing the serializers to work
around the problem of serializing dynamically created objects, such as
those produced by contrib.auth and contrib.contenttypes. I need some
feedback on how much of this solution we need, want, and are
comfortable seeing in trunk. Apologies in advance for the long post.

For those not familiar with the problem - these two apps dynamically
create data as part of the syncdb process. As a result, the primary
keys for these objects aren't necessarily consistent after a syncdb,
so fixtures can't reliably refer to auth permissions or content types.
The problem is more general than these two apps specifically, but
these two are the ones that most people get bitten by early on in the
testing process.

The solution that I've been intending to implement for a while is an
extension to Django's serialization syntax: wherever a primary key is
legal, we will also allow a dictionary-like structure (whatever the
serialization format allows) that equates to the kwargs that will be
passed to a Model.objects.get() call.

So - instead of a JSON fixture reading:

{
"pk": 1,
"model": "myapp.mymodel",
"fields": {
"name": "foobar",
"content_type": 3
}
}

which hardcodes the primary key value of 3 for a content type, we would allow:

{
"pk": 1,
"model": "myapp.mymodel",
"fields": {
"name": "foobar",
"content_type": {
"app_name": "otherapp",
"model": "othermodel"
}
}
}

The serializer will then do
ContentType.objects.get(app_name='otherapp', model='othermodel') to
resolve the actual primary key at runtime. Analogous syntax would
exist for XML, PyYAML, etc.

Now, there are two parts to the solution. The deserializer is easy -
write a handler for the dictionary syntax for primary keys, and you're
done. Easy to implement, easy to test.

The serializer isn't so easy, however. Determining when to output a
lookup dictionary for a primary key isn't trivial. Here are some
options:

Option 1: Ignore the problem
-----------------------------------------

Implement the deserializer, but don't try and solve the serialization
problem. Treat the lookup syntax for primary keys as a nifty extra you
can exploit by hand if need be. Serialization generates integer
primary keys, and you can hand modify fixtures to use lookup syntax if
you want to.

Option 2: Add a Meta argument for serialization
--------------------------------------------------------------------

This is essentially what the patch on #7052 currently implements.

Under this approach, a model that is known to engage in dynamic data
creation can mark itself for dynamic dumping, indicating the fields
that should be used for that dump. For example, ContentTypes would
contain something like:

class Meta:
...
dump_related = ('app_label','model')

which indicates the two fields that should be used to construct the
lookup dictionary whenever a ContentType object is serialized.

The problem with this approach is that hard-codes a single aspect of
serialization into the model. If someone has a different set of
requirements for serializing content types under particular
circumstances, they will be out of luck.

Option 3: Add flags/arguments to the serializer to control dynamic dumping
------------------------------------------------------------------------------------------------------------

i.e.,
./manage.py dumpdata myapp --format=json --indent=2
--lookup=contenttypes.contenttype(app_label,model)

It might be possible to simplify this a little by saying that when
--lookup=contenttypes.contenttype is specfied, the first
unique_together tuple will be used to construct the lookup.

This puts complete control in the hand of the user at serialization
time. However, the syntax isn't especially elegant, especially given
that every single serialization of contenttypes and permissions will,
in practice, need to use the --lookup argument.

Option 4: An all-singing, all-dancing serialization framework rework
------------------------------------------------------------------------------------------------

Django's serialization format is fairly limited, and there have been
many proposals to add features to the output format (serializing
non-model properties, reverse relations, deep relations, etc). I've
been holding off on these in favour of a larger rework of the
serialization framework.

In my minds eye, I have a vision of a serialization framework that
would allow for registration of different serialization formats - not
just JSON/XML, but the fields and internal structure of a JSON
fixture, etc. Describing which fields should be rendered as lookups,
how the lookup would be determined, and under what conditions a lookup
should be used would all just be a configuration items on a
serialization definition.

This is obviously a much larger body of work, and certainly wouldn't
get done for v1.2 - if only because I haven't done any planning,
prototype implementation, or community review.

Option 5: Something else
-------------------------------------

I'm open to any other suggestion.

The good news in all this is that Option 1 isn't mutually exclusive to
the other options - we can land Option 1 right now and get the
advantages of dynamic lookups, and then worry about how to close the
loop as a second problem.

So - feedback welcome. Which option should we pursue?

Yours,
Russ Magee %-)

Eric Holscher

unread,

Nov 5, 2009, 10:46:25 AM11/5/09

to django-d...@googlegroups.com

On Thu, Nov 5, 2009 at 9:29 AM, Russell Keith-Magee <freakb...@gmail.com> wrote:

Hi all,

Next on my pony list for v1.2: #7052 - fixing the serializers to work
around the problem of serializing dynamically created objects, such as
those produced by contrib.auth and contrib.contenttypes. I need some
feedback on how much of this solution we need, want, and are
comfortable seeing in trunk. Apologies in advance for the long post.

No worries. This is one of my ponies as well.

Option 1: Ignore the problem
-----------------------------------------

Implement the deserializer, but don't try and solve the serialization
problem. Treat the lookup syntax for primary keys as a nifty extra you
can exploit by hand if need be. Serialization generates integer
primary keys, and you can hand modify fixtures to use lookup syntax if
you want to.

I think this is a viable option, as you say just to get the code checked in. A surprising number of people write fixtures by hand, and this would allow them to get nice control over loading. Editing generated ones automatically or by hand afterwards also isn't that hard. That said, I think there is a better middle ground.

Option 2: Add a Meta argument for serialization
--------------------------------------------------------------------

The problem with this approach is that hard-codes a single aspect of
serialization into the model. If someone has a different set of
requirements for serializing content types under particular
circumstances, they will be out of luck.

I agree that this approach feels a bit heavy. I don't know if we really need something on the model that tells us how to dump something. Don't we already have that data? unique_together and unique=True should provide us enough information for making at least a naive implementation.

Option 3: Add flags/arguments to the serializer to control dynamic dumping
------------------------------------------------------------------------------------------------------------

i.e.,
./manage.py dumpdata myapp --format=json --indent=2
--lookup=contenttypes.contenttype(app_label,model)

It might be possible to simplify this a little by saying that when
--lookup=contenttypes.contenttype is specfied, the first
unique_together tuple will be used to construct the lookup.

This puts complete control in the hand of the user at serialization
time. However, the syntax isn't especially elegant, especially given
that every single serialization of contenttypes and permissions will,
in practice, need to use the --lookup argument.

This is somewhere along the lines of the approach that I was considering. I'll explain below.

Option 4: An all-singing, all-dancing serialization framework rework
------------------------------------------------------------------------------------------------

Django's serialization format is fairly limited, and there have been
many proposals to add features to the output format (serializing
non-model properties, reverse relations, deep relations, etc). I've
been holding off on these in favour of a larger rework of the
serialization framework.

In my minds eye, I have a vision of a serialization framework that
would allow for registration of different serialization formats - not
just JSON/XML, but the fields and internal structure of a JSON
fixture, etc. Describing which fields should be rendered as lookups,
how the lookup would be determined, and under what conditions a lookup
should be used would all just be a configuration items on a
serialization definition.

This is obviously a much larger body of work, and certainly wouldn't
get done for v1.2 - if only because I haven't done any planning,
prototype implementation, or community review.

Yes please, but lets get something usable into 1.2.

Option 5: Something else
-------------------------------------

I'm open to any other suggestion.

The good news in all this is that Option 1 isn't mutually exclusive to
the other options - we can land Option 1 right now and get the
advantages of dynamic lookups, and then worry about how to close the
loop as a second problem.

So - feedback welcome. Which option should we pursue?

Well, since you asked for it ;)

I have solved this in a proof of concept way in my playground code[0]. Pinax also took this implementation and adapted it to work with json[1].

The basic approach, is that on serialization, you introspect the model, and see if it has any unique or unique_together fields. If there are these fields present, then those should be queried and used as the unique constraints when outputting the fixture.

This gives us a neat fallback, where if the only unique constraint is the Autofield generated, then it automatically just outputs the query as the pk query that is used now.

I think this gives us the most amount of automatic ability, outputting things where we obviously know they are unique. I think combining this with some version of Option 3, allowing users to specify on the command line which fields they wish to output, gets us most of the rest of the way.

My use case for the command line options is normally having a Slugfield on a model that isn't declared as unique=True, but for the data set that I have, it is.

Cheers,
Eric

[0] http://github.com/ericholscher/sandbox/blob/d32da8c36f257bb973a5c0b0fd8f9bca79062f11/serializers/yamlfk.py#L101
[1] http://github.com/pinax/pinax/blob/025113cbd1e36eefae375493569bb8817c7095fe/pinax/core/serializers/jsonfk.py#L74

Jacob Kaplan-Moss

unread,

Nov 5, 2009, 11:21:50 AM11/5/09

to django-d...@googlegroups.com

On Thu, Nov 5, 2009 at 9:29 AM, Russell Keith-Magee
<freakb...@gmail.com> wrote:

> Option 1: Ignore the problem

One step at a time; my preference would be to get this done first,
then start looking for better approaches on the serialization side.
I'd rather get *something* done than spend time looking for a perfect
approach.

That said, I do think Eric's suggestion of automatically introspecting
for unique/unique_together is the best idea I've seen yet. Well, the
best idea that doesn't require a ground-up rewrite of serialization,
that is.

Jacob

Travis Cline

unread,

Nov 5, 2009, 12:55:08 PM11/5/09

to Django developers

On Nov 5, 10:21 am, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:
> That said, I do think Eric's suggestion of automatically introspecting
> for unique/unique_together is the best idea I've seen yet. Well, the
> best idea that doesn't require a ground-up rewrite of serialization,
> that is.
>
> Jacob

I wanted to point out I have a little project on github with the aim
of dictionary-based references in fixtures:
http://github.com/traviscline/django-smart-serializers

Needs some work and more tests but some of the bits mentioned above
are there.

My initial aim is to be independent of core but last item on my list
of goals is to provide the work as a patch.

Travis

mattim...@gmail.com

unread,

Nov 5, 2009, 10:41:53 PM11/5/09

to Django developers

Hi Russ,

+1. I need to ship a default set of groups with permissions assigned
to them in my app. This would make it much easier to do.

>
> Option 2: Add a Meta argument for serialization
> --------------------------------------------------------------------
>
> This is essentially what the patch on #7052 currently implements.
>
> Under this approach, a model that is known to engage in dynamic data
> creation can mark itself for dynamic dumping, indicating the fields
> that should be used for that dump. For example, ContentTypes would
> contain something like:
>
> class Meta:
> ...
> dump_related = ('app_label','model')
>
> which indicates the two fields that should be used to construct the
> lookup dictionary whenever a ContentType object is serialized.
>
> The problem with this approach is that hard-codes a single aspect of
> serialization into the model. If someone has a different set of
> requirements for serializing content types under particular
> circumstances, they will be out of luck.
>

-1

I have an obvious bias [1] but this would be my preferred option with
Option 3 being implemented on top of it. I believe my Django Full
Serializers implement 90% of what you are after (there is also a patch
for reverse relations in the issue tracker). It is missing
deserializing "full" serialized models and the ability to customize
the internal structure of the output. I have a test suite for it but
it needs to be extracted/rewritten as it depends on models in an
internal project.

Would it be worth my while reworking my code as a patch against Django
trunk?

[1] http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers

regards

Matthew

Russell Keith-Magee

unread,

Nov 5, 2009, 11:39:08 PM11/5/09

to django-d...@googlegroups.com

I can see what you're driving at here, but I have some reservations.

Firstly, not all models with unique fields need to be serialized as
lookups. This introduces a potential efficiency issue. Assigning a
literal PK is much faster than a database lookup. Fixture loading is
already one of the two weak points in testing speed. Introducing
lookups that aren't strictly necessary into fixtures will just make
the matter worse.

There is also potentially a complexity issue. At the moment, circular
dependencies are easy to resolve - assign a numerical PK, and defer
integrity checks to the end of the transaction. Since FK values can
themselves be part of unique_together constraints, you could end up
with a situation where object 1 needs to lookup object 2, but object 2
needs to lookup object 1.

Secondly, not all models that need to be serialized as lookups have
unique fields. It's easy to create a management.py trigger to create
dynamic content, but there's no requirement that the models involved
have unique non-primary keys.

In summary - I think the 'first unique/unique together field'
technique is an excellent way to determine which fields should be used
for serialization once you have determined that a model should be
serialized as a lookup. However, I don't think it's a very good
criterion for determining that a lookup is required in the first
place.

Manually specifying models at the command line (--lookup) is a
longwinded approach, and it's easy to accidentally forget which models
need to be serialized as lookups. However, it doesn't suffer from the
flaws that the automatic unique-based criterion has. It's not pretty,
but it might be enough to tide us over until a full serialization
rewrite can happen.

Yours,
Russ Magee %-)

Russell Keith-Magee

unread,

Nov 5, 2009, 11:48:55 PM11/5/09

to django-d...@googlegroups.com

On Fri, Nov 6, 2009 at 11:41 AM, mattim...@gmail.com
<mattim...@gmail.com> wrote:
>
>> In my minds eye, I have a vision of a serialization framework that
>> would allow for registration of different serialization formats - not
>> just JSON/XML, but the fields and internal structure of a JSON
>> fixture, etc. Describing which fields should be rendered as lookups,
>> how the lookup would be determined, and under what conditions a lookup
>> should be used would all just be a configuration items on a
>> serialization definition.
>>
>> This is obviously a much larger body of work, and certainly wouldn't
>> get done for v1.2 - if only because I haven't done any planning,
>> prototype implementation, or community review.
>
> I have an obvious bias [1] but this would be my preferred option with
> Option 3 being implemented on top of it. I believe my Django Full
> Serializers implement 90% of what you are after (there is also a patch
> for reverse relations in the issue tracker). It is missing
> deserializing "full" serialized models and the ability to customize
> the internal structure of the output.

I've seen your Full Serializers before, and it looks like good stuff -
but I disagree about your '90% of the problem' analysis. To my mind,
including extra properties into the serialized data is the easy part
of the problem. The hard bit is getting real control over the
serialization format. Once you have good control over the
serialization format, serializing extra fields or deep relations is
almost a trivial afterthought.

I'm also hesitant to revisit the sins of modelform_factory and generic
views. Django has repeatedly discovered that trying to push
configuration through the argument list of a function is a recipe for
problems in the long term. To whit, from your own docs:

serializers.serialize('json', Group.objects.all(), indent=4,
relations={'permissions':{'relations':{'content_type':{'excludes':('app_label',)}}}})

The real solution is to use a full class-based representation.

> Would it be worth my while reworking my code as a patch against Django
> trunk?

At this point, probably not. A full serialization rewrite is on my
list of medium-term things to do (or list of things to try and
encourage other people to do :-), which is why the tickets around
'full serialization' have languished for so long. This is one of those
occasions where I'm deliberately not fixing part of the problem to
encourage attempts to fix the whole problem.

Yours,
Russ Magee %-)

Russell Keith-Magee

unread,

Nov 6, 2009, 12:03:42 AM11/6/09

to django-d...@googlegroups.com

Hi Travis,

The patch on #7052 already does the dictionary-based lookup part of
the problem, as well as implementing option 2 from my original mail.
We probably won't end up using the serialization part (the Meta flag)
but the deserializers already work pretty well. I expect that the
deserialization/lookup part of the patch will be in trunk quite soon.

Yours,
Russ Magee %-)

mattim...@gmail.com

unread,

Nov 6, 2009, 1:38:48 AM11/6/09

to Django developers

On Nov 6, 3:48 pm, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> On Fri, Nov 6, 2009 at 11:41 AM, mattimust...@gmail.com

I've just been looking at #6735 - Class based Generic Views, and the
patches attached there don't seem to eliminate the convention of using
dicts to pass arguments/configuration to the views. They just make it
easier to override the behaviour.

How do you imagine this being different in (more) class-based
serialization?

The way I think of my current implementation is that the serializer
options are simple configuration (except for very deep relations), and
if indented well is not that daunting to maintain.

The current serializer classes have start_serialization,
end_serialization, start_object, end_object methods that can be
overridden, aren't these enough to handle custom serialization
formats?

> > Would it be worth my while reworking my code as a patch against Django
> > trunk?
>
> At this point, probably not. A full serialization rewrite is on my
> list of medium-term things to do (or list of things to try and
> encourage other people to do :-), which is why the tickets around
> 'full serialization' have languished for so long. This is one of those
> occasions where I'm deliberately not fixing part of the problem to
> encourage attempts to fix the whole problem.

Given my interest in the area I would be willing to work on this if we
can come to a consensus on what will fix the whole problem.

regards

Matthew

>
> Yours,
> Russ Magee %-)

Russell Keith-Magee

unread,

Nov 6, 2009, 6:30:54 PM11/6/09

to django-d...@googlegroups.com

I just had another look at Jacob's github branch for #6735 - I
actually wasn't aware that Jacob had made the constructor to the class
based generic views so open ended.

However, the important feature of class-based generic views isn't the
constructor - it's that you can override the functions that return
those values. Sure, by default get_foo just returns the foo argument,
but the interesting feature of class-based views is that you can
replace get_foo with an implementation that does any sort of
conditional processing based on state, arguments, etc.

> The way I think of my current implementation is that the serializer
> options are simple configuration (except for very deep relations), and
> if indented well is not that daunting to maintain.
>
> The current serializer classes have start_serialization,
> end_serialization, start_object, end_object methods that can be
> overridden, aren't these enough to handle custom serialization
> formats?

Python provides the tools to write HTML headers. Isn't that enough to
build a website? :-)

I'm looking at the serialization framework as a way of turning a
model, or list of models, or list of models plus some metadata, into
an JSON/XML/YAML data structure that can be consumed by an arbitrary
endpoint. For example, I don't consider it a given that Django should
serialize "pk", "model" and "fields" as the top level structure of an
object. I also don't consider it a given that a serialization should
necessarily be deserializable.

>> > Would it be worth my while reworking my code as a patch against Django
>> > trunk?
>>
>> At this point, probably not. A full serialization rewrite is on my
>> list of medium-term things to do (or list of things to try and
>> encourage other people to do :-), which is why the tickets around
>> 'full serialization' have languished for so long. This is one of those
>> occasions where I'm deliberately not fixing part of the problem to
>> encourage attempts to fix the whole problem.
>
> Given my interest in the area I would be willing to work on this if we
> can come to a consensus on what will fix the whole problem.

Excellent. I've got a busy weekend ahead of me, so I don't have time
right now to formalize the ideas I have. However, I tried
(unsuccessfully) to motivate a couple of students to follow this topic
for the GSoC this year; there are a couple of threads where I tried to
express the ideas that I had in mind. It's not an ideal starting
point, but hopefully it will give you an insight into what I have in
mind.

Once my weekend settles down, I'll try to get some ideas down on
paper. We probably won't be able to get anything this big into v1.2,
but if we can nail down a spec (and maybe an implementation) over the
next couple of months, we'll be in a really good position for v1.3 -
or at least have a really good external project for the community to
use.

Yours,
Russ Magee %-)

Rob Madole

unread,

Nov 6, 2009, 7:37:56 PM11/6/09

to Django developers

(accidentally sent this directly to Russell, he has offered to help me
re-construct this where it belongs)

I am under-the-covers-down-and-naughty familiar with this one.

This one has caused pain for us. We had a situation very similar to
what #7052 outlines and I ended up forking Django just to solve it and
implementing the attached patch. So, from my selfish point of view I
don't care where this one ends up I just want:

# To not fork Django
# To have my tests work

+1. I agree this is a good features.

>
> Option 2: Add a Meta argument for serialization
> --------------------------------------------------------------------
>
> This is essentially what the patch on #7052 currently implements.
>
> Under this approach, a model that is known to engage in dynamic data
> creation can mark itself for dynamic dumping, indicating the fields
> that should be used for that dump. For example, ContentTypes would
> contain something like:
>
> class Meta:
> ...
> dump_related = ('app_label','model')
>
> which indicates the two fields that should be used to construct the
> lookup dictionary whenever a ContentType object is serialized.
>
> The problem with this approach is that hard-codes a single aspect of
> serialization into the model. If someone has a different set of
> requirements for serializing content types under particular
> circumstances, they will be out of luck.
>

-1. But it DOES work. And it works really well. If I remember
correctly, part of the work the patch does is alter the ContentType
model to add the dump_related exactly as you mentioned.

I wonder how nasty it would be if ContentType and Permission used this
in a very undocumented way until the major serialization re-write
could be accomplished.

> Option 3: Add flags/arguments to the serializer to control dynamic dumping
> --------------------------------------------------------------------------- ---------------------------------
>
> i.e.,
> ./manage.py dumpdata myapp --format=json --indent=2
> --lookup=contenttypes.contenttype(app_label,model)
>
> It might be possible to simplify this a little by saying that when
> --lookup=contenttypes.contenttype is specfied, the first
> unique_together tuple will be used to construct the lookup.
>
> This puts complete control in the hand of the user at serialization
> time. However, the syntax isn't especially elegant, especially given
> that every single serialization of contenttypes and permissions will,
> in practice, need to use the --lookup argument.
>

My concern for this one is forgetting to add the option every time I
dump the fixtures. (Which is quite often in our environment). I
would end up writing my own management command to wrap this so I
wouldn't forget.

Let's say dump_related is out in the Model class, I agree this is not
where this belongs. What if we had a slightly altered way of
describing it. In it you could have something like this:

from django.contrib.contenttypes.models import ContentType
from django.core.serializers import register_serializer_style

register_serializer_style(ContentType, related=('app_label',
'model',))

I could put this one-liner anywhere. I would probably put it in my
models.py right after the class definition, or I might put it in the
project itself and leave it out of the app altogether. The important
thing is it gets it out of the Model class definition Meta section.
It could also be last-in wins, so if my way of serializing ContentType
differs from the default (which might be built-in to contrib) I could
change it somewhere else. I can't think of a good use case for why
you would need to serialize it differently but Russell did mention it
as a possibility.

One feature that we've been playing with is using a Python module to
seed our serialization. It looks something like this:

from django.contrib.auth.models import *
from myproject.models import *
from myproject.utils.serializer import S #'S' for "seed"

seed = (
S(User.objects.filter(staff_status=false)),
S(Group.objects.all()),
S(Permission.objects.all()),
S(BlogEntry.ojbects.all()[:200], recursive=False),
)

Our system is complex enough that we need to dump subsets of data,
filter by date, and some other junk before we ultimately pass it to
the serializer. I would like to see something like this added to the
next version as well.

J Meier

unread,

Nov 7, 2009, 12:48:01 AM11/7/09

to Django developers

I've implemented a nearly identical solution for this problem before,
and while it worked, it felt dirty.

It strikes me that the problem is to do with our "surrogate" primary
key ids, which don't relate to the data at all. For most models, this
is fine. The problem here is that the ids do not meaningfully identify
particular rows, and so we can't refer to them without psychic id-
guessing powers.

Wouldn't a simpler solution be to change ContentType and such to have
meaningful primary keys?

For ContentType, at least, this would require either creating a new
surrogate primary key that contained both the app and model names, or
else compound primary key support.

For you example, it would be something like

{

"pk": 1,
"model": "myapp.mymodel",
"fields": {
"name": "foobar",

"content_type": "otherapp.othermodel"
}
}

Or with a compound key,
{

"pk": 1,
"model": "myapp.mymodel",
"fields": {
"name": "foobar",

"content_type": ("otherapp","othermodel")
}
}

This would solve both the serialization and deserialization sides of
the problem.

Am I off the mark?

-Jim Meier

On Nov 5, 9:29 am, Russell Keith-Magee <freakboy3...@gmail.com> wrote:

Russell Keith-Magee

unread,

Nov 7, 2009, 7:19:36 PM11/7/09

to django-d...@googlegroups.com

On Sat, Nov 7, 2009 at 8:37 AM, Rob Madole <robm...@gmail.com> wrote:
>
> (accidentally sent this directly to Russell, he has offered to help me
> re-construct this where it belongs)

(and, repeating my response that got sent privately to Rob)

> I am under-the-covers-down-and-naughty familiar with this one.
>
> This one has caused pain for us. We had a situation very similar to
> what #7052 outlines and I ended up forking Django just to solve it and
> implementing the attached patch. So, from my selfish point of view I
> don't care where this one ends up I just want:
>
> # To not fork Django
> # To have my tests work

Fear not - my intention is that *something* will get into v1.2. I've
started this discussion to try and work out if there is anything
simple we can do for the serialization case, but if we can't come to a
consensus, I'm more than happy to commit just the Option 1,
deserialization only solution.

>> class Meta:
>> ...
>> dump_related = ('app_label','model')
>>
>> which indicates the two fields that should be used to construct the
>> lookup dictionary whenever a ContentType object is serialized.
>>
>> The problem with this approach is that hard-codes a single aspect of
>> serialization into the model. If someone has a different set of
>> requirements for serializing content types under particular
>> circumstances, they will be out of luck.
>>
>
> -1. But it DOES work. And it works really well. If I remember
> correctly, part of the work the patch does is alter the ContentType
> model to add the dump_related exactly as you mentioned.

Correct. If a Meta option was the best option, the patch on the ticket
is pretty much pret-a-porter. However, "it works" isn't really the
quality bar we're aiming for :-)

> I wonder how nasty it would be if ContentType and Permission used this
> in a very undocumented way until the major serialization re-write
> could be accomplished.

This is an interesting approach. My only hesitation is that, I'm not
sure how I feel about introducing a temporary option, not documenting
it, and then sorta-kinda encouraging people to use it. However, if we
approach the documentation process carefully, it might be manageable
(maybe introduce it into the docs, but marking it as deprecated from
the start)

>> Option 3: Add flags/arguments to the serializer to control dynamic dumping
>> --------------------------------------------------------------------------- ---------------------------------
>>
>> i.e.,
>> ./manage.py dumpdata myapp --format=json --indent=2
>> --lookup=contenttypes.contenttype(app_label,model)
>>
>> It might be possible to simplify this a little by saying that when
>> --lookup=contenttypes.contenttype is specfied, the first
>> unique_together tuple will be used to construct the lookup.
>>
>> This puts complete control in the hand of the user at serialization
>> time. However, the syntax isn't especially elegant, especially given
>> that every single serialization of contenttypes and permissions will,
>> in practice, need to use the --lookup argument.
>>
>
> My concern for this one is forgetting to add the option every time I
> dump the fixtures. (Which is quite often in our environment). I
> would end up writing my own management command to wrap this so I
> wouldn't forget.

This is also my primary concern with this technique.

> Let's say dump_related is out in the Model class, I agree this is not
> where this belongs. What if we had a slightly altered way of
> describing it. In it you could have something like this:
>
> from django.contrib.contenttypes.models import ContentType
> from django.core.serializers import register_serializer_style
>
> register_serializer_style(ContentType, related=('app_label',
> 'model',))
>
> I could put this one-liner anywhere. I would probably put it in my
> models.py right after the class definition, or I might put it in the
> project itself and leave it out of the app altogether. The important
> thing is it gets it out of the Model class definition Meta section.
> It could also be last-in wins, so if my way of serializing ContentType
> differs from the default (which might be built-in to contrib) I could
> change it somewhere else. I can't think of a good use case for why
> you would need to serialize it differently but Russell did mention it
> as a possibility.

And that's exactly what I've got in my mind's eye for Option 4 - the
complete rewrite. However, I don't want to commit to part of an API
without knowing what the rest of the API will look like.

> One feature that we've been playing with is using a Python module to
> seed our serialization. It looks something like this:
>
> from django.contrib.auth.models import *
> from myproject.models import *
> from myproject.utils.serializer import S #'S' for "seed"
>
> seed = (
> S(User.objects.filter(staff_status=false)),
> S(Group.objects.all()),
> S(Permission.objects.all()),
> S(BlogEntry.ojbects.all()[:200], recursive=False),
> )
>
> Our system is complex enough that we need to dump subsets of data,
> filter by date, and some other junk before we ultimately pass it to
> the serializer. I would like to see something like this added to the
> next version as well.

This sounds like it might be out of scope for v1.2 (simply because
this is a completely new idea that hasn't been discussed before).

Yours,
Russ Magee %-)

Russell Keith-Magee

unread,

Nov 7, 2009, 7:46:07 PM11/7/09

to django-d...@googlegroups.com

On Sat, Nov 7, 2009 at 1:48 PM, J Meier <jim-g...@dsdd.org> wrote:
>
> I've implemented a nearly identical solution for this problem before,
> and while it worked, it felt dirty.
>
> It strikes me that the problem is to do with our "surrogate" primary
> key ids, which don't relate to the data at all. For most models, this
> is fine. The problem here is that the ids do not meaningfully identify
> particular rows, and so we can't refer to them without psychic id-
> guessing powers.
>
> Wouldn't a simpler solution be to change ContentType and such to have
> meaningful primary keys?

There are three issues here.

Firstly, changing the definition of ContentType to avoid the problem
has been proposed in the past. However, this would be a *massive*
backwards incompatibility. Making this change isn't something we would
do lightly.

Secondly, the "right way" in a relational sense would be to have a
composite primary key over app_label and model. However, Django
doesn't currently support composite primary keys. So - in order to
follow this approach, we would need a finish a major rework of primary
key handling first.

Thirdly, that would solve the problem for ContentType, but not the
general problem. The general problem is that it is possible to
dynamically create data as part of a syncdb trigger, so any
autogenerated primary key isn't useful. Sure - we can change
ContentType to avoid the problem, but that doesn't make the problem go
away for the general case. It's easy to define a model for which an
autogenerated integer primary key is appropriate, which has
dynamically created syncdb content. These are the models that aren't
currently serializable.

> For ContentType, at least, this would require either creating a new
> surrogate primary key that contained both the app and model names, or
> else compound primary key support.
>
> For you example, it would be something like
>
> {
> "pk": 1,
> "model": "myapp.mymodel",
> "fields": {
> "name": "foobar",
> "content_type": "otherapp.othermodel"
> }
> }
>
> Or with a compound key,
> {
> "pk": 1,
> "model": "myapp.mymodel",
> "fields": {
> "name": "foobar",
> "content_type": ("otherapp","othermodel")
> }
> }
>
> This would solve both the serialization and deserialization sides of
> the problem.
>
> Am I off the mark?

Not entirely. The idea of having a surrogate primary key is
interesting - it essence it isn't that far from Option 2 in my
original list.

Option 2 introduced the idea of defining a 'dump_relation' key to Meta
to define the columns that should be used to define serialization. In
essence, this is defining the surrogate primary key for the model. In
most practical uses, this will be either a unique column, or a
unique_together tuple - which means that the grouping will effectively
be a surrogate primary key.

The syntactical difference in your proposal (i.e., 'myapp.mymodel',
rather than {'app_label': 'myapp', 'model': 'mymodel'} requires that
the model provide specific serialization/deserialization tools for
your model.

So if we said that if a model provides a get_surrogate_key() and the
model manager provides a get_instance_by_surrogate_key() method (not
happy with the names, but it's a bikeshed for the moment), the
serializer/deserializer will use those methods instead of to_python()
on the PK for (de)serializing the model.

The downside of this approach is exactly the same as it is for the
original Option 2 - it means you can only define one way to serialize
the model. However, this may not be such a problem as we're expressing
this in terms of surrogate keys, rather than a specific serialization
property that might change with the serialization technique.

And, again - this is actually compatible with Option 1. As syntactic
sugar, we can still include dictionary based lookup syntax if we want
to - we just won't need to provide query based serialization support.

Yours,
Russ Magee %-)

Reply all

Reply to author

Forward