Contenttype Generation Inconsistency During Serialization


james...@gmail.com

Feb 11, 2009, 2:48:40 PM
to Django developers
There is a small roadblock that makes contenttypes a little dangerous
to use during application development, especially when serializing
your data to move it between databases. During syncdb the
contenttypes are generated in a way that makes regeneration at a later
date inconsistent with the previously generated primary keys.

The contenttype IDs can be different depending on when your syncdb was
run in the development of your application. In addition, loaddata and
dumpdata are prevented from working correctly if the contenttypes have
already been created (integrity errors).
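
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not Django's actual syncdb code: the function name sync_content_types and the app/model names are hypothetical, and the only assumption is that each newly seen model receives the next free integer ID.

```python
def sync_content_types(existing, installed_models):
    """Mimic syncdb: give each not-yet-registered model the next free integer ID."""
    table = dict(existing)
    next_id = max(table.values(), default=0) + 1
    for key in installed_models:
        if key not in table:
            table[key] = next_id
            next_id += 1
    return table

ALL_MODELS = [("auth", "user"), ("blog", "article"), ("comments", "comment")]

# Old database: only the blog app existed at the first sync; the rest came later.
old_db = sync_content_types({}, [("blog", "article")])
old_db = sync_content_types(old_db, ALL_MODELS)

# Fresh database: every model is registered in a single pass.
new_db = sync_content_types({}, ALL_MODELS)

# The same model ends up with a different ID in each database.
print(old_db[("blog", "article")], new_db[("blog", "article")])  # prints "1 2"
```

Any row that stored content type 1 in the old database would point at a different model after being loaded into the new one.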

My usage pattern is this: during application development I design
all of my data models before adding any explicit indexes. After the
models are complete and data is loaded, I analyze the access patterns
and index accordingly. Since Django has no way of syncing indexes, my
approach is to dump the data to a JSON file, drop the database, and
use syncdb to create the canonical copy.

My experience was as follows:
1st Try:
1. Dump data from old db using management command (dumpdata)
2. Drop DB and use django to create the database via syncdb
3. Load data using management command on the new database
4. Become irritated with integrity errors while the load tries to
import the contenttype table which already exists.
2nd Try:
1. Dump data from old db using management command (dumpdata),
excluding the contenttype table (-e contenttypes)
2. Drop DB and use django to create the database via syncdb
3. Load data using management command on the new database
4. Realize all data is completely useless since the contenttype PKs
are not connected to the same models as before.
3rd Try:
1. Dump data from old db using management command (dumpdata)
2. Drop DB and use django to create the database via syncdb
3. Truncate contenttype table
4. Load data using management command on the new database


Possible solution that doesn't suck a lot:
I came up with quite a few different ways to handle this, but the best
so far (even though it's not stellar) is to create a new column in
contenttypes that's a combined column. The combined column would
contain the app_label and model_name.

GenericForeignKey could use the combined column instead of the PK to
keep the references pointing to the same locations. I understand there
are some performance implications here, but it's the best I can come
up with. I would love to hear thoughts on this topic.
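
As a rough illustration of the combined-column idea, here is a plain Python sketch with hypothetical names. The dotted "app_label.model_name" format is an assumption, since the proposal does not specify one, and the dictionaries merely stand in for the contenttypes table of two databases.

```python
def combined_key(app_label, model_name):
    """Build the value of the proposed combined column (the format is assumed)."""
    return f"{app_label}.{model_name}"

# A generic reference as stored today: (content type PK, object PK).
# The integer 7 only means something inside the database that issued it.
reference_by_pk = (7, 42)

# The same reference stored with the combined column instead of the PK.
reference_by_key = (combined_key("blog", "article"), 42)

# Two databases that assigned different PKs to blog.article.
old_db = {"blog.article": 7, "auth.user": 3}
new_db = {"blog.article": 2, "auth.user": 1}

# The key-based reference can be resolved in either database.
key, object_id = reference_by_key
print(old_db[key], new_db[key])  # prints "7 2"
```

The trade-off mentioned above still applies: a string comparison replaces an integer one on every generic lookup.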

Eric Holscher

Feb 11, 2009, 3:45:06 PM
to django-d...@googlegroups.com
I have run into this problem as well, and have come up with a basic solution (for content types). The code is at http://dpaste.com/119487/ and is implemented as a serializer, which you plug into Django and then use for serialization and deserialization of models with content types.

It is rather simple (only about 10 lines of additional code). When it is dumping data, it checks whether the field it is dumping is a content type, and if so, it dumps a dictionary of app_label and model instead of the ID. Then, when the fixture is loaded back in, it runs a query against the content types using that dictionary and plugs the result in for the content type.

This fixes the problem of content types being an ID, and the IDs not matching when you move across databases (your try #2).
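
The dump/load round trip can be sketched in plain Python as follows. This is not the code from the paste above and it does not use Django's serializer API; dump_record, load_record, and the two lookup tables are hypothetical stand-ins for the serializer hooks and for the content type table in each database.

```python
# Hypothetical content type tables: (app_label, model) -> primary key.
OLD_CONTENT_TYPES = {("auth", "user"): 2, ("blog", "article"): 1}
NEW_CONTENT_TYPES = {("auth", "user"): 1, ("blog", "article"): 2}

def dump_record(record, content_types):
    """Replace the integer content_type ID with an app_label/model dictionary."""
    by_pk = {pk: key for key, pk in content_types.items()}
    app_label, model = by_pk[record["content_type"]]
    dumped = dict(record)
    dumped["content_type"] = {"app_label": app_label, "model": model}
    return dumped

def load_record(dumped, content_types):
    """Look the dictionary up in the target database and restore an integer ID."""
    ref = dumped["content_type"]
    loaded = dict(dumped)
    loaded["content_type"] = content_types[(ref["app_label"], ref["model"])]
    return loaded

# A row pointing at blog.article, which has ID 1 in the old database.
tag = {"name": "django", "content_type": 1, "object_id": 42}

fixture = dump_record(tag, OLD_CONTENT_TYPES)
restored = load_record(fixture, NEW_CONTENT_TYPES)

print(fixture["content_type"])   # {'app_label': 'blog', 'model': 'article'}
print(restored["content_type"])  # 2, the ID of blog.article in the new database
```

The fixture never contains a database-specific ID for the content type, so it no longer matters in which order the target database created its content types.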

I have also been working on a more generic solution to this problem. I have a copy of it on github(1). The approach taken there is similar. When it loads a ForeignKey field to be serialized, it checks the related model (the one being pointed to) for any unique constraints. If any of these exist, then the model is dumped as a dictionary of kwargs containing the key/value pair for these unique constraints.
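
A minimal sketch of that generic approach, again in plain Python rather than the actual serializer linked below. The helper names natural_reference and resolve_reference, and the "username" example, are hypothetical; rows are plain dictionaries standing in for model instances.

```python
def natural_reference(related_row, unique_fields):
    """Return lookup kwargs built from the unique fields, or None if there are none."""
    if not unique_fields:
        return None  # the caller falls back to dumping the plain primary key
    return {name: related_row[name] for name in unique_fields}

def resolve_reference(kwargs, rows):
    """Find the single row matching kwargs and return its primary key."""
    matches = [row for row in rows if all(row[k] == v for k, v in kwargs.items())]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match for {kwargs}, found {len(matches)}")
    return matches[0]["pk"]

# Hypothetical related model with a unique "username" field.
old_users = [{"pk": 5, "username": "alice"}, {"pk": 6, "username": "bob"}]
new_users = [{"pk": 1, "username": "bob"}, {"pk": 2, "username": "alice"}]

dumped = natural_reference(old_users[0], unique_fields=["username"])
print(dumped)                                # {'username': 'alice'}
print(resolve_reference(dumped, new_users))  # 2
```

Raising when the lookup does not match exactly one row keeps a fixture from silently attaching data to the wrong object.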

The content type model doesn't define app_label and model as unique, which is a problem for this approach. If this ever gets into django core, it's going to require a special case for content type things (or some other approach which I haven't thought of). Having references to contrib apps is frowned upon, so I think having a third party serializer that does this is the answer for now.

Hope this helps

1. http://github.com/ericholscher/sandbox/blob/d32da8c36f257bb973a5c0b0fd8f9bca79062f11/serializers/yamlfk.py

--
Eric Holscher
Web Developer at The World Company in Lawrence, Ks
http://ericholscher.com
er...@ericholscher.com

james...@gmail.com

Feb 11, 2009, 5:59:03 PM
to Django developers
This is a great solution; when I wrote this post I was sure no one had
really run into the problem. I will use this for serializing my DB in
the future. Though the last paragraph of your reply states that
content type doesn't define app_label and model as unique, I believe
they are unique now; at least mine appears to have a unique
constraint on them right away.

This is still a legitimate issue during serialization; it's great to
see someone has taken steps in the right direction.


Russell Keith-Magee

Feb 11, 2009, 6:56:54 PM
to django-d...@googlegroups.com
On Thu, Feb 12, 2009 at 4:48 AM, james...@gmail.com
<james...@gmail.com> wrote:
>
> There is a small road block that makes contenttype a little dangerous
> to use during application development. Especially in regards to
> serializing your data to different databases. During syncdb the
> contenttypes are generated in a way that makes regeneration at a later
> date inconsistent with the previously generated primary keys.
>
> The contenttype IDs can be different depending on when your syncdb was
> run in the development of your application. In addition, loaddata and
> dumpdata are prevented from working correctly if the contenttypes have
> already been created (integrity errors).

This is a well-known, well-understood problem with at least one
solution that has been designed, but not implemented:

http://code.djangoproject.com/ticket/7052

In short, the solution I have historically preferred is to modify the
serialization language to allow queries to take the place of literal
primary keys - that way, you can ask in a fixture for "the article
content type", rather than content type 37.

However, I am open to other suggestions. Eric Holscher has been
working in this area recently, and he has made some interesting
progress with some slightly different approaches.

> Possible solution that doesn't suck a lot:
> I came up with quite a few different ways to handle this, but the best
> so far (even though it's not stellar) is to create a new column in
> contenttypes that's a combined column. The combined column would
> contain the app_label and model_name.

This has been proposed in the past, but is problematic because it is
backwards incompatible. There is a very large existing codebase that
uses the current implementation of ContentType; changing this model
would be non-trivial.

Yours,
Russ Magee %-)

Eric Holscher

Feb 11, 2009, 7:13:47 PM
to django-d...@googlegroups.com
On Wed, Feb 11, 2009 at 4:59 PM, james...@gmail.com <james...@gmail.com> wrote:

> This is a great solution; when I wrote this post I was sure no one had
> really run into the problem. I will use this for serializing my DB in
> the future. Though, the last paragraph of your reply states that
> content type doesn't define app_label and model as unique. I believe
> that this is true now, at least mine appears to have a unique
> constraint on it right away.

Ah yes, you're right. I was looking at the model definition, but they're declared unique in the Meta unique_together.

> This is still a legitimate issue during serialization, it's great to
> see someone has made steps in the right direction.

Glad it's been helpful. I want to turn this into a more generic solution, and hopefully get part of it into Django or a real third-party app.
 
er...@ericholscher.com