Non-primary key AutoField

2,863 views
Skip to first unread message

NewSpire

unread,
Feb 28, 2009, 11:27:58 PM2/28/09
to Django developers
In ticket #8576 a request was made to remove the restriction that
AutoFields must be primary keys so that multiple AutoFields could be
used in a model. In MySQL multiple auto-increment fields does not
make sense and probably does not make sense in any other database.
However, I do think the case can be made to remove the primary key
restriction on AutoFields.

First, this restriction is not there in the database. Auto-increment
fields do not have to be primary keys. Imposing this restriction
could cause a problem when trying to use legacy tables in a Django
environment. Is there a compelling reason for Django to impose this
restriction?

Second, is the use case where the primary key is not an AutoField but
the record insert order needs to be tracked. For instance, if the
primary key is a phone number or UUIDField it would be useful to also
have an AutoField so that records can be sorted in the order they were
inserted into the table. This is a real world problem I am faced with
right now.

Thanks!
Andy



Alex Myodov

unread,
Mar 1, 2009, 3:37:27 AM3/1/09
to Django developers
As a person that raised the problem initially, I can only add several
words to the current discussion thread on the issue: kmtracey
mentioned that "as I can find no mention of AutoField on the
referenced IRC page, so I'll just put it back to DDN", so, in fact my
explanation of the use case is present at the page
http://botland.oebfare.com/logger/django/2008/8/26/1/

Alex Myodov

unread,
Mar 1, 2009, 7:41:31 AM3/1/09
to Django developers
Well, according to "If there's some really strong use-case that's been
overlooked here, it can be brought up in a thread on django-dev", I
think I need to explain the feature a bit better.

Why we may ever need to have autofields being non-primary-keys?
To be able to use several autofields in a single model.

Why this?
To be able to have several conveniently auto-maintainable order
relations for the model instances.

When this may be needed? Why everything should not be derived from a
single AutoField?
Because sometimes we would like to be able to adjust the order
relations manually; therefore, in some cases the autofields will be
manually assigned!
Can the second order relation be derived from the first one inside the
model, like, in model's save() method? Not always; sometimes the logic
controlling these two order relations is external to the model.


We now have two different real-world use cases when this may be
needed. Or at least, when having this restriction removed would allow
to implement simpler solutions than with the restriction enabled.

One use case is mine; I have a model which was shared between two
different applications (and in fact, even between two different
sites). In my case, I have two AutoFields in a model, and one of them,
"id", was acting as a primary key; the other, "per_hosting_id", was
not, though!
Why? Because... seemingly, the values of both fields should always be
equal. But they were not. We can assign an arbitrary value to one of
the fields, and after that they start to differ from each other. This
is what should have happened in my applications: in one of the
applications (being set up on several hosts), both of the AutoFields
were acting automatically, being a unique primary key (id) and a clone
of a primary key (per_hosting_id); but then, from each of the
hostings, the data was synchronized to the second application. And on
the second application, only one autofield (id) was allowed to act
automatically; the value for the second autofield (per_hosting_id) was
generated manually - in my case, the generation formula was
(hosting_code * 0x1000000 + incoming_per_hosting_id).

That is, in short: in my case I was having a model which had two
AutoFields, and this model was shared between two applications, and
the two applications had different logic of autofield auto-assigning.
Seems like a pretty valid case.



The second use case is a NewSpire's one. I cannot fully agree with his
decision in "For instance, if the primary key is a phone number or
UUIDField it would be useful to also have an AutoField so that records
can be sorted in the order they were inserted into the table." As for
me, I would still made the AutoField a primary key, and just put a
"unique" requirement on the phone number/uuid field. But this made me
think of slightly different and more general use case when the two
AutoFields in a form would be useful.
Yes, this is again related to the situation when sometimes the auto
fields are assigned manually. But now, there is a different purpose
for it.
Autofields are good to specify some default ordering inside the table,
like "the order they were inserted into the table". But they also
should act as a primary key into the table, what may contradict to the
ordering functionality. What if we would ever want to adjust the
ordering manually, with the buttons like "move earlier"/"move later"?
Adjusting the primary key value directly is wrong, cause it may be
referred by some foreign keys, so we must adjust a clone of it -
obviously, some second autofield. Non-adjusted, it will defined the
"default ordering", which is equal to the ordering how the instances
were added.

That is the second use case. Hope my explanation is sufficient for the
measured decision. And hope I am not too garrulous.

On Mar 1, 7:27 am, NewSpire <newsp...@gmail.com> wrote:

Karen Tracey

unread,
Mar 1, 2009, 7:57:10 AM3/1/09
to django-d...@googlegroups.com
On Sun, Mar 1, 2009 at 7:41 AM, Alex Myodov <amy...@gmail.com> wrote:

Well, according to "If there's some really strong use-case that's been
overlooked here, it can be brought up in a thread on django-dev", I
think I need to explain the feature a bit better.

Why we may ever need to have autofields being non-primary-keys?
To be able to use several autofields in a single model.

I don't have time to read this right now because I am on my out the door, but another thing beyond a use case needed to get this reconsidered is a way to actually make it work.  The initial description contended that all that was needed was removal of two asserts in the current code.  That is not true, at least for a few of the backends.  MySQL, if I recall correctly, flat-out will not support this at all.  I dislike the idea of adding any (more) behavior that works differently across backends, so it would be nice to hear that I'm wrong on that, but I haven't heard it yet.  PostgresSQL and Oralce didn't work right out of the box either, though there may be other code added that could make them work.  But someone interested in making this work will need to do the legwork here to articulate what exactly would be needed to make this work across all supported backends, or at least which ones can support it and how.

Karen

NewSpire

unread,
Mar 1, 2009, 10:25:42 AM3/1/09
to Django developers
> MySQL, if I recall correctly,
> flat-out will not support this at all.

Correct! MySQL definitely does not support multiple auto-increment
fields. However, that is not what I'm talking about. I would just
like the primary key requirement removed from the AutoField. MySQL
definitely does not require that an auto-increment field be a primary
key. I've never used Postgres or Oracle, but I do not see any
indication that they require their sequence fields to be primary keys
either.

Let me shore up my use case a bit. In my application I cannot have
AutoField primary keys. My primary keys must be UUIDFields. This is
because my records are created in several satellite systems and then
later aggregated back to a central database. If I use AutoFields as
the primary key I would get key violations when the satellite tables
are aggregated into a single central table. This is because I would
have one record in every satellite system with id=1.

Andy

Alex Myodov

unread,
Mar 1, 2009, 11:05:37 AM3/1/09
to Django developers
> my records are created in several satellite systems

Ah, that is, the legacy database is involved. It makes the example
even more vivid - having a legacy primary key while still having the
convenience of a non-primary autofield.

I fully agree that we better should NOT add any backend-dependent
behaviour. This is not what the ticket suggests, in fact. The ticket
is about the unnecessary (in several legal cases) restriction, which
is not covered by the documentation yet (or at least was not, when the
ticket was raised) at all. So the idea is not to add any backend-
specificity - the idea is to remove an unnecessary (at least in some
backends) restriction instead, bringing the code back to the
documentation, and just refrain from specifying this behaviour.

NewSpire

unread,
Mar 1, 2009, 2:34:22 PM3/1/09
to Django developers
Now for some leg work.

The simple part is to remove the primary key assert in
AutoField.__init__.

The hard part is that django creates tables with multiple sql
statements. One for the table create and then one for each index.
MySQL requires that auto_increment fields must be indexed. Since
django does not include the indexes in the table create statement the
create fails with the following error.

(1075, 'Incorrect table definition; there can be only one auto column
and it must be defined as a key')

So for the MySQL backend the index creation would need to be moved to
the table create statement. I'll note that if the AutoField is
defined with unique=True the table create works. This is because the
index is created with the UNIQUE key on the field definition and not
in a separate CREATE INDEX statement.

Andy

Marty Alchin

unread,
Mar 2, 2009, 9:39:29 AM3/2/09
to django-d...@googlegroups.com
On Sat, Feb 28, 2009 at 11:27 PM, NewSpire <news...@gmail.com> wrote:
> First, this restriction is not there in the database.  Auto-increment
> fields do not have to be primary keys.  Imposing this restriction
> could cause a problem when trying to use legacy tables in a Django
> environment.  Is there a compelling reason for Django to impose this
> restriction?

Django is optimized for the common case. Sure, it may well be that
it's easy to accommodate uncommon cases as well, and when it's
reasonable, we should definitely do so, but the reason it hasn't been
up to this point is simply that not enough people need it. I highly
doubt any of the core developers were actively seeking to restrict
people from using non-primary-key auto-incrementing fields; Django was
simply written to address the vastly most common need.

> Second, is the use case where the primary key is not an AutoField but
> the record insert order needs to be tracked.  For instance, if the
> primary key is a phone number or UUIDField it would be useful to also
> have an AutoField so that records can be sorted in the order they were
> inserted into the table.  This is a real world problem I am faced with
> right now.

Call me crazy, but why would an AutoField help you with that? Since
you later explain that the records are created in satellite systems,
then aggregated, wouldn't you still run into problems ordering by
something that's generated in isolation? It sounds like what you
really want is a DateTimeField(default=datetime.datetime.now). That
way, you get orderable fields that can be combined from any number of
sources without worrying about collisions. Plus, you get the added
bonus of having a real-world value that you can use for more than just
ordering.

Mind you, I'm not trying to say that a non-pk AutoField is a bad idea
in itself. I'm just not convinced that this use case is a very good
justification.

-Gul

Ian Kelly

unread,
Mar 2, 2009, 3:25:10 PM3/2/09
to django-d...@googlegroups.com
On Sun, Mar 1, 2009 at 8:25 AM, NewSpire <news...@gmail.com> wrote:
>
>> MySQL, if I recall correctly,
>> flat-out will not support this at all.
>
> Correct!  MySQL definitely does not support multiple auto-increment
> fields.  However, that is not what I'm talking about.  I would just
> like the primary key requirement removed from the AutoField.  MySQL
> definitely does not require that an auto-increment field be a primary
> key.  I've never used Postgres or Oracle, but I do not see any
> indication that they require their sequence fields to be primary keys
> either.

The limitation in Oracle is in the backend, not in the database. To
support an AutoField, we create a sequence and a trigger, the names of
which are derived from the name of the table. To support multiple
AutoFields, the names would have to be distinct -- most likely they
would be derived from a combination of the names of the table and the
column. But for backward compatibility, the backend needs to continue
to generate the same sequence name when the model has only one
AutoField. That's not impossible, but the refactoring necessary to
meet both of those goals is not trivial.

Regards,
Ian

NewSpire

unread,
Mar 3, 2009, 11:19:32 AM3/3/09
to Django developers
> Django is optimized for the common case. Sure, it may well be that
> it's easy to accommodate uncommon cases as well, and when it's
> reasonable, we should definitely do so, but the reason it hasn't been
> up to this point is simply that not enough people need it.

I agree fully. I've only needed to do this a couple of times.

> Call me crazy, but why would an AutoField help you with that? Since
> you later explain that the records are created in satellite systems,
> then aggregated, wouldn't you still run into problems ordering by
> something that's generated in isolation? It sounds like what you
> really want is a DateTimeField(default=datetime.datetime.now). That
> way, you get orderable fields that can be combined from any number of
> sources without worrying about collisions. Plus, you get the added
> bonus of having a real-world value that you can use for more than just
> ordering.

I was hoping to spare us the gory detail.

Crazy Gul, ;-) the order of the records, in my case, is only
important in the context of a foreign in key. The records in this
table are grouped based on records in another table. I only need the
correct order for the group, not the entire table or aggregated
tables. Since entire groups are created on single satellite servers,
never spanning servers, the group order is maintained when the records
are aggregated. Using a datatime will not work if you are adding a
thousand records per second or if somebody or some thing "adjusts"
your system clock. It can happen...


Cheers,
Andy



Shai Berger

unread,
Mar 3, 2009, 2:35:36 PM3/3/09
to django-d...@googlegroups.com
on Sun, 1 Mar 2009 11:34 -0800 (PST) NewSpire wrote:

> I'll note that if the AutoField is defined with unique=True the table create
> works.

but On Mar 3, 6:19 pm, NewSpire <newsp...@gmail.com> wrote:
>
> Crazy Gul, ;-) the order of the records, in my case, is only
> important in the context of a foreign in key. The records in this
> table are grouped based on records in another table. I only need the
> correct order for the group, not the entire table or aggregated
> tables.

So -- you can't put a unique index on your auto-field.

There is one point that is unclear to me from the discussion so far -- are
your servers symmetric? Because if there are separate "sattelites"
and "aggregators", then you really have two different models: Satellites with
Auto pk=order & unique UUID, and aggregators with pk=UUID & unrestricted
order field.

Reply all
Reply to author
Forward
0 new messages