Problems with DatabaseCreation, table names, and index names

47 views
Skip to first unread message

Andrew Godwin

unread,
Mar 10, 2010, 5:55:09 PM3/10/10
to Django Developers
Hi all,

I'm posting this here, not as a ticket, as I'm not entirely sure it's
Django's problem, and so would like some input*.

[Base]DatabaseCreation has a method sql_indexes_for_field, which,
handily, returns the SQL indexes for a field. Less usefully, both
PostgreSQL and MySQL have limits on the length of table/index names (63
and 50 bytes by default, respectively).

Firstly, this means that if you have a very long model (table) name
(say, 64 characters), Django's syncdb thinks it's not in the database
(since your database has truncated the table name, and they now don't
match), and tries to add it every time you run syncdb. This is an issue
to itself, and should probably have either a more useful error or be
fixed, although it's not easy to detect this has happened.


Secondly, if you have an indexed field on that same model, or two
indexed fields with common prefixes on a model with a slightly shorter
name, the method will happily spit out SQL with index names that are
identical to the table/other index name when truncated to 63 characters,
meaning the indices aren't added.

The reason I came across this is because a South user has filed a
similar bug against South that I traced back to this behaviour (South
uses sql_indexes_for_field, as it means special GeoDjango indexes get
created as well). I have, however, replicated the bug with plain syncdb,
using the following Django model:

class
VeryLongModelNameThatHappensToBeOverSixtyThreeCharsLongEventually(models.Model):

my_slightly_less_long_field_name = models.IntegerField(db_index=True)
my_slightly_more_long_field_name = models.IntegerField(db_index=True)

The question is, is it even worth fixing? I'm tempted to conclude that
you're limited to shorter model/field names by your database (or use
db_table/db_column), but there's also the possibility of that method
using a much shorter way of generating the names (i.e. using the first
40 bytes or so, then appending an 8-byte hash of the rest). I can easily
write and submit a patch to change the function to use this behaviour -
we used to do it in South, before I started using the ``creation``
methods - but it's catering to such a small audience I'm not sure it's
worth it, and documenting this behaviour might be a better solution.

Andrew

* Providing it doesn't significantly impact fixing of actual Django
bugs, that affect more than 0.01% of the users.

James Bennett

unread,
Mar 10, 2010, 6:13:03 PM3/10/10
to django-d...@googlegroups.com
On Wed, Mar 10, 2010 at 4:55 PM, Andrew Godwin <and...@aeracode.org> wrote:
> The question is, is it even worth fixing? I'm tempted to conclude that
> you're limited to shorter model/field names by your database (or use
> db_table/db_column), but there's also the possibility of that method using a
> much shorter way of generating the names (i.e. using the first 40 bytes or
> so, then appending an 8-byte hash of the rest). I can easily write and
> submit a patch to change the function to use this behaviour - we used to do
> it in South, before I started using the ``creation`` methods - but it's
> catering to such a small audience I'm not sure it's worth it, and
> documenting this behaviour might be a better solution.

At first glance, my preferred solution is:

1. Document (probably in the database notes) the length constraints
for table names and how they affect index names.

2. Maybe a bit of a pony: have the backends know the maximum table
length for the DBs they handle. Then we can have syncdb and other
tools raise model-class validation errors and tell people to use
db_table to avoid the issue.

As for timing: I don't think this is critical enough for 1.2, but it's
something we should do and probably could go immediately on the 1.3
milestone.


--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Russell Keith-Magee

unread,
Mar 10, 2010, 8:05:54 PM3/10/10
to django-d...@googlegroups.com
On Thu, Mar 11, 2010 at 7:13 AM, James Bennett <ubern...@gmail.com> wrote:
> On Wed, Mar 10, 2010 at 4:55 PM, Andrew Godwin <and...@aeracode.org> wrote:
>> The question is, is it even worth fixing? I'm tempted to conclude that
>> you're limited to shorter model/field names by your database (or use
>> db_table/db_column), but there's also the possibility of that method using a
>> much shorter way of generating the names (i.e. using the first 40 bytes or
>> so, then appending an 8-byte hash of the rest). I can easily write and
>> submit a patch to change the function to use this behaviour - we used to do
>> it in South, before I started using the ``creation`` methods - but it's
>> catering to such a small audience I'm not sure it's worth it, and
>> documenting this behaviour might be a better solution.
>
> At first glance, my preferred solution is:
>
> 1. Document (probably in the database notes) the length constraints
> for table names and how they affect index names.
>
> 2. Maybe a bit of a pony: have the backends know the maximum table
> length for the DBs they handle. Then we can have syncdb and other
> tools raise model-class validation errors and tell people to use
> db_table to avoid the issue.

We have an incentive to fix it too -- #12977 points out that Django's
own test suite runs foul of this problem.

Just documenting the problem isn't really a solution, IMHO. MySQL is
the fly in the ointment here, because it doesn't raise errors for a
long index name: it raises a warning and truncates the name. If you
have two similarly named long fields... hilarity ensues. At the very
least, we should be raising validation errors. Preferably, this would
be at a consistent length (rather than a backend dependent length) so
that you don't accidentally write a model that is fine under Postgres,
but can't be synchronized under MySQL.

However, I'm not sure whether truncation+hashing would improve things.
Hashing has the advantage of allowing any name you want, but it does
result in some less-than-pretty database names. It also means
introducing a change in naming policy, and putting on my "hey, I used
to run a migration project too" hat -- changes in naming policy in the
base project were really annoying to track and handle.

If we do go with the validation approach, I would make one additional
suggestion. You can manually specify database table names, but you
can't (currently) manually specify index names. Given that long index
names are the bigger problem (since they are composed from database
table and column names), we might also need to introduce a syntax to
allow manual index naming. For example, we could allow
db_index='foobar' to manually specify an index name; db_index=True
would continue to use the default generated name.

> As for timing: I don't think this is critical enough for 1.2, but it's
> something we should do and probably could go immediately on the 1.3
> milestone.

Agreed.

Yours,
Russ Magee %-)

Andrew Godwin

unread,
Mar 11, 2010, 3:57:22 AM3/11/10
to django-d...@googlegroups.com
On 11/03/10 01:05, Russell Keith-Magee wrote:
> We have an incentive to fix it too -- #12977 points out that Django's
> own test suite runs foul of this problem.
>

Ah, that's not good.

> Just documenting the problem isn't really a solution, IMHO. MySQL is
> the fly in the ointment here, because it doesn't raise errors for a
> long index name: it raises a warning and truncates the name.

From my experiments, PostgreSQL is even worse; it will silently
truncate the name without even raising a warning. You only know it's
happened when you get to "relation already exists".

> If you
> have two similarly named long fields... hilarity ensues. At the very
> least, we should be raising validation errors. Preferably, this would
> be at a consistent length (rather than a backend dependent length) so
> that you don't accidentally write a model that is fine under Postgres,
> but can't be synchronized under MySQL.
>
> However, I'm not sure whether truncation+hashing would improve things.
> Hashing has the advantage of allowing any name you want, but it does
> result in some less-than-pretty database names. It also means
> introducing a change in naming policy, and putting on my "hey, I used
> to run a migration project too" hat -- changes in naming policy in the
> base project were really annoying to track and handle.
>

I don't particularly care about naming policy; South looks at the
database's meta info to work out what constraints there are on any given
column, it doesn't try and work out possible names, and anyone that does
otherwise should really steal my code!

> If we do go with the validation approach, I would make one additional
> suggestion. You can manually specify database table names, but you
> can't (currently) manually specify index names. Given that long index
> names are the bigger problem (since they are composed from database
> table and column names), we might also need to introduce a syntax to
> allow manual index naming. For example, we could allow
> db_index='foobar' to manually specify an index name; db_index=True
> would continue to use the default generated name.
>

That sounds like a good idea to me; the amount of people this affects is
so small, adding extra parameters probably won't be annoying for most
people.

>> As for timing: I don't think this is critical enough for 1.2, but it's
>> something we should do and probably could go immediately on the 1.3
>> milestone.
>>

I agree as well; I'll be happy to write the patch when we get around to
1.3 coding time.

Andrew

adball

unread,
Apr 22, 2010, 6:37:35 PM4/22/10
to Django developers
Like this? It's a bit ugly with the hardcoded 50, but if the database
backends would implement the max_name_length() method in their
DatabaseWrapper classes, it could be made smarter.

Index: django/db/backends/creation.py
===================================================================
--- django/db/backends/creation.py (revision 7107)
+++ django/db/backends/creation.py (working copy)
@@ -256,8 +256,14 @@
tablespace_sql = ''
else:
tablespace_sql = ''
+
+ pristine_index_name = '%s_%s' % (model._meta.db_table,
f.column)
+ index_name = pristine_index_name
+ if len(pristine_index_name) >= 50:
+ index_name = pristine_index_name[:40] +
self._digest(pristine_index_name)
+
output = [style.SQL_KEYWORD('CREATE INDEX') + ' ' +
- style.SQL_TABLE(qn('%s_%s' % (model._meta.db_table,
f.column))) + ' ' +
+ style.SQL_TABLE(qn(index_name)) + ' ' +
style.SQL_KEYWORD('ON') + ' ' +
style.SQL_TABLE(qn(model._meta.db_table)) + ' ' +
"(%s)" % style.SQL_FIELD(qn(f.column)) +


> Andrew
>
> * Providing it doesn't significantly impact fixing of actual Django
> bugs, that affect more than 0.01% of the users.

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Reply all
Reply to author
Forward
0 new messages