I'm posting this here, not as a ticket, as I'm not entirely sure it's
Django's problem, and so would like some input*.
[Base]DatabaseCreation has a method sql_indexes_for_field, which,
handily, returns the SQL indexes for a field. Less usefully, both
PostgreSQL and MySQL have limits on the length of table/index names (63
and 50 bytes by default, respectively).
Firstly, this means that if you have a very long model (table) name
(say, 64 characters), Django's syncdb thinks it's not in the database
(since your database has truncated the table name, and they now don't
match), and tries to add it every time you run syncdb. This is an issue
to itself, and should probably have either a more useful error or be
fixed, although it's not easy to detect this has happened.
Secondly, if you have an indexed field on that same model, or two
indexed fields with common prefixes on a model with a slightly shorter
name, the method will happily spit out SQL with index names that are
identical to the table/other index name when truncated to 63 characters,
meaning the indices aren't added.
The reason I came across this is because a South user has filed a
similar bug against South that I traced back to this behaviour (South
uses sql_indexes_for_field, as it means special GeoDjango indexes get
created as well). I have, however, replicated the bug with plain syncdb,
using the following Django model:
class
VeryLongModelNameThatHappensToBeOverSixtyThreeCharsLongEventually(models.Model):
my_slightly_less_long_field_name = models.IntegerField(db_index=True)
my_slightly_more_long_field_name = models.IntegerField(db_index=True)
The question is, is it even worth fixing? I'm tempted to conclude that
you're limited to shorter model/field names by your database (or use
db_table/db_column), but there's also the possibility of that method
using a much shorter way of generating the names (i.e. using the first
40 bytes or so, then appending an 8-byte hash of the rest). I can easily
write and submit a patch to change the function to use this behaviour -
we used to do it in South, before I started using the ``creation``
methods - but it's catering to such a small audience I'm not sure it's
worth it, and documenting this behaviour might be a better solution.
Andrew
* Providing it doesn't significantly impact fixing of actual Django
bugs, that affect more than 0.01% of the users.
At first glance, my preferred solution is:
1. Document (probably in the database notes) the length constraints
for table names and how they affect index names.
2. Maybe a bit of a pony: have the backends know the maximum table
length for the DBs they handle. Then we can have syncdb and other
tools raise model-class validation errors and tell people to use
db_table to avoid the issue.
As for timing: I don't think this is critical enough for 1.2, but it's
something we should do and probably could go immediately on the 1.3
milestone.
--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
We have an incentive to fix it too -- #12977 points out that Django's
own test suite runs foul of this problem.
Just documenting the problem isn't really a solution, IMHO. MySQL is
the fly in the ointment here, because it doesn't raise errors for a
long index name: it raises a warning and truncates the name. If you
have two similarly named long fields... hilarity ensues. At the very
least, we should be raising validation errors. Preferably, this would
be at a consistent length (rather than a backend dependent length) so
that you don't accidentally write a model that is fine under Postgres,
but can't be synchronized under MySQL.
However, I'm not sure whether truncation+hashing would improve things.
Hashing has the advantage of allowing any name you want, but it does
result in some less-than-pretty database names. It also means
introducing a change in naming policy, and putting on my "hey, I used
to run a migration project too" hat -- changes in naming policy in the
base project were really annoying to track and handle.
If we do go with the validation approach, I would make one additional
suggestion. You can manually specify database table names, but you
can't (currently) manually specify index names. Given that long index
names are the bigger problem (since they are composed from database
table and column names), we might also need to introduce a syntax to
allow manual index naming. For example, we could allow
db_index='foobar' to manually specify an index name; db_index=True
would continue to use the default generated name.
> As for timing: I don't think this is critical enough for 1.2, but it's
> something we should do and probably could go immediately on the 1.3
> milestone.
Agreed.
Yours,
Russ Magee %-)
Ah, that's not good.
> Just documenting the problem isn't really a solution, IMHO. MySQL is
> the fly in the ointment here, because it doesn't raise errors for a
> long index name: it raises a warning and truncates the name.
From my experiments, PostgreSQL is even worse; it will silently
truncate the name without even raising a warning. You only know it's
happened when you get to "relation already exists".
> If you
> have two similarly named long fields... hilarity ensues. At the very
> least, we should be raising validation errors. Preferably, this would
> be at a consistent length (rather than a backend dependent length) so
> that you don't accidentally write a model that is fine under Postgres,
> but can't be synchronized under MySQL.
>
> However, I'm not sure whether truncation+hashing would improve things.
> Hashing has the advantage of allowing any name you want, but it does
> result in some less-than-pretty database names. It also means
> introducing a change in naming policy, and putting on my "hey, I used
> to run a migration project too" hat -- changes in naming policy in the
> base project were really annoying to track and handle.
>
I don't particularly care about naming policy; South looks at the
database's meta info to work out what constraints there are on any given
column, it doesn't try and work out possible names, and anyone that does
otherwise should really steal my code!
> If we do go with the validation approach, I would make one additional
> suggestion. You can manually specify database table names, but you
> can't (currently) manually specify index names. Given that long index
> names are the bigger problem (since they are composed from database
> table and column names), we might also need to introduce a syntax to
> allow manual index naming. For example, we could allow
> db_index='foobar' to manually specify an index name; db_index=True
> would continue to use the default generated name.
>
That sounds like a good idea to me; the amount of people this affects is
so small, adding extra parameters probably won't be annoying for most
people.
>> As for timing: I don't think this is critical enough for 1.2, but it's
>> something we should do and probably could go immediately on the 1.3
>> milestone.
>>
I agree as well; I'll be happy to write the patch when we get around to
1.3 coding time.
Andrew