Searching the tracker reveals:
http://code.djangoproject.com/ticket/5725
I am not sure the situation is quite as complicated as it is thought to be in that ticket. Discussion in the ticket seems to think that the "right" answer to return is going to be dependent on the table charset (actually, it would need to be column, since the charset can be set per-column).
However, near as I can tell from a couple of brief experiments, that's not the case. For a latin1 encoded table with varchar(50) column the value determined by inspectdb is 150. Similarly, for a utf-8 encoded table with a varchar(32) column, the value determined by inspectdb is 96. In both cases the value determined by inspectdb is 3x higher than the actual number of characters that can be stored in the column, no matter the cholumn's charset.
Where is the 3x factor coming from? In the ticket it is mentioned that it's related to the connection charset being utf-8. Switch the connection charset to latin1, and the numbers get reported properly (at least for latin1-encoded tables).
The number in question here is the internal_size element of the description returned by the connection cursor. The value is defined by the Python DB API (
http://www.python.org/dev/peps/pep-0249/) but I can find no good description of what it is supposed to be, exactly.
MySQLdb (or underlying code it is using) appears to be implementing this as the maximum number of bytes that may be needed to hold a value returned from this column on this connection. That is, since on the DB side the length specification (since MySQL 4.1) describes the number of characters that may be stored in the column, and since a character may require as many as 3 bytes in utf-8 encoding (MySQL does not implement 4-byte utf-8 support), some code somewhere is taking the max-length-in-characters value and multiplying it by 3 to come up with a maximum number of bytes that may be required to store a value from this column in the connection's charset.
Since Django is always going to set the connection charset to utf-8, and since inspectdb should be reporting character lengths, not byte lengths, it might be sufficient to take the internal_size value and divide by 3 to get character length values. That might work so long as the underlying value returned by MySQLdb doesn't change, yet this page:
http://benanne.net/code/?p=352states the value returned is wrong and there's a fix (without giving any details on how it is wrong, what the fix is, nor when it might appear in a release of MySQLdb). And I don't have any more time to play with investigation on this...but if different versions of MySQLdb are going to be reporting different values here then fixing this in Django will be a it more complicated than unconditionally dividing by 3....though still not quite as bad as thought to be in the ticket, I don't think.
Karen