inspectdb generates CharFields with 3x max_length

118 views
Skip to first unread message

Brian Morton

unread,
Oct 9, 2009, 12:40:37 PM10/9/09
to Django users
This is a very strange problem, so I thought I would post here and see
if anyone else had seen this problem.

I introspected a MySQL database with Python2.6 using Django SVN HEAD
and it produced my models as expected. However, all CharFields have
the max_length set to 3x the actual varchar field length in the db.
For example, all char(1) or varchar(1) fields were represented with a
max_length of 3. Has anyone ever seen this issue before?

Karen Tracey

unread,
Oct 9, 2009, 2:18:51 PM10/9/09
to django...@googlegroups.com

Searching the tracker reveals:

http://code.djangoproject.com/ticket/5725

I am not sure the situation is quite as complicated as it is thought to be in that ticket.  Discussion in the ticket seems to think that the "right" answer to return is going to be dependent on the table charset (actually, it would need to be column, since the charset can be set per-column). 

However, near as I can tell from a couple of brief experiments, that's not the case.  For a latin1 encoded table with varchar(50) column the value determined by inspectdb is 150.  Similarly, for a utf-8 encoded table with a varchar(32) column, the value determined by inspectdb is 96.  In both cases the value determined by inspectdb is 3x higher than the actual number of characters that can be stored in the column, no matter the cholumn's charset.

Where is the 3x factor coming from?  In the ticket it is mentioned that it's related to the connection charset being utf-8.  Switch the connection charset to latin1, and the numbers get reported properly (at least for latin1-encoded tables).

The number in question here is the internal_size element of the description returned by the connection cursor.  The value is defined by the Python DB API (http://www.python.org/dev/peps/pep-0249/) but I can find no good description of what it is supposed to be, exactly. 

MySQLdb (or underlying code it is using) appears to be implementing this as the maximum number of bytes that may be needed to hold a value returned from this column on this connection.  That is, since on the DB side the length specification (since MySQL 4.1) describes the number of characters that may be stored in the column, and since a character may require as many as 3 bytes in utf-8 encoding (MySQL does not implement 4-byte utf-8 support), some code somewhere is taking the max-length-in-characters value and multiplying it by 3 to come up with a maximum number of bytes that may be required to store a value from this column in the connection's charset.

Since Django is always going to set the connection charset to utf-8, and since inspectdb should be reporting character lengths, not byte lengths, it might be sufficient to take the internal_size value and divide by 3 to get character length values.  That might work so long as the underlying value returned by MySQLdb doesn't change, yet this page:

http://benanne.net/code/?p=352

states the value returned is wrong and there's a fix (without giving any details on how it is wrong, what the fix is, nor when it might appear in a release of MySQLdb).  And I don't have any more time to play with investigation on this...but if different versions of MySQLdb are going to be reporting different values here then fixing this in Django will be a it more complicated than unconditionally dividing by 3....though still not quite as bad as thought to be in the ticket, I don't think.

Karen

Brian Morton

unread,
Oct 9, 2009, 2:29:20 PM10/9/09
to Django users
Thanks for the informative reply. I searched the tracker and didn't
find anything about it, but I think I was not using the right terms.

I had a feeling it had to do with charset and MySQLdb, and I
definitely agree it is not a Django thing. I'll keep my eye on
Ubuntu's changelog for MySQLdb and post here and in the ticket if I
can figure anything out.

On Oct 9, 2:18 pm, Karen Tracey <kmtra...@gmail.com> wrote:

Karen Tracey

unread,
Oct 9, 2009, 2:43:37 PM10/9/09
to django...@googlegroups.com
On Fri, Oct 9, 2009 at 2:29 PM, Brian Morton <rokcl...@gmail.com> wrote:

Thanks for the informative reply.  I searched the tracker and didn't
find anything about it, but I think I was not using the right terms.


FYI I tend to use the Search tab on code.djangoproject.com and limit the results to Tickets.  I find that easier than getting the right Custom Query under View Tickets set up.  Search will find hits in both the ticket summary and the comments, and isn't case-sensitive, it's much more likely to find hits than the Custom Query option.  This search:

http://code.djangoproject.com/search?q=MySQL+inspectdb+length&noquickjump=1&ticket=on

is how I found the ticket for this issue.

Karen
Reply all
Reply to author
Forward
0 new messages