Opinions sought about PostgreSQL type casting

18 views
Skip to first unread message

Malcolm Tredinnick

unread,
Mar 21, 2008, 8:27:36 AM3/21/08
to django-d...@googlegroups.com
Ticket #6605 is causing me some grief. Actually, I know exactly what I
want to do with it (invalid/wontfix), but I suspect there will be
complaints, so I'd like to see what the consensus is. Originally, I
though it might be easily fixable, but the performance implications are
ugly.

The issue is that PostgreSQL 8.3 no longer automatically casts
non-character-type fields to text before using them in comparisons. This
is probably a Good Thing. However, some people were relying on the fact
that older versions did implicitly cast to TEXT before doing things like
"LIKE" comparisons. That ticket is asking that the feature be support in
Django for all version of PostgreSQL.

My opinion is that if your database stores numbers and you always want
them to be automatically treated as characters, your data model is
broken as designed. You should have been using a VARCHAR field in the
first place. If you need to sometimes treat things as text, you should
explicitly cast them (which means using a custom Django field).

The practical consideration here is that adding this extra processing
into lookup_type computations is not free. Every lookup for those
particular types of comparisons needs to check the database version (or
be an extra function call into the DatabaseOperations class) and then
work out the type of the value (or field, which is information we don't
currently pass down that level) and *maybe* do the case. I'm far from
convinced we should be paying this penalty on every single
comparison-like lookup just for the relatively few who need this and
don't want to write a custom field for it. We don't just want to cast
every field for those lookup types, because that has an overhead at the
database level.

So obviously I don't like this idea (it started out as me feeling merely
uncomfortable, but now that I've tried to implement it, I actively
dislike the idea). If we just say "you're lucky it worked up to now and
that was unintended" it isn't a showstopper for anybody. A custom model
field can happily define its own db_prep_lookup function to handle this.
So I feel this isn't necessarily our problem and the costs to many
outweigh the small benefits of the few.

Thoughts?

Malcolm

--
Why be difficult when, with a little bit of effort, you could be
impossible.
http://www.pointy-stick.com/blog/

Mike H

unread,
Mar 21, 2008, 9:23:26 AM3/21/08
to django-d...@googlegroups.com

On 21 Mar 2008, at 12:27, Malcolm Tredinnick wrote:
> My opinion is that if your database stores numbers and you always want
> them to be automatically treated as characters, your data model is
> broken as designed. You should have been using a VARCHAR field in the
> first place. If you need to sometimes treat things as text, you should
> explicitly cast them (which means using a custom Django field).


My opinion, for what it's worth, is the same. In the case given in the
ticket, I'd have a CharField with unique=True. Depending on how many
records you have, casting to a varchar and using LIKE is going to be
horribly slow anyway. I'd much prefer that if I tried to do a
substring search on an int field, I got an error so I could adjust my
model, no matter what database backend I was using. Although that's
probably an even more controversial thing to say.

Mike

Russell Keith-Magee

unread,
Mar 27, 2008, 6:16:23 AM3/27/08
to django-d...@googlegroups.com
On Fri, Mar 21, 2008 at 9:27 PM, Malcolm Tredinnick
<mal...@pointy-stick.com> wrote:

Apologies for taking so long to get back to you on this one - it's
been sitting in my "I really must look into that" pile over Easter.

> My opinion is that if your database stores numbers and you always want
> them to be automatically treated as characters, your data model is
> broken as designed. You should have been using a VARCHAR field in the
> first place. If you need to sometimes treat things as text, you should
> explicitly cast them (which means using a custom Django field).

I agree. A string is a string, a number is a number. If you need a
number to be treated like a string, it isn't a number - it's a string.
I don't see any value in encouraging suspect design decisions,
especially if it introduces implementation headaches.

Yours,
Russ Magee

Karen Tracey

unread,
Mar 27, 2008, 5:28:29 PM3/27/08
to django-d...@googlegroups.com
On Fri, Mar 21, 2008 at 8:27 AM, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

Ticket #6605 is causing me some grief. Actually, I know exactly what I
want to do with it (invalid/wontfix), but I suspect there will be
complaints, so I'd like to see what the consensus is. Originally, I
though it might be easily fixable, but the performance implications are
ugly.

The issue is that PostgreSQL 8.3 no longer automatically casts
non-character-type fields to text before using them in comparisons. This
is probably a Good Thing. However, some people were relying on the fact
that older versions did implicitly cast to TEXT before doing things like
"LIKE" comparisons. That ticket is asking that the feature be support in
Django for all version of PostgreSQL.

It seems there is some code relying on this behavior in Django: ticket #6523 (http://code.djangoproject.com/ticket/6523) is a result of this PostgreSQL change as well.  The code that's trying to do LIKE comparisons between integers is Django's old manipulator code, specifically the code that validates unique_together: http://code.djangoproject.com/browser/django/trunk/django/db/models/manipulators.py#L280

(From a brief scan of that file the manipulator_validator_unique_for_date may be doing the same thing.)

From the discussion here I'm guessing the patch proposed in #6523 (change mapping for iexact on PostgreSQL from 'ILIKE %s' to '::text ILIKE %s' isn't the way to go?  An alternative approach would be to change the manipulator code to use exact instead of iexact -- it isn't clear to me why it is doing case-insensitive comparisons here?  I know the old manipulator code is on its way out, but there are 3-4 people who have hit this and commented in the ticket, so I thought it was worth mentioning. 

Karen

Karen Tracey

unread,
Mar 27, 2008, 6:03:10 PM3/27/08
to django-d...@googlegroups.com

OK, I see there is already a ticket on this manipulator using exact instead of iexact: #4562.  From the comments it sounds like that may cause a problem with the default (case-insensitive) collation on MySQL.  So admin won't catch what MySQL will consider a duplicate, and the user will see an IntegrityError or server error instead of a nice admin message.  So neither iexact nor exact work perfectly for everyone.

Sorry, didn't mean to hijack this thread and go off on a tangent.  FWIW I agree in principle that if you've got code comparing apples to oranges it ought to be responsible for ensuring they are comparable.  But I wonder if there isn't other code like this in the manipulator that was relying on the old behavior without even realizing it.

Karen

Reply all
Reply to author
Forward
0 new messages