Targetting SQLite versions

14 views
Skip to first unread message

Malcolm Tredinnick

unread,
May 15, 2006, 6:43:23 AM5/15/06
to django-d...@googlegroups.com
Michael Radjiez quite reasonably raised a problem with a recent change I
made to fix a bug with QuerySet.count() (it was previously ignoring any
"distinct" modifiers). The problem is that if you are using a version of
SQLite prior to version 3.2.6 (released in September 2005),
QuerySet.distinct().count() generates a query that is not understood by
your version of SQLite. :-(

So what is our policy on working around arguably broken databases? This
is a little trickier than "some MySQL installs don't do transactions",
since it isn't just ignored -- it crashes Django.

A couple of solutions (from my favourite to absolutely least
favourite):

(1) If you are using a sufficiently old version of SQLite, the SQL
produced by distinct().count() does not respect the "distinct" modifier.

GOOD:
- Code runs to completion. You always get an answer. It is, in
fact, the same as you got with Django last week. If you really
need distinct counting, upgrade SQLite.

- Analogous to the way we treat versions of MySQL without
transactions (and probably a reasonable approach).

BAD:
- You get different answers to other people in some queries. And
your answers are wrong (see the final point for why this might
be a problem).

- The tests are going to be a real pain to make work
conditionally like this (again, maybe the MYSql-transaction
approach might be reasonable: the test will always fail and you
will know it's because of your SQLite?!)

(2) Inside the SQLite backend, try to work out if the version is
sufficiently old and do the "count(distinct(..))" bit by hand by doing a
"SELECT DISTINCT..." and then counting the number of results (sadly, in
pysqlite2, cursor.rowcount is always -1 after a query, so can't cheat).

GOOD:
- you get the same (right) answer under all versions and so your
applications work the same, no matter which database is
installed.

BAD:
- it's going to be a little inefficient and very memory hungry
if you have a "bad" version of SQLite. Counting the results is
going to require calling len() on the queryset (or equivalent).
Very large result sets on a shared server will run out of
memory.

(3) We don't support QuerySet.distinct().count() on older versions of
SQLite. We can make the tests conditional, so that if you get the
"unsupported" error, it just informs you (raises an exception?). We
might even be able to make the tests always pass.

GOOD:
- no more accidental huge memory consumption. You are explicitly
told when you're in trouble.

- upgrading SQLite if you want things to work properly is really
not that big a deal (as compared to, say, PostgreSQL or MySQL),
since it's such a tiny, binary (36K when stripped on my machine)
linked to some very common libraries. Even a static version is
pretty small. It was originally designed to be built into
distributed packaged. So we are still not making it that hard
for people to use SQLite + Django.

BAD:
- if you have an older version of SQLite and want to run an app
that uses distinct().count(), you have to do some sysadmin work
(that is why the first version may be better -- at least the
code runs, if not necessarily very efficiently).

(4) We back out this change.

GOOD:
- Works the same for everybody again.

BAD:
- QuerySets now give wrong results and behave
counter-intuitively.

- Some not uncommon situations, such as pagination, become
problematic without manually coding SQL. A typical example is as
in the test suite: articles can be assigned to multiple
publications. You want to display all articles from a set of
publications. So you need distinct(). Trying to work out the
pagination, you obviously need distinct().count(), since count()
over-counts the number of articles (every article that is in
multiple magazines is counted multiple times) and would result
in page breaks coming too early.

Cheers,
Malcolm

Malcolm Tredinnick

unread,
May 15, 2006, 7:16:51 AM5/15/06
to django-d...@googlegroups.com
On Mon, 2006-05-15 at 20:43 +1000, Malcolm Tredinnick wrote:
> Michael Radjiez quite reasonably raised a problem with a recent change I
> made to fix a bug with QuerySet.count() (it was previously ignoring any
> "distinct" modifiers). The problem is that if you are using a version of
> SQLite prior to version 3.2.6 (released in September 2005),
> QuerySet.distinct().count() generates a query that is not understood by
> your version of SQLite. :-(

I may have omitted to be clear about the details here: in versions prior
to 3.2.6,

SELECT DISTINCT ...

and

SELECT COUNT(...) ...

both work. But

SELECT COUNT(DISTINCT(...)) ...

is considered illegal SQL. COUNT() was enhanced in 3.2.6 to support the
DISTINCT function.

Malcolm


Michael Radziej

unread,
May 15, 2006, 7:39:12 AM5/15/06
to django-d...@googlegroups.com
Hey Malcolm,

I was not aware that this behaviour changed between minor versions :-(
You're right: after updating from 3.2.1 to 3.2.8 no problems.

Ubuntu breeze currently provides 3.2.1, dapper will provide 3.2.8.

Malcolm Tredinnick wrote:
> (1) If you are using a sufficiently old version of SQLite, the SQL
> produced by distinct().count() does not respect the "distinct" modifier.

This will make finding related bugs really hard. There shouldn't be
hidden surprises when you switch from one database to another.


> (2) Inside the SQLite backend, try to work out if the version is
> sufficiently old and do the "count(distinct(..))" bit by hand by doing a
> "SELECT DISTINCT..." and then counting the number of results (sadly, in
> pysqlite2, cursor.rowcount is always -1 after a query, so can't cheat).

Hmm. Somehow I don't like that the behaviour should change like this,
and I also don't like Django fixing things that belong into the
database. There should be at least a warning on startup.

> (3) We don't support QuerySet.distinct().count() on older versions of
> SQLite. We can make the tests conditional, so that if you get the
> "unsupported" error, it just informs you (raises an exception?). We
> might even be able to make the tests always pass.

I'm all for this. I would like to see a test on startup within the
sqlite3 backend if count(disctinct(...)) is supported, and give a red
warning message, like

"Your database version does not provide the functionality to execute
count(disctinct(...)). If you use sqlite, updating to ... should help.
If you don't use count().distinct() in your code, there should be no
further problems."

Tests should fail, since the db does not provide the functionality, but
there might be a more helpful message, could be the same as above.

> (4) We back out this change.

Hmm. I personally don't like this. If this policy of the least common
denominator was applied to all databases, there wouldn't be transaction
control.

Just my 2c,

Michael

Reply all
Reply to author
Forward
0 new messages