Nay. :-)
Or, at least, not until you demonstrate how the same syntax is
interpreted in Postgres, SQLite, Oracle, any other SQL database, and
potentially on MongoDB, Cassandra, BigTable, or other noSQL datastore.
This was the biggest problem when I closed #11003 originally; it
remains the biggest problem. I'm not in favor of adding an ORM
extension that is only useful for MySQL's query optimizer. That
defeats the purpose of having a cross-database ORM syntax. If you are
at the point of optimizing your system and you discover that you have
a dataset and query that actually require database-specific fine
tuning, I don't see it as an unreasonable requirement that you crank
out some database-specific raw SQL.
Django's ORM isn't intended to be perfect for all situations. It's
intended to make the 80% case trivial so you can bootstrap a site
without having to worry about the little details. Once you've hit that
point and you're tuning performance, you're going to need to pay close
attention to the little details like exactly how your site uses SQL.
Attempting to find a way to express every possible SQL statement,
including backend-specific optimizations, into an ORM syntax is folly.
After all, we already have a syntax that is able to express every
possible SQL statement... it's called SQL :-)
That said -- I'm not fundamentally opposed to the idea of introducing
hinting tools to the ORM, as long as:
* they are demonstrably rich enough to get the job done,
* have an interpretation that is compatible with the cross-platform,
reusable-app message that underpins Django's design philosophy.
To that end, the 'with_hint()' proposal appeals to me a lot more than
the previous suggestion of Yet Another Extra Argument. However, the
patch as provided:
1) Doesn't contain tests or docs
2) Will break queries if the user includes the with_hint() clause on
any database other than MySQL
3) Even then, only supports the USE INDEX keyword, not FORCE INDEX or
IGNORE INDEX
4) Doesn't suggest how hinting strategies would be implemented on a
database other than MySQL
If you can come up with answers to these points, I might get
interested. 1 and 2 are fairly trivial; I can think of some obvious
answers for 3, but 4 is the big problem, and will require some serious
research and consideration.
I would also add that if this is a query extension that you really do
want, it's something that should be possible to add by writing a
custom database backend that extends Django's MySQL backend and
provides a custom Query base class -- and if this can't be done
relatively easily, then a refactoring of the sql.Query or SQLCompiler
interface to allow easier query rewriting by a backend *is* something
I could get behind very easily.
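As a rough illustration of the kind of rewriting such a backend could do, the compiler's generated SQL could be post-processed before execution. This is only a sketch; add_index_hint is a hypothetical helper, not Django API:

```python
# Hypothetical helper -- not Django API -- showing the kind of string-level
# rewrite a custom backend's compiler could apply to generated SQL.

def add_index_hint(sql, table, hint):
    """Insert a MySQL-style index hint directly after a table reference."""
    target = "FROM `%s`" % table
    return sql.replace(target, "%s %s" % (target, hint), 1)

sql = "SELECT `book`.`id` FROM `book` WHERE `book`.`title` = %s"
print(add_index_hint(sql, "book", "USE INDEX (title_idx)"))
# -> SELECT `book`.`id` FROM `book` USE INDEX (title_idx) WHERE `book`.`title` = %s
```

A real backend would hook this in at the compiler level rather than rewriting strings after the fact, but the shape of the transformation is the same.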
Yours,
Russ Magee %-)
A very big aye from here. For a very high traffic project backed by 20+ DB servers, not being able to hint inline was the #1 reason we had to write queries by hand for the public-facing portions, and it was a noticeable percentage of our total public queries (easily >10%).
George
> I'd be happy to use raw(); but then you lose len(), slicing,
> pagination, filter chaining, etc. My problem came about because admin
> change lists were unusably slow on large tables. With_hints allowed a
> simple monkey patch to admin, dropping 2s per page down to ~0.01s.
So the underlying problem is: how to rewrite a query (whether generated
by my code, someone else's code, or a mixture) so that it performs
better on my database. There will always be third party code which
generates queries that are really bad for performance, depending on your
project or your database. This also applies to tickets like
http://code.djangoproject.com/ticket/11604
This makes me think that what we need is a mechanism to do an equivalent
to your monkey patching that is easier and more general. Some kind of
"replace the auto-generated SQL with this SQL" method could be really
useful.
Could this be solved by writing your own cursor wrapper which checks the
SQL against a list and rewrites as necessary?
(You could set it up quickly like this:
http://chris-lamb.co.uk/2010/06/01/appending-request-url-sql-statements-django/
That still involves monkey patching, but it is possible to do the same
thing by writing a database backend).
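A minimal sketch of such a rewriting wrapper (CursorRewriter is a made-up name, not part of Django), including the debug reporting mentioned below:

```python
# Sketch of a rewriting cursor wrapper: check each SQL statement against a
# replacement table and swap in hand-tuned SQL. Hypothetical, not Django API.

class CursorRewriter(object):
    def __init__(self, cursor, replacements, debug=False):
        self.cursor = cursor
        self.replacements = replacements  # {generated_sql: tuned_sql}
        self.debug = debug

    def execute(self, sql, params=()):
        if sql in self.replacements:
            if self.debug:
                print("rewrote: %s" % sql)
            sql = self.replacements[sql]
        return self.cursor.execute(sql, params)

    def __getattr__(self, name):
        # Delegate everything else (fetchall, close, ...) to the real cursor.
        return getattr(self.cursor, name)
```

The wrapper would be installed by the custom backend (or monkey patch) so that every generated statement passes through `execute()`.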
The major drawback is fragility - doing a replacement at this level
could easily break, for example if you added a field to a model.
Adequate testing could catch this if your cursor wrapper had a debug
mode that could report what was replaced.
If we tried to do the replacement at a higher level, then you run into
the problem of how to recognise a query and replace it, which could be a
much more expensive operation if we are trying to compare objects of
type django.db.models.sql.query.Query, for example, and you also might
have the problem of not having full access to SQL statements.
However, if this method does work, it would be good to document how to
do it, and provide some better builtin hooks if necessary.
Luke
--
"We may not return the affection of those who like us, but we
always respect their good judgement." -- Libbie Fudim
Luke Plant || http://lukeplant.me.uk/
Yes; as implemented, __search is MySQL specific, and the API is
covered by our backwards compatibility policy, so we will continue to
support it. However, that doesn't mean it should be used as an example
or excuse for other database-specific features in the ORM.
The only explanation I can give for this particular feature is age.
The __search keyword was added over 4 years ago (Revision r3073,
ticket #573). Django was still in a period of rapid growth at that
time. Some of the features that were added at that time might not
survive if they were proposed today (either because of hindsight, or
because of changes in the perceived role of the ORM over time).
If a fulltext search operator were proposed today, it might still be
accepted -- after all, a query of text data using a rich full-text
search syntax is something that transcends SQL and could be considered
a primitive operation on object data. However, before I would accept
such a proposal, I would need to be convinced that the proposed syntax
would be supported on other backends. Postgres 8.3, for instance, has
fulltext search indexing capabilities. I would also need to be
convinced that the proposed syntax was rich enough to potentially
cover the range of features provided by MySQL. As it stands, we only
support queries "IN BOOLEAN MODE", not "IN NATURAL LANGUAGE MODE" or
"WITH QUERY EXPANSION".
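For reference, the MySQL backend implements the lookup as a fixed SQL template in its operators map; other backends simply have no 'search' entry, which is why the same query fails elsewhere:

```python
# The MySQL backend's operator template for the __search lookup:
mysql_search_template = 'MATCH (%s) AGAINST (%%s IN BOOLEAN MODE)'

# So a query like Entry.objects.filter(body__search='+django -jazz')
# compiles (roughly) to a WHERE clause of:
print(mysql_search_template % '`entry`.`body`')
# -> MATCH (`entry`.`body`) AGAINST (%s IN BOOLEAN MODE)
```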
Yours,
Russ Magee %-)
I would like to know how you're validating your assertion that MySQL
is the most used backend. It doesn't match my experience or
observation.
The fact that this is a MySQL-specific issue is perhaps the biggest
impediment to my enthusiasm about this specific feature (as proposed).
I've spent enough time in the past covering for the inadequacies of
MySQL. I don't particularly relish the thought of adding more time to
that particular ledger. :-)
> Adding with_hints()
> will only serve to encourage support from other backends too, where
> possible.
>
> Maybe we can make the with_hints() syntax more generic with both
> common and backend-specific kwargs --
>
> .with_hints(index='my_index') (string implies index on queryset base
> model)
> .with_hints(index={Model:'my_index', RelatedModel:'index2'})
> (dictionary allows defining index on a per-model basis)
>
> So the Oracle backend, for example, could implement --
>
> .with_hints(hash=True, index={Model:'my_index',
> RelatedModel:'index2'})
>
> On Oracle there are dozens of possible hints, so I'd say unsupported
> kwargs could simply be ignored. This would ensure seamless database
> portability.
Having an implementation ignore kwargs is the obvious approach to
providing this feature cross platform. The issue for me is what
exactly the kwargs should be, and how they would be interpreted. It
isn't clear to me how 'index="my_index"' maps to the application of
USE INDEX, IGNORE INDEX or FORCE INDEX in a query, or how the
dictionary syntax would map to the situation where you have two
appearances of a given table in a query.
Yours,
Russ Magee %-)
If you are write heavy, there's another issue that I imagine would
bring significant gains on a MyISAM setup of that proportion... and
possibly other databases to a lesser degree (the locking mechanisms
with MyISAM are different).
Basically, right now the ORM will UPDATE or INSERT based off of a
SELECT query it does when you hit save(). So even if the ORM should
already know it's an UPDATE it will run this logic and these queries.
There's "force_update=True" which I imagine would skip the SELECT, but
I was looking into it because the logic here has some other
side-effects as well.
One of the side effects is that if you change a PK and try to save,
you end up with two rows. I guess the PK probably shouldn't be
mutable, or we can fix that logic and possibly see some gains in
speed.
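A simplified sketch of the decision logic just described (not Django's actual save() code; choose_write is illustrative):

```python
# Simplified sketch -- not Django's actual save() code -- of the write path
# described above: save() does a SELECT first unless told not to.

def choose_write(pk_is_set, row_exists, force_insert=False, force_update=False):
    """Return which statement save() would issue.

    row_exists stands in for the result of the extra SELECT; with
    force_update=True that round-trip is never needed.
    """
    if force_update:
        return "UPDATE"
    if force_insert or not pk_is_set:
        return "INSERT"
    return "UPDATE" if row_exists else "INSERT"

# Change the PK of a loaded object and save: the SELECT on the new PK
# finds no row, so a second row is INSERTed -- the side effect noted above.
print(choose_write(pk_is_set=True, row_exists=False))  # -> INSERT
```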
I have some tests somewhere to try and illustrate how MyISAM locks,
and there might be a couple open tickets relating to side effects...
if there's any interest... kind of waiting until things settle with
the refactor, as I have some ideas I want to try out.
-k
Yes - this is a well known limitation of MyISAM tables. This is a
situation where you need to pick the right tool for the job; if you're
on a write heavy site, MyISAM probably isn't going to be a good
choice.
> Basically, right now the ORM will UPDATE or INSERT based off of a
> SELECT query it does when you hit save(). So even if the ORM should
> already know it's an UPDATE it will run this logic and these queries.
>
> There's "force_update=True" which I imagine would skip the SELECT, but
> I was looking into it because the logic here has some other
> side-effects as well.
>
> One of the side effects is that if you change a PK and try to save,
> you end up with two rows. I guess the PK probably shouldn't be
> mutable, or we can fix that logic and possibly see some gains in
> speed.
If you've got any ideas on how to optimize the object saving process,
we're happy to entertain them -- but keep in mind Django's backwards
compatibility policy. Creating new objects by manually setting the
primary key is a well established idiom. We can't eliminate this
behavior without breaking backwards compatibility.
Yours,
Russ Magee %-)
The problem with sharing a common keyword is that an index hint
specified for Oracle won't be useful (or even legal, necessarily) for
MySQL. If I'm going to ship my reusable app, and I know that one
particular query will perform badly under MySQL, it would be good to
be able to ship with whatever hinting strategy will help. However,
that same hint may not be appropriate for other backends, even if
those backends do support hinting.
Effectively, there's a namespace issue here; we need to be able to
specify "MySQL" hints independently of "Oracle" hints. As a further
complication -- as of Django 1.2, database backends are specified by a fully
qualified import path, so "mysql" isn't necessarily sufficient to identify a
backend (although I'm slightly more comfortable putting this sort of
collision in the category of "acceptable risk" if there's no obvious
way to avoid it).
As for the "+/-" syntax; I'm not wild about it. I'm not an Oracle
expert, but from a quick inspection it appears that there are dozens
of optimizations that could potentially be applied, and none of them
fall into the "USE/FORCE/IGNORE" triptych (many of them don't even
apply to indexes directly). Even on MySQL -- optimizations like "USE
INDEX x FOR ORDER BY" won't be covered by a simple syntax.
I'm not expecting that you will provide a patch that will implement
all possible hints, but I don't want to adopt a syntax that will cut
off future options. I want to have some confidence that it would be
*possible* to implement Oracle hinting or MySQL FOR ORDER BY hinting
if someone were so inclined.
> 2) The multiple table issue is edge case but worth supporting if it
> doesn't complicate the syntax too much. Rather than use models as keys
> we could use the filter/exclude name; eg {'relatedmodel__field':
> 'index'} would apply 'index' specifically to the 'relatedmodel__field'
> join.
Using the filter/exclude name sounds interesting, but skips the first
step of putting an index hint on the origin table. For example,
Book.objects.filter(author__name='foo') -- now put an index hint on
Book; now put one on Author. If Author and Book are in an m2m relation
-- put a hint on the m2m table.
I'd need to see a more complete set of worked examples to see how this
would pan out.
Yours,
Russ Magee %-)
The big problem I see is that hints are simply the wrong approach to
handling this issue, which I do see as an important one.
The SQL optimizer can't work out how you're going to handle the queryset
if all you mention is the filter(). SQL being a set-based language, the
optimizer won't know whether you wish to retrieve 0, 1 or 1 million
rows. In many cases it actively avoids using the index for what it
thinks will be larger retrievals.
The number of rows required from the queryset is an important part of
defining the access path. Hints are only required because we didn't
specify the queryset in sufficient detail for the optimizer to
understand what we wanted.
I would say the correct approach here is to use slicing, since this will
add a LIMIT clause and provide full usage information to the SQL
optimizer about the queryset.
Can I suggest an auto-slice parameter so that we can add LIMIT N onto
the bottom of every query by default, if a slice is not already defined.
That would work for all DBMS.
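A minimal sketch of that idea at the SQL level (auto_slice is hypothetical; a real implementation would operate on the queryset, not the SQL string):

```python
# Hypothetical sketch of the auto-slice idea: append LIMIT N to any query
# that does not already carry one.

def auto_slice(sql, n):
    if " LIMIT " in sql.upper():
        return sql  # caller already sliced; leave the query alone
    return "%s LIMIT %d" % (sql, n)

print(auto_slice("SELECT * FROM X WHERE TwoValues = 1", 10))
# -> SELECT * FROM X WHERE TwoValues = 1 LIMIT 10
```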
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
That's categorically untrue. One of the major functions of an
optimizer is to try to figure out the approximate result size so it
can better establish index vs. data cost.
> The number of rows required from the queryset is an important part of
> defining the access path. Hints are only required because we didn't
> specify the queryset in sufficient detail for the optimizer to
> understand what we wanted.
>
Uhh, querysets don't really generate SQL that's in any way different
from what a normal person would write.
> I would say the correct approach here is to use slicing, since this will
> add a LIMIT clause and provide full usage information to the SQL
> optimizer about the queryset.
>
> Can I suggest an auto-slice parameter so that we can add LIMIT N onto
> the bottom of every query by default, if a slice is not already defined.
> That would work for all DBMS.
>
An auto_slice parameter to what exactly?
> --
> Simon Riggs www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Training and Services
Alex
--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
Perhaps I need to explain some more, since what I've said is correct.
The optimizer does work out the number of rows it thinks it will access;
whether you retrieve all of those rows is a different and important
issue.
For example, suppose we have a 1 million row table called X with a column
called TwoValues. TwoValues holds two values, value=1 and value=2, and
there are 999,999 rows with value=1. An index is built on TwoValues.
If I then issue this query:
SELECT * FROM X WHERE TwoValues = 1;
then the optimizer will deduce that I will access 999,999 rows out of a
million and its best strategy is to avoid using an index. If I issue the
same query for value 2 then it will retrieve 1 row and hence use the
index.
In most cases the application won't want to bring back all 999,999 rows,
though the optimizer doesn't know that unless we tell it. If we assume
that we actually only want 10 rows, then the query can be rewritten in
this form:
SELECT * FROM X WHERE TwoValues = 1 LIMIT 10;
which will use the index, demonstrating that the optimizer is
designed to offer an appropriate plan when presented with the full info.
Slicing provides the full information for the use case and a hint should
not be required to allow index use.
(The situation is more complex in the case of parameterised prepared
statements and in the case of stale statistics, though neither case is
important here).
True, I don't think it's a really nice API. Why is "index" a command
for Oracle? What happens if some other backend accepts something with
an "index"? Personally, I'd rather see this be an explicit namespace:
with_hints(mysql={...}, oracle={...})
That guarantees that there aren't any cross-database clashes with hint
names, only with database backend names. A backend can choose to look at
the full list of hints and determine which hint set it should use;
potentially, it could even look at another backend's hints and act on
them (although I don't know why this would be helpful, except perhaps
in the case of a custom MySQL backend that extends the base
capabilities).
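A sketch of how a backend could consume hints under that proposed (hypothetical) with_hints(mysql={...}, oracle={...}) API:

```python
# Sketch of backend-side hint consumption under the proposed, hypothetical
# namespaced with_hints() API. Each backend picks out its own hint set
# and ignores the rest, so unsupported hints are harmless.

def hints_for_backend(backend_name, hints):
    """Return the hint set addressed to this backend, if any."""
    return hints.get(backend_name, {})

hints = {
    'mysql': {'index': 'book_title_idx'},
    'oracle': {'hash': True, 'index': 'BOOK_IDX'},
}
print(hints_for_backend('mysql', hints))       # -> {'index': 'book_title_idx'}
print(hints_for_backend('postgresql', hints))  # -> {}
```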
> Next --
>
>> Using the filter/exclude name sounds interesting, but skips the first
>> step of putting an index hint on the origin table. For example,
>> Book.objects.filter(author__name='foo') -- now put an index hint on
>> Book; now put one on Author. If Author and Book are in an m2m relation
>> -- put a hint on the m2m table.
>
> How about --
>
> {'book': 'book_idx', 'author': 'author_idx', 'author__name':
> 'm2m_idx'}
>
> Or, if you want to cover absolutely every possibility --
>
> {'book': 'book_idx', 'author__name__author': 'author_idx',
> 'author__name': 'm2m_idx'}
>
> That is, require indexes on the right side of joins to be named
> explicitly, just in case Author is used in another join.
>
> Sounds complicated, but obviously edge case. Common usage will be
> simple, eg {'book':'book_idx'}.
>
> Let me know what you think.
What about when the query constructs multiple joins on a related
table? How does the hint get named and applied then? Is it a
consistent "a join between A and B will always get hint X"? Or do
different joins need different hints?
Yours,
Russ Magee %-)
How about a raw_hints() or something that expects you to insert
additional SQL code that won't affect the result so it can still
return a QuerySet that supports len(), slicing, etc...?
You mean extra() ;)
I don't follow your logic here.
If I'm writing a reusable app, I define some models, and I define the
views that use those models. Those views issue queries. Therefore, I
know both the tables that are involved, and the queries that will be
issued on those tables. If I'm really tuning things, I might even be
using custom SQL, shipped with my project, to create custom indices
for those tables.
Since I'm in control of the table and how the table is used, I should
also be in a position to provide any query plan hints that will assist
the known usage patterns.
If I'm distributing my application to many people, different people in
my audience will be using different databases. That puts me in a
position of (potentially) having to provide hints for multiple
databases as part of my shipped application.
Therefore, there is a namespace problem. I need MySQL hints to be
applied for users that have MySQL. I need Oracle hints to be applied
for users that have Oracle, ignoring the MySQL hints. I need Postgres
to ignore any hint that is applied.
On top of that, there's a possibility that a hint name (and value
format) for one database might clash with another. I don't know of any
specific examples of overlap, but it isn't hard to manufacture one
using generic hinting keywords like "index = foo".
> Any core implementation for supporting this
> should be minimalist and only provide a mechanism for passing some
> kind of data from the QuerySet to the SQLCompiler that each backend
> can optionally choose to do something with it.
>
> I would be thrilled to be able to do use
>
> with_hints('HASH GROUP', 'FAST 10')
>
> and have it set
>
> Query.query_hints = ['HASH GROUP', 'FAST 10']
>
> I'm currently patching in support so I can do something like this for
> django-mssql.
And that's the proposal on the table, except that you would need to
specify that this is a MSSQL hint, since the hint you're making isn't
of any use to a user of any database other than MSSQL.
Yours,
Russ Magee %-)
A reasonable objection. The problem is that a query is contained in
code; if I (as an end-user) discover that a particular query needs to
be hinted, I don't have an entry point.
Putting a with_hint() keyword into the query provides that entry
point, but then means that we need to be able to account for the case
where the same app is deployed under MySQL and Postgres. A namespace
is the best way I can think of to achieve this.
Of course, I'm open to any other suggestion for how we can separate
the location where the hint is required from the actual hint
definition.
>> And that's the proposal on the table, except that you would need to
>> specify that this is a MSSQL hint, since the hint you're making isn't
>> of any use to a user of any database other than MSSQL.
>
> I'm assuming the current proposal will allow for arbitrary namespaces
> that are defined simply by a backend willing to look for the specific
> name. If not, there would need to be discussion about which namespaces
> are defined by the core and how unsupported backends would register
> new namespaces.
Correct. The current proposal makes the namespace the last part of the
backend's import path (e.g., mysql for the django.db.backends.mysql
backend). This does have the potential for collisions if you write a
custom mysql backend in a different location. However, there isn't a
huge population of backends, so the practical likelihood of collision
is relatively small.
Yours,
Russ Magee %-)