MS-SQL server LIMIT/OFFSET implementation

76 views
Skip to first unread message

DanH

unread,
Jul 19, 2006, 8:33:15 PM7/19/06
to Django developers
Hi,

I would like to use Django with MS-SQL server and the svn version
doesn't work yet.
I submitted a patch to make it work:
http://code.djangoproject.com/ticket/2358
It's not ideal, just a quick fix until I get a better understanding of
Django internals.

What is missing right now is LIMIT/OFFSET functionality.
In SQL 2005 there is a ROW_NUMBER function which would make the
implementation quite simple:
http://msdn2.microsoft.com/en-us/library/ms186734.aspx

For SQL 2000 however a more complicated SQL construct is necessary to
make it work.
I found an older patch for Django with an implementation:
http://code.djangoproject.com/ticket/225 but it doesn't seem to be
present in the code.

Is someone working on MS-SQL support?
I would like to help if possible with the implementation and testing.

Thank you,
Dan.

Ivan Sagalaev

unread,
Jul 20, 2006, 3:54:47 AM7/20/06
to django-d...@googlegroups.com
DanH wrote:
>
> Is someone working on MS-SQL support?
> I would like to help if possible with the implementation and testing.

Yes, testing is the main thing that holds it. The patch was developed
sporadically and as far as I remember all the theoretical problems were
known how to solve. It's just needed someone to dedicate the time to
test it.

P.S. I'm not the one of the core devs so it makes sense anyway to wait
an answer from Jacob or Adrian.

Dan Hristodorescu

unread,
Jul 20, 2006, 5:34:19 PM7/20/06
to django-d...@googlegroups.com
Hi Ivan,

I looked in the code and it seems that QuerySet class needs some serious refactoring to support MSSQL paging.
For SQL 2005 the SQL string should look like this:

WITH myTable AS
(SELECT fields, ROW_NUMBER() OVER (order by orderclause) AS RowNumber FROM table)
SELECT * FROM myTable
WHERE RowNumber between offset AND offset+limit

and for SQL 2000 should look like this:

SELECT fields FROM table
        WHERE primary key IN
            (SELECT TOP limit primary_key FROM table
                WHERE primary_key NOT IN
                    (SELECT TOP offset primary_key FROM table
                        WHERE filter_conditions
                        ORDER BY sort_field)
                AND filter_criteria
                ORDER BY sort_field)
        ORDER BY sort_field
       
And with join tables it looks completely crazy (I've only used it using DISTINCT with joins), but that's the optimal way to do.

A proper implementation should consider both cases since the ROW_NUMBER in SQL 2005 is the native implementation and it's a lot faster.
I'll start working on it although during the process I might need feedback from Django lead developers since I'm trying to refactor a core class.
I wouldn't want the patch to be rejected in the end.

Thanks,
Dan.

Malcolm Tredinnick

unread,
Jul 20, 2006, 7:37:22 PM7/20/06
to django-d...@googlegroups.com

Heads up: that file (django/db/models/query.py) is undergoing a bit of a
rewrite at the moment (as per [1]), so you may wish to hold off for a
little while until its done. Or maybe just keep charging ahead and
backport in the near future. Anyway, just letting you know in case you
hadn't noticed this.

[1]
http://groups.google.com/group/django-developers/browse_frm/thread/1026181083385c58/08f0d7258c0659af?lnk=gst&q=freeze&rnum=1#08f0d7258c0659af

Regards,
Malcolm

Dan Hristodorescu

unread,
Jul 20, 2006, 9:01:56 PM7/20/06
to django-d...@googlegroups.com
Thanks Malcolm,

I wasn't aware of it.
I suspect Adrian intends to refactor the code along the same lines I would have to.
Basically a more fine grain construction of the SQL string, providing internal access the different parts of the SQL: the field list, the FROM table list, the WHERE clause and the ORDER BY.
Having those accessible it becomes a simple matter of a different assembly of the final SELECT string.

I will wait for the new code to be committed.

Regards,
Dan. 

DavidA

unread,
Jul 21, 2006, 11:48:42 AM7/21/06
to Django developers
> > On Thu, 2006-07-20 at 17:34 -0400, Dan Hristodorescu wrote:
> > >
> > > and for SQL 2000 should look like this:
> > >
> > > SELECT fields FROM table
> > > WHERE primary key IN
> > > (SELECT TOP limit primary_key FROM table
> > > WHERE primary_key NOT IN
> > > (SELECT TOP offset primary_key FROM table
> > > WHERE filter_conditions
> > > ORDER BY sort_field)
> > > AND filter_criteria
> > > ORDER BY sort_field)
> > > ORDER BY sort_field
> > >
> > > And with join tables it looks completely crazy (I've only used it
> > > using DISTINCT with joins), but that's the optimal way to do.
> > >

Since its so complicated for SQL 2000, couldn't you just cheat a bit
and do some in SQL and some in Python?

SELECT TOP limit+offset FROM table ...

and then in the backend:

return cursor.fetchall()[offset:]

(OK, I know its more complicated than that, but you get the idea).

That's typically the way I've seen paging done with SQL 2k in the past.
-Dave

Dan Hristodorescu

unread,
Jul 21, 2006, 7:55:43 PM7/21/06
to django-d...@googlegroups.com
Yes, you could do that, but it's not what you would want.
It will become visible slower when the offset increases because you have to iterate through the previous rows.
The nice Python syntax hides this detail but the driver has do it.
This is because the cursor is a forward cursor and you have to iterate row by row.
Even when using scrollable cursors instead of a regular select it's the same situation, opening the cursor is slow because the server will have to prepare a big result set.

I wish it was easier and have just generic database code in the framework but in the end you'll have to run that kind of query to achieve at least some decent performance.


Regards,
Dan.
Reply all
Reply to author
Forward
0 new messages