new LIMIT/OFFSET Support for oracle - huge speed penalty

414 views
Skip to first unread message

Ralph Heinkel

unread,
Jun 23, 2010, 3:50:36 AM6/23/10
to sqlal...@googlegroups.com
Hi,

we are about upgrading our sqlalchemy library from 0.4.8 to something
newer and during this process we have detected that the LIMIT/OFFSET
support for oracle has been changed, from using “ROW NUMBER OVER...”
to a wrapped subquery approach in conjunction with ROWNUM as described
in
http://www.sqlalchemy.org/docs/reference/dialects/oracle.html#limit-offset-support


Unfortunately this approch is about 10 times slower for large tables
which is mainly related to the fact that the innermost subquery has to
sort the entire table with a plain 'order by'.
Interestingly the
ROW_NUMBER() OVER (ORDER BY some db fields)
is so much more efficient than the normal order by approach.

I don't know the reasons behind the decision for getting rid of the
"row number over" approach, but could it make sense to reimplement
this algorithm again into the current SA as an optional feature?

Any help/feedback is very appreciated.

Thanks,

Ralph

Ian Kelly

unread,
Jun 23, 2010, 11:29:44 AM6/23/10
to sqlal...@googlegroups.com
On Wed, Jun 23, 2010 at 1:50 AM, Ralph Heinkel <ralph....@web.de> wrote:
> Hi,
>
> we are about upgrading our sqlalchemy library from 0.4.8 to something newer
> and during this process we have detected that the LIMIT/OFFSET support for
> oracle has been changed, from using  “ROW NUMBER OVER...” to a wrapped
> subquery approach in conjunction with ROWNUM as described in
> http://www.sqlalchemy.org/docs/reference/dialects/oracle.html#limit-offset-support
>
>
> Unfortunately this approch is about 10 times slower for large tables which
> is mainly related to the fact that the innermost subquery has to sort the
> entire table with a plain 'order by'.
> Interestingly the
>        ROW_NUMBER() OVER (ORDER BY some db fields)
> is so much more efficient than the normal order by approach.

Do you have benchmarks to back that up? In Django, we switched from
using row_number to rownum after a contributor convinced me that
rownum was faster. See:

http://code.djangoproject.com/ticket/9136

Thanks,
Ian

Michael Bayer

unread,
Jun 23, 2010, 12:04:24 PM6/23/10
to sqlal...@googlegroups.com

A full history of this feature is here:

http://www.sqlalchemy.org/trac/ticket/536

The rationale is based on the bug described in that ticket, as well as that we preferred to go with an approach that was recommended by a lead engineer at Oracle.

The dialect includes an option to add the /*+ FIRST_ROWS(N) */ directive, by specifying the "optimize_limits" keyword to create engine - we originally had that in the query in all cases, until some folks chimed in that we shouldn't make that decision by default. I don't know if that helps your use case.

The previous system can be restored using a @compiles directive. I have documented that recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/OracleRowNumberOver .


Ralph Heinkel

unread,
Jun 24, 2010, 6:23:30 AM6/24/10
to sqlalchemy
I checked the results (for my case) for the two variants of ordering
our table
(actually a view) and I could not find any difference. No duplicates,
everything in
same order.

Next monday I can provide you with some benchmarks. Some more details:
- the select was run on a view with 20 columns, coming from 7 tables
- no index on the ordered column (users are able to sort arbitrarily
over
any column through a web interface)

The difference was huge, the ROW_NUMBER() OVER approach was finished
after 1.7s, the newer nested approach took about 25s. More details
next week.

Ciao ciao

Ralph


On Jun 23, 5:29 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Wed, Jun 23, 2010 at 1:50 AM, Ralph Heinkel <ralph.hein...@web.de> wrote:
> > Hi,
>
> > we are about upgrading our sqlalchemy library from 0.4.8 to something newer
> > and during this process we have detected that the LIMIT/OFFSET support for
> > oracle has been changed, from using  “ROW NUMBER OVER...” to a wrapped
> > subquery approach in conjunction with ROWNUM as described in
> >http://www.sqlalchemy.org/docs/reference/dialects/oracle.html#limit-o...

Ralph Heinkel

unread,
Jun 24, 2010, 6:31:58 AM6/24/10
to sqlalchemy
Hi Michael,

We have also tried the /*+ FIRST_ROWS(N) */ optimization hint, it
only gave a 25% speed improvement, but the result was still 5 or 7
times slower than the ROW_NUMBER() OVER approach.
I'll provide benchmark details on Monday, also details about table
(actually a view) layout, indices, etc.

On Jun 23, 6:04 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> A full history of this feature is here:
>
> http://www.sqlalchemy.org/trac/ticket/536
>
> The rationale is based on the bug described in that ticket, as well as that we preferred to go with an approach that was recommended by a lead engineer at Oracle.
>
> The dialect includes an option to add the /*+ FIRST_ROWS(N) */ directive, by specifying the "optimize_limits" keyword to create engine - we originally had that in the query in all cases, until some folks chimed in that we shouldn't make that decision by default.    I don't know if that helps your use case.
>
> The previous system can be restored using a @compiles directive.  I have documented that recipe athttp://www.sqlalchemy.org/trac/wiki/UsageRecipes/OracleRowNumberOver.

I don't understand yet how the @compiles directive works, but I'm
also not that familiar with SA internals. Could you point me to some
URL where this is described?

Thanks,

Ralph

Michael Bayer

unread,
Jun 24, 2010, 10:48:12 AM6/24/10
to sqlal...@googlegroups.com

Michael Bayer

unread,
Jun 26, 2010, 5:24:32 PM6/26/10
to sqlal...@googlegroups.com

On Jun 24, 2010, at 6:23 AM, Ralph Heinkel wrote:

> I checked the results (for my case) for the two variants of ordering
> our table
> (actually a view) and I could not find any difference. No duplicates,
> everything in
> same order.
>
> Next monday I can provide you with some benchmarks. Some more details:
> - the select was run on a view with 20 columns, coming from 7 tables
> - no index on the ordered column (users are able to sort arbitrarily
> over
> any column through a web interface)
>
> The difference was huge, the ROW_NUMBER() OVER approach was finished
> after 1.7s, the newer nested approach took about 25s. More details
> next week.

Here's another post that finds ROWNUM to be significantly faster than ROW_NUMBER():

http://explainextended.com/2009/05/06/oracle-row_number-vs-rownum/

>
> Ciao ciao
>
> Ralph
>
>
> On Jun 23, 5:29 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>> On Wed, Jun 23, 2010 at 1:50 AM, Ralph Heinkel <ralph.hein...@web.de> wrote:
>>> Hi,
>>
>>> we are about upgrading our sqlalchemy library from 0.4.8 to something newer
>>> and during this process we have detected that the LIMIT/OFFSET support for
>>> oracle has been changed, from using “ROW NUMBER OVER...” to a wrapped
>>> subquery approach in conjunction with ROWNUM as described in
>>> http://www.sqlalchemy.org/docs/reference/dialects/oracle.html#limit-o...
>>
>>> Unfortunately this approch is about 10 times slower for large tables which
>>> is mainly related to the fact that the innermost subquery has to sort the
>>> entire table with a plain 'order by'.
>>> Interestingly the
>>> ROW_NUMBER() OVER (ORDER BY some db fields)
>>> is so much more efficient than the normal order by approach.
>>
>> Do you have benchmarks to back that up? In Django, we switched from
>> using row_number to rownum after a contributor convinced me that
>> rownum was faster. See:
>>
>> http://code.djangoproject.com/ticket/9136
>>
>> Thanks,
>> Ian
>

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Ralph Heinkel

unread,
Jul 7, 2010, 3:44:06 AM7/7/10
to sqlalchemy
Hi,

sorry for the late reply. I've now made some measurements, and
repeated those two a couple times to avoid caching effects. I've
reduced it to the innermost sql statement. Here they are:

SELECT entrypoint_id, ...<20 about fields> ..., ROW_NUMBER() OVER
(ORDER BY creationdate DESC, entrypoint_id DESC) AS ora_rn FROM
browseentrypoint
--> 1.7 secs

Now the same in 'traditional style':
SELECT entrypoint_id, ...<20 about fields> ... FROM browseentrypoint
ORDER BY creationdate DESC, entrypoint_id DESC
--> 27 secs

The results are exactly the same. So there is a quite a difference
btw. those two statements.
I've to say that we don't have indices since users can (through the
web frontend) sort after arbitrary fields (all 20). Having 20
different indices on one table was not feasible.

Ralph
Reply all
Reply to author
Forward
0 new messages