we are about upgrading our sqlalchemy library from 0.4.8 to something
newer and during this process we have detected that the LIMIT/OFFSET
support for oracle has been changed, from using “ROW NUMBER OVER...”
to a wrapped subquery approach in conjunction with ROWNUM as described
in
http://www.sqlalchemy.org/docs/reference/dialects/oracle.html#limit-offset-support
Unfortunately this approch is about 10 times slower for large tables
which is mainly related to the fact that the innermost subquery has to
sort the entire table with a plain 'order by'.
Interestingly the
ROW_NUMBER() OVER (ORDER BY some db fields)
is so much more efficient than the normal order by approach.
I don't know the reasons behind the decision for getting rid of the
"row number over" approach, but could it make sense to reimplement
this algorithm again into the current SA as an optional feature?
Any help/feedback is very appreciated.
Thanks,
Ralph
Do you have benchmarks to back that up? In Django, we switched from
using row_number to rownum after a contributor convinced me that
rownum was faster. See:
http://code.djangoproject.com/ticket/9136
Thanks,
Ian
A full history of this feature is here:
http://www.sqlalchemy.org/trac/ticket/536
The rationale is based on the bug described in that ticket, as well as that we preferred to go with an approach that was recommended by a lead engineer at Oracle.
The dialect includes an option to add the /*+ FIRST_ROWS(N) */ directive, by specifying the "optimize_limits" keyword to create engine - we originally had that in the query in all cases, until some folks chimed in that we shouldn't make that decision by default. I don't know if that helps your use case.
The previous system can be restored using a @compiles directive. I have documented that recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/OracleRowNumberOver .
> I checked the results (for my case) for the two variants of ordering
> our table
> (actually a view) and I could not find any difference. No duplicates,
> everything in
> same order.
>
> Next monday I can provide you with some benchmarks. Some more details:
> - the select was run on a view with 20 columns, coming from 7 tables
> - no index on the ordered column (users are able to sort arbitrarily
> over
> any column through a web interface)
>
> The difference was huge, the ROW_NUMBER() OVER approach was finished
> after 1.7s, the newer nested approach took about 25s. More details
> next week.
Here's another post that finds ROWNUM to be significantly faster than ROW_NUMBER():
http://explainextended.com/2009/05/06/oracle-row_number-vs-rownum/
>
> Ciao ciao
>
> Ralph
>
>
> On Jun 23, 5:29 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>> On Wed, Jun 23, 2010 at 1:50 AM, Ralph Heinkel <ralph.hein...@web.de> wrote:
>>> Hi,
>>
>>> we are about upgrading our sqlalchemy library from 0.4.8 to something newer
>>> and during this process we have detected that the LIMIT/OFFSET support for
>>> oracle has been changed, from using “ROW NUMBER OVER...” to a wrapped
>>> subquery approach in conjunction with ROWNUM as described in
>>> http://www.sqlalchemy.org/docs/reference/dialects/oracle.html#limit-o...
>>
>>> Unfortunately this approch is about 10 times slower for large tables which
>>> is mainly related to the fact that the innermost subquery has to sort the
>>> entire table with a plain 'order by'.
>>> Interestingly the
>>> ROW_NUMBER() OVER (ORDER BY some db fields)
>>> is so much more efficient than the normal order by approach.
>>
>> Do you have benchmarks to back that up? In Django, we switched from
>> using row_number to rownum after a contributor convinced me that
>> rownum was faster. See:
>>
>> http://code.djangoproject.com/ticket/9136
>>
>> Thanks,
>> Ian
>
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>