orm query that returns millions of rows

Chris Withers

unread,

Mar 2, 2012, 4:21:44 AM3/2/12

to sqlal...@googlegroups.com

Hi All,

I need to do a query that, in its simplest form, would be:

for row in session.query(Message.body).order_by(Message.session_num):
...do read-only stuff with row and then discard...

However, as that query will return roughly 20-40 million rows per time,
I'm guessing the above may not be what I need.

What should I use?

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk

Michael Bayer

unread,

Mar 2, 2012, 10:59:45 AM3/2/12

to sqlal...@googlegroups.com

On Mar 2, 2012, at 4:21 AM, Chris Withers wrote:

> Hi All,
>
> I need to do a query that, in its simplest form, would be:
>
> for row in session.query(Message.body).order_by(Message.session_num):
> ...do read-only stuff with row and then discard...
>
> However, as that query will return roughly 20-40 million rows per time, I'm guessing the above may not be what I need.
>
> What should I use?

it's a very simple query, so if you were using psycopg2, query(..).yield_per(1000) would pull rows over in chunks of 1000 at a time. Should work with Oracle as well. Other DBAPIs more or less buffer the entire result set at the moment though some might have ways to get around it (I have a vague recollection that OurSQL does).

the other recipe is the windowed range query which I normally use for this at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery, that's also a Postgresql/Oracle/MSSQL only thing.

Finally there's just doing LIMIT/OFFSET though you might find the result slowing down considerably as you start doing OFFSETs of like 30 million.

Chris Withers

unread,

Mar 6, 2012, 1:37:13 AM3/6/12

to sqlal...@googlegroups.com, Michael Bayer

On 02/03/2012 15:59, Michael Bayer wrote:
>
> the other recipe is the windowed range query which I normally use for this at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery,

So, this would be the way to go for Microsoft SQL Server?

Michael Bayer

unread,

Mar 6, 2012, 11:02:23 AM3/6/12

to sqlal...@googlegroups.com

On Mar 6, 2012, at 1:37 AM, Chris Withers wrote:

> On 02/03/2012 15:59, Michael Bayer wrote:
>>
>> the other recipe is the windowed range query which I normally use for this at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery,
>
> So, this would be the way to go for Microsoft SQL Server?