
Question about 'iterable cursors'


Frank Millman

Nov 6, 2011, 3:54:48 AM
to pytho...@python.org
Hi all

I am using a few DB-API adaptors - ceODBC for SQL Server, psycopg2 for
PostgreSQL, and sqlite3 for SQLite.

They all offer the feature that, after a cursor executes a SELECT, the cursor
itself can be iterated to fetch one row at a time. I have been using this
feature for a while and it seems like a 'good thing'.
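
A minimal sqlite3 sketch of the pattern (the table name and the process()
function are just placeholders for illustration):

    import sqlite3

    conn = sqlite3.connect('example.db')
    cur = conn.cursor()
    cur.execute('SELECT id, name FROM customers')
    for row in cur:        # the cursor itself is the iterator
        process(row)       # rows are fetched one at a time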

Now I am not so sure. I am using a connection pool to maintain connections
to the database. A principle I am following is that a connection must be
returned quickly, so that it is available for reuse.

I have been happily returning the connection, but keeping the cursor open
while processing the rows selected. I now realise that this is dangerous.
Therefore I have changed my system to execute fetchall() on the cursor
before returning the connection. This obviously loses the benefit of the
iterator.
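
A rough sketch of the changed approach, assuming a hypothetical pool
object with get()/put() methods (the real pool interface may differ)
and a placeholder process() function:

    conn = pool.get()                  # hypothetical pool API
    try:
        cur = conn.cursor()
        cur.execute('SELECT id, name FROM customers')
        rows = cur.fetchall()          # pull every row into memory
    finally:
        pool.put(conn)                 # connection goes back immediately

    for row in rows:                   # iterating a list, not the cursor
        process(row)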

I would appreciate confirmation that my thinking is correct on this issue.
Or is there any way that I can have my cake and eat it?

Thanks

Frank Millman


Alain Ketterlin

Nov 6, 2011, 4:16:22 AM
Your thinking is correct: you need to keep the connection while
processing the cursor. Databases are made to scale; you may well be
processing the first rows of the result before the DBMS has even
finished scanning its tables. View this as a pipe, with the cursor being
one end of the pipe. The usual setting, fetching one row at a time, lets
you overlap your processing with the network transfers.

Fetching all the data, returning the connection, and only then starting
to process makes sense only if the processing takes a lot of time (I
mean: a lot more than fetching the results), which is a rare case.
Unless you are in such an extreme situation, I would suggest leaving the
optimization to the connection pool, which is there to solve exactly the
problem you are trying to solve.

-- Alain.

Frank Millman

Nov 6, 2011, 4:39:56 AM
to pytho...@python.org

"Alain Ketterlin" <al...@dpt-info.u-strasbg.fr> wrote
Thank you, Alain. That is very clear.

So my analysis of the problem is correct, but my solution is wrong.

Instead of executing fetchall() and returning the connection, I should
retain the connection until I have exhausted the cursor.

That makes a lot of sense.
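
One way to sketch that, assuming a hypothetical pool object with
get()/put() methods and a placeholder process() function: a generator
that borrows a connection, yields rows one at a time, and returns the
connection only once the cursor is exhausted (or the generator is
closed):

    def select_rows(pool, sql, params=()):
        conn = pool.get()              # hypothetical pool API
        try:
            cur = conn.cursor()
            cur.execute(sql, params)
            for row in cur:
                yield row              # caller sees one row at a time
        finally:
            pool.put(conn)             # runs once the generator is exhausted or closed

    for row in select_rows(pool, 'SELECT id, name FROM customers'):
        process(row)

If the caller might abandon the loop early, wrapping the generator in
contextlib.closing() makes the moment the connection is returned
explicit rather than leaving it to garbage collection.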

Frank


Message has been deleted

John Nagle

Nov 7, 2011, 1:04:43 AM
On 11/6/2011 12:04 PM, Dennis Lee Bieber wrote:
> On Sun, 6 Nov 2011 11:39:56 +0200, "Frank Millman"<fr...@chagford.com>
> declaimed the following in gmane.comp.python.general:
>
>>
>> So my analysis of the problem is correct, but my solution is wrong.
>>
>> Instead of executing fetchall() and returning the connection, I should
>> retain the connection until I have exhausted the cursor.
>>
>> That makes a lot of sense.
>>
> Especially if all you are processing are read-only activities.
>
> If you have a connection/cursor doing write operations, you may not
> be able to commit those writes until all reading cursors have closed.
> (Read the documentation on the SQLite3 locking system -- though the
> newest version has added a second type of locking which may complicate
> the matter. The original/normal scheme has potential readers "outside"
> SQLite3, active readers "inside" SQLite3 -- when an active reader cursor
> advances to a pending write, it blocks all the potential readers from
> entering, but is itself blocked until all other active readers have
> exited)

Right. The scarce resource is database locks, not connections.
Especially with SQLite, which has, by necessity, a rather brutal
locking strategy.

Realize that SQLite is not a high-performance multi-user database.
You use SQLite to store your browser preferences, not your customer
database.

If you're doing enough transactions from multiple processes that
performance is an issue, you need to move up to MySQL or Postgres.
If almost all transactions are SELECTs, performance may not be
too bad, but if there are INSERT and UPDATE transactions on the
same table, performance will be awful.
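
The "second type of locking" mentioned in the quoted post is presumably
WAL (write-ahead logging), added in SQLite 3.7.0, which lets readers
proceed while a single writer is active. A minimal sketch of enabling it
from Python (the database path is just an example):

    import sqlite3

    conn = sqlite3.connect('app.db')             # example path
    conn.execute('PRAGMA journal_mode=WAL')      # readers no longer block the writer

Writes are still serialized, so heavy concurrent INSERT/UPDATE traffic
is still a reason to move up to a client-server database.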

John Nagle

Lie Ryan

Nov 8, 2011, 12:29:25 AM
to pytho...@python.org
On 11/07/2011 05:04 PM, John Nagle wrote:
> Realize that SQLite is not a high-performance multi-user database.
> You use SQLite to store your browser preferences, not your customer
> database.

I agree that SQLite is not multi-user; I disagree that SQLite is not a
high-performance database. In single-user cases, SQLite should far
outperform a client-server database engine, since it doesn't have the
client-server overhead.
