[Django] #29563: SQLite and Queryset.iterator() support

11 views
Skip to first unread message

Django

unread,
Jul 13, 2018, 10:37:30 AM7/13/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew | Owner: nobody
Brown |
Type: | Status: new
Uncategorized |
Component: Database | Version: master
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I’m writing a non-web app that uses Django as the ORM and SQLite as the
backend, and I have a need to iterate over large tables efficiently.
Django’s documentation says Queryset.iterator() does not work on SQLite
[#ref1 (1)] [#ref2 (2)] but I tried it anyways, and discovered that it
works: results are not read into memory in entirety, but streamed from the
database in chunks. I traced this to an apparent logic error in the
SQLCompiler.execute_sql() method [#ref3 (3)] and the interpretation of the
can_use_chunked_reads flag. More on this below.

But I was curious why it didn’t crash or get some error from the SQLite
library despite the Django documentation saying SQLite doesn’t support
streaming queries. Careful reading of the SQLite documentation seems to
indicate there’s no problem reading a query in one cursor and writing to
the database (even the same table) in another. There’s a caveat about
isolation [#ref4 (4)] to watch out for, but otherwise seems like a
perfectly supported mode of operation.

So I dug into Django’s history and I came upon Django bug #7411. This bug
was written in June 2008. At the time, the latest version of SQLite was
3.5.9, which didn’t support database commits interleaved with partially
read cursors. So the workaround implemented was to read the entire query
result into memory. The database feature flag “can_use_chunked_reads” was
added, and this was set to False in the SQLite backend. Code was added to
the SQLCompiler.execute_sql() method to wrap the result iterator in
list(result) if that flag was false [3b37c8151a]. I call this a
“workaround” because it’s working around a limitation SQLite had at the
time.

However, later that year SQLite version 3.6.5 was released, which added
the ability to run COMMIT simultaneously with other read operations [#ref5
(5)]. I was not able to reproduce bug #7411 using SQLite versions >=
3.6.5. This version was released in November 2008.

So about that bug in SQLCompiler.execute_sql(). From what I can tell,
Django ticket #16614 committed a change [f3b7c05936] which introduced a
logic bug to the relevant code in June 2016, reproduced below [#ref3 (3)]:

{{{#!python
if not chunked_fetch and not
self.connection.features.can_use_chunked_reads:
try:
# If we are using non-chunked reads, we return the same data
# structure as normally, but ensure it is all read into memory
# before going any further. Use chunked_fetch if requested.
return list(result)
finally:
# done with the cursor
cursor.close()
return result
}}}

Notice the condition will skip the if statement body if either
chunked_fetch is True or if can_use_chunked_reads is True. Since calling
queryset.iterator() sets chunked_fetch to True, the can_use_chunked_reads
flag is ignored and the if statement skipped. I believe the “and” should
be an “or”, which would return list(result) if the database doesn’t
support chunked reads regardless of the chunked_fetch value.

Regardless of that bug, I suggest the workaround be removed entirely since
it hasn’t been necessary since 2008 and hasn’t been functional since 2016.
If we need to support the can_use_chunked_reads flag for compatibility
with custom and third-party database backends, then we can fix the logic
error and set SQLite’s flag to True. Documentation should be updated
accordingly.

If there is a reason for keeping chunked reads disabled for SQLite (such
as the SQLite caveats on isolation [#ref4 (4)], or needing to support
older versions of SQLite <3.6.5), then the logic error should be fixed.

I’m willing to put in a pull request, but since the situation is quite
complicated (the workaround also inadvertently helped avoid a related bug
in Python’s sqlite3 driver, [[https://bugs.python.org/issue10513|#10513]],
fixed as of Python 2.7.13 and 3.5.3) I wanted to keep this report as short
as I could to get some feedback first.

[=#ref1 (1)] https://docs.djangoproject.com/en/2.0/ref/models/querysets
/#without-server-side-cursors
[=#ref2 (2)]
https://github.com/django/django/blob/2.0.7/django/db/backends/sqlite3/features.py#L9
[=#ref3 (3)]
https://github.com/django/django/blob/2.0.7/django/db/models/sql/compiler.py#L1096
[=#ref4 (4)] https://sqlite.org/isolation.html (see final paragraph “No
Isolation Between Operations On The Same Database Connection”)
[=#ref5 (5)] https://sqlite.org/releaselog/3_6_5.html

--
Ticket URL: <https://code.djangoproject.com/ticket/29563>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 13, 2018, 10:59:47 AM7/13/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ryan P Kilby):

* cc: Ryan P Kilby (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:1>

Django

unread,
Jul 13, 2018, 12:12:04 PM7/13/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ramiro Morales):

* stage: Unreviewed => Accepted


Comment:

Accepting tentatively based on the analysis Andrew did.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:2>

Django

unread,
Jul 13, 2018, 1:52:39 PM7/13/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Josh Schneier):

Django 2.2 only supports Python 3.5+ anyway.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:3>

Django

unread,
Jul 19, 2018, 2:41:36 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Andrew Brown):

I’ve attached my test case test1.py that I used to reproduce bug #7411. It
doesn’t use Django and hits the sqlite interface directly in what I
believe is the same pattern as #7411 to hit the same bug.

To show the workaround is no longer needed, I reproduced the bug under
Python 2.5.2 and SQLite 3.5.9, then tried to reproduce the bug under any
version of Python using SQLite >=3.6.5

Under Python 2.5.2 compiled against SQLite 3.5.9 (the latest of each as of
mid 2008) it gets the same error as bug #7411:
{{{
test1.py
Python Version: 2.5.2 (trunk, Jul 19 2018, 14:16:29)
[GCC 5.4.0 20160609]
SQLite Version: 3.5.9
Traceback (most recent call last):
File "../test1.py", line 25, in <module>
conn.commit()
sqlite3.OperationalError: SQL logic error or missing database
}}}

The exact error changes depending on the combination of Python and SQLite
versions, and some versions of Python the test isn’t reproducible but a
similar bug is (Python bug [https://bugs.python.org/issue10513 #10513]
test case attached as test2.py). However, I could not reproduce #7411 with
any version of Python when using SQLite versions 3.6.5 and up.

{{{
test1.py
Python Version: 2.5.2 (trunk, Jul 19 2018, 14:10:44)
[GCC 5.4.0 20160609]
SQLite Version: 3.6.5
Didn't crash
}}}

Python bug [https://bugs.python.org/issue10513 #10513] was present in some
versions regardless of SQLite version, but was fixed in 2.7.13 and 3.5.3.
As I understand Django only supports the latest revision of the supported
major Python versions so this bug is no longer relevant either.

Feel free to ask any questions. I’ve tried to summarize what I’ve found,
but I have a lot more info including bisections and specific commits where
bugs were introduced and fixed. I know this is a relatively complicated
situation so let me know what I can do to help verify my claims.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:4>

Django

unread,
Jul 19, 2018, 2:42:08 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Andrew Brown):

* Attachment "test1.py" added.

Django

unread,
Jul 19, 2018, 2:42:15 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Andrew Brown):

* Attachment "test2.py" added.

Django

unread,
Jul 19, 2018, 3:09:06 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

Can you propose a patch or do you need some question answered?

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:5>

Django

unread,
Jul 19, 2018, 3:16:40 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
| Brown
Type: Uncategorized | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Andrew Brown):

* status: new => assigned
* owner: nobody => Andrew Brown


Comment:

Yes, I'll work on a patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:6>

Django

unread,
Jul 19, 2018, 5:44:57 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
| Brown
Type: Uncategorized | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Andrew Brown):

I've created a patch that can be seen on this branch:
https://github.com/brownan/django/tree/ticket_29563

What would you like to see in the way of new tests? No tests broke for me,
so it seems the behavior I changed in SQLCompiler.execute_sql() wasn't
covered. How do I approach this?

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:7>

Django

unread,
Jul 19, 2018, 7:21:00 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
| Brown
Type: Uncategorized | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Hey Andrew, first thanks for the extremely well detailed report,
investigation, and the initial patch.

The only change I'd make to your branch is to break it in two commits.

A first one that addresses the `execute_sql()` `and`->`or` switch and adds
a test to make sure the result is fully iterated when `(chunked_fetch :=
True) and (connection.features.can_use_chunked_reads := False)`. I'm not
sure exactly where the test should live and how the testing should be
performed but you could rely on
`@skipIfDatabaseFeature('can_use_chunked_reads')` to skip the test on
backends that don't support it. I'd start with this commit while having
the feature turned off on SQLite to make the iteration easier. I'd also
refer the commit that added support for this feature in the commit
message.

Then I'd include the rest of the changes in a second commit referring to
this ticket.

By the way, you can open a PR with your branch to give it more exposure
and have it run against CI.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:8>

Django

unread,
Jul 19, 2018, 8:26:01 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
| Brown
Type: Uncategorized | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Andrew Brown):

Thanks for the feedback. I'll see what I can come up with for tests in the
next few days when I get some free time.

One hitch in testing this is that even if I write a test to make sure the
can_use_chunked_reads flag is respected, after the proposed second
commit's changes there will be no database backends that set that flag to
false; SQLite is the only backend that sets it to False right now. I could
handle this by temporarily setting that flag for the test perhaps.
Otherwise, that test will always get skipped and there doesn't seem to be
much point to me.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:9>

Django

unread,
Jul 19, 2018, 11:00:28 PM7/19/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
Type: | Brown
Cleanup/optimization | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* needs_better_patch: 0 => 1
* has_patch: 0 => 1
* type: Uncategorized => Cleanup/optimization


Comment:

[https://github.com/django/django/pull/10203 PR]

> One hitch in testing this is that even if I write a test to make sure
the can_use_chunked_reads flag is respected, after the proposed second
commit's changes there will be no database backends that set that flag to
false; SQLite is the only backend that sets it to False right now. I could
handle this by temporarily setting that flag for the test perhaps.
Otherwise, that test will always get skipped and there doesn't seem to be
much point to me.

Temporarily setting the flag in the test is an option but even if no
built-in backends have this flag set to `False` it might be useful for
third-party backends (MSSQL, Firebird) which might not have implemented
this feature yet and run the full test suite to determine coverage.

Another option might be to get rid of the feature flag entirely but that
would require a post to the django-developers mailing list.

[https://github.com/django/django/pull/10203 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:10>

Django

unread,
Jul 20, 2018, 3:07:27 PM7/20/18
to django-...@googlegroups.com
#29563: SQLite and Queryset.iterator() support
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
Type: | Brown
Cleanup/optimization | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Andrew Brown):

I'll let others decide if removing the flag is something worth
considering. I've updated the pull request to split it into two commits,
and added a unit test that ensures the can_use_chunked_reads flag is
properly interpreted. It tests that the SQLCompiler.execute_sql() method
returns a list if the flag is False.

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:11>

Django

unread,
Jul 25, 2018, 6:16:47 PM7/25/18
to django-...@googlegroups.com
#29563: Add support for QuerySet.iterator() result streaming on SQLite

-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
Type: | Brown
Cleanup/optimization | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* needs_better_patch: 1 => 0
* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:12>

Django

unread,
Jul 25, 2018, 6:34:46 PM7/25/18
to django-...@googlegroups.com
#29563: Add support for QuerySet.iterator() result streaming on SQLite
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
Type: | Brown
Cleanup/optimization | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"55810d94d03728dcad9f53d5b9e21565627aeade" 55810d94]:
{{{
#!CommitTicketReference repository=""
revision="55810d94d03728dcad9f53d5b9e21565627aeade"
Refs #29563 -- Fixed SQLCompiler.execute_sql() to respect
DatabaseFeatures.can_use_chunked_reads.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:13>

Django

unread,
Jul 25, 2018, 6:34:47 PM7/25/18
to django-...@googlegroups.com
#29563: Add support for QuerySet.iterator() result streaming on SQLite
-------------------------------------+-------------------------------------
Reporter: Andrew Brown | Owner: Andrew
Type: | Brown
Cleanup/optimization | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"c0e3c65b9d1b26cfc38137b7666ef0e108aab77f" c0e3c65]:
{{{
#!CommitTicketReference repository=""
revision="c0e3c65b9d1b26cfc38137b7666ef0e108aab77f"
Fixed #29563 -- Added result streaming for QuerySet.iterator() on SQLite.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/29563#comment:14>

Reply all
Reply to author
Forward
0 new messages