[Django] #26040: Streaming Large CSV Files Example Incorrect

Django

Jan 5, 2016, 3:22:39 PM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect
-----------------------------+---------------------------------------------
Reporter: przerull | Owner: nobody
Type: Bug | Status: new
Component: Documentation | Version: 1.8
Severity: Normal | Keywords: csv streaming documentation bug
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------+---------------------------------------------
Hi everyone,

The documentation has an example of how to stream large CSV files.

[https://docs.djangoproject.com/en/1.8/howto/outputting-csv/#streaming-large-csv-files]

This is great, but unfortunately the solution is incorrect (at least in
Django 1.8 using Python 3.4).

Per the documentation for the
[https://docs.djangoproject.com/en/1.8/ref/request-response/#django.http.StreamingHttpResponse StreamingHttpResponse
class], "It should be given an iterator that yields strings as content."

However, `csvwriter.writerow` returns `None`, not the result of the
`write` call on the file passed to the csvwriter. The Echo class provided
in the example was a good idea, but it doesn't appear to work.

An alternative solution that does work would be:


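{{{
import csv
from io import StringIO

from django.http import StreamingHttpResponse


def streaming_csv_writer(rows_to_output):
    memory_file = StringIO()
    writer = csv.writer(memory_file)
    for row in rows_to_output:
        writer.writerow(row)
        memory_file.seek(0)
        yield memory_file.read()
        # Rewind before truncating so the next row is written at offset 0.
        memory_file.seek(0)
        memory_file.truncate(0)

response = StreamingHttpResponse(streaming_csv_writer(rows), ...)
}}}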

I'm happy to patch this myself but I wanted to discuss it first before
writing the patch to get some additional opinions and to try to discover a
bit of the history of this documentation example (because I have a feeling
it must have worked at some time in the past).

Django is a great framework and I'm truly grateful to the maintainers and
contributors to the project. You folks rock!

--
Ticket URL: <https://code.djangoproject.com/ticket/26040>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jan 5, 2016, 3:24:32 PM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect
-------------------------------------+-------------------------------------
Reporter: przerull | Owner: nobody
Type: Bug | Status: new
Component: Documentation | Version: 1.8
Severity: Normal | Resolution:
Keywords: csv streaming documentation bug | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by przerull):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

As an additional note, the documentation for this example is the same in
Django 1.8 and 1.9. I have not yet confirmed whether this issue exists on
Python 2.

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:1>

Django

Jan 5, 2016, 4:10:03 PM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect
-------------------------------------+-------------------------------------
Reporter: przerull | Owner: nobody
Type: Bug | Status: new
Component: Documentation | Version: 1.8
Severity: Normal | Resolution:
Keywords: csv streaming documentation bug | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* stage: Unreviewed => Accepted


Comment:

I confirm that the issue also exists on Python 2: `csv.writer.writerow`
always returns `None`.

The initial ticket suggested using a similar implementation to what's
proposed here (#21179) but it was deemed unpythonic.

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:2>

Django

Apr 10, 2016, 6:13:57 AM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect

Comment (by claudep):

When you say it doesn't work, is it that the response is empty or simply
that the response is not streamed?

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:3>

Django

Apr 11, 2016, 10:26:25 AM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect

Comment (by the-kid89):

Since we are working with "large" amounts of data, wouldn't it be a good
idea to look at heapq to store the list? You can use the heapq library
inside generators in Python as well, so it could even speed things up a
bit more, and it's supported in all supported versions of Python.

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:4>

Django

May 30, 2016, 11:57:15 PM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect
Changes (by berkerpeksag):

* cc: berker.peksag@… (added)


Comment:

Perhaps the example could be removed and the first paragraph could be
updated to mention using Python generators?

Alternatively, the example could be changed to read from a large CSV file.
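
As a rough sketch of what such a file-reading example might look like (not the documentation's code): a generator that yields an existing CSV file one line at a time, which could then be handed to `StreamingHttpResponse` as its content. The helper name `iter_file_lines` and the temporary sample file are hypothetical and only there to make the snippet self-contained.

```python
import os
import tempfile


def iter_file_lines(path, encoding="utf-8"):
    """Yield the file one line at a time so it is never fully loaded into memory."""
    # newline="" leaves the CSV line endings exactly as stored on disk.
    with open(path, encoding=encoding, newline="") as handle:
        for line in handle:
            yield line


# Small stand-in for a large file, written to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("id,name\r\n1,Ada\r\n2,Grace\r\n")

chunks = list(iter_file_lines(tmp.name))
os.remove(tmp.name)
```

Because the lines are passed through untouched, no `csv.writer` is involved at all in this variant.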

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:5>

Django

Jul 14, 2016, 12:20:07 AM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect

Comment (by berkerpeksag):

Note that there is an open issue about changing the return value of
`DictWriter.writeheader()` on the Python issue tracker:
https://bugs.python.org/issue27497. The example at
https://docs.djangoproject.com/en/dev/howto/outputting-csv/#streaming-large-csv-files
was also mentioned in that discussion.
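
For anyone combining `DictWriter` with the Echo approach, a possible workaround on Python versions where `writeheader()` returns `None` is to write the header as an ordinary row. This is only a sketch: the `Echo` class mirrors the one in the documentation example, and the field names are made up.

```python
import csv


class Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


fieldnames = ["id", "name"]
writer = csv.DictWriter(Echo(), fieldnames=fieldnames)

# Write the header as a normal row so its text is returned on any version.
header = writer.writerow(dict(zip(fieldnames, fieldnames)))
body = writer.writerow({"id": 1, "name": "Ada"})
```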

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:6>

Django

Aug 18, 2016, 8:50:46 PM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect

Comment (by timgraham):

The report never said why the example doesn't work. The example works for
me on both Python 2.7 and Python 3.5, at least the CSV output looks fine.
Is the problem that the response isn't streamed? If so, how do you test
that?
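
One way to check at least the laziness part without a server, as a sketch: build the same kind of generator expression the documentation example passes to `StreamingHttpResponse` and record how many rows have been pulled after the first chunk is consumed. The names `pulled` and `rows` are hypothetical. This only shows that rows are produced on demand; it says nothing about whether a server or proxy forwards them unbuffered.

```python
import csv


class Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


pulled = []  # records which rows the generator has been asked for so far


def rows():
    for i in range(1000):
        pulled.append(i)
        yield ["Row {}".format(i), str(i)]


writer = csv.writer(Echo())
stream = (writer.writerow(row) for row in rows())

# Consuming one chunk should pull exactly one row from the source.
first_chunk = next(stream)
```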

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:7>

Django

Apr 14, 2019, 7:23:45 AM
to django-...@googlegroups.com
#26040: Streaming Large CSV Files Example Incorrect
-------------------------------------+-------------------------------------
Reporter: Philip Zerull | Owner: nobody
Type: Bug | Status: closed
Component: Documentation | Version: 1.8
Severity: Normal | Resolution: invalid
Keywords: csv streaming documentation bug | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Daniel Hepper):

* status: new => closed
* resolution: => invalid


Comment:

As Tim Graham noted, the example does in fact work, so I will close this
as invalid.

The key in the example is the Echo class:

If you look at the
[https://github.com/python/cpython/blob/a24107b04c1277e3c1105f98aff5bfa3a98b33a0/Modules/_csv.c#L1241-L1243
source of writer.writerow], you can see that it calls writeline and
returns its result. `writeline` is a
[https://github.com/python/cpython/blob/a24107b04c1277e3c1105f98aff5bfa3a98b33a0/Modules/_csv.c#L1383
reference to the write method of the file object passed to writer on
instantiation]. Now, the `write` method of a `file` object does indeed not
return anything, but the `write` method of the `Echo` object used in the
example does.
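
A minimal sketch of that behaviour, assuming CPython's `csv` module; the `Echo` class here mirrors the one in the documentation example:

```python
import csv


class Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


writer = csv.writer(Echo())
# writerow() hands the formatted line to Echo.write() and returns its result.
line = writer.writerow(["id", "name"])
print(repr(line))
```

With the default dialect, the returned string is the formatted row including its line terminator, which is what the streaming response then sends as a chunk.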

--
Ticket URL: <https://code.djangoproject.com/ticket/26040#comment:8>
