How to create and serve a .zip file using data from database?


mrtn

Oct 7, 2012, 1:32:01 PM
to python-...@googlegroups.com

Inside the get method of a handler, I would like to do the following:

1. query some data store, and get the data
2. create a bunch of .csv files using the data
3. create a .zip file containing all those .csv files
4. serve this .zip file to the client

In addition, I have no need to store .csv or .zip files, so would prefer that by the end of the request there is no file left on the server.

I wonder how I should best achieve steps 2, 3, and 4 here in Tornado. Thanks!

A. Jesse Jiryu Davis

Oct 7, 2012, 8:02:41 PM
to python-...@googlegroups.com
While it may not be the most efficient implementation, I think a good start is to create a StringIO for each file and pass it as the file object to csv.writer():

http://docs.python.org/library/csv.html#csv.writer

Once you've written all the CSVs to StringIOs, create a ZipFile with another StringIO as its output file:

output = StringIO()
zip = zipfile.ZipFile(output, 'w')

and write each CSV to the zip archive with writestr(arcname, my_stringio.getvalue()), where the archive name comes first and the data second:

http://docs.python.org/library/zipfile.html#zipfile.ZipFile.writestr

And finalize the ZipFile with close(). Finally:

self.set_header('Content-Type', 'application/zip')
self.write(output.getvalue())
self.finish()
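
Put together, the steps above might look roughly like the sketch below. It is written for Python 3, where the zip output has to go into an io.BytesIO rather than a StringIO; the function name build_csv_zip and the shape of its tables argument are made up for illustration and are not part of Tornado or the standard library.

```python
import csv
import io
import zipfile


def build_csv_zip(tables):
    """Return the bytes of a zip archive holding one CSV per entry in `tables`.

    `tables` (hypothetical) maps an archive name such as 'file1.csv' to an
    iterable of rows, where each row is a sequence of values.
    """
    output = io.BytesIO()  # zip data is binary, so BytesIO rather than StringIO
    with zipfile.ZipFile(output, "w") as zf:
        for name, rows in tables.items():
            csv_buffer = io.StringIO(newline="")  # one in-memory text file per CSV
            csv.writer(csv_buffer).writerows(rows)
            # Archive name first, then the data; a str is encoded as UTF-8.
            zf.writestr(name, csv_buffer.getvalue())
    # Leaving the `with` block calls close(), which finalizes the archive.
    return output.getvalue()
```

Inside the handler's get method, the returned bytes would then be passed to self.write() after setting the Content-Type header, exactly as in the three lines above.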

Obviously, this requires having several copies of your data in memory at once. If this isn't a problem then don't sweat it. If it is, you could experiment with streaming results into the zipfile, which would save some overhead, and more interestingly you could try streaming results *out* by passing self.connection.stream into the ZipFile constructor as its output file. If that works then you wouldn't have to keep much data in memory at all.
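
As a rough illustration of streaming results into the zipfile, the sketch below writes CSV rows directly into each archive entry instead of building a separate buffer per file. It assumes Python 3.6 or later, where ZipFile.open() accepts mode 'w'; that is newer than anything available when this thread was written. The function name and the tables argument are hypothetical. Note that the finished archive is still held in one in-memory buffer, so only the intermediate per-CSV copies are avoided.

```python
import csv
import io
import zipfile


def stream_rows_into_zip(tables):
    """Write rows straight into zip entries and return the archive bytes.

    `tables` (hypothetical) maps an archive name to any iterable of rows,
    for example a generator that yields rows as they arrive from a query.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(output, "w") as zf:
        for name, rows in tables.items():
            # zf.open(name, "w") returns a writable binary file for the entry;
            # TextIOWrapper lets csv.writer write text into it.
            with zf.open(name, "w") as entry, io.TextIOWrapper(
                entry, encoding="utf-8", newline=""
            ) as text:
                writer = csv.writer(text)
                for row in rows:
                    writer.writerow(row)  # no full CSV string is ever built
    return output.getvalue()
```

Whether this saves a meaningful amount of memory depends on how large each CSV is relative to the whole archive.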

mrtn

Oct 8, 2012, 7:44:46 AM
to python-...@googlegroups.com

Thanks! That's more or less what I've ended up with.

The only difference is that I don't create a StringIO object for each .csv file, but use a single StringIO object for the zipfile object (passing it into the constructor), and then I just do:

zf.writestr('file1.csv', content1)
zf.writestr('file2.csv', content2)
zf.writestr('file3.csv', content3)
...

The self.connection.stream approach sounds interesting, but what would you pass into the self.write() method when you flush the data to the browser client?

A. Jesse Jiryu Davis

Oct 9, 2012, 10:45:59 AM
to python-...@googlegroups.com
The idea is that you don't call self.write, but instead let ZipFile do the writing. On second thought, this isn't going to work because you need to set a Content-Length header before you can start writing, so I'm not sure of the best way to do what I'm thinking of.