Zipfile content reading via an iterator?

Tim Chase

Dec 11, 2007, 3:14:14 PM
to pytho...@python.org
I'm dealing with several large items that have been zipped up to
get quite impressive compression. However, uncompressed, they're
large enough to thrash my memory to swap and in general do bad
performance-related things. I'm trying to figure out how to
produce a file-like iterator out of the contents of such an item.

>>> import zipfile
>>> z = zipfile.ZipFile("test.zip")
>>> info = z.getinfo("data.txt")
>>> info.compress_size
132987864
>>> info.file_size
1344250972
>>> len(z.namelist())
20

I need to be able to access multiple files within it, but I only
need to iterate over each one, seeing small slices of the file at
a time. Using the read() method triggers the voluminous read. Thus
what I have to do currently:

>>> content = z.read("data.txt") # ouch!
>>> len(content)
1344250972
>>> for row in content.splitlines(): process(row) # pain!

What I'm trying to figure out how to do is something like the
mythical:

>>> for row in z.file_iter("data.txt"): process(row) # aah

to more efficiently handle the huge stream of data.

Am I missing something obvious? It seems like iterating over zip
contents would be a common thing to do (especially when compared
to reading the whole contents...I mean, they're zipped because
they're big! :)

Thanks for any pointers,

-tkc

Gabriel Genellina

Dec 11, 2007, 7:03:54 PM
to pytho...@python.org
On Tue, 11 Dec 2007 17:14:14 -0300, Tim Chase
<pytho...@tim.thechases.com> wrote:

> I'm dealing with several large items that have been zipped up to
> get quite impressive compression. However, uncompressed, they're
> large enough to thrash my memory to swap and in general do bad
> performance-related things. I'm trying to figure out how to
> produce a file-like iterator out of the contents of such an item.

The Time Machine in action again - that's already done, but in SVN. You
want the new ZipFile.open(filename) method, which returns a file-like
object.
Either wait until the 2.6 release, or get the zipfile.py source from
http://svn.python.org/view/python/trunk/Lib/zipfile.py and see if it works
with the current 2.5 release. (I think it should work OK)
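For reference, here is a minimal sketch of the same idea in current Python 3, where ZipFile.open() is a standard part of the zipfile module. The helper names iter_member_lines and build_demo_archive and the demo archive contents are hypothetical, and the sketch assumes the members are UTF-8 text. ZipFile.open() returns a binary stream that decompresses as it is read, and io.TextIOWrapper decodes that stream line by line. This means only a small buffer is held in memory, not the whole uncompressed member.

```python
import io
import os
import tempfile
import zipfile


def iter_member_lines(zf, member, encoding="utf-8"):
    """Yield the lines of one archive member, decompressing as we go."""
    # zf.open() returns a binary file-like object that inflates data on
    # demand; TextIOWrapper decodes it incrementally (universal newlines).
    with io.TextIOWrapper(zf.open(member), encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")


def build_demo_archive(path):
    """Write a small stand-in for the poster's test.zip (hypothetical data)."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zw:
        zw.writestr("data.txt", "".join(f"row {i}\n" for i in range(5)))
        zw.writestr("other.txt", "alpha\nbeta\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.zip")
        build_demo_archive(path)
        with zipfile.ZipFile(path) as zf:
            # Walk every member, handling one line at a time.
            for info in zf.infolist():
                for row in iter_member_lines(zf, info.filename):
                    print(info.filename, row)
```

The loop over zf.infolist() covers the "multiple files within it" case. Each member is streamed in turn rather than read() in full.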

--
Gabriel Genellina

Tim Chase

Dec 12, 2007, 10:03:12 AM
to Gabriel Genellina, pytho...@python.org
Gabriel Genellina wrote:
>> I'm dealing with several large items that have been zipped up to
>> get quite impressive compression. However, uncompressed, they're
>> large enough to thrash my memory to swap and in general do bad
>> performance-related things. I'm trying to figure out how to
>> produce a file-like iterator out of the contents of such an item.
>
> The Time Machine in action again - that's already done, but in SVN. You
> want the new ZipFile.open(filename) method, which returns a file-like
> object.

Thanks! I'll give the 2.6 version a try.

As a side question, is there any catalog of Time Machine items
(instances where folks have asked for a feature only to have the
response be "it's already implemented in the development
version")? I've seen the Time Machine response several times on
c.l.p and it would be interesting to thumb through an archive of
the developers' prescience.

-tkc

(sorry if this comes through twice...experienced some mailer
problems)

Gabriel Genellina

Dec 12, 2007, 10:40:44 AM
to pytho...@python.org
On Wed, 12 Dec 2007 12:03:12 -0300, Tim Chase
<pytho...@tim.thechases.com> wrote:

> As a side question, is there any catalog of Time Machine items
> (instances where folks have asked for a feature only to have the
> response be "it's already implemented in the development
> version")? I've seen the Time Machine response several times on
> c.l.p and it would be interesting to thumb through an archive of
> the developers' prescience.

Not that I know of. Quoting a message from Duncan Booth: "First Python
reference to a time machine I can find is Paul Prescod in April 1998
proposing its construction, although it was obviously fully operational by
September 1998."

http://mail.python.org/pipermail/python-list/2003-July/214868.html

--
Gabriel Genellina

Tim Chase

Jan 29, 2008, 8:06:12 AM
to Gabriel Genellina, pytho...@python.org
>>> I'm dealing with several large items that have been zipped up to
>>> get quite impressive compression. However, uncompressed, they're
>>> large enough to thrash my memory to swap and in general do bad
>>> performance-related things. I'm trying to figure out how to
>>> produce a file-like iterator out of the contents of such an item.
>>
>> The Time Machine in action again - that's already done, but in SVN. You
>> want the new ZipFile.open(filename) method, which returns a file-like
>> object.
>
> Thanks! I'll give the 2.6 version a try.

Just to follow up on this, I dropped the 2.6 version of
zipfile.py in my project folder (where the machine is currently
running Python 2.4), used ZipFile.open() and it worked fine.
I was able to successfully extract a 960 MB MDB file from the
zip file. The one thing that did throw me off is that it
rejected specifying that the file be opened as binary:

from zipfile import ZipFile

z = ZipFile('foo.zip')
f = z.open('path/to/file.mdb', 'rb')  # failed: 'rb' is not an accepted mode
f = z.open('path/to/file.mdb')        # worked

but just opening with no mode specified did allow me to
extract the file successfully. I don't know if I just got
lucky with newline/EOF translations, or if it really does
binary file handling and you don't get a choice of non-binary
file handling.
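For comparison, here is a sketch of how this behaves in current Python 3, not the 2.6 module used above. There, ZipFile.open() accepts only mode "r" for reading and always yields raw bytes, so no newline or EOF translation happens, and asking for "rb" raises ValueError. Copying a member out in fixed-size chunks with shutil.copyfileobj() keeps memory use bounded even for a large file. The helper names extract_member and rejects_rb are hypothetical, and the payload is made-up test data.

```python
import io
import shutil
import zipfile


def extract_member(zf, member, dest, chunk_size=1024 * 1024):
    """Copy one member into the binary file object `dest`, chunk_size bytes at a time."""
    # Mode defaults to "r"; the stream yields raw bytes, untranslated.
    with zf.open(member) as src:
        shutil.copyfileobj(src, dest, chunk_size)


def rejects_rb(zf, member):
    """Return True if ZipFile.open() refuses mode "rb" (Python 3.6+ raises ValueError)."""
    try:
        with zf.open(member, "rb"):
            pass
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    buf = io.BytesIO()
    # Made-up binary payload containing CR/LF, NUL and Ctrl-Z bytes.
    payload = b"\x00\x01line one\r\nline two\n\x1a\xff"
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zw:
        zw.writestr("path/to/file.mdb", payload)
    with zipfile.ZipFile(buf) as zr:
        out = io.BytesIO()
        extract_member(zr, "path/to/file.mdb", out)
        print(out.getvalue() == payload)            # True: bytes round-trip unchanged
        print(rejects_rb(zr, "path/to/file.mdb"))   # True
```

To write to disk instead, pass a file opened with open(target, "wb") as dest. ZipFile.extract() also exists for writing a member straight to a directory.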

Anyways, thanks to Gabriel (and all the authors of Python
zipfile.py library) for the solution.

-tkc

Gabriel Genellina

Jan 29, 2008, 2:57:40 PM
to pytho...@python.org
On Tue, 29 Jan 2008 11:06:12 -0200, Tim Chase
<pytho...@tim.thechases.com> wrote:

> Just to follow up on this, I dropped the 2.6 version of
> zipfile.py in my project folder (where the machine is currently
> running Python 2.4), used ZipFile.open() and it worked fine.

> [...]


> Anyways, thanks to Gabriel (and all the authors of Python
> zipfile.py library) for the solution.

I just brought attention to the upcoming feature. All credit should go to
the patch author, Alan McIntyre.

--
Gabriel Genellina
