bean-bake zip files aren't actually compressed

Justus Pendleton

Jan 4, 2019, 1:57:49 AM
to Beancount
What compression algorithm should the zipfiles that bean-bake creates use?

I noticed today that zip files created by bean-bake aren't actually compressed. This appears to be a result of commit 71abb59ec78f, where the reliance on an external zip program was replaced with the Python zipfile module.
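
For anyone who wants to check their own archives, here is a minimal sketch that lists the compression method of each entry. The helper name `describe_entries` and the sample file are made up for illustration; only the `zipfile` calls come from the standard library.

```python
import io
import zipfile

# Human-readable names for the compression constants zipfile defines.
METHOD_NAMES = {
    zipfile.ZIP_STORED: 'stored',
    zipfile.ZIP_DEFLATED: 'deflated',
    zipfile.ZIP_BZIP2: 'bzip2',
    zipfile.ZIP_LZMA: 'lzma',
}

def describe_entries(archive):
    """Return (name, method, original size, size in archive) per entry."""
    with zipfile.ZipFile(archive) as zf:
        return [(info.filename,
                 METHOD_NAMES.get(info.compress_type, 'unknown'),
                 info.file_size,
                 info.compress_size)
                for info in zf.infolist()]

# Build a tiny in-memory archive without a compression argument,
# which is how the archive is opened before the patch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('index.html', '<html>' + 'x' * 1000 + '</html>')

entries = describe_entries(buffer)
print(entries)
```

An entry reported as 'stored' with identical original and archived sizes is the symptom described above. `describe_entries` also accepts a path to a zip file on disk.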

The ZipFile constructor has a keyword parameter *compression* with a default of ZIP_STORED, which means "don't compress". So you need to pass in a keyword argument to actually compress things. Using compression results in my baked files going from ~100MB to 15MB.
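
A small sketch of the difference the keyword makes, using an in-memory archive. The payload is a made-up, highly repetitive stand-in for baked HTML, so the ratio it shows is not representative of real output. `zipped_size` is a hypothetical helper, and `ZIP_DEFLATED` assumes zlib is available.

```python
import io
import zipfile

def zipped_size(data, compression=None):
    """Size in bytes of an in-memory zip archive holding `data`."""
    buffer = io.BytesIO()
    # Omit the keyword entirely when no method is given, to get the default.
    kwargs = {} if compression is None else {'compression': compression}
    with zipfile.ZipFile(buffer, 'w', **kwargs) as zf:
        zf.writestr('journal.html', data)
    return len(buffer.getvalue())

# Hypothetical payload: one table row repeated many times.
payload = b'<tr><td>Assets:Checking</td><td>100.00 USD</td></tr>\n' * 2000

default_size = zipped_size(payload)                       # ZIP_STORED
deflated_size = zipped_size(payload, zipfile.ZIP_DEFLATED)
print(len(payload), default_size, deflated_size)
```

With the default, the archive comes out slightly larger than the input because of the zip headers. With `ZIP_DEFLATED` it shrinks.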

The tricky part is that Python (and zip) supports three different compression algorithms, and which ones work depends on which modules are installed on the user's system. Sure, you can *probably* rely on them all being installed these days...

In my diff, we try LZMA first, then fall back to BZIP2, then fall back to DEFLATE, and finally give up and just use STORED. The Python docs say that LZMA has been included in the ZIP specification since 2006 and BZIP2 since 2001, so it seems like they should be safe to use at this point... Maybe? I have no idea how widespread support for LZMA/BZIP2 is in zip apps.
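
The same fallback order can also be sketched as a function that actually imports each module instead of only looking it up. As far as I can tell, `find_spec` only locates a module without importing it, so it could report `lzma` as present on a Python built without the underlying C library. Treat that as an assumption to verify. The names `CANDIDATES` and `pick_compression` are hypothetical and not part of the patch.

```python
import importlib
import zipfile

# (module name, zipfile constant) pairs, in order of preference.
CANDIDATES = (
    ('lzma', zipfile.ZIP_LZMA),
    ('bz2', zipfile.ZIP_BZIP2),
    ('zlib', zipfile.ZIP_DEFLATED),
)

def pick_compression(candidates=CANDIDATES):
    """Return the first method whose backing module imports cleanly."""
    for module_name, method in candidates:
        try:
            importlib.import_module(module_name)
        except ImportError:
            continue  # Module missing or unusable; try the next one.
        return method
    # Nothing usable: fall back to no compression.
    return zipfile.ZIP_STORED

ZIP_COMPRESSION = pick_compression()
print(ZIP_COMPRESSION)
```

Restricting the choice to DEFLATE would just mean passing a one-element candidate list.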

So, is the patch fine like this? Should we just use DEFLATE with zip files and give up hope on using anything better?


diff -r ccc6dff1b7b4 beancount/scripts/bake.py
--- a/beancount/scripts/bake.py Mon Dec 31 18:13:23 2018 +0000
+++ b/beancount/scripts/bake.py Fri Jan 04 13:36:57 2019 +0700
@@ -17,6 +17,15 @@
 import re
 from os import path
 import zipfile
+import importlib.util
+if importlib.util.find_spec('lzma'):
+    ZIP_COMPRESSION = zipfile.ZIP_LZMA
+elif importlib.util.find_spec('bz2'):
+    ZIP_COMPRESSION = zipfile.ZIP_BZIP2
+elif importlib.util.find_spec('zlib'):
+    ZIP_COMPRESSION = zipfile.ZIP_DEFLATED
+else:
+    ZIP_COMPRESSION = zipfile.ZIP_STORED
 
 import lxml.html
 
@@ -200,7 +209,7 @@
       directory: A string, the name of the directory to archive.
       archive: A string, the name of the file to output.
     """
-    with file_utils.chdir(directory), zipfile.ZipFile(archive, 'w') as archfile:
+    with file_utils.chdir(directory), zipfile.ZipFile(archive, 'w', compression=ZIP_COMPRESSION) as archfile:
         for root, dirs, files in os.walk(directory):
             for filename in files:
                 relpath = path.relpath(path.join(root, filename), directory)

Martin Blais

Jan 5, 2019, 5:35:51 AM
to Beancount
