write-only workbooks write huge tmpfiles

Justin Pryzby

May 16, 2022, 11:11:00 AM
to openpyx...@googlegroups.com
The premise of write-only mode is to allow writing an arbitrarily large file
one row at a time, with bounded RAM. Check.

However, it seems to be writing *uncompressed* files to /tmp. Which is
frequently either 1) shared/on the root FS; or, 2) only a few GB in size. It
can be hard to diagnose the cause when /tmp is sporadically low on space for a
minute.

Since xml files compress very well, consider writing them through a
zip/gz/lz4/zstd file stream layer.
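
To illustrate the suggestion, here is a minimal sketch using only the Python standard library. It is not openpyxl's code: `write_rows` and `temp_size` are hypothetical helpers, and the XML is a simplified stand-in for worksheet data. It writes the same rows to a plain temp file and to one wrapped in a gzip stream, then compares the on-disk sizes.

```python
import gzip
import os
import tempfile


def write_rows(stream, n_rows):
    """Write n_rows of worksheet-like XML to a text stream, one row at a time."""
    stream.write("<sheetData>")
    for r in range(1, n_rows + 1):
        stream.write(f'<row r="{r}"><c r="A{r}" t="n"><v>{r}</v></c></row>')
    stream.write("</sheetData>")


def temp_size(opener, n_rows=100_000):
    """Write the rows to a temp file opened by `opener`; return its size in bytes."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        with opener(path) as stream:
            write_rows(stream, n_rows)
        return os.path.getsize(path)
    finally:
        os.remove(path)


# Same data, once uncompressed and once through a gzip text stream.
plain_size = temp_size(lambda p: open(p, "w", encoding="utf-8"))
gzip_size = temp_size(lambda p: gzip.open(p, "wt", encoding="utf-8"))
print(f"plain: {plain_size} bytes, gzip: {gzip_size} bytes")
```

Memory use stays bounded in both cases because rows are still written one at a time; only the temp file gets smaller.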

Justin

Charlie Clark

May 16, 2022, 1:00:17 PM
to openpyx...@googlegroups.com
On 16 May 2022, at 17:10, Justin Pryzby wrote:

> The premise of write-only mode is to allow writing an arbitrarily large file
> one row at a time, with bounded RAM. Check.

Both modes now use the same approach. It's a pity, but the zip format doesn't support streaming, so we can't simply write straight to the archive.

> However, it seems to be writing *uncompressed* files to /tmp. Which is
> frequently either 1) shared/on the root FS; or, 2) only a few GB in size. It
> can be hard to diagnose the cause when /tmp is sporadically low on space for a
> minute.
>
> Since xml files compress very well, consider writing them through a
> zip/gz/lz4/zstd file stream layer.

I understand what you mean, but that would add complexity to openpyxl. It is really up to the client to configure this, for example by mounting the temp directory on a compressed drive.
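
A client-side setup along these lines could look like the sketch below. It assumes the library creates its temporary files through Python's `tempfile` module without passing an explicit `dir`; that is an assumption, not something confirmed in this thread. `scratch_dir` is a hypothetical stand-in for a directory on a larger or transparently compressed filesystem.

```python
import os
import shutil
import tempfile

# Hypothetical scratch location; in practice this would be a mount point on a
# bigger or compressed filesystem rather than a freshly created directory.
scratch_dir = tempfile.mkdtemp(prefix="xlsx-scratch-")

# tempfile.tempdir overrides the default directory for later tempfile calls in
# this process. Setting the TMPDIR environment variable before the interpreter
# starts is the equivalent knob outside the code.
previous_tempdir = tempfile.tempdir
tempfile.tempdir = scratch_dir
try:
    with tempfile.NamedTemporaryFile(mode="w+", prefix="demo.", delete=False) as tmp:
        tmp.write("<sheetData/>")
        tmp_path = tmp.name
    # Check that the file was created under the redirected directory.
    landed_in_scratch = (
        os.path.realpath(os.path.dirname(tmp_path)) == os.path.realpath(scratch_dir)
    )
    print("temp file:", tmp_path)
finally:
    # Restore the previous setting and clean up the demo directory.
    tempfile.tempdir = previous_tempdir
    shutil.rmtree(scratch_dir)
```

This only moves the temp files; whether they end up compressed depends entirely on the filesystem behind the chosen directory.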

Charlie

Justin Pryzby

May 16, 2022, 11:15:39 PM
to openpyx...@googlegroups.com
add_table warns even if table.columns is set.

How are write-only users supposed to avoid the warning?

Manually call _duplicate_name() and _tables.add()?
Subclass WriteOnlyWorksheet() and add _get_cell=False?

    def add_table(self, table):
        """
        Check for duplicate name in definedNames and other worksheet tables
        before adding table.
        """

        if self.parent._duplicate_name(table.name):
            raise ValueError("Table with name {0} already exists".format(table.name))
        if not hasattr(self, "_get_cell"):
            warn("In write-only mode you must add table columns manually")
        self._tables.add(table)
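
A third option that avoids private methods might be a warnings filter. The sketch below assumes the `warn` call in add_table is the standard `warnings.warn`, which the excerpt suggests but does not show. `add_table_stub` is a hypothetical stand-in so the example runs without openpyxl, and `record=True` is only there to make the effect observable.

```python
import warnings


def add_table_stub():
    """Hypothetical stand-in that emits the same message as add_table()."""
    warnings.warn("In write-only mode you must add table columns manually")


with warnings.catch_warnings(record=True) as caught:
    # Record everything by default inside this block...
    warnings.simplefilter("always")
    # ...except warnings whose message starts with this text. The filter added
    # last is checked first, so it takes precedence over "always".
    warnings.filterwarnings("ignore", message="In write-only mode")

    add_table_stub()                    # suppressed
    warnings.warn("unrelated warning")  # still recorded

print([str(w.message) for w in caught])
```

In real use a plain `warnings.catch_warnings()` around the add_table call would do, and the filter is dropped again when the block exits.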

Justin