Buffer validity after archive_read_data_block() returns ARCHIVE_EOF

Johan Hattne

Apr 5, 2024, 5:11:18 AM
to libarchive-discuss
In case this is still the right forum for this sort of question (I've searched a bit but found neither alternatives nor the answer to the question in the documentation)…

Is the previous buffer invalidated by a call to archive_read_data_block() that returns ARCHIVE_EOF? In particular, with

// Below returns ARCHIVE_OK, but yields a short read (len much smaller than block_size)
archive_read_data_block(archive, &buf_old, &len_old, &off_old);

// This returns buf_new = NULL, len_new = 0
archive_read_data_block(archive, &buf_new, &len_new, &off_new);

is it safe to use len_old bytes at buf_old at this stage? Looking at advance_file_pointer() and friends in libarchive/archive_read.c makes me think it is, but it does look a bit fishy.

// Best wishes; Johan

Tim Kientzle

Apr 5, 2024, 12:32:11 PM
to Johan Hattne, libarchiv...@googlegroups.com


> On Apr 5, 2024, at 2:11 AM, Johan Hattne <hat...@g.ucla.edu> wrote:
>
> In case this is still the right forum for this sort of question (I've searched a bit but found neither alternatives nor the answer to the question in the documentation)…
>
> Is the previous buffer invalidated by a call to archive_read_data_block() that returns ARCHIVE_EOF?

Yes.

> In particular, with
>
> // Below returns ARCHIVE_OK, but yields a short read (len much smaller than block_size)
> archive_read_data_block(archive, &buf_old, &len_old, &off_old);

`archive_read_data_block` does not guarantee that it will return any particular
block size in any particular context. The amount of data returned can
vary depending on how the data was encoded in the archive.
(For example, compression can result in varying-size blocks of
data being returned.)

>
> // This returns buf_new = NULL, len_new = 0
> archive_read_data_block(archive, &buf_new, &len_new, &off_new);
>
> is it safe to use len_old bytes at buf_old at this stage?

No.

`archive_read_data_block` is always allowed to discard any previously-returned data.

(It is not _required_ to do so, but you should never rely on that.)
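
A minimal sketch of one way to act on this, assuming the caller needs the earlier bytes after the next call: copy them into caller-owned storage first. The names `saved_block`, `saved_block_store` and `saved_block_free` are hypothetical helpers for illustration, not part of libarchive, and the call site is shown only as a comment so the helper compiles without `<archive.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical caller-owned copy of one block; not a libarchive type.
 * Must be zero-initialised before first use. */
struct saved_block {
    void   *data;    /* malloc'd copy, or NULL when len == 0 */
    size_t  len;
    int64_t offset;
};

/* Copy len bytes at buf into storage owned by sb, replacing any earlier copy.
 * Returns 0 on success, -1 if allocation fails (sb is then left unchanged). */
static int saved_block_store(struct saved_block *sb, const void *buf,
                             size_t len, int64_t offset)
{
    void *copy = NULL;

    assert(sb != NULL);
    if (len > 0) {
        assert(buf != NULL);
        copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, buf, len);
    }
    free(sb->data);
    sb->data = copy;
    sb->len = len;
    sb->offset = offset;
    return 0;
}

/* Release the copy and reset sb to its empty state. */
static void saved_block_free(struct saved_block *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->offset = 0;
}

/*
 * Intended call site (needs <archive.h>; kept as a comment on purpose):
 *
 *   const void *buf; size_t len; la_int64_t off;
 *   struct saved_block prev = {0};
 *   if (archive_read_data_block(a, &buf, &len, &off) == ARCHIVE_OK)
 *       saved_block_store(&prev, buf, len, off);
 *   // After the next archive_read_data_block() call, use prev.data
 *   // and prev.len, never the old buf pointer.
 */
```

The copy gives up the zero-copy benefit of `archive_read_data_block`, so it is only worth doing for data that really has to outlive the next call.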

Tim

Johan Hattne

Apr 5, 2024, 2:02:42 PM
to Tim Kientzle, Johan Hattne, libarchiv...@googlegroups.com
Thanks a lot, Tim! Does that also mean there is no way to determine
whether a short read from archive_read_data_block() is due to EOF (like
feof() returns nonzero after fread() yields a short read)?

// Best wishes; Johan

Tim Kientzle

Apr 5, 2024, 2:44:50 PM
to Johan Hattne, Johan Hattne, libarchiv...@googlegroups.com
There is no such thing as a “short read” from `archive_read_data_block`.  That function may return any number of bytes for any call at any time.  Indeed, the whole point of `archive_read_data_block` is that it does no unnecessary copying; you’re getting direct access to whatever contiguous data libarchive has internally available at that point.

It sounds like you might want to use `archive_read_data` instead.  This takes a buffer and size — similar to `fread()` — and fills it for you.  That function will return fewer bytes than requested only if there is an error or EOF.   That gives you a nicer API at the cost of additional copying.
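
A rough sketch of the `fread()`/`feof()`-style check this allows, relying only on the behaviour described above (fewer bytes than requested means error or EOF, and a negative return means error). The names `fill_fn`, `fill_record`, `mem_src` and `mem_fill` are hypothetical. The reader is passed as a callback so the sketch runs without libarchive; in real use the callback would be a thin wrapper around `archive_read_data(a, buf, want)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback with the same shape as archive_read_data():
 * returns the number of bytes stored in buf, or a negative value on error. */
typedef int64_t (*fill_fn)(void *ctx, void *buf, size_t want);

/* Outcome of one fixed-size request. */
enum fill_status { FILL_FULL, FILL_EOF, FILL_ERROR };

/* Ask for exactly `want` bytes and classify the result.
 * *got receives the number of bytes stored (0 on error). */
static enum fill_status fill_record(fill_fn fill, void *ctx, void *buf,
                                    size_t want, size_t *got)
{
    int64_t n;

    assert(fill != NULL && got != NULL);
    n = fill(ctx, buf, want);
    if (n < 0) {
        *got = 0;
        return FILL_ERROR;
    }
    *got = (size_t)n;
    /* Fewer bytes than requested and no error: the data has ended. */
    return (*got < want) ? FILL_EOF : FILL_FULL;
}

/* In-memory stand-in for an archive entry, used only to exercise the sketch. */
struct mem_src {
    const unsigned char *p;
    size_t left;
};

static int64_t mem_fill(void *ctx, void *buf, size_t want)
{
    struct mem_src *s = ctx;
    size_t n = (want < s->left) ? want : s->left;

    memcpy(buf, s->p, n);
    s->p += n;
    s->left -= n;
    return (int64_t)n;
}
```

As with `fread()`, when the data length is an exact multiple of the record size, the last full record reports `FILL_FULL` and the end only shows up on the following request, which reports `FILL_EOF` with zero bytes.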

Note: Generally, libarchive uses the word “block” just to refer to a chunk of binary data that may be of any size.

Tim

Johan Hattne

Apr 5, 2024, 5:09:25 PM
to Tim Kientzle, Johan Hattne, libarchiv...@googlegroups.com
OK, thanks again! Looks like I will be rethinking my strategy over the
weekend.

// Best wishes; Johan
