Dear libarchive maintainers/devs,
First of all, thanks for this library: it's really simple to use and supports basically every known archive format.
I do have a performance issue, though. I'm not sure whether this has been
brought to your attention before, but when one has to scan through an
archive and then extract files/data in a non-linear fashion, the
performance is atrocious.
For example, I have a 7z file containing:
/dir/abc.bin
/dir2/cdef.bin
/dir2/xyz.bin
and many more files (10,000+). Now, I first have to extract
'/dir2/xyz.bin', then potentially '/dir/abc.bin', and optionally
'/dir2/cdef.bin'.
I have implemented non-linear access by rescanning the whole archive
from scratch each time: if the entry name matches (via a regex), I
extract the file. So for each file/regex path I execute the following
pseudo-code:
void extract_file(const char *filename) {
    struct archive *a_ = archive_read_new();
    struct archive_entry *entry;

    archive_read_support_filter_all(a_);
    archive_read_support_format_all(a_);
    archive_read_open_filename(a_, "my1GiB.archive.7z", 10240);
    while (archive_read_next_header(a_, &entry) == ARCHIVE_OK) {
        // this is pseudo-code; I'm really matching with a regex, etc.
        if (strcmp(filename, archive_entry_pathname(entry)) == 0) {
            // use archive_read_data() to get the data
        }
    }
    archive_read_free(a_); // dispose of a_ properly
}
I am not explicitly calling archive_read_data_skip, as per the notes at
https://github.com/libarchive/libarchive/wiki/Examples#List_contents_of_Archive_stored_in_File.
Am I doing it 'right'? Isn't there a better way to somehow 'cache' a_ position/entry on the archive stream?
To be honest, I would understand if the answer is "no", given the
universal support for all archive formats (some of which, I imagine,
can only be accessed linearly).
Thanks again for your great work!