Why is the performance of archive_write_data so low when using the archive_write_add_filter_xz filter?


klop

Apr 15, 2024, 11:18:53 PM
to libarchive-discuss
//xz git

// set format and filters
SimpleCompressMethod(".tar.xz", true, archive_write_set_format_pax_restricted, { archive_write_add_filter_xz }),

// read raw files for compression
int ouput_ret = 0;
while (remain_size >= block_size)
{
    // allow the caller to cancel between blocks
    if (ShouldStop(pWrap))
    {
        return false;
    }

    readed = fread(block, 1, block_size, fp);
    if (readed == 0)
    {
        break;
    }

    remain_size -= readed;
    la_ssize_t writed = archive_write_data(a, block, readed); // low performance
    if (writed != readed)
    {
        return false;
    }

    // report progress after each block
    cur_bytes += readed;
    pWrap->pProgress->OnCompressProgress(pWrap->nId, false, cur_bytes, 0);
}


Is it because the lzma algorithm itself has low performance, or is it due to other reasons? Is there any way to improve performance?

Tim Kientzle

Apr 16, 2024, 11:51:32 PM
to klop, libarchiv...@googlegroups.com
We would need a lot more information to comment.

What do you mean by “low performance”? Compared to what?

How does it compare if you compress the data with the “xz” program, for example?

What is the value of “block_size”? Sometimes, you can see better performance by using a larger block size.

Tim

klop

Apr 17, 2024, 4:55:54 AM
to libarchive-discuss

1. Compared with 7z (https://www.7-zip.org/).

2. I compared the time required to compress a 2.6 GB folder into .tar.xz. 7z takes only about 5 minutes, but the libarchive-based implementation takes about 10 minutes even with xz multi-threading enabled, and more than 40 minutes when single-threaded.
-----In fact, I also tested the .tar.gz format on the same folder. It was very fast, with a time similar to 7z.

After investigation, I determined that the archive_write_data interface consumes a lot of CPU time (archive_write_data calls the lzma_code function, which is where the time is spent).

3. m_nBlockSize = 256 * 1024 (I tried increasing block_size, but it did not help; it was still very slow.)
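
The block-size experiment can be made repeatable by timing the read/write loop on its own for several block sizes. The following is only a minimal sketch under stated assumptions: `sink_fn`, `pump_file`, `discard_sink` and `struct pump_result` are hypothetical names invented for illustration, and the sink stands in for a thin wrapper around a call such as archive_write_data. It uses the C11 function `timespec_get` for wall-clock time, and it reads until end of file, so a final partial block is forwarded as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical sink: stands in for a wrapper around archive_write_data().
 * Returns the number of bytes consumed, or a negative value on error. */
typedef long (*sink_fn)(void *ctx, const void *buf, size_t len);

/* Outcome of one timed run (illustrative type, not part of libarchive). */
struct pump_result {
    size_t bytes;    /* total bytes handed to the sink */
    size_t calls;    /* number of sink invocations */
    double seconds;  /* wall-clock time spent in the loop */
    int ok;          /* 1 on success, 0 on a read or write failure */
};

/* Current wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Read fp to end of file in chunks of block_size and pass each chunk,
 * including the last partial one, to the sink while timing the loop. */
struct pump_result pump_file(FILE *fp, void *block, size_t block_size,
                             sink_fn sink, void *ctx)
{
    struct pump_result r = {0, 0, 0.0, 1};
    assert(fp != NULL && block != NULL && block_size > 0 && sink != NULL);

    double start = now_seconds();
    for (;;) {
        size_t n = fread(block, 1, block_size, fp);
        if (n == 0) {
            if (ferror(fp))
                r.ok = 0;       /* read error rather than end of file */
            break;
        }
        long w = sink(ctx, block, n);
        if (w < 0 || (size_t)w != n) {
            r.ok = 0;           /* the sink failed or wrote short */
            break;
        }
        r.bytes += n;
        r.calls += 1;
    }
    r.seconds = now_seconds() - start;
    return r;
}

/* Sink that discards its input: measures the cost of the loop alone. */
long discard_sink(void *ctx, const void *buf, size_t len)
{
    (void)ctx;
    (void)buf;
    return (long)len;
}
```

Calling `pump_file` from your own `main` once with `discard_sink` and once with a sink that forwards to the archive gives two timings for the same input. If the second one barely changes when `block_size` is raised from 256 KiB to a few MiB, the block size is unlikely to be the bottleneck.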

Tim Kientzle

Apr 17, 2024, 8:38:27 AM
to klop, libarchiv...@googlegroups.com
How does that compare with the `xz` utility?

If the time is spent in lzma_code, you should ask the xz developers about the performance, I think.  Libarchive uses the liblzma library, but we don’t develop that library.

Tim


Grzegorz Antoniak

Apr 17, 2024, 8:55:13 AM
to klop, libarchive-discuss

Just wondering, did you double-check that you're not testing a debug build of liblzma and/or libarchive against the "release" version of "7z"?

G.


klop

Apr 17, 2024, 9:02:44 PM
to libarchive-discuss
Thanks, your suspicion is right, but the 7z I use is also a debug build that I compiled myself.

I now feel that the liblzma implementation in xz performs worse than 7z's, although it is based on 7z's LZMA algorithm.

klop

Apr 17, 2024, 9:12:30 PM
to libarchive-discuss
xz contains liblzma, which is integrated and compiled into libarchive.

Well, you are right, but because libarchive wants to support the .7z and .xz formats, it needs liblzma to provide the LZMA algorithm. I am asking on this forum mainly to find out whether anyone else has encountered this problem or knows about it.

Grzegorz Antoniak

Apr 18, 2024, 10:11:26 PM
to libarchiv...@googlegroups.com

I think that the only meaningful performance measuring technique would be to compare the release versions (that is compile all three projects in release versions: 7z, liblzma and libarchive).

In debug builds, a lot of different things can affect the result, e.g. `libarchive`/`liblzma` could carry a different level of debug instrumentation overhead than `7z`, functions expected to be inlined are not inlined, and memory allocation can be slower because it is monitored for leaks.

I haven't compared the debug versions of all three projects, but measuring the speed of debug configurations implicitly takes into account many more things than just the performance of the compression algorithm itself.

Could you re-check with release versions?

G.


klop

Apr 18, 2024, 11:44:58 PM
to libarchive-discuss
Thank you for your suggestion.

I used the timer on my phone to compare the approximate time consumed by each of the two release versions.

1. With 7z, I first archived the 2.6 GB folder into a .tar, then compressed that .tar into .xz with the same tool. It took a total of 1 minute and 25 seconds.

2. With the packaging tool I developed on top of libarchive, I compressed the same folder directly into .tar.xz (libarchive applies the format and the filter in one pass, so there is no need for two steps). It took a total of 3 minutes and 4 seconds.
-------------------
I have an idea. Is it possible that liblzma is over-encapsulated, with so many levels of function calls that the constant pushing and popping of stack frames seriously hurts performance?


Grzegorz Antoniak

Apr 19, 2024, 2:04:06 PM
to libarchiv...@googlegroups.com

Hi,

I've done some tests on my machine (Linux x64). I've compiled libarchive from the master branch using the Release build type, and this libarchive uses my system's liblzma. I picked a directory on my disk that is about 2 GB and used it as my test data.

So in my tests, libarchive (`bsdtar`) and GNU `tar` are actually slightly faster than `7z` at compressing my 2 GB test dir. I think this suggests that the problem could be somewhere other than between libarchive and liblzma. Maybe you could re-try with `bsdtar` to check the performance of your custom-compiled libarchive+lzma?

----------------

Here are the details of the tests.

    $ du -hd0 /tmp/test-dir/
    2,0G    /tmp/test-dir/

I've used libarchive's `bsdtar` tool to compress the directory to `.tar.xz` (each test is executed 4 times).
    
    $ time TAR_WRITER_OPTIONS="threads=0,compression-level=1" ./bsdtar -J -cf test-bsdtar-c1.tar.xz /tmp/test-dir/*
    186,76s user 1,58s system 1480% cpu 12,720 total
    185,56s user 1,57s system 1483% cpu 12,617 total
    187,15s user 1,59s system 1477% cpu 12,777 total
    185,93s user 1,49s system 1481% cpu 12,654 total
    
Next, I've used GNU tar + xz filter to compress the same directory (no libarchive is used):

    $ time XZ_DEFAULTS="-T0 -1" tar cfJ test-tarJ-c1.tar.xz /tmp/test-dir/*
    188,58s user 3,19s system 1502% cpu 12,761 total
    188,77s user 3,20s system 1510% cpu 12,707 total
    189,54s user 3,19s system 1492% cpu 12,915 total
    188,65s user 3,23s system 1506% cpu 12,740 total
    
Next, I've used "7z" to create the ".tar" file, and "7z" to compress it:
    
    $ time (7z a test-7z-7z-c1.tar /tmp/test-dir/* && 7z a -mx1 test-7z-7z-c1.tar.xz test-7z-7z-c1.tar)
    97,55s user 3,58s system 914% cpu 11,064 total
    96,10s user 3,69s system 934% cpu 10,673 total
    96,74s user 3,56s system 931% cpu 10,767 total
    96,90s user 3,64s system 905% cpu 11,099 total

It's slightly faster, but the resulting file was much larger than before, so I've bumped up the compression level by 1 to bring the resulting file close to the size that `bsdtar` and GNU `tar` produce:
    
    $ time (7z a test-7z-7z-c2.tar /tmp/test-dir/* && 7z a -mx2 test-7z-7z-c2.tar.xz test-7z-7z-c2.tar)
    182,42s user 3,82s system 1070% cpu 17,400 total
    180,57s user 3,80s system 1075% cpu 17,141 total
    180,09s user 3,80s system 1051% cpu 17,489 total
    179,43s user 3,84s system 1036% cpu 17,684 total

Excluding the time needed to create the ".tar" file produces:

    $ 7z a test-7z-7z-c2.tar /tmp/test-dir/* && time 7z a -mx2 test-7z-7z-c2.tar.xz test-7z-7z-c2.tar
    180,59s user 1,43s system 1250% cpu 14,556 total
    181,72s user 1,53s system 1340% cpu 13,669 total
    178,94s user 1,48s system 1371% cpu 13,155 total
    180,35s user 1,54s system 1345% cpu 13,515 total

Using compression level 2, the resulting XZ file was similar in size to what `bsdtar` and GNU `tar` produce, but the speed is also slower than `bsdtar` and GNU `tar` (probably because of the disk overhead when writing the tar file).
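
To make the runs above easier to compare, the four wall-clock times per tool can be averaged, and the `cpu` percentage printed by `time` can be recomputed as (user + system) / total. The sketch below does only that arithmetic. The numbers are copied from the runs above with decimal points instead of commas, and the names `mean`, `busy_cores` and `print_summary` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Wall-clock seconds (the "total" column) copied from the runs above. */
const double bsdtar_c1[4]    = {12.720, 12.617, 12.777, 12.654};
const double gnutar_c1[4]    = {12.761, 12.707, 12.915, 12.740};
const double sevenzip_mx2[4] = {14.556, 13.669, 13.155, 13.515}; /* .tar creation excluded */

/* Arithmetic mean of n samples. */
double mean(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum / (double)n;
}

/* Average number of busy cores: CPU time divided by wall-clock time.
 * `time` prints the same quantity as a percentage (e.g. "1480% cpu"). */
double busy_cores(double user, double sys, double wall)
{
    assert(wall > 0.0);
    return (user + sys) / wall;
}

/* Print the mean wall-clock time per tool and the 7z/bsdtar ratio. */
void print_summary(void)
{
    double b = mean(bsdtar_c1, 4);
    double g = mean(gnutar_c1, 4);
    double z = mean(sevenzip_mx2, 4);

    printf("bsdtar  -1  : %.3f s\n", b);
    printf("GNU tar -1  : %.3f s\n", g);
    printf("7z     -mx2 : %.3f s\n", z);
    printf("7z -mx2 / bsdtar: %.2fx\n", z / b);
    printf("bsdtar run 1 kept about %.1f cores busy\n",
           busy_cores(186.76, 1.58, 12.720));
}
```

Calling `print_summary()` from a small `main` gives averages of roughly 12.7 s for `bsdtar` and 13.7 s for `7z -mx2`, so on this data set the two differ by under 10 percent, far from the 2x gap measured with the stopwatch.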

G.
