
Bug#892664: dpkg: Please add support for zstd (Zstandard) compressed packages


Balint Reczey

Mar 11, 2018, 5:00:04 PM
Package: dpkg
Version: 1.19.0.5
Severity: wishlist
Tags: patch

Dear Dpkg Developers,

Please add support for Zstandard compression to dpkg and other
programs generated by the dpkg source package [1].

Tests on packages repackaged with zstd -19 show little increase in
compressed package size compared to xz -6, while decompression time
decreased dramatically.

The recompressed firefox .deb (Ubuntu's
firefox_58.0.2+build1-0ubuntu0.17.10.1_amd64.deb) increased ~9% in
size but decompressed in <20% of the original time:

$ du -s firefox-*deb
43960 firefox-xz.deb
47924 firefox-zstd.deb

$ rm -rf firefox-xz/* ;time dpkg-deb -R firefox-xz.deb firefox-xz/
real 0m4,270s
user 0m4,220s
sys 0m0,630s
$ rm -rf firefox-zstd/* ;time dpkg-deb -R firefox-zstd.deb firefox-zstd/
real 0m0,765s
user 0m0,556s
sys 0m0,462s

Tests on the full Ubuntu main archive showed ~6% average increase in
the size of the binary packages.

The patches are also available on Salsa [2].

Cheers,
Balint

--
Balint Reczey
Ubuntu & Debian Developer

[1] http://facebook.github.io/zstd/
[2] https://salsa.debian.org/rbalint/dpkg/commits/zstd
0001-dpkg-Add-Zstandard-compression-support.patch
0002-Add-test-for-zstd-decompression.patch
0003-dpkg-Support-Zstandard-compressed-packages-with-mult.patch
0004-dpkg-Enable-zstd-uniform-compression.patch

Guillem Jover

Mar 17, 2018, 11:50:03 PM
Hi!

On Sun, 2018-03-11 at 21:51:05 +0100, Balint Reczey wrote:
> Package: dpkg
> Version: 1.19.0.5
> Severity: wishlist
> Tags: patch

> Please add support for Zstandard compression to dpkg and other
> programs generated by the dpkg source package [1].

Thanks. I started implementing this several weeks ago after having
discussed it with Julian Andres Klode on IRC, but stopped after seeing
the implementation getting messy given the current code structure.

In any case, as I mentioned on IRC to Andres, this is something I
pondered about already in 2016, when Paul Wise blogged about it, and
which I also told Andres about at the time when he was adding lz4
support to apt. :) But parked it for later as there were several
apparent problems with it at the time.

So, these are the items that come to mind (most from the dpkg FAQ [F]):

* Availability in general Unix systems would be one. I think the code
should be portable, but I've not checked properly.
* Size of the shared library is another; it would be by far the fattest
compression lib used by dpkg. It's not entirely clear whether the
shlib embeds a copy of zlib?
* Increase in the (build-)essential set (directly and transitively).
* It also seems the format has changed quite a few times already, and
it's probably the reason for the fat shlib. Not sure if the format
has stabilized enough to use this as a good long-term storage format,
and what the policy is regarding supporting old formats, for example,
given that this is intended mainly to be used for real-time and
streaming content and similar. For example the Makefile for libzstd
defaults to supporting v0.4+ only, which does not look great.
* The license seems fine, being very permissive; otherwise it could affect
availability. This one I need to add to the FAQ.
* Memory usage seemed fine or slightly better depending on the compression
level, but not when wanting equal or less space used?
* Space used seemed worse.
* Compression and decompression speed seemed better depending on the
compression and decompression levels.

[F] <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.deb_packages.3F>

Overall I'm still not sure whether this is worth it. Also the
tradeoffs for stable are different to unstable/testing, or for
fast/slow networks, or long-term storage, one-time installations,
or things like CI and similar.

In any case this would still need discussion on debian-devel, and
involvement from other parts of the project, at least ftp-masters for
example. And whether the added "eternal" support makes sense if we are
or not planning to eventually switch to the compressor as the default,
for example, etc.

> $ rm -rf firefox-xz/* ;time dpkg-deb -R firefox-xz.deb firefox-xz/
> real 0m4,270s
> user 0m4,220s
> sys 0m0,630s
> $ rm -rf firefox-zstd/* ;time dpkg-deb -R firefox-zstd.deb firefox-zstd/
> real 0m0,765s
> user 0m0,556s
> sys 0m0,462s

Right, although that might end up being noise when factored into a
normal dpkg installation, due to the fsync()s, or maintscript
execution, etc.

> Tests on the full Ubuntu main archive showed ~6% average increase in
> the size of the binary packages.

What about the total increase? Because a 15% increase in a 500 MiB
.deb is obviously not the same as a 2% increase in a 100 KiB one. :)



And here follows a quick code review, not very deep, given that whether
to include support for this is still an open question.


> From 79aad733cbc7edd44e124702f82b8a46a3a4aea9 Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint...@canonical.com>
> Date: Thu, 8 Mar 2018 09:53:36 +0100
> Subject: [PATCH 1/4] dpkg: Add Zstandard compression support

> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
> index 52e9ce67d..7f898210e 100644
> --- a/dpkg-deb/main.c
> +++ b/dpkg-deb/main.c
> @@ -108,7 +108,7 @@ usage(const struct cmdinfo *cip, const char *value)
> " --[no-]uniform-compression Use the compression params on all members.\n"
> " -z# Set the compression level when building.\n"
> " -Z<type> Set the compression type used when building.\n"
> -" Allowed types: gzip, xz, none.\n"
> +" Allowed types: gzip, xz, zstd, none.\n"
> " -S<strategy> Set the compression strategy when building.\n"
> " Allowed values: none; extreme (xz);\n"
> " filtered, huffman, rle, fixed (gzip).\n"

In theory the proper way to introduce this is to first enable
decompression and then after a full stable release cycle add compression
support.

> diff --git a/lib/dpkg/compress.c b/lib/dpkg/compress.c
> index 44075cdb6..e20add3b7 100644
> --- a/lib/dpkg/compress.c
> +++ b/lib/dpkg/compress.c
> @@ -750,6 +753,127 @@ static const struct compressor compressor_lzma = {
> .decompress = decompress_lzma,
> };
>
> +/*
> + * Zstd compressor.
> + */
> +
> +#define ZSTD "zstd"
> +
> +#ifdef WITH_LIBZSTD
> +
> +static void
> +decompress_zstd(int fd_in, int fd_out, const char *desc)
> +{
> + size_t const buf_in_size = ZSTD_DStreamInSize();

The indentation should be tabs, not spaces here. There are several
other problems with spacing and blank lines, etc. in other places.

> + void* const buf_in = malloc(buf_in_size);

m_malloc().

> + size_t const buf_out_size = ZSTD_DStreamOutSize();
> + void* const buf_out = malloc(buf_out_size);
> + size_t init_result, just_read, to_read;
> + ZSTD_DStream* const dstream = ZSTD_createDStream();
> + if (dstream == NULL) {
> + ohshit(_("ZSTD_createDStream() error "));
> + }

Unnecessary braces, and non-obvious error message.

> + /* TODO: a file may consist of multiple appended frames (ex : pzstd).
> + * The following implementation decompresses only the first frame */

The commit fixing this should be folded into this one.

> + init_result = ZSTD_initDStream(dstream);
> + if (ZSTD_isError(init_result)) {
> + ohshit(_("ZSTD_initDStream() error : %s"), ZSTD_getErrorName(init_result));
> + }
> + to_read = init_result;
> + while ((just_read = fd_read(fd_in, buf_in, to_read))) {
> + ZSTD_inBuffer input = { buf_in, just_read, 0 };
> + while (input.pos < input.size) {
> + ZSTD_outBuffer output = { buf_out, buf_out_size, 0 };
> + to_read = ZSTD_decompressStream(dstream, &output , &input);
> + if (ZSTD_isError(to_read)) {
> + ohshit(_("ZSTD_decompressStream() error : %s \n"),
> + ZSTD_getErrorName(to_read));
> + }
> + fd_write(fd_out, output.dst, output.pos);

Return value not checked.

> + }
> + }
> +}

> diff --git a/man/dpkg-source.man b/man/dpkg-source.man
> index 2233d7a8d..991162003 100644
> --- a/man/dpkg-source.man
> +++ b/man/dpkg-source.man
> @@ -176,7 +176,7 @@ Specify the compression to use for created tarballs and diff files
> (\fB\-\-compression\fP since dpkg 1.15.5).
> Note that this option will not cause existing tarballs to be recompressed,
> it only affects new files. Supported values are:
> -.IR gzip ", " bzip2 ", " lzma " and " xz .
> +.IR gzip ", " bzip2 ", " lzma ", " zstd " and " xz .
> The default is \fIxz\fP for formats 2.0 and newer, and \fIgzip\fP for
> format 1.0. \fIxz\fP is only supported since dpkg 1.15.5.
> .TP

> diff --git a/scripts/Dpkg/Compression.pm b/scripts/Dpkg/Compression.pm
> index 3dbc4adf0..4ea512fdc 100644
> --- a/scripts/Dpkg/Compression.pm
> +++ b/scripts/Dpkg/Compression.pm
> @@ -72,6 +72,12 @@ my $COMP = {
> decomp_prog => [ 'unxz', '--format=lzma' ],
> default_level => 6,
> },
> + zstd => {
> + file_ext => 'zst',
> + comp_prog => [ 'zstd', '-q' ],
> + decomp_prog => [ 'unzstd', '-q' ],
> + default_level => 19,
> + },
> xz => {
> file_ext => 'xz',
> comp_prog => [ 'xz' ],

I don't think it makes sense to add support for source packages, as
long as there's no significant usage by upstreams, which I don't think
there is.

> From 9dec1a3f6be2e3d525a92f5a123300618407cb19 Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint...@canonical.com>
> Date: Thu, 8 Mar 2018 10:14:30 +0100
> Subject: [PATCH 2/4] Add test for zstd decompression

This should be folded into the main implementation commit.

> From c927d94df0fdc59c25961505a5438b0dfc58710a Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint...@canonical.com>
> Date: Fri, 9 Mar 2018 15:19:43 +0100
> Subject: [PATCH 3/4] dpkg: Support Zstandard compressed packages with multiple
> frames

This should be folded with the main implementation commit.


> From d4b3f22299339f4b54f0013b5f86eff48db1e8c4 Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint...@canonical.com>
> Date: Fri, 9 Mar 2018 11:19:24 +0100
> Subject: [PATCH 4/4] dpkg: Enable zstd uniform compression
>
> ---
> dpkg-deb/main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
> index 7f898210e..7a40ecb80 100644
> --- a/dpkg-deb/main.c
> +++ b/dpkg-deb/main.c
> @@ -245,6 +245,7 @@ int main(int argc, const char *const *argv) {
> if (opt_uniform_compression &&
> (compress_params.type != COMPRESSOR_TYPE_NONE &&
> compress_params.type != COMPRESSOR_TYPE_GZIP &&
> + compress_params.type != COMPRESSOR_TYPE_ZSTD &&
> compress_params.type != COMPRESSOR_TYPE_XZ))
> badusage(_("unsupported compression type '%s' with uniform compression"),
> compressor_get_name(compress_params.type));

I'm not sure why this would be split into a different commit, or
introduced at a different point. Older dpkg-deb will either not understand
the data.tar.zst or both the data.tar.zst and the control.tar.zst, so
there's no point in delaying this.

Thanks,
Guillem

Guillem Jover

Mar 21, 2018, 12:10:02 AM
On Sun, 2018-03-18 at 04:38:15 +0100, Guillem Jover wrote:
> Thanks. I started implementing this several weeks ago after having
> discussed it with Julian Andres Klode on IRC, but stopped after seeing
> the implementation getting messy given the current code structure.

> In any case, as I mentioned on IRC to Andres, this is something I
> pondered about already in 2016, when Paul Wise blogged about it, and
> which I also told Andres about at the time when he was adding lz4
> support to apt. :) But parked it for later as there were several
> apparent problems with it at the time.

Err, obviously s/Andres/Julian/.

Sorry,
Guillem

Balint Reczey

Apr 15, 2018, 10:00:04 PM
Hi Guillem

On Sun, Mar 18, 2018 at 3:38 AM, Guillem Jover <gui...@debian.org> wrote:
> Hi!
>
> On Sun, 2018-03-11 at 21:51:05 +0100, Balint Reczey wrote:
>> Package: dpkg
>> Version: 1.19.0.5
>> Severity: wishlist
>> Tags: patch
>
>> Please add support for Zstandard compression to dpkg and other
>> programs generated by the dpkg source package [1].
>
> Thanks. I started implementing this several weeks ago after having
> discussed it with Julian Andres Klode on IRC, but stopped after seeing
> the implementation getting messy given the current code structure.

I think it is not that bad. :-)

> In any case, as I mentioned on IRC to Andres, this is something I
> pondered about already in 2016, when Paul Wise blogged about it, and
> which I also told Andres about at the time when he was adding lz4
> support to apt. :) But parked it for later as there were several
> apparent problems with it at the time.
>
> So, the items that come to mind (most from the dpkg FAQ [F]:
>
> * Availability in general Unix systems would be one. I think the code
> should be portable, but I've not checked properly.

The libzstd package does not have any special dependency and there are
packages for other Unix-like systems [2][3][4].

> * Size of the shared library another, it would be by far the fattest
> compression lib used by dpkg. It's not entirely clear whether the
> shlib embeds a zlib library?

I agree that the libzstd library is fairly big and I'd like to look
into ways of making it leaner, maybe creating a variant with limited
features covering what is needed in dpkg, apt, btrfs-progs and other
system packages.
It does not seem to embed the zlib library, but it offers many
features which may be unnecessary for dpkg.

I tried dropping support for legacy file formats for example
(ZSTD_LEGACY_SUPPORT=8) and the size of the library dropped to 382K
from the original 490K.

> * Increase in the (build-)essential set (directly and transitively).

Yes, that's true, though apt also started supporting Zstd.

> * It also seems the format has changed quite some times already, and
> it's probably the reason for the fat shlib. Not sure if the format
> has stabilized enough to use this as good long-term storage format,
> and what's the policy regarding supporting old formats for example,
> given that this is intended mainly to be used for real-time and
> streaming content and similar. For example the Makefile for libzstd
> defaults to supporting v0.4+ only, which does not look great.

Format stability is a very valid concern and upstream claims the
current format to be stable [5] (since zstd v0.8.1).

> * The license seems fine, as being very permissive, or it could affect
> availability. This one I need to add to the FAQ.
> * Memory usage seemed fine or slight better depending on the compression
> level, but not when wanting equal or less space used?
> * Space used seemed worse.

Yes, space used is worse than with xz compression, but I think the
much better compression and decompression speed would make up for
that.

> * Compression and decompression speed seemed better depending on the
> compression and decompression levels.
>
> [F] <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.deb_packages.3F>
>
> Overall I'm still not sure whether this is worth it. Also the
> tradeoffs for stable are different to unstable/testing, or for
> fast/slow networks, or long-term storage, one-time installations,
> or things like CI and similar.
>
> In any case this would still need discussion on debian-devel, and
> involvement from other parts of the project, at least ftp-masters for
> example. And whether the added "eternal" support makes sense if we are
> or not planning to eventually switch to the compressor as the default,
> for example, etc.

I agree that the tradeoffs are very different for the use cases and
please feel free to bring this topic to debian-devel quoting any part
of my emails.

>
>> $ rm -rf firefox-xz/* ;time dpkg-deb -R firefox-xz.deb firefox-xz/
>> real 0m4,270s
>> user 0m4,220s
>> sys 0m0,630s
>> $ rm -rf firefox-zstd/* ;time dpkg-deb -R firefox-zstd.deb firefox-zstd/
>> real 0m0,765s
>> user 0m0,556s
>> sys 0m0,462s
>
> Right, although that might end up being noise when factored into a
> normal dpkg installation, due to the fsync()s, or maintscript
> execution, etc.

I agree that fsync()s and scripts add more time overall to the
installation time, but fsync()'s effect is decreasing with faster
storage. I would like to look into speeding up maintscript execution.

I need to reproduce my results on latest sid, but when installing big
package-sets like ubuntu-desktop on a server VM the package
decompression was the biggest (~40%) contributor to CPU utilization
and to make a meaningful improvement in that area I think switching to
a very fast decompressor is necessary.

I think the biggest price we have to pay here is the slower download
of the somewhat bigger compressed packages, but IMO the real solution
here is rolling out DeltaDebs [6] support, which is planned to be an
improvement over debdelta [7]. DeltaDebs could save around 90% of
bandwidth - or download time needed for packages.

Since DeltaDeb generation also involves decompression, Zstd could
speed this up, too.

>
>> Tests on the full Ubuntu main archive showed ~6% average increase in
>> the size of the binary packages.
>
> What about the total increase? Because it's not the same say a 15%
> increase in a 500 MiB .deb, than a 2% in a 100 KiB one obviously. :)

Yes, I was not clear enough in my email: the total increase was 6%.

>
> And follows a quick code review, not very deep, given that whether to
> include support for this is still open.
>
>
>> From 79aad733cbc7edd44e124702f82b8a46a3a4aea9 Mon Sep 17 00:00:00 2001
>> From: Balint Reczey <balint...@canonical.com>
>> Date: Thu, 8 Mar 2018 09:53:36 +0100
>> Subject: [PATCH 1/4] dpkg: Add Zstandard compression support
>
>> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
>> index 52e9ce67d..7f898210e 100644
>> --- a/dpkg-deb/main.c
>> +++ b/dpkg-deb/main.c
>> @@ -108,7 +108,7 @@ usage(const struct cmdinfo *cip, const char *value)
>> " --[no-]uniform-compression Use the compression params on all members.\n"
>> " -z# Set the compression level when building.\n"
>> " -Z<type> Set the compression type used when building.\n"
>> -" Allowed types: gzip, xz, none.\n"
>> +" Allowed types: gzip, xz, zstd, none.\n"
>> " -S<strategy> Set the compression strategy when building.\n"
>> " Allowed values: none; extreme (xz);\n"
>> " filtered, huffman, rle, fixed (gzip).\n"
>
> In theory the proper way to introduce this is to first enable
> decompression and then after a full stable release cycle add compression
> support.

I'm OK with that. I'm attaching the updated patch which I also
uploaded to Salsa, addressing your review comments. Feel free to
disable compression in a way you please, if you do it in a separate
commit it could be easily reverted later to enable compression.

>
>> diff --git a/lib/dpkg/compress.c b/lib/dpkg/compress.c
>> index 44075cdb6..e20add3b7 100644
>> --- a/lib/dpkg/compress.c
>> +++ b/lib/dpkg/compress.c
>> @@ -750,6 +753,127 @@ static const struct compressor compressor_lzma = {
>> .decompress = decompress_lzma,
>> };
>>
>> +/*
>> + * Zstd compressor.
>> + */
>> +
>> +#define ZSTD "zstd"
>> +
>> +#ifdef WITH_LIBZSTD
>> +
>> +static void
>> +decompress_zstd(int fd_in, int fd_out, const char *desc)
>> +{
>> + size_t const buf_in_size = ZSTD_DStreamInSize();
>
> The indentation should be tabs, not spaces here. There are several
> other problems with spacing and blank lines, etc in other places.

Fixed.

>
>> + void* const buf_in = malloc(buf_in_size);
>
> m_malloc().
>
>> + size_t const buf_out_size = ZSTD_DStreamOutSize();
>> + void* const buf_out = malloc(buf_out_size);
>> + size_t init_result, just_read, to_read;
>> + ZSTD_DStream* const dstream = ZSTD_createDStream();
>> + if (dstream == NULL) {
>> + ohshit(_("ZSTD_createDStream() error "));
>> + }
>
> Unnecessary braces, and non-obvious error message.

Fixed.

>
>> + /* TODO: a file may consist of multiple appended frames (ex : pzstd).
>> + * The following implementation decompresses only the first frame */
>
> The commit fixing this should be folded into this one.

Done, all commits are squashed.

>
>> + init_result = ZSTD_initDStream(dstream);
>> + if (ZSTD_isError(init_result)) {
>> + ohshit(_("ZSTD_initDStream() error : %s"), ZSTD_getErrorName(init_result));
>> + }
>> + to_read = init_result;
>> + while ((just_read = fd_read(fd_in, buf_in, to_read))) {
>> + ZSTD_inBuffer input = { buf_in, just_read, 0 };
>> + while (input.pos < input.size) {
>> + ZSTD_outBuffer output = { buf_out, buf_out_size, 0 };
>> + to_read = ZSTD_decompressStream(dstream, &output , &input);
>> + if (ZSTD_isError(to_read)) {
>> + ohshit(_("ZSTD_decompressStream() error : %s \n"),
>> + ZSTD_getErrorName(to_read));
>> + }
>> + fd_write(fd_out, output.dst, output.pos);
>
> Return value not checked.

Fixed.
Agreed.
Thank you for your feedback. I hope I could address your concerns and
I'm looking forward to the discussion on debian-devel.

There is already a thread on ubuntu-devel if you are interested [8].

I'm wondering if you are OK with the proposed .deb format (extensions,
etc.), because Ubuntu is very close to releasing 18.04, and if we could
agree on at least the package format, Ubuntu's dpkg could add zstd
decompression support without risking diverging from Debian later.

Cheers,
Balint

--
Balint Reczey
Ubuntu & Debian Developer

[2] https://www.opencsw.org/packages/zstd/
[3] https://brewinstall.org/Install-zstd-on-Mac-with-Brew/
[4] https://www.freshports.org/archivers/zstd/
[5] https://github.com/facebook/zstd/issues/731
[6] https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs
[7] http://debdelta.debian.net/
[8] https://lists.ubuntu.com/archives/ubuntu-devel/2018-March/040211.html
0001-dpkg-Add-Zstandard-compression-and-decompression-sup.patch

Balint Reczey

Apr 18, 2018, 6:10:03 AM
Hi Guillem,
By writing "Feel free to disable compression ..." I did not mean to
push the work onto you, and I would be happy to write that patch
if you think it is needed. I agree that in theory disabling
compression initially would be the cleanest way forward, but on the
other hand leaving compression enabled would make it easier to run
tests with the new compression.

IMO the real commitment to the format starts with shipping the first
zstd-compressed packages in the official archive and this won't happen
without DSA team upgrading the infrastructure to accept binary
packages using that compression.

Also note that people can already hand-craft zstd-compressed packages
the way it is done in the added dpkg test.

What do you think? Should compression be disabled initially and if so
would you like me to write the patch for it?
If you would like me to start the discussion here, just ping me on irc
or via email and I'll gladly do it.

>
> There is already a thread on ubuntu-devel if you are interested [8].
>
> I'm wondering if you are OK with the proposed .deb format (extensions,
> etc.), because Ubuntu is very close to releasing 18.04 and if we could
> agree on at least the package format Ubuntu's dpkg could add
> decompression zstd support without risking diverging later from
> Debian.

I'm sorry for getting back to you so late after you kindly reviewed my
original patches; it was my unfortunate mistake.

Thanks,
Balint

Guillem Jover

Apr 19, 2018, 11:10:03 PM
Hi!

On Wed, 2018-04-18 at 11:56:27 +0200, Balint Reczey wrote:
> On Mon, Apr 16, 2018 at 3:51 AM, Balint Reczey
> <balint...@canonical.com> wrote:
> > On Sun, Mar 18, 2018 at 3:38 AM, Guillem Jover <gui...@debian.org> wrote:
> >> On Sun, 2018-03-11 at 21:51:05 +0100, Balint Reczey wrote:
> >>> Package: dpkg
> >>> Version: 1.19.0.5
> >>> Severity: wishlist
> >>> Tags: patch
> >>
> >>> Please add support for Zstandard compression to dpkg and other
> >>> programs generated by the dpkg source package [1].
> >>
> >> Thanks. I started implementing this several weeks ago after having
> >> discussed it with Julian Andres Klode on IRC, but stopped after seeing
> >> the implementation getting messy given the current code structure.
> >
> > I think it is not that bad. :-)

Well, that file is a mess already. :)

> >> So, the items that come to mind (most from the dpkg FAQ [F]:
> >>
> >> * Availability in general Unix systems would be one. I think the code
> >> should be portable, but I've not checked properly.
> >
> > The libzstd package does not have any special dependency and there are
> > packages for other Unix-like systems [2][3][4].

Right, as suspected, but it's nice to get confirmation, thanks.

> >> * Size of the shared library another, it would be by far the fattest
> >> compression lib used by dpkg. It's not entirely clear whether the
> >> shlib embeds a zlib library?
> >
> > I agree that the libzstd library is fairly big and I'd like to look
> > into ways of making it leaner, maybe creating a variant with limited
> > features covering what is needed in dpkg, apt, btrfs-progs and other
> > system packages.

That could be an option, ideally sanctioned by upstream to avoid a
perpetual fork, and possible divergence from upstream format, encoding,
etc.

> > It does not seem to embed the zlib library, but it offers many
> > features which may be obsolete for dpkg.
> >
> > I tried dropping support for legacy file formats for example
> > (ZSTD_LEGACY_SUPPORT=8) and the size of the library dropped to 382K
> > from the original 490K.

Still pretty fat. :)

> >> * Increase in the (build-)essential set (directly and transitively).
> >
> > Yes, that's true, while apt also started supporting Zstd and .

apt is not part of the essential-set though.

> >> * It also seems the format has changed quite some times already, and
> >> it's probably the reason for the fat shlib. Not sure if the format
> >> has stabilized enough to use this as good long-term storage format,
> >> and what's the policy regarding supporting old formats for example,
> >> given that this is intended mainly to be used for real-time and
> >> streaming content and similar. For example the Makefile for libzstd
> >> defaults to supporting v0.4+ only, which does not look great.
> >
> > Format stability is a very valid concern and upstream claims the
> > current format to be stable [5] (since zstd v0.8.1).

I understand that to mean the current format will not change, but what
will happen if and when a new format is needed/wanted, what their
stability guarantees are, etc.? As I mentioned, one thing is to target
streaming compression, the other long-term storage; the time-frames
expected from each of those might be completely opposite.

> >> * The license seems fine, as being very permissive, or it could affect
> >> availability. This one I need to add to the FAQ.
> >> * Memory usage seemed fine or slight better depending on the compression
> >> level, but not when wanting equal or less space used?
> >> * Space used seemed worse.
> >
> > Yes, space used is worse than with xz compression, but I think the
> > much better compression and decompression speed would make up for
> > that.

That still depends at least on the local hardware used and on the
network speed.

> >> * Compression and decompression speed seemed better depending on the
> >> compression and decompression levels.
> >>
> >> [F] <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.deb_packages.3F>
> >>
> >> Overall I'm still not sure whether this is worth it. Also the
> >> tradeoffs for stable are different to unstable/testing, or for
> >> fast/slow networks, or long-term storage, one-time installations,
> >> or things like CI and similar.
> >>
> >> In any case this would still need discussion on debian-devel, and
> >> involvement from other parts of the project, at least ftp-masters for
> >> example. And whether the added "eternal" support makes sense if we are
> >> or not planning to eventually switch to the compressor as the default,
> >> for example, etc.
> >
> > I agree that the tradeoffs are very different for the use cases and
> > please feel free to bring this topic to debian-devel quoting any part
> > of my emails.

I'll try to do that probably tomorrow. I'll probably start the
conversation and CC you guys, so that you can chime in and fill in any
blanks/details you want to provide.

> >>> $ rm -rf firefox-xz/* ;time dpkg-deb -R firefox-xz.deb firefox-xz/
> >>> real 0m4,270s
> >>> user 0m4,220s
> >>> sys 0m0,630s
> >>> $ rm -rf firefox-zstd/* ;time dpkg-deb -R firefox-zstd.deb firefox-zstd/
> >>> real 0m0,765s
> >>> user 0m0,556s
> >>> sys 0m0,462s
> >>
> >> Right, although that might end up being noise when factored into a
> >> normal dpkg installation, due to the fsync()s, or maintscript
> >> execution, etc.
> >
> > I agree that fsync()s and scripts add more time overall to the
> > installation time, but fsync()'s effect is decreasing with faster
> > storage. I would like to look into speeding up maintscript execution.

Well the best way to speed-up maintscripts is to completely get rid of
them. :)

> > I need to reproduce my results on latest sid, but when installing big
> > package-sets like ubuntu-desktop on a server VM the package
> > decompression was the biggest (~40%) contributor to CPU utilization
> > and to make a meaningful improvement in that area I think switching to
> > a very fast decompressor is neccessary.

CPU utilization does not tell us whether the overall time will be better
or worse, as I'd assume the biggest amount of time here will come from I/O anyway.

> > I think the biggest price we have to pay here is the slower download
> > of the somewhat bigger compressed packages, but IMO the real solution
> > here is rolling out DeltaDebs [6] support, which is planned to be an
> > improvement over debdelta [7]. DeltaDebs could save around 90% of
> > bandwidth - or download time needed for packages.
> >
> > Since DeltaDeb generation also involves decompression, Zstd could
> > speed this up, too.

Right, and while I think the idea is very nice, it still remains to be
seen if for example Debian would be interested in providing those at
all, or for which suites there would be interest, given
the increased mirror usage, etc. I'd not like to tie a decision on
this to something that might or might not happen.

> >>> Tests on the full Ubuntu main archive showed ~6% average increase in
> >>> the size of the binary packages.
> >>
> >> What about the total increase? Because it's not the same say a 15%
> >> increase in a 500 MiB .deb, than a 2% in a 100 KiB one obviously. :)
> >
> > Yes, I was not clear enough in my email, the total increase was 6%.

Ok, not that bad then I guess.

> >> In theory the proper way to introduce this is to first enable
> >> decompression and then after a full stable release cycle add compression
> >> support.
> >
> > I'm OK with that. I'm attaching the updated patch which I also
> > uploaded to Salsa, addressing your review comments. Feel free to
> > disable compression in a way you please, if you do it in a separate
> > commit it could be easily reverted later to enable compression.
>
> By writing "Feel free to disable compression ..." I did not want to
> mean pushing the work on you and I would be happy to write that patch
> if you think it is needed. I agree that in theory disabling
> compression initially would be the cleanest way forward, but
> compression left enabled would help running tests with the new
> compression more easily on the other hand.
>
> IMO the real commitment to the format starts with shipping the first
> zstd-compressed packages in the official archive and this won't happen
> without DSA team upgrading the infrastructure to accept binary
> packages using that compression.
>
> Also note that people can already hand-craft zstd compressed packages
> the way it is done in the added dpkg test.
>
> What do you think? Should compression be disabled initially and if so
> would you like me to write the patch for it?

Unfortunately that's not how things work with dpkg. This is a tool
being used beyond Debian (and Ubuntu), so once upstream contains
support for just unpacking, then that means the commitment is already
there, because we have always supported hand-crafting .debs from
"standard" tools, and that's why the format has always been documented
in detail.
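For illustration, the hand-crafting referred to here really does need only standard tools; a minimal sketch of a format-2.0 .deb (package name and control fields are made up):

```shell
set -e
rm -f demo.deb
mkdir -p ctl root/usr/share/doc/demo
printf 'Package: demo\nVersion: 1.0\nArchitecture: all\nMaintainer: Nobody <nobody@example.invalid>\nDescription: hand-crafted demo\n' > ctl/control
echo hello > root/usr/share/doc/demo/README
printf '2.0\n' > debian-binary              # the format-version member, must come first
tar -C ctl -czf control.tar.gz .            # control metadata archive
tar -C root -czf data.tar.gz .              # file-system payload archive
ar rc demo.deb debian-binary control.tar.gz data.tar.gz  # ar preserves member order
ar t demo.deb                               # lists the three members in order
```

The member order matters: dpkg-deb expects debian-binary as the first ar member, followed by the control and data tarballs.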

This is the same reason why the lzma compression format is still
supported for decompression only (compression has been obsoleted),
even though it was never accepted in the Debian archive. People should
be able to unpack old/historic packages with current dpkg-deb, which
even still supports building and extracting format 0.939000 .deb
archives!

If downstreams patch support in for other compressors, I think it might
be a bit irresponsible as it creates unnecessary compatibility problems
in the .deb ecosystem. But then, this is free software and people can
patch in whatever they want. And in the end I consider it not really my
problem. ;) Some derivative for example added .lz support in, which has
never been accepted upstream, of course that's not at the same scale
as Ubuntu, so in this case this might get worse if it ends up not
making sense to add the support upstream, but oh well.

Disabling compression is mostly to make sure no builds accidentally
try to use it while it's too soon (mostly in Debian), or on the wrong
suite, once and iff DAK accepts them, for example. That obviously would
not prevent people from adding the support in other generators, or
hand-crafting .debs.

> > There is already a thread on ubuntu-devel if you are interested [8].
> >
> > I'm wondering if you are OK with the proposed .deb format (extensions,
> > etc.), because Ubuntu is very close to releasing 18.04 and if we could
> > agree on at least the package format Ubuntu's dpkg could add
> > decompression zstd support without risking diverging later from
> > Debian.

The extension looks fine; that's the standard one used by upstream,
and the one I used too when starting the implementation. When it comes
to the .deb format itself, as I've mentioned before, I make no guarantee
that this will get accepted upstream. So the divergence is potentially
something you might (or might not) need to carry forward, and possibly
might need/want (or not) to unwind yourselves.

Thanks,
Guillem

Sebastian Andrzej Siewior
Apr 11, 2020, 7:20:03 PM

On 2018-03-11 21:51:05 [+0100], Balint Reczey wrote:
> For the recompressed firefox .deb (Ubuntu's
> firefox_58.0.2+build1-0ubuntu0.17.10.1_amd64.deb) increased ~9% in
> size but decompressed in <20% of the original time:

So you are saying that decompression speed is the bottleneck
here? I *assumed* that it is mostly the disk speed, since I get around 60
to 80MiB/sec out of xz.

> $ du -s firefox-*deb
> 43960 firefox-xz.deb
> 47924 firefox-zstd.deb

48M linux-image-5.5.0-1-amd64_5.5.13-2_amd64.data.tar.xz
54M linux-image-5.5.0-1-amd64_5.5.13-2_amd64.data.tar.19.zstd

766M linux-image-5.5.0-1-amd64-dbg_5.5.13-2_amd64.data.tar.xz
901M linux-image-5.5.0-1-amd64-dbg_5.5.13-2_amd64.data.tar.19.zstd

zstd -19 -T16
|linux-image-5.5.0-1-amd64-dbg_5.5.13-2_amd64.data.tar : Completed in 287.37 sec (cpu load : 1533%)
|
|real 4m47,416s
|user 73m23,825s
|sys 0m2,753s
|

xz -T16
| real 4m15,447s
| user 66m51,572s
| sys 0m3,201s


> $ rm -rf firefox-xz/* ;time dpkg-deb -R firefox-xz.deb firefox-xz/
> real 0m4,270s
> user 0m4,220s
> sys 0m0,630s
> $ rm -rf firefox-zstd/* ;time dpkg-deb -R firefox-zstd.deb firefox-zstd/
> real 0m0,765s
> user 0m0,556s
> sys 0m0,462s

So this looks impressive. Is dpkg-deb also performing sync() on the
output, or is this measured when the files merely hit the disk cache?
Either way, the difference should be noticeable on SSD/NVMe, which
write at higher speeds.

> Tests on the full Ubuntu main archive showed ~6% average increase in
> the size of the binary packages.

I guess the vast majority of packages are small and hardly increase in
size. The bigger packages then increase more.

> The patches are also available on Salsa [2].

While I read the whole thread here, I did not find any consensus other
than to discuss it on d-devel. Is this still the case?

> Cheers,
> Balint

Sebastian

Balint Reczey
Jan 8, 2021, 3:00:04 PM

Control: retitle -1 dpkg: Please add decompression support for zstd
(Zstandard) compressed packages

Hi Guillem,
Sure, I have updated the patch to disable compression and also apply
to the latest dpkg.

I'm wondering if decompression support could be accepted for Bullseye,
to let compression be enabled, too, in Bookworm.

The compression ratios and the compressed sizes did not change much
since my last tests, but many projects have adopted zstd support and
many frequently installed packages have started to depend on it,
including gcc-10 and libsystemd0:

buster: $ apt-cache rdepends libzstd1 | wc -l
40
sid: $ apt-cache rdepends libzstd1 | wc -l
105

Libzstd did not get leaner, but it will be available on most systems
anyway, so dpkg depending on it would not increase the image/filesystem
size.

The file format stayed stable and other distributions like Fedora and
Arch already adopted it as the default:
https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression
https://archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/

They use different compression levels (Fedora: 19, Arch: 20 with
--ultra), but the level for Bookworm can be decided later.
If you are already convinced and add decompression support for
Bullseye that's great and thank you, if not, please share your
remaining concerns.

Cheers,
Balint

PS: Sorry for picking this up so close to the freeze.

> > > There is already a thread on ubuntu-devel if you are interested [8].
> > >
> > > I'm wondering if you are OK with the proposed .deb format (extensions,
> > > etc.), because Ubuntu is very close to releasing 18.04 and if we could
> > > agree on at least the package format Ubuntu's dpkg could add
> > > decompression zstd support without risking diverging later from
> > > Debian.
>
> The extension looks fine, that's the standard one used by upstream,
> and the one I used too when starting the implementation. When it comes
> to the .deb format itself, as I've mentioned before, I make no guarantees
> this might get accepted upstream. So the divergence is potentially
> something you might (or might not) need to carry forward and possibly
> might need/want (or not) to unwind yourselves.
>
> Thanks,
> Guillem

Balint Reczey
Jan 8, 2021, 3:10:03 PM

On Fri, Jan 8, 2021 at 8:54 PM Balint Reczey
<balint...@canonical.com> wrote:
>
> Control: retitle -1 dpkg: Please add decompression support for zstd
> (Zstandard) compressed packages

And the updated patch I forgot to attach.
0001-dpkg-Add-Zstandard-compression-and-decompression-sup.patch

Holger Levsen
Jan 9, 2021, 6:50:04 AM

On Fri, Jan 08, 2021 at 08:54:01PM +0100, Balint Reczey wrote:
> I'm wondering if decompression support could be accepted for Bullseye,
> to let compression being enabled, too, in Bookworm.

I'd love to see this as well - and wouldn't mind having both for Bullseye ;)


--
cheers,
Holger

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ PGP fingerprint: B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C
⠈⠳⣄

"Climate change" is a euphemism. "Global warming" as well.

Balint Reczey
Jan 10, 2021, 1:30:03 PM

Hi Sebastian,

On Sun, Apr 12, 2020 at 1:09 AM Sebastian Andrzej Siewior
<seba...@breakpoint.cc> wrote:
>
> On 2018-03-11 21:51:05 [+0100], Balint Reczey wrote:
> > For the recompressed firefox .deb (Ubuntu's
> > firefox_58.0.2+build1-0ubuntu0.17.10.1_amd64.deb) increased ~9% in
> > size but decompressed in <20% of the original time:
>
> So you are saying that the decompression speed that is the bottleneck
> here? I *assumed* that it is mostly the disk speed since I get around 60
> to 80MiB/sec out of xz.

I would not say bottleneck, but a very big contributor to the CPU time
used. Some systems can have very slow IO and a very fast CPU, but I
think those typically correlate positively, and SSDs are more common
than spinning disks now, improving the typical IO speed.
Yes, the increase ratio seems to be approximately uniform, looking at
the data posted by Julian, thus bigger packages increase more in
absolute terms:
https://lists.ubuntu.com/archives/ubuntu-devel/2018-March/040245.html

> > The patches are also available on Salsa [2].
>
> While I read the whole thread here, I did not find any consent other
> than discuss it d-devel. Is this still the case?

Yes, ideally the divergence should be avoided, thus we are waiting to
see if the package format can be set in stone in Debian.

Samuel Johnston
Aug 10, 2021, 1:50:03 AM

The Linux kernel is now shipping modules compressed with zstd. Version updates for 5.12 and 5.13 can no longer be installed on Debian because this issue remains unresolved.

Thanks,
Sam Johnston

On Sun, 10 Jan 2021 19:19:05 +0100 Balint Reczey <balint...@canonical.com> wrote:
> Hi Sebastian,
>
> On Sun, Apr 12, 2020 at 1:09 AM Sebastian Andrzej Siewior
> <seba...@breakpoint.cc> wrote:
> >
> > On 2018-03-11 21:51:05 [+0100], Balint Reczey wrote:
> > > For the recompressed firefox .deb (Ubuntu's
> > > firefox_58.0.2+build1-0ubuntu0.17.10.1_amd64.deb) increased ~9% in
> > > size but decompressed in <20% of the original time:
> >
> > So you are saying that the decompression speed that is the bottleneck
> > here? I *assumed* that it is mostly the disk speed since I get around 60
> > to 80MiB/sec out of xz.
>
> I would not say bottleneck, but a very big contributor to the CPU time
> used. Some systems can have very slow IO and very fast CPU, but I
> think those typically correlate positively and SSDs are more common
> than spinning disks improving the typical IO speed.
Guillem Jover
Aug 12, 2021, 1:50:03 PM

On Tue, 2021-08-10 at 00:37:51 -0500, Samuel Johnston wrote:
> The Linux kernel is now supplying modules compressed with zstd. Version
> updates for 5.12 and 5.13 are unable to be installed anymore on Debian due
> to this issue being unresolved.

I'm not sure how that is related to this report? The kernel must be
compiled with zstd support to be able to load stuff compressed in that
format. That has nothing to do with .debs being compressed with some
specific compressor format. Also just in case I've checked the
upstream packaging support and it does not explicitly set zstd as
compressor when generating .debs.

Regards,
Guillem

Rustam
Oct 12, 2021, 6:40:02 AM

Hi Guillem,
Any news on the proposed patch?
Can it be merged already? ;)
Ubuntu packages are already using zstd compression. So tools like Mainline don't work on Debian any more, see e.g. https://github.com/bkw777/mainline/issues/121

Yuri D'Elia
Nov 27, 2021, 4:00:03 PM

Package: dpkg
Version: 1.20.9
Followup-For: Bug #892664

I'd also love to see zstd support in dpkg. I'm following several PPAs
that offer daily builds, and since Ubuntu's migration to zstd-by-default
I can no longer benefit from the pre-built packages.

-- Package-specific info:
System tainted due to merged-usr-via-aliased-dirs.

-- System Information:
Debian Release: bookworm/sid
APT prefers unstable
APT policy: (900, 'unstable'), (800, 'experimental'), (500, 'unstable-debug')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.14.0-4-amd64 (SMP w/8 CPU threads)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages dpkg depends on:
ii libbz2-1.0 1.0.8-4
ii libc6 2.33-0experimental2
ii liblzma5 5.2.5-2
ii libselinux1 3.3-1+b1
ii tar 1.34+dfsg-1
ii zlib1g 1:1.2.11.dfsg-2

dpkg recommends no packages.

Versions of packages dpkg suggests:
ii apt 2.3.13
pn debsig-verify <none>

Tomas Pospisek
Nov 28, 2021, 10:30:04 AM

More than that: AFAIU Ubuntu has in fact switched its default compressor
to zstd [1], so Debian's tools haven't been able to understand Ubuntu's
freshly generated packages from 2021-06-14 on.

I have applied [2] Bálint's commit to *current* dpkg from git:

* there was a trivial merge conflict in man/deb.pod, which is easily fixed [3]
* in my dpkg git repo and zstd branch I have changed the patch author
(including the merge conflict fix) back from me to Bálint [3], which
might not be the right/clean way to do things, but that's a minor thing
I can fix if Guillem would want that
* dpkg-buildpackage built the patched package fine
* I only did a smoketest with the resulting dpkg:
`dpkg -x sbsigntool_0.9.4-2ubuntu1_amd64.deb foodir` [4]
which successfully unpacked Ubuntu's zstd compressed sbsigntool package
into the foodir directory

So I am reporting that Bálint's patch [4] applies cleanly (with a
trivial-to-solve merge conflict, see above), works (again, see above
for the minimal testing I did), has been in production in Ubuntu since
2021-04-14, and zstd has been the default compressor in Ubuntu since
2021-06-14.

Of course I would welcome it very much if Debian's tools were
compatible and allowed working with Ubuntu's packages. Concrete case
in point: it would have made my life easier when figuring out Ubuntu's
mechanism to sign user-generated modules [5].

Thanks a lot to all involved! For Guillem's work on dpkg, Bálint for the
patch and all others for their contributions here and in Ubuntu!!!

Greets,
*t

[1] http://changelogs.ubuntu.com/changelogs/pool/main/d/dpkg/dpkg_1.20.9ubuntu2/changelog
[2] https://salsa.debian.org/tpo/dpkg/-/tree/zstd
[3] https://salsa.debian.org/tpo/dpkg/-/commit/e7cb231bc289d356f563c1e2c761d94c85aa7055
[4] https://packages.ubuntu.com/impish/amd64/sbsigntool/download
[5] https://salsa.debian.org/rbalint/dpkg/-/commit/eb38de93eeb9524a54e80525c480df249828e84f
[6] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=939392

Bálint Réczey
Dec 12, 2021, 9:00:03 AM

Hi Tomas,
For the record a few Debian packages already work around dpkg's
inability to process zstd-compressed archives.

> Thanks a lot to all involved! For Guillem's work on dpkg, Bálint for the
> patch and all others for their contributions here and in Ubuntu!!!

You are welcome. I hope many users of Ubuntu and Ubuntu derivatives
will enjoy the fast installs and upgrades.

@all: For the record I'm not working on this bug anymore. Feel free to
close it or pick up the work from here.

Cheers,
Balint

Marcel Partap
Jan 31, 2022, 10:20:03 AM


Dear Guillem et al.,
don't know whether there has already been that called-for discussion on debian-devel about the technical burden of zstd support in dpkg. But obviously, Ubuntu's move to compress debs with zstd by default in July last year (in favor of better UX) has made picking packages from their (or any third-party) repository (like their rtl8821ce-dkms) "a bit more difficult". And it seems somewhat unlikely they will roll back that decision.

So here's the patch rebased again on current git tip.
https://salsa.debian.org/mpartap-guest/dpkg/-/tree/zstd-rebased

Best Regards
Marcel 😅

Marcel Partap
Feb 18, 2022, 10:30:03 AM

[…]
Follow-up: forgot to set up linking with zstd lib in Makefile.am, thanks to Bernardo Bandos for picking that one up. Commits on my zstd-rebased branch.
The tests keep failing at `dpkg-deb -c pkg-data-zst.deb`, which fails with
> dpkg-deb (subprocess): ZSTD_decompressStream error : Unknown frame descriptor

My efforts poking at it with gdb have been fruitless. The file can be unpacked just fine with `ar` and the data.tar.zst is valid and can be uncompressed, too.

Any ideas?
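One observation that may help: "Unknown frame descriptor" is the error libzstd returns when the bytes handed to ZSTD_decompressStream do not start at a frame boundary, so a likely suspect is the reader feeding the decompressor from the wrong ar-member offset, or not resetting stream state between control.tar.zst and data.tar.zst. A shell sketch to separate a bad member from a bad reader (the member here is fabricated rather than extracted with `ar x pkg-data-zst.deb`):

```shell
set -e
printf 'hello\n' > member.raw
zstd -q -f member.raw -o data.tar.zst
zstd -t data.tar.zst                  # a clean frame passes the integrity check
od -An -tx1 -N4 data.tar.zst          # a v1 zstd frame starts with the magic 28 b5 2f fd
# feeding the decoder from a mid-frame offset reproduces this error class:
tail -c +5 data.tar.zst | zstd -dcq >/dev/null 2>&1 || echo 'decode fails off a frame boundary'
```

If the extracted member itself passes `zstd -t`, the bug is almost certainly in how dpkg-deb reads or (re)initializes the stream, not in the archive.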

Helmut Grohne
Apr 9, 2022, 1:10:04 PM

Hi Guillem,

would you maybe reconsider adding zstd decompression support at this
time?

On Sun, Mar 18, 2018 at 04:38:15AM +0100, Guillem Jover wrote:
> So, the items that come to mind (most from the dpkg FAQ [F]:
>
> * Availability in general Unix systems would be one. I think the code
> should be portable, but I've not checked properly.

Given the number of places it has been vendored and used at this time, I
suppose we'd have seen issues. While there is optimized assembly for
x86_64 CPUs and it uses e.g. __builtin_ctzll, all those uses are carefully
guarded and have portable alternative implementations. Do you see any
particular unixes to watch out here? From a processor architecture pov,
I've never seen issues with zstd in e.g. rebootstrap. (The present
failure for riscv64 likely isn't caused by zstd itself.)

> * Size of the shared library another, it would be by far the fattest
> compression lib used by dpkg. It's not entirely clear whether the
> shlib embeds a zlib library?

What made you think so? Is it the zlibWrapper directory in the source?
That's an API adapter of the gzip interface to the zstd compressor.
The size remains a possible issue otherwise.

> * Increase in the (build-)essential set (directly and transitively).

We're now in a place where libzstd1 is transitively essential.

> * It also seems the format has changed quite some times already, and
> it's probably the reason for the fat shlib. Not sure if the format
> has stabilized enough to use this as good long-term storage format,
> and what's the policy regarding supporting old formats for example,
> given that this is intended mainly to be used for real-time and
> streaming content and similar. For example the Makefile for libzstd
> defaults to supporting v0.4+ only, which does not look great.

Given the state of development and the wide adoption, it would seem
unlikely to me that it will break compatibility again. Also note that there
is a trade-off here between size and compatibility. You cannot have both
a small size and support all ancient formats.

Beyond these cases, I think compatibility also goes the other way round.
If a significant portion of .debs in the wild are compressed using zstd
(and that's what we're seeing), dpkg should be able to decompress them
even if it wasn't the one that introduced them. You care very much about
being able to decompress each and every ancient .deb, but in practice we
also care about decompressing those .debs that currently reside in
Ubuntu's PPAs.

In my personal workflow, I decompress very many packages into tmpfs (or
RAM). This is bottlenecked on CPU. In my experience, zstd decompression
is almost 100 times faster than xz decompression. That's a fairly big
improvement. At this time, I'm convinced that zstd is better for the
"compress once, decompress often" use case than xz, which still excels
at "compress once, decompress rarely". I admit that I only get the
benefits if dpkg also supports zstd as a compressor and many relevant
packages switch to it.
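The shape of that trade-off can be reproduced locally with the CLIs; a rough sketch (data, sizes and levels picked arbitrarily, so absolute numbers will vary by machine):

```shell
set -e
seq 1 200000 > blob                  # ~1.3 MB of compressible text
xz -6 -kqf blob                      # produces blob.xz
zstd -19 -kqf blob                   # produces blob.zst
ls -l blob.xz blob.zst               # compare compressed sizes
time xz -dcq blob.xz > /dev/null     # "decompress often" cost, xz side
time zstd -dcq blob.zst > /dev/null  # zstd side; typically several times faster
```

On real package payloads the size gap narrows and the decompression-speed gap widens, which is the pattern the numbers in this thread show.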

So at this point, I think that supporting zstd decompression is
something reasonable to add to dpkg. Please reconsider your decision.

Helmut

Guillem Jover
Apr 10, 2022, 6:40:03 AM

Hi!

[ Sorry, I have been meaning to update this report, as I've mentioned
when directly updating people who asked about its current state
on IRC, but it seems I never got to it, beyond what was already
covered on the debian-devel mailing list. ]

On Sat, 2022-04-09 at 18:52:07 +0200, Helmut Grohne wrote:
> would you maybe reconsider adding zstd decompression support at this
> time?

I'm always open to reconsideration. :)

> On Sun, Mar 18, 2018 at 04:38:15AM +0100, Guillem Jover wrote:
> > So, the items that come to mind (most from the dpkg FAQ [F]:
> >
> > * Availability in general Unix systems would be one. I think the code
> > should be portable, but I've not checked properly.
>
> Given the number of places it has been vendored and used at this time, I
> suppose we'd have seen issues. While there is optimized assembly for
> x86_64 cpus and uses e.g. __builtin_ctzll, all those uses are carefully
> guarded and have portable alternative implementations. Do you see any
> particular unixes to watch out here? From a processor architecture pov,
> I've never seen issues with zstd in e.g. rebootstrap. (The present
> failure for riscv64 likely isn't caused by zstd itself.)

Right, probably not a concern now, yes.

> > * Size of the shared library another, it would be by far the fattest
> > compression lib used by dpkg. It's not entirely clear whether the
> > shlib embeds a zlib library?
>
> What made you think so? Is it the zlibWrapper directory in the source?
> That's an api adapter of the gzip interface to the zstd compressor.

I don't recall now, but rechecking its source, that looks like what
might have triggered that question. Several of the files, f.ex. gzlib.c,
gzread.c, gzwrite.c, etc., seem to be locally forked zlib sources. I
don't recall checking (most probably not, hence the question above)
whether that was included in the resulting shared library or was
just some kind of "contrib" thing, or other possibilities.

> The size remains a possible issue otherwise.

Yes.

> > * Increase in the (build-)essential set (directly and transitively).
>
> We're now in a place where libzstd1 is transitively essential.

Ah, thanks, I don't recall this being mentioned before (at least on IRC).

Right, in Debian this is coming now from util-linux (Essential:yes)
depending on libsystemd0 depending on libzstd1.

> > * It also seems the format has changed quite some times already, and
> > it's probably the reason for the fat shlib. Not sure if the format
> > has stabilized enough to use this as good long-term storage format,
> > and what's the policy regarding supporting old formats for example,
> > given that this is intended mainly to be used for real-time and
> > streaming content and similar. For example the Makefile for libzstd
> > defaults to supporting v0.4+ only, which does not look great.
>
> Given the state of development and the wide adoption, it would seem
> unlikely to me to have it break more compatibility.

I'd expect so too, at least now.

> Also note that there
> is a trade-off here between size and compatibility. You cannot have both
> a small size and support all ancient formats.

Sure, but given that this is a property of the state of the upstream
zstd format, I thought it was relevant to mention.

> Beyond these cases, I think compatibility also goes the other way round.
> If a significant portion of .debs in the wild are compressed using zstd
> (and that's what we're seeing), dpkg should be able to decompress them
> even if it wasn't the one that introduced them. You care very much about
> being able to decompress each and every ancient .deb, but in practice we
> also care about decompressing those .debs that currently reside in
> Ubuntu's PPAs.

That's certainly all true, and that's something that has been bothering
me too. :/

As I've mentioned in the past on the debian-devel mailing list
(AFAIR) and/or on IRC, the way I've seen this was that:

- zstd offered a different trade-off on compression and
decompression times vs size, which might be relevant depending on
what is the bottleneck for users or buildds, say network, cpu,
disk, or for "throw-away", "rolling" vs "stable" builds, etc,
where we have to use a single compressor for all cases, instead
of say one per use-case.
- There's xz threaded decompression now merged in liblzma upstream,
and I've got the patches for dpkg ready for it, and I was
investigating the zlib-ng alternative which would somewhat improve
on the speed vs size divide.
- The Ubuntu people went ahead with the divergence anyway, even
after being told there was no guarantee this would be added. This
in a way makes upstream subservient to downstreams diverging the
format. There's for example another downstream that has added .lz
support; of course it does not share the same "widespreadness", but
it illustrates the point.
- Adding new compression support, if the needs seem somewhat covered
already by existing ones, implies a maintenance burden practically
for eternity. For example, when the bzip2 upstream situation got
pretty dire, and then it was picked up but with code being
switched to Rust (which at the time implied portability would get
affected), I had to consider whether I'd need to fork a bzip2
project to keep it as functional. The compression landscape has
unfortunately this inherent property that once a clearly superior
format comes along, it leaves behind the carcasses of once thought
most glorious contenders.

But, yes this is a problem now. :(

> In my personal workflow, I decompress very many packages into tmpfs (or
> ram). This is bottle-necked on CPU. In my experience, zstd decompression
> is almost 100 times faster than xz decompression. That's a fairly big
> improvement. At this time, I'm convinced that zstd is better for the
> "compress once, decompress often" use case than xz, which still excels
> at "compress once, decompress rarely". I admit that I only get the
> benefits if dpkg also supports zstd as a compressor and many relevant
> packages switch to it.
>
> So at this point, I think that supporting zstd decompression is
> something reasonable to add to dpkg. Please reconsider your decision.

I'm not thrilled about how this was handled TBH, nor about the pressure
that has now formed around this. But if the upcoming threaded xz
decompression support is not going to be satisfactory enough anyway,
then given the ".deb ecosystem"-wide divergence, I guess it might be
reasonable to add this. :/

Depending on the intended use (say, local use), one possibility in
Debian could be to implement the support by calling the CLI instead of
the shared library, and making it a Recommends or similar. Of course
that has the additional trade-off of increasing the installation size
even more, and the error condition on a missing CLI might need to be
improved.
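A minimal sketch of what that CLI-based route amounts to, piping through the zstd tool rather than linking libzstd (file names made up; assumes the zstd CLI is installed):

```shell
set -e
echo payload > f
tar -cf member.tar f
zstd -q -f member.tar -o member.tar.zst       # stand-in for a data.tar.zst member
mkdir -p out
zstd -dcq member.tar.zst | tar -C out -xf -   # roughly what a fork/exec based extractor would do
ls out                                        # the extracted file: f
```

The trade-off is as described above: no new shared-library dependency in the essential set, but a hard runtime dependency on the external tool being present.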

Thanks,
Guillem

Yaroslav Halchenko
Sep 28, 2022, 9:10:05 PM

We build both Debian and Ubuntu backports for http://neuro.debian.net/ ,
and then we use the https://snapshot.debian.org/ engine to provide
snapshots of our archive, which we run in a Debian environment.
Unfortunately the dpkg-deb used by snapshot pukes on Ubuntu packages
with "unknown compression for member 'control.tar.zst'". I would be
eager to see this issue addressed. Pretty please!
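Until dpkg learns the format, one stopgap for a snapshot-style pipeline is to repack the zstd members with a compressor dpkg already understands, keeping member order intact. A sketch (the input .deb is fabricated here; in practice it would come from the Ubuntu archive):

```shell
set -e
rm -f demo.deb demo-xz.deb
# fabricate a zstd-compressed .deb standing in for an Ubuntu package
mkdir -p ctl root/usr/share/doc/demo
printf 'Package: demo\nVersion: 1.0\nArchitecture: all\nMaintainer: N <n@example.invalid>\nDescription: demo\n' > ctl/control
echo hi > root/usr/share/doc/demo/README
printf '2.0\n' > debian-binary
tar -C ctl -cf - . | zstd -q > control.tar.zst
tar -C root -cf - . | zstd -q > data.tar.zst
ar rc demo.deb debian-binary control.tar.zst data.tar.zst
# the actual workaround: recompress each tar member with xz and rebuild the archive
zstd -dq < control.tar.zst | xz > control.tar.xz
zstd -dq < data.tar.zst | xz > data.tar.xz
ar rc demo-xz.deb debian-binary control.tar.xz data.tar.xz   # member order preserved
ar t demo-xz.deb
```

The repacked demo-xz.deb contains the same tar payloads and can be processed by a dpkg-deb without zstd support.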

--
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW: http://www.linkedin.com/in/yarik
