
Debian choice of upstream tarballs for packaging


Paul Wise

Aug 15, 2021, 9:20:03 PM
Hi all,

I noticed that sometimes Debian's choice of upstream source for
packaging can be suboptimal. This is especially apparent for the
different per-language upstream packaging ecosystems[1], where the
upstream packaging differs from the upstream VCS in some significant
ways, including missing files, prebuilt files, embedded copies etc.

While the upstream VCS also sometimes has these issues, it is often
much less problematic than the upstream packaging ecosystems.

I'd like to suggest that we standardise on the upstream VCS for our
orig.tar.gz files and phase out use of upstream packaging ecosystems.

For packages where the upstream packaging uses a tarball and it is
accompanied by an OpenPGP signature, and the differences between the
upstream VCS and the tarball aren't too problematic, we could use the
tarball in preference to the VCS due to having a signature.

While discussing PyPI with the Python team, it was pointed out that
sometimes the tarball contains things that cannot be regenerated from
just the VCS snapshot, such as information stored in the VCS history,
so perhaps the recommendation should be to prefer the VCS but always
compare the VCS with upstream tarballs and packaging ecosystem tarballs
using diffoscope in order to discover differences important to Debian.
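
As a rough illustration of the comparison suggested above, the following Python sketch lists files that exist only in a VCS export, only in an unpacked release tarball, or in both with different content. It is a crude stand-in for diffoscope, not a replacement for it; the function names and the choice to skip the .git directory are assumptions made for this example.

```python
import hashlib
import os


def file_digests(root):
    """Map each regular file under root (as a relative path) to its SHA-256."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata so a git checkout compares like a plain export.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # symlinks and special files are ignored in this sketch
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as handle:
                digests[rel] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def compare_trees(vcs_root, release_root):
    """Return (only_in_vcs, only_in_release, differing) as sorted lists."""
    vcs = file_digests(vcs_root)
    release = file_digests(release_root)
    only_in_vcs = sorted(set(vcs) - set(release))
    only_in_release = sorted(set(release) - set(vcs))
    differing = sorted(p for p in set(vcs) & set(release) if vcs[p] != release[p])
    return only_in_vcs, only_in_release, differing
```

Files that appear only in the release are candidates for prebuilt or embedded content, and files that appear only in the VCS are candidates for missing tests or sources; diffoscope would additionally show how the differing files differ.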

I'd also like to see upstream tarball export systems switch to plain
VCS exports plus additional tarballs for files like autotools cruft.

If there is rough consensus we could add lintian complaints when Debian
watch files or copyright source locations refer to these ecosystems.

1. the ecosystems I'm talking about include cargo, npm, browser
extensions, rubygems, pypi, CPAN etc.

--
bye,
pabs

https://wiki.debian.org/PaulWise



Sam Hartman

Aug 16, 2021, 9:40:04 AM
>>>>> "Paul" == Paul Wise <pa...@debian.org> writes:

Paul> Hi all, I noticed that sometimes Debian's choice of upstream
Paul> source for packaging can be suboptimal. This is especially
Paul> apparent for the different per-language upstream packaging
Paul> ecosystems[1], where the upstream packaging differs from the
Paul> upstream VCS in some significant ways, including missing
Paul> files, prebuilt files, embedded copies etc.

Paul> While the upstream VCS also sometimes has these issues, it is
Paul> often much less problematic than the upstream packaging
Paul> ecosystems.

Paul> I'd like to suggest that we standardise on the upstream VCS
Paul> for our orig.tar.gz files and phase out use of upstream
Paul> packaging ecosystems.

I support moving in this direction at least as a strong recommendation.
I think that there will be cases (like the cases you discuss and I
snipped) where using the tarball will be important.
And so if maintainers have a justification for preferring the tarball
rather than VCS, that should be permitted.

But the VCS is a lot more convenient and definitive for most operations.

The types of standardization we're talking about here have value even if
there are exceptions.
So I think it is valuable to move in that direction even if we cannot
get there 100%.

I don't think it should block such standardization, but it might be
valuable to have a way to represent the signed git tag or commit we're
using as an upstream. I understand that the verification process would
be different than for an upstream tarball. You'd effectively have to
grab the tree for that tag, verify the signature, and then compare the
contents of the tree to the contents of the VCS-based tarball.
I don't want to see signatures stand in the way of us preferring VCS
long-term.
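
The verification steps described above could be sketched roughly as follows. This assumes git is installed and the signer's key is already in the local keyring; tar_manifest and tag_matches_tarball are hypothetical helper names, and only the contents of regular files are compared (modes, timestamps and symlinks are ignored). Note also that git archive honours export-ignore and export-subst attributes, so its output can differ from the raw tree.

```python
import hashlib
import io
import subprocess
import tarfile


def tar_manifest(fileobj, strip_components=1):
    """Map member paths (minus leading components) to SHA-256 of their content."""
    manifest = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # directories, symlinks etc. are ignored in this sketch
            parts = member.name.split("/")[strip_components:]
            if parts:
                data = archive.extractfile(member).read()
                manifest["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return manifest


def tag_matches_tarball(repo, tag, orig_tarball):
    """Verify the tag signature, then compare the tag's tree with a tarball."""
    # Raises CalledProcessError if git cannot verify the tag signature.
    subprocess.run(["git", "-C", repo, "verify-tag", tag], check=True)
    export = subprocess.run(
        ["git", "-C", repo, "archive", "--format=tar", "--prefix=export/", tag],
        check=True, capture_output=True,
    ).stdout
    with open(orig_tarball, "rb") as handle:
        return tar_manifest(io.BytesIO(export)) == tar_manifest(handle)
```

Comparing content hashes rather than the tarball bytes sidesteps differences in compression and tar metadata, at the cost of not checking that metadata at all.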

--Sam

Sean Whitton

Aug 16, 2021, 4:20:03 PM
Hello,

On Mon 16 Aug 2021 at 09:18AM +08, Paul Wise wrote:

> I noticed that sometimes Debian's choice of upstream source for
> packaging can be suboptimal. This is especially apparent for the
> different per-language upstream packaging ecosystems[1], where the
> upstream packaging differs from the upstream VCS in some significant
> ways, including missing files, prebuilt files, embedded copies etc.
>
> While the upstream VCS also sometimes has these issues, it is often
> much less problematic than the upstream packaging ecosystems.
>
> I'd like to suggest that we standardise on the upstream VCS for our
> orig.tar.gz files and phase out use of upstream packaging ecosystems.

I agree with this, and already do it for all or almost all of the
packages I maintain. There will probably need to be lots of exceptions,
however. Perhaps "recommended" in Policy?

I wrote the git-deborig tool in devscripts to make it easy to generate
the orig.tar files. For almost every upload I just type either 'git
deborig' or, for a -2 or later revision, 'origtargz'.

--
Sean Whitton

Pirate Praveen

Aug 16, 2021, 4:30:03 PM


On 17/08/21 1:43 am, Sean Whitton wrote:
> I agree with this, and already do it for all or almost all of the
> packages I maintain. There will probably need to be lots of exceptions,
> however.

Many node modules don't tag their releases, so it's really hard to get
the exact source code corresponding to an npmjs.com release. We have to
search for hints in commit messages to find the correct commit and then
take a snapshot of that commit.

Also, with monorepos becoming more popular (many modules are developed
in the same git repo, each module having a different version, and
there is no way to get tarballs of individual modules), we now need not
only to download tarballs corresponding to tags but also to exclude all
the other modules we don't need from the monorepo tarball.
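
A minimal sketch of the "exclude everything but one module" step, assuming the monorepo tarball has a single top-level directory and that the module lives in one subdirectory. repack_module and its parameters are hypothetical names for this example; in practice a Files-Excluded field in debian/copyright, processed by uscan and mk-origtargz, covers a similar need.

```python
import tarfile


def repack_module(src_path, dst_path, module_dir, new_prefix):
    """Copy only the entries under module_dir from one tarball into another.

    module_dir is relative to the tarball's single top-level directory, and
    new_prefix replaces "<top-level>/<module_dir>" in the output names.
    Link targets are not rewritten in this sketch.
    """
    kept = 0
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            _top, _, rest = member.name.partition("/")
            if rest != module_dir and not rest.startswith(module_dir + "/"):
                continue  # belongs to another module of the monorepo
            fileobj = src.extractfile(member) if member.isfile() else None
            member.name = new_prefix + rest[len(module_dir):]
            member.pax_headers = {}  # drop any stored long "path" so the new name wins
            dst.addfile(member, fileobj)
            kept += 1
    return kept
```

For example, repack_module("mono-1.0.tar.gz", "node-foo_2.0.orig.tar.gz", "packages/foo", "node-foo-2.0") would keep only packages/foo and rename its top-level directory; the file names here are made up.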


Paul Wise

Aug 17, 2021, 2:50:03 AM
On Mon, Aug 16, 2021 at 8:25 PM Pirate Praveen wrote:

> Many node modules don't tag their releases, so it's really hard to get
> the exact source code corresponding to an npmjs.com release.

It is probably worth filing upstream issues when you discover that.

> Also, with monorepos becoming more popular (many modules are developed
> in the same git repo, each module having a different version, and
> there is no way to get tarballs of individual modules), we now need not
> only to download tarballs corresponding to tags but also to exclude all
> the other modules we don't need from the monorepo tarball.

Could you package the monorepo instead of each module?

Paul Wise

Aug 17, 2021, 3:00:04 AM
On Mon, Aug 16, 2021 at 1:19 AM Paul Wise wrote:

> 1. the ecosystems I'm talking about include cargo, npm, browser
> extensions, rubygems, pypi, CPAN etc.

Examples of what current Debian practices are for these ecosystems:

(Almost?) all rust-* packages come from crates.io.

Many/most browser extensions come from the .xpi files instead of the
source repo.

Probably most Ruby and Perl packages come from rubygems/CPAN.

The Python packages often come from PyPI, but not always; there is a
desire among some parts of the team to change this.

Pirate Praveen

Aug 17, 2021, 4:40:03 AM


On 17 August 2021 12:18:00 PM IST, Paul Wise <pa...@debian.org> wrote:
>On Mon, Aug 16, 2021 at 8:25 PM Pirate Praveen wrote:
>
>> Many node modules don't tag their releases, so it's really hard to get
>> the exact source code corresponding to an npmjs.com release.
>
>It is probably worth filing upstream issues when you discover that.

We do file issues, but a response is not guaranteed.

>> Also, with monorepos becoming more popular (many modules are developed
>> in the same git repo, each module having a different version, and
>> there is no way to get tarballs of individual modules), we now need not
>> only to download tarballs corresponding to tags but also to exclude all
>> the other modules we don't need from the monorepo tarball.
>
>Could you package the monorepo instead of each module?
>

Sometimes we do, but it carries the risk of packaging
unreleased changes. So it is similar to packaging the git main branch.

Luca Boccassi

Aug 17, 2021, 6:10:03 AM
Some monorepos like src:python-azure (
https://github.com/Azure/azure-sdk-for-python/ ) are such an
unsalvageable mess that different modules from the same monorepo depend
on each other, but a given monorepo commit rarely has compatible,
coherent versions checked in. It can and does happen all the time that
module A depends on module B and C, but at commit 12345 B is compatible
but C is not, and at commit 54321 C is compatible but B is not.

And the alternative of using pypi as upstream is of course a no-go,
given how it's a malware-infested dump. With hundreds of modules in the
monorepo, I can't possibly manually check every time that some of the
names haven't been taken over by typo-squatters or suchlike (yes,
sometimes the module names in the monorepo are different from the
module names uploaded to pypi).

--
Kind regards,
Luca Boccassi

Simon Richter

Aug 17, 2021, 8:20:03 AM
Hi,

On 8/16/21 3:18 AM, Paul Wise wrote:

> I'd like to suggest that we standardise on the upstream VCS for our
> orig.tar.gz files and phase out use of upstream packaging ecosystems.

This is also an additional burden on package maintainers: explaining how
they arrived at that particular "upstream" package in a reproducible
way, and why what we ship as "orig" is different from upstream, and what
the copyright and licensing situation for that derived work is.

Upstream projects have gotten a lot sloppier in how they cut releases,
that is true, and that is making packaging more difficult as we need to
disable mechanisms and embedded code copies that were included for our
"convenience."

Rather than accept defeat, I'd like Debian to push upstreams more
aggressively for higher quality releases, and also to make judgement
calls on whether a particular package is even suitable for a stable
release instead of assuming that by default.

For a package to be included in Debian, it must be possible to maintain
it. Working with an upstream that focuses solely on users that install
through some other means is difficult in two ways, because neither our
processes nor our release schedule are supported by them, so from a
project management standpoint, our use case is "out of scope."

Without even minimal support from upstream, the Debian maintainer will
have to do a lot of extra work. We ship a lot of packages in stable that
are not supported by upstream, and every bug report to the upstream BTS
will be immediately closed with "upgrade to the latest version to see if
that works better", and if we want to support our users properly, the
maintainers of those packages get stuck with forward-porting bug
reproduction and back-porting fixes.

Part of being a maintainer is to communicate our needs to upstream, and
work with them on solutions. If an entire ecosystem is dysfunctional and
will not produce releases we can work with, it makes a lot more sense to
me to push, as a project, for a change of the ecosystem rather than
saddling individual maintainers with it.

> I'd also like to see upstream tarball export systems switch to plain
> VCS exports plus additional tarballs for files like autotools cruft.

I think that autotools is one of the few systems that *don't* cause
issues, because it encourages following proper release procedures, where
the version number needs to be added to a file, all files need to be
properly accounted for in the build system, out-of-tree builds need to
work with all testcases passing, and the installation and uninstallation
procedures need to work in order for "make distcheck" to succeed.

Encouraging upstreams to generate releases directly from VCS would
likely reduce quality here.

The problematic ecosystems are those that are aimed at releasing into
the ecosystem only and using the ecosystem's package manager instead of
a system package manager. This is, again, not something that individual
package maintainers can or should work around, but something we need to
address at a structural level, for example by creating a policy that
allows these package managers to coexist with ours in a sensible way.

This is likely to be different for different ecosystems. CPAN is rather
"traditional" in its processes, many packages move slowly and upstream
developers have enough insight into their packages to be able to tell
whether a bug report against an older version is still valid, so
(re-)packaging these is possible and provides value, while something
fast-moving that is only available as a daily snapshot with no
interoperability testing against different versions of dependencies
might be better off in a package manager that is built for that kind of
thing.

Simon


Jonas Smedegaard

Aug 17, 2021, 10:00:03 AM
Quoting Simon Richter (2021-08-17 14:17:05)
> Rather than accept defeat, I'd like Debian to push upstreams more
> aggressively for higher quality releases, and also to make judgement
> calls on whether a particular package is even suitable for a stable
> release instead of assuming that by default.
>
> For a package to be included in Debian, it must be possible to
> maintain it.
[...]
> Part of being a maintainer is to communicate our needs to upstream,
> and work with them on solutions.

Well said!

All of it, not only the snippets highlighted above.


- Jonas

--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/

[x] quote me freely [ ] ask before reusing [ ] keep private

Antonio Terceiro

Aug 17, 2021, 10:20:03 AM
On Tue, Aug 17, 2021 at 06:51:35AM +0000, Paul Wise wrote:
> On Mon, Aug 16, 2021 at 1:19 AM Paul Wise wrote:
>
> > 1. the ecosystems I'm talking about include cargo, npm, browser
> > extensions, rubygems, pypi, CPAN etc.
>
> Examples of what current Debian practices are for these ecosystems:
[...]
> Probably most Ruby and Perl packages come from rubygems/CPAN.

On Ruby it's very common to switch from rubygems to the upstream git
repo due to missing test files etc, and that's usually non-problematic.
IIRC we already discussed making that the default, it's just not
implemented yet.

Marc Haber

Aug 17, 2021, 1:00:05 PM
On Tue, 17 Aug 2021 15:52:27 +0200, Jonas Smedegaard <jo...@jones.dk>
wrote:
>Quoting Simon Richter (2021-08-17 14:17:05)
>> Rather than accept defeat, I'd like Debian to push upstreams more
>> aggressively for higher quality releases, and also to make judgement
>> calls on whether a particular package is even suitable for a stable
>> release instead of assuming that by default.
>>
>> For a package to be included in Debian, it must be possible to
>> maintain it.
>[...]
>> Part of being a maintainer is to communicate our needs to upstream,
>> and work with them on solutions.
>
>Well said!
>
>All of it, not only the snippets highlighted above.

I do mostly agree with that, but we also need to take into account
that people usually don't want explanations; they want what they want
to work without hassle, and they'll readily ditch Debian in favor
of some other distribution that claims to be easier to use.

There is, for example, one distribution that is based on Ubuntu (maybe
they thought that Ubuntu would be too hard to install) and does not
support upgrades. Their FAQ says "we cannot support upgrades because
we're based on Debian which doesn't support upgrades".

Greetings
Marc
--
-------------------------------------- !! No courtesy copies, please !! -----
Marc Haber | " Questions are the | Mailadresse im Header
Mannheim, Germany | Beginning of Wisdom " |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834

Jonas Smedegaard

Aug 17, 2021, 1:20:04 PM
Quoting Marc Haber (2021-08-17 18:56:59)
> On Tue, 17 Aug 2021 15:52:27 +0200, Jonas Smedegaard <jo...@jones.dk>
> wrote:
> >Quoting Simon Richter (2021-08-17 14:17:05)
> >> Rather than accept defeat, I'd like Debian to push upstreams more
> >> aggressively for higher quality releases, and also to make
> >> judgement calls on whether a particular package is even suitable
> >> for a stable release instead of assuming that by default.
> >>
> >> For a package to be included in Debian, it must be possible to
> >> maintain it.
> >[...]
> >> Part of being a maintainer is to communicate our needs to upstream,
> >> and work with them on solutions.
> >
> >Well said!
> >
> >All of it, not only the snippets highlighted above.
>
> I do mostly agree with that, but we also need to take into account
> that people usually don't want explanations; they want what they want
> to work without hassle, and they'll readily ditch Debian in favor
> of some other distribution that claims to be easier to use.
>
> There is, for example, one distribution that is based on Ubuntu (maybe
> they thought that Ubuntu would be too hard to install) and does not
> support upgrades. Their FAQ says "we cannot support upgrades because
> we're based on Debian which doesn't support upgrades".

Sorry, I fail to understand what the point of the last paragraph above
was. Would you mind spelling it out for me?

I mean, I doubt that you simply wanted to say that some derivatives of
Debian don't understand Debian and therefore waste time reinventing
wheels...

Sune Vuorela

Aug 17, 2021, 2:30:03 PM
On 2021-08-16, Paul Wise <pa...@debian.org> wrote:
> While discussing PyPI with the Python team, it was pointed out that
> sometimes the tarball contains things that cannot be regenerated from
> just the VCS snapshot, such as information stored in the VCS history,
> so perhaps the recommendation should be to prefer the VCS but always
> compare the VCS with upstream tarballs and packaging ecosystem tarballs
> using diffoscope in order to discover differences important to Debian.

At least some upstreams I'm involved in manage translations separately
from the source code and only pull them into the tarball at tarball
generation time (though they might be committed to the git tag).

Though this is not part of the "packaging ecosystems", we should ensure
that recommendations aren't too specific.

/Sune

Paul Wise

Aug 17, 2021, 11:10:02 PM
On Tue, Aug 17, 2021 at 12:17 PM Simon Richter wrote:

> This is also an additional burden on package maintainers: explaining how
> they arrived at that particular "upstream" package in a reproducible way

Debian explaining how we arrived at a particular orig.tar.gz is well
established; use a debian/watch file. It supports accessing git
repositories directly.

> and why what we ship as "orig" is different from upstream, and what
> the copyright and licensing situation for that derived work is.

I see it another way: the upstream packages/tarballs are usually a
derived work of their VCS, adding cruft that should not be there and
removing files that should be there.

The fundamental problem is that the packages/tarballs are seen more as
something for end-users (who are often developers) to run (or
sometimes build) than for people building from source or for distros.
So upstream packages/tarballs end up as a mix of source and binary
packages. So these tarballs/packages are of a fundamentally different
nature to Debian source packages and are for an entirely different
audience than Debian package maintainers, who are doing the same
source -> binary packaging job as upstream package ecosystems, but for
the Debian packaging format instead of different formats for different
ecosystems. In the case of Firefox XPI packages, they are even more
like Debian binary packages, and yet Debian is using XPI packages as
our source packages for webext-* packages.

I agree with much of the remainder of your mail, but the world of
Debian, FLOSS, software and technology in general has disillusioned me
enough that I believe the efforts at improving upstream
packages/tarballs/ecosystems you suggest will mostly be futile, which
is why I suggest giving up on improving anything except the upstream
VCS. Even that is going to be hard to improve; many upstreams will
refuse to remove build system cruft, generated files, (modified)
embedded code copies and so on.

In both approaches, the first step is for Debian maintainers to
routinely compare the upstream VCS with the chosen Debian upstream
tarball. I tend to either use diffoscope or unpack and use a graphical
tree diff tool like meld. Once the comparison is done, the options are
to either switch to the VCS (as I suggest) or discuss the differences
with upstream (as you suggest). I encourage everyone to at least think
about doing the VCS comparison when adding new upstream releases to
Debian and choosing one or the other path for dealing with the
differences.

Simon Josefsson

Aug 18, 2021, 4:20:03 AM
Paul Wise <pa...@debian.org> writes:

> Hi all,
>
> I noticed that sometimes Debian's choice of upstream source for
> packaging can be suboptimal. This is especially apparent for the
> different per-language upstream packaging ecosystems[1], where the
> upstream packaging differs from the upstream VCS in some significant
> ways, including missing files, prebuilt files, embedded copies etc.
>
> While the upstream VCS also sometimes has these issues, it is often
> much less problematic than the upstream packaging ecosystems.

While I agree with the points you raise, and think I agree with your
overall goal, I see some problems with using upstream VCS as a source
for Debian packaging:

1) Trust paths. Some upstreams sign release tarballs with an OpenPGP
release key that Debian trusts for making releases. Not all upstreams
use the same key to sign VCS tags/commits, and not all upstreams sign
VCS tags/commits at all. While Debian can encourage and promote new
policies for upstream here, I don't think we are in a position to
require any uniform set of rules. Signing tarballs is the current
established best practice -- moving to VCS builds needs a set of new
schemes to be established and deployed, and I don't see any single
universal solution today.

2) Bootstrapping projects from VCS is complex and requires additional
tools, and I think the Debian packaging process is well suited for this.
Two examples that I have run into:

2a) Gnulib. Several GNU-related projects import files from gnulib
during VCS bootstrapping, and the way this happens is different for
different projects. The correct version of the files must be imported
in the right way for things to work, and knowledge of which gnulib
version is used is not always present in VCS but only in a released
tarball. How would this work when packaged in Debian? A Debian
package containing the gnulib git repository could be added, to allow
source packages to checkout the right version during build.

2b) Cross-compilation and dependency cycles. Bootstrapping from VCS
may require a lot of tools that are optional when building from
tarball, and in my experience the complete set of tools to bootstrap a
project is rarely added as Build-Dep to Debian packages. I feel some
additional package build dependency mechanism would help here: maybe a
Build-Bootstrap-Dep header to list the tools needed to generate a
Debian source package? And Build-Dep could list the tools needed to
build Debian binary packages from the Debian source package. I admit
my understanding of the Debian packaging system is quite limited
though.

3) Bootstrappable builds. I think the underlying goal when it comes to
building from VCS may be to achieve bootstrappable builds -- see
https://www.bootstrappable.org/ -- however it seems to me that a lot of
care has to be taken when moving from tarball builds to VCS builds so we
don't make it harder to re-bootstrap the entire toolchain. For example,
building GNU Coreutils from a tarball works fine in extremely old
environments, but building GNU Coreutils from VCS requires modern tools,
and perhaps some of them don't support older environments any more.

/Simon

Simon Richter

Aug 18, 2021, 5:00:03 AM
Hi,

On 8/18/21 5:04 AM, Paul Wise wrote:

>> This is also an additional burden on package maintainers: explaining how
>> they arrived at that particular "upstream" package in a reproducible way

> Debian explaining how we arrived at a particular orig.tar.gz is well
> established; use a debian/watch file. It supports accessing git
> repositories directly.

Yes, but it needs to be explained on a per-package basis, especially if
there is an upstream .tar.gz. Debian has historically shipped bitwise
identical files from upstream, and has been lauded for that as it makes
verification easy.

>> and why what we ship as "orig" is different from upstream, and what
>> the copyright and licensing situation for that derived work is.

> I see it another way, the upstream packages/tarballs are usually a
> derived work of their VCS, adding cruft that should not be there and
> removing files that should be there.

I am talking from a legal point of view. We would be creating a derived
work from upstream VCS that is different than the official upstream
release, and then claim this to be the "original" source.

There is a reason we highlight removal of files for licensing reasons in
the file name with a large "dfsg" marker: to indicate that this is a
derived work. If we were to prefer upstream VCS to upstream release
tarballs, I'd expect a similar marker.

Simon


Paul Wise

Aug 18, 2021, 11:10:03 PM
On Wed, Aug 18, 2021 at 8:10 AM Simon Josefsson wrote:

> 1) Trust paths.

Agreed, this is the main exception I mentioned when starting this thread.

> 2a) Gnulib.

Presumably upstream could be convinced to encode this information into
the VCS, perhaps into the standard autogen script that is usually run
before running ./configure for autotools projects, although Debian
just uses autoreconf by default. Or the debian/rules could have this
information added. Although hopefully the existing gnulib package
would just work. Since gnulib is meant as an embedded code copy IIRC,
another alternative would be to add the requisite files as a second
tarball containing the desired parts of gnulib, with component gnulib:
foo_1.2.3.orig-gnulib.tar.gz.
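
To make the naming in the last sentence concrete, here is a small sketch of how the main and component tarball names relate. orig_tarball_name is a hypothetical helper written for this example, not part of any Debian tool; with the 3.0 (quilt) source format a component tarball is unpacked into a subdirectory named after the component.

```python
def orig_tarball_name(package, upstream_version, component=None, compression="gz"):
    """Build a Debian-style orig tarball file name.

    With component=None this is the main tarball; otherwise it is the
    additional tarball for that component (e.g. "gnulib").
    """
    suffix = "orig" if component is None else "orig-" + component
    return f"{package}_{upstream_version}.{suffix}.tar.{compression}"
```

Calling orig_tarball_name("foo", "1.2.3", "gnulib") gives the foo_1.2.3.orig-gnulib.tar.gz name used above, next to foo_1.2.3.orig.tar.gz for the main VCS export.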

> 2b) Cross-compilation and dependency cycles.

The build profiles mechanism was invented to solve dependency cycles.
Unfortunately the Debian package build/storage infrastructure doesn't
yet support building non-default build profiles, so you have to
bootstrap manually on your own systems and/or the porterboxen and then
upload the resulting binaries. We would need a way for maintainers to
specify the desired build profiles, support for storing non-default
build profiles in a separate archive (aka we need bikesheds) in dak
and support in the build infrastructure for using the desired build
profile and sending the results to the right bikeshed.

https://wiki.debian.org/BuildProfileSpec

Cross-builds mostly use the same packages as normal builds, except for
the cross-compiler, so these should be fine. I hear
gcc-for-host/gcc-for-build packages will improve the cross-compiling
situation, but IIRC this needs someone to work on it or review the
existing work.

http://crossqa.debian.net/
https://wiki.debian.org/CrossCompiling

> 3) Bootstrappable builds.

As I understand it from idling on their IRC channel, the
Bootstrappable Builds folks aim to not rely on all the non-source
files and other cruft in tarballs, so bootstrapping from VCS seems
like it should work just as well. I think it would be great if Debian
could become bootstrappable, probably starting with the higher levels
of the build stack (things like java) rather than the lower levels
from bare-metal to Linux/GCC, which are still in progress by the
Bootstrappable Builds folks.

Marc Haber

Aug 19, 2021, 3:20:03 AM
On Tue, 17 Aug 2021 19:16:05 +0200, Jonas Smedegaard <jo...@jones.dk>
wrote:
>Quoting Marc Haber (2021-08-17 18:56:59)
>> There is, for example, one distribution that is based on Ubuntu (maybe
>> they thought that Ubuntu would be too hard to install) and does not
>> support upgrades. Their FAQ says "we cannot support upgrades because
>> we're based on Debian which doesn't support upgrades".
>
>Sorry, I fail to understand what was the point of last paragraph above.
>Would you mind spelling it out to me?

This was just a rant about a downstream distribution that broke one
of Debian's key features (seamless, supported, painless upgrades) and
blamed their own failure on Debian.

Jonas Smedegaard

Aug 19, 2021, 4:40:03 AM
Quoting Marc Haber (2021-08-19 09:12:27)
> On Tue, 17 Aug 2021 19:16:05 +0200, Jonas Smedegaard <jo...@jones.dk>
> wrote:
> >Quoting Marc Haber (2021-08-17 18:56:59)
> >> There is, for example, one distribution that is based on Ubuntu
> >> (maybe they thought that Ubuntu would be too hard to install) and
> >> does not support upgrades. Their FAQ says "we cannot support
> >> upgrades because we're based on Debian which doesn't support
> >> upgrades".
> >
> >Sorry, I fail to understand what was the point of last paragraph
> >above. Would you mind spelling it out to me?
>
> This was just a rant about a downstream distribution that broke one
> of Debian's key features (seamless, supported, painless upgrades) and
> blamed their own failure on Debian.

Ah, thanks. I get it now. Sorry for my thick skull...

Sean Whitton

Aug 24, 2021, 7:30:03 PM
Hello Simon,

On Wed 18 Aug 2021 at 10:10AM +02, Simon Josefsson wrote:

> 1) Trust paths. Some upstreams sign release tarballs with an OpenPGP
> release key that Debian trusts for making releases. Not all upstreams
> use the same key to sign VCS tags/commits, and not all upstreams sign
> VCS tags/commits at all. While Debian can encourage and promote new
> policies for upstream here, I don't think we are in a position to
> require any uniform set of rules. Signing tarballs is the current
> established best practice -- moving to VCS builds needs a set of new
> schemes to be established and deployed, and I don't see any single
> universal solution today.

From my point of view, signing git tags is no less well established a
best practice than signing tarballs -- in fact, to me, it seems *more*
well established. Of course, that's based on the kinds of upstreams I
find myself interacting with, based on the package maintenance work I
tend to be involved in. I don't mean to deny that it looks the other
way around from other points of view. But I think either of us would be
mistaken to take one of them to be more standard, at this point.

--
Sean Whitton

Phil Morrell

Aug 24, 2021, 8:10:03 PM
On Tue, Aug 24, 2021 at 04:21:50PM -0700, Sean Whitton wrote:
> On Wed 18 Aug 2021 at 10:10AM +02, Simon Josefsson wrote:
> > Signing tarballs is the current
> > established best practice -- moving to VCS builds needs a set of new
> > schemes to be established and deployed, and I don't see any single
> > universal solution today.
>
> From my point of view, signing git tags is no less well established a
> best practice than signing tarballs -- in fact, to me, it seems *more*
> well established.

Maybe; for upstreams the tooling is certainly easier for signed tags, which
are distributed with the git repo, than for tarball signatures, which
have to be attached to a releases page after the fact. However, the
Debian tooling, last I checked, correctly passed on the upstream tarball
signature intact so that it is available to the end user (included in the .dsc).

uscan verifies signed tags only locally before throwing away the
metadata; see also the 3.0 (git) source format and tag2upload. It doesn't
have to be a full-history clone: IIRC only the tag object and the single
commit object it points to, as printed by `git cat-file -p`, are needed
to recreate them.
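
The reason a stored tag or commit object is enough to recreate it is that git derives an object's id purely from its type, size and raw bytes. A minimal sketch of that computation, assuming a repository using the default SHA-1 object format (git_object_id is a hypothetical name for this example):

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 object id git assigns to an object of this kind.

    kind is "blob", "tree", "commit" or "tag"; body is the raw object
    content as bytes (for a signed tag this includes the signature).
    """
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()
```

If the raw bytes of the signed tag and of its commit were shipped alongside the source, a later verifier could recompute both ids, check that the tag names that commit, and check the signature embedded in the tag, all without the rest of the history. Whether any Debian tooling preserves these bytes today is not something this sketch assumes.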

Simon Richter

Aug 25, 2021, 6:10:03 AM
to
Hi,

On 8/25/21 1:21 AM, Sean Whitton wrote:

> From my point of view, signing git tags is no less well established a
> best practice than signing tarballs -- in fact, to me, it seems *more*
> well established.

That is ecosystem dependent.

FWIW, I'd love to see git bundles as a source archive format -- this
would allow shipping a (signed) tag, its commit, and the tree and blob
objects for that commit as a single file that can be built in a
reproducible way and allows changes on top to be easily tracked,
including the branch point.

In the absence of an "official" upstream release tarball, using this
format also makes it clear that this is a git snapshot, so no
explanation is needed how that archive was created.
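
As a rough sketch of the idea, assuming a hypothetical upstream
repository with a single commit and a tag named v1.0 (all names below
are made up for illustration): "git bundle create" packs the tag and
everything reachable from it into one file, and an empty repository can
fetch from that file. Note that a plain bundle carries the full history
behind the tag; limiting it to a single commit is not shown here.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commit and tag below (illustrative values).
export GIT_AUTHOR_NAME='Example Upstream' GIT_AUTHOR_EMAIL='upstream@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical upstream repository with one commit and one annotated tag.
upstream=$(mktemp -d)
git -C "$upstream" init -q
echo 'hello' > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" commit -q -m 'Initial release'
git -C "$upstream" tag -a -m 'Release 1.0' v1.0

# Pack the tag and every object reachable from it into a single file.
work=$(mktemp -d)
bundle="$work/example_1.0.bundle"
git -C "$upstream" bundle create "$bundle" v1.0
bundle_refs=$(git -C "$upstream" bundle list-heads "$bundle")

# Unpack into an empty repository, as a consumer of the source might.
git init -q "$work/unpacked"
git -C "$work/unpacked" fetch -q "$bundle" 'refs/tags/*:refs/tags/*'
unpacked_type=$(git -C "$work/unpacked" cat-file -t v1.0)

printf 'bundle contains: %s\nv1.0 unpacks as: %s\n' "$bundle_refs" "$unpacked_type"
```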

Simon


Thomas Goirand

Aug 25, 2021, 10:20:02 AM
to
On 8/16/21 3:18 AM, Paul Wise wrote:
+1 to deprecating PyPI links.

It's been *years* since I encountered a PyPI package that doesn't have a
Git repo as its homepage (and unfortunately, 99% are on GitHub).

I wrote this many times, but I don't see why we should use any "upstream
tarball" when the Git repository itself contains the tarball with:

git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
| xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz

(which leads to a .xz, which is nicer)

Not only does one then only have to merge the upstream tag into the
Debian branch to get the new release, but on top of that there is no
need to "gbp import" or "pristine-tar commit", and a single packaging
branch becomes enough.

I very much wish this packaging workflow gained more traction, and the
pristine-tar abomination dies...

Cheers,

Thomas Goirand (zigo)

Simon Richter

Aug 25, 2021, 10:40:04 AM
to
Hi,

> I wrote this many times, but I don't see why we should use any "upstream
> tarball" when the Git repository itself contains the tarball with:

> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
> | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz

> (which leads to a .xz, which is nicer)

"git archive" is reproducible, for simplicity I wouldn't use a prefix
though. xz has some issues with reproducibility, AFAIK "-T2" makes it
disable some internal heuristics that are based on the machine it is
running on, and generates consistent output.

Simon


Jeremy Stanley

Aug 25, 2021, 10:40:04 AM
to
On 2021-08-25 16:11:37 +0200 (+0200), Thomas Goirand wrote:
[...]
> I wrote this many times, but I don't see why we should use any "upstream
> tarball" when the Git repository itself contains the tarball with:
>
> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
> | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
>
> (which leads to a .xz, which is nicer)

This is a very absolutist statement. As pointed out in prior
discussions, not everything which makes it into an upstream's
release tarball is necessarily tracked in revision control. And for
things which are tracked in revision control, they may sometimes
need to be extracted from the revision control metadata rather than
kept within the worktree, so would not be included by naive git
archive output. You could perform the necessary steps to
extract/assemble this data at package build time, but may need a
copy of the actual upstream Git repository on hand in order to do
so. Some of these extracted files may even be referenced in
copyrights (e.g. an AUTHORS file), with reliance on the worktree
contents alone leading to a legally undistributable downstream copy.
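
A small sketch of the AUTHORS case, under made-up assumptions (a
throwaway repository, two invented contributors, a v1.0 tag): the plain
"git archive" export lists only tracked files, while the contributor
list has to be derived from history, which needs the repository itself.

```shell
#!/bin/sh
set -e

# Identity for the first throwaway commit (illustrative values).
export GIT_AUTHOR_NAME='First Author' GIT_AUTHOR_EMAIL='first@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical upstream repository with two commits by different authors.
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'print("hi")' > "$repo/app.py"
git -C "$repo" add app.py
git -C "$repo" commit -q -m 'Add app'
echo 'docs' > "$repo/README"
git -C "$repo" add README
GIT_AUTHOR_NAME='Second Author' GIT_AUTHOR_EMAIL='second@example.org' \
    git -C "$repo" commit -q -m 'Add README'
git -C "$repo" tag v1.0

# What a naive export of the tag contains: tracked worktree files only.
archive_listing=$(git -C "$repo" archive v1.0 | tar -tf -)

# What a release script might generate instead: authors taken from history.
authors=$(git -C "$repo" log --format='%aN <%aE>' v1.0 | sort -u)

printf 'In the archive:\n%s\n\nOnly in the history:\n%s\n' \
    "$archive_listing" "$authors"
```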

> Not only does one then only have to merge the upstream tag into the
> Debian branch to get the new release, but on top of that there is no
> need to "gbp import" or "pristine-tar commit", and a single packaging
> branch becomes enough.
[...]

Yes, if you have the upstream Git repository, things like revision
history can be accessed when generating the source package. However,
if upstream already supplies signed source release tarballs with
this extracted for you, and guarantees through testing that they
don't omit files from the worktree in their official tarballs,
redoing all that at source package build time does seem marginally
obsessive (though I suppose that's fine so long as you actually
remember to do it).
--
Jeremy Stanley

Theodore Ts'o

Aug 25, 2021, 11:10:03 AM
to
On Wed, Aug 25, 2021 at 04:11:37PM +0200, Thomas Goirand wrote:
>
> It's been *years* since I encountered a PyPI package that doesn't have a
> Git repo as its homepage (and unfortunately, 99% are on GitHub).
>
> I wrote this many times, but I don't see why we should use any "upstream
> tarball" when the Git repository itself contains the tarball with:
>
> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
> | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
>
> (which leads to a .xz, which is nicer)

Well, if we don't use an "upstream tarball", we do need to keep our
own private archive of the Git repository. After all, there is no
guarantee that the upstream git repo won't disappear in the future.

Simon's proposal that we use a tarball of the bare git repo
containing all of the git objects needed leading up to the signed tag
works, but isn't necessarily the most efficient over time, since we
would be keeping multiple copies of redundant git repos in
snapshot.debian.org, or across multiple Debian versions in our ftp
archives. But it at least guarantees that we will continue to have
access to the source even if the upstream git repo goes *poof*.

> Not only does one then only have to merge the upstream tag into the
> Debian branch to get the new release, but on top of that there is no
> need to "gbp import" or "pristine-tar commit", and a single packaging
> branch becomes enough.
>
> I very much wish this packaging workflow gained more traction, and the
> pristine-tar abomination dies...

Sure, but it implies that the git repos on salsa and/or dgit have to
become our official source of record for the purposes of GPL
compliance. Which means we need to be a lot more careful about ever
allowing those git trees to be deleted or rewritten, even if the
goal is to remove files that might be found to be problematic from a
copyright licensing perspective.

- Ted

Phil Morrell

Aug 25, 2021, 12:50:02 PM
to
On Wed, Aug 25, 2021 at 04:35:51PM +0200, Simon Richter wrote:
> > I wrote this many times, but I don't see why we should use any "upstream
> > tarball" when the Git repository itself contains the tarball with:
>
> > git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
> > | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
>
> "git archive" is reproducible, for simplicity I wouldn't use a prefix
> though.

For simplicity I *would* use a prefix, purely because that's what
github/gitlab uses, so upstream can still choose to additionally sign
the distributed tarball if they wish.

name=CorsixTH-0.61-beta1 # don't ask me why there's no v, it's just what GitHub does
git archive --prefix=$name/ -o ../$name.tar.gz v0.61-beta1
gpg --armor --detach-sign ../$name.tar.gz

https://github.com/CorsixTH/CorsixTH/issues/1271#issuecomment-344882419

Thomas Goirand

Aug 25, 2021, 1:10:03 PM
to
On 8/25/21 5:01 PM, Theodore Ts'o wrote:
> On Wed, Aug 25, 2021 at 04:11:37PM +0200, Thomas Goirand wrote:
>>
>> It's been *years* since I encountered a PyPI package that doesn't have a
>> Git repo as its homepage (and unfortunately, 99% are on GitHub).
>>
>> I wrote this many times, but I don't see why we should use any "upstream
>> tarball" when the Git repository itself contains the tarball with:
>>
>> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
>> | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
>>
>> (which leads to a .xz, which is nicer)
>
> Well, if we don't use an "upstream tarball", we do need to keep our
> own private archive of the Git repository. After all, there is no
> guarantee that the upstream git repo won't disappear in the future.

What I was saying is that you don't need to use the GitHub-generated
tarballs, because they are likely also generated with git-archive, so
doing it yourself achieves the same thing. You do not even need to keep
the upstream branch; the only thing you need is to push the Git tags
from upstream (which by themselves contain the whole thing...).
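
A minimal sketch of that, assuming a hypothetical upstream repository
with a v1.0 tag and a local bare repository standing in for the project
on Salsa (both are invented for the example): pushing just the tag
transfers every object it reaches, without creating any branch.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commit and tag below (illustrative values).
export GIT_AUTHOR_NAME='Example Upstream' GIT_AUTHOR_EMAIL='upstream@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical upstream repository with one commit and one annotated tag.
upstream=$(mktemp -d)
git -C "$upstream" init -q
echo 'hello' > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" commit -q -m 'Initial release'
git -C "$upstream" tag -a -m 'Release 1.0' v1.0

# Bare repository standing in for the packaging project on Salsa.
salsa=$(mktemp -d)
git init -q --bare "$salsa"

# Push only the release tag; git sends every object the tag reaches.
git -C "$upstream" push -q "$salsa" refs/tags/v1.0

# The remote has no branches, yet the tagged tree is fully available.
remote_branches=$(git -C "$salsa" for-each-ref refs/heads)
remote_files=$(git -C "$salsa" ls-tree -r --name-only v1.0)

printf 'branches on remote: [%s]\nfiles in v1.0: %s\n' \
    "$remote_branches" "$remote_files"
```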

> Simon's proposal that we use a tarball of the bare git repo
> containing all of the git objects needed leading up to the signed tag
> works, but isn't necessarily the most efficient over time, since we
> would be keeping multiple copies of redundant git repos in
> snapshot.debian.org, or across multiple Debian versions in our ftp
> archives. But it at least guarantees that we will continue to have
> access to the source even if the upstream git repo goes *poof*.

If you push the upstream git tags to Salsa, you're safe, and the way we
do it in the OpenStack team, we still generate and upload tarballs to
the Debian archive matching each tag.

>> Not only does one then only have to merge the upstream tag into the
>> Debian branch to get the new release, but on top of that there is no
>> need to "gbp import" or "pristine-tar commit", and a single packaging
>> branch becomes enough.
>>
>> I very much wish this packaging workflow gained more traction, and the
>> pristine-tar abomination dies...
>
> Sure, but it implies that the git repos on salsa and/or dgit have to
> become our official source of record for the purposes of GPL
> compliance. Which means we need to be a lot more careful about ever
> allowing those git trees to be deleted or rewritten, even if the
> goal is to remove files that might be found to be problematic from a
> copyright licensing perspective.
>
> - Ted

That's the thing with git: anyone working on the repository will hold a
local copy, which can act as a backup... :)

Thomas Goirand (zigo)

Simon Richter

Aug 25, 2021, 3:40:03 PM
to
Hi,

On 25.08.21 18:42, Phil Morrell wrote:

>> "git archive" is reproducible, for simplicity I wouldn't use a prefix
>> though.

> For simplicity I *would* use a prefix, purely because that's what
> github/gitlab uses, so upstream can still choose to additionally sign
> the distributed tarball if they wish.

> name=CorsixTH-0.61-beta1 # don't ask me why there's no v, it's just what GitHub does

This comment is precisely what I mean with "simplicity."

Simon


Sam Hartman

Aug 25, 2021, 6:00:04 PM
to
>>>>> "Simon" == Simon Richter <s...@debian.org> writes:

Simon> Hi,
Simon> On 8/16/21 3:18 AM, Paul Wise wrote:

>> I'd like to suggest that we standardise on the upstream VCS for
>> our orig.tar.gz files and phase out use of upstream packaging
>> ecosystems.

Simon> This is also an additional burden on package maintainers:
Simon> explaining how they arrived at that particular "upstream"
Simon> package in a reproducible way, and why what we ship as "orig"
Simon> is different from upstream, and what the copyright and
Simon> licensing situation for that derived work is.

Simon> Upstream projects have gotten a lot sloppier in how they cut
Simon> releases, that is true, and that is making packaging more
Simon> difficult as we need to disable mechanisms and embedded code
Simon> copies that were included for our "convenience."

Simon> Rather than accept defeat,

I don't think it's accepting defeat to use an upstream vcs.
I think it's just better.
It's closer to the development workflow I'd use to work on the upstream.
As a maintainer, it makes it easier for me to forward or back port
patches.
It makes it easier for me to contribute upstream.

I acknowledge your concern about needing to justify why I picked a
particular git tag.

By this point I kind of think source tarballs are an anti-pattern.

I do agree with you that Debian should work with upstreams to understand
our needs and to produce high quality releases.
At least for me though, that's unrelated to this issue.
In my world high quality source releases are signed git tags not
tarballs.
I'd like Debian to embrace my world:-)

Brian Thompson

Aug 25, 2021, 9:00:04 PM
to
Ecosystem-dependent or not, I can see being able to verify who uploaded
the Git tag (or anything for that matter) as being increasingly valuable
in a world where there is a lot of uncaught or ignored plagiarism.
Uploaders and creators should have integrity so that their users can
rely on them and be confident to deliver quality work.

--
Best regards,

Brian T

Thomas Goirand

Aug 26, 2021, 10:20:02 AM
to
Excuse me, but -T in xz looks to be the number of threads. Did you
mistake it for something else?

Cheers,

Thomas Goirand (zigo)

Simon Richter

Aug 26, 2021, 10:50:03 AM
to
Hi Thomas,

On 8/26/21 4:16 PM, Thomas Goirand wrote:

>> "git archive" is reproducible, for simplicity I wouldn't use a prefix
>> though. xz has some issues with reproducibility, AFAIK "-T2" makes it
>> disable some internal heuristics that are based on the machine it is
>> running on, and generates consistent output.

> Excuse me, but -T in xz looks to be the number of threads. Did you
> mistake it for something else?

No, enabling multithreading with a number of threads > 1 disables a
heuristic that optimizes memory usage based on memory available on the
machine it is running on, and uses fixed-size buffers.
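
For anyone who wants to check this on their own packages, a rough sketch
follows. The throwaway repository, the v1.0 tag and the output file
names are assumptions for the example. It only compares two runs on the
same machine, so it cannot by itself confirm the cross-machine behaviour
described above.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commit and tag below (illustrative values).
export GIT_AUTHOR_NAME='Example Upstream' GIT_AUTHOR_EMAIL='upstream@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical upstream repository with one commit and one annotated tag.
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'hello' > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m 'Initial release'
git -C "$repo" tag -a -m 'Release 1.0' v1.0

# Export the tag twice, compressing with two threads each time.
out=$(mktemp -d)
for run in 1 2; do
    git -C "$repo" archive v1.0 | xz -T2 > "$out/run$run.tar.xz"
done

# Byte-for-byte comparison of the two compressed tarballs.
if cmp -s "$out/run1.tar.xz" "$out/run2.tar.xz"; then
    result=identical
else
    result=different
fi
echo "two runs on this machine: $result"
```

Repeating the same commands on a second machine with a different amount
of memory and comparing the files would be the real test.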

Simon


Paul Wise

Aug 27, 2021, 4:00:03 AM
to
On Wed, Aug 25, 2021 at 2:36 PM Simon Richter wrote:

> "git archive" is reproducible

I'm told by the Bootstrappable Builds folks that `git archive` isn't
deterministic in some cases to do with filtering, I lost the details
though.

Paul Wise

Aug 27, 2021, 4:00:04 AM
to
On Wed, Aug 25, 2021 at 10:01 AM Simon Richter wrote:

> FWIW, I'd love to see git bundles as a source archive format -- this
> would allow shipping a (signed) tag, its commit, and the tree and blob
> objects for that commit as a single file that can be built in a
> reproducible way and allows changes on top to be easily tracked,
> including the branch point.

The dpkg-source 3.0 (git) format is exactly this. IIRC ftp-masters
didn't want to have to review the full history in the bundle, but that
could be mitigated by allowing single-commit bundles.

Sean Whitton

Aug 27, 2021, 5:00:03 PM
to
Hello zigo,

On Wed 25 Aug 2021 at 04:11PM +02, Thomas Goirand wrote:

> I wrote this many times, but I don't see why we should use any "upstream
> tarball" when the Git repository itself contains the tarball with:
>
> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
> | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
>
> (which leads to a .xz, which is nicer)
>
> Not only does one then only have to merge the upstream tag into the
> Debian branch to get the new release, but on top of that there is no
> need to "gbp import" or "pristine-tar commit", and a single packaging
> branch becomes enough.
>
> I very much wish this packaging workflow gained more traction, and the
> pristine-tar abomination dies...

I agree.

I'd like to suggest using 'git deborig' which is much shorter to type :)

--
Sean Whitton

Sean Whitton

Aug 27, 2021, 5:00:04 PM
to
Hello,

On Fri 27 Aug 2021 at 07:58AM GMT, Paul Wise wrote:

> On Wed, Aug 25, 2021 at 2:36 PM Simon Richter wrote:
>
>> "git archive" is reproducible
>
> I'm told by the Bootstrappable Builds folks that `git archive` isn't
> deterministic in some cases to do with filtering, I lost the details
> though.

git-deborig in devscripts has code to handle the filtering problem, I
believe.

--
Sean Whitton

Sean Whitton

Aug 27, 2021, 5:00:04 PM
to
Hello,

On Wed 25 Aug 2021 at 07:01PM +02, Thomas Goirand wrote:

> If you push the upstream git tags to Salsa, you're safe, and the way we
> do it in the OpenStack team, we still generate and upload tarballs to
> the Debian archive matching each tag.

Branches on salsa can be force-pushed, so it's not completely safe.

When you 'dgit push-source' or 'dgit push', that history is immutable in
that only the service admins can rewrite it.

--
Sean Whitton

Sean Whitton

Aug 27, 2021, 5:00:05 PM
to
Hello,

On Wed 25 Aug 2021 at 12:00PM +02, Simon Richter wrote:

> Hi,
>
> On 8/25/21 1:21 AM, Sean Whitton wrote:
>
>> From my point of view, signing git tags is no less well established a
>> best practice than signing tarballs -- in fact, to me, it seems *more*
>> well established.
>
> That is ecosystem dependent.

Yes, that was my point. We're going to have upstreams who release
tarballs and upstreams who release tags for some time.

--
Sean Whitton

Simon Richter

Aug 29, 2021, 7:50:03 AM
to
Hi,

On 27.08.21 22:56, Sean Whitton wrote:

>> That is ecosystem dependent.

> Yes, that was my point. We're going to have upstreams who release
> tarballs and upstreams who release tags for some time.

My expectation for that state would be "indefinitely", and I don't see
that as a bad thing. We should be able to handle either in our tooling,
and we don't need to inflict a certain workflow on upstreams. I don't
see the autoconf+automake folks deviating from "make distcheck" as a
release process anytime soon.

Where I do see us pushing upstreams though is towards "making releases
at all, and committing to supporting them." The number of packages with
version numbers like "0.0.20190812+git01234567" is too damn high.
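
For context, a version string of that shape is usually assembled from
the commit date and an abbreviated commit hash because there is no
release tag to use. A minimal sketch, assuming a throwaway repository, a
fixed commit date and a "0.0" base version chosen purely for
illustration:

```shell
#!/bin/sh
set -e

# Identity and a fixed date (2019-08-12 12:00 UTC) for the throwaway commit.
export GIT_AUTHOR_NAME='Example Upstream' GIT_AUTHOR_EMAIL='upstream@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_DATE='1565611200 +0000' GIT_COMMITTER_DATE='1565611200 +0000'

# Hypothetical upstream repository that has never tagged a release.
repo=$(mktemp -d)
git -C "$repo" init -q
echo 'hello' > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m 'Work in progress'

# With no tag to go on, the version is built from commit date and hash.
commit_date=$(git -C "$repo" log -1 --format=%cd --date=format:%Y%m%d)
short_hash=$(git -C "$repo" rev-parse --short=8 HEAD)
version="0.0.${commit_date}+git${short_hash}"

echo "$version"
```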

Simon

Nicholas Guriev

Aug 29, 2021, 11:30:03 AM
to
> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
> | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz

I think you should add a +ds version suffix or similar to indicate
repacking for Debian. Does it still make sense provided that upstream
does not care much about tarballs?


gregor herrmann

Sep 26, 2021, 5:10:03 AM
to
On Mon, 16 Aug 2021 09:18:34 +0800, Paul Wise wrote:

> I noticed that sometimes Debian's choice of upstream source for
> packaging can be suboptimal. This is especially apparent for the
> different per-language upstream packaging ecosystems[1], where the
> upstream packaging differs from the upstream VCS in some significant
> ways, including missing files, prebuilt files, embedded copies etc.
>
> While the upstream VCS also sometimes has these issues, it is often
> much less problematic than the upstream packaging ecosystems.
>
> I'd like to suggest that we standardise on the upstream VCS for our
> orig.tar.gz files and phase out use of upstream packaging ecosystems.
[…]
> 1. the ecosystems I'm talking about include cargo, npm, browser
> extensions, rubygems, pypi, CPAN etc.

Sorry for being a bit late to the party; as you mention CPAN here, I
thought I'd share some thoughts about it.
(We briefly discussed the topic at the pkg-perl BoF during DebConf
[0] but this is not an official team statement.)

I see the advantages of the proposal but I think for the perl
ecosystem it doesn't make a whole lot of sense, for two reasons:

- First, CPAN and Debian are quite similar (for better or worse :));
  not only are they about the same age, but for both projects the
  canonical way of distribution is via tarballs from mirrors - the
  Debian archive and the CPAN mirror network. And for both projects
  there is no requirement to use any VCS, let alone a specific one or
  a specific hosting service for a VCS.
- Second, using only the VCS of a CPAN distribution is not ideal
  because it misses information which is created at release time and
  which we rely on. So taking the code from an upstream repo
  basically means doing part of a release ourselves.

In general, the above mentioned problems of discrepancies between
upstream VCS (if they exist) and upstream tarballs are minor or close
to non-existent in the CPAN world. Hence switching to a VCS-based
approach wouldn't really solve any actual problem in almost all cases
and would create challenges for our tools and workflows.

There are exceptions where we do use the upstream VCS as the tarball
indeed contains undesirable artifacts; and we agreed in the BoF that
improving our tooling to work from a VCS instead of tarballs would be
nice.


Cheers,
gregor


[0] https://lists.debian.org/debian-perl/2021/08/msg00013.html

--
.''`. https://info.comodo.priv.at -- Debian Developer https://www.debian.org
: :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
`. `' Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
`-