On Wed 2015-10-28 14:02:35 -0400, Ben Laurie <be...@google.com> wrote:
> Yes. Or equivalent (e.g. I think CVS repos' locations are not strictly
> URLs). Basically, "what do I tell what tool to get a copy of the actual
> repo developers work on?".
sure, having this be automated would be great. there are versioning
dependencies here too (e.g. what if version X of VCS A doesn't support
URLs of type foo://, but version Y does?)
>> debian packages contain a file debian/copyright which should indicate
>> the URL of the upstream project, and they often also contain a file
>> debian/watch, which indicates the URL where upstream's released tarballs
>> can be found. In any package, debian/control also has an optional
>> Homepage: field which points to the URL of the upstream project (and is
>> replicated in the Packages.xz files), and a Vcs-Git: or Vcs-Browser: field
>> which points to revision control for the debian packaging itself. so
>> all of this info is extractable from the archive itself, even if some of
>> it is not in the specific metadata you find in the Packages.xz file.
>
> To be clear, the "debian packaging itself" is essentially metadata and not
> a mirror of b?
Debian packaging can contain metadata, patches, and code, depending on
the state of the upstream project and how well it fits into debian's
default packaging techniques.
The VCS that tracks the debian packaging can either contain only the
debian packaging or it can contain both the debian packaging and the
upstream source. For my own debian packaging work i always include the
upstream source in the VCS, but this isn't a requirement within debian
(these practices have grown up organically, and aren't mandated by the
distro).
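
To make the earlier point about debian/control concrete, here is a rough
sketch of pulling the Homepage: and Vcs-*: fields out of a single
control stanza. The sample stanza, its URLs, and the helper names
(parse_stanza, upstream_pointers) are all made up for illustration, and
the parser ignores comment lines and other corner cases of the real
format.

```python
# Sketch only: parse one "Field: value" stanza of the kind found in
# debian/control. SAMPLE and both helper names are hypothetical.
SAMPLE = """\
Source: foo
Homepage: https://foo.example/
Vcs-Git: https://git.example/packaging/foo.git
Vcs-Browser: https://git.example/packaging/foo
"""

def parse_stanza(text):
    """Return a dict of field name -> value for the first stanza in text."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the stanza
        if line[0] in " \t" and last is not None:
            # continuation line: append to the previous field
            fields[last] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    return fields

def upstream_pointers(fields):
    """Keep only the fields that point at upstream or at the packaging VCS."""
    wanted = ("Homepage", "Vcs-Git", "Vcs-Browser")
    return {k: fields[k] for k in wanted if k in fields}

print(upstream_pointers(parse_stanza(SAMPLE)))
```

Note that Vcs-Git: here still points at the packaging repository, not
necessarily at the repo upstream developers work in.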
> So, it seems to me there are two possible aims for this kind of work
> in the context of a packaging system.
>
> 1. Try to tightly integrate the upstream repo into the packaging process.
>
> 2. Allow audit of the packaging process.
>
> It seems to me that 2 is probably a more interesting immediate target than
> 1, particularly given the many technical barriers you raise.
>
> I confess, though, that my interest was more driven by a desire to gather
> metrics than directly to do with packaging. That said...
If your goal is just to gather metrics, then most of the data you're
asking about is already available, albeit in different forms for
different distros. There are lots of metrics that would be interesting
to review, but clearly we can't overload b-t with all of them.
And there are risks of confusion for some kinds of metrics as well. For
example, if we report something like this:

    { 'package': 'foo',
      'version': '1.2-3',
      'distro': { 'name': 'lunix',
                  'version': '9.2',
                  'platform': 'x86_64-linux-gnu'
                },
      'sha256': 'b42e48b3a6bf6e407688be6f11908cb0ef8b9a58',
      'upstream_repo': { 'type': 'git',
                         'url': 'https://git.example/projects/foo',
                         'commit': '18911075732d64426bafccdde6a6b3656727da0d' }
    }

and the foo project is known to have a serious bug that is not fixed
in the referenced upstream_repo.commit, it would be easy to mistakenly
claim that lunix 9.2 hasn't fixed that issue in their foo package. But
it's possible that they have added a patch to fix it already.
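
A rough sketch of how that mistake could happen in a metrics script.
Everything here beyond the record fields shown above is an assumption:
the 'distro_patches' field, the patch name, the placeholder commit id,
and both function names are hypothetical, and a real check would need
commit ancestry rather than simple set membership.

```python
# Sketch only. 'distro_patches' is NOT part of the example record above;
# it is a made-up field standing in for "patches the distro applied".
record = {
    'package': 'foo',
    'version': '1.2-3',
    'upstream_repo': {
        'type': 'git',
        'url': 'https://git.example/projects/foo',
        'commit': '18911075732d64426bafccdde6a6b3656727da0d',
    },
    'distro_patches': ['fix-serious-bug.patch'],  # hypothetical
}

# Placeholder id for "upstream commits known to contain the fix".
FIXED_UPSTREAM_COMMITS = {'ffffffffffffffffffffffffffffffffffffffff'}
# Hypothetical names of distro patches known to carry the same fix.
KNOWN_FIX_PATCHES = {'fix-serious-bug.patch'}

def naive_looks_fixed(rec):
    """Only looks at the referenced upstream commit."""
    return rec['upstream_repo']['commit'] in FIXED_UPSTREAM_COMMITS

def looks_fixed_with_patches(rec):
    """Also counts fixes that the distro applied as patches."""
    patched = bool(KNOWN_FIX_PATCHES & set(rec.get('distro_patches', [])))
    return naive_looks_fixed(rec) or patched

print(naive_looks_fixed(record))         # False: looks unfixed
print(looks_fixed_with_patches(record))  # True: the distro patched it
```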
Note also that some upstream repositories produce multiple binary
artifacts (e.g. the "pinentry" upstream source produces a half-dozen
binaries in debian, all producing password prompt mechanisms in
different environments).
Similarly, the relationship between the source code and the generated
binary artifact isn't a clean mapping. As the reproducible-builds
project has shown, the toolchain used to convert the source to a binary
artifact is also relevant. in r-b, we've been capturing the manifest of
all packages used during build as a "buildinfo" file. Is b-t the right
place to record that info as well?
>> However, is binary-transparency the right place to do that work?
>
> ...it seems to me that binary transparency is ultimately about provenance.
> In that context, the ability to go all the way back to the original work of
> the developers seems like a key component to provenance, even if it's only
> ultimately useful to humans rather than machines.
I think i'm going to push back on this a bit. Is b-t ultimately about
provenance? Currently i see b-t as offering users some assurance that
the package they're seeing is seen by everyone who relies on the same
cryptographic authority. This allows the user to be sure that if
they're getting malware, at least everyone else is getting malware
too. This provides a deterrent to vendors who are being pressured to
ship customized malware to some of their users.
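
As a minimal sketch of that last-hop check, assuming log entries shaped
like the example record earlier in this message (package, version,
sha256): the user hashes what they were actually handed and looks for a
matching entry. The function names and the in-memory "log" are
hypothetical. A real log would presumably also need some way to show
that every user is looking at the same log, which this does not
attempt.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()

def matches_log(artifact: bytes, log_entries, package, version):
    """True if some log entry for (package, version) has this digest."""
    digest = sha256_hex(artifact)
    return any(
        e['package'] == package
        and e['version'] == version
        and e['sha256'] == digest
        for e in log_entries
    )

# Stand-in bytes for a package file; not a real .deb.
published = b"pretend this is foo_1.2-3"
log = [{'package': 'foo', 'version': '1.2-3',
        'sha256': sha256_hex(published)}]

print(matches_log(published, log, 'foo', '1.2-3'))               # True
print(matches_log(published + b" + extra", log, 'foo', '1.2-3'))  # False
```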
Is this provenance? i see it more as oversight or accountability for
the last-hop leg that software travels from vendor to end-user.
If a malicious binary artifact is discovered, and its associated b-t log
somehow does point all the way back to upstream source code, and that
source code has clear evidence of malice, great, we found the attack.
But if the upstream source code does *not* have evidence of malice, we
don't know where the problem was introduced.
b-t will be simpler if it just focuses on the last hop (the link between
vendor and user), and leaves it to the individual vendors to justify any
failures further back in the chain.
That said, i understand the desire to link binary artifacts all the way
back to source code, both from the FLOSS-advocacy perspective and from
the reproducible-builds perspective. And if b-t could provide that
linkage, i'd be happy. I'm just wary about trying to make b-t do all
those things.
> But I guess we're not the only interested parties - should we start
> encouraging some kind of self-published metadata in projects, which
> includes this kind of thing? And if it's missing, Debian might (or might
> not) choose to provide it itself...
I listed several ways that this kind of data is already provisionally
published in Debian. If there are specific things that you think are
missing, i'd be happy to look into ways to improve debian infrastructure
to allow it. let me know!
>> If we do, how will that work interact with proprietary vendors who
>> want to provide some level of binary transparency themselves?
>
> It seems to me that the option "upstream repo: private" is entirely fair.
sure, sounds fine to me :)
--dkg