Hi Maxim,
On Fri, 2018-05-25 at 13:57 +0200, Maxim Yu. Osipov wrote:
> Hi Claudius,
>
> Let me summarize the patchset status.
>
> 1. This patchset is just a first step towards more generic traceable
> reproducible build - tarball versioning/reproducibility features are
> out
> of scope of this patch set.
It is the first step and an RFC patch. It is mainly meant to support
the discussion about the possible options. So if we are on the same
page about the solution for reproducibility in Isar and how to go
forward from there, I will post a non-RFC patch with documentation.
> 2. Baurzhan always asks to provide some bits of information into the
> documentation when a new feature is added (or changed).
See answer above.
>
> 3. Henning asked about "stealing DL_DIR of bitbake as well" (see his
> email below). What is your opinion?
My opinion is that we should document which files need to be archived
in order to reproduce the complete build. It is also that we (as in
isar) cannot dictate to the user how she has to archive those
artifacts.
There are many different systems for archiving binary files available,
and we can offer some suggestions or even write some example code for
a couple of systems, but we should still be able to support all of
the users' own ideas. Those code examples can be in the form of some
shell/python scripts, but I would be against binding isar to any one
such system at this point.
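Just to make the "example code" idea a bit more concrete, here is a
rough Python sketch (the function name and chunk size are invented
for illustration): it records the checksum of every artifact the user
would need to archive, so any archival system of her choice can be
fed with a verifiable file list.

```python
import hashlib

def build_manifest(paths):
    # Record the sha256 of every artifact that must be archived to
    # reproduce the build; the resulting dict can be dumped to a file
    # and handed to whatever archival system the user prefers.
    manifest = {}
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large artifacts don't fill memory.
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        manifest[path] = digest.hexdigest()
    return manifest
```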
To be honest, I didn't 100% get what Henning meant by 'stealing
DL_DIR of bitbake'. I suppose he meant putting files from there into
the tarball as well? But I currently don't see a good reason for
doing so. My patch was about producing an artifact that isn't covered
by the DL_DIR and the normal reproducibility mechanisms of bitbake;
mixing the two just causes redundancy and confusion IMO.
I also skipped answering this question because I thought I had
answered it in the passage before by pointing out that any expansions
to this (like source packages) can be done later.
Just to make it clear: I don't want to shut down discussion by always
replying to any argument against this solution that it can be done
later. Henning's and Baurzhan's points of critique are very welcome,
because they ask: "Can your solution really be expanded to include
those use-cases/requirements?"
For instance, the partial update feature is something that is not so
easy to do with this simple mechanism. It would be much easier if
aptly had "proxy caching" support [1] and we used that to solve
reproducibility in isar. Henning's point about source packages could
also be handled more easily with some coding inside an apt repo
proxy/web server. So are we just saving complexity now and getting it
later in heaps, or are we gaining a simple normal case while having
some hurdles in the odd special one? I don't know yet. Please tell me!
What we have now is a solution space, ranging from a simple solution
like this RFC patch up to a possibly complex solution with an "apt
caching proxy". Maybe someone can think of a good solution in
between, or some important feature or UX concern requires a more
complex approach.
Here are some ideas I have seen mentioned and my opinion on them
including some pros and cons I just came up with. This is from memory,
so please correct me if I remembered something incorrectly:
Idea 0: Store tarball of debootstrap output with filled apt cache and
use that to restore isar-bootstrap.
Critique 0: That's, in short, my 'simple solution'.
Pro: simple to implement
Con: Debootstrap process is not done on every build.
Archival of a binary root file system is strange.
How to archive source packages?
=> add apt-get source to the installation process
How to handle partial update?
=> write a script that generates an isar recipe that deploys
those packages to the isar-apt repo.
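As a rough illustration of Idea 0 (this is a sketch, not the actual
patch; function names are made up), the store/restore step could look
like:

```python
import os
import tarfile

def archive_rootfs(rootfs_dir, out_tar):
    # Pack the debootstrap output, filled apt cache included, so the
    # exact package set can be restored for a later build.
    with tarfile.open(out_tar, "w:gz") as tar:
        tar.add(rootfs_dir, arcname=".")

def restore_rootfs(in_tar, rootfs_dir):
    # Unpack a previously archived rootfs to serve as isar-bootstrap.
    # Note: only do this with tarballs you created yourself;
    # extractall trusts the paths stored in the archive.
    os.makedirs(rootfs_dir, exist_ok=True)
    with tarfile.open(in_tar, "r:gz") as tar:
        tar.extractall(rootfs_dir)
```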
Idea 1: Generate a repository from the cache and use that for the next
debootstrap run.
Critique 1: Similar to my 'simple solution' but adds the creation of an
additional repository to it. -> higher complexity
Pro: debootstrap process is done on every build.
Con: Different apt repo urls are used.
For me that is a no-go, because that means the configuration
is different between the initial and subsequent builds.
How to add new packages later? (maybe like partial update?)
How to handle multiple repos?
=> map all repos from initial run to the local one.
And then what? => cannot be reverted, loss of information
How to archive source packages? (same as Idea 0)
How to handle partial update? (same as Idea 0)
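A minimal sketch of what Idea 1's "generate a repository from the
cache" could mean, assuming a flat repo layout ("deb file:... ./")
and cheating by parsing the name_version_arch.deb file naming
convention instead of reading the control file from the .deb (a real
implementation would do the latter, e.g. via apt-ftparchive):

```python
import hashlib
import os

def write_flat_repo_index(repo_dir):
    # Build a minimal Packages index over all cached .deb files so
    # the directory can be used as a flat apt repository.
    stanzas = []
    for deb in sorted(os.listdir(repo_dir)):
        if not deb.endswith(".deb"):
            continue
        # Debian naming convention: name_version_arch.deb
        name, version, arch = deb[:-4].split("_")
        path = os.path.join(repo_dir, deb)
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        stanzas.append(
            "Package: %s\nVersion: %s\nArchitecture: %s\n"
            "Filename: ./%s\nSize: %d\nSHA256: %s\n"
            % (name, version, arch, deb, os.path.getsize(path), digest))
    with open(os.path.join(repo_dir, "Packages"), "w") as f:
        f.write("\n".join(stanzas))
```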
Idea 2: Like idea 1 but with aptly. And then use aptly to manage
packages.
Critique 2: I am not that familiar with aptly, so please correct me.
Pro: debootstrap process is done on every build.
Better repo management tools.
Con: Different apt repo urls are used.
Need a whole mirror locally? (See Idea 3 and 4)
Dependency on external tool.
Possibly some roadblocks since aptly isn't really designed for
our use case.
Idea 3: Create a whole repo mirror with aptly or similar and strip
unused packages later.
Critique 3:
Pro: debootstrap process is done on every build.
Better repo management tools.
Con: Need a whole mirror locally.
For me that is a no-go as well; only what is necessary for a
build should be downloaded, nothing more.
Dependency on external tool.
Adding new packages later is a double step: first in aptly, then
in isar.
Possibly some roadblocks since aptly isn't really designed for
our use case.
Idea 4: Create a whole repo mirror with aptly or similar and import
used package into a new repo.
Critique 4:
Pro: debootstrap process is done on every build.
Better repo management tools.
Con: Different apt repo urls are used.
Need a whole mirror locally?
That might be unnecessary. Per the aptly documentation it could
be possible to create a mirror with a package filter to
only allow used packages. Then this is similar to Idea 2.
Dependency on external tool.
Possibly some roadblocks since aptly isn't really designed for
our use case.
Idea 5: Implementing a 'caching proxy' feature in aptly.
Critique 5:
Pro: debootstrap process is done on every build.
Better repo management tools.
Con: Dependency on external tool.
Needs some implementation in aptly.
Idea 6: Implementing a caching proxy feature in isar.
Critique 6: That was my initial idea way back.
Pro: debootstrap process is done on every build.
Con: Needs a lot of python scripting and code maintenance in isar.
If I missed or misrepresented an idea, please don't hesitate to correct
me or add them.
Ideas 2 to 4 are just slight variations of one another. They are the
different ways I could imagine using aptly for our purposes.
Because of the contra arguments 'whole local mirror' and 'different
apt repo urls are used' I would go for 0 or 5.
Idea 6 I discarded after some experimentation. Writing an async HTTP
proxy with only std-lib python is a pain. However, writing blocking
code with thread pools might be easier.
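For reference, a blocking caching-proxy sketch along the lines of
Idea 6 (upstream URL, cache path, and all names are invented for
illustration; a real implementation would need error handling,
locking against concurrent fetches, and HTTPS support):

```python
import os
import shutil
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "http://deb.debian.org"   # hypothetical upstream mirror
CACHE_DIR = "/var/cache/isar-proxy"  # hypothetical cache location

def cache_path(cache_dir, url_path):
    # Map a request path like /debian/pool/main/f/foo.deb to a file
    # below the cache directory, refusing paths that escape it.
    rel = os.path.normpath(url_path.lstrip("/"))
    if rel.startswith(".."):
        raise ValueError("path escapes cache dir: %s" % url_path)
    return os.path.join(cache_dir, rel)

class CachingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        local = cache_path(CACHE_DIR, self.path)
        if not os.path.exists(local):
            # Cache miss: fetch from upstream and store it, so the
            # next build can be served entirely from the cache.
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with urllib.request.urlopen(UPSTREAM + self.path) as r, \
                 open(local, "wb") as f:
                shutil.copyfileobj(r, f)
        self.send_response(200)
        self.send_header("Content-Length", str(os.path.getsize(local)))
        self.end_headers()
        with open(local, "rb") as f:
            shutil.copyfileobj(f, self.wfile)

def serve(port=8080):
    # ThreadingHTTPServer handles each request in its own thread, so
    # the blocking upstream fetch does not stall other clients.
    ThreadingHTTPServer(("127.0.0.1", port),
                        CachingProxyHandler).serve_forever()
```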
So I think I rambled enough now... Sorry for that.
Cheers for anyone left reading to this point,
Claudius
[1] What I mean by this is that aptly would operate as some kind of
lazily fetching http/ftp/rsync/... repo. Any request for an
unavailable file is served by downloading it from upstream and then
storing it on the build machine. I don't mean that aptly necessarily
has to be an http proxy; it could also just be a web server.