I like to make the system compile itself until it converges, that is,
the newly compiled system is identical to the installed system. It's
surprising how many cycles it takes to converge in some cases; I
think there are some subtle mutual dependencies between gcc and glibc.
But sometimes it is hard to detect convergence because of embedded
timestamps. All ar archives contain embedded timestamps (my system has
3 .a files), and also bzImage and lilo contain deliberately inserted
timestamps. Fortunately the kernel's timestamp is not in the
compressed part of the image.
So I was wondering:
Do Debian's autobuilders compile repeatedly until the system
converges? If so, how do they detect convergence?
Finally, might it be a good idea to get rid of embedded timestamps, so
that the same source doesn't give a different binary each time it is
compiled? The build script for a package could replace the embedded
timestamps by the date of the newest source file, for example.
Edmund
[1] I'm using 20 source packages. Compressed source is about 60 MB;
binaries are about 35 MB, but you need something like 400 MB free
space for compiling glibc. I make it compile itself under chroot, but
I did boot it under bochs once, too. Compiling glibc under bochs
wouldn't be a very good idea ...
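
The suggestion above, stamping a binary with the date of the newest source file rather than the build time, might be sketched as follows. This is only an illustration: the function names and the idea of walking an unpacked source directory are assumptions, not part of any existing build script.

```python
import os
import time


def newest_source_mtime(source_dir):
    """Return the latest modification time (seconds since the epoch)
    of any file under source_dir, or 0.0 if there are no files."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat: do not follow symlinks out of the source tree.
            newest = max(newest, os.lstat(path).st_mtime)
    return newest


def format_build_date(epoch_seconds):
    """Render a timestamp as text, always in UTC so that the result
    does not depend on the build machine's time zone."""
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(epoch_seconds))
```

A build script could pass the result of format_build_date(newest_source_mtime(...)) to whatever currently inserts the current time, so that rebuilding unchanged source gives the same string.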
I'm amazed it works at all. I have memories of timestamps cropping up
in the oddest places...
> Do Debian's autobuilders compile repeatedly until the system
> converges? If so, how do they detect convergence?
Heck no. Nothing gets built more than once. It causes all sorts of
problems, but convergence is essentially impossible - we're building
ALL of the debian archive, remember? We can't pick and choose 20
packages; we have to deal with 4000 instead.
> Finally, might it be a good idea to get rid of embedded timestamps, so
> that the same source doesn't give a different binary each time it is
> compiled? The build script for a package could replace the embedded
> timestamps by the date of the newest source file, for example.
Perhaps dpkg-deb could do this, but I'd see it as a feature rather than
a bug - shows you when the package was built...
Dan
/--------------------------------\ /--------------------------------\
| Daniel Jacobowitz |__| SCS Class of 2002 |
| Debian GNU/Linux Developer __ Carnegie Mellon University |
| d...@debian.org | | dm...@andrew.cmu.edu |
\--------------------------------/ \--------------------------------/
> > Do Debian's autobuilders compile repeatedly until the system
> > converges? If so, how do they detect convergence?
>
> Heck no. Nothing gets built more than once. It causes all sorts of
> problems, but convergence is essentially impossible - we're building
> ALL of the debian archive, remember? We can't pick and choose 20
> packages; we have to deal with 4000 instead.
Why impossible?
I don't see that the number of packages changes anything essentially.
In practice the cost is linear in the number of packages. If you can
solve the problem of embedded timestamps, then you end up doing an
extra 4000 builds for each test cycle, which should be feasible.
The main difference between my toy set-up and Debian is that I install
all my 20 packages, whereas Debian has packages that conflict with
each other.
So you should probably define a core set of packages that are always
installed at build time and make each package declare any other build
dependencies it may have. When you build a package, you have only
those packages installed. Use a chroot environment to keep it really
clean.
That way, you can change most packages without having to rebuild any
other packages.
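
As a rough sketch of that scheme: the build environment for a package is the core set plus its declared build dependencies, and a change to a non-core package only forces rebuilds of packages that declare it. The package names and dependency table below are invented for illustration, and only direct dependencies are considered.

```python
# Hypothetical core set: always installed in the build chroot.
CORE_PACKAGES = frozenset({"gcc", "glibc", "make", "binutils"})

# Hypothetical declared build dependencies (beyond the core set).
BUILD_DEPENDS = {
    "lilo": {"bin86"},
    "mutt": {"ncurses"},
    "ncurses": set(),
    "bin86": set(),
}


def build_environment(package):
    """Packages to install in the chroot before building `package`."""
    return CORE_PACKAGES | BUILD_DEPENDS.get(package, set())


def needs_rebuild(changed):
    """Packages that must be rebuilt when `changed` changes.

    A change to a core package touches every build environment, so
    everything is rebuilt; otherwise only the package itself and the
    packages that directly declare it as a build dependency.
    """
    if changed in CORE_PACKAGES:
        return set(CORE_PACKAGES) | set(BUILD_DEPENDS)
    return {changed} | {p for p, deps in BUILD_DEPENDS.items() if changed in deps}
```

With this table, a change to ncurses triggers a rebuild of ncurses and mutt only, while a change to gcc triggers a rebuild of everything.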
The main advantage of doing this:
* You know for certain that the released binary packages can be
rebuilt with the released source on the released system, which is good
for security.
The cost:
* You need a more complex autobuilder.
* You need some extra CPU power for the autobuilder (roughly
equivalent to rebuilding every package on each test cycle).
* You force package maintainers to get rid of embedded timestamps (but
presumably you help them by providing some tools for this).
PS: If a package accidentally contains an embedded timestamp, the
autobuilder should detect the situation rather than go into a loop.
Edmund
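
One way the autobuilder might tell "not yet converged" apart from "can never converge" is to build twice from identical inputs on each cycle and to cap the number of cycles. The sketch below assumes a hypothetical build callable that takes the previous binary (or None) and returns the new binary as bytes; it doubles the build cost and is meant only to illustrate the check.

```python
import hashlib


def digest(data):
    """Hex digest of a binary, used for comparisons."""
    return hashlib.sha256(data).hexdigest()


def check_convergence(build, max_cycles=5):
    """Rebuild until the output stops changing.

    Returns ("converged", n) if cycle n reproduced the previous binary,
    ("nondeterministic", n) if two builds from the same inputs differed
    (for example because of an embedded timestamp), or
    ("gave-up", max_cycles) if neither happened within the limit.
    """
    previous = None
    for cycle in range(1, max_cycles + 1):
        first = build(previous)
        second = build(previous)  # same inputs, built again
        if digest(first) != digest(second):
            return ("nondeterministic", cycle)
        if previous is not None and digest(first) == digest(previous):
            return ("converged", cycle)
        previous = first
    return ("gave-up", max_cycles)
```

The cycle limit is what prevents the loop; the double build is what lets the autobuilder report the likely cause instead of just giving up.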
> > Finally, might it be a good idea to get rid of embedded timestamps, so
> > that the same source doesn't give a different binary each time it is
> > compiled? The build script for a package could replace the embedded
> > timestamps by the date of the newest source file, for example.
>
> Perhaps dpkg-deb could do this, but I'd see it as a feature rather than
> a bug - shows you when the package was built...
I've just realised there's a possible misunderstanding here.
I don't call the date of a file in data.tar.gz in a deb an "embedded
timestamp", because the programs that handle debs know how to unpack
the data that far. By an "embedded timestamp" I mean a timestamp that
appears inside one of those files.
In a lot of cases it's because the file is an archive or compressed.
These are the embedded timestamps that a simple tool could get rid of.
In other cases the timestamp is deliberately inserted in a less
standard manner; here it would be necessary to patch the source.
For an example, try strings /sbin/lilo | grep 2000. I think this
misfeature was added to LILO quite recently.
These embedded timestamps do more harm than good, in my opinion.
So fix your compare to ignore difference in time? I find those times to
be quite valuable.
Wichert.
--
_________________________________________________________________
/ Nothing is fool-proof to a sufficiently talented fool \
| wic...@cistron.nl http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0 2805 3CB8 9250 2FA3 BC2D |
> > But sometimes it is hard to detect convergence because of embedded
> > timestamps. All ar archives contain embedded timestamps (my system has
> > 3 .a files), and also bzImage and lilo contain deliberately inserted
> > timestamps. Fortunately the kernel's timestamp is not in the
> > compressed part of the image.
>
> So fix your compare to ignore difference in time? I find those times to
> be quite valuable.
The trouble is that only the linux-kernel package, say, can know how
to ignore the timestamp inside a kernel image. So each package would
have to export its own version of compare. Moreover, the system would
be less robust, because there might be a bug in one of the compare
functions, and that would be hard to detect.
Why do you find those times to be valuable (more valuable than they
would be if they were replaced by the date of the source diff)?
Edmund
Huh? I don't follow that logic. It's quite trivial to compare two
packages and ignore the timestamps for the files inside them. Basically
you extract and do a binary diff (or compare md5sums). You can script
that in say 10 minutes.
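
A minimal sketch of such a script, assuming both packages have already been extracted into two directories (the function names are made up for this example):

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file's path (relative to root) to the MD5
    digest of its contents. File dates are never looked at."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.md5(f.read()).hexdigest()
    return digests


def differing_files(old_root, new_root):
    """Sorted list of relative paths that are missing on one side or
    whose contents differ between the two trees."""
    old, new = tree_digests(old_root), tree_digests(new_root)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

This ignores the dates of the files themselves, but a timestamp stored inside a file's contents still changes that file's digest.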
> Why do you find those times to be valuable (more valuable than they
> would be if they were replaced by the date of the source diff)?
Because knowing when something is built adds extra information.
Wichert.
> Previously Edmund GRIMLEY EVANS wrote:
> > The trouble is that only the linux-kernel package, say, can know how
> > to ignore the timestamp inside a kernel image.
>
> Huh? I don't follow that logic. It's quite trivial to compare two
> packages and ignore the timestamps for the files inside them. Basically
> you extract and do a binary diff (or compare md5sums). You can script
> that in say 10 minutes.
You're still misunderstanding me!
The file "bzImage" contains a timestamp. That's what I mean by an
"embedded timestamp". It's the number of seconds in binary, if I
remember correctly; /sbin/lilo contains the date as text. I don't see
how a simple script could ignore those timestamps.
I agree that a simple script could unpack static libraries. On the
other hand, I don't suppose anyone has any use for the dates of the
archive members; ar x doesn't even extract the dates by default. So
why not zap them?
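
Zapping the member dates needs very little code, because the common ar format stores each member's modification time as decimal text in a fixed 12-byte field of a 60-byte header. The sketch below works on the archive as bytes in memory; the function name is invented, and it has only been thought through for the plain layout described in the comments.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def zap_ar_timestamps(data):
    """Return a copy of an ar archive with every member's
    modification-time field set to 0."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    out = bytearray(data)
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(out):
        header = out[offset:offset + HEADER_SIZE]
        # Every member header ends with the two bytes "`\n".
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % offset)
        # Bytes 16..27: modification time, decimal, space-padded.
        out[offset + 16:offset + 28] = b"0".ljust(12)
        # Bytes 48..57: member size, decimal, space-padded.
        size = int(header[48:58].decode("ascii").strip())
        # Member data is padded to an even number of bytes.
        offset += HEADER_SIZE + size + (size % 2)
    return bytes(out)
```

Two archives that differ only in member dates become byte-for-byte identical after this pass, so an ordinary binary compare works on them.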
> > Why do you find those times to be valuable (more valuable than they
> > would be if they were replaced by the date of the source diff)?
>
> Because knowing when something is built adds extra information.
You still have that information in the dates of the files. I am
suggesting that the embedded timestamps should be zapped, not the
dates of the files.
Probably most packages don't contain any embedded timestamps.
The kernel's embedded timestamp is (I assume) used to generate the
date displayed by uname -a. Would it matter to you if this date were
the date of the source diff instead of the date the binary package was
built?
(It might, I suppose, be more correct to use the date of the newest
file on which the package depends, which would correspond to the
earliest time at which the package could have been built. That would
be more historically correct.)
Edmund
> * You need some extra CPU power for the autobuilder (roughly
> equivalent to rebuilding every package on each test cycle).
Um... what??? Are we talking on the same wavelength here?
A complete build of Debian, on a top-end machine, would probably take
three or four complete days. It might take more like two weeks.
That's not really feasible, and every time you needed to redo a
cycle....
> PS: If a package accidentally contains an embedded timestamp, the
> autobuilder should detect the situation rather than go into a loop.
I really doubt that's possible to do reliably. Who knows what the
timestamp might look like?
You could get away with a small limit on binary delta, perhaps, but
that's fugly.
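
For what it's worth, the "small limit on binary delta" idea could look roughly like this. The names and the default limit of 32 bytes are arbitrary assumptions, and the check cannot tell a changed timestamp from a small real change in the code.

```python
def differing_byte_count(old, new):
    """Count the positions at which two byte strings differ. Any
    difference in length is counted in full."""
    mismatches = sum(1 for a, b in zip(old, new) if a != b)
    return mismatches + abs(len(old) - len(new))


def within_delta_limit(old, new, limit=32):
    """True if the two binaries differ in at most `limit` bytes."""
    return differing_byte_count(old, new) <= limit
```

A 4-byte binary timestamp that changes between builds would pass with a limit of 4, but so would a one-byte change to an instruction.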
Dan