Re: Upstreaming emerge improvements (was: Goodbye Frankenbuild: New workflow for correct incremental builds)


Brian Harring Jul 28, 2010 4:22 AM
Posted in group: Chromium OS dev
On Tue, Jul 27, 2010 at 09:05:28AM -0400, David James wrote:
> On Mon, Jul 26, 2010 at 10:06 PM, Brian Harring <ferr...@gmail.com> wrote:
> > If I may ask, is there a reason you didn't look at an EAPI with
> > (essentially) ABI slots?  Such a route, while it requires specifying a
> > bit more info (or tweaking the manager to generate that info), would
> > give fine grained control to the resolver for doing rebuilds of the
> > sort mentioned above.
>
> Long term, I think that that's exactly what we want. Right now, it's
> hard to tell which packages require dependencies to be rebuilt, and
> which packages do not have this requirement. If there were an
> automated way to tell which packages need their dependencies to be
> rebuilt, that would be great.
>
> The ABI slot solution sounds good, as long as we don't have too many
> packages where the maintainers specify their rebuild requirements
> incorrectly (e.g., by breaking binary compatibility without realizing
> it). If we found that this was a regular problem, we would probably
> want to go back to just rebuilding all dependencies whenever a package
> changes.

Bad packaging metadata we can't really do much about, realistically,
beyond threatening to wedgie the packaging dev. That said, a DT_NEEDED
validation QA check to ensure the targets are within the stated rdeps
might be useful.

Doesn't do anything for dlopen or static, but it was one of the more
useful features of the old EBD/saviour branch of portage- iirc I named
that feature 'verify-linkage'.  Implementation was ldd based (ick),
but it was a start.
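
As a rough illustration of the kind of check described above (a sketch
only: the function name and both input mappings are hypothetical, not
portage or pkgcore API, and extracting DT_NEEDED entries from the ELF
files is assumed to have happened elsewhere):

```python
def find_undeclared_linkage(needed, soname_owners, declared_rdeps):
    """Report DT_NEEDED sonames that the stated rdeps do not cover.

    needed         -- set of sonames taken from the package's DT_NEEDED entries
    soname_owners  -- dict mapping soname -> package that installs it
    declared_rdeps -- set of package names listed in the package's rdeps

    Returns a dict mapping each offending soname to the package that
    provides it, or to None when no provider is known at all.
    """
    problems = {}
    for soname in sorted(needed):
        owner = soname_owners.get(soname)
        if owner is None:
            # Nothing installed claims this library.
            problems[soname] = None
        elif owner not in declared_rdeps:
            # Linked against a package the metadata never mentions.
            problems[soname] = owner
    return problems
```

Like the ldd-based version, this says nothing about dlopen'd or
statically linked code; it only compares what the dynamic linker would
ask for against what the metadata claims.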


> > Aside from that, I'd be very interested if there is an ML, irc
> > channel, or any area people are doing discussions of this sort- I'm
> > the author of pkgcore, long term hacker of portage
> > (refactoring/cleanup- trying to combat the mess rather than creating
> > it), and have been working on issues similar to what y'all have been
> > rolling out.  What's been a bit problematic is that there isn't
> > exactly a lot of people doing the level of binpkg work y'all are- thus
> > finding stakeholders to bug for info has been pretty sparse.
>
> Personally, I'd be quite interested in this problem. The
> chromium-os-dev list is fine for this type of discussion, but if you
> have another mailing list that would be better, I'd be happy to join.

Unfortunately I'm not particularly aware of any.  Typically I track
down people in irc and pick their brains directly; that said from an
irc standpoint #gentoo-portage on freenode is probably the best spot
to catch the appropriate people (zac medico/zmedico, sebastian
luther/zac, solar, myself) who would have interest/comments.


> Here are a few other suggestions about things to address in regular emerge:
>  1. To address package rebuilds completely correctly, we'll also need
> to differentiate between host and target build dependencies. There's a
> patch for this, but it looks like it got closed as a duplicate:
>   http://bugs.gentoo.org/show_bug.cgi?id=317337

This was proposed as "bdepends" at one point- build time depends,
specifically what is executed during building the package (bash and
sed for example).

The problem w/ a new dependency type is mainly getting developer buy
in- building tools to help developers track/validate these
dependencies.  At least in the past, this sort of proposal died out
since there wasn't a sufficient carrot (nor was it easy to modify
portage for such a capability).


>  2. Strangely, Portage often favors the dependencies (and masking) of
> built packages over the ebuilds you have in your package dir. I would
> rather that Portage look at the ebuilds, because developers often
> change the ebuilds and want their new ebuild to be prioritized over
> the one that was already built. In parallel_emerge, we work around
> this by setting --usepkg=n for dependency calculation and then convert
> the source package to a binary package at a later stage.

Essentially you want binpkgs to be utilized strictly as a cache,
right?  I suspect this isn't a particularly hard mod to implement for
the folk who know the innards of _emerge.* reasonably well.


>  3. Does Portage have public APIs for calculating dependencies and
> merging packages? Right now, parallel_emerge uses private APIs but I
> would like to convert it to public APIs if they exist.

CC'ing the folk who could best answer this one- at this point, there
isn't per se a public api for the most part.  I know they're currently
trying to derive one for packagekit/porthole/SoC, but I'm not aware of
its status.


>  4. Portage is a bit aggressive about locking directories, so this
> makes parallel merging slow. Whenever a package is being merged,
> Portage locks the entire package directory. This means essentially
> that package merges do not occur in parallel at all. Is it possible to
> reduce the scope of the locking so that it does not have such a
> significant effect on performance?

Heh.  This is a contentious issue.  Roughly, everything after
pkg_setup when it comes to merging is expected to be the only thing
modifying the FS during this- the reasoning is purely due to
pkg_preinst/pkg_postinst being essentially uncontrolled access to the
livefs.  That critical locking unfortunately does need to exist if
those phases are defined.  I'd expect in addition that the treewalk
bits (the merger) wouldn't particularly like having a run in with
another active merge at the same time.

That said, the scope of the locking definitely should be reducible,
and parallelization increased w/in the atomic locked merge.  Locking
of the binpkgs repository in particular seems unneeded since
effectively it's an untarring to ${D}, and unpacking the xpak (that
annoying trailing data after EOS on tbzs) to ${T} w/in the pkg's build
env.  After that point, there shouldn't need to be any further
locking on binpkgs- further creating a new binpkg doesn't require a
write lock until you actually try to make it live w/in the binpkg repo
(the final atomic rename), so at least the compression steps should be
able to be done in parallel.
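
The "compress in parallel, lock only for the final rename" idea can be
sketched roughly as follows. This is an assumption-laden illustration,
not how portage does it: publish_binpkg is a made-up name, the lock is
any context manager standing in for the binpkg repo's write lock, and
the payload is plain bz2-compressed bytes rather than a real tbz2 with
its xpak trailer.

```python
import bz2
import os
import tempfile


def publish_binpkg(repo_dir, name, payload, repo_lock):
    """Compress payload without the lock, then rename it into place under it."""
    # The temp file lives in the repo directory so the final rename stays
    # on one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=repo_dir, prefix="." + name + ".",
                                    suffix=".tmp")
    try:
        # Expensive step: runs with no repository lock held, so several
        # packages can be compressed at the same time.
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(bz2.compress(payload))

        final_path = os.path.join(repo_dir, name)
        # Cheap step: the write lock is held only for the rename itself.
        with repo_lock:
            os.replace(tmp_path, final_path)
    except BaseException:
        # Don't leave a half-written temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Readers of the repo either see the old file or the complete new one,
never a partially written package, and the lock is held for a rename
rather than for the whole compression.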


>  5. Some packages will make global system changes in their postinstall
> scripts, but these scripts are not parallel-safe. Can we make these
> scripts parallel-safe? Currently, this is not a big problem for emerge
> because it just locks for the entire merge, including postinstall (see
> #4), but once we get rid of that lock it becomes a problem.

I'm definitely open to suggestions of how to make them safe, but by
pkg_(pre|post)(inst|rm)'s nature, they're free form root level
execution.  From a format standpoint, for better or worse they
technically can do whatever the hell they want, bounded only by QA
standards of that particular repository.

I could see adding a PROPERTIES hint of some sort indicating that the
locking isn't needed for that pkg offhand, although that's a bit of a
hack.
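
A sketch of what consulting such a hint might look like. Everything
here is hypothetical: no "parallel-safe" PROPERTIES token exists, the
function name is invented, and it only models the pkg_* phase reason
for the lock, not whatever the merger itself needs.

```python
# Phases that get free-form, root-level access to the live filesystem.
LIVEFS_PHASES = {"pkg_preinst", "pkg_postinst", "pkg_prerm", "pkg_postrm"}

# Hypothetical PROPERTIES token; not part of any existing EAPI.
NO_LOCK_HINT = "parallel-safe"


def needs_exclusive_merge_lock(properties, defined_phases):
    """Decide whether a package's merge must hold the exclusive lock.

    properties     -- the package's PROPERTIES string (space separated)
    defined_phases -- iterable of phase function names the ebuild defines
    """
    if NO_LOCK_HINT in properties.split():
        # The packager explicitly vouches that its phases are safe.
        return False
    # Otherwise lock whenever any livefs-touching phase is defined.
    return bool(LIVEFS_PHASES & set(defined_phases))
```

The default stays conservative: a package has to opt out of the lock,
so bad or missing metadata degrades to today's behaviour rather than
to an unsafe parallel merge.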

Zac/Sebastian, any comments on the above?

~brian