Stop “optimizing” redo-always – it breaks a lot of use cases

Nils Dagsson Moskopp

Aug 3, 2022, 9:02:19 AM
to redo...@googlegroups.com
Dear redo implementors,

some of you have chosen to “optimize” redo-always so that a target is
not always rebuilt, i.e. built at most once. For example, goredo does
this: <http://www.goredo.cypherpunks.ru/FAQ.html>

> By definition, it should be built always, as redo-sh and redo-c
> implementations do. But that ruins the whole redo usability,
> potentially rebuilding everything again and again. apenwarr/redo and
> goredo tries to build always-targets only once per run, as a some kind
> of optimization.

Here are some use cases that get sabotaged by the “optimizations”; all
can be summed up as “not everything out there is a pure function”:

1. Composition with recursion – a major reason to use redo – becomes
hard or impossible without implementation-specific workarounds.

The popular apenwarr-redo appears to have this problem in its own test
suite: it contains code that flushes its cache in the middle of a test:
https://github.com/apenwarr/redo/blob/main/t/flush-cache.in

This is necessary – because otherwise this test will not pass:
https://github.com/apenwarr/redo/blob/main/t/640-always/all.do

In my opinion, a test reaching into an implementation-specific cache
solely because an implementation's dependency caching is broken … is
bogus: redo-always only functions “as expected” in the test case due
to smoke and mirrors – and the trickery means that it is unportable.

The test case for redo-always in redo-sh does not need this. If some
redo implementation does not pass it, recursive composition on that
implementation is hard, unportable, or both.
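
To illustrate with a minimal sketch (file names made up; this is not
the apenwarr test), suppose an always-out-of-date target is reached
twice during one composed build:

    # counter.do – hypothetical: always out of date
    redo-always
    date +%s

    # all.do – hypothetical: reaches counter from two places
    redo-ifchange counter
    cp counter first
    redo-ifchange counter   # with per-run caching, this does NOT
    cp counter second       # rebuild counter, so first == second

Under “always means always” semantics, the second redo-ifchange
rebuilds counter; under per-run caching it silently does not.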

2. A target that is always out of date and has desired side effects may
be broken when it is erroneously considered up to date. This should be
obvious: if I have a target that is always out of date and I want it to
generate something and send it to another host over the network, that is
not going to work past the first time if the target is only built once.
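
A minimal sketch of such a target (host and file names made up):

    # publish.do – hypothetical: desired side effect on every build
    redo-always
    redo-ifchange report.pdf
    scp report.pdf user@example.org:/srv/reports/
    date > "$3"   # record when we last published

With per-run caching, a second redo-ifchange of publish within the same
build does not upload again – and nothing tells the user so.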

3. Targets that include timestamp information are especially affected by
this. Imagine a website footer that includes the output of “date”, or a
long-running document conversion process that wants to timestamp each
build artifact. If footer.html.do is just “redo-always; LANG=C date”,
that will probably not result in what was intended if footer.html is
built only once. I recently used redo to generate a printed schedule
for an event – since the schedule could change at a moment's notice, it
was very important to the team to have timestamps on the printouts.
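
Spelled out as a complete dofile, that example is (a minimal sketch):

    # footer.html.do – hypothetical: timestamp every build
    redo-always
    printf '<footer>generated %s</footer>\n' "$(LANG=C date)"

If footer.html is built at most once, everything assembled after the
first build embeds a stale timestamp.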

4. Targets that are out of date unless a condition is fulfilled may be
broken. For example, I have targets that use redo-always to be out of
date, but then use redo-stamp to take it back. Without redo-always, such
a target might not be evaluated the next time, as the input to
redo-stamp can come from anywhere – e.g. an ETag from an HTTP header.
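
Such a dofile might look roughly like this (URL and names are made up;
redo-stamp reads its stamp data from standard input):

    # remote.json.do – hypothetical: out of date unless the ETag matches
    redo-always
    curl -s -o "$3" https://example.org/data.json
    curl -sI https://example.org/data.json |
        grep -i '^etag:' | redo-stamp

The dofile runs on every build, but targets depending on remote.json
are only rebuilt when the ETag changes.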

Here are some reasons to deliver a botched “redo-always” implementation:

1. A naive way of representing dependencies cannot represent a target
that is always out of date. An example: treating “out of date” as a
binary property of a target is wrong – a target is out of date (or not)
relative to some other thing; in the minimal case, a target is out of
date relative to its build rule. Incidentally, a system that represents
dependencies that way MUST have the concept of a “run”, as it cannot
cleanly deal with many states that occur during a build.

2. Several obvious but wrong “optimizations” of dependency checking make
it so that a target can be built at most once during a run. For example,
a build system can aggressively and erroneously cache dependency checks,
so an always-out-of-date target will be found out of date, but after it
is built the updated cache entry reads “this is up to date”. This will
make builds faster and sometimes incorrect, even without redo-always.

3. An implementor who finds it hard to make a recursive top-down build
system fast can implement parts of it in a bottom-up toposort way. This
makes builds faster, easier to parallelize – and sometimes incorrect,
even without redo-always.

Based on these three alone, you can sometimes work out the motivations
an author had. For example, the “potentially rebuilding everything again
and again” problem of goredo could be a property of an implementation
that is very eager to calculate what is out of date, but is unable to
take it back later. As I understand it, redo-sh does not have such an
issue. (Please correct me if I am wrong here – with example dofiles.)

Greetings,
Nils

Prakhar Goel

Aug 3, 2022, 9:23:04 AM
to Nils Dagsson Moskopp, redo-list
Nils,

We've had this argument before. Redo-always should _always_ be cached per run. Not doing so has catastrophic effects on worst-case efficiency for the central use case of redo: program builds, which are in fact pure functions of the source code. The entire idea of caching build artifacts makes no sense without this assumption.

Your use cases with impure functions are well beyond the scope of a build system like redo. They are an abuse of the abstraction.

An aside: reasoning from test cases is ridiculous. Tests are designed to hit unusual edge cases and sometimes they have to resort to hacky workarounds and special APIs to do it.

-- PG


Nils Dagsson Moskopp

Aug 3, 2022, 10:20:17 AM
to Prakhar Goel, redo-list
Prakhar Goel <newt...@gmail.com> writes:

> We've had this argument before. Redo-always should _always_ be cached
> per run. Not doing so has catastrophic effects on worst-case efficiency
> for the central use case of redo: program builds, which are in fact
> pure functions of the source code. The entire idea of caching build
> artifacts makes no sense without this assumption.

I know that some people prefer “better fast than correct”. What I am
asserting is that this is a false dilemma: a redo can be both fast and
correct, unless implementation choices mean it cannot be both.

For example, the concept of a “run” seems absolutely necessary to me to
be able to implement parts of redo in a non-recursive way. It is a hack
and leads to difficulties, as I have shown in the previous email.

> Your use cases with impure functions are well beyond the scope of a build
> system like redo. They are an abuse of the abstraction.

That's just, like, your opinion. Nothing is 100% pure in the real world.

Anyways, even if you leave out these cases, there remains the issue that
a naive implementation that lazily and inefficiently checks dependencies
can totally handle redo-always targets. It can do recursive composition,
while a clever broken implementation can never handle that correctly.

Simply put, there exists a bunch of optimizations related to dependency
checks that *almost* work, but do not preserve the semantics of a naive
recursive implementation – and, more importantly, rely on assumptions
that are often wrong, even if that usually does not matter. Skipping all
the dependency checks that seem obviously superfluous at first glance
(often related to the “run” abstraction) does not always result in what
users would expect. What is special about redo-always is that it exposes
sloppiness in an implementation in a very obvious way.

“Building targets only once when a user asked for them to be built every
time” is not a design goal. It is a side effect of an implementation not
doing dependency checks, or pretending to be a top-down build system
while it actually is three bottom-up build systems in a trenchcoat. The
appeal of skipping the checks is that in almost all cases users get the
same result, but faster. The downside is … in some cases this leads to
not rebuilding targets that are, in fact, out of date – even when users
explicitly asked for exactly that.

This means implementations that do these optimizations are necessarily
less capable. I suspect that redo-sh is able to handle every case that
apenwarr-redo can handle (just slower) … but the reverse is not true,
since some (not all) cases involving redo-always and/or redo-stamp
simply do not work on a conceptual level if you only build once.

> An aside: reasoning from test cases is ridiculous. Tests are designed to
> hit unusual edge cases and sometimes they have to resort to hacky
> workarounds and special APIs to do it.

I am not reasoning from a test case. The test case is an example that
shows how complicated it was for an implementor to sidestep the issue.
Anyone who is not a redo implementor does not have that luxury, though.

You might not have understood the test case; it hits the main use case,
in an entirely intended way. Or maybe I did not understand it – but then
please explain what the “unusual edge case” is here. As I see it, it
just tests that redo-always works, then runs into the problem that it
works differently depending on how the build was composed.

Prakhar Goel

Aug 3, 2022, 9:07:31 PM
to Nils Dagsson Moskopp, redo-list
Sigh... For reference, the last time we talked about this was:
https://groups.google.com/g/redo-list/c/ZzbyNWbPLjg/m/extXmPYECwAJ.

Consider a dep graph X -> Y, X -> Z, Y -> T, Z -> T, i.e. there's a
diamond in the graph. With a caching implementation, this takes linear
time to process. With a non-caching implementation, T gets checked
twice. Now what happens if there are a number of diamonds in a row? A
caching implementation still processes the entire graph in linear time.
A non-caching implementation, however, ends up checking every _path_,
which is in general _exponential_ in the number of nodes.
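
In dofiles, one such diamond is simply (hypothetical targets):

    # x.do
    redo-ifchange y z
    # y.do
    redo-ifchange t
    # z.do
    redo-ifchange t

Chain n of these and there are 2^n paths from the root to the bottom; a
checker that does not memoize "t is up to date" walks every one of them.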

So, does this happen in real life? _All the time!_ Imagine an object
target that includes some other files that in turn use some of the
same header files. Or a website where every page has a footer that
depends on some of the same information. Or a target that depends on a
few scripts from the same tarball. Diamond dependencies are
_extremely_ common!

So, Nils, what do you call a system that takes exponential time for
common inputs when a linear-time algorithm is available? I call it
broken. But that's just, like, my opinion, man. Tell me, when you get
on a flight, do you prefer that it take the long way around the globe?

A naive recursive implementation of redo _is broken_. And your
distinction between "bottom-up" and "top-down" build systems is
completely fictitious. That redo happens to be implemented with
multiple processes (that all communicate with env vars anyway) is an
irrelevant implementation detail. A correct implementation caches the
dep checks and is in fact more capable, because it can process common
non-trivial dependency graphs that a naive implementation would choke
on.

I do agree that the concept of a "run" is important and it's
unfortunate that the implementations so far haven't been explicit and
precise about it. It's not a hack. It's an essential component to
determine the boundaries of the system. The obvious definition is that
every invocation of "redo" starts a new run (and as a consequence,
calling redo in the middle of a build is a fatal error). This would
allow a wrapper script to implement your impure use cases without the
nasty exponential run-time concerns. It is of course perfectly
reasonable to use redo as a pure sub-component of an otherwise impure
system.

Anyway, I have better things to do than teach internet randos basic
complexity analysis and graph algorithms. This will be my last message
on this topic.
--
________________________
Warm Regards
Prakhar Goel

Nils Dagsson Moskopp

Aug 7, 2022, 3:10:41 PM
to Prakhar Goel, redo-list
Prakhar Goel <newt...@gmail.com> writes:

> Sigh... For reference, the last time we talked about this was:
> https://groups.google.com/g/redo-list/c/ZzbyNWbPLjg/m/extXmPYECwAJ.

Thanks, that was a good reference.

> Consider a dep graph X -> Y, X -> Z, Y -> T, Z -> T, i.e. there's a
> diamond in the graph. With a caching implementation, this takes linear
> time to process. With a non-caching implementation, T gets checked
> twice. Now what happens if there are a number of diamonds in a row? A
> caching implementation still processes the entire graph in linear time.
> A non-caching implementation, however, ends up checking every _path_,
> which is in general _exponential_ in the number of nodes.
>
> […]

The merits of a non-caching implementation beyond build correctness are
irrelevant.

The problem is not “using a cache”. The problem is “using a cache in a
way that the semantics do not match a non-caching implementation” – it
leads to erroneous results. In any kind of application, one might add.

For example, to make “redo-always” work with targets that must be built
multiple times, one can simply choose not to cache a target as up to
date if it is still out of date after being built.
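
In pseudocode (hypothetical helper names, not any actual
implementation):

    # sketch of dependency-check memoization that preserves
    # redo-always semantics
    check_and_build() {
        target=$1
        cached_up_to_date "$target" && return
        out_of_date "$target" && run_dofile "$target"
        # memoize only targets that are not still out of date
        # (e.g. marked with redo-always) after being built
        marked_always "$target" || cache_as_up_to_date "$target"
    }

Ordinary targets still hit the memoized result in diamond cases, so the
exponential blow-up does not come back.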

Avery Pennarun's “flush the cache in the test cases” idea seems to be
necessary only because apenwarr-redo unconditionally marks a target as
up to date once it has been built. Could the cache flushing be removed
if it did not?

What I assume happens in implementations that cache and cannot rebuild
a target multiple times is that they either mark targets as up to date
unconditionally in their cache, or use wrong cache keys for
dependency-check caching.

I suspect the “run” abstraction might be a “wrong cache key” situation –
using “has this target been built in this run”, i.e. target filename and
run id as the cache key, matches a naive implementation only if every
target is built at most once. Adding the parent target (the target from
which redo-ifchange was invoked) to the cache key would still avoid
useless checks in the diamond dependency cases, but would match naive
recursive implementations more closely (though not entirely, or so I
think).
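
To make the cache keys concrete (hypothetical, not any real
implementation's format):

    # per run: at most one check and one build per target and run
    key="$run_id/$target"
    # per requester: checks below a single parent still collapse, but
    # a different parent re-checks (and possibly rebuilds) the target
    key="$run_id/$parent/$target"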

Greetings,
Nils

Tomas Ebenlendr

Aug 8, 2022, 4:57:13 AM
to Nils Dagsson Moskopp, Prakhar Goel, redo-list
Maybe this is only a documentation issue. It seems to me that the
semantics here are that most build systems suppose that every file
needs to be rebuilt at most once per run, i.e., you build all
dependencies first, then the file in question. This is achievable in
most situations. When users know this restriction, they can write their
build scripts around it. A file that needs to be generated on _every_
invocation of a script may just be generated by a direct call to that
script; no "build system rule" is needed. Things get more complicated
when that file has dependencies. But then you are taming a very complex
beast and you need to know how things behave at a low level anyway.


On Sun, 07 Aug 2022 21:10:28 +0200
Nils Dagsson Moskopp <ni...@dieweltistgarnichtso.net> wrote:

> Prakhar Goel <newt...@gmail.com> writes:
>
> > Sigh... For reference, the last time we talked about this was:
> > https://groups.google.com/g/redo-list/c/ZzbyNWbPLjg/m/extXmPYECwAJ.
>
> Thanks, that was a good reference.
>
> > Consider a dep graph X -> Y, X -> Z, Y -> T, Z -> T, i.e. there's a
> > diamond in the graph. With a caching implementation, this takes linear
> > time to process. With a non-caching implementation, T gets checked
> > twice. Now what happens if there are a number of diamonds in a row? A
> > caching implementation still processes the entire graph in linear time.
> > A non-caching implementation, however, ends up checking every _path_,
> > which is in general _exponential_ in the number of nodes.
> >
> > […]
>
> The merits of a non-caching implementation beyond build correctness are
> irrelevant.
>
> The problem is not “using a cache”. The problem is “using a cache in a
> way that the semantics do not match a non-caching implementation” – it
> leads to erroneous results. In any kind of application, one might add.

That is like saying that optimizing a dictionary (hash table,
associative array) so that iterating over all elements returns them in
a different order on each run of the program is a breaking change,
because a trivial implementation returns them deterministically. What
you describe here may just be implementation behaviour that is not
defined in the specification.

>
> For example, to make “redo-always” work with targets that must be built
> multiple times, one can simply choose not to cache a target as up to
> date if it is still out of date after being built.
>
> Avery Pennarun's “flush the cache in the test cases” idea seems to be
> necessary only because apenwarr-redo unconditionally marks a target as
> up to date once it has been built. Could the cache flushing be removed
> if it did not?

This seems to me like a workaround to simulate "multiple invocations";
maybe the test should be implemented by multiple invocations of redo
(i.e. launching a separate instance of redo that knows nothing about
the redo that is controlling the test suite, including separate
metadata storage). That would also be better from a "tests as
documentation" perspective.

>
> What I assume happens in implementations that cache and cannot rebuild
> a target multiple times is that they either mark targets as up to date
> unconditionally in their cache, or use wrong cache keys for
> dependency-check caching.
>
> I suspect the “run” abstraction might be a “wrong cache key” situation –
> using “has this target been built in this run”, i.e. target filename and
> run id as the cache key, matches a naive implementation only if every
> target is built at most once. Adding the parent target (the target from
> which redo-ifchange was invoked) to the cache key would still avoid
> useless checks in the diamond dependency cases, but would match naive
> recursive implementations more closely (though not entirely, or so I
> think).
>
> Greetings,
> Nils
>


--
Tomáš 'ebík' Ebenlendr
Economia a.s.
PF 2022.60114580162



Nils Dagsson Moskopp

Aug 8, 2022, 7:20:31 AM
to Tomas Ebenlendr, Prakhar Goel, redo-list
Tomas Ebenlendr <tomas.e...@economia.cz> writes:

> Maybe this is only a documentation issue. It seems to me that the
> semantics here are that most build systems suppose that every file
> needs to be rebuilt at most once per run, i.e., you build all
> dependencies first, then the file in question.

IIRC the semantics for “redo-ifchange” are “ensure the target exists
(i.e. build it if necessary), then rebuild the current target if the
target was out of date before the invocation of this command”. I think
user expectations for a target that is always out of date are obvious.

They are, in fact, consistent with both approaches at first glance
(build everything at most once, or build everything as often as
redo-ifchange sees it as out of date, i.e. multiple times if
necessary) … but the important differences become obvious as soon as
one assumes that “always” means … well, always – and relies on that.

This is, in fact, one of the most common irritations I have seen about
differences between redo implementations, because it cannot easily be
worked around:

A redo implementation that can build targets multiple times can build a
“redo-always” target exactly once – e.g. by using redo-stamp to ensure
it is not rebuilt twice in a row (but then why are you using redo-always
anyway?). An implementation that cannot build a “redo-always” target
multiple times leaves the user no workaround to ensure it is built
every time, AFAIK.

> This is achievable in most situations. When users know this
> restriction, they can write their build scripts around it. A file that
> needs to be generated on _every_ invocation of a script may just be
> generated by a direct call to that script; no "build system rule" is
> needed. Things get more complicated when that file has dependencies.
> But then you are taming a very complex beast and you need to know how
> things behave at a low level anyway.

A huge problem here is that a lot of use cases that involve conditional
rebuilds of the form “rebuild a target unless a condition is fulfilled”
are not possible that way without making the build quite inefficient.

The typical pattern for that is to use redo-always to mark a target as
always out of date, then use redo-stamp to conditionally mark it as up
to date. The dofile is then executed on every build, and only the
redo-stamp output determines whether the target actually was or was not
up to date at build time.
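
Concretely, such a conditional rebuild might look like this (a sketch
with made-up names; redo-stamp reads its stamp data from standard
input):

    # version.h.do – hypothetical: runs on every build, but dependents
    # only rebuild when the version string actually changes
    redo-always
    printf '#define VERSION "%s"\n' "$(git describe --always)" > "$3"
    redo-stamp < "$3"

The dofile is executed every time; redo-stamp cuts off downstream
rebuilds when its input is unchanged.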

Anyways, the simplest argument for making all redo implementations be
able to rebuild a target marked with “redo-always”, well, always, is
that a redo implementation that can rebuild a target multiple times
always creates build artifacts that match user expectations. A redo
implementation that can build a target at most once sometimes does
not match user expectations.

Again, this is not about whether implementations should be optimized –
it is about which optimizations should be employed. And given that the
remedy for the redo-always situation (i.e. do not mark the target as up
to date in the dependency cache if it is still out of date after it has
been built) is cheap and scales well, I am curious why one would not do
it. One reason I can imagine is that reworking a fundamentally unsound
caching approach might mean that it is not easy to do so – but in that
case, some other things (likely involving redo-stamp) might be subtly
(or not so subtly) broken as well.

Greetings,
Nils