No redo with group dependencies

tob...@frilling-online.de

Jan 2, 2017, 10:22:25 AM

to tup-users

Hello,

I have some commands in my build system that do not take the input files from the command line but have hardcoded input paths. My Tupfile looks basically like this:

: foreach src/*.in |> genx %f -o %o |> tmp/%B.out tmp/<out>

: tmp/<out> |> geny -o %o |> build/mainy
: tmp/<out> |> genz -o %o |> build/mainz

So some command generates files in tmp/, then geny and genz basically grab tmp/*.out to build two main files. So far so good.

But when I add a new *.in file in src/, tup executes only the first rule and not the last two. As a workaround I've appended ;: %<out> to the last two commands, but that feels like a dirty hack that I don't want in my build system; otherwise I would still use make :-)

Is there a more idiomatic way to have the group as a real dependency?

Mike Shal

Jan 11, 2017, 11:04:16 AM

to tup-...@googlegroups.com

Hi Tobias,

I think the underlying problem here is less about the groups and more
about the fact that tup doesn't track dependencies on directories. I'm
assuming that your geny / genz commands do something like:

cat tmp/*.out > build/mainy

In other words, the geny command is the one responsible for doing the
wildcard expansion on tmp, which means it's basically doing an
opendir()/readdir() on the tmp directory. Tup doesn't track the
opendir() like it tracks open(), though it definitely should if we
could find a way to do so. I think the main problem I have is trying
to figure out what to do in a case where you opendir() on a directory
and also write a file to that same directory - in other words, the
command both reads from and writes to the same directory, causing a
circular dependency. (Though it looks like your case doesn't have
this, due to the separate tmp/ and build/ directories, so maybe we
should try adding it?)

As a simpler example where tup fails here, consider this Tupfile:

: *.c |> gcc %f -o %o |> prog
: |> gcc *.c -o %o |> prog2

Both prog and prog2 are exactly the same, but in one case the wildcard
is tracked by tup, and in the other case the *.c is expanded by the
shell (meaning the second command should really have a dependency on
the current directory, as well as all the .c files). The first command
works properly when a new .c file is created, but the second one
incorrectly does not re-execute because we aren't tracking the
directory accesses. And this is an example of where it'd be a circular
dependency, since creating prog2 in the current directory changes the
contents of the directory, so then we'd have to re-execute the
command. So tup tries to work around not supporting opendir() by
making you do the wildcards in the Tupfile. In the prog/prog2 example,
you would favor the first one over the second. In your case, maybe
instead of geny doing:

(find -o flag)
cat tmp/*.out > $output

You would do:

(find -o flag and inputs)
cat $inputs > $output

and then using %<out> in your rule is a little more natural:

: tmp/<out> |> geny %<out> -o %o |> build/mainy
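
A minimal sketch of what such a geny could look like, assuming it does
nothing more than concatenate its inputs (the real geny is not shown in
this thread, so the function body and the usage message are
hypothetical). It accepts -o anywhere on the command line, so it matches
the "geny %<out> -o %o" ordering used in the rule above:

```shell
#!/bin/sh
# Hypothetical stand-in for geny: reads the input files named on the
# command line instead of expanding tmp/*.out itself, so the wildcard
# stays in the Tupfile where tup can see it.
geny() {
    output=
    n=$#
    # Scan every argument once. Non-option arguments are rotated to the
    # end of "$@", so after the loop "$@" holds only the input files.
    while [ "$n" -gt 0 ]; do
        case "$1" in
            -o)
                if [ "$n" -lt 2 ]; then
                    echo "geny: -o needs an argument" >&2
                    return 2
                fi
                output=$2
                shift 2
                n=$((n - 2))
                ;;
            *)
                set -- "$@" "$1"
                shift
                n=$((n - 1))
                ;;
        esac
    done
    if [ -z "$output" ] || [ "$#" -eq 0 ]; then
        echo "usage: geny INPUT... -o OUTPUT" >&2
        return 2
    fi
    cat "$@" > "$output"
}

# When saved as a standalone script, finish with:
#   geny "$@"
```

With this shape, every file geny opens was named by tup through %<out>,
so no directory listing is involved at all.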

The way groups factor into your case is that group dependencies are
"sticky" dependencies, meaning they only affect the ordering of
commands that need to be executed. So if genx and geny both needed to
be rebuilt, then it ensures geny is built after genx. However,
"normal" dependencies are the ones that tup picks up automatically via
its dependency checking, so the files actually read by genx & geny are
the ones used to determine if they need to be rebuilt in the first
place. The sticky vs. normal link distinction is elaborated a bit in
the generated header example:
http://gittup.org/tup/ex_generated_header.html

When you use a %<group> inside the command string, it essentially
upgrades the sticky link to a normal link, which "fixes" your build
because it works around the fact that we ignore opendir() :/. If
anyone has ideas on how to incorporate directory level dependencies
without throwing in circular dependencies in places, please post!

Thanks,
-Mike

Peter Jaspers

Dec 8, 2023, 4:48:53 AM

to tup-users

Disclaimer: I am not a tup user, but I am nevertheless very interested in tup because I am part of a small team implementing a build system that is heavily inspired by Mike’s paper "Build System Rules and Algorithms" and by tup itself.

I do not understand why cyclic directory dependencies are different from cyclic file dependencies. How does tup behave when an output file is both read and written by a command? Does it ignore the read dependency? Or does it raise an error? Or does it always re-execute the command at next build?

For me the main concern with detecting directory dependencies is that this may result in unnecessary re-execution of commands.

E.g. the C/C++ compiler will look up an include file in the directories specified by the -I flags or listed in the INCLUDE path environment variable. It can do so by using stat, but it can also do so by (ab)using readdir.

In the stat case a file dependency is detected (also in case the file does not exist). This file dependency only causes recompilation when the file is created/modified/deleted. Exactly what we want. In the readdir case a directory dependency is detected. Any change to the directory will cause, mostly unnecessary, recompilation.

What we really need is a glob dependency. A glob obviously depends on a directory. Changes in the directory will cause the glob to re-execute. Re-execution of a command that depends on the glob however is only necessary when the glob result changes.

But how can we detect which glob a program is executing? I am afraid that this is not possible. That leaves the solution provided by tup: perform globs in the input section of a rule.
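
To make the glob-dependency idea concrete, here is a rough sketch in shell. It is not how tup works; the function names glob_fingerprint and glob_changed and the stamp file are hypothetical. It fingerprints the list of names a pattern matches in a directory and reports a change only when that list differs from the recorded one, so unrelated changes in the directory do not trigger re-execution. It assumes a single pattern without whitespace and compares names only, not file contents.

```shell
#!/bin/sh
# glob_fingerprint DIR PATTERN
# Prints a checksum of the names in DIR that match PATTERN.
glob_fingerprint() {
    (
        cd "$1" || exit 1
        # $2 is deliberately unquoted so the shell expands the pattern.
        for f in $2; do
            # An unmatched pattern stays literal; skip it.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    ) | cksum
}

# glob_changed DIR PATTERN STAMP
# Returns 0 and updates STAMP when the glob result differs from the one
# recorded in STAMP; returns 1 when the glob result is unchanged.
glob_changed() {
    new=$(glob_fingerprint "$1" "$2")
    old=
    if [ -f "$3" ]; then
        old=$(cat "$3")
    fi
    if [ "$new" = "$old" ]; then
        return 1
    fi
    printf '%s\n' "$new" > "$3"
    return 0
}
```

A build system could call glob_changed tmp '*.out' .glob-stamp whenever tmp/ changes and re-run the dependent command only on a zero exit status. The hard part named above remains: the pattern has to be declared to the build system, because it cannot be observed from the running program.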
