Execute :-rules in bash with `set -o pipefail`


Erik

Dec 8, 2015, 8:57:10 AM
to tup-users
I generally want my tup :-rules to be run in bash with `set -o pipefail` enabled. I could add this to every single rule but that's verbose and error prone. Someone on stackoverflow recommended I try doing this with shared library interposition, which seemed like it might work. My hypothesis at the time was that the commands were executed using glibc's `system` call. That seems to be false, but spending some time perusing the codebase, I couldn't actually find where the commands were executed. It doesn't help that I'm not very familiar with fuse or pthread. This leaves me with a few questions:
  • Is shared library interposition a viable route to accomplish this? If the commands are executed almost directly by a call to some shared library, this seems like it could work, wouldn't be too hard, and wouldn't require me (or anyone else) to patch and recompile the tup source.
  • If it is viable can someone point me to the appropriate place to look for where to do the intercept? In addition to reading relevant pieces of the code (or so I thought), I also tried tracing command execution with ltrace, but it seemed to interfere with fuse, and I couldn't manage to get fuse running with ltrace active.
  • If it's not viable, is there a better method to accomplish this that is relatively trivial (on the order of writing a simple interposer)?
  • If none of these, is there a potentially better way to improve tup to allow this? I'm open to contributing a pull request if there's a relevant piece to write. It seems like maybe with the lua api you could have a function that arbitrarily modifies every command before it's executed, but then you'd have to call back into the lua api from main tup, which seems like a pain and potentially slow. There could be a specific flag to do specifically this (e.g. wrap every command in "bash -c '<command with escaped apostrophes>'"), but that seems a little hard coded and less portable. I'm open to other suggestions.
Thanks,
Erik

Mike Shal

Dec 8, 2015, 11:37:02 AM
to tup-...@googlegroups.com
That sounds like it may be a good thing for tup to support. What kinds of commands are you running that you'd want this set universally?

The code is a little wonky to follow because you can't fork a process from a process running fuse without race conditions. The sub-processes are executed in master_fork.c using execle() - https://github.com/gittup/tup/blob/master/src/tup/server/master_fork.c#L654

Note that tup currently just runs '/bin/sh -e -c [cmd]', and as far as I can tell the stock Bourne shell doesn't support pipefail. So I guess we'd need some way to set a flag or something in the rules, and then run bash if using pipefail?

Also, I'm not sure of the best way to communicate to tup that a certain rule should use pipefail. Currently all of the information about a rule is in the :-rule line itself, so we'd probably want to use a ^-flag or something:

: |> ^p^ cmd1 | cmd2 |>

So it would run 'cmd1 | cmd2' with pipefail enabled. However, since you'd have to add the ^p flag for each rule that uses it, I guess all that really buys you is fewer keystrokes vs specifying bash directly.

Would others find this useful?

-Mike

Erik

Dec 11, 2015, 2:31:10 PM
to tup-users
I probably use Tup a little differently than most people. Instead of using it to build software (which I do sometimes), I use it for repeated data analysis and graph generation. One of my tup rules looks something like

: input.json |> < %f simulator | jq -f extract_something.jq > %o |> output.json

where the simulator produces a lot of output, but in this case I only care about one aspect. Without pipefail, the simulator might fail, jq will get empty input, which it's fine with, and the result is an empty output file. This then causes the next rule that uses output.json to fail, because it's empty, but the problem was the previous rule.

In general, I find it hard to believe that (especially with tup) there's ever a time when you wouldn't want pipefail to be active, but maybe I'm unique in that feeling. One interesting thing that I didn't realize is that sh is run with the -e flag, which means any failed command is a failure. E.g. currently in tup the rule

: |> false; echo foo |>

will fail, but the rule

: |> false | echo foo |>

will succeed.
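To make the distinction concrete, here's a small demonstration of the two behaviors (a sketch; -e and pipefail semantics are standard, regardless of which shell /bin/sh points to):

```shell
# Under sh -e, a plain failed command aborts the script...
sh -e -c 'false; echo foo' >/dev/null && seq_status=ok || seq_status=fail

# ...but a failure on the left side of a pipe does not: only the
# last command's exit status counts.
sh -e -c 'false | echo foo' >/dev/null && pipe_sh=ok || pipe_sh=fail

# With bash and pipefail, the pipeline reports the failure.
bash -e -o pipefail -c 'false | echo foo' >/dev/null && pipe_bash=ok || pipe_bash=fail

echo "$seq_status $pipe_sh $pipe_bash"
```

This prints "fail ok fail": only the pipefail run turns the failing pipeline into a failed rule.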

Thanks for pointing me to the appropriate place in tup for where commands were executed. With it I was able to write an interposer that does what I want. Essentially it looks for execle calls that look like '/bin/sh -e -c cmd' and turns them into execve calls that look like '/bin/bash -e -o pipefail -c cmd'.

I like your suggestion about the ^p^ flag, although it seems a little strange that a flag would also change execution to bash. However, it feels like a reasonable way to accomplish this without requiring pipefail to be active for all :-rules. After looking at the code though, another option that seems easy would be to have a tup config option that specifies the command that executes all :-rules. This may be somewhat frowned upon due to "changing any of these options should not affect the end result of a successful build" (*), but having the default:

updater.command = /bin/sh -e -c

be there, but have the ability to change it to

updater.command = /bin/bash -e -o pipefail -c

would be very convenient, and would provide a lot of flexibility with how tup is run. E.g. a user could just specify bash to get access to the more advanced bash syntax. The downside is that it has the potential to violate the rules (*) of a tup config as stated above. A potentially absurd example would be setting

updater.command = /usr/bin/env python -c

which would execute all :-rules in python.

I'm curious about everyone's thoughts about this. I have my hacky solution, which means I'm fine for the time being, but I'd be up for submitting a pull request of a more reasonable implementation of one of these ideas if it seemed worthwhile.

Erik

Mike Shal

Dec 16, 2015, 2:21:01 PM
to tup-...@googlegroups.com
On Fri, Dec 11, 2015 at 2:31 PM, Erik <erik.b...@gmail.com> wrote:
I probably use Tup a little differently than most people. Instead of using it to build software (which I do sometimes), I use it for repeated data analysis and graph generation. One of my tup rules looks something like

: input.json |> < %f simulator | jq -f extract_something.jq > %o |> output.json

where the simulator produces a lot of output, but in this case I only care about one aspect. Without pipefail, the simulator might fail, jq will get empty input, which it's fine with, and the result is an empty output file. This then causes the next rule that uses output.json to fail, because it's empty, but the problem was the previous rule.

Another option here is to move the command bits into a bash script, and execute that in the rule:

foo.bash:
#! /bin/bash -e -o pipefail (does this work?)
$1 simulator | jq -f ...

Tupfile:
: input.json |> ./foo.bash %f > %o |> output.json

That's not ideal if the majority of your commands are pipelines though, of course.
 

In general, I find it hard to believe that (especially with tup) there's ever a time when you wouldn't want pipefail to be active, but maybe I'm unique in that feeling.

I'm inclined to agree, and if /bin/sh supported pipefail I'd definitely consider turning it on. Maybe it does and I haven't found where that is in the manual?

I'm hesitant to change the shell wholesale to bash though - even though I use it every day, I'm not sure if there are places where it isn't installed by default.
 


I like your suggestion about the ^p^ flag, although it seems a little strange that a flag would also change execution to bash. However, it feels like a reasonable way to accomplish this without requiring pipefail to be active for all :-rules. After looking at the code though, another option that seems easy would be to have a tup config option that specifies the command that executes all :-rules. This may be somewhat frowned upon due to "changing any of these options should not affect the end result of a successful build" (*), but having the default:

updater.command = /bin/sh -e -c

be there, but have the ability to change it to

updater.command = /bin/bash -e -o pipefail -c

would be very convenient, and would provide a lot of flexibility with how tup is run. E.g. a user could just specify bash to get access to the more advanced bash syntax. The downside is that it has the potential to violate the rules (*) of a tup config as stated above. A potentially absurd example would be setting

updater.command = /usr/bin/env python -c

I think if we did something like this, we would need to specify it in the :-rule somehow rather than in the options. For example, if one of the rules requires bash (or another shell) for something, you wouldn't want to require every user of the software to overwrite their options file with the correct updater.command setting.

Potentially we could add this more easily in the lua parser, with something like:

tup.definerule{command='false | echo foo', shell='/bin/bash -e -o pipefail -c'}

I'm not sure what it would look like in the regular Tupfile parser without something like the ^-flag suggestion, which is less flexible.
 

I'm curious about everyone's thoughts about this. I have my hacky solution, which means I'm fine for the time being, but I'd be up for submitting a pull request of a more reasonable implementation of one of these ideas if it seemed worthwhile.


I'm curious if others have thoughts here too :)

Thanks,
-Mike

Ben Boeckel

Dec 16, 2015, 3:25:04 PM
to tup-...@googlegroups.com
On Wed, Dec 16, 2015 at 14:20:59 -0500, Mike Shal wrote:
> foo.bash:
> #! /bin/bash -e -o pipefail (does this work?)

No. Shebang lines only support a single argument. This is the same as:

/bin/bash '-e -o pipefail' foo.bash

Instead, use 'set -o pipefail' inside the script, and maybe even 'set -e',
since the script should be '#!/usr/bin/env bash' so that systems without
bash in /bin (mainly BSDs) will work.
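A minimal sketch of the portable foo.bash, with the options set inside the script as Ben describes (the /tmp path is just for demonstration):

```shell
# Shebang lines pass at most one argument, so the options go inside
# the script instead of on the shebang line.
cat > /tmp/foo.bash <<'EOF'
#!/usr/bin/env bash
set -e -o pipefail
false | echo foo
EOF
chmod +x /tmp/foo.bash

# With pipefail set inside the script, the failing left-hand side of
# the pipe makes the whole script fail.
/tmp/foo.bash >/dev/null && status=ok || status=fail
echo "$status"
```

This prints "fail", where the original shebang-argument version would silently succeed.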

--Ben

Erik

Dec 22, 2015, 6:07:39 PM
to tup-users


On Wednesday, December 16, 2015 at 2:21:01 PM UTC-5, mar...@gmail.com wrote:
On Fri, Dec 11, 2015 at 2:31 PM, Erik <erik.b...@gmail.com> wrote:
I probably use Tup a little differently than most people. Instead of using it to build software (which I do sometimes), I use it for repeated data analysis and graph generation. One of my tup rules looks something like

: input.json |> < %f simulator | jq -f extract_something.jq > %o |> output.json

where the simulator produces a lot of output, but in this case I only care about one aspect. Without pipefail, the simulator might fail, jq will get empty input, which it's fine with, and the result is an empty output file. This then causes the next rule that uses output.json to fail, because it's empty, but the problem was the previous rule.

Another option here is to move the command bits into a bash script, and execute that in the rule:

foo.bash:
#! /bin/bash -e -o pipefail (does this work?)
$1 simulator | jq -f ...

Tupfile:
: input.json |> ./foo.bash %f > %o |> output.json

That's not ideal if the majority of your commands are pipelines though, of course.

As Ben said, you can't do this per se, but you could have a bash script and make the first line 'set -e -o pipefail'. This does work, but it's almost as annoying as the other solutions.
 
 

In general, I find it hard to believe that (especially with tup) there's ever a time when you wouldn't want pipefail to be active, but maybe I'm unique in that feeling.

I'm inclined to agree, and if /bin/sh supported pipefail I'd definitely consider turning it on. Maybe it does and I haven't found where that is in the manual?

I'm hesitant to change the shell wholesale to bash though - even though I use it every day, I'm not sure if there are places where it isn't installed by default.

As far as I can tell, sh has no pipefail option, and my online search for workarounds has yielded nothing satisfactory. I think I generally agree that it would be potentially problematic if tup switched to bash. I don't know of any systems where it's not installed, but it seems likely that the small gain isn't worth breaking tup for users who don't have bash installed.
 
 


I like your suggestion about the ^p^ flag, although it seems a little strange that a flag would also change execution to bash. However, it feels like a reasonable way to accomplish this without requiring pipefail to be active for all :-rules. After looking at the code though, another option that seems easy would be to have a tup config option that specifies the command that executes all :-rules. This may be somewhat frowned upon due to "changing any of these options should not affect the end result of a successful build" (*), but having the default:

updater.command = /bin/sh -e -c

be there, but have the ability to change it to

updater.command = /bin/bash -e -o pipefail -c

would be very convenient, and would provide a lot of flexibility with how tup is run. E.g. a user could just specify bash to get access to the more advanced bash syntax. The downside is that it has the potential to violate the rules (*) of a tup config as stated above. A potentially absurd example would be setting

updater.command = /usr/bin/env python -c

I think if we did something like this, we would need to specify it in the :-rule somehow rather than in the options. For example, if one of the rules requires bash (or another shell) for something, you wouldn't want to require every user of the software to overwrite their options file with the correct updater.command setting.

Potentially we could add this more easily in the lua parser, with something like:

tup.definerule{command='false | echo foo', shell='/bin/bash -e -o pipefail -c'}

I'm not sure what it would look like in the regular Tupfile parser without something like the ^-flag suggestion, which is less flexible.

I can get behind a ^-flag that changes the shell. Potentially just ^b for bash with pipefail. The fact that the command executes in bash rather than sh seems more relevant than pipefail, which would be a safe default if tup were executing commands in bash anyway. Pending any other comments, I'll look into adding a flag that does this (using /usr/bin/env bash). If that works, I'll look into adding a shell option to the lua parser.
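For concreteness, under this proposal the earlier simulator rule might look like the following; the ^b^ syntax is hypothetical until a patch lands:

```
: input.json |> ^b^ < %f simulator | jq -f extract_something.jq > %o |> output.json
```

which would run the pipeline under something like '/usr/bin/env bash -e -o pipefail -c <cmd>' instead of '/bin/sh -e -c <cmd>'.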

Pat Pannuto

Dec 22, 2015, 6:29:08 PM
to tup-...@googlegroups.com
As a small usability thought, perhaps it would be better to change the
semantics from caret _flags_ to caret _directives_. Specifically, I'm
thinking the syntax should be something more like ^chroot (replacing
^c) and ^bash.

The one-character %-flags in tup rules are a little more standard and
align with natural idioms for a build system, easy to recall or
intuit. That's less true with the ^-flags, as they change underlying
mechanics in a way that may be non-obvious.


Erik

Dec 22, 2015, 6:46:00 PM
to tup-users, ppan...@umich.edu
It's true that the one-character ^-flags are a little non-obvious. I think the main issue is that the opening carets are used both for the ^-flags and the optional name of the command. If tup were to simply move to ^-directives, how would you propose to parse multiple directives? Currently ^oc^ is distinct from ^o chroot^. But if carets were directives, it seems less obvious.

Two simple options are:
  1. You could use some special delimiter, so that ^output,chroot^ applies both directives while ^output chroot^ does output detection of a command named chroot, but that seems potentially more obtuse (although admittedly maybe only in this highly contrived example).
  2. You could view the optional name as another directive (like rename) that takes one argument, so maybe it would be ^output chroot^ versus ^output rename chroot^, but any kind of change where tup has directives that take different numbers of arguments will add some complexity to the ^-directive parser.
Neither of these seems great. There's probably some better solution that I'm not thinking of. However, this change seems like something to schedule for 0.8, while adding the ^b flag could be harmlessly pushed to 0.7.X.

Erik Brinkman

Apr 6, 2016, 12:21:10 AM
to tup-users, ppan...@umich.edu
In lieu of doing the full change that Pat recommended, I have a patch that I believe does what it should. However, I have a couple of questions that I'm hoping other people can offer guidance on.
  1. To make the change, I introduced another int flag to execmsg similar to need_namespacing. Is it okay that execmsg has two ints for two bits, or is it worth making a general flags int?
  2. The flag really only makes sense on *nix, but I had to modify the signature of server_exec. Presumably on Windows, windepfile.c will complain about an unused variable. Is there a nice way around this? Is there a way to test a Windows compile other than asking someone to check out my branch and compile and test?
Thanks!