rebase_path(x) returning absolute paths considered harmful?

201 views
Skip to first unread message

Shai Barack

unread,
Apr 26, 2021, 12:41:07 PM4/26/21
to gn-dev
Hi gn-dev,

How do we feel about rebase_path(x) returning an absolute path? I feel like absolute path is a "safe default" because it's easy to identify that the path string is absolute and it can always be rebased to another directory later, unlike relative paths which leave room for interpretation. But OTOH absolute paths are generally undesirable in a build system, I feel, as they leak information that's outside the checkout which impairs hermeticity and reproducibility.

Some problems we've had with this on Fuchsia:
  • Leaks irrelevant or personally identifying information such as the user's username (e.g. /home/shayba/my_checkout/... in paths).
  • Breaks caching by invocation details (e.g. goma actions will miss the cache because some args are absolutes and no two engineers/bots have the same base path).
  • Breaks interactions between multiple bots, e.g. a pipeline where one bot performs an action and a followup bot performs a followup action.
Have other users had a similar experience?

I realize that others may have different opinions about this, and also that changing the one-parameter rebase_path() behavior or removing it entirely are both breaking changes. I propose adding an optional flag to //.gn to prohibit the one-parameter form of rebase_path(). Then users can clean house, or convert desired instances of abspaths to rebase_path(x, ""),and turn that flag on for their project. Alternatively there could be an allowlist, modeled after the existing "exec_script_whitelist".

Thoughts?

Brett Wilson

unread,
Apr 26, 2021, 4:57:34 PM4/26/21
to Shai Barack, gn-dev
rebase_path is used when interfacing with external tools and scripts. These external tools have an infinitely wide variation of required inputs, so it seems like GN should be able to support this.

I'm also not a fan of adding individual permission bits for different API calls because it's fiddly and hard to understand. It also means that builds and knowledge transfer even less across repositories than they do today.

Brett

Roland McGrath

unread,
Apr 26, 2021, 5:11:54 PM4/26/21
to Brett Wilson, Shai Barack, gn-dev
It's absolutely clear that no one size fits all.  However, I think it's entirely sensible for each project to want to maintain its own invariants about what's permissible in its build.  For Fuchsia, I think we have made a clear decision, that we're happy with, that an absolute ban on absolute paths in our build processes is an invariant worthwhile for us to achieve, maintain, and enforce in our build.  Life is complicated and there are indeed downsides to such a policy, but we have experienced the downsides of its lack and made the call that it's right for us.  Other projects will make different decisions, and I don't think this kind of diversity of project policy should be discouraged.  Code and knowledge transfer between projects that use GN is valuable, but there are other things of greater value to each project that lead to contrary decisions on how they should use GN.  I think GN should facilitate projects setting and enforcing the policies that work for them, rather than having a GN policy that requires that projects have no policy in a specific area if they want to use GN.

Dirk Pranke

unread,
Apr 26, 2021, 5:40:12 PM4/26/21
to Roland McGrath, Brett Wilson, Shai Barack, gn-dev
This sort of argument can be applied to many things (e.g., code formatting), and yet some may also fairly argue that having more knobs adds complexity for not an obviously positive  value (e.g., `go fmt`, the general pattern of convention-over-configuration).

That said, I'll note a couple other things about this specific topic:

1) It is true that absolute paths can cause issues for reproducibility when the paths are embedded in other things (e.g., debug symbols). However, they may not cause issues (e.g., invoking a command line script may not care whether the path is relative or absolute), and it's debatable whether having things like the ninja files themselves be hermetic and path-independent is nearly as useful.

2) Chromium appears to use the 1-arg format of rebase_path() approximately 5-6% of the time (~100 out of 1800 calls in chromium/src). I suspect nearly all if not all of them could easily be changed to default to path relative to root_build_dir or a 2-arg format. I do think in retrospect having the default be relative to root_build_dir would've been more useful, and I would be fine with a global (potentially breaking) change to that default, or a global (potentially breaking) change to make the second arg be required to be more self-documenting. I wouldn't be all that excited about a project-level switch for this.

-- Dirk

--
To unsubscribe from this group and stop receiving emails from it, send an email to gn-dev+un...@chromium.org.

Brett Wilson

unread,
Apr 26, 2021, 5:49:59 PM4/26/21
to Dirk Pranke, Roland McGrath, Shai Barack, gn-dev
On Mon, Apr 26, 2021 at 2:40 PM 'Dirk Pranke' via gn-dev <gn-...@chromium.org> wrote:
This sort of argument can be applied to many things (e.g., code formatting), and yet some may also fairly argue that having more knobs adds complexity for not an obviously positive  value (e.g., `go fmt`, the general pattern of convention-over-configuration).

That said, I'll note a couple other things about this specific topic:

1) It is true that absolute paths can cause issues for reproducibility when the paths are embedded in other things (e.g., debug symbols). However, they may not cause issues (e.g., invoking a command line script may not care whether the path is relative or absolute), and it's debatable whether having things like the ninja files themselves be hermetic and path-independent is nearly as useful.

2) Chromium appears to use the 1-arg format of rebase_path() approximately 5-6% of the time (~100 out of 1800 calls in chromium/src). I suspect nearly all if not all of them could easily be changed to default to path relative to root_build_dir or a 2-arg format. I do think in retrospect having the default be relative to root_build_dir would've been more useful, and I would be fine with a global (potentially breaking) change to that default, or a global (potentially breaking) change to make the second arg be required to be more self-documenting. I wouldn't be all that excited about a project-level switch for this.

I would also prefer the default to be root_build_dir, actually. It's a little bit mysterious without the second argument, but it's probably 95% of all uses. The current special case is there because the root path isn't actually relative to anything, so you can't pass any 2nd parameter that would give that result. This was added long before get_path_info was added which is another way to get this.

So actually I would be supportive of changing this to default to root_build_dir. get_path_info is a better way to get the absolute path. It's also different and annoying enough that people won't use it by mistake (what I expect most get_label_info uses are).

I still don't think there should be a project setting for this.

Brett

Shai Barack

unread,
Apr 26, 2021, 5:51:02 PM4/26/21
to Brett Wilson, Dirk Pranke, Roland McGrath, gn-dev
I was intrigued by the data that Dirk presented for Chromium, so I checked the same thing for Fuchsia and discovered that about 1/3 of rebase_path() calls are 1-arg.
I manually reviewed a random sample of 30 such instances and found that most of them were in args or cflags or other such things that we seem to agree should be rebased to root_build_dir, and the rest were unnecessary (could replace rebase_path(x) with x for the same effect). Not a single usage of rebase_path(x) that I reviewed in my random sample was in accordance with what I'd say are established best practices, or at least in accordance with the project's goals.

As a maintainer of a large GN tree (~23k BUILD.gn files, total of 800 *.gn? files containing some form of rebase_path) I'd be interested in conducting a cleanup, if I could introduce a regression stop feature into GN to help me lock in the wins.

Shai Barack

unread,
Apr 26, 2021, 5:52:09 PM4/26/21
to Brett Wilson, Dirk Pranke, Roland McGrath, gn-dev
I like Brett's idea of changing the default value of new_base to root_build_dir. But I reckon we'd need to put this behavior behind a flag, at least initially, no?

Brett Wilson

unread,
Apr 26, 2021, 5:56:31 PM4/26/21
to Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
On Mon, Apr 26, 2021 at 2:52 PM Shai Barack <sha...@google.com> wrote:
I like Brett's idea of changing the default value of new_base to root_build_dir. But I reckon we'd need to put this behavior behind a flag, at least initially, no?

I think we'd remove the one-arg version completely, roll GN everywhere, and then add it back with the new behavior.

I don't think new cases are getting added at such a rate that it's worth adding a complicated regression feature. It's easy to change the existing cases and it's also easy to check for existing ones with a local GN build. Even if somebody races with you and adds a new one while you're trying to roll, that's fast enough to fix. The cases where the allowlists helped were more complicated changes where somebody might have to actually think about how to fix it.

Brett

Shai Barack

unread,
Apr 26, 2021, 5:59:49 PM4/26/21
to Brett Wilson, Dirk Pranke, Roland McGrath, gn-dev
Do we have the means to communicate a pending behavioral breaking change to all users of GN?

Dirk Pranke

unread,
Apr 26, 2021, 6:00:48 PM4/26/21
to Shai Barack, Brett Wilson, Roland McGrath, gn-dev
I think this list is the best mechanism we have.

-- Dirk

Brett Wilson

unread,
Apr 26, 2021, 6:17:11 PM4/26/21
to Dirk Pranke, Shai Barack, Roland McGrath, gn-dev
Since there's no rush, I would also leave a substantial amount of time (>6 months?) before adding back a one-argument version. Or maybe we shouldn't even do it. I guess we can re-evaluate when the time comes.

Brett

Nico Weber

unread,
Apr 26, 2021, 6:33:10 PM4/26/21
to Brett Wilson, Dirk Pranke, Shai Barack, Roland McGrath, gn-dev
If anything happens here, please post a thread with a new subject that looks like "psa: one-arg rebase_path() going away, use $whatever instead". This would be easy to miss if it's mentioned only in this thread here, and I'm not clear on what $whatever is :)

Roland McGrath

unread,
Apr 26, 2021, 6:34:42 PM4/26/21
to Nico Weber, Brett Wilson, Dirk Pranke, Shai Barack, gn-dev
$whatever == get_path_info(path, "abspath")

Brett Wilson

unread,
Apr 26, 2021, 7:04:10 PM4/26/21
to Roland McGrath, Nico Weber, Dirk Pranke, Shai Barack, gn-dev
On Mon, Apr 26, 2021 at 3:34 PM Roland McGrath <mcgr...@chromium.org> wrote:
$whatever == get_path_info(path, "abspath")

I think we can just make the error message say that if you pass one argument, so it should be self-explanatory for people not watching the lists.
 
Brett

Aaron Wood

unread,
Apr 26, 2021, 8:19:06 PM4/26/21
to Brett Wilson, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
On Mon, Apr 26, 2021 at 2:56 PM Brett Wilson <bre...@chromium.org> wrote:
On Mon, Apr 26, 2021 at 2:52 PM Shai Barack <sha...@google.com> wrote:
I like Brett's idea of changing the default value of new_base to root_build_dir. But I reckon we'd need to put this behavior behind a flag, at least initially, no?

I think we'd remove the one-arg version completely, roll GN everywhere, and then add it back with the new behavior.

Could we drop the one-arg version of `rebase_path()`, and then add `rebase_path_from_root()` (or a suitable better name), so that the one-arg usage is clearer in what it's behavior is?
 

I don't think new cases are getting added at such a rate that it's worth adding a complicated regression feature. It's easy to change the existing cases and it's also easy to check for existing ones with a local GN build. Even if somebody races with you and adds a new one while you're trying to roll, that's fast enough to fix. The cases where the allowlists helped were more complicated changes where somebody might have to actually think about how to fix it.

Brett

--

Brett Wilson

unread,
Apr 26, 2021, 8:29:08 PM4/26/21
to Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
On Mon, Apr 26, 2021 at 5:19 PM 'Aaron Wood' via gn-dev <gn-...@chromium.org> wrote:
On Mon, Apr 26, 2021 at 2:56 PM Brett Wilson <bre...@chromium.org> wrote:
On Mon, Apr 26, 2021 at 2:52 PM Shai Barack <sha...@google.com> wrote:
I like Brett's idea of changing the default value of new_base to root_build_dir. But I reckon we'd need to put this behavior behind a flag, at least initially, no?

I think we'd remove the one-arg version completely, roll GN everywhere, and then add it back with the new behavior.

Could we drop the one-arg version of `rebase_path()`, and then add `rebase_path_from_root()` (or a suitable better name), so that the one-arg usage is clearer in what it's behavior is?

 Yes, I've considered that in the past, too. It might be better.

Brett

Wez

unread,
Apr 27, 2021, 7:51:41 AM4/27/21
to Brett Wilson, Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
From the peanut gallery: +1 to avoiding providing a single-arg rebase_path(); during some recent GN hacking that definitely kept making me double-take; more explicit naming definitely seems preferable, IMO.

Peter Wen

unread,
Apr 27, 2021, 8:32:45 AM4/27/21
to Wez, Brett Wilson, Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
Yup agreed. I learned about single-arg rebase_path() from this thread. Previously I always thought it was a two-arg function.

Peter

Brett Wilson

unread,
Apr 27, 2021, 10:41:36 AM4/27/21
to Peter Wen, Wez, Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
I'm sold on a new function, too. My favorite name so far is get_build_path() since this is the function you use to get arguments to build tools.

Brett

Nico Weber

unread,
Apr 27, 2021, 11:00:36 AM4/27/21
to Brett Wilson, Peter Wen, Wez, Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
Would the function just do the same thing as ` get_path_info(path, "abspath")`?

I'd caution against a function that makes it too easy to create absolute paths. If you pass absolute paths to a tool, it makes it harder to create something like distcc for that tool. In Chromium, we keep generated ninja files deterministic build-dir-independent. Not all projects will want to do that, but gently encouraging that in the cases where it's easily possible (often) seem like a good thing.

Brett Wilson

unread,
Apr 27, 2021, 11:05:55 AM4/27/21
to Nico Weber, Peter Wen, Wez, Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
The new function is the same as rebase_path(foo, root_build_dir). The original point of the thread is to make it harder to create absolute paths, and removing the single-argument version of rebase_path(foo) which does this.

Nico Weber

unread,
Apr 27, 2021, 11:11:38 AM4/27/21
to Brett Wilson, Peter Wen, Wez, Aaron Wood, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
Oh, I see. All good then.

(Not sure why `rebase_path(foo, root_build_dir)` needs a separate name, but if that's the price for removing the one-arg version...)

Aaron Wood

unread,
Apr 27, 2021, 6:21:29 PM4/27/21
to Nico Weber, Brett Wilson, Peter Wen, Wez, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev
I also like the idea of  `get_build_path(foo)` as a replacement for `rebase_path(foo, root_build_dir)`.

Philipp Wollermann

unread,
Jul 14, 2023, 12:51:24 AM7/14/23
to gn-dev, Aaron Wood, Brett Wilson, Peter Wen, Wez, Shai Barack, Dirk Pranke, Roland McGrath, gn-dev, Nico Weber, chrome-build-team
Hi folks,

As you might know, the Chrome Build Team is working on making all build actions in Chromium's build compatible with remote caching / execution, which conveniently would also imply making them compatible with local sandboxing.

A requirement for this is to ensure that all paths used in the build are relative, because otherwise they can't easily be reproduced inside sandboxes (whether locally or on a remote execution backend). Absolute paths in command-lines and generated files also prevent sharing cache hits between users and different Chromium checkouts on a machine, due to action hashes being different.

I filed b/291175318 to track the clean-up to make all paths in Chromium's build relative and already sent a few CLs to fix what I could find.

In order to prevent regressions, it would be great if we could require passing a `new_base` parameter to `rebase_path`. I was super happy to find this thread this discussing this exact idea :D

What was the outcome of this? Would it be OK to attempt to change `rebase_path`?

Cheers,
Philipp

Shai Barack

unread,
Jul 14, 2023, 12:55:34 AM7/14/23
to Philipp Wollermann, gn-dev, Aaron Wood, Brett Wilson, Peter Wen, Wez, Dirk Pranke, Roland McGrath, Nico Weber, chrome-build-team
Philipp,
I agree with your goals and with your reasoning.

To my knowledge, none of the proposed changes that were discussed here were implemented. 

Andrew Grieve

unread,
Jul 14, 2023, 9:56:36 AM7/14/23
to Shai Barack, Philipp Wollermann, gn-dev, Aaron Wood, Brett Wilson, Peter Wen, Wez, Dirk Pranke, Roland McGrath, Nico Weber, chrome-build-team
If you (or someone else) is going to take it on, it might make sense to write a refactoring script to convert existing calls to the new way (whatever syntax is decided upon).

I don't love the idea of adding a new function with a similar sounding name (get_build_path). My vote would be to take on some migrations and add "system_absolute" as an option to get_path_info().

E.g.:
  1. Add the refactor script to GN's .git
  2. Use the script to convert: rebase_path(single_arg) -> get_path_info(single_arg, "system_absolute")
  3. Make rebase_path's second arg mandatory
  4. Wait ~ a month
  5. Make rebase_path's second arg optional, and default to root_build_dir
  6. Use a script to convert: rebase_path(path, root_build_dir) -> rebase_path(path)

Dirk Pranke

unread,
Jul 19, 2023, 8:15:25 PM7/19/23
to Andrew Grieve, Shai Barack, Philipp Wollermann, gn-dev, Aaron Wood, Brett Wilson, Peter Wen, Wez, Roland McGrath, Nico Weber, chrome-build-team
It seems generally bad to me to change the meaning of things in non-backwards-compatible ways, so I'm not actually wild about changing anything at all, even if there is general consensus that single-arg rebase_path() is too easily used. In Chromium, I'd assume we'd handle banning it via a presubmit check and call that a day.

IMO the alternatives should be that or some way to declare which version of GN's semantics we want, like Ninja has with `ninja_required_version`. I agree with Brett's comment in the original thread that we shouldn't add per-project settings for this if we can avoid it, and I think we can avoid it here. I would have no objection to a `gn_required_version`, I think it would've been useful to have had that in the past, and I imagine it will be useful again in the future.

Separately, if we did decide to break backwards compatibility, I'd probably still want to only make single-arg rebase_path() an error, rather than (eventually) changing it to have a different default. If you wanted `rebase_path(foo, root_build_dir)` you should just use `rebase_path(foo, root_build_dir)`. I think the value in being explicit in this case is greater than the value of having a default value.

-- Dirk

Roland McGrath

unread,
Jul 19, 2023, 9:24:07 PM7/19/23
to Dirk Pranke, Andrew Grieve, Shai Barack, Philipp Wollermann, gn-dev, Aaron Wood, Brett Wilson, Peter Wen, Wez, Nico Weber, chrome-build-team
I think the other idea that's been operative in this thread is to make the commonly right thing to do be very easy to spell correctly and hard to accidentally spell wrong.  That's where something like `get_build_path(foo)` as the new common idiom has the advantage, basically in making any kind of `rebase_path` spelling sufficiently uncommon that each use is likely to get good scrutiny for doing exactly what it intends to do and intending the best things for its circumstance--based on the understanding that `rebase_path` has arcane nontrivial semantics one always needs to think about.  If the `rebase_path` keyword doesn't appear in the most common use case at all, and there's no way to misuse the spelling that is used in the most common case (`get_build_path` or whatever it is), then having any hard removals of the `rebase_path` footgun cases is much less likely to be compelling or important to folks.

Dirk Pranke

unread,
Jul 19, 2023, 10:01:21 PM7/19/23
to Roland McGrath, Andrew Grieve, Shai Barack, Philipp Wollermann, gn-dev, Aaron Wood, Brett Wilson, Peter Wen, Wez, Nico Weber, chrome-build-team
I can see the merit in the argument that adding a new function might make all other uses of rebase_path only needed in edge cases that have to be considered specially. 

However, it seems like you still have to ban `rebase_path(foo)`, right? Otherwise you don't meet Shai and Philipp's goals. If so, there's no way to avoid making a breaking change, and so I think my argument for `required_gn_version` still seems like a good idea to me.

And, if you added `get_build_path(foo)`, would you also try to ban `rebase_path(foo, root_build_dir)` to avoid the confusion? That feels kinda awkward at first thought.

-- Dirk


Reply all
Reply to author
Forward
0 new messages