Re: Runfiles#fingerprint

38 views
Skip to first unread message

Lukács T. Berki

unread,
May 12, 2023, 11:45:16 AM5/12/23
to Fabian Meumertzheim, Xudong Yang, Nathan Harmata, bazel-discuss
+Nathan in case he has a moment to spare and thinks otherwise

On Fri, May 12, 2023 at 2:13 PM Fabian Meumertzheim <fab...@meumertzhe.im> wrote:
Hi Lukács,

While working on fixing some subtle aspects of RepoMappingManifestAction related to runfiles symlinks, I delved into Runfiles#fingerprint and wondered why it is the way it is.

Specifically:

1. Some collection sizes aren't included in the fingerprint, which, at least in theory, could result in incorrect incremental builds: https://cs.opensource.google/bazel/bazel/+/1061a8c2bc5327452ede67d5a0c3cc2820c2a1eb:src/main/java/com/google/devtools/build/lib/analysis/Runfiles.java;l=1177-1184
This is a slam dunk. We'd be grateful for a fix, I think this was a simple oversight on our part.


2. #fingerprint and #getEmptyFilenames flatten all nested sets and don't use the fingerprint cache for nested sets provided by ActionKeyContext. It seems that this could be changed simply by having custom MapFns for SymlinkEntry and Artifact. Assuming that the EmptyFilesSupplier is a pure function of its inputs, it probably wouldn't have to be included in the digest at all.
I think for #fingerprint(), this is also a slam dunk, it's just that the code evolved in such a way that it didn't occur to anyone to use the fingerprint cache. It's a little more complicated than trivial because Artifact.getRunfilesPath() also needs to be part of the key, but that's an obstacle that's entirely surmountable.

The reason why the set of empty files needs to be in the key is that even though it only gets a list of paths, it can close over any data produced during the analysis phase, so the same set of runfiles can result in a different set of empty files. I could imagine fixing this by adding a computeKey() method to EmptyRunfilesSupplier, but ignoring it doesn't work.

I don't know what the performance impact would be. 


Happy to improve any of that, but I would first like to know confirm that 1) this isn't intended for a reason I am unaware of and 2) isn't negligible anyway (e.g., since it only applies to executable targets, which only make up a small fraction of all targets).

Best,
Fabian


--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Nathan Harmata

unread,
May 12, 2023, 12:22:48 PM5/12/23
to Lukács T. Berki, Fabian Meumertzheim, Xudong Yang, bazel-discuss, Eric Fellheimer
+cc fe...@google.com (Eric, this is a public email thread)

I took a glance and I agree with Lukács. Here is a little more information.

1. This looks like an oversight in https://github.com/bazelbuild/bazel/commit/c31c8c9c54958b7ac3ac568aa5564ae7d7f4a718 (internal CL 210798673). Eric, given that we're expanding the nested set already, any concerns with adding its size to the fingerprint?

2. Similar story. https://github.com/bazelbuild/bazel/commit/3d1a194ff9e76f25f1a7242ff2d021523ba8e4a0 (internal CL 177359607) added ActionKeyContext and predates c31c8c9c54958b7ac3ac568aa5564ae7d7f4a718. It migrated a ton of code to use ActionKeyContext and then I guess in the new code in c31c8c9c54958b7ac3ac568aa5564ae7d7f4a718 we forgot to use ActionKeyContext. Tbh I'm not super hip on this part of the codebase so I might not have caught this during code review myself---I wonder if there are other new things that postdate 3d1a194ff9e76f25f1a7242ff2d021523ba8e4a0 that ought to be using ActionKeyContext.


> I don't know what the performance impact would be. 

Depends on the situation.

For your RepoMappingManifestAction situation, I'm imagining the nested sets involved with RepoMappingManifestAction don't have the massive scale where ActionKeyContext would have large impact.

For the original situation in 3d1a194ff9e76f25f1a7242ff2d021523ba8e4a0, that commit was motivated by something that doesn't exist in Bazel. I don't have good intuition about that situation's scale so idk.

For the situations of other callers of Runfiles#fingerprint, same thought about scale. Eric, do you have good intuition here? Is the `runfiles` used in https://github.com/bazelbuild/bazel/blob/43306947cc5448df2215ebbb2aa5a5e3c64c754c/src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java#L139 potentially referring to a large nested set?

Eric Fellheimer

unread,
May 24, 2023, 4:53:33 PM5/24/23
to Nathan Harmata, Lukács T. Berki, Fabian Meumertzheim, Xudong Yang, bazel-discuss
No concerns with adding the size to the fingerprint. I agree with your intuition on this.

Eric Fellheimer
fe...@google.com

Nathan Harmata

unread,
May 24, 2023, 4:57:37 PM5/24/23
to Eric Fellheimer, Lukács T. Berki, Fabian Meumertzheim, Xudong Yang, bazel-discuss
Thanks Eric.

Fyi this got implemented in https://github.com/bazelbuild/bazel/pull/18384.
Reply all
Reply to author
Forward
0 new messages