trimja

Elliot Goodrich

Oct 21, 2024, 9:33:08 AM
to ninja-build
I'd like to share my project trimja (https://github.com/elliotgoodrich/trimja), a tool that cuts a Ninja build file down to only the build commands that depend on, or are needed by, a set of paths. This can reduce the amount of work done in CI by building only the things that could be affected by the files changed in a pull request.

The main restriction is that trimja needs access to the .ninja_log and .ninja_deps files from a full, successful build in order to work. There's a GitHub Action implementation to do this, trimja-action (https://github.com/elliotgoodrich/trimja-action).

One thing we could do is implement https://groups.google.com/g/ninja-build/c/J47BjzF6A10/m/9tFJsZYvAAAJ to replace every build command that is needed but not affected with a different rule that fetches that file from a cache.
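As a rough illustration of the "affected" versus "needed but not affected" split mentioned above, here is a minimal sketch. It is not trimja's actual implementation: the `classify` function, the dictionary representation of the build graph, and the example file names are all hypothetical, chosen only to show the two graph walks.

```python
from collections import defaultdict


def classify(edges, changed):
    """Split build outputs into 'affected' and 'needed' sets.

    edges:   dict mapping each output path to the list of its input paths
             (a hypothetical, simplified view of a Ninja build graph).
    changed: iterable of paths that were modified.
    """
    # Reverse map: input path -> outputs that consume it.
    consumers = defaultdict(set)
    for out, ins in edges.items():
        for path in ins:
            consumers[path].add(out)

    # Forward walk: every output that transitively depends on a changed path.
    affected = set()
    stack = list(changed)
    while stack:
        path = stack.pop()
        for out in consumers[path]:
            if out not in affected:
                affected.add(out)
                stack.append(out)

    # Backward walk: generated inputs that affected commands require,
    # but which are not themselves affected by the change.
    needed = set()
    stack = list(affected)
    while stack:
        out = stack.pop()
        for path in edges.get(out, []):
            if path in edges and path not in affected and path not in needed:
                needed.add(path)
                stack.append(path)

    return affected, needed


# Hypothetical graph: 'app' links a.o and b.o; 'tool' is unrelated.
example_edges = {
    "a.o": ["a.c", "common.h"],
    "b.o": ["b.c"],
    "app": ["a.o", "b.o"],
    "tool.o": ["tool.c"],
    "tool": ["tool.o"],
}
affected, needed = classify(example_edges, {"a.c"})
```

In this example only `a.o` and `app` have to be rebuilt, `b.o` is needed as an input to the link but could come from a cache, and `tool.o` and `tool` can be dropped from the graph entirely.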

If trimja + file hashes (https://github.com/ninja-build/ninja/issues/1459) were both added to Ninja, then we'd be able to dynamically remove additional work from the graph in situations where a file was modified but in a way that did not affect its output, e.g. changing comments in a C/C++ source file.

Ben Boeckel

Oct 21, 2024, 6:18:23 PM
to Elliot Goodrich, ninja-build
Hi,

On Sun, Oct 20, 2024 at 23:04:56 -0700, Elliot Goodrich wrote:
> I'd like to share my project trimja
> <https://github.com/elliotgoodrich/trimja> - a tool to cut down a Ninja
> build file to only the build commands that depend on or are needed by a set
> of paths. This can be used to reduce the amount of work done in CI by only
> building things that could be affected by the files changed by the pull
> request.
>
> The main restriction is that you need access to the .ninja_log and
> .ninja_deps files from a full, successful build in order for trimja to
> work. There's an implementation of a Github action to do this -
> trimja-action <https://github.com/elliotgoodrich/trimja-action>.
>
> One thing we could do is implement
> https://groups.google.com/g/ninja-build/c/J47BjzF6A10/m/9tFJsZYvAAAJ to
> replace all build commands that are needed, but not affected with a
> different rule to get that file from a cache.

Neat tool. Alas, needing a deps and log from a previous build means that
it is difficult to use from CI (paths need to be pinned, build
configuration needs to be compared, etc.). But for "simple" projects
without 100s of options, this sounds like a good tool to help out with that.

> If trimja + file hashes <https://github.com/ninja-build/ninja/issues/1459>
> were both added to ninja, then we'd be able to dynamically remove
> additional work from the graph in situations where a file was modified but
> in a way that did not affect its output, e.g. changing comments in a C/C++
> source file.

Note that comments can affect output, particularly if debug information
is included, since source locations can change based on comment content.
You also might want to rerun after a comment modification for things
like `clang`'s `-Wdocumentation` flag, which issues diagnostics based on
Doxygen syntax in comments.

I don't think a straight content hash is as useful as some kind of
"fingerprint" extractor that can actually ignore irrelevant changes to
the file, e.g., diffing a comment-aware AST to see that

    if (foo)
        bar;

and

    if (foo) {
        bar;
    }

are the same. Of course, debugging info can make even this a semantic
change, so the command line in use also matters.

See this issue: https://github.com/ninja-build/ninja/issues/1459
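
To make the fingerprint idea concrete, here is a minimal sketch that uses Python's own `ast` module as a stand-in for a C/C++ parser. That substitution is an assumption made purely for illustration, and the `fingerprint` helper and its `include_locations` flag are hypothetical names. The flag mimics the point about debugging info: once source locations are part of the fingerprint, adding a comment line changes it.

```python
import ast
import hashlib


def fingerprint(source: str, include_locations: bool = False) -> str:
    """Hash the parsed structure of `source` rather than its raw bytes.

    With include_locations=False, comments and layout do not influence the
    result. With include_locations=True, line and column numbers are part
    of the dump, so moving code (e.g. by adding a comment) changes the hash.
    """
    tree = ast.parse(source)
    dumped = ast.dump(tree, include_attributes=include_locations)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()


original = "if foo:\n    bar()\n"
commented = "# explain the check\nif foo:\n    bar()\n"
changed = "if foo:\n    baz()\n"

# A comment-only edit keeps the structural fingerprint stable...
same_structure = fingerprint(original) == fingerprint(commented)
# ...but not once source locations are taken into account.
same_with_locations = (
    fingerprint(original, include_locations=True)
    == fingerprint(commented, include_locations=True)
)
# A real code change alters the fingerprint either way.
same_after_change = fingerprint(original) == fingerprint(changed)
```

Whether locations should count depends on the command line in use, which is why a fingerprint extractor would need to know the compile flags and not just the file content.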

Thanks,

--Ben

elliotg...@gmail.com

Oct 22, 2024, 12:58:35 AM
to ninja-build
> Neat tool. Alas, needing a deps and log from a previous build means that
> it is difficult to use from CI (paths need pinned, build configuration
> needs to be compared, etc.). But for "simple" projects without 100s of
> options, this sounds like a good tool to help out with that.

Thank you. trimja-action has a `build-configuration` input (https://github.com/elliotgoodrich/trimja-action?tab=readme-ov-file#instructions) that uniquely describes the build config, so you can keep multiple caches of .ninja_deps/.ninja_log files. If you get this wrong, the command hashes will differ, trimja won't be able to remove those commands, and you won't get any speed-up. Which paths did you mean when you mentioned pinning?


> I don't think a straight content hash is as useful as some kind of
> "fingerprint" extractor that can actually ignore irrelevant changes to
> the file (e.g., diffing a comment-aware AST to see that

An AST fingerprinter would be nice and could avoid compiling the file in the first place. You would also benefit from hashing the output object file and using that hash to skip linking when it hasn't changed. My knowledge of object file internals is limited, but I'd hope this would catch modifications that affected only comments or other aesthetic changes.
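
A minimal sketch of the object-hash idea, assuming a simple in-memory record of previous hashes. Ninja does not do this today, and `file_digest`, `needs_relink`, and the `recorded` dictionary are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_relink(objects, recorded) -> bool:
    """Report whether any object file's content differs from the last run.

    objects:  iterable of object file paths that feed one link step.
    recorded: dict of path -> digest from the previous build; it is
              updated in place with the digests seen in this call.
    """
    dirty = False
    for obj in objects:
        key = str(obj)
        digest = file_digest(obj)
        if recorded.get(key) != digest:
            recorded[key] = digest
            dirty = True
    return dirty
```

If recompiling after a comment-only edit produces byte-identical object files, `needs_relink` returns False and the link step could be skipped. Whether the bytes really are identical depends on the compiler and flags in use.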

Thanks,
Elliot

Ben Boeckel

Nov 6, 2024, 7:18:54 AM
to elliotg...@gmail.com, ninja-build
On Mon, Oct 21, 2024 at 21:58:34 -0700, elliotg...@gmail.com wrote:
> > Neat tool. Alas, needing a deps and log from a previous build means that
> > it is difficult to use from CI (paths need pinned, build configuration
> > needs to be compared, etc.). But for "simple" projects without 100s of
> > options, this sounds like a good tool to help out with that.
>
> Thank you. trimja-action has a `build-configuration` input
> (https://github.com/elliotgoodrich/trimja-action?tab=readme-ov-file#instructions)
> to uniquely describe the build config so that you can have multiple caches
> of .ninja_deps/.ninja_logs. If you get this wrong then the command hashes
> will differ and trimja won't be able to remove those commands and you won't
> get any speed-up. What paths are you talking about when you mentioned
> pinning?

That sounds good. By "pinning" I meant that if you have CI running on N
machines, they all need to agree on an absolute path to use for the
build so that caches hit.

> > I don't think a straight content hash is as useful as some kind of
> > "fingerprint" extractor that can actually ignore irrelevant changes to
> > the file (e.g., diffing a comment-aware AST to see that
>
> An AST fingerprinter would be nice and could avoid compiling the file in
> the first place. You would benefit hashing the output object file and then
> using that to avoid linking if the hash didn't change. My knowledge of
> object file internals is limited, but I'd hope this would catch
> modifications that affected only comments or other aesthetic changes.

Alas, things like C++'s `std::source_location` (and equivalents in other
languages) show up all over the place, not least for diagnostic
reporting, even in binaries. YMMV though.

--Ben

elliotg...@gmail.com

Nov 7, 2024, 4:05:43 PM
to ninja-build
> That sounds good. By "pinning" I meant that if you have CI running on N
> machines, they all need to agree on an absolute path to use for the
> build so that caches hit.

Thank you, I follow now. Yes, that can definitely be an issue. I believe that trimja could optionally convert absolute paths to relative paths in the output .build file and the .ninja_deps file to counteract this. We would have to assume that the commands would also work with relative inputs. I've added an issue to keep track of it: https://github.com/elliotgoodrich/trimja/issues/112
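
A minimal sketch of what such a conversion might look like. This is an assumption about one possible approach, not what trimja does or will do: the `relativize` helper, its parameters, and the runner paths are hypothetical. It uses POSIX path rules and leaves paths outside the project root (system headers, for example) untouched.

```python
import posixpath


def relativize(path: str, build_dir: str, project_root: str) -> str:
    """Rewrite an absolute path inside the project as relative to build_dir.

    Relative paths and absolute paths outside project_root are returned
    unchanged, so system locations keep their absolute form.
    """
    if not posixpath.isabs(path):
        return path
    root = posixpath.normpath(project_root)
    if posixpath.commonpath([posixpath.normpath(path), root]) != root:
        return path
    return posixpath.relpath(path, start=build_dir)


# Two hypothetical CI runners that check the project out in different places.
on_runner_7 = relativize(
    "/ci/runner-7/work/src/a.c", "/ci/runner-7/work/build", "/ci/runner-7/work"
)
on_runner_3 = relativize(
    "/ci/runner-3/work/src/a.c", "/ci/runner-3/work/build", "/ci/runner-3/work"
)
```

Both runners end up with the same relative path, so anything keyed on that path (such as a command hash) would no longer depend on where each machine placed its checkout.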
