The problem here is that Bazel is currently fairly conservative about deciding which configuration changes affect a target's outputs.
As an example, if you have a simple dependency tree like this:

    go_binary(
        name = "binary",
        srcs = ...,
        deps = [":library"],
    )

    go_library(
        name = "library",
        srcs = ...,
    )

When you run `bazel build --flags //:binary`, Bazel knows that both `:binary` and `:library` are in the same configuration and have the same flags.
When you add your transition on any flag (not just a label flag) to the top level, the configuration changes. Bazel doesn't know[1] that the `:library` target doesn't care about that changed flag, so it re-analyzes and rebuilds it anyway. With something as large as the Go SDK, that can be very visible.
[Footnote 1]: More details: Bazel actually treats target analysis and action execution separately. Even with the changed configuration, if the action that builds `:library`'s outputs doesn't change, Bazel will skip re-executing it. The problem is, again, that Bazel doesn't know which flag changes matter to a given target and which don't, so a hash of the Starlark flags (like your label flag) ends up in the generated output path, and thus the action is different (it's reading from and writing to different paths).
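To make the footnote concrete, here is a toy Python model. It is not Bazel's actual implementation: the function names (`config_dir`, `action_key`), the hashing scheme, the directory naming, and the flag name `//:my_flag` are all assumptions made for illustration. The only idea carried over from the footnote is that a hash of the Starlark flags lands in the output path, so an action's identity changes even when the target never reads the flag.

```python
import hashlib


def config_dir(starlark_flags):
    """Toy stand-in for the configuration segment of an output path."""
    if not starlark_flags:
        return "bazel-out/k8-fastbuild"
    # Every Starlark flag is hashed, whether or not a given target reads it.
    canonical = ",".join(f"{k}={v}" for k, v in sorted(starlark_flags.items()))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:12]
    return f"bazel-out/k8-fastbuild-ST-{digest}"


def action_key(command, sources, output, starlark_flags):
    """Toy action identity: the command plus the paths it reads and writes."""
    # Source files keep their paths; the output lives under the config directory.
    out_path = f"{config_dir(starlark_flags)}/bin/{output}"
    parts = [command] + list(sources) + [out_path]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


# Hypothetical flag values: the top-level transition changes only //:my_flag.
before = {"//:my_flag": "//:default"}
after = {"//:my_flag": "//:other"}

# :library never reads //:my_flag, yet its toy action key still changes.
lib_before = action_key("compile", ["library.go"], "library.a", before)
lib_after = action_key("compile", ["library.go"], "library.a", after)
print(lib_before == lib_after)
```

In this model the two keys differ purely because of the path prefix, which is the situation the footnote describes: nothing about how `:library` is compiled has changed, but the action no longer matches the one that ran before.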
We have a couple of efforts to deal with this:
1. Output path stripping: this aims to reduce the amount of unneeded re-execution by removing unused data from the output path. This probably won't help you with your label_flag, because the flag is actually changing, not just staying at the default.
2. Improved exec transitions: This might help, because parts of the Go SDK are tools used in the build and parts are libraries. This will at least keep the tools in the same configuration, so that those don't rebuild. The SDK libraries would still, unfortunately, rebuild.
3. We keep discussing ways to do true analysis improvements by tracking what flags a target actually cares about, and only re-analyzing the target when those flags change. This turns out to be wildly complicated, and every prototype we've tried has spent more time and memory tracking flag usage than was actually saved. We still have some remaining ideas and intend to pursue them, but it's tricky.
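The idea in item 3 can be pictured with another toy Python sketch. This is only an illustration of the general technique (record which flags an analysis reads, and reuse the result while those values are unchanged); it makes no claim about how any Bazel prototype was built. `RecordingFlags`, `AnalysisCache`, `analyze_library`, and the flag names are all hypothetical.

```python
class RecordingFlags:
    """Wraps a flag dict and records which flag names are read."""

    def __init__(self, flags):
        self._flags = flags
        self.read = set()

    def get(self, name, default=None):
        self.read.add(name)
        return self._flags.get(name, default)


class AnalysisCache:
    """Caches per-target results keyed only by the flags that target read."""

    def __init__(self):
        self._entries = {}  # target -> list of (values of flags read, result)
        self.analyses = 0   # how many times analysis actually ran

    def analyze(self, target, analyze_fn, flags):
        for read_values, result in self._entries.get(target, []):
            # Reuse the result if every flag read last time is unchanged.
            if all(flags.get(k) == v for k, v in read_values.items()):
                return result
        recorder = RecordingFlags(flags)
        result = analyze_fn(recorder)
        self.analyses += 1
        read_values = {k: flags.get(k) for k in recorder.read}
        self._entries.setdefault(target, []).append((read_values, result))
        return result


def analyze_library(flags):
    # Hypothetical: :library only looks at the compilation mode.
    return f"library[{flags.get('compilation_mode', 'fastbuild')}]"


cache = AnalysisCache()
cache.analyze("//:library", analyze_library, {"//:my_flag": "//:default"})
cache.analyze("//:library", analyze_library, {"//:my_flag": "//:other"})
print(cache.analyses)
```

Even in this tiny model, every flag read has to be recorded and every cache lookup has to compare the stored values, which hints at where the time and memory overhead mentioned above comes from.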