Reproducible builds and rewriting buildid in binary


Ivan Daniluk

Sep 16, 2018, 7:49:19 AM
to golang-nuts
I needed a way to create reproducible builds regardless of the dev environment the user has. Luckily, Go provides almost everything needed for that out of the box, and there is a great blog post by Filippo on the topic: https://blog.filippo.io/reproducing-go-binaries-byte-by-byte. If we have the same Go version and the same set of dependencies (which is easy with the vendor/ approach), the only remaining problem is the absolute path of the working directory. In other words, the same code built in the same dev environment in `GOPATH/src/project1` and `GOPATH/src/project2` will yield different binaries. There is an open issue for that in Go, and it will hopefully be addressed in Go 1.12 (https://github.com/status-im/status-react/issues/5587).

For now, the easy approach is, of course, to use Docker for the build, but that feels too heavy just to ensure the same directory. Spoofing the directory with LD_PRELOAD hacks or using a `chroot` approach also has obvious drawbacks: the need for a C toolchain and root access, respectively.

After analyzing the binaries, I realized that they differ only in the buildid stamp; the rest is the same. The buildid is very well explained here: https://github.com/golang/go/blob/master/src/cmd/go/internal/work/buildid.go#L24

For a quick recap, every Go package or binary is stamped with a buildid value, which for a binary is essentially four hash values joined by slashes:

   actionID(binary)/actionID(main.a)/contentID(main.a)/contentID(binary)

where:
 - actionID means a unique identifier of the inputs (sources, file names, go version, etc)
 - contentID means a unique identifier of the outputs (actual content output by compiler/linker)
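
To make the layout concrete, here is a minimal Go sketch that splits a binary's buildid string into its four parts. The `BuildID` type, its field names, and `ParseBuildID` are hypothetical names made up for illustration (they are not part of the Go toolchain), and the input string is a placeholder rather than a real buildid.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildID holds the four slash-separated parts of a binary's buildid.
// The type and field names are illustrative, not taken from the Go toolchain.
type BuildID struct {
	ActionBinary  string // actionID(binary)
	ActionMain    string // actionID(main.a)
	ContentMain   string // contentID(main.a)
	ContentBinary string // contentID(binary)
}

// ParseBuildID splits a string of the form a/b/c/d into its four parts.
func ParseBuildID(s string) (BuildID, error) {
	parts := strings.Split(strings.TrimSpace(s), "/")
	if len(parts) != 4 {
		return BuildID{}, fmt.Errorf("expected 4 parts, got %d", len(parts))
	}
	for i, p := range parts {
		if p == "" {
			return BuildID{}, fmt.Errorf("part %d is empty", i+1)
		}
	}
	return BuildID{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	// Placeholder value; a real one would come from `go tool buildid myapp`.
	id, err := ParseBuildID("act-bin/act-main/content-main/content-bin")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("inputs: ", id.ActionBinary, id.ActionMain)
	fmt.Println("outputs:", id.ContentMain, id.ContentBinary)
}
```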

So my thinking went in the following direction: I don't care if the actionID (inputs) is different, but I do care if the contentID (outputs) is different.

If the contentID is equal, I can just overwrite the actionID with the "expected" one and get the same binary byte for byte. This can be fully automated in a Makefile or script. So the steps for the reproducible build are the following:

 - build binary - `go build -ldflags "-s -w" -asmflags=-trimpath="$(pwd)" -gcflags=-trimpath="$(pwd)"`
 - extract buildid - `go tool buildid myapp`
 - compare buildid's contentID values to known ones - `diff <(go tool buildid ./myapp  | cut -d'/' -f3) <(cat release.buildid.txt  | cut -d'/' -f3)`
 - if they're equal, assume that the build is the same and just rewrite the buildid value inside the binary - `objcopy --update-section .note.go.buildid=release.buildid.bin ./myapp` for ELF

In my tests, this results in byte-by-byte equal binaries.
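
The comparison step above could also be sketched in Go instead of `diff` and `cut`. This is only an illustration under assumptions: `contentIDField` and `contentIDsMatch` are hypothetical helper names, and the two buildid strings in `main` are placeholders. In real use they would be the output of `go tool buildid ./myapp` and the contents of `release.buildid.txt`. Like the `cut -d'/' -f3` command, it compares only the third field.

```go
package main

import (
	"fmt"
	"strings"
)

// contentIDField returns the third slash-separated field of a buildid,
// the same field that `cut -d'/' -f3` selects.
func contentIDField(buildID string) (string, error) {
	parts := strings.Split(strings.TrimSpace(buildID), "/")
	if len(parts) != 4 {
		return "", fmt.Errorf("buildid %q: expected 4 fields, got %d", buildID, len(parts))
	}
	if parts[2] == "" {
		return "", fmt.Errorf("buildid %q: third field is empty", buildID)
	}
	return parts[2], nil
}

// contentIDsMatch reports whether two buildids share the same third field.
func contentIDsMatch(built, release string) (bool, error) {
	a, err := contentIDField(built)
	if err != nil {
		return false, err
	}
	b, err := contentIDField(release)
	if err != nil {
		return false, err
	}
	return a == b, nil
}

func main() {
	// Placeholder buildids: different action IDs, same content IDs.
	built := "act1/act2/content-main/content-bin"
	release := "ACT1/ACT2/content-main/content-bin"

	ok, err := contentIDsMatch(built, release)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("content IDs match:", ok)
}
```

A stricter variant might also compare the fourth field, contentID(binary), before deciding that the buildid is safe to rewrite.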

I have two concerns with this approach:
 1) I might be missing some corner cases, especially when patching binaries of different formats. What are the perils of patching the binary here?
 2) The buildID hash is actually a truncated version of the real hash (259 to 67 bytes), which increases the collision probability. That is totally fine for the task "determine if the binary should be rebuilt", but might be a concern for the task "guarantee that the build is the same". More explanation here: https://github.com/golang/go/blob/master/src/cmd/go/internal/work/buildid.go#L113

Any thoughts on that? What else am I missing? Would this be a viable workaround for getting reproducible builds until #5587 is solved?

Ivan Daniluk

Sep 16, 2018, 7:50:31 AM
to golang-nuts
Wrong link to the related Go issue. It's https://github.com/golang/go/issues/16860.