bit-identical build output


Albert Strasheim

Mar 31, 2013, 4:13:55 AM
to golan...@googlegroups.com
Howdy

I am reading, with interest, the articles on Google's build system that iant referred to in another post.

Looking at this one:


I was curious whether Go produces bit-identical output these days.

It seems it's not the case for binaries that use cgo.

Looking at a diff of hexdumps, it seems there are still a few TMPDIR paths for .cgo2.c files towards the end of the file.

$ eu-elfcmp -l foo1 foo2
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content
eu-elfcmp: foo1 foo2 differ: build ID content

objdump -d produces identical output.

Maybe worth filing an issue?

Regards

Albert

minux

Mar 31, 2013, 4:27:58 AM
to Albert Strasheim, golan...@googlegroups.com
On Sun, Mar 31, 2013 at 4:13 PM, Albert Strasheim <ful...@gmail.com> wrote:
> I was curious whether Go produces bit-identical output these days.
>
> It seems it's not the case for binaries that use cgo.
>
> Looking at a diff of hexdumps, it seems there are still a few TMPDIR paths
> for .cgo2.c files towards the end of the file.
>
> $ eu-elfcmp -l foo1 foo2
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
> eu-elfcmp: foo1 foo2 differ: build ID content
>
> objdump -d produces identical output.
>
> Maybe worth filing an issue?
Definitely, please. I fixed the problem in the past
(https://codereview.appspot.com/6445085).

Let's see if it can be fixed before Go 1.1 ships; I think identical
binaries for identical source is a good feature to have.

Albert Strasheim

Mar 31, 2013, 4:36:07 AM
to golan...@googlegroups.com, Albert Strasheim

Rob Pike

Mar 31, 2013, 11:11:36 AM
to Albert Strasheim, golan...@googlegroups.com
Why does it matter? In fact, security people often argue that the binaries should vary, to make it harder to use certain attacks.

-rob

Aram Hăvărneanu

Mar 31, 2013, 11:21:54 AM
to Rob Pike, Albert Strasheim, golan...@googlegroups.com
> Why does it matter? In fact, security people often argue that the binaries
> should vary, to make it harder to use certain attacks?

It creates a link back to the source, which can be useful. If I
suspect a binary has been tampered with, I can compare it with a new
build from my vetted source and toolchain. If it doesn't match, I can
throw it out. I can't do this if the build product is always
different. I can also publish the binaries' hashes on my website so
people that get my application through a package manager can verify
that they have my vetted version.
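
For concreteness, a minimal sketch of that check in Go (the binary name and the published digest below are just placeholders, not anything I actually ship):

// Minimal sketch: hash a binary and compare it against a digest
// published out of band. "myapp" and publishedSHA256 are placeholders.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
)

// Placeholder for the digest published on the website; not a real value.
const publishedSHA256 = "<published digest goes here>"

func main() {
	f, err := os.Open("myapp") // hypothetical binary name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}
	got := hex.EncodeToString(h.Sum(nil))

	fmt.Println("sha256:", got)
	if got != publishedSHA256 {
		fmt.Println("digest does not match the published one; don't trust this binary")
	}
}

Of course this only works as a tamper check if rebuilding the same source with the same toolchain really does give the same bytes.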

--
Aram Hăvărneanu

Albert Strasheim

Mar 31, 2013, 11:24:36 AM
to golan...@googlegroups.com, Albert Strasheim
Hello


On Sunday, March 31, 2013 5:11:36 PM UTC+2, Rob Pike wrote:
Why does it matter? In fact, security people often argue that the binaries should vary, to make it harder to use certain attacks?

The Google article I linked to makes one useful case for bit identical output.

In my team we recently talked about another scenario: as part of a release, you tag all your sources and file away your build environment as a list of packages or a VM image. A few months later, you rebuild your old tag/branch before making some changes to an old release branch. At that point, it's very nice to be able to sanity-check your environment by verifying that, before you change any code, it produces exactly the same binaries as it did a few months ago. This is the same thing that Aram is talking about.

I don't know about attacks that are made harder by subtle changes in the binary. In this case, I don't see that the binaries vary in any useful way that could thwart an attack. I understand about address space randomization, but that happens when the binary runs. Do you have more info?

Regards

Albert

Brad Fitzpatrick

Mar 31, 2013, 11:56:25 AM
to Rob Pike, Albert Strasheim, golan...@googlegroups.com
Inside Google the build system assumes all output is identical and each artifact's digest is used as a cache key.  The build tools people take this very seriously.  I remember a bug of mine where a tool written in Go produced output as a function of map iteration order (which Go randomizes); another team using the tool got differing output on each run, freaked out (it was slowing their builds massively), and I had to fix it quickly.
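
The bug was roughly of this shape (just an illustration, not the actual tool): writing output while ranging over a map gives a different order on each run; sorting the keys first makes it deterministic.

// Illustration only: Go randomizes map iteration order, so output
// written while ranging over a map differs from run to run. Sorting
// the keys first restores determinism.
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hypothetical tool output: some settings keyed by name.
	flags := map[string]string{"arch": "amd64", "cgo": "off", "opt": "2"}

	// Nondeterministic: prints the lines in a different order each run.
	for k, v := range flags {
		fmt.Printf("%s=%s\n", k, v)
	}

	// Deterministic: collect and sort the keys, then iterate in order.
	keys := make([]string, 0, len(flags))
	for k := range flags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, flags[k])
	}
}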

But we probably use the same fake cwd to build a command for a given input, so we would be unaffected by the "go" tool creating tempdirs and cgo picking them up.

I definitely understand non-Googlers caring about this bug, though, considering how much we care.

On Sun, Mar 31, 2013 at 8:11 AM, Rob Pike <r...@golang.org> wrote:
Why does it matter? In fact, security people often argue that the binaries should vary, to make it harder to use certain attacks?

-rob


Sean Russell

Apr 3, 2013, 6:50:45 AM
to golan...@googlegroups.com, Rob Pike, Albert Strasheim
Hi Brad,


On Sunday, March 31, 2013 11:56:25 AM UTC-4, bradfitz wrote:
Inside Google the build system assumes all output is identical and each artifact's digest is used as cache keys.  The build tools people take this very seriously.

How does the Google build process take into account that Go doesn't provide a mechanism for importing a specific version of a non-core library?  Since one can't import a specific revision (or tag) in Go, the build server is always at risk of getting an artifact with entirely different behavior than what the developer sees.  Also, unless the build server snapshots the build environment -- including all dependent libraries -- then every time it tries to reproduce the build it's going to get a different binary with (potentially) wildly different behavior, even if it uses the exact same revision of the source repository.

I'm curious how Google addresses this internally.

Thanks,

--- SER

Aram Hăvărneanu

Apr 3, 2013, 7:32:10 AM
to Sean Russell, golang-nuts, Rob Pike, Albert Strasheim
This is well explained in the Google Engineering Tools blog:

http://google-engtools.blogspot.com/

Nate Finch

Apr 3, 2013, 7:36:09 AM
to golan...@googlegroups.com, Rob Pike, Albert Strasheim
This is actually a huge problem for us at my job. We use C#, and the assemblies are ALWAYS different, even if the code is identical. It bugs the hell out of me, for a few reasons:

If the same code produces the same binary output, you know without a doubt that the behavior will be identical. If the binaries are different, you have to worry if something got changed accidentally.

Binary-identical output makes it much easier to track changes to components of a large system. If you're building multiple binaries, but only some have changed, you only have to redistribute the changed files. In our system, we have an update mechanism that only updates files that have changed. We used to have a complicated and somewhat buggy method for detecting which code had changed, so we'd know which binaries had real changes and could distribute only those. Now we just suck it up and redistribute everything.
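
For illustration, a rough sketch of the digest-based change detection that bit-identical output enables (the directory names are hypothetical, and this isn't our actual updater):

// Rough sketch: hash every file in the new build output and compare
// against the previous release's digests; only files whose digest
// changed need to be redistributed.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"
)

// digestDir returns a map from file name to hex SHA-256 digest for each
// regular file directly under dir (no recursion, to keep the sketch short).
func digestDir(dir string) (map[string]string, error) {
	sums := make(map[string]string)
	entries, err := ioutil.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	for _, fi := range entries {
		if fi.IsDir() {
			continue
		}
		f, err := os.Open(filepath.Join(dir, fi.Name()))
		if err != nil {
			return nil, err
		}
		h := sha256.New()
		_, err = io.Copy(h, f)
		f.Close()
		if err != nil {
			return nil, err
		}
		sums[fi.Name()] = hex.EncodeToString(h.Sum(nil))
	}
	return sums, nil
}

func main() {
	// Hypothetical output directories for the previous and current builds.
	oldSums, err := digestDir("build-v1.0")
	if err != nil {
		log.Fatal(err)
	}
	newSums, err := digestDir("build-v1.1")
	if err != nil {
		log.Fatal(err)
	}
	// With bit-identical builds, a changed digest means a real change.
	for name, sum := range newSums {
		if oldSums[name] != sum {
			fmt.Println("redistribute:", name)
		}
	}
}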

It's also incredibly useful for being able to produce binaries from a tagged build later that are identical to the ones that were built from the code in the first place... we do this a lot in order to debug bugs found in an older production binary. If the binaries are different, it can hinder debugging efforts (may not apply to Go directly, but it's been a problem for us using C#).

And really, it just makes sense. The same code should always make the same binary. It should be a 1:1 mapping. That's the whole point, right? Developer says do XYZ, compiler makes a binary that does XYZ. The compiler should do the same thing every single time, so that it is a repeatable process.

Albert Strasheim

Apr 3, 2013, 7:43:06 AM
to golan...@googlegroups.com, Sean Russell, Rob Pike, Albert Strasheim
On Wednesday, April 3, 2013 1:32:10 PM UTC+2, Aram Hăvărneanu wrote:
This is well explained in the Google Engineering Tools blog:
 http://google-engtools.blogspot.com/

What might be missing is this:


If I had to guess, I would say that Google keeps all third-party dependencies in their source control system. Their build doesn't run go get.

Cheers

Albert

Sean Russell

Apr 3, 2013, 9:45:19 AM
to golan...@googlegroups.com, Sean Russell, Rob Pike, Albert Strasheim
On Wednesday, April 3, 2013 7:32:10 AM UTC-4, Aram Hăvărneanu wrote:
This is well explained in the Google Engineering Tools blog:

 http://google-engtools.blogspot.com/

I find nothing in any of the posts on this blog that deals with intra-Google project dependencies.  Further, the blog appears to be very C/C++ focused, not Go, which (because of the way imports work) has different constraints.

--- SER

Sean Russell

Apr 3, 2013, 9:55:01 AM
to golan...@googlegroups.com, Sean Russell, Rob Pike, Albert Strasheim
This is my question.  I can think of only two options:
  • The developer using an external package (not necessarily external to Google, but external to the project) re-owns the dependency, which means changing all the imports in any of its dependencies. In the case of X -> A and X -> B, but also A -> B (where A & B are external), the developer would have to re-own both A & B and also change A's imports of B. This sounds like a maintenance nightmare, especially when it comes time to update from upstream.
  • The build system re-uses the build environment per project, never auto-updating upstream components. There would need to be some mechanism for developers to trigger an update of a specific upstream package in the build environment. This seems more likely, and it would also help explain the stated Google practice of not using branches, since the build environment would otherwise need to keep a snapshot for every branch of every project.

--- SER

Péter Szilágyi

Apr 3, 2013, 10:35:50 AM
to Sean Russell, golang-nuts, Rob Pike, Albert Strasheim
I'm fairly confident you won't get the true story out of a Google employee, since they're not allowed to discuss internals. That's why they'll happily direct you to blog posts (which have already been vetted for publication).

One solution I can think of that would solve the issue of referring to outside projects without having to care about versions or re-owning/import updates is to simply clone a needed codebase into a local git/mercurial/whatever server (frozen to the specific version needed) and redirect go get/imports to that server (i.e. redirect the github domain).

Though I'd find it very surprising if Google pulled third-party code into their projects, given their high-profile nature. Of course this doesn't mean they don't, only that it's probably an admin hell to get such a thing through :)



Damian Gryski

Apr 3, 2013, 10:49:11 AM
to golan...@googlegroups.com, Sean Russell, Rob Pike, Albert Strasheim


On Wednesday, April 3, 2013 at 1:32:10 PM UTC+2, Aram Hăvărneanu wrote:
This is well explained in the Google Engineering Tools blog:

 http://google-engtools.blogspot.com/


Damian 

Brad Fitzpatrick

Apr 3, 2013, 11:00:16 AM
to Péter Szilágyi, Sean Russell, golang-nuts, Rob Pike, Albert Strasheim
We have a giant source repo that contains most of everything (search, MapReduce, self-driving-cars, etc.)  It has a top-level directory "third_party" for outside code which we didn't write.  Some people elsewhere call this "vendor".  Each third_party directory contains metadata about where it came from, its license, etc.  Our build system even enforces licenses.  (You can see some discussion of this at e.g. https://sites.google.com/site/skiadocs/developer-documentation/contributing-code/how-to-upstream-a-google3-third_party-patch-to-the-skia-repo)

We have tools to slurp in and update dependencies.

So if github.com/foo/bar gets updated to a new version, that doesn't affect us until we explicitly update our internal copy (in its own separate commit, checking to see if it breaks any tests in the world).  Then a later commit can use the new version.

I do a similar thing in my side project ... you can see my third_party directory at http://camlistore.org/code/?p=camlistore.git;a=tree at the bottom.  That means I can sync to any point in time in the project's history and get the same build as it built on that date.

Sean Russell

Apr 3, 2013, 7:29:35 PM
to golan...@googlegroups.com, Péter Szilágyi, Sean Russell, Rob Pike, Albert Strasheim
On Wednesday, April 3, 2013 11:00:16 AM UTC-4, bradfitz wrote:
We have a giant source repo that contains most of everything (search, MapReduce, self-driving-cars, etc.)  It has a top-level directory "third_party" for outside code which we didn't write.  Some people elsewhere call this "vendor".  Each third_party directory contains metadata about where it came from, its license, etc.  Our build system even enforces licenses.  (You can see some discussion of this at e.g. https://sites.google.com/site/skiadocs/developer-documentation/contributing-code/how-to-upstream-a-google3-third_party-patch-to-the-skia-repo)

Thanks Brad!  From that documentation, it looks like Google is following the "re-own" model.

I do a similar thing in my side project ... you can see my third_party directory at http://camlistore.org/code/?p=camlistore.git;a=tree at the bottom. That means I can sync to any point in time in the project's history and get the same build as it built on that date.

How do you manage upstream changes?  You'd necessarily lose revision history in your local copy of the upstream repositories, right?
 
--- SER

David Anderson

Apr 3, 2013, 8:11:01 PM
to Sean Russell, golang-nuts, Péter Szilágyi, Rob Pike, Albert Strasheim
With git, it's fairly easy to rebase local changes on a new upstream revision of vendor code. But by far the easiest way is to send your patches upstream, and then update your local snapshot. That way, your management of vendor code is simple, and upstream gets patches. Everybody wins.

- Dave