How does the Golang compiler handle dependencies?

kev kev

Nov 13, 2020, 5:54:34 PM
to golang-nuts
I recently read the post by Rob Pike about language choices for Golang: https://talks.golang.org/2012/splash.article#TOC_5.

The seventh point refers to how Golang handles dependencies. It mentions an "object file" that is produced for each package and read by its _dependents_.

Below I go through my interpretation of this section:

Example:

package A imports package B.

When I compile package A, package B must already have been compiled. What package A receives is not the AST of package B, but an "object file". This object file only reveals data about the exported (publicly accessible) symbols in that package. In this example, if B defined an unexported (private) struct, that struct would not be in the object file.

This part seems to make sense to me; hopefully I did not make any mistakes.
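A rough way to see the exported/unexported split is to parse a source file with the standard `go/parser` and `go/ast` packages and keep only the top-level names that start with an upper-case letter. This is a sketch, not what the compiler does: the package `b` source and the `exportedNames` helper are hypothetical, and the real export data written by the gc compiler is a compact binary format that carries more than names. For instance, because `Client` below has an unexported field of the unexported type `conn`, importers still need that field described to know `Client`'s size and layout, so unexported details are not always entirely absent.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// exportedNames returns the top-level names in a Go source file that
// other packages can refer to, i.e. identifiers starting with an
// upper-case letter. Methods are skipped to keep the sketch short.
func exportedNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "b.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil && d.Name.IsExported() {
				names = append(names, d.Name.Name)
			}
		case *ast.GenDecl: // type, const, var (and import) declarations
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.TypeSpec:
					if s.Name.IsExported() {
						names = append(names, s.Name.Name)
					}
				case *ast.ValueSpec:
					for _, n := range s.Names {
						if n.IsExported() {
							names = append(names, n.Name)
						}
					}
				}
			}
		}
	}
	return names, nil
}

// A hypothetical package B with a mix of exported and unexported names.
const pkgB = `package b

type Client struct{ c *conn }
type conn struct{ addr string }

func New(addr string) *Client { return &Client{c: &conn{addr: addr}} }
func dial() {}

const Version = "1.0"

var debug = false
`

func main() {
	names, err := exportedNames(pkgB)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(names) // [Client New Version]
}
```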

It seems that the speedup compared to C/C++ comes from the object file being created once per package, whereas in C/C++ you need to recompile whatever you include every time?

Follow-up question:

Is the compilation unit a single file or a package?

Thanks

Kevin Chowski

Nov 13, 2020, 7:14:41 PM
to golang-nuts
C/C++ also has object file caching (depending on how your build is set up, I guess). The issue in C/C++ is that including any one header file may require opening a large number of other header files.

For example, if I write a file "main.c" which includes "something.h", which in turn includes "another.h" and "big.h", and compile just main.c, the compiler has to open all three header files and include their contents when parsing main.c in order for compilation to proceed correctly. In Go, the compiler arranges things so that it only has to open one file per imported package. The post you linked goes into greater detail, so I will avoid duplicating it here, but feel free to ask a more specific question and I can try to answer.
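The difference in how many files get opened can be modelled with a small dependency walk. This is a hypothetical sketch: the file and package names come from the example above, `transitiveIncludes` is a made-up helper, and it assumes (as the linked article describes) that a Go package's export data already carries whatever its importers need to know about that package's own dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// transitiveIncludes returns every distinct header that must be read to
// compile unit, following #include directives recursively.
func transitiveIncludes(includes map[string][]string, unit string) []string {
	seen := map[string]bool{}
	var walk func(file string)
	walk = func(file string) {
		for _, h := range includes[file] {
			if !seen[h] {
				seen[h] = true
				walk(h)
			}
		}
	}
	walk(unit)
	out := make([]string, 0, len(seen))
	for h := range seen {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	// main.c includes something.h, which includes another.h and big.h.
	cIncludes := map[string][]string{
		"main.c":      {"something.h"},
		"something.h": {"another.h", "big.h"},
	}
	fmt.Println("C headers read for main.c:", transitiveIncludes(cIncludes, "main.c"))

	// Hypothetical Go equivalent: main imports "something", which imports
	// "another" and "big". Compiling main reads only the export data of
	// its direct imports.
	goImports := map[string][]string{
		"main":      {"something"},
		"something": {"another", "big"},
	}
	fmt.Println("Go export data read for main:", goImports["main"])
}
```

Running it prints all three headers for main.c, but only `something` for the Go package.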

There's a bit of nuance there, which the post also goes into: Go's strategy requires that a package be compiled before any package which imports it is compiled. In C/C++ the ordering is a little more flexible due to the more decoupled nature of header files, which means that, in theory, more builds could run in parallel. But I suspect that in your average Go program the dependency tree would still allow a large number of builds to run in parallel.
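The ordering constraint can be pictured as a topological sort of the import graph, grouped into "waves" of packages whose dependencies have all been built already. This is a simplified, hypothetical sketch (`buildWaves` and the package names are illustrative only); the real `go` command schedules compile actions more dynamically. Go rejects import cycles, so the sketch reports a cycle as an error too.

```go
package main

import (
	"fmt"
	"sort"
)

// buildWaves groups packages so that every package depends only on
// packages in earlier waves. Packages in the same wave do not depend on
// each other, so they could be compiled in parallel. deps maps each
// package to the packages it imports.
func buildWaves(deps map[string][]string) ([][]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := map[string]int{} // missing keys read as unvisited
	level := map[string]int{}

	var visit func(pkg string) (int, error)
	visit = func(pkg string) (int, error) {
		switch state[pkg] {
		case visiting:
			return 0, fmt.Errorf("import cycle through %q", pkg)
		case done:
			return level[pkg], nil
		}
		state[pkg] = visiting
		lvl := 0
		for _, dep := range deps[pkg] {
			depLvl, err := visit(dep)
			if err != nil {
				return 0, err
			}
			if depLvl+1 > lvl {
				lvl = depLvl + 1
			}
		}
		state[pkg] = done
		level[pkg] = lvl
		return lvl, nil
	}

	// Collect every package, including ones that only appear as imports.
	seen := map[string]bool{}
	for pkg, imports := range deps {
		seen[pkg] = true
		for _, dep := range imports {
			seen[dep] = true
		}
	}
	names := make([]string, 0, len(seen))
	for pkg := range seen {
		names = append(names, pkg)
	}
	sort.Strings(names)

	var waves [][]string
	for _, pkg := range names {
		lvl, err := visit(pkg)
		if err != nil {
			return nil, err
		}
		for len(waves) <= lvl {
			waves = append(waves, nil)
		}
		waves[lvl] = append(waves[lvl], pkg)
	}
	return waves, nil
}

func main() {
	// Hypothetical packages: main imports a and b; both import c.
	deps := map[string][]string{
		"main": {"a", "b"},
		"a":    {"c"},
		"b":    {"c"},
	}
	waves, err := buildWaves(deps)
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, w := range waves {
		fmt.Printf("wave %d: %v\n", i, w)
	}
}
```

Here `a` and `b` land in the same wave, so they could be compiled at the same time once `c` is done.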

Also note that the article claims this is "the single biggest reason" Go compilation is fast, not the only one. There are lots of smaller, yet important, reasons as well. For example, parsing the language is fairly straightforward because the grammar is not very complex, and the linker that puts the final binary together is continually being optimized. Plus there are no Turing-complete meta-language features like the templates C++ compilers have to deal with ;)

As for your follow-up question: the whole set of files in a package is the compilation unit, at least as far as I understand the terms. This is because if a.go and b.go are both in the same package (typically, the same directory), code in a.go can call code in b.go without explicitly declaring or importing anything. So before the code in a.go can be fully compiled into an object file, b.go must be considered as well.
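The standard `go/types` package, which type-checks a whole package at once, makes this visible. In the sketch below (the `demo` package, `aSrc`, `bSrc` and `checkPackage` are hypothetical), a.go calls `helper`, which is defined only in b.go: checking a.go on its own should fail with an error along the lines of `undefined: helper`, while checking both files together succeeds.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// a.go calls helper, which is declared only in b.go; no import or
// forward declaration connects the two files.
const aSrc = `package demo

func Run() int { return helper() + 1 }
`

const bSrc = `package demo

func helper() int { return 41 }
`

// checkPackage parses the given sources as files of one package and
// type-checks them together, the way the compiler treats a package.
func checkPackage(srcs map[string]string) error {
	fset := token.NewFileSet()
	var files []*ast.File
	for name, src := range srcs {
		f, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			return err
		}
		files = append(files, f)
	}
	var conf types.Config // no Importer needed: the sources import nothing
	_, err := conf.Check("demo", fset, files, nil)
	return err
}

func main() {
	fmt.Println("a.go alone:", checkPackage(map[string]string{"a.go": aSrc}))
	fmt.Println("a.go + b.go:", checkPackage(map[string]string{"a.go": aSrc, "b.go": bSrc}))
}
```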

kev kev

Nov 13, 2020, 8:18:08 PM
to golang-nuts

Thanks for the answer. If C/C++ has object files, is it not possible to see “something.h” and then fetch the corresponding object file?

With Go, if I import package "something" and that package imports another package called "bar", then at some point I will need to compile both "bar" and "something". To me this is like your header example.

I think you may be saying that in Golang this traversal is done only once and the result is stored in an object file, while in C the header traversal is done every time an include is encountered?

Robert Engels

Nov 13, 2020, 8:21:51 PM
to kev kev, golang-nuts
In C there are precompiled headers which avoid the recompilation. 

kev kev

Nov 13, 2020, 8:54:28 PM
to golang-nuts
Oh right, then I don't understand why Golang is faster in that respect, if you can include precompiled headers and/or have an object file.

Robert Engels

Nov 13, 2020, 8:58:14 PM
to kev kev, golang-nuts
I think a lot of it is due to the lack of macros. With macros it is difficult to figure out which dependencies have changed.

Aleksey Tulinov

Nov 14, 2020, 12:57:27 AM
to kev kev, golang-nuts
There is no direct relationship between headers and object files in C
or C++. The compilation process has two stages:

1. Source files are compiled into object files
2. Object files are linked together into an executable of some sort

(Actually it's a three-stage process, but I'm going to describe it
using two stages.)

The stages are isolated from each other and, to some extent, autonomous.
Object files are a kind of intermediate representation of the source code
that is later used to produce machine code. I'm sure someone can
correct me on this, but for the sake of simplicity, I think it's OK to
think of them as IR.

When a *compilation unit* is compiled into an object file, it is also
separated from other units. To compile it separately from other units,
all relevant source code has to be pulled into the current unit and
compiled. So `#include <something.h>` doesn't include just
something.h; it includes something.h, then all the headers that
something.h includes, and so on. This process is similar to
amalgamation of source code: everything is copied into one place and
then compiled as a single unit. After all units are compiled, they
might be combined into a static library, or joined together by a
linker into a dynamic library or an executable.
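The "amalgamation" step can be imitated with a toy preprocessor that pastes the text of each `#include "..."` in place. This is only an illustration under simplifying assumptions (the file contents and the `expand` helper are hypothetical, only quoted includes are handled, and there are no macros or search paths): it shows that big.h's declarations end up in both translation units and are therefore parsed once per unit.

```go
package main

import (
	"fmt"
	"strings"
)

// expand returns the text of the named file with every `#include "x"` line
// replaced, recursively, by the contents of x. The seen map is a crude
// stand-in for include guards: within one unit a header is pasted only once.
func expand(files map[string]string, name string, seen map[string]bool) string {
	if seen[name] {
		return ""
	}
	seen[name] = true
	text := strings.TrimRight(files[name], "\n")
	if text == "" {
		return ""
	}
	var out strings.Builder
	for _, line := range strings.Split(text, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, `#include "`) && strings.HasSuffix(trimmed, `"`) {
			inc := strings.TrimSuffix(strings.TrimPrefix(trimmed, `#include "`), `"`)
			out.WriteString(expand(files, inc, seen))
			continue
		}
		out.WriteString(line)
		out.WriteString("\n")
	}
	return out.String()
}

func main() {
	// Hypothetical sources: two units that both end up pulling in big.h.
	files := map[string]string{
		"big.h":       "int big1(void);\nint big2(void);\n",
		"something.h": "#include \"big.h\"\nint something(void);\n",
		"main.c":      "#include \"something.h\"\nint main(void) { return something(); }\n",
		"util.c":      "#include \"big.h\"\nint something(void) { return big1() + big2(); }\n",
	}
	// Each unit starts with a fresh seen map, so big.h is pasted (and
	// parsed) once per unit, not once per program.
	for _, unit := range []string{"main.c", "util.c"} {
		fmt.Printf("--- %s as the compiler sees it ---\n%s", unit, expand(files, unit, map[string]bool{}))
	}
}
```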

This is actually more sophisticated than that, but it does allow you
to do some cool stuff: you can compile your source code into
object files, then ship the object files and link them elsewhere. In fact,
static libraries are just a bunch of object files packed together, but
headers are still required because you need symbol names to refer to
at the source code level. That is why libraries are shipped with headers:
you compile with the headers and then link with the object files.

Since these are separate stages, you could, for instance, write your
own headers for 3rd-party object files; think open source headers for
the closed-source DirectX SDK.

This description is very superficial and doesn't cover a lot of what
is really going on. The process is very flexible and allows you to do all
kinds of things in various combinations. Alas, this process is also not
very fast and requires some costly steps; for example, you need to pull all
required source code into a single unit to compile it.

Modern C++ also uses a lot of templates. Even if you're not
writing templates yourself, you're going to use templates from the standard
library, and to use a template you need to transform (instantiate) it
into concrete code and then (simply put) compile the instantiated
template as regular non-templated source code. Because every
compilation unit is being "amalgamated", this process has to be
repeated for every unit, which also takes time.

There is such a thing as C++ modules, but they are quite new
(standardized about a month ago) and not yet widespread. I think they
should be more similar to Go *packages*, where source code files are
logically joined into a single entity, and for that entity another
intermediate representation is created, which is called a BMI (binary
module interface). It doesn't have to be binary, though, so it is
sometimes called a CMI (compiled module interface).

This CMI is basically a compiler cache: a package, or in C++ terms
a module interface, can be compiled once and then reused to compile
object files without recompiling the same source code for every unit.

Regarding how package compilation actually works in Go: this is an
interesting topic. I'm afraid I won't be able to explain it even
approximately correctly, and I would be glad to read about it too.

Jesper Louis Andersen

Nov 14, 2020, 11:05:36 AM
to kev kev, golang-nuts
The key point is that in C++, declarations in header files tend to be leaky, especially in larger projects. That is, your class includes some private declarations, but those are listed in the (public) header file. This happens transitively/recursively through your whole header file import hierarchy. It means a header file needs to pull in other header files just to satisfy declarations in the private section of a class, even though those declarations are not public-facing. This forces the compiler to parse more code for a given compilation, which slows it down: you still have to look at every byte in order to parse it, and you can only throw information away once you have analysed which parts the compilation unit actually uses. To boot, C++ is a rather complex language to parse, and you may need more than a single pass over the declarations to figure out what is going on.

It also means that changing one of the foundational header files in a project triggers large recompilations of everything that includes it, because anything might potentially have changed.

The traditional way of solving this has been to cache heavily and only recompile when needed. However, caching only helps you so much if declarations leak through header files all over the place.

In contrast, Go's design plugs this hole.
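A toy model of this argument: suppose a dependent must be rebuilt whenever the hash of what it can see of a dependency changes. If private declarations leak into that interface (as with private members in a C++ header), changing a private detail forces a rebuild; if the interface only carries exported declarations, it does not. The `Decl` type and the helpers are hypothetical, and this is not how the `go` command actually computes its cache keys; real Go export data can also include things like inlinable function bodies, so the real picture is less clean.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// Decl is a hypothetical, simplified declaration inside a dependency.
type Decl struct {
	Name     string
	Sig      string
	Exported bool
}

// interfaceHash hashes the part of a dependency that dependents can see.
// With leaky set, private declarations are visible too, modelling a C++
// header that lists a class's private members.
func interfaceHash(decls []Decl, leaky bool) string {
	var parts []string
	for _, d := range decls {
		if d.Exported || leaky {
			parts = append(parts, d.Name+" "+d.Sig)
		}
	}
	sort.Strings(parts) // declaration order should not matter
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	return fmt.Sprintf("%x", sum[:8])
}

// needsRebuild reports whether dependents must be recompiled after the
// dependency changed from before to after.
func needsRebuild(before, after []Decl, leaky bool) bool {
	return interfaceHash(before, leaky) != interfaceHash(after, leaky)
}

func main() {
	before := []Decl{
		{"Open", "func(name string) (*File, error)", true},
		{"bufSize", "const = 4096", false},
	}
	// Only a private implementation detail changes.
	after := []Decl{
		{"Open", "func(name string) (*File, error)", true},
		{"bufSize", "const = 8192", false},
	}
	fmt.Println("leaky interface, rebuild dependents:", needsRebuild(before, after, true))
	fmt.Println("exported-only interface, rebuild dependents:", needsRebuild(before, after, false))
}
```

Running it should print `true` for the leaky interface and `false` for the exported-only one.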

kev kev

Nov 15, 2020, 9:04:41 AM
to golang-nuts
Reading Aleksey's description, it does seem to make a bit more sense. C/C++ compilers use a "file" as the compilation unit. A file is converted into an object file, which must contain all of its dependencies. So the includes need to copy all of the code they import into the file being compiled.

In Golang, the object file is more of a black box which contains only the data that is needed. I'm assuming that "needed" relates mostly to type checking and symbol resolution.

It seems that one key difference is that Golang uses a package as the compilation unit, while C++ uses a file. If Golang also used files rather than packages as its unit, then it seems similar issues, or a significant decrease in performance, would be observed: for one thing you would have more object files, and there would be more dependencies between files.

Robert Engels

Nov 15, 2020, 11:19:55 AM
to kev kev, golang-nuts
Object files do not contain dependencies, except for code that the compiler inlines. They contain linkage references that are resolved during linking.

Aleksey Tulinov

Nov 15, 2020, 2:12:49 PM
to Robert Engels, kev kev, golang-nuts
Yeah, that's a good point. A C unit that is using f() won't
necessarily include f's implementation if it is defined somewhere
else. It may create a (weak?) reference to f() and leave the rest to
the linker. However to compile correctly it would *normally* include a
header where f() is declared to at least check that f() is accessed
using the correct interface, so f's declaration would normally be
pulled into a unit.
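Robert's and Aleksey's point can be modelled as a tiny linker: each object file lists the symbols it defines and the ones it only references, and linking succeeds when every reference resolves to a single definition. This is a hypothetical sketch (the `Object` type, `link` and the file names are made up); real linkers also deal with relocations, sections, libraries and symbol binding rules such as weak symbols.

```go
package main

import "fmt"

// Object is a toy model of an object file: the symbols it defines and
// the symbols it references but leaves for the linker to resolve.
type Object struct {
	Name    string
	Defines []string
	Uses    []string
}

// link resolves every referenced symbol against the definitions in objs,
// reporting duplicate definitions and unresolved references.
func link(objs []Object) error {
	defs := map[string]string{} // symbol -> defining object
	for _, o := range objs {
		for _, s := range o.Defines {
			if prev, ok := defs[s]; ok {
				return fmt.Errorf("duplicate symbol %q in %s and %s", s, prev, o.Name)
			}
			defs[s] = o.Name
		}
	}
	for _, o := range objs {
		for _, s := range o.Uses {
			if _, ok := defs[s]; !ok {
				return fmt.Errorf("%s: undefined reference to %q", o.Name, s)
			}
		}
	}
	return nil
}

func main() {
	// main.c calls f() but only saw its declaration (from a header), so
	// main.o records a reference to f instead of containing f's code.
	mainObj := Object{Name: "main.o", Defines: []string{"main"}, Uses: []string{"f"}}
	fObj := Object{Name: "f.o", Defines: []string{"f"}}

	fmt.Println(link([]Object{mainObj}))       // fails: f is unresolved
	fmt.Println(link([]Object{mainObj, fObj})) // <nil>
}
```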

In C++ what to include into a unit and what not to include is a bit
more complicated especially if templates are involved (which is
usually the case), but perhaps such details are out of the scope of
this mailing list.

But yeah, I didn't realize that my email might be misleading in that
regard, sorry about that.

Haddock

Nov 17, 2020, 11:08:37 AM
to golang-nuts
Here is an article by Walter Bright that explains what a C++ compiler has to do: https://www.digitalmars.com/articles/b54.html
Walter Bright is the creator of the first native C++ compiler (if I remember right, this was the Zortech C++ compiler, later acquired by Symantec).