Created a Lexer, Parser and AST for a compiler front-end in Go. How can it be compiled to a binary?

1,983 views
Skip to first unread message

Rob Thornton

unread,
Aug 5, 2013, 12:44:56 AM8/5/13
to golan...@googlegroups.com
As part of continuing to grow as a programmer, I decided it was high time I wrote my own toy language compiler. I have completed a working lexer (I really enjoyed that part) and a parser which creates an AST. I can evaluate the generated AST to execute simple expressions like basic math and assignment. It was pretty amazing to get this far and create an interpreter but I really would like to take it to what I see as the final step. After much searching I have found there tends to be only one resolution and that's to output an intermediate language.  Some languages are converted to bytecode for a VM and others have the AST passed directly into the compiler back-end like in GCC'S and LLVM's (or is it LLVM IR?) case. One other solution is to output another high-level language like C which would in tern be compiled by a C compiler but seems abysmally complicated. From what I can gather, to create a front-end for GCC I would need to use C or C++ and for LLVM C++ so it doesn't appear I could use my Go front-end but, again, I could be mistaken.

So, I have a few questions I'm hoping someone would be willing to answer for me:

1) What would be the simplest way to produce an executable from the state I'm at? Is it best to attempt to output Go code or some other intermediate language to be compiled by a secondary compiler? Thereby the 'compiler' would in effect be a script which executes a two-stage process: i) language -> intermediate language, ii) intermediate language -> executable?

2) After looking at Go's sources, it seems that the GC generates assembler instructions. Would it not need an external assembler to compile the assembly to binary? What stage(s) am I missing to produce the final executable? Obviously there's the linking stage but there must be more I'm missing.

Rob Pike

unread,
Aug 5, 2013, 12:54:18 AM8/5/13
to Rob Thornton, golan...@googlegroups.com
You could just generate the instructions yourself, without going through an intermediate language. It's remarkably easy to do, at least in principle.

-rob

bronze man

unread,
Aug 5, 2013, 1:06:04 AM8/5/13
to golan...@googlegroups.com
A compiler is used to transform one language to another language.

For example Golang on amd64 windows:
6g transform golang source code to assembly language on amd64, 
6s transform assembly language to machine code on amd64, 
6l put those machine code with some library into a exe file.

I think you can transform your AST to assembly language,then use some tools in GCC to build binary.
I think you can also transform your AST to LLVM IR,then use LLVM to build binary.

quarnster

unread,
Aug 5, 2013, 1:09:20 AM8/5/13
to golan...@googlegroups.com
LLVM's got a C interface too: http://llvm.org/docs/doxygen/html/group__LLVMC.html, which in turn has a Go interface via https://github.com/axw/gollvm.

Going from an AST to any other language (including directly to the binary opcode) is pretty much the same no matter what the language is in my mind. If you are looking for the simplest and quickest route, make your compiler output code in a language you know and make use of that language's compiler. 

/f

Carlos Castillo

unread,
Aug 5, 2013, 5:22:29 AM8/5/13
to golan...@googlegroups.com
You could make your interpreter handle extra data at the end of the binary (if any) as the input script. Then "compiling" could essentially be:

cat interpreter script.txt > app ; chmod +x app

If you want to save yourself some start up time, you could gob-encode your ast to bypass needing to run your lexer/parser.

IE - make a -compile option that:
  1. Parses/Checks script into AST
  2. Open output file
  3. Copy interpreter to the file (use http://godoc.org/bitbucket.org/kardianos/osext to find running binary in safe manner)
  4. Gob-Encode AST to same output file
  5. Close file & Change permissions of file to mark as executable
To find the end of the executable data you can use the standard library debug/{pe,elf,macho} packages. An example on how to do so can be found at (https://github.com/cookieo9/resources-go/blob/master/v2/resources/zipexe.go). It finds workable zip files at the end of an executable, and potentially within it as well (although that behaviour is untested). You would only need the code that finds the end of the executable.

Or you could output to a language you do know... but that seems so 4 hours ago ;-)

André Moraes

unread,
Aug 5, 2013, 7:47:11 AM8/5/13
to Rob Thornton, golan...@googlegroups.com
This could give you some help regarding code generation

https://www.coursera.org/course/compilers

--
André Moraes
http://amoraes.info

Rob Thornton

unread,
Aug 5, 2013, 2:24:26 PM8/5/13
to golan...@googlegroups.com
I would, ultimately, like to generate the instructions myself as @Rob Pike suggests (though the 'in theory' sounds ominous. LOL!). I wasn't aware of the C interface, and thereby the Go LLVM library so that might be a very good option for me, too. Thanks @quarnster! I think perhaps before I go any further more knowledge is required. Thank you @André Moraes for the link to the Coursera page! I didn't know you could preview the course (watch all the lectures but without assignments)! I'm really looking forward to viewing the course! 

Also, thanks @bronze man for pointing out the 3-stage process. That makes a lot more sense now.

Thanks everyone for your replies!

Robin Eklind

unread,
Aug 6, 2013, 5:19:19 AM8/6/13
to golan...@googlegroups.com
Nice to see interest in learning how to write compilers!

llgo is a project which uses the standard library to produce an AST and transform it to LLVM IR using gollvm.

The projects blog is located at: http://blog.awilkins.id.au/
The github repository is located at: https://github.com/axw/llgo

Good luck and have fun :)

jonathan....@gmail.com

unread,
Aug 6, 2013, 11:10:15 AM8/6/13
to golan...@googlegroups.com
If you use gollvm to produce LLVM bitcode, you can also run the bitcode directly in the LLVM JIT compiler/interpreter, lli, which could be useful for testing and debugging your language. The speed is quite reasonable; I've tested some extremely simple llgo programs with lli and they were at least half as fast as equivalent programs compiled with the Go compiler. 

Archos

unread,
Sep 8, 2013, 7:32:43 AM9/8/13
to golan...@googlegroups.com
Hi Rob,

I need to build a Go's compiler to be called/used from Rust although I've no decided whether to write it in C or to use Rust.

Does your compiler is open source? I would like to read it to take inspiration.
Which language did you use to build it?

Rob Thornton

unread,
Sep 10, 2013, 5:09:50 PM9/10/13
to Archos, golan...@googlegroups.com
To be honest, the Go sources would probably stand as a better example than my own code (see: http://golang.org/pkg/go/).  My own toy language was based around LISP S-Expressions akin to http://godoc.org/launchpad.net/twik by Gustavo Niemeyer. I can put my code up on github if you like but I promise you'll learn more from these other two resources than my own code as I'm still a novice/student at compiler design.


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/QMK8uAOm_D0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Archos

unread,
Sep 10, 2013, 6:03:38 PM9/10/13
to golan...@googlegroups.com
Thanks! Yesterday, I started to code based in "http://golang.org/pkg/go/". I prefer to build a lexer and parser hand-written for better performance and because I get more control than using a parser-generator.
Kudos for the author/s of those packages.
Reply all
Reply to author
Forward
0 new messages