Good example of hand-built lexer and use of yacc together?

2,071 views
Skip to first unread message

Zellyn

unread,
Nov 7, 2014, 4:02:08 PM11/7/14
to golan...@googlegroups.com
Hi folks,

I'm (slowly) working my way through the Coursera compilers course, writing a COOL compiler in go.

I watched the excellent Lexical Scanning in Go video, then modified http://golang.org/src/pkg/text/template/parse/lex.go until I can successfully lex go.

Now, I'd like to do the yacc step.

I was wondering if anyone knew of a go yacc example with the following characteristics:
  • uses a hand-build lexer in the style from the talk and/or text/template
  • has the lexer in its own file (not in the .y file)
  • is fairly simple
Failing that, a quick sanity check on code organization would help:

I'm thinking that the yacc-output file has to be the one to define integer values for tokens. So I'm thinking I'll put the lexer interface definition in one file, so the yacc file can reference it, then generate the yacc output, and then have the hand-built lexer reference the term values defined there. It means I can't build the lexer without having yacc output, but for just testing lexing, I can create a mostly empty grammar that will at least create item numbering. Does that sound sane?

Thanks, and sorry if that was confused/confusing: ask me questions :-)

Zellyn

Nick Craig-Wood

unread,
Nov 8, 2014, 3:17:25 AM11/8/14
to Zellyn, golan...@googlegroups.com
On 07/11/14 21:02, Zellyn wrote:
> I was wondering if anyone knew of a go yacc example with the following
> characteristics:
>
> * uses a hand-build lexer in the style from the talk and/or text/template
> * has the lexer in its own file (not in the .y file)
> * is fairly simple

Funnily enough I wrote one last week which satisfies the above three
points (maybe a bit weak on the last!). I could send it to you off list
if you want to see (not ready for release yet).

I've used bison before (a long time ago) but this was my first time
through with "go tool yacc".

I decided to hand build the lexer as it was the go way, and it turned
out to be not that difficult, and a lot easier that fighting with flex
would have been.

> Failing that, a quick sanity check on code organization would help:
>
> I'm thinking that the yacc-output file has to be the one to define
> integer values for tokens.

Correct.

> So I'm thinking I'll put the lexer interface
> definition in one file, so the yacc file can reference it, then generate
> the yacc output, and then have the hand-built lexer reference the term
> values defined there.

That is exactly what I did. I ended up with lexer.go and grammar.y
files. The grammar.y file built a y.go file with the token definitions in.

> It means I can't build the lexer without having
> yacc output, but for just testing lexing, I can create a mostly empty
> grammar that will at least create item numbering. Does that sound sane?

That sounds perfect. I build the grammar first then the lexer when I
did it, but just defining the %token will define the symbols I think.

Documentation on yacc or bison is helpful for working out what go tool
yacc can do. If you look at the source code you can see exactly which
%operations are defined.

The hardest part for me was killing all the shift/reduce and
reduce/reduce errors. That is standard yacc stuff though, not go
specific. At this point it is helpful to have used the -v flag to
generate the parsing tables as you can't really understand the debug
output of y.go withouth them.

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

gerald...@gmail.com

unread,
Nov 8, 2014, 8:03:41 AM11/8/14
to golan...@googlegroups.com
Hi,

Im implementing a toy compiler which uses Nex (https://crypto.stanford.edu/~blynn/nex/) and Yacc:

Nex is a lexer generator which generates go code. I just mention it in case if you don't want to write the lexer per hand...

@Nick Craig-Wood: is your compiler open source?

-Gerald

Zellyn Hunter

unread,
Nov 8, 2014, 12:36:27 PM11/8/14
to Nick Craig-Wood, golan...@googlegroups.com
Thanks for the input: I also spent a little time last night splitting up https://github.com/golang-samples/yacc/tree/master/simple to match my proposed organization, and everything worked fine. With your validation, I'm good to go! :-)

Zellyn

Zellyn Hunter

unread,
Nov 8, 2014, 12:41:26 PM11/8/14
to gerald...@gmail.com, golan...@googlegroups.com
I ran across Nex in my search for examples, but the lexer is already done :-)

Also, I wanted to try building the lexer by hand. It was quite simple.

Zellyn

--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/FFhwVimZYb4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Zellyn Hunter

unread,
Nov 8, 2014, 1:51:02 PM11/8/14
to Nick Craig-Wood, golan...@googlegroups.com
FWIW, here is the commit switching from explicitly defined item values to yacc-defined terms.

I also changed single-rune punctuation items to just return the item: I may replace some of them with actual terms later on. We'll see.

Unfortunately, I accidentally included deleting a big comment (the old text/template lexer code) so the change looks bigger than it should.

Zellyn

gerald...@gmail.com

unread,
Nov 8, 2014, 2:08:20 PM11/8/14
to golan...@googlegroups.com, ni...@craig-wood.com
nice :) take a look on my github, I a wrote a function to plot the AST using doxygen, which is pretty helpful...

Nick Craig-Wood

unread,
Nov 10, 2014, 6:02:46 AM11/10/14
to Zellyn Hunter, golan...@googlegroups.com
I would say you are on the right track! I used a traditional state
machine to do the lexing rather than a go routine. The go routine is a
very interesting idea - I'd be worried about leaking it though on syntax
errors.

I finally found a really good example of go yacc written by Russ Cox.
It parses the C language.

https://code.google.com/p/rsc/source/browse/#hg%2Fcc

Which shows some more advanced yacc usage (eg $<val>$) and also how to
return something from the parser without a global variable which I
didn't figure out until I saw the above (answer define the output in the
lexer structure and assign to it with a type assertion
`yylex.(*lexer).prog = ...`
https://code.google.com/p/rsc/source/browse/cc/cc.y#186 )

On 08/11/14 18:50, Zellyn Hunter wrote:
> FWIW, here is the commit switching from explicitly defined item values
> to yacc-defined terms.
>
> I also changed single-rune punctuation items to just return the item: I
> may replace some of them with actual terms later on. We'll see.
>
> Unfortunately, I accidentally included deleting a big comment (the old
> text/template lexer code) so the change looks bigger than it should.
> https://github.com/zellyn/gocool/commit/813110894c6f62ec7efcc4516c9153071d08018b
>
> Zellyn
>
> On Sat Nov 08 2014 at 9:36:15 AM Zellyn Hunter <zel...@gmail.com
> <mailto:zel...@gmail.com>> wrote:
>
> Thanks for the input: I also spent a little time last night
> splitting
> up https://github.com/golang-samples/yacc/tree/master/simple to
> match my proposed organization, and everything worked fine. With
> your validation, I'm good to go! :-)

Nick Craig-Wood

unread,
Nov 10, 2014, 6:12:22 AM11/10/14
to gerald...@gmail.com, golan...@googlegroups.com
On 08/11/14 13:03, gerald...@gmail.com wrote:
> Im implementing a toy compiler which uses Nex
> (https://crypto.stanford.edu/~blynn/nex/) and Yacc:
> https://github.com/geraldstanje/toycompiler

Very nice! I like your approach to AST walking with the BasicNode
keeping a list of children separate to the actual node data - I might
borrow that idea!

> Nex is a lexer generator which generates go code. I just mention it in
> case if you don't want to write the lexer per hand...

I did look at Nex. I decided that I'd probably be fighting with it as
the language I'm using has a very complicated lexer. Your example in
your compiler looks amazing though.

> @Nick Craig-Wood: is your compiler open source?

After it works a bit better yes!

Zellyn

unread,
Nov 10, 2014, 2:34:07 PM11/10/14
to golan...@googlegroups.com, zel...@gmail.com
On Monday, November 10, 2014 3:02:46 AM UTC-8, Nick Craig-Wood wrote:
I finally found a really good example of go yacc written by Russ Cox.
It parses the C language.

https://code.google.com/p/rsc/source/browse/#hg%2Fcc

Which shows some more advanced yacc usage (eg $<val>$) and also how to
return something from the parser without a global variable which I
didn't figure out until I saw the above (answer define the output in the
lexer structure and assign to it with a type assertion
`yylex.(*lexer).prog = ...`
https://code.google.com/p/rsc/source/browse/cc/cc.y#186 )

Ah, fantastic: thanks for that link! :-) 
Reply all
Reply to author
Forward
0 new messages