Why there is not a lexer generator like lex in go?

2,856 views
Skip to first unread message

Jingguo Yao

unread,
Dec 28, 2015, 1:09:46 PM12/28/15
to golang-nuts
Go has [yacc](https://golang.org/cmd/yacc/) which is a parser generator. Go's yacc is adapted from Plan 9 [yacc](http://plan9.bell-labs.com/magic/man2html/1/yacc). But Go does have not a counterpart for Plan 9 [lex](http://plan9.bell-labs.com/magic/man2html/1/lex) which is a parser generator. Why such a counterpart is left out? 




Ian Lance Taylor

unread,
Dec 28, 2015, 1:35:51 PM12/28/15
to Jingguo Yao, golang-nuts
Until quite recently, yacc was needed to generate the parser that the
compiler uses. Now that it is no longer needed, it may move to
golang.org/x. lex was never needed, so it was never added.

In practice parser generates are more useful than lexer generators. A
hand-written lexer is usually fairly easy to write and generally
outperforms a generated lexer, at least with the lexer generators that
I've seen.

Ian

Jan Mercl

unread,
Dec 28, 2015, 2:18:14 PM12/28/15
to Jingguo Yao, golang-nuts
> On Mon, Dec 28, 2015 at 7:09 PM Jingguo Yao <yaoji...@gmail.com> wrote:
>
> Go has [yacc](https://golang.org/cmd/yacc/) which is a parser generator. Go's yacc is adapted from Plan 9 [yacc](http://plan9.bell-labs.com/magic/man2html/1/yacc). But Go does have not a counterpart for Plan 9 [lex](http://plan9.bell-labs.com/magic/man2html/1/lex) which is a parser generator. Why such a counterpart is left out?

You might want to have a look at golex[0].


--

-j

Manlio Perillo

unread,
Dec 28, 2015, 2:39:11 PM12/28/15
to golang-nuts, yaoji...@gmail.com
Il giorno lunedì 28 dicembre 2015 19:35:51 UTC+1, Ian Lance Taylor ha scritto:
On Mon, Dec 28, 2015 at 12:05 AM, Jingguo Yao <yaoji...@gmail.com> wrote:
> [...]

In practice parser generates are more useful than lexer generators.  A
hand-written lexer is usually fairly easy to write and generally
outperforms a generated lexer, at least with the lexer generators that
I've seen.


With PEG (Parser Expression Grammar) you have both a parser and a lexer.


Regards  Manlio Perillo

Jan Mercl

unread,
Dec 28, 2015, 2:44:09 PM12/28/15
to Manlio Perillo, golang-nuts, yaoji...@gmail.com

> On Mon, Dec 28, 2015 at 8:39 PM Manlio Perillo <manlio....@gmail.com> wrote:
>
> With PEG (Parser Expression Grammar) you have both a parser and a lexer.

And for some grammars also O^n runtime performance.

--

-j

Jingguo Yao

unread,
Dec 29, 2015, 3:13:47 AM12/29/15
to golang-nuts, yaoji...@gmail.com
Ian:

Thanks. Your reply clears my mind.

Zhongwei Yao

unread,
Dec 30, 2015, 10:04:18 PM12/30/15
to Jingguo Yao, golang-nuts
Hi, 
package "cmd/asm/internal/lex" Tokenizer is implemented by wrapping "text/scanner.Scanner". Maybe the Scanner is what you are looking for?

--
Best regards,
Zhongwei

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Best regards,
Zhongwei

Peter Kleiweg

unread,
Jan 2, 2016, 10:58:40 AM1/2/16
to golang-nuts, yaoji...@gmail.com
Op maandag 28 december 2015 19:35:51 UTC+1 schreef Ian Lance Taylor:

In practice parser generates are more useful than lexer generators.  A
hand-written lexer is usually fairly easy to write and generally
outperforms a generated lexer, at least with the lexer generators that
I've seen.

I have never had the need for a tool like yacc, but I have used flex quite a lot. I would really like a port from flex to Go. There are other tools, but none as fast and flexible and easy to use as flex.

Flex has the combination of regular expression parsing across line boundaries with the use of start conditions. With these, you can build fast and powerful automata. Add a stack, and you have a push-down automaton. Add a queue, and you can parse languages you can't parse with yacc. 

These things are not easy to write without a lexer.


Michael Jones

unread,
Jan 2, 2016, 11:02:41 AM1/2/16
to Peter Kleiweg, yaoji...@gmail.com, golang-nuts

Great reasons to port flex, and a good set of example programs too!

Jan Mercl

unread,
Jan 2, 2016, 11:16:18 AM1/2/16
to Peter Kleiweg, golang-nuts, yaoji...@gmail.com


On Sat, Jan 2, 2016, 16:58 Peter Kleiweg <pkle...@xs4all.nl> wrote:
Op maandag 28 december 2015 19:35:51 UTC+1 schreef Ian Lance Taylor:

In practice parser generates are more useful than lexer generators.  A
hand-written lexer is usually fairly easy to write and generally
outperforms a generated lexer, at least with the lexer generators that
I've seen.

I have never had the need for a tool like yacc, but I have used flex quite a lot. I would really like a port from flex to Go. 

Please let me know which features of flex do you miss in golex.


--

-j

Damian Gryski

unread,
Jan 2, 2016, 1:10:29 PM1/2/16
to golang-nuts
I've had success using http://www.colm.net/open-source/ragel/ to generate lexers, and you can use them with yacc.

Damian

Peter Kleiweg

unread,
Jan 2, 2016, 2:34:47 PM1/2/16
to golang-nuts, pkle...@xs4all.nl, yaoji...@gmail.com
Op zaterdag 2 januari 2016 17:16:18 UTC+1 schreef Jan Mercl:



On Sat, Jan 2, 2016, 16:58 Peter Kleiweg <pkle...@xs4all.nl> wrote:
Op maandag 28 december 2015 19:35:51 UTC+1 schreef Ian Lance Taylor:

In practice parser generates are more useful than lexer generators.  A
hand-written lexer is usually fairly easy to write and generally
outperforms a generated lexer, at least with the lexer generators that
I've seen.

I have never had the need for a tool like yacc, but I have used flex quite a lot. I would really like a port from flex to Go. 

Please let me know which features of flex do you miss in golex.

Rules with the longest match should be selected.

Flex uses look-up tables for speed.

Does it support commands like BEGIN, REJECT, yymore(), yyless()? I haven't tested this.

I tried converting a simple lexer from flex to golex, using the example as a guide. It is a very simple thing, no BEGIN or REJECT, but I can't get it to work as it should. Sometimes I get the wrong output, sometimes I get a scanner error.

Peter Kleiweg

unread,
Jan 3, 2016, 12:53:51 PM1/3/16
to golang-nuts
Op zaterdag 2 januari 2016 19:10:29 UTC+1 schreef Damian Gryski:

I've had success using http://www.colm.net/open-source/ragel/ to generate lexers, and you can use them with yacc.

Ragel is one of those other tools that is not as easy to use as flex. It looks promising, but the manual is not simple. I would really like to see examples that demonstrate each and every concept in Ragel.

Jan Mercl

unread,
Jan 3, 2016, 4:46:13 PM1/3/16
to Peter Kleiweg, golang-nuts, yaoji...@gmail.com
On Sat, Jan 2, 2016 at 8:35 PM Peter Kleiweg <pkle...@xs4all.nl> wrote:

Rules with the longest match should be selected.

[0] documents the mechanism providing longest match. It is also the default behavior of lex.Lexer[1] when Mark[2] and Abort[3] are used.
 
Flex uses look-up tables for speed.

Golex for speed uses code generation that switches on the current DFA state and the current look-ahead character/character class. The compiler may implement such switch using a lookup table so the performance is perhaps comparable.
 
Does it support commands like BEGIN, REJECT, yymore(), yyless()? I haven't tested this.

I think BEGIN is a preprocessor macro(?). The golex equivalent is to set some variable/struct field to the start condition number and provide access to that variable/struct field via the %yyt directive.

REJECT is not supported. It could be supported, but it would not be trivial to add.

yy{more,less} are not directly supported. Adding support for those two functions to [1] seems easy to me.

Feel free to fill issues about the above if you could you those features with golex.
 
I tried converting a simple lexer from flex to golex, using the example as a guide. It is a very simple thing, no BEGIN or REJECT, but I can't get it to work as it should. Sometimes I get the wrong output, sometimes I get a scanner error.

Is the flex lexer, you've tried to convert, available for me to look at? I could then give it a try and would I succeed, it may become a useful example to add to the documentation.


--

-j

Peter Kleiweg

unread,
Jan 4, 2016, 5:27:46 AM1/4/16
to yaoji...@gmail.com, golan...@googlegroups.com
Jan Mercl schreef op de 3e dag van de louwmaand van het jaar 2016:

> > I tried converting a simple lexer from flex to golex, using the example as
> > a guide. It is a very simple thing, no BEGIN or REJECT, but I can't get it
> > to work as it should. Sometimes I get the wrong output, sometimes I get a
> > scanner error.
> >
>
> Is the flex lexer, you've tried to convert, available for me to look at? I
> could then give it a try and would I succeed, it may become a useful
> example to add to the documentation.

Attached.

Running demo-go and demo-c on file 'in' should give the same
result. Running it twice should give back the original 'in'.

--
Peter Kleiweg
http://pebbe.github.io/
Makefile
demo-c.l
demo-go.l
in

Jan Mercl

unread,
Jan 4, 2016, 11:29:42 AM1/4/16
to Peter Kleiweg, golan...@googlegroups.com

On Mon, Jan 4, 2016 at 11:27 AM Peter Kleiweg <pkle...@xs4all.nl> wrote:

> Running demo-go and demo-c on file 'in' should give the same
result.

Thanks for the sources. Attached version of the .l code has some modifications and seems to work. Please test.

--

-j

demo-go.l

Peter Kleiweg

unread,
Jan 4, 2016, 1:06:20 PM1/4/16
to golang-nuts, pkle...@xs4all.nl
Op maandag 4 januari 2016 17:29:42 UTC+1 schreef Jan Mercl:
Yes, it works.

Those are quite some modifications. I wouldn't know why to do it that way. 

Jan Mercl

unread,
Jan 4, 2016, 1:47:30 PM1/4/16
to Peter Kleiweg, golang-nuts
On Mon, Jan 4, 2016 at 7:06 PM Peter Kleiweg <pkle...@xs4all.nl> wrote:

> Those are quite some modifications. I wouldn't know why to do it that way. 

You're welcome to suggest where you see particular room for improvements of the godocs at [0] as well as pointing me to where you miss important information. Thanks in advance.


--

-j

Peter Kleiweg

unread,
Jan 5, 2016, 3:25:17 PM1/5/16
to golang-nuts, pkle...@xs4all.nl
Op maandag 4 januari 2016 19:47:30 UTC+1 schreef Jan Mercl:
> On Mon, Jan 4, 2016 at 7:06 PM Peter Kleiweg <pkle...@xs4all.nl> wrote:
>
> > Those are quite some modifications. I wouldn't know why to do it that way. 
>
>
> You're welcome to suggest where you see particular room for improvements of the godocs at [0] as well as pointing me to where you miss important information. Thanks in advance.

Do you care to explain why my version didn't work?

Jan Mercl

unread,
Jan 5, 2016, 5:12:52 PM1/5/16
to Peter Kleiweg, golang-nuts
On Tue, Jan 5, 2016 at 9:25 PM Peter Kleiweg <pkle...@xs4all.nl> wrote:

> Do you care to explain why my version didn't work?

One issue is the rule `['b-df-hj-np-tv-xz]{2}`. Golex does not support the `{2}` part, ie. it interprets it literally. It's because I didn't yet needed it and honestly your program is the first time I see it used in an .l source code. Not your fault of course. Please fill an issue[0] about it if you would like golex to support this.

The second issue is the longest match property. Your version of demo-go.l does not have that property because a golex generated lexer per se does not implement that. It's documented at [1]:

""""
- The generated FSM picks the rules in the order of their appearance in the .l source, but "flex picks the rule that matches the most text".
""""

It's documented how to implement the longest match[2]. I "implanted" the code in [2] in your demo-go.l and it seems to work. Please test the attached code.

The version I posted yesterday is how I would write the code. Using the golex/lex[3] package makes it all simpler/shorter, but admittedly at some performance cost. The today's version does basically the same without relying on the golex/lex package.

Hope this helps.



--

-j

demo-go.l

Damian Gryski

unread,
Jan 6, 2016, 5:04:57 AM1/6/16
to golang-nuts
Ragel is pretty complex, but I've used it successfully for a few simple things.
   1) beginnings of a toy pascal-like compiler https://github.com/dgryski/dpc  <-- also has yacc integration
   3) playing around with matching domains in an input stream https://github.com/dgryski/trifles/tree/master/matcher

Damian
Reply all
Reply to author
Forward
0 new messages