[erlang-questions] Better error messages would be helpful

Richard A. O'Keefe

May 15, 2013, 11:04:23 PM
to Erlang-Questions Questions
I'm in the middle of writing to someone who asked for help with
some Erlang code. With identifiers replaced, he had

21: f(A, B, C) ->
22: D = lists:map(fun (E) ->
23: case proplist:get_value(E, B, error) of
24: {F} -> {E,F};
25: error -> {error}
26: end, A).

erlc reported

./foobar.erl:26: syntax error before: ')'
./foobar.erl:16: function f/3 undefined

When there are error messages that prevent any function from being processed,
it would be helpful if erlc did *NOT* complain about undefined, because that
message is untrue. The function *was* defined, the compiler just gave up on
it.

But the message that really had him confused was the "syntax error before ')'"
one. This is the kind of error message one typically gets from something
using Yacc-style parsing.

I didn't like his layout, so I fed his file to an Erlang pretty-printer I
wrote a couple of years ago. The code is fast but rather horrible C, so I do
not care to share it. It was thrown together as an experiment, with an
eye to using it in a text editor. What did that pile of hacks say?

Line 26: 'fun' at line 22 closed by ')' (expected 'end').

So a better error message is certainly possible.

This is a pretty-printer, not a parser. Almost the only syntax checking it
does is to keep track of brackets. But bracket errors are common enough
that it's worth while.

I don't like the idea of slowing erlc down, but I wonder if decent bracket
matching checks could be folded in with macro expansion without too much
trouble?




_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Anthony Ramine

May 16, 2013, 4:30:46 AM
to Richard A. O'Keefe, Erlang-Questions Questions
Hello Richard,

To silence the undefined error, the parser would need to know that it had indeed given up on the function; that is not so straightforward.

Making the syntax error more useful would also require either heavy yecc surgery or rewriting erl_parse as a handwritten recursive-descent parser, which is something I've always wanted to do.

For bracket pairs, we can indeed use epp, as it already needs to track them for macro arguments.

Otherwise we can still improve the situation quite a lot with just column numbers, when they get merged someday.

Regards,

--
Anthony Ramine

Richard A. O'Keefe

May 16, 2013, 9:08:06 PM
to Anthony Ramine, Erlang-Questions Questions

On 16/05/2013, at 8:30 PM, Anthony Ramine wrote:

> Hello Richard,
>
> To silence the undefined error, the parser would need to know that it had indeed given up on the function; that is not so straightforward.
>
> Making the syntax error more useful would also require either heavy yecc surgery or rewriting erl_parse as a handwritten recursive-descent parser, which is something I've always wanted to do.

It all depends on the structure of the compiler.
For example, I have an AWK->Java compiler written in C++ as an exercise;
it needs some work on the symbol table and the runtime needs finishing,
but then it will be open-sourced.

There are three phases, not entirely unlike Erlang.
(1) Lexical analysis builds a sequence of token records in memory.
(2) Parsing recycles those tokens into an AST, resolves variables,
and does some very crude type inference.
(3) Code generation emits Java.

If a lexical error occurs, no parsing is done.
And brackets are checked during lexical analysis:
the lexical analyser only needs to keep a simple stack
of what brackets are open.

My crude Erlang pretty-printer plays the same check-brackets-during-
lexical-analysis-with-a-straightforward-stack trick.
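
The bracket check described above can be sketched in Erlang itself. This is only an illustration under stated assumptions; it is neither the C pretty-printer nor anything that exists in erlc. The module name bracket_check and all of its functions are hypothetical. The sketch assumes the default erl_scan annotations (plain line numbers) and handles expression code only: a fun(...) type inside a -spec has no 'end' and would be misreported.

```erlang
-module(bracket_check).
-export([check/1, check/2]).

%% Hypothetical helper: scan Source (a string) with erl_scan and verify
%% that every opener is closed by the matching closer.
%% Returns ok or {error, Message}.
check(Source) ->
    check(Source, 1).

%% StartLine lets the reported line numbers match the original file.
check(Source, StartLine) ->
    {ok, Tokens, _End} = erl_scan:string(Source, StartLine),
    walk(Tokens, []).

%% walk(Tokens, Stack): Stack holds {Opener, Line} for each open construct.
walk([], []) ->
    ok;
walk([], [{Open, Line} | _]) ->
    {error, fmt("'~s' at line ~p is never closed (expected '~s').",
                [Open, Line, closer(Open)])};
walk([{'fun', L}, {'(', _} = Paren | Rest], Stack) ->
    %% Only an anonymous fun opens a block; fun f/1 and fun m:f/1 have no 'end'.
    walk([Paren | Rest], [{'fun', L} | Stack]);
walk([{'fun', L}, {var, _, _}, {'(', _} = Paren | Rest], Stack) ->
    %% Named fun: fun Name(Args) -> ... end.
    walk([Paren | Rest], [{'fun', L} | Stack]);
walk([{Tok, L} | Rest], Stack) ->
    case kind(Tok) of
        open  -> walk(Rest, [{Tok, L} | Stack]);
        close -> close(Tok, L, Rest, Stack);
        other -> walk(Rest, Stack)
    end;
walk([_ | Rest], Stack) ->
    %% Tokens that carry a value (atoms, variables, literals) are never brackets.
    walk(Rest, Stack).

%% A closer must match the most recently opened construct.
close(Tok, L, _Rest, []) ->
    {error, fmt("Line ~p: '~s' has no matching opener.", [L, Tok])};
close(Tok, L, Rest, [{Open, OpenL} | Stack]) ->
    case closer(Open) of
        Tok ->
            walk(Rest, Stack);
        Expected ->
            {error, fmt("Line ~p: '~s' at line ~p closed by '~s' (expected '~s').",
                        [L, Open, OpenL, Tok, Expected])}
    end.

kind(T) when T =:= '('; T =:= '['; T =:= '{'; T =:= '<<';
             T =:= 'case'; T =:= 'if'; T =:= 'receive';
             T =:= 'try'; T =:= 'begin' -> open;
kind(T) when T =:= ')'; T =:= ']'; T =:= '}'; T =:= '>>';
             T =:= 'end' -> close;
kind(_) -> other.

closer('(')  -> ')';
closer('[')  -> ']';
closer('{')  -> '}';
closer('<<') -> '>>';
closer(_Keyword) -> 'end'.

fmt(Format, Args) ->
    lists:flatten(io_lib:format(Format, Args)).
```

Given the six lines of f/3 quoted at the start of the thread (without the "21:" style prefixes) and a start line of 21, bracket_check:check(Source, 21) should return {error, "Line 26: 'fun' at line 22 closed by ')' (expected 'end')."}, which is the kind of message the pretty-printer produced.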

In the days when compiling meant submitting a deck of punched cards in the
morning and getting a listing in the late afternoon, it was important for
a compiler to check as much as it could.

These days, it is more important that a compiler's messages should not be
misleading, and parse errors following lexical errors (including bracket
errors as lexical errors) are generally pretty misleading.

> Otherwise we can still improve the situation quite a lot with just column numbers, when they get merged someday.

In this specific case, it would not have helped. It was perfectly clear
exactly where the parser ran into trouble. It just wasn't clear _why_ it
ran into trouble.

Part of the problem is precisely the fact that the parser is written in Yecc.

Anthony Ramine

May 17, 2013, 4:27:28 AM
to Richard A. O'Keefe, Erlang-Questions Questions
While I absolutely buy your argument about checking brackets during lexical analysis, I'm not sure the OTP team would like that, seeing how wary they are of my =<< fix. Wouldn't checking them there bring syntax knowledge into the lexer, something they want to avoid?

Regards,

--
Anthony Ramine

Joe Armstrong

May 17, 2013, 6:02:30 AM
to Richard A. O'Keefe, Erlang-Questions Questions
On Fri, May 17, 2013 at 3:08 AM, Richard A. O'Keefe <o...@cs.otago.ac.nz> wrote:

> On 16/05/2013, at 8:30 PM, Anthony Ramine wrote:
>
>> Hello Richard,
>>
>> To silence the undefined error, the parser would need to know that it had indeed given up on the function; that is not so straightforward.
>>
>> Making the syntax error more useful would also require either heavy yecc surgery or rewriting erl_parse as a handwritten recursive-descent parser, which is something I've always wanted to do.
>
> It all depends on the structure of the compiler.
> For example, I have an AWK->Java compiler written in C++ as an exercise;
> it needs some work on the symbol table and the runtime needs finishing,
> but then it will be open-sourced.
>
> There are three phases, not entirely unlike Erlang.
> (1) Lexical analysis builds a sequence of token records in memory.
> (2) Parsing recycles those tokens into an AST, resolves variables,
>     and does some very crude type inference.
> (3) Code generation emits Java.
>
> If a lexical error occurs, no parsing is done.
> And brackets are checked during lexical analysis:
> the lexical analyser only needs to keep a simple stack
> of what brackets are open.

That's a nice idea. We could easily stick an extra pass between
scanning and parsing to do this.

/Joe
 

David Mercer

May 17, 2013, 9:40:13 AM
to Richard A. O'Keefe, Anthony Ramine, Erlang-Questions Questions
Richard A. O'Keefe wrote:

> Part of the problem is precisely the fact that the parser is written in
> Yecc.

Back in the day, I'd have known the answer to this, but I figure some of you are more knowledgeable than I ever was: What parsing technology besides hand-rolled would produce better error messages? ANTLR? Parsec? PEG?

Cheers,

DBM

Siraaj Khandkar

May 17, 2013, 9:48:24 PM
to David Mercer, Erlang-Questions Questions

On May 17, 2013, at 9:40 AM, David Mercer <dme...@gmail.com> wrote:

> Richard A. O'Keefe wrote:
>
>> Part of the problem is precisely the fact that the parser is written in
>> Yecc.
>
> Back in the day, I'd have known the answer to this, but I figure some of you are more knowledgeable than I ever was: What parsing technology besides hand-rolled would produce better error messages? ANTLR? Parsec? PEG?


Menhir

http://gallium.inria.fr/~fpottier/menhir/


--
Siraaj Khandkar

o...@cs.otago.ac.nz

May 18, 2013, 7:41:32 PM
to Siraaj Khandkar, Erlang-Questions Questions
>
> On May 17, 2013, at 9:40 AM, David Mercer <dme...@gmail.com> wrote:
>
>> Richard A. O'Keefe wrote:
>>
>>> Part of the problem is precisely the fact that the parser is written in
>>> Yecc.
>>
>> Back in the day, I'd have known the answer to this, but I figure some of
>> you are more knowledgeable than I ever was: What parsing technology
>> besides hand-rolled would produce better error messages? ANTLR?
>> Parsec? PEG?
>
>
> Menhir
>
> http://gallium.inria.fr/~fpottier/menhir/

Menhir is definitely interesting, and if I had experienced anything
other than extreme frustration with Godi, I would have liked to
try it. But judging from section 8 of the Menhir manual, a
Menhir-built parser would be no better than a Yecc-built one here.
It might even be worse: the manual says that the current lookahead
token is *replaced* by an ERROR token as part of recovery, and in
the example we are discussing, the token in question is ")" and it
is correct, and should be retained. What is more, the Menhir manual
is quite explicit that error detection may be deferred.

What the user needs in cases like this is the information that
an earlier token (fun) caused the parser to be looking for a
particular token (end) but it encountered a closer (right paren)
before finding it. Something basically top-down should have an
easier time of it.

Robert Virding

May 19, 2013, 2:55:26 PM
to o...@cs.otago.ac.nz, Siraaj Khandkar, Erlang-Questions Questions
In some cases you can alleviate the problem by adding explicit grammar rules that generate errors. It is not a good solution, though, because it clutters up the syntax rules.

Robert

David Mercer

May 21, 2013, 11:56:59 AM
to o...@cs.otago.ac.nz, Siraaj Khandkar, Erlang-Questions Questions
Is there a grammar language/parsing technology that you could use for this, or would it all pretty much be hand-roll-your-own?

Anthony Ramine

May 21, 2013, 4:52:02 PM
to David Mercer, Siraaj Khandkar, Erlang-Questions Questions
I don't think there is any parsing technology that does all the things Clang does, and Clang is definitely the way to go with regard to compiler diagnostics. I would love to be wrong on this.

--
Anthony Ramine

David Mercer

May 21, 2013, 6:51:40 PM
to Anthony Ramine, Siraaj Khandkar, Erlang-Questions Questions
Isn't Clang an example of a roll-your-own parser? I thought it was developed to parse C and C only (and C derivatives).

Cheers,

DBM



Anthony Ramine

May 21, 2013, 9:06:44 PM
to David Mercer, Siraaj Khandkar, Erlang-Questions Questions
Yes, it is a roll-your-own parser.

--
Anthony Ramine