Articles/books that discuss separating the context-free part of a language from the context-sensitive part?

Costello, Roger L.

unread,

May 17, 2013, 4:57:38 PM5/17/13

to

I read this fantastic post on the comp.theory list:

http://coding.derkeiler.com/Archive/General/comp.theory/2004-03/0189.html

The poster makes the point that most programming languages define a
context-free core, and then have additional algorithms which run on the parse
tree to filter out constructs that are illegal in the language:

This separates out the context-free part of the language
from the context-sensitive part -- which is generally
regarded as good practice (a kind of modular "programming"
discipline for language design).

Are there any articles or books that discuss this technique?

[That's been the conventional wisdom at least since I took a compilers
course in the early 1970s. I believe I mentioned it in lex&yacc and
later in flex&bison. It's easier, and it also permits much better
error messages. -John]

glen herrmannsfeldt

unread,

May 17, 2013, 8:18:54 PM5/17/13

to

Costello, Roger L. <cost...@mitre.org> wrote:
> I read this fantastic post on the comp.theory list:

> http://coding.derkeiler.com/Archive/General/comp.theory/2004-03/0189.html

> The poster makes the point that most programming languages define a
> context-free core, and then have additional algorithms which run on the parse
> tree to filter out constructs that are illegal in the language:

(snip)

> [That's been the conventional wisdom at least since I took a compilers
> course in the early 1970s. I believe I mentioned it in lex&yacc and
> later in flex&bison. It's easier, and it also permits much better
> error messages. -John]

As I remember it, it is not so easy to do for recursive descent
parsers. As a result, as noted, you get poor error messages, such as:

SYNTAX ERROR

somewhere near where the problem is. For languages with reserved works,
it is nice to say exactly what the problem is. It gets more
interesting without reserved words. Yesterday, following a post
in comp.lang.fortran I tried:

TYPE :: DEEP
TYPE(DEEP) :: POINTER
END TYPE

When compiling it, the error message, pointing right to the
word POINTER, say "must have POINTER attribute".

Fortunately it only took me a few seconds to figure out, and then
laugh at it. But yes, you can get funny messages without reserved
words.

-- glen

Aleksey Demakov

unread,

May 18, 2013, 8:52:58 AM5/18/13

to

On Sat, May 18, 2013 at 12:57 AM, Costello, Roger L. <cost...@mitre.org> wrote:
> Are there any articles or books that discuss this technique?
>
> [That's been the conventional wisdom at least since I took a compilers
> course in the early 1970s. I believe I mentioned it in lex&yacc and
> later in flex&bison. It's easier, and it also permits much better
> error messages. -John]

Yes, perhaps any book on compilers would present exactly this
approach. I personally found "Engineering a Compiler" by Keith Cooper
and Linda Torczon to be a very good intro into compilers.

Regards,
Aleksey

Gene Wirchenko

unread,

May 19, 2013, 9:02:40 PM5/19/13

to

On Sat, 18 May 2013 00:18:54 +0000 (UTC), glen herrmannsfeldt
<g...@ugcs.caltech.edu> wrote:

[snip]

>As I remember it, it is not so easy to do for recursive descent
>parsers. As a result, as noted, you get poor error messages, such as:
>
> SYNTAX ERROR
>
>somewhere near where the problem is. For languages with reserved works,
>it is nice to say exactly what the problem is. It gets more

It is nice for any language. When parsing string, I tend to
avoid complex regexes for the same reason. If I build an FSA, I can
often get specific on what the error is.

>interesting without reserved words. Yesterday, following a post
>in comp.lang.fortran I tried:
>
>TYPE :: DEEP
> TYPE(DEEP) :: POINTER
>END TYPE
>
>When compiling it, the error message, pointing right to the
>word POINTER, say "must have POINTER attribute".
>
>Fortunately it only took me a few seconds to figure out, and then
>laugh at it. But yes, you can get funny messages without reserved
>words.

Could you explain what the error was? I do not know modern
Fortran.

Sincerely,

Gene Wirchenko

glen herrmannsfeldt

unread,

May 19, 2013, 11:17:58 PM5/19/13

to

Gene Wirchenko <ge...@telus.net> wrote:

(snip, I wrote)

>>somewhere near where the problem is. For languages with reserved works,
>>it is nice to say exactly what the problem is. It gets more

> It is nice for any language. When parsing string, I tend to
> avoid complex regexes for the same reason. If I build an FSA, I can
> often get specific on what the error is.

Yes, but there are some errors that you only get when there are
reserved words, when you use the word in the wrong context.

If you say:

for = 1;

in Java or C, the compiler could note that you used the reserved
word in the wrong context, instead of, for example, an invalid
'for' statement.

>>interesting without reserved words. Yesterday, following a post
>>in comp.lang.fortran I tried:

>>TYPE :: DEEP
>> TYPE(DEEP) :: POINTER
>>END TYPE

>>When compiling it, the error message, pointing right to the
>>word POINTER, say "must have POINTER attribute".

>>Fortunately it only took me a few seconds to figure out, and then
>>laugh at it. But yes, you can get funny messages without reserved
>>words.

> Could you explain what the error was? I do not know modern
> Fortran.

POINTER is the name of the variable, which doesn't have the
same-named attribute. It could be:

TYPE(DEEP), POINTER :: POINTER

(since Fortran doesn't have reserved words).

Where the variable POINTER has the POINTER attribute.

Though when I tried:

TYPE(TYPE) :: TYPE

it didn't work.

-- glen

Anton Ertl

unread,

May 20, 2013, 6:53:39 AM5/20/13

to

"Costello, Roger L." <cost...@mitre.org> writes:
>The poster makes the point that most programming languages define a
>context-free core, and then have additional algorithms which run on the parse
>tree to filter out constructs that are illegal in the language:

...
>[ [...] it also permits much better
>error messages. -John]

That depends on the syntax of the language. E.g., at one time
Modula-2 was the language used in our introductory programming course
25 years ago, our students used to write things like

a<3 AND b<3

and the compiler reported a type error (a and b were of type INTEGER).
But it really was a syntax error. The students should have written

(a<3) AND (b<3)

because the implicit binding was

a<(3 AND b)<3

and "3 AND b" was a type error. This error message confused the
students no end.

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Gene Wirchenko

unread,

May 21, 2013, 2:24:36 AM5/21/13

to

On Mon, 20 May 2013 10:53:39 GMT, an...@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

[snip]

>That depends on the syntax of the language. E.g., at one time
>Modula-2 was the language used in our introductory programming course
>25 years ago, our students used to write things like
>
>a<3 AND b<3

I would do that.

>and the compiler reported a type error (a and b were of type INTEGER).
>But it really was a syntax error. The students should have written
>
>(a<3) AND (b<3)

Nasty.

>because the implicit binding was
>
>a<(3 AND b)<3
>
>and "3 AND b" was a type error. This error message confused the
>students no end.

Things like this are why I like details in the error messages
about what is being complained about, say:
"3" is not of a valid type for the "and" operator.
"b" is not of a valid type for the "and" operator.
That does not always help, but it usually does.

Sincerely,

Gene Wirchenko

Joe keane

unread,

May 30, 2013, 11:21:41 PM5/30/13

to

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>a<3 AND b<3

This reminds me of a question.

One person designs a language 'C1' with precedence:

[higher]
...
|
&
<, <=, >, <=
==, !=
...
[lower]

Some other person designs a language 'C2' with precedence:

[higher]
...
<, <=, >, <=
==, !=
|
&
...
[lower]

They're the same besides that.

To compromise, they make a new language 'C3' such that a program is
valid C3 if it is valid C1 and valid C2 -and- both ways mean the same.

Is there some good way to do this -in the grammar-?

[I see gcc has warnings like this.]

glen herrmannsfeldt

unread,

May 31, 2013, 1:10:52 AM5/31/13

to

Joe keane <j...@panix.com> wrote:

(snip)

> This reminds me of a question.

> One person designs a language 'C1' with precedence:
>
> [higher]
> ...
> |
> &
> <, <=, >, <=
> ==, !=
> ...
> [lower]

> Some other person designs a language 'C2' with precedence:

Looks not so far from B, see:

http://cm.bell-labs.com/cm/cs/who/dmr/bref.pdf

> [higher]
> ...
> <, <=, >, <=
> ==, !=
> |
> &
> ...
> [lower]

> They're the same besides that.

> To compromise, they make a new language 'C3' such that a program is
> valid C3 if it is valid C1 and valid C2 -and- both ways mean the same.

Seems easy if you disallow the use of the operators with changed
precedence without appropriate ()'s.

> Is there some good way to do this -in the grammar-?

As above, put the restriction in the grammar.

As I understand it, not having actually ever programmed in B,
(but found the reference manual) B only had the bitwise operators,
and shared them (used them also for) the logical operators.
The had (and have) the appropriate precedence.

When C inherited them from B, and then added the && and || operators,
the previous & and | operators kept their precedence. For actual use
as bitwise operators, such as masks, it would be more convenient if
they had higher precedence. As it is, they pretty much always are
parenthesized. (Easy to forget and generate wrong results.)

-- glen

Anton Ertl

unread,

May 31, 2013, 6:54:00 AM5/31/13

to

j...@panix.com (Joe keane) writes:
>One person designs a language 'C1' with precedence:
>
> [higher]
> ...
> |
> &
> <, <=, >, <=
> ==, !=
> ...
> [lower]
>
>Some other person designs a language 'C2' with precedence:
>
> [higher]
> ...
> <, <=, >, <=
> ==, !=
> |
> &
> ...
> [lower]
>
>They're the same besides that.
>
>To compromise, they make a new language 'C3' such that a program is
>valid C3 if it is valid C1 and valid C2 -and- both ways mean the same.
>
>Is there some good way to do this -in the grammar-?

Sure, you don't want | & to asssociate with < <= > >= == != (I find
the remaining associations questionable, but let's stick with your
example). So write the grammar in a way that makes the wanted
associations explicit and does not contain the unwanted associations:

boolexpr: orexpr
| ineqexpr ;
orexpr: orexpr '|' andexpr
| andexpr ;
andexpr: andexpr '&' lower
| lower ;
ineqexpr: eqexpr ineqop eqexpr
| eqexpr ;
eqexpr: lower eqop lower
| lower ;
...
term: '(' highest ')'
| ... ;

(This assumes that a<b<c and a==b==c is not wanted; also, resolving
conflicts is left as an exercise).