I am looking for a description of the grammar of the Prolog language to be
used in building a syntax checker with Lex/Yacc. Since building the syntax
checker is a learning project for students any pointers to existing syntax
checkers for Prolog won't help.
+-----------------------------------------------------------------------------+
| Henk Schouten Software Expertise Center |
| Haagse Hogeschool - Intersector Informatica |
| Louis Couperusplein 1-19 |
| henks@hhinsi@hp4nl 2419 AP Den Haag - The Netherlands |
| henks@nikhef@hp4nl tel: (31) 70 618419 fax: (31) 70 618599 |
+-----------------------------------------------------------------------------+
(A) There is a public-domain tokeniser for Prolog and a public-domain
parser for Prolog. I wrote the tokeniser and adapted the parser,
and posted both to the net in '84. Both are Prolog source code.
I know you say "any pointers to existing .. won't help", but the
tokeniser and parser are about as concise a specification of Prolog
syntax as you will get.
(B) However, Prolog is an extensible language. The lexical structure
is not extensible, so you could use Lex. I have a table-driven
tokeniser in C (I've been using it in an editor for years, and it
was donated to Stony Brook Prolog), so I know you can do that part
statically as Lex requires. The extension mechanism of Prolog is
that you can declare new operators, which can be prefix, infix,
postfix, or any mix of them. For example, after doing
:- op(100, fx, a).
the expression "a X" is legal, "a a X" is illegal, "a(X)" and "a = X"
are legal and would have been without the declaration.
Had the declaration been
:- op(800, fy, a).
the expression "a a X" would have been legal and "a = X" illegal.
After the declarations
:- op(100, fx, a), op(100, xf, a), op(100, xfx, a)
the expressions "a X", "X a", and "X a X" are all legal.
It is easy to find small sets of operator declarations which make
Prolog not LR(k) for any fixed k, still less LR(1).
Some extensible languages (such as Algol 68) can be hacked by having
a fixed set of operator meta-tokens, PREFIX_1, ... INFIX_9 and the
like, and having the tokeniser return those. However, Prolog has
1200 precedence levels, so that would be rather a strain for Yacc.
(C) The BSI/ISO committees have produced several different grammars for
Prolog. I believe the current version to be the one in a document
ISO/IEC JTC1 SC22 WG17 N40, dated July 1989. This one is pretty
straightforward. (Unlike the version which I criticsed so strongly
in this newsgroup, the current version makes no attempt to describe
the syntax of Prolog code. It is frankly a syntax for TERMS, and as
such nearly all of my objections no longer apply.) This document
should be available from NPL.
(D) Prolog is much more like Lisp than it is like Pascal. It is
possible to define a (non-LR(1)) grammar for Prolog *data*, but
every Prolog procedure is made of Prolog data, and "telling the
birds from the flowers" is not always possible. Consider the fact
that many Prolog systems have a macro-expansion predicate called
term_expansion/2 which users can define at run-time. For example,
after defining
:- op(10, fx, a).
:- op(10, fx, an).
term_expansion(X is a Y, Fact) :- Fact =.. [Y,X].
term_expansion(X is an Y, Fact) :- Fact =.. [Y,X].
the clauses
clyde is an elephant.
lr(1) is a restriction.
behave to a Prolog system exactly as if they had been written
elephant(clyde).
restriction(lr(1)).
For semantic analysis, it is the term which results from such
user-defined macro-processing (this includes grammar rule expansion)
that matters. There can be no difference in meaning between
X is (Y+1)/2
and
is(X, /(+(Y,1),2))
because they are the SAME term, even though there is a syntactic
difference.
There are Prolog systems which do not offer term_expansion/2, and they
can be very useful if they are otherwise excellent, but it is a
definite restriction to leave this facility out.
(E) The bottom line is that Yacc is not able to handle the full syntax
of Prolog. You would have to restrict it by leaving out user-defined
operators and user-defined macro-processing. The result would still
be a useful language. But it would not be a very interesting one to
use as a class project, because there is very little that a syntactic
analysis can reject. Consider:
p :- 2.
does not make sense as a Prolog rule, because 2 is not something that
can be called. (Although some Prolog systems use precisely this
syntax to tie basic operations to special instructions.) However,
q :- nonvar(( p :- 2 )).
is perfectly sensible. The Prolog parser in my editor is, excluding
the tokeniser, about 200 lines of C. About a third of that is
disambiguating operators. That really doesn't sound as though it
would make an interesting Yacc grammar.
I agree. My solution (some years ago) was to write an operator precedence
parser for Prolog. The syntax is very straightforward: you just have
operands, operators and brackets. And (this was my main motivation):
there was no harm implementing extensible sets of operators.
> The Prolog parser in my editor is, excluding
> the tokeniser, about 200 lines of C. About a third of that is
> disambiguating operators. That really doesn't sound as though it
> would make an interesting Yacc grammar.
I agree, again. My parser was (I hope, it still *is* :-)) very small, too.