> Hence, in the context of a VALUES statement a parenthesized expression
> reduces to a tuple, not an expression. This syntax alone would be
> simple; I could specify the VALUES statement unambiguously as:
> values
> -> VALUES values_row (',' values_row)*
> ;
> values_row
> -> '(' expression (',' expression)* ')'
> ;
Sorry for the delay in replying. Rewriting your grammar in this
way should remove the ambiguity. It's only the case where
something could be either a tuple (starting with a parenthesis)
or an expression (potentially starting with a parenthesis) that
should give rise to ambiguity.
> I naively translated the diagram above (ignoring the NULL alternatives)
> into the following grammar specification:
> values_clause
> -> VALUES values_row (',' values_row)*
> ;
> values_row
> -> expression
> | '(' expression (',' expression)* ')'
> ;
Here the parser cannot disambiguate by looking at the lookahead
character alone (the parenthesis). It's not until you hit
the comma that it becomes apparent that you're parsing a tuple
rather than an expression. And there is still an ambiguity
between a singleton tuple and a parenthesized expression.
One thing you can do here is to force all tuples to have at
least one comma. That gets rid of the ambiguity. The parsing
engine will still not be able to tell which way it's going
when it hits a parenthesis. What will happen is it will parse
both alternatives and drop one of them when it is no longer
viable. I.e., when it hits a comma, the expression parser will
not be able to proceed. If it hits the matching closing parenthesis
without hitting a comma, the tuple parser will not be able to
proceed.
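To make the "parse both and drop the dead one" behaviour concrete,
here is a rough sketch in plain Python. It is not PyGgy's API or its
internals; the function names, the (tag, payload) tree shape and the
min_items parameter are all made up for illustration, and expressions
are reduced to numbers and parentheses. Each function returns every
way it can parse, so an alternative that cannot proceed simply
contributes nothing.

```python
def parse_expr(toks, i):
    """Return every (tree, next_index) for an expression starting at i."""
    out = []
    if i < len(toks) and toks[i].isdigit():
        out.append((("num", int(toks[i])), i + 1))
    if i < len(toks) and toks[i] == "(":
        for tree, j in parse_expr(toks, i + 1):
            # Dies here if the next token is a comma instead of ")".
            if j < len(toks) and toks[j] == ")":
                out.append((("paren", tree), j + 1))
    return out


def parse_tuple(toks, i, min_items=2):
    """Return every (tree, next_index) for a tuple starting at i."""
    out = []
    if i < len(toks) and toks[i] == "(":
        frontier = [([t], j) for t, j in parse_expr(toks, i + 1)]
        while frontier:
            nxt = []
            for items, j in frontier:
                # Dies here if ")" arrives before enough items were seen.
                if j < len(toks) and toks[j] == ")" and len(items) >= min_items:
                    out.append((("tuple", items), j + 1))
                if j < len(toks) and toks[j] == ",":
                    for t, k in parse_expr(toks, j + 1):
                        nxt.append((items + [t], k))
            frontier = nxt
    return out


def parse_row(toks, min_items=2):
    """Try both alternatives; keep only parses that consume all tokens."""
    results = parse_expr(toks, 0) + parse_tuple(toks, 0, min_items)
    return [tree for tree, j in results if j == len(toks)]
```

With min_items=2 (every tuple has a comma), parse_row on the tokens
of "(1)" comes back with a single expression tree and "(1, 2)" with a
single tuple tree. With min_items=1 (singletons allowed), "(1)" comes
back with two trees, which is the ambiguity.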
The only minor detail here is that you can't tell the difference
between singleton tuples and parenthesized expressions (what
is "(1)"?). You can fix this in your grammar by picking one
or the other, or you can leave it as ambiguous and pick later
from the multiple parse trees you get back.
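If you go the "pick later" route, the choice can be a small function
run over whatever trees come back. A minimal sketch, assuming each
tree is a (tag, payload) pair tagged "tuple" or "paren"; that shape
and the name choose_tree are hypothetical, not what PyGgy hands you.

```python
def choose_tree(trees, prefer="tuple"):
    """Pick one tree from an ambiguous parse result.

    Trees are hypothetical (tag, payload) pairs, not PyGgy's own format.
    """
    if not trees:
        raise ValueError("no parse")
    for tree in trees:
        if tree[0] == prefer:
            return tree
    # No tree of the preferred kind: fall back to the first one.
    return trees[0]


# The two readings of "(1)" when singleton tuples are allowed.
ambiguous = [("paren", ("num", 1)), ("tuple", [("num", 1)])]
as_tuple = choose_tree(ambiguous)
as_expr = choose_tree(ambiguous, prefer="paren")
```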
> Resulting in the ambiguity. I think it's possible to see how a human
> can tell the difference in that within the context of a VALUES
> statement an expression within *top-level* parentheses is a tuple; it's
> only parentheses *below* the top-level that reduce to an expression.
> Hence the following parentheses reduce to a tuple:
> VALUES (1)
This could be a tuple or an expression, really.
> Whereas in the following statement, the top-level parentheses reduce to
> a tuple, while the internal parentheses reduce to an expression:
> VALUES ((1))
This could also be a tuple or an expression :) I agree, as a human,
I'm inclined to call it a tuple, but I can't completely rule out
it being an expression. Therein lies the problem.
> This is why I originally stated that I think it's possible to resolve
> the ambiguity, but only within the context of something else.
> Unfortunately, I'm at a loss as to how this could be represented in a
> PyGgy grammar.
Disallow singletons (you can treat all expressions as singletons
in the context of VALUES if you want) or defer the decision till
after parsing.
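Treating expressions as singletons can be done as a step after the
parse rather than in the grammar. A sketch, again assuming a made-up
(tag, payload) tree shape where only comma-containing rows are tagged
"tuple"; as_row is a hypothetical helper name.

```python
def as_row(tree):
    """Return the list of column expressions for one VALUES row."""
    tag, payload = tree
    if tag == "tuple":
        # A real tuple: its items are the columns.
        return list(payload)
    # Anything else is a bare expression: treat it as a one-column row.
    return [tree]


rows = [
    as_row(("paren", ("num", 1))),               # VALUES (1)
    as_row(("tuple", [("num", 1), ("num", 2)])),  # VALUES (1, 2)
]
```

This way "(1)" and "1" both end up as one-column rows and the grammar
never has to decide what a singleton is.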
> Dave.
Tim Newsham
http://www.lava.net/~newsham/