Resolving Ambiguity in SMT-LIB 2.7 grammar

19 views
Skip to first unread message

Jochen Hoenicke

unread,
Feb 6, 2025, 5:10:34 PMFeb 6
to smt...@googlegroups.com
Hello all,

I'm trying to implement the new 2.7 features in our solver, but it's
tricky to say the least. The plan is to add new rules to term:

<term> ::= (_ term term+) // function application
| (term term+) // another syntax for function application
|...

Of course this makes the grammar ambiguous. A simple example is ((_
foo bar) qux), which can either be an indexed function symbol foo with
index bar and one argument applied to argument term qux, or it could
be the indexed constant symbol foo with index bar of some type (-> A
B) applied to term qux of type A. Or it could be a constant symbol
foo of type (-> A (-> B C)) applied to two terms, which could be
alternatively written as (foo bar qux) or (_ foo bar qux) or (_ (_ foo
bar) qux).

My current idea is to parse this always as (IndexedIdentifier Term+),
determine the sort of qux, and then check if the indexed identifier
exists with that argument sort. If yes it's case 1. If not, check if
the indexed identifier exists without arguments (constant). Then it's
case 2. Otherwise, reinterpret all the indices as constant symbols or
term variables, determine their sort and then check if it typechecks
as case 3.

One problem here is that bar may not be a valid term, so we should
only create a term from it if needed. In "((_ is car) list)", I can't
just interpret "car" as a term, as it would be a polymorphic
higher-order term, which is thankfully not included in SMT-LIB 2.7. I
need to keep it as a symbol (string) as long as possible and ask the
theory if it has the function symbol "is" with the index "car". Only
if that fails, I reinterpret it as a term.

When I have a term ((_ foo bar (+ 1 2)) qux), then I know when I see
the open parenthesis before "+" that this is not an indexed identifier
but a function application. This is not easy to write a grammar for.
The parser also needs to reinterpret the partly parsed list of indices
"bar" as a list of terms the moment it sees a term that is not an
index.

There is also the ambiguity between (qualIdentifier term+) and (term
term+), but that is in my opinion much easier to solve: resolve the
reduce-reduce conflict by picking the first option and then if it
doesn't typecheck, check if qualidentifier is a term variable or
constant symbol of sort (-> A B). At least here the arguments are
"term" in both cases and the parser can parse them as such from the
beginning.

I think there is no need to allow both variants of function
applications (_ a b) and (a b) as equivalent terms. I'd rather only
allow the "empty" application operator (a b), as the ambiguity
introduced by using "_" as a symbol is much harder to resolve than
the ambiguity for the "nameless" operator. So my suggestion is to
remove (_ a b) and only retain (a b). Or alternatively, use a
different symbol than _ and remove (a b).

I saw the suggestion to get rid of indexed identifiers and see (_ is
cons) as an application of the function application operator _ on the
terms "is" and "cons". But this would require much more higher-order
and dependent-type reasoning than we want. What is even the sort of
"is"? Also, if "is" by itself is a term, can "let" bind it to a term
variable? And would we then also allow Currying indexed operands like
"((_ (_ extract 7) 3) x)"?

What do you think, and how do you resolve the ambiguity introduced by
allowing the reserved word "_" as an identifier?

BTW, I don't see any problems like these with lambda or the "->" sort
constructor.

Thanks for reading this,
Jochen

PS: Yes it's possible to write a non-ambiguous grammar, here is my ugly attempt:

noIndexTermList ::=
index*:indices noIndexTerm:t // here we can convert partially
parsed indices to terms
| noIndexTermList:l term:t;
term ::= ...
| qualIdentifier // smtlib-2.6 constant but do the disambiguation
if type error
| ( qualIdentifier term+) // smtlib-2.6 rule but do the
disambiguation if type error
| ( noQualIdentifierTerm term+) // smtlib-2.7 empty application operator
| ( _ noSymbolTerm term+) // _ application, the case where the
first arg is not a simple symbol.
| ( _ symbol noIndexTermList) // _ application, the case where
one of the later args cannot be an index
| ...

where noXTerm is the grammar for term with the case "X" removed.
Reply all
Reply to author
Forward
0 new messages