Error when generating Parser for XQuery


Joey Ezechiëls

Nov 2, 2016, 9:21:04 AM
to antlr-discussion
Hi everyone, 

I need an XQuery parser, so I copied the original XQuery 3.1 grammar from here and then massaged it. The resulting work-in-progress grammar can be found here; the only changes so far make the grammar palatable to ANTLR 4, which included splitting up a couple of rules.

However, the grammar as it stands gives me errors:
error(134): xquery31.g4:178:26: rule reference PragmaContentsInternal is not currently supported in a set

The same errors appear at other source locations for DirPIContentsInternal, CDataSectionContentsInternal, StringConstructorCharsInternal, and CommentContentsInternal.

In addition to the above, there are a couple of warnings, but they look unrelated to this specific problem.

A quick Google search suggests that I should inline the offending rules. Unfortunately, that also results in errors:
error(50): xquery31.g4:177:31: syntax error: missing RPAREN at '*' while looking for lexer rule element
error(50): xquery31.g4:177:44: syntax error: extraneous input ')' expecting SEMI while matching a lexer rule

There are similar errors for DirPIContentsInternal, CDataSectionContentsInternal, StringConstructorCharsInternal, and CommentContentsInternal.

Given this catch-22, I'm not really sure what I can do about this. Can anyone help me out?

Eric Vergnaud

Nov 2, 2016, 10:43:25 AM
to antlr-discussion
Hi,

In ANTLR, lexer (token) rule names start with an uppercase letter, while parser rule names start with a lowercase one.
Also, it is considered good practice to place your lexer rules in a dedicated 'lexer grammar', while the parser rules go in a 'parser grammar'.
I would suggest you do that to start with, to narrow down your issues.

Eric
(also I would encourage you to start with a minimal grammar, and enrich it gradually, rather than follow a big bang strategy)
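A minimal sketch of the split Eric describes, assuming two files named XQueryLexer.g4 and XQueryParser.g4 (hypothetical names, as are all token and rule names below). Token rules are upper case and live in the lexer grammar; parser rules are lower case and pick up the lexer's token types through the `tokenVocab` option. It deliberately covers only a toy FLWOR fragment, in the spirit of starting small and growing the grammar.

```antlr
// ---- XQueryLexer.g4 (hypothetical file; token rules only, upper-case names) ----
lexer grammar XQueryLexer;

KW_FOR    : 'for' ;      // keywords are listed before NCName so 'for' wins the tie
KW_IN     : 'in' ;
KW_RETURN : 'return' ;
DOLLAR    : '$' ;
NCName    : [a-zA-Z_] [a-zA-Z0-9_.\-]* ;  // simplified; the spec's NCName is much broader
WS        : [ \t\r\n]+ -> skip ;

// ---- XQueryParser.g4 (hypothetical file; parser rules only, lower-case names) ----
parser grammar XQueryParser;

// Reuse the token types generated from XQueryLexer.g4
options { tokenVocab = XQueryLexer; }

varRef      : DOLLAR NCName ;
flworSketch : KW_FOR varRef KW_IN varRef KW_RETURN varRef ;
```

Running the ANTLR tool on the lexer grammar first produces XQueryLexer.tokens, which the parser grammar then reads via `tokenVocab`.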

Eric Vergnaud

Nov 2, 2016, 10:55:04 AM
to antlr-discussion
A simple way to fix the existing error is as follows:

CharsWithSharpRPAR: Char* '#)' Char*;

PragmaContents : Char* ~ CharsWithSharpRPAR ;
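For comparison, a common ANTLR 4 idiom for spec productions of the form `Char* - (Char* 'X' Char*)` is a non-greedy loop (`.*?`) inside a rule that also matches the closing delimiter, so no complement is needed. The `~` operator in an ANTLR 4 lexer only complements sets of single characters, which is why rule references inside it are rejected. This is a sketch, not what was posted in the thread: the token names are hypothetical, and each rule lexes a whole construct as one token, which simplifies the spec (pragma names and string-constructor interpolations are not broken out).

```antlr
// Sketch only: hypothetical token names, one token per delimited construct.
lexer grammar XQueryDelimitedTokens;

// (# ... #)   stops at the first '#)'
PRAGMA             : '(#' .*? '#)' ;

// <![CDATA[ ... ]]>
CDATA_SECTION      : '<![CDATA[' .*? ']]>' ;

// <? ... ?>   processing instruction
DIR_PI             : '<?' .*? '?>' ;

// ``[ ... ]``   string constructor; `{ ... }` interpolations are NOT handled here
STRING_CONSTRUCTOR : '``[' .*? ']``' ;

// (: ... :)   XQuery comments may nest, hence the recursive fragment
fragment NESTED_COMMENT : '(:' (NESTED_COMMENT | .)*? ':)' ;
COMMENT            : NESTED_COMMENT -> skip ;
```

Because `.*?` is non-greedy, each loop exits at the first occurrence of the closing delimiter, which is exactly the "contents must not contain the terminator" constraint the spec expresses with the `-` operator.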



On Wednesday, November 2, 2016 21:21:04 UTC+8, Joey Ezechiëls wrote:

Joey Ezechiëls

Nov 2, 2016, 10:59:15 AM
to antlr-discussion
I'm not sure how to split up the rules into separate files so that ANTLR plays nicely.

Also, I've given the XPath grammar the same treatment and ANTLR has no issues generating a parser from that, so I'm inclined to believe that that is not the cause.

Also, the grammar is pretty much defined as-is, and picking it apart like you suggest would take such huge amounts of time that I may as well drop the entire thing (it's not exactly a hobby project, so there are more constraints than "I want it to work and I'll do whatever it takes").

Joey Ezechiëls

Nov 2, 2016, 11:00:39 AM
to antlr-discussion
That is exactly what the grammar does right now; see the gist link.
Unfortunately that does not work here, while it does work for XPath. 

Jim Idle

Nov 2, 2016, 9:59:45 PM
to antlr-di...@googlegroups.com
Eric is trying to tell you that you have named all the rules in your grammar as lexer rules and not parser rules. Copying the normative spec EBNF rules as-is almost never works: they are meant as guidance about syntax, not usually as an actual grammar definition.

However, this is not a big grammar, and you should be able to fix it up in a day. ONLY your terminal rules should start with an uppercase letter. Change the 'literal' strings to keyword lexer tokens, and change rules that are not terminals to start with a lowercase letter.
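To make this advice concrete, here is a hypothetical before/after for one production from the spec (ForClause), heavily simplified: the non-terminals become lower-case parser rules, and the quoted literals become named keyword and punctuation tokens. The grammar, rule, and token names are illustrative, not taken from the thread.

```antlr
// Before (spec EBNF copied verbatim; the upper-case names make ANTLR
// treat every rule as a lexer rule):
//   ForClause : 'for' ForBinding (',' ForBinding)* ;
//
// After: non-terminals start lower case, literals become named tokens.
grammar ForClauseSketch;

forClause  : KW_FOR forBinding (COMMA forBinding)* ;
forBinding : DOLLAR NCName KW_IN NCName ;   // heavily simplified binding

KW_FOR : 'for' ;
KW_IN  : 'in' ;
COMMA  : ',' ;
DOLLAR : '$' ;
NCName : [a-zA-Z_] [a-zA-Z0-9_]* ;          // simplified NCName
WS     : [ \t\r\n]+ -> skip ;
```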

However, there seem to be a number of XQuery grammars for ANTLR 4 already out there. Why not just use one of these? For instance, though I have not checked it, this one looks like a good start, though it is XQuery 1?



Jim

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussion+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joey Ezechiëls

Nov 7, 2016, 4:07:56 PM
to antlr-discussion
I see. While Eric was correct, and his answer was indeed a part of the solution, the part about upper/lower case for terminal vs parser rules didn't come across to me, and without that my effort was doomed to failure.

Armed with this new knowledge I changed the grammars and all my errors have gone away.
I don't yet know if it all works properly since I haven't gotten around to actually using the generated JS parser yet, but all appearances are good at this point.

I'll post an update when I know more.
Thanks guys!

Joey Ezechiëls

Nov 9, 2016, 10:36:48 AM
to antlr-discussion
Ok so here's the update:
The parser generates fine, but the example input 'for $p in doc("foo") return $p' yields the following error:

line 1:3 no viable alternative at input 'for $'.

To allow anyone willing to help to inspect the situation, I have created a gist with the grammar, an example JS file that is run with Node.js, and the output log, which logs a number of parse nodes to clarify the state a bit.

I have looked at left associativity as a possible culprit, but the online docs on it aren't too clear to me, so I can't really eliminate it as a potential factor.
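One hedged guess worth ruling out: ANTLR reports 0-based columns, so `1:3` points at the character right after `for`, i.e. the space. If whitespace is lexed by an ordinary lexer rule (for instance one copied from the spec's `S` production), it reaches the parser as a token that no parser rule expects. A minimal sketch, assuming the whitespace rule is named `S` (the grammar name is hypothetical), routes it to the hidden channel instead:

```antlr
// Hypothetical fix, assuming whitespace is currently lexed as a normal token.
// channel(HIDDEN) keeps the text in the token stream but out of the parser's way.
lexer grammar WhitespaceSketch;

S : [ \t\r\n]+ -> channel(HIDDEN) ;
```

`-> skip` would drop the whitespace entirely instead. Either way, a complete XQuery grammar likely needs lexer modes for direct element constructors, where whitespace is significant.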

Joey Ezechiëls

Nov 9, 2016, 10:42:25 AM
to antlr-discussion
Btw, I have noticed something rather strange: as can be seen in the logs, the InitialClauseContext seems to be circular, which I don't think should ever happen. It may even be the cause of the issue, but since the rules are recursive in that way in the original grammar, I'm not sure what can be done about it.