Parser and Architecture


David Lee

Aug 8, 2011, 2:46:55 PM
to Carrot
As mentioned at Balisage, I'd love to help with a parser.
But before we go there, we need a discussion on what the runtime/dev
environment looks like.
I would propose "JavaCC" as the parser generator. This runs within Java. Is
that acceptable?
Secondly, what would be the output of the parser? An in-memory AST?
An XML file like XQueryX?
If the latter, maybe we can hijack the XQueryX parser from somewhere?

Suggestions ?

-David


Evan Lenz

Aug 8, 2011, 3:40:54 PM
to carro...@googlegroups.com
I was going to suggest JavaCC (and JJTree) also, since the W3C resources
already provide an XQuery 1.0 baseline to work from:
http://www.w3.org/2010/02/qt-applets/xquery10/

For parser output, I would vote for an intermediate XML format with the
aim of compilation to XSLT (and ideally XQuery as well). I got feedback
from several people that compilation to a standard language would make
them more likely to use it.

This of course wouldn't prevent alternative implementations, such as
native support for Carrot in XQilla, which John Snelson proudly
demonstrated by the end of the day. :-)

Another resource that might be helpful:
http://www.fatdog.com/Extreme.html


Evan

David Lee

Aug 8, 2011, 3:54:04 PM
to carro...@googlegroups.com
I'm pretty experienced (not "expert" but "advanced novice") with JavaCC.
It's what I used for xmlsh, and I have learned the majority of hoops one must go
through to get it to work reasonably well ... usually :)

I could take a stab at it.
An intermediate XML format would be useful, although maybe an in-memory AST
first? We may need to do post-parsing logic to produce the right XML.

But having an intermediate XML lets you use XML tools to do the compilation
... is that what you had in mind ?
I suggest

Parser -> In-memory AST -> XML

This decouples the parser and lets a compiler author plug in at either
the Java/memory/object layer or the XML layer.
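
A minimal sketch of that layering, to make the two entry points concrete. Everything here is an assumption for illustration: the AstNode class, its fields, and the element names are made up, and a real tree would more likely come out of JJTree. The only point is that the XML is produced from the in-memory tree as a separate step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory AST node; not the JJTree node type.
final class AstNode {
    final String kind;   // production name, e.g. "RulesetCall"
    final String text;   // token text for leaves, null for interior nodes
    final List<AstNode> children = new ArrayList<AstNode>();

    AstNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    AstNode add(AstNode child) {
        children.add(child);
        return this;
    }

    // Second stage of the pipeline: serialize the tree to an intermediate XML string.
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(kind).append('>');
        if (text != null) {
            sb.append(escape(text));
        }
        for (AstNode child : children) {
            sb.append(child.toXml());
        }
        sb.append("</").append(kind).append('>');
        return sb.toString();
    }

    // Escape the characters that are not allowed in XML character data.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        AstNode call = new AstNode("RulesetCall", null)
            .add(new AstNode("ModeName", "foo"))
            .add(new AstNode("ContextItemExpr", "."));
        System.out.println(call.toXml());
    }
}
```

A Java-based compiler would work on AstNode objects directly; an XSLT-based one would only ever see the output of toXml().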

So given this. What XML format ? Should we go with something close to
XQueryX or invent our own ?
Do you have an XML Schema in mind ?


----------------------------------------
David A. Lee
dl...@calldei.com
http://www.xmlsh.org

David Lee

Aug 8, 2011, 4:02:18 PM
to carro...@googlegroups.com
I looked at this page :

http://www.w3.org/2010/02/qt-applets/xquery10/

If it is what it claims to be ("it works!"),
I agree we should start from this. The parser can be JavaCC, and the in-memory tree
can be JJTree.
XML format? A specialization/bastardization of XQueryX?

Evan Lenz

Aug 8, 2011, 5:05:21 PM
to carro...@googlegroups.com
Good point. Pre-XML in-memory representation makes sense for allowing
Java-based Carrot implementations that don't require serialization.

I think something like an "extended subset" of XQueryX would work great,
but if it's easier to use existing grammar names (and extra work to
translate to XQueryX-ish names), I would avoid the extra work. I don't
think the intermediate XML format needs to be standardized, just clear.

"Extended" in that we have our own top-level module format, and XQuery
expressions are extended to include:

* ruleset invocations (^foo()),
* shallow copy constructors (copy{}), and
* text node literals (`text`).

"Subset" in that we only use XQuery expressions, not XQuery's syntax for
defining functions, etc.

Evan

Evan Lenz

Aug 8, 2011, 5:13:52 PM
to carro...@googlegroups.com
Then again, if XQueryX++ adds sufficient value to make writing the (e.g.,
XSLT-based) compiler easier (by abstracting just enough away from lexical
details), then it might be worth it. Main point is that I personally don't
care if the format itself is standardized or standard-like.

Evan

David Lee

Aug 8, 2011, 5:39:44 PM
to carro...@googlegroups.com
Agreed, I don't care if the XML form is "standard" or not.
I'm just looking pragmatically at whether it's easier to make a "delta" on an existing
standard or make one up from scratch.

John Snelson

Aug 9, 2011, 7:21:46 AM
to carro...@googlegroups.com
The XQuery grammar applet is generated from an XML description of the
language - the W3C have versions for all the extensions to XQuery and
working draft specs as well. It might be as good to start with the
source XML and modify that, rather than with the intermediate JavaCC
files. It's a pretty nice format, and very modular and extensible.

http://www.w3.org/2007/01/applets/

XQueryX is a nasty format to start from for performing the translation,
but it is a place to start. There is also an existing stylesheet which
will translate XQueryX back into XQuery - which is probably a good
foundation for some of the functionality.

Building an XQuery parser is hard - we shouldn't attempt it if we have a
default alternative. That said, I'd love to see an XQuery parser written
in XQuery - one day maybe.

John


--
John Snelson, Senior Engineer http://twitter.com/jpcs
MarkLogic Corporation http://www.marklogic.com

David Lee

Aug 9, 2011, 8:03:52 AM
to carro...@googlegroups.com
Thanks for the info !
I would vote for the fastest leg-up to get a working implementation, and the
one that's most easily modifiable.
Then once we get a working implementation we can (should) revisit various
layers to see if we can do it better.

If no one else is chomping at the bit, I volunteer to take this W3C XQuery
XML and first

1) Try to reproduce in my own environment a working implementation of a
standalone parser (not an applet).
2) Start adding mods one by one to evolve to the Carrot spec (there is one,
right? :)

I don't particularly like JJTree because it has too little flexibility in
your in-memory data model, but if a framework is already functional that
uses it, it's probably best to keep it for now. This at least gives us
fewer variables for debugging.

-David

Evan Lenz

Aug 9, 2011, 4:02:55 PM
to carro...@googlegroups.com
That's a good point. An XML version of the parse tree might be nicer than
XQueryX, especially for compiling trivial expressions. (Just get the
string-value!)

Despite the verbosity of yapp-xslt's output, I appreciated the fact that
the resulting XML was just a bunch of element markup added to the original
expression. That does seem way easier to use for our purposes than XQueryX.

Evan

David Lee

Aug 9, 2011, 4:06:19 PM
to carro...@googlegroups.com, carro...@googlegroups.com
Having studied neither ... what's the difference? Isn't XQueryX just an XML dump of the AST?

Sent from my iPhone

Evan Lenz

Aug 9, 2011, 4:37:45 PM
to carro...@googlegroups.com
XQueryX doesn't have the original query text. So you don't see the string
"for", etc. in e.g., http://www.w3.org/TR/xqueryx/#Example1-XQueryX

Whereas yapp-xslt outputs the original query text with element decorations.
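
To make the difference concrete, here is a rough sketch of what "just get the string-value" means for a decorated parse tree. The element names below are invented for illustration and are not yapp-xslt's actual output vocabulary; the only assumption that matters is that all of the original query text survives as text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class DecoratedParseTree {
    // Hypothetical decorated parse tree: the query text with element markup added around it.
    static final String DECORATED =
        "<FLWORExpr><ForClause>for <VarName>$x</VarName> in <PathExpr>//item</PathExpr></ForClause>"
        + " <ReturnClause>return <VarRef>$x</VarRef></ReturnClause></FLWORExpr>";

    // The string-value of the root element is the concatenation of all descendant text nodes.
    static String stringValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints: for $x in //item return $x
        System.out.println(stringValue(DECORATED));
    }
}
```

With XQueryX there is no equivalent shortcut, since keywords such as "for" are represented by element names rather than text.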

Evan

Evan Lenz

Aug 9, 2011, 8:21:03 PM
to carro...@googlegroups.com
Sounds like a great plan! Re: the Carrot spec, I think I misplaced it... ;-)

I can take a stab at some of the grammar modifications. Let me know if the formatting gets screwed up over email.

For the top-level production, we'll eventually add the other top-level stuff (imports, includes, top-level parameters, etc.), but we can start with this for now:

CarrotModule ::= ((VarDecl | FunctionDecl | RuleDecl) Separator)*

VarDecl and FunctionDecl can be modified from their XQuery counterparts:

[24]   VarDecl   ::=   "declare" "variable" "$" QName TypeDeclaration? ((":=" Expr) | "external")

(I changed ExprSingle to Expr in VarDecl above, which I think is best, unless there is some need for it being that way that I'm not aware of.)

[26]   FunctionDecl   ::=   "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? ":=" (EnclosedExpr | "external")

Let's see if I can take a crack at a production rule for rule definitions:

RuleDecl ::= "^" (ModeName ("|" ModeName)*)? "(" Pattern (";" RuleParamList)? ")" ":=" Expr
ModeName ::= (QName | "#current" | "#default")
RuleParamList ::= ParamWithDefault ("," ParamWithDefault)*
ParamWithDefault ::= Param (":=" ExprSingle)?

(Pattern comes from the XSLT 2.0 spec.)

And now, to extend expressions:

[84]   PrimaryExpr   ::=   Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | OrderedExpr | UnorderedExpr | Constructor
| RulesetCall 

[85]   Literal   ::=   NumericLiteral | StringLiteral | TextNodeLiteral

[109]   ComputedConstructor   ::=   CompDocConstructor
| CompElemConstructor
| CompAttrConstructor
| CompTextConstructor
| CompCommentConstructor
| CompPIConstructor
| CompCopyConstructor

TextNodeLiteral is similar to [144] StringLiteral:

TextNodeLiteral ::= ('`' (PredefinedEntityRef | CharRef | EscapeTick | [^`&])* '`')  /* ws:explicit */
EscapeTick ::= "``"
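
A quick sketch of how the TextNodeLiteral production could be checked outside of JavaCC, using a regular expression. This is only an approximation for experimenting with the syntax: the class and method names are made up, and the entity and character reference forms are assumed to follow XQuery 1.0's PredefinedEntityRef and CharRef.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextNodeLiteral {
    // '`' (PredefinedEntityRef | CharRef | EscapeTick | [^`&])* '`'
    private static final Pattern LITERAL = Pattern.compile(
        "`(?:&(?:lt|gt|amp|quot|apos);|&#[0-9]+;|&#x[0-9a-fA-F]+;|``|[^`&])*`");

    // Same alternatives, with capturing groups so each piece can be decoded.
    private static final Pattern PIECE = Pattern.compile(
        "&(lt|gt|amp|quot|apos);|&#([0-9]+);|&#x([0-9a-fA-F]+);|(``)|([^`&])");

    static boolean isLiteral(String s) {
        return LITERAL.matcher(s).matches();
    }

    // Returns the text node content denoted by the literal.
    static String value(String literal) {
        if (!isLiteral(literal)) {
            throw new IllegalArgumentException("not a text node literal: " + literal);
        }
        String body = literal.substring(1, literal.length() - 1);
        StringBuilder out = new StringBuilder();
        Matcher m = PIECE.matcher(body);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(entity(m.group(1)));
            } else if (m.group(2) != null) {
                out.appendCodePoint(Integer.parseInt(m.group(2)));
            } else if (m.group(3) != null) {
                out.appendCodePoint(Integer.parseInt(m.group(3), 16));
            } else if (m.group(4) != null) {
                out.append('`');              // EscapeTick: `` stands for a single `
            } else {
                out.append(m.group(5));
            }
        }
        return out.toString();
    }

    private static char entity(String name) {
        switch (name) {
            case "lt": return '<';
            case "gt": return '>';
            case "amp": return '&';
            case "quot": return '"';
            default: return '\'';             // "apos"
        }
    }

    public static void main(String[] args) {
        System.out.println(value("`a``b`"));
        System.out.println(isLiteral("`a & b`"));
    }
}
```

A bare "&" or a single unescaped tick inside the literal is rejected, which matches the [^`&] character class in the production.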

CompCopyConstructor ::= "copy" "{" Expr "}"

RulesetCall ::= "^" ModeName? "(" Expr? (";" RulesetCallParamList)? ")"
RulesetCallParamList ::= InitializedParam ("," InitializedParam)*
InitializedParam ::= Param ":=" ExprSingle
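
For playing with the call syntax before a real grammar exists, here is a rough sketch that splits a standalone RulesetCall into its mode, its expression, and its parameters. It is not a parser: Expr and ExprSingle are treated as opaque text, the class and method names are hypothetical, and it will mis-split a parameter whose ExprSingle contains a top-level comma (a FLWOR with two "for" bindings, for example).

```java
import java.util.ArrayList;
import java.util.List;

final class RulesetCallSketch {
    final String mode;          // null when no ModeName is given
    final String expr;          // raw text of the optional Expr, "" when absent
    final List<String> params;  // raw text of each InitializedParam

    private RulesetCallSketch(String mode, String expr, List<String> params) {
        this.mode = mode;
        this.expr = expr;
        this.params = params;
    }

    // "^" ModeName? "(" Expr? (";" RulesetCallParamList)? ")"
    static RulesetCallSketch parse(String src) {
        String s = src.trim();
        int open = s.indexOf('(');
        if (!s.startsWith("^") || open < 0 || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a ruleset call: " + src);
        }
        String mode = s.substring(1, open).trim();
        List<String> halves = splitTopLevel(s.substring(open + 1, s.length() - 1), ';');
        if (halves.size() > 2) {
            throw new IllegalArgumentException("more than one top-level ';': " + src);
        }
        List<String> params = new ArrayList<String>();
        if (halves.size() == 2) {
            params = splitTopLevel(halves.get(1), ',');
        }
        return new RulesetCallSketch(mode.isEmpty() ? null : mode, halves.get(0), params);
    }

    // Split on sep, ignoring separators inside brackets or inside "...", '...' and `...`.
    static List<String> splitTopLevel(String text, char sep) {
        List<String> parts = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        char quote = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'' || c == '`') {
                quote = c;
            } else if (c == '(' || c == '[' || c == '{') {
                depth++;
            } else if (c == ')' || c == ']' || c == '}') {
                if (--depth < 0) throw new IllegalArgumentException("unbalanced: " + text);
            } else if (c == sep && depth == 0) {
                parts.add(cur.toString().trim());
                cur.setLength(0);
                continue;
            }
            cur.append(c);
        }
        parts.add(cur.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        RulesetCallSketch call = parse("^foo(item[@id = 'a;b']; $depth := 1, $lang := \"en\")");
        System.out.println(call.mode + " | " + call.expr + " | " + call.params);
    }
}
```

Switching the separator from semicolon to comma, as floated below, would make this kind of splitting ambiguous unless the first argument is restricted to ExprSingle.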

The semi-colon in rules (both definitions and calls) is not my favorite part of the syntax. An alternative would be to use a comma instead, and ExprSingle for the first "argument" instead of Expr. It's an open question in my mind. But maybe let's see how far we get with the above?

Evan

Evan Lenz

Aug 9, 2011, 8:27:42 PM
to carro...@googlegroups.com
Oops, I forgot explicit priorities. Something like this:

RuleDecl ::= "^" (ModeName ("|" ModeName)*)? "(" Pattern (";" RuleParamList)? ")" (IntegerLiteral | DecimalLiteral)? ":=" Expr

Evan

Evan Lenz

Aug 10, 2011, 12:16:32 AM
to carro...@googlegroups.com
I also forgot tunnel parameters:

ParamWithDefault ::= ("tunnel")? Param (":=" ExprSingle)?
InitializedParam ::= ("tunnel")? Param ":=" ExprSingle

Evan

David Lee

Aug 10, 2011, 6:59:56 AM
to carro...@googlegroups.com

I found a newer link to the XQuery parser files:

http://www.w3.org/2010/02/qt-applets/

Any suggestion on what "base" we use for Carrot?

(The old link had some funky zip files without full source.)

----------------------------------------
David A. Lee
dl...@calldei.com
http://www.xmlsh.org

John Snelson

Aug 10, 2011, 7:05:40 AM
to carro...@googlegroups.com
That's not a big deal. You can't do a lot with the original query text
anyhow, and the W3C already provides a stylesheet to output XQueryX as
XQuery.

John

David Lee

Aug 10, 2011, 7:42:33 AM
to carro...@googlegroups.com
I've started working with the code here:

http://www.w3.org/2010/02/qt-applets/

I've made tiny progress but got stuck. I've asked Liam for help.
Other suggestions welcome

Here's the message to Liam:

--------------
Hi Liam. Great to see you again at Balisage.
I'm working on trying to run the XQuery grammar files here :

http://www.w3.org/2010/02/qt-applets/
And so far running into problems. Some I've resolved.

1) missing xerces.jar -> I got this on the web
2) Missing grammar.dtd -> I guessed the URL and got it.

Now I'm stuck in the build process while it looks for various "style" XSL
files; none of them are in the source or library zips.

From build.xml:

<property name="strip-grammar-file" value="../../style/strip.xsl"/>
<property name="assemble-spec-file" value="../../style/assemble-parser-note.xsl"/>
<property name="grammar2spec-file" value="../../style/grammar2spec.xsl"/>
<property name="style-spec-file" value="../../style/xmlspec-override.xsl"/>
<property name="style-shared-file" value="../../style/lexnote-shared.xsl"/>


Where would I find these ?
Preferably is there a location with ALL the dependent files so that I could
run the parser generator ?
If not ... I may be asking you more questions as I peel the onion.

Thanks very much !

-David Lee

--------------------


----------------------------------------
David A. Lee
dl...@calldei.com
http://www.xmlsh.org


David Lee

Aug 10, 2011, 8:15:47 AM
to carro...@googlegroups.com
When we get done with the first implementation I suggest we change the build
code to use Carrot instead of XSLT.
Just like the first C compiler was written in ASM but then rewritten in C.

Evan Lenz

Aug 10, 2011, 11:51:43 AM
to carro...@googlegroups.com
I had had the same thought. :-) I believe the Glasgow Haskell Compiler (GHC) is written in Haskell.

David Lee

Aug 29, 2011, 8:20:14 PM
to carro...@googlegroups.com
I'm not getting anywhere with getting all the files from the W3C in order to compile
the full XQuery parser from the source XML.

I'm thinking of just starting with the JavaCC and/or JJTree code and skipping the
source XML.
I'm not really fond of JJTree ... but since it's already implemented that way it
might make a good start.

Suggestions ?

John Snelson

Aug 29, 2011, 8:26:48 PM
to carro...@googlegroups.com
Apparently you're building from the wrong directory, or so said a conversation I was only partially paying attention to. I think you need to build from the grammar-11/parser/ directory.

John

David Lee

Aug 29, 2011, 8:35:57 PM
to carro...@googlegroups.com
That's not it.

I'm simply missing lots of files.
What's posted on the W3C site is a tiny subset of what's required. There are lots of
references to ../../xxx
which is assumed to exist ...

Liam said he was going to refer my question to the person (unknown?) who's
in charge of the grammar, but I haven't heard anything else.

He also said that if I were a member I could pull the files (but I'm not).

John Snelson

Aug 29, 2011, 8:49:38 PM
to carro...@googlegroups.com
Looking at the build.xml file in that directory, I think the only thing you really need from "lower" directories is xalan.jar. The other paths are either for testing or building the zip file. Try stripping down the build file so they aren't called?

Starting with the JJTree files wouldn't be the end of the world, I guess. Just not as nice to modify as it could be.

John

David Lee

Aug 29, 2011, 8:56:10 PM
to carro...@googlegroups.com
For one, I appear to be missing a whole set of directories:


BUILD FAILED
C:\Work\DEI\carrot\xgrammar\grammar\parser\build.xml:415: The following error occurred while executing this line:
C:\Work\DEI\carrot\xgrammar\grammar\parser\build.xml:496: input file C:\Work\DEI\carrot\xgrammar\xquery-11\src\xquery.xml does not exist


The directory
C:\Work\DEI\carrot\xgrammar\xquery-11
doesn't exist and I can't find it on the W3C web site.
Any suggestions?

David Lee

Aug 29, 2011, 9:00:00 PM
to carro...@googlegroups.com
What would it take to get the entire source/build tree from the W3C ?

John Snelson

Aug 30, 2011, 6:29:44 AM
to carro...@googlegroups.com
This works for me, producing the JJTree output from the
xpath-grammar.xml file:

cd parser
ant gen-grammar -Dlanguage=xquery10 -Dspec=xquery10

You can change the parameters to generate different languages. For instance:

ant gen-grammar -Dlanguage=combined -Dspec=xquery10 -Dspec2=fulltext -Dspec3=update

The results are put into build/${language}. I don't think you need
anything else in the build file.

John

David Lee

Aug 30, 2011, 6:55:07 AM
to carro...@googlegroups.com
THANK YOU !!!!
That was the ticket.
I had the build options all screwed up.