--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
Nice, but providing the grammar as a plain string looks somewhat
unnatural to me. Why not something like this (parser being a macro)?
(def as-and-bs
  (parser
    S = AB* .
    AB = A B .
    A = "a" + .
    B = "b" + .))
I.e., symbols denote non-terminals, strings denote terminals, and the
dot indicates the end of a rule.
Bye,
Tassilo
I find this library very exciting. It is incredibly well documented and
already covers a broad range of use cases; I can't wait to try it.
Do you have a roadmap for the next releases?
Of interest to me:
- a restartable version? (probably doable, if the internal implementation
uses immutable data structures)
- an incremental version? (Dunno if that's an easy one, and it's way beyond
my current programming skills to even judge)
- the possibility of passing transform map(s) as an argument to parse,
so that a collection of views can be applied in parallel and the tree
is "traversed" only once?
Have you run any tests to compare performance against, e.g., the same
grammars in ANTLR?
IMHO, it would be cool if a Clojure parser library used a similar
format (exploiting Clojure data structures beyond lists where they make
sense), at least internally. Of course, one could still have other
front ends, such as the EBNF one you are using, which would be translated
to the internal format.
Hi Mark,
Amazing stuff, I didn't know that such general parsing techniques even exist!
One minor comment: it would be nice to add direct links to the GLL papers and the https://github.com/epsil/gll GitHub repo to save people some googling time.
--
Thanks, this looks fantastic! Is there any way to include comments in the syntax at all?
"expr = add-sub
<add-sub> = mul-div | add | sub
add = add-sub <'+'> mul-div
sub = add-sub <'-'> mul-div
... expr = expr op expr | ...
And for an LL parser I would do:
add = mul-div <'+'> add-sub
But your parser rules are somewhat new to me. Both variations are accepted:
add = add-sub <'+'> add-sub
add = mul-div <'+'> add-sub
And in both cases some generated parsers are correct (arithmetically speaking :-) ), but I'd like to understand the rules for the first/default parser.
Could you clarify it a little, please?
Here is where my question is coming from: if I were to use such a parser in production, I'd like it to be unambiguous. And I'd like to detect ambiguity early, before my software ships or is deployed, preferably at build, packaging, or deployment time. But since all these things are somewhat fuzzy for Clojure projects, at the very least I'd like to detect ambiguity during my app's startup, i.e., I'd like to put a big fat assert in the initialization phase. Is there a way to do it now (or is one planned for the future)?
Your readme says "I had difficulty getting his Parsing with Derivatives technique to work in a performant way". I was wondering if you could please elaborate. What kind of performance did you achieve? How does that compare to the GLL parser you implemented? Did you implement memoization/compaction/fixed-point/etc. from the latest research? How do the implementations compare in terms of code size and readability?
Thanks, Dmitry.
On Tue, Apr 9, 2013 at 1:33 AM, Tassilo Horn <ts...@gnu.org> wrote:
Nice, but providing the grammar as a plain string looks somewhat
unnatural to me. Why not something like this (parser being a macro)?
(def as-and-bs
  (parser
    S = AB* .
    AB = A B .
    A = "a" + .
    B = "b" + .))
I.e., symbols denote non-terminals, strings denote terminals, and the
dot indicates the end of a rule.
Bye,
Tassilo
I played around with that, but even if you suppress evaluation by using a macro, Clojure's reader makes strong assumptions about certain symbols. For example, it is standard in EBNF notation for {} to mean zero-or-more. But if you include {A B C} in your grammar using the macro approach, Clojure's reader will throw an error because it treats {} as a map and expects an even number of forms to follow. That was the main reason, but it also makes the notation much more sensitive to whitespace (for example, AB * versus AB*). Gradually, those little issues start making it look less and less like traditional notation. There's something really nice about just being able to copy and paste a grammar off of a website and have it just work.
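The reader constraints described above can be checked at a REPL. This is only a minimal sketch using `read-string` from clojure.core; the helper name `readable?` is made up for illustration, and the snippets are just the fragments mentioned in this thread.

```clojure
;; Hypothetical helper: true if the Clojure reader accepts the text,
;; false if reading it throws.
(defn readable? [text]
  (try
    (read-string text)
    true
    (catch Exception _ false)))

;; {A B C} is read as a map literal with an odd number of forms,
;; so the reader rejects it before any macro gets to see it.
(readable? "{A B C}")
(readable? "{A B}")   ; an even number of forms reads fine

;; Whitespace changes the tokens a macro would receive.
(count (read-string "(AB*)"))    ; a single symbol, AB*
(count (read-string "(AB *)"))   ; two symbols, AB and *
```

A string-based grammar never goes through this step, which is why EBNF braces and spacing can be handled however the grammar parser likes.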
I understand where you're coming from, though. It definitely is part of the Clojure culture to avoid string representations for many kinds of data (e.g., SQL queries). We do accept it for regular expressions, and for things like #inst "2011-12-31T19:00:00.000-05:00", though, and that's the kind of feel I was going for. Would it be more psychologically palatable to type:
#insta/parser "S = 'a' 'b'"
rather than
(insta/parser "S = 'a' 'b'")
?
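For anyone who wants to see how the tagged-literal form would feel, it can be tried without any library support by binding `*data-readers*` around `read-string`. This is a sketch under assumptions: `fake-parser` is a stand-in that merely wraps the grammar string (it is not the library's real parser function), and the `insta/parser` tag is bound here by hand rather than registered by any library.

```clojure
;; Stand-in for a real parser constructor (an assumption for illustration):
;; it just records the grammar text it was given.
(defn fake-parser [grammar-string]
  {:grammar grammar-string})

;; Bind the tag insta/parser to the stand-in while reading the literal.
(def tagged-example
  (binding [*data-readers* {'insta/parser fake-parser}]
    (read-string "#insta/parser \"S = 'a' 'b'\"")))

;; The reader function received the grammar as an ordinary string.
(:grammar tagged-example)
```

Either way the grammar is still a string underneath; the tag only changes how the call site looks.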
What do you think would be gained by making it a macro? From my perspective, a macro is essentially just a string that is being processed by the Clojure reader (and thus subject to its constraints). If the grammar were expressed in the way you propose, is it any easier to build up a grammar programmatically? Is it any easier to compose grammars? If anything, I think it might be harder.
Thanks for the comments,
Mark
Hi, what do you think about a DSL version using a map?