[ANN] Neotoma 1.3

Sean Cribbs

unread,

Nov 13, 2009, 7:10:45 PM11/13/09

to Erlang, neoto...@googlegroups.com, tri...@googlegroups.com

I'm pleased to announce yet another update to Neotoma. The major features and changes in this release:

* Transformation/semantic analysis code can be written inline with the PEG, enclosed in backticks (`) before the concluding semi-colon for each reduction. The variables available to your code are Node (the parse result, including transformed results of subtrees), and Idx (the current index into the input).
* To do an identity transformation on a rule (pass along the result unchanged), use the tilde (~) instead of backtick-quoted code.
* Extra functions for use inside your parser can be added to the end of the PEG, also enclosed in backticks (`).
* All internal uses of the process dictionary have been expunged and instead use the memoization table.
* All of the modules have been renamed and the parser re-bootstrapped. The most significant user-facing change is that 'peg_gen' is now called 'neotoma'. See http://github.com/seancribbs/neotoma/commit/cbf275a7b638f3c517cbc5b637844bf270b54e5 for more details

What is Neotoma?

Neotoma is a packrat parser-generator for Erlang for Parsing Expression Grammars (PEGs). It consists of a parsing-combinator library with memoization routines, a parser for PEGs, and a utility to generate parsers from PEGs. It is inspired by treetop, a Ruby library with similar aims, and parsec, the parser-combinator library for Haskell.

Browse & Fork: http://github.com/seancribbs/neotoma
Download: http://github.com/seancribbs/neotoma/downloads
Source: git clone git://github.com/seancribbs/neotoma.git
Discussion: http://groups.google.com/group/neotoma-erl

Cheers,

Sean Cribbs

Tony

unread,

Nov 14, 2009, 12:35:37 AM11/14/09

to Neotoma

Really glad to see an inline transformation syntax as this is one of my favorite features of Leex.

Now that you are producing Erlang code AOT, will records be supported as they are in Leex? (since records are output verbatim in the resulting Erlang code from the original Leex files, and thus can get processed by EPP when the resulting Erlang code is compiled by the normal Erlang compiler)

I have continued to develop my attempts at a minimalized, more Erlang-like Reia using Leex because I felt attempts at a Neotoma-based parser were killing my developmental gumption. I would still very much like to switch to Neotoma if I feel it addresses all my needs. I still worry about using a PEG for a large grammar, and I still do not have a good understanding of how PEGs work.

--
Tony Arcieri
Medioh/Nagravision

Phil Pirozhkov

unread,

Nov 14, 2009, 5:57:14 AM11/14/09

to neoto...@googlegroups.com

Sean,

glad to hear Neotoma is getting mature.
Could you please give an update on transformation syntax?
The link you've given doesn't work.
Hope you've adopted syntax i've suggested and partially implemented in my branch:
http://github.com/pirj/neotoma/blob/master/extra/leg.peg

Glad to hear you've finally agreed with that Neotoma PEG syntax should not rely
on Erlang syntax, so that Reia and LFE developers can all use common syntax.

As a matter of fact i've stuck on Retem (templating) development for a different
reason that our misunderstanding. That is an issue that crosses the line between
Neotoma and other PEG parser generators i've tried and prevents from moving
further with the simpliest task i had to do.

So, what a templating engine is? it's a general text and some embraced expressions:
"hello, {name} !"

My syntax rules are here: http://github.com/pirj/ryan/blob/master/src/retem/retem.peg
And yes, i've used my transformations to avoid Erlang syntax in Reia-based templating engine:
dot_expression <- id '.' id : (dot, 1, 3);

I'm doing
peg_gen:file("retem.peg"). c(retem). l(retem).
retem:parse("{for a in aa}{if true}{xxx}{endif}{endfor}").

and it ends up in ILR, for which Neotoma doesn't have any detection/elimination mechanism.

http://github.com/pirj/ryan/commit/84f54825cb89a72d587cc103230c0239c5b4d6ce
That's a commit that seem to work around ILR, but it cannot work fine with text outside braces.

I've tried to port my syntax to that C PEG/LEG library, and it worked just fine.

Since Neotoma is the only one for Erlang, and there are some awesome projects
around the block, such are Herml and ErlyDTL, i've left the idea of writing my own
parser at least until Neotoma resolves this or Reia is mature enough to implement
PEG parser with it.

Phil

Sean Cribbs

unread,

Nov 14, 2009, 12:00:39 PM11/14/09

to neoto...@googlegroups.com

Tony,

AFAIK, you could put your record definition in the "additional code"
section at the bottom of the grammar. However, you might want to check
that assumption out, I'm unsure whether record definitions need to
appear before functions that use them.

Sean

Sean Cribbs

unread,

Nov 14, 2009, 12:24:45 PM11/14/09

to neoto...@googlegroups.com

Phil Pirozhkov wrote:
> Sean,
>
> glad to hear Neotoma is getting mature.
> Could you please give an update on transformation syntax?
> The link you've given doesn't work.
> Hope you've adopted syntax i've suggested and partially implemented in my branch:
> http://github.com/pirj/neotoma/blob/master/extra/leg.peg
>
>

I took your LEG thing into consideration, and also considered a
yecc-like matchspec notation. However, in the short term it seemed
easiest to just take the given code and make it be the body of the
transformation function. You can do so like this:

nonterminal <- reduction ` % erlang code goes here `;

> Glad to hear you've finally agreed with that Neotoma PEG syntax should not rely
> on Erlang syntax, so that Reia and LFE developers can all use common syntax.
>
>

At this point I'm only supporting Erlang syntax (because of the
parser-generation piece), but if there is an easy way to generate Reia
parsers, I'm all ears.

> As a matter of fact i've stuck on Retem (templating) development for a different
> reason that our misunderstanding. That is an issue that crosses the line between
> Neotoma and other PEG parser generators i've tried and prevents from moving
> further with the simpliest task i had to do.
>
> So, what a templating engine is? it's a general text and some embraced expressions:
> "hello, {name} !"
>
> My syntax rules are here: http://github.com/pirj/ryan/blob/master/src/retem/retem.peg
> And yes, i've used my transformations to avoid Erlang syntax in Reia-based templating engine:
> dot_expression <- id '.' id : (dot, 1, 3);
>
> I'm doing
> peg_gen:file("retem.peg"). c(retem). l(retem).
> retem:parse("{for a in aa}{if true}{xxx}{endif}{endfor}").
>
> and it ends up in ILR, for which Neotoma doesn't have any detection/elimination mechanism.
>
> http://github.com/pirj/ryan/commit/84f54825cb89a72d587cc103230c0239c5b4d6ce
> That's a commit that seem to work around ILR, but it cannot work fine with text outside braces.
>
> I've tried to port my syntax to that C PEG/LEG library, and it worked just fine.
>
> Since Neotoma is the only one for Erlang, and there are some awesome projects
> around the block, such are Herml and ErlyDTL, i've left the idea of writing my own
> parser at least until Neotoma resolves this or Reia is mature enough to implement
> PEG parser with it.
>

Maybe there's still some confusion here. Neotoma is just a
parser-generator - you'll still need to do semantic analysis yourself.
You might take a look at the extra functions that the metagrammar-parser
includes - it builds up a symbol table, and then when transforming the
top rule, verifies that:

1) all nonterminals have reductions
2) all reductions are reachable

This is something the parser itself, with simple data transformations,
could not check.

It's also possible that I haven't fully comprehended LEGs yet. Perhaps
LEG could be something that goes alongside neotoma's existing PEG?

Sean

Tony

unread,

Nov 14, 2009, 1:19:04 PM11/14/09

to neoto...@googlegroups.com

On Sat, Nov 14, 2009 at 10:00 AM, Sean Cribbs <seanc...@gmail.com> wrote:

Tony,

AFAIK, you could put your record definition in the "additional code" section at the bottom of the grammar. However, you might want to check that assumption out, I'm unsure whether record definitions need to appear before functions that use them.

I'm fairly certain they do. This works in Leex because Leex places the "additional code" at the top of the output file.

--
Tony Arcieri
Medioh/Nagravision

Sean Cribbs

unread,

Nov 14, 2009, 2:13:16 PM11/14/09

to neoto...@googlegroups.com

Tony wrote:

On Sat, Nov 14, 2009 at 10:00 AM, Sean Cribbs <seanc...@gmail.com> wrote:

Tony,

AFAIK, you could put your record definition in the "additional code" section at the bottom of the grammar. However, you might want to check that assumption out, I'm unsure whether record definitions need to appear before functions that use them.

I'm fairly certain they do. This works in Leex because Leex places the "additional code" at the top of the output file.

It should be simple to put the extra code at the top of the file. I'll look into it.

Sean

Sean Cribbs

unread,

Nov 20, 2009, 3:41:13 PM11/20/09

to Torben Hoffmann, neoto...@googlegroups.com

That's great! Have you published your implementation anywhere?

Sean

Torben Hoffmann wrote:

Hi Tony,

For what it is worth: I have implemented a parser for WikiCreole using
Neotoma and it works quite fine. Not sure if that fits your notion of large,
though ;-)

This was with Neotoma 1.1 and right now I am just dying to upgrade to 1.3
and throw out a bunch of code that really has to be inline transformations.

Cheers,
Torben

On Sat, Nov 14, 2009 at 06:33, Tony Arcieri <to...@medioh.com> wrote:

Sean Cribbs

unread,

Nov 22, 2009, 9:52:08 AM11/22/09

to Torben Hoffmann, neoto...@googlegroups.com

Thanks, looking forward to hearing more about your project. I'm not completely sold on the backtick syntax, but it seemed the easiest, considering it would be rarely used in Erlang code.

Sean

Torben Hoffmann wrote:

Not really, it is part of a on-the-side start-up which we hope can bring in some money.

But the WikiCreole part could perhaps be released as an example of how to use Neotoma - I will ask my "co-founders" if they think it is safe to release it.

BTW: I have to update to the new inline stuff since my implementation is based on 1.1 and it is a lot prettier to generate the AST as inline commands on the grammar, not to mention easier to understand!

Cheers,
Torben

--
http://www.linkedin.com/in/torbenhoffmann

Sean Cribbs

unread,

Nov 22, 2009, 3:59:45 PM11/22/09

to Torben Hoffmann, neoto...@googlegroups.com

The only other PEG implementation I'm familiar with is Ruby's Treetop (and a little bit of Haskell from Ford's thesis). Treetop wraps each node in the syntax tree in an object, into which accessors for labels and RHS NTs are added. Obviously since this is a very OO pattern, it would not translate well into Erlang without some crazy closure-magic that hardly seems worth it to me.

I did consider yecc's matchspec-style of transformation, and Phil has some LEG transformations (although I don't quite understand them yet) here: http://github.com/pirj/neotoma. I'm not certain that my solution is best, but here's some of the rationale for why I did it the way I did:

CFGs for yecc are factored very differently than PEGs - each possibility is enumerated, allowing instance-specific matching, whereas PEGs tend to present all possible derivations at once, using optionals, repetition, lookahead and ordered choice within the same rule. That makes automatic matching more challenging, at best. This may be a personal preference, but I don't agree with Treetop's philosophy that allows transformations to be done within a choice operator, I'd rather push those down to a lower level in the grammar. The choices would have to be factored out into separate rules inside the parser anyway to support that.

Now, if you factor your grammar in certain ways, you could achieve nearly what yecc provides w.r.t. pattern matching, but it's likely your higher-level rules won't contain much but the identity transformation or an extraction of a lower-level result. If I recall correctly, you tend to call out to support functions anyway when you want to do something complicated in yecc.

Most of all, I want to be flexible and the generated parsers to remain comprehensible. Having a clear mapping between the grammar and the code (as opposed to the splayed-out state machine that Treetop generates, and likewise yecc) is important to me.

Sean

Torben Hoffmann wrote:

Sean,

Let's step a bit back from the joy of having a new feature that makes life easier and gaze a bit at what is really important.

As I see it the key is to be able to generate a nice AST that is easy to work with going forward. In Neotoma 1.1 the transformation functions could help out with that and in Neotoma 1.3 we have the inline transformations with the backtick syntax.

Looking at yecc they use structure building code [1] and that seems to be a clean way of solving the problem - at least it has the added benefit that it can be spelled out for each clause in a production rule (not sure if Neotoma supports that with the current syntax).

How does other PEG implementations bridge the gap between the production rules and the AST? There might be some good ideas on this that could be used to lift the backtick solution to even higher hights!

References:
[1] http://erlang.org/doc/man/yecc.html

Cheers,
Torben

--
http://www.linkedin.com/in/torbenhoffmann

Reply all

Reply to author

Forward