Which Parser Generator to Use?

Shlomi Fish

Sep 6, 2009, 10:40:44 AM

to module-...@perl.org

Hi all!

Which parser generator would you recommend for a Perl project? Here's what I've
looked at so far:

1. Berkeley Yacc for Perl - works pretty well, but is kinda limited.

2. Parse::RecDescent - very impressive feature set, but a little slow, and has
been under-maintained (though it seemed to have improved slightly with several
new releases in 2009). It also tends to be hard to debug its errors.

3. Parse::Yapp - http://search.cpan.org/dist/Parse-Yapp/ - I tried to use it
in https://svn.berlios.de/svnroot/repos/web-cpan/Text-Qantor/ but it gives me
an error for what appears to be a valid syntax, and for the life of me I
cannot understand why it is.

4. There's a new version of GNU bison with support for multiple language
backends. I tried writing a backend for Perl 5, but I gave up on the m4
hacking (I think that m4 must die!).

5. There's also ANTLR - http://www.antlr.org/ :

http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets says:

Perl - Early prototyping. Simple lexer is working.

6. Can I interact with the Parrot Grammar Engine (PGE)? Any input would be
useful.

--------------

I probably missed many others. Any recommendations would be appreciated.

Regards,

Shlomi Fish

--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
The Case for File Swapping - http://shlom.in/file-swap

Chuck Norris read the entire English Wikipedia in 24 hours. Twice.

Ryan Voots

Sep 6, 2009, 3:35:50 PM

to module-...@perl.org

On Sunday 06 September 2009 10:40:44 Shlomi Fish wrote:
> Hi all!
>
> Which parser generator would you recommend for a Perl project? Here's
> what I've looked at so far:
>
> 1. Berkeley Yacc for Perl - works pretty well, but is kinda limited.

Can't say I've used it before, but if it's really Yacc, then I believe
Parse::Yapp does a better job of being a Perl yacc.

> 2. Parse::RecDescent - very impressive feature set, but a little slow, and
> has been under-maintained (though it seemed to have improved slightly with
> several new releases in 2009). It also tends to be hard to debug its
> errors.

I've mostly had issues with either speed or getting precedence right with
Parse::RecDescent.
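Getting precedence right in a top-down parser usually means either one grammar rule per precedence level or precedence climbing, where a table of binding strengths drives a single loop. A minimal Python sketch of precedence climbing (not Parse::RecDescent's own mechanism), assuming a toy grammar of integers, the four arithmetic operators and parentheses; the table and all function names are illustrative.

```python
import re

# Binding strength of each binary operator; higher binds tighter.
# The operator set is a hypothetical toy example.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    """Split an expression into integer, operator and parenthesis tokens."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    # Reject characters the token pattern did not consume (whitespace aside).
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError(f"unexpected character in {text!r}")
    return tokens


def parse_primary(tokens, pos):
    """Parse an integer literal or a parenthesized sub-expression."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        inner, pos = parse_expression(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_expression(tokens, pos=0, min_prec=1):
    """Precedence climbing: consume operators binding at least min_prec.

    Returns (tree, next_pos); trees are nested (op, left, right) tuples.
    """
    left, pos = parse_primary(tokens, pos)
    while (pos < len(tokens) and tokens[pos] in PRECEDENCE
           and PRECEDENCE[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # Left-associative: the right operand only takes tighter operators.
        right, pos = parse_expression(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


def parse(text):
    """Parse a whole expression, rejecting leftover tokens."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("1 + 2 * 3 - 4"))  # ('-', ('+', 1, ('*', 2, 3)), 4)
```

Adding an operator or changing how tightly it binds only touches the PRECEDENCE table, rather than reshuffling a chain of grammar rules.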

> 3. Parse::Yapp - http://search.cpan.org/dist/Parse-Yapp/ - I tried to use
> it in https://svn.berlios.de/svnroot/repos/web-cpan/Text-Qantor/ but it
> gives me an error for what appears to be a valid syntax, and for the life
> of me I cannot understand why it is.

There's an ambiguity in your grammar; I've run into that many times with
Math::Farnsworth.

yapp -v will produce a file.output that describes everything in the grammar
and can really help with debugging. The warnings about useless and unused
terminals are harmless (though they could mean you've got a typo). Useless
rules just means that they're in there but can't be reached; it's sort of
like

if (1)    # something that is always true
{
    print "always reached\n";
}
else
{
    die "true != true";    # can never run, just like a useless rule
}

Your shift/reduce conflict is in the rule

plain_para_text: TEXT
    | plain_para_text TEXT { my $t1 = $_[1] ; my $t2 = $_[2] ; [$t1->[0].$t2->[0], $t1->[1]] }
    ;

It's happening because it can recurse infinitely on itself and then see the TEXT
rule (it'd also be expecting a second TEXT after that).

I'm not 100% sure what you were trying to do there, but the following rule
does what I THINK you were intending:

plain_para_text: TEXT { my $t1 = $_[1] ; my $t2 = $_[2] ; [$t1->[0].$t2->[0], $t1->[1]] };
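For reference, the left-recursive form of the rule builds its value as a left fold: an LR parser like Yapp reduces plain_para_text once per additional TEXT token, and each action call appends the new token's text while keeping the first token's second field. A minimal Python sketch of the value that left-recursive rule computes, assuming (from the action code) that each TEXT value is a two-element [text, extra] pair where extra is whatever the lexer attaches; the function name and sample tokens are hypothetical.

```python
from functools import reduce


def fold_plain_para_text(tokens):
    """Mimic `plain_para_text: TEXT | plain_para_text TEXT` and its action.

    Each token is a (text, extra) pair. Like [$t1->[0].$t2->[0], $t1->[1]],
    every step concatenates the texts and keeps the FIRST token's extra field.
    """
    if not tokens:
        raise ValueError("the rule needs at least one TEXT token")

    def step(acc, tok):
        # acc plays the role of $t1 (the plain_para_text reduced so far),
        # tok plays the role of $t2 (the newly shifted TEXT token).
        return (acc[0] + tok[0], acc[1])

    return reduce(step, tokens[1:], tuple(tokens[0]))


print(fold_plain_para_text([("Hello, ", 3), ("world", 4), ("!", 4)]))  # ('Hello, world!', 3)
```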

As far as bison and ANTLR go, I've never used either of them, so I won't
comment, and I've got no idea about using PGE with Perl, so...

There's also Parse::Earley out there; I played with that a bit, but not as much
as I have with RecDescent and Yapp.

P.S. Sorry if this got sent twice; I found an oddity in my SMTP settings and
don't think anything was ever making it out.

Jonathan Leto

Sep 8, 2009, 2:35:41 PM

to Shlomi Fish, module-...@perl.org

Howdy,

I haven't used it in production, but you might want to look at
HOP::Parser as well:

http://search.cpan.org/~ovid/HOP-Parser-0.02/

Cheers,

--

Jonathan Leto
jona...@leto.net
http://leto.net

Austin Schutz

Sep 8, 2009, 3:43:47 PM

to Jonathan Leto, Shlomi Fish, module-...@perl.org

>>
>> 1. Berkeley Yacc for Perl - works pretty well, but is kinda limited.
>>

I'm not sure what (if any) practical advantage this would have over
bison. I get the sense it's less well maintained.

>> 2. Parse::RecDescent - very impressive feature set, but a little slow, and has
>> been under-maintained (though it seemed to have improved slightly with several
>> new releases in 2009). It also tends to be hard to debug its errors.
>>

I tried this. It works OK, but it was two or three orders of magnitude
slower than bison/C for me. Debugging tools are pretty good. Docs are
pretty good.

>> 3. Parse::Yapp - http://search.cpan.org/dist/Parse-Yapp/ - I tried to use it
>> in https://svn.berlios.de/svnroot/repos/web-cpan/Text-Qantor/ but it gives me
>> an error for what appears to be a valid syntax, and for the life of me I
>> cannot understand why it is.

For people who aren't experts in the field, most of the grammar errors
are completely inscrutable for all of these tools, even after reading
the docs. Start with something simple and make it more complex until it
stops working, then try phrasing it differently. Well, it works for me. I
haven't tried this specific tool.

>>
>> 4. There's a new version of GNU bison with support for multiple language
>> backends. I tried writing a backend for Perl 5, but I gave up on the m4
>> hacking (I think that m4 must die!).

Bison is fast and relatively simple. I'm not sure about using it
directly from Perl, but I wrote a C++ program to parse router configs
and spit out Data::Dumper-style Perl struct output. Very fast.
_Relatively_ easy to add in exceptions for a poorly behaved grammar. You
should be able to use this directly via XS or Inline if you care about
more direct integration. bison + C/C++ + valgrind works pretty well to
make a well-behaved parser (w/out valgrind I always leak a bunch of
memory when writing in more uh.. "hands on" languages).

I did look at yacc but bison has more features. Also bison is consistent
across platforms.
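The "external parser prints Perl data structures" approach works from any language. A minimal Python sketch, assuming the parse result is built from dicts, lists, strings, numbers, booleans and None; the function name and sample data are hypothetical, and the output is a plain Perl literal meant to be read back by Perl code (for example with eval), not a byte-for-byte match of Data::Dumper's layout.

```python
import math


def to_perl(value):
    """Render nested Python data as a Perl data-structure literal.

    dict -> {...} hash ref, list -> [...] array ref, None -> undef,
    str -> single-quoted string, bool -> 1/0, int/float -> number.
    """
    if value is None:
        return "undef"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("no portable Perl literal for inf/nan")
        return repr(value)
    if isinstance(value, str):
        # Inside single quotes Perl only treats \\ and \' specially.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, list):
        return "[" + ", ".join(to_perl(v) for v in value) + "]"
    if isinstance(value, dict):
        # Sort keys so the output is stable between runs.
        pairs = (f"{to_perl(str(k))} => {to_perl(value[k])}" for k in sorted(value, key=str))
        return "{" + ", ".join(pairs) + "}"
    raise TypeError(f"cannot convert {type(value).__name__} to Perl")


# Hypothetical parse result for a router config.
config = {"hostname": "edge-1", "interfaces": [{"name": "eth0", "mtu": 1500, "desc": None}]}
print(to_perl(config))
# {'hostname' => 'edge-1', 'interfaces' => [{'desc' => undef, 'mtu' => 1500, 'name' => 'eth0'}]}
```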

>>
>> 5. There's also ANTLR - http://www.antlr.org/ :
>>
>> http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets says:
>>
>> Perl - Early prototyping. Simple lexer is working.
>>

I have used the Python version of ANTLR. I found it to be more difficult
than bison/C (steeper learning curve), and maybe one or two orders of
magnitude slower than bison/C.

The author and the users seem very competent, but if you are like me and
just want to get some parsing done this may be more effort than it's worth.

Cross-language support is excellent between Java and Python; not sure about
Perl. If you are trying to create an AST it will probably work fine.
Beyond that... ?

If you are writing your own grammar this is a really powerful tool. If
you are stuck trying to parse someone else's jive it may not be as useful.

Error messages for the user when something isn't parsable are also
fairly inscrutable, imo.

>> 6. Can I interact with the Parrot Grammar Engine (PGE)? Any input would be
>> useful.
>>

If you look at the Parrot stuff, I would be interested to hear how well
that works for you.

There's also the 'roll your own' approach. That's pretty fast and less
difficult than you might expect. Also, depending on what you are parsing,
if a grammar is poorly behaved (possibly because the !@!@ing vendor
decides to be erratic instead of consistent), it can be brutal to try to
add weird exceptions in a format any of the above tools will happily
recognize.

Another advantage is that when you write your own parser it's easier to
figure out what happens and what to do when you get something unexpected.
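A hand-rolled parser can be as small as a loop over lines with one branch per construct, which keeps "what do I do with this unexpected line?" an explicit, local decision. A minimal Python sketch for a hypothetical indentation-based config format (a top-level "interface NAME" line followed by indented "key value" lines, with "!" as the assumed comment marker); the format, names and error policy are illustrative assumptions, not Austin's router-config parser.

```python
class ParseError(Exception):
    """Raised with the offending line number so bad input is easy to locate."""


def parse_config(text):
    """Parse 'interface NAME' blocks with indented 'key value' settings.

    Returns a dict mapping interface names to dicts of settings; a key
    with no value (e.g. 'shutdown') maps to None.
    """
    interfaces = {}
    current = None  # settings dict of the block being filled, if any
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("!"):
            continue  # skip blank lines and '!' comments
        if not line[0].isspace():
            words = line.split()
            if words[0] != "interface" or len(words) != 2:
                raise ParseError(f"line {lineno}: expected 'interface NAME', got {line!r}")
            current = interfaces.setdefault(words[1], {})
        else:
            if current is None:
                raise ParseError(f"line {lineno}: setting outside any interface block")
            key, _, value = line.strip().partition(" ")
            # A vendor quirk could be special-cased right here instead of
            # being squeezed into a grammar tool's input format.
            current[key] = value.strip() or None
    return interfaces


sample = """\
interface eth0
  description uplink
  mtu 1500
! comment
interface eth1
  shutdown
"""
print(parse_config(sample))
# {'eth0': {'description': 'uplink', 'mtu': '1500'}, 'eth1': {'shutdown': None}}
```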

Knowing what I know now, I would tend to opt for bison/C or roll your
own, depending on how speed-critical your app is. If I were writing my
own language, maybe ANTLR.

If you would like examples for any of the above I'd be happy to share,
please email me off-list.

Austin

Flavio S. Glock

Sep 8, 2009, 3:54:15 PM

to module-...@perl.org, Shlomi Fish

v6.pm has some support for grammars.
Here is one usage example:

http://cpansearch.perl.org/src/MSILVA/Language-Tea-0.03/lib/Language/Tea/Grammar.pm

- Flávio S. Glock

Shlomi Fish

Sep 24, 2009, 9:12:27 PM

to module-...@perl.org, publiustemp-m...@yahoo.com

Hi Ovid!

On Tuesday 08 Sep 2009 22:46:45 Ovid wrote:
> --- On Sun, 6/9/09, Shlomi Fish <shl...@iglu.org.il> wrote:
> > From: Shlomi Fish <shl...@iglu.org.il>

> >
> > 2. Parse::RecDescent - very impressive feature set, but a
> > little slow, and has
> > been under-maintained (though it seemed to have improved
> > slightly with several
> > new releases in 2009). It also tends to be hard to debug
> > its errors.
>

> Hi Shlomi,
>
> I didn't see it mentioned yet, but you might want to check out Damian
> Conway's new http://search.cpan.org/dist/Regexp-Grammars/
>
> It requires 5.10, but it's much faster than Parse::RecDescent and has a
> clean syntax very close to Perl 6 rules. It's well documented and hooks
> directly into the Perl regex engine, hence its speed.
>

Thanks for the recommendation!

I've converted the code in the SVN repository to use it. After fixing some
bugs in my grammar, caused by things I didn't understand about it, I finally
got all the tests to pass. Now it seems to be working pretty well.

So now I can continue to enhance Qantor after a long period of being unable to
work on it.

Regexp-Grammars seems very nifty.

Regards,

Shlomi Fish

--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/

http://www.shlomifish.org/humour/ways_to_do_it.html

Jonathan Swartz

Apr 29, 2012, 8:56:14 AM

to module-...@perl.org

I'm thinking about a module/script to unify a bunch of code tidiers and validators in a single place. With a single command, e.g.

% tidyall

you could apply the appropriate tidiers and validators to files in your project (e.g. your git directory hierarchy). It would tidy each file as needed and throw an error result if any of the validations failed.

Features:
* Only tidy files that have changed since the last time (using File::Modified and a file cache)
* A single config file with options for all the tidiers and validators, as well as which files to apply them to
* Easy to add new validators/tidiers as plugins
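The first feature, tidying only files that have changed since the last run, comes down to remembering a fingerprint per file. Jon's plan names File::Modified, a Perl module; this minimal Python sketch swaps in a SHA-1 content digest stored in a JSON cache file, and the function names and cache layout are hypothetical.

```python
import hashlib
import json
import os


def file_digest(path):
    """Return a SHA-1 hex digest of the file's contents."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_cache(cache_path):
    """Load the {absolute path: digest} map, or start empty on the first run."""
    try:
        with open(cache_path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def changed_files(paths, cache_path):
    """Return the paths whose contents differ from the digest recorded last time."""
    cache = load_cache(cache_path)
    return [p for p in paths if cache.get(os.path.abspath(p)) != file_digest(p)]


def mark_processed(paths, cache_path):
    """Record digests of files that were tidied and passed validation.

    Call this after tidying, since a tidier rewrites the file and changes
    its digest; files left out (e.g. failed validation) stay "changed".
    """
    cache = load_cache(cache_path)
    for p in paths:
        cache[os.path.abspath(p)] = file_digest(p)
    with open(cache_path, "w") as fh:
        json.dump(cache, fh, indent=2, sort_keys=True)
```

A run would call changed_files(), tidy and validate only those files, and then call mark_processed() on the ones that passed, so a failing file is checked again next time.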

This would be a command that the anal-retentive among us could run on our projects before each commit.

Tidiers and validators for Perl include Perl::Tidy, Pod::Tidy, and Perl::Critic. There are also various tidiers and validators for HTML, CSS and JavaScript out there.

Comments and suggestions for names welcome. Devel::MultiTidy?

Thanks
Jon

David Nicol

Apr 30, 2012, 2:17:20 PM

to Jonathan Swartz, module-...@perl.org

On Sun, Apr 29, 2012 at 7:56 AM, Jonathan Swartz <swa...@pobox.com> wrote:
> Comments and suggestions for names welcome. Devel::MultiTidy?

I like the Any:: name space for things that offer single interfaces to
multiple back-ends. One could argue that tidying really doesn't fit
there, though. And that argument could be countered with hypothetical
examples involving large projects with different styles, enforced by
different tidying engines, for different parts of it, and the desire
for the scripts called at check-in time to run the correct tidying
engine by looking up per-file metadata.
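David's scenario, where different parts of a project use different tidying engines and the check-in script picks the right one per file, can be modelled as an ordered table of path patterns. A minimal Python sketch, assuming shell-style patterns matched with fnmatch (whose * also matches "/") and first-match-wins ordering; the patterns and engine names are hypothetical placeholders, not a real tidyall configuration.

```python
from fnmatch import fnmatch

# Ordered (pattern, engine) rules; the first matching pattern wins, so the
# more specific legacy rule must come before the general *.pm rule.
# Patterns and engine names are hypothetical placeholders.
RULES = [
    ("legacy/*.pm", "perltidy-legacy-style"),
    ("*.pm", "perltidy"),
    ("*.pl", "perltidy"),
    ("*.pod", "podtidy"),
    ("*.js", "js-beautifier"),
]


def engine_for(path, rules=RULES):
    """Return the tidying engine configured for path, or None if no rule matches."""
    for pattern, engine in rules:
        if fnmatch(path, pattern):
            return engine
    return None


def group_by_engine(paths, rules=RULES):
    """Group paths by engine so each engine can be run once over its files."""
    groups = {}
    for path in paths:
        engine = engine_for(path, rules)
        if engine is not None:
            groups.setdefault(engine, []).append(path)
    return groups


print(group_by_engine(["legacy/Old.pm", "lib/New.pm", "bin/run.pl", "README"]))
# {'perltidy-legacy-style': ['legacy/Old.pm'], 'perltidy': ['lib/New.pm', 'bin/run.pl']}
```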

--
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" are to be interpreted using situational ethics.

Jonathan Swartz

Apr 30, 2012, 2:36:36 PM

to David Nicol, module-...@perl.org

Yeah, Any::Tidy seemed to suggest to me "use the best of available tidiers".

I'm leaning now towards Devel::TidyAll, with tidyall being the name of the script.

Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯

May 2, 2012, 3:54:35 AM

to module-...@perl.org

> Devel::MultiTidy?

<https://pause.perl.org/pause/query?ACTION=pause_namingmodules#Avoid_the_too_general_nouns_like_Dev>

I suggest top-level `Code` instead; it is already in use:

cpan[1]> d /[/]Code/
Distribution ALFIE/Code-Dumper-0.01.tar.gz
Distribution ANDREWF/CodeBase-0.86.tar.gz
Distribution CODECHILD/XML-Bare-0.07.tar.gz
Distribution CODECHILD/XML-Bare-SAX-Parser-0.01.tar.gz
Distribution CODEHELP/XML-QOFQSF-0.05.tar.gz
Distribution FDALY/Code-Perl-0.03.tar.gz
Distribution FRANCISCO/CodeManager-0.02.tar.gz
Distribution KITOMER/Code-Class-C-0.08.tar.gz
Distribution MITHALDU/Code-Statistics-1.112980.tar.gz
Distribution NAZRI/Code-Generator-Perl-0.03.tar.gz
Distribution SWALTERS/Code-Splice-0.01.tar.gz
Distribution SZABGAB/Code-Explain-0.02.tar.gz
12 items found
