Testing a grammar


ioba...@gmail.com

Aug 9, 2016, 10:21:55 PM
to marpa parser
I know in advance that my target grammar is complex.  So I would like to start at the lower, simpler levels and start testing my lexeme and grammar rules as I write them.

* Can I change the starting rule of a (SLIF) grammar at runtime?  I would like to test very basic rules -- the kind that I'll only see in slices far into a file -- (bottom-up) before defining the grammar from the top down.  If I can specify a rule and a string (to G->parse or R->read), I can write easy regression tests that each rule recognizes valid strings and rejects invalid strings.  I could modify the grammar file(s) for each test, but that seems like a bad idea.  I'm all ears if there's a better way to do this.

* At a higher level, can you point me to any great examples of regression tests for a Marpa grammar? 

* Even more generally, how do people develop and test a Marpa grammar?

Thanks!

- Ryan

Jeffrey Kegler

Aug 9, 2016, 10:36:23 PM
to Marpa Parser Mailing List
Re changing the starting rule -- you can write the SLIF DSL at runtime, adding different ":start" statements.  The inaccessible symbol statement (http://search.cpan.org/~jkegl/Marpa-R2-3.000000/pod/Scanless/DSL.pod#Inaccessible_symbol_statement) will be useful to quiet various errors and warnings.
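
A minimal sketch of that idea, assuming the grammar is kept as a plain string (the grammar, the symbol names, and make_test_dsl are illustrative, not from this thread):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A small, hypothetical base grammar kept as a plain string.
my $base_dsl = <<'DSL';
expr ::= term op term
term ::= id
id ~ [a-zA-Z_] [\w]*
op  ~ [-+*/]
DSL

# Prepend a per-test ":start" statement, plus "inaccessible is ok
# by default" so the symbols the test's start rule cannot reach do
# not trigger warnings.
sub make_test_dsl {
    my ( $start_symbol, $dsl ) = @_;
    return ":start ::= $start_symbol\n"
         . "inaccessible is ok by default\n"
         . $dsl;
}

my $test_dsl = make_test_dsl( 'term', $base_dsl );
print $test_dsl;
# $test_dsl would then go to
# Marpa::R2::Scanless::G->new( { source => \$test_dsl } ).
```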

Marpa::R2 has a test suite in the cpan/t/ directory.  In my regular development I do lots and lots of regression testing on SLIF grammars.

One pretty general approach to developing and testing the grammar is to initially have it produce an AST, and test that it's right for a test suite of inputs.  Once you've got the right AST, add the semantics.

While obviously I've had practice at this, you'll want to take into account others' suggestions -- being the author gives me a perspective which can be helpful to others, but which is very skewed, so it can also be quite unhelpful.

Ron Savage: you wrote a couple of blog posts about exactly how to develop a Marpa grammar, didn't you?  Jean-Damien: didn't some of yours describe your process?  And there may be others in the archives.

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ron Savage

Aug 10, 2016, 12:04:09 AM
to marpa parser
(1) On-the-fly fiddling of the grammar is in - ta-da! - the FAQ: http://savage.net.au/Perl-modules/html/marpa.faq/faq.html#q126

(2) Tutorials and articles are on Marpa's unofficial home page: http://savage.net.au/Marpa.html#Tutorials

ioba...@gmail.com

Aug 10, 2016, 3:09:04 PM
to marpa parser
This makes perfect sense.  Since the grammar object takes a string, simply manipulate the string! 

Thanks for the pointer to handling the inaccessible warnings/errors.

ioba...@gmail.com

Aug 10, 2016, 3:19:02 PM
to marpa parser
Thanks!

I read the FAQ, but dismissed the on-the-fly grammar fiddling as an advanced topic that I didn't need.  Now I find it's one of the basic concepts I want to use. :-)

I've read most of those tutorials.  In particular, Stuifzand's tutorial includes the tantalizing statement, "[a]nd this leads to developing a set of test files...."  What I'm trying to ask is *how* to develop these tests.  Re-reading the tutorials, I still don't see any help in this direction.  (Of course, I missed FAQ 126, too.)

Thanks for your patience!

- Ryan

Jeffrey Kegler

Aug 10, 2016, 6:29:18 PM
to Marpa Parser Mailing List
My own tests come from three sources.  First, I almost always do test-driven programming, and these tests become the first part of the test suite.  Second, for any bug, I add a regression test.  Third, I sometimes explicitly want something in the test suite that's not there from the other two sources, and add that in the last, pre-release phase.  More often, I have cast the net wide in the development tests and bug-fix tests, so that when the last bug is fixed I already have a quite adequate test suite.

Also, I often prioritize debugging and tracing logic, sometimes writing it before the code that it debugs and/or traces.  That way it is available when I develop.  Of course, I'm always eager to do the "real" programming.  But if it's trace/debug stuff you *know* you'll need (and often you do), you gain time overall by delaying work on the core logic until it is in place.  This relates to testing, because it means that when I finish development my trace/debug code is itself thoroughly tested.

I often have tests of the trace and debug logic in the test suite.  This is not too hard to do if you've developed the trace/debug logic and debugged it along with the main code -- you develop it on a test-driven basis and when done, simply add that test to the test suite.  I notice that other programmers rarely test their diagnostics and tracing code, and it can be a chore, but I find it's a life-saver.

Ron Savage

Aug 10, 2016, 6:30:39 PM
to marpa parser, ioba...@gmail.com
Yes, manipulate the string. But now, try very hard not to keep manipulating the same string, but save lots of versions of your work in different files, and you'll end up with a test suite.

Jeffrey Kegler

Aug 10, 2016, 8:00:21 PM
to Marpa Parser Mailing List
Oh, and by the way, manipulating the string is (ta dum) Language-Driven Programming, or the Interpreter Design Pattern.  It has considerably more potential than most hacks.  As Steve Yegge points out, of all the programming Design Patterns, it is the only one that will help your code get smaller.



ioba...@gmail.com

Aug 10, 2016, 8:26:30 PM
to marpa parser, ioba...@gmail.com
Thanks for your time!

I'm not sure I understand, but I want to.  My current plan is to store my grammar in one or more files.  For each test battery, I'll load the entire grammar, and each test will specify its top level rule and provide strings that it expects to match or not.  Something like the following.

  # read in the grammar
  my $rules;
  { ... }  # fill $rules with the file contents

  # test 0: identifier
  if (1) {
    my $cur_rules = ":start ::= identifier\n" . $rules;
    my $grammar = ...;
    ...
  }

  # test 1: operator
  if (1) {
    my $cur_rules = ":start ::= operator\n" . $rules;
    my $grammar = ...;
    ...
  }


This sounds like what you warn against, but it feels like I'll end up with a good test suite.  Can you help me understand?

- Ryan

Jeffrey Kegler

Aug 10, 2016, 8:29:08 PM
to Marpa Parser Mailing List
Did somebody warn against that?  It looks pretty close to how I create my test suites.
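
Jeffrey's confirmation suggests the plan above can be written up directly as a Test::More script.  The following is a hedged sketch, not code from the thread: the grammar, the rule names, and the rule_accepts helper are all illustrative, and Marpa::R2 is loaded lazily so the script degrades gracefully when the module is absent.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Hypothetical base grammar; in practice this would be read from
# your grammar file(s).
my $base_dsl = <<'DSL';
expr       ::= identifier operator identifier
identifier ::= id_lex
operator   ::= op_lex
id_lex     ~ [a-zA-Z_] [\w]*
op_lex     ~ [-+*/]
DSL

# Build a per-test grammar string with its own start rule.
sub make_test_dsl {
    my ($start) = @_;
    return ":start ::= $start\n"
         . "inaccessible is ok by default\n"
         . $base_dsl;
}

# True if $input parses from $start; a failed parse dies inside
# Marpa, so the eval turns rejection into a false return.
sub rule_accepts {
    my ( $start, $input ) = @_;
    require Marpa::R2;
    my $dsl     = make_test_dsl($start);
    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    return defined eval { $grammar->parse( \$input ) };
}

# Each rule gets strings it must accept and strings it must reject.
my @cases = (
    [ identifier => 'foo',    1 ],
    [ identifier => '9lives', 0 ],
    [ operator   => '+',      1 ],
);

if ( eval { require Marpa::R2; 1 } ) {
    for my $case (@cases) {
        my ( $rule, $input, $want ) = @$case;
        is( rule_accepts( $rule, $input ) ? 1 : 0, $want,
            "$rule: '$input'" );
    }
}
else {
    note 'Marpa::R2 not installed; checking only the DSL builder';
    like( make_test_dsl('identifier'),
        qr/^:start ::= identifier\n/, 'start rule is prepended' );
}
done_testing();
```

Saving one such file per battery, as Ron suggests, is what turns these throwaway checks into a durable test suite.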


ian.t...@gmail.com

Aug 26, 2016, 2:02:35 AM
to marpa parser, ioba...@gmail.com
While I realize the question is really in reference to particulars of Marpa, I'd wondered how to "test grammars" as well.  I'd initially thought about proving equivalence, but here's a reasonable explanation of why that's not possible:


However, the discussion points to,

"Comparison of Context-free Grammars Based on Parsing Generated Test Data"
http://slps.github.io/testmatch/

In the present paper, we leverage systematic test data generation, by which we mean that test data sets are generated by effective enumeration methods for the coverage criteria of interest. These methods do not require any configuration. Also, these methods imply minimality of the test data sets in both an intuitive and a formal sense.

Has anyone any experience with this, or similar, efforts?

Ruslan Shvedov

Aug 26, 2016, 2:19:06 AM
to marpa-...@googlegroups.com
One thing to consider is whether the grammar under test can be used to generate test inputs: if that grammar has a bug, so will the inputs -- seems obvious, but I've not really tested it.

My experience of writing a parser for Lua was this: I took the EBNF grammar from the language spec, converted it to SLIF manually, took the Lua test suite files, and tested by parsing them to an AST, serializing the AST, and running the Lua interpreter to see if it produced the same results for the original file and the file produced by serializing the AST.  Then I wrote a round-trip parser and was able to just compare the original file and the serialized AST.

So, to me, testing a grammar boils down to finding a representative set of inputs and then using the reference semantics producer (the Lua interpreter in my case) to test the serialized AST files.
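
The round-trip check in the last step can be sketched as follows.  The parse-and-serialize step is the part specific to your grammar, so it is taken here as a code ref, and the whitespace normalization is one illustrative choice, not something from the post:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Normalize irrelevant whitespace differences before comparing.
sub normalize {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}

# Compare a source text with the result of parsing it and
# serializing the AST back to text.  $serialize is the
# grammar-specific part: a code ref that parses $source and
# returns the re-serialized text.
sub round_trip_ok {
    my ( $source, $serialize ) = @_;
    my $output = $serialize->($source);
    return normalize($source) eq normalize($output);
}

# With a real parser, $serialize would parse to an AST and walk it
# back to text; a trivial stand-in is used here.
print round_trip_ok( "local x =  1\n", sub { "local x = 1" } )
    ? "round trip ok\n"
    : "round trip FAILED\n";
```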

Hope this helps.


Jeffrey Kegler

Aug 26, 2016, 12:35:43 PM
to Marpa Parser Mailing List
Some points I hope will be useful:

First, the "equivalence" in the Stack Exchange article you cite is "equivalence of languages".  In most cases you aren't simply recognizing a language (a set of strings), but parsing it according to a grammar.  Two language-equivalent CFGs do not necessarily parse a string into the same structure, and therefore for most practical purposes are not equivalent at all.

Second, there is theoretical correctness and correctness of implementation.  Theoretical correctness depends on what you're targeting.  When I do JSON parsers, for example, there's BNF, and it's the BNF that is the standard -- it's correct if I type it in OK.  There's also a test suite, but it's a check that I got the *implementation* right.

Third, most of the problem is getting the semantics right, I find.  I don't think either of the articles addresses that -- they treat correctness as a matter of recognizing a set of strings.

You might want to glance at my recently posted timeline, which touches on these issues.

Hope this helps, jeffrey



