Overriding error handlers


Zeev Atlas

Dec 30, 2014, 4:06:33 PM
to marpa-...@googlegroups.com
Is there a way to tell Marpa to ignore a certain, obviously wrong, element in the input?

The case I have in mind is a language that by and large adheres to the BNF, but where a vendor added some extensions without publishing the proper BNF, and I would like to postpone that task for various reasons.

Example: the SQL Server variety of SQL 2003, keyword TOP:

SELECT TOP 5 column1, column2 FROM table1;

This is a valid SQL 2003 SELECT statement except for the 'TOP 5'.

What I have in mind is something like this:

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl, Handlers => {UNKNOWNTOKEN => \&handle_unknowntoken}});

Now handle_unknowntoken could potentially see the rest of the input from the point where the unknown token was found, report it, turn it into spaces, and return to the module with a code that says either "continue anyway with what you see" or "die the way the module does now".

I do not know whether this option is available in any shape or whether it is at all feasible.

Thanks
ZA


Jeffrey Kegler

Dec 30, 2014, 4:35:29 PM
to Marpa Parser Mailing LIst
Off the top of my head, and not 100% sure it's helpful in your context, but here goes:

A technique I'd like to see used more is "error rules" -- rules which represent things which should *not* appear in the parse.  These have not been used much because traditional parsers have trouble enough parsing a full set of correct rules, so that any technique that involves more rules is very problematic.

Marpa::R2 has a special "bail out" static method, which can be used in the semantic action of an error rule.

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ron Savage

Dec 30, 2014, 5:32:26 PM
to marpa-...@googlegroups.com
There are 2 ways to handle this:

o Pre-process the input, so 'select top 5' becomes 'select'.

o Change the grammar.

If choosing the 2nd, I'd subclass MarpaX::Languages::SQL2003::AST and commit myself to adding as much MS-specific code to the grammar as needed to make the project work. The advantage is that you can get help here on the syntax for such grammar extensions. And it's immeasurably better if 1 module does this than if everyone tries to do it in parallel, since we all then have access to a new custom module.


Ron Savage

Dec 30, 2014, 5:35:02 PM
to marpa-...@googlegroups.com
I meant to add that I've moved completely away from actions to events, and that's how I think when I approach such problems.

Some people may not want to do that, of course. And it may not be compatible with the super class anyway.

Zeev Atlas

Dec 30, 2014, 8:03:11 PM
to marpa-...@googlegroups.com
Thank you, but neither suggestion is feasible for now.
Changing the grammar is out of my league (at least for now :). My knowledge here is limited to what I learnt in college many years ago. While I agree that I have to study that, I also have to study XPath and other topics for the same project. [Indeed, I AM using MarpaX::Languages::SQL2003::AST and switched to XML output based on Durand's suggestion.] That's why I stated that I would like to postpone that task for now!

Pre-processing would require me to do with regular expressions many of the tasks that Marpa does better, which was the reason I switched to a Marpa-based solution to begin with... and I am still at the beginning of a long learning curve :|

ZA

Zeev Atlas

Dec 30, 2014, 8:08:29 PM
to marpa-...@googlegroups.com
My learning ability is now totally overwhelmed... Once I finish learning how to deal with grammars (see above), I promise to look into the difference between actions and events and maybe adopt your approach. I am not being facetious; I really have no idea what you are talking about, but I sense that it is important.
ZA

Zeev Atlas

Dec 30, 2014, 8:13:09 PM
to marpa-...@googlegroups.com
The "bail out" method seems to be of interest, but it is scarcely documented. Is there any better documentation? When is it issued? Could the parsing continue from a different point after that method is called? I have no problem programming it as elaborately as needed, but I have no idea what it could do besides spitting out a message.

Thank you
ZA

Ron Savage

Dec 30, 2014, 9:16:03 PM
to marpa-...@googlegroups.com
OK. But to clarify: My suggestion about pre-process was just to let you fiddle the input string to discard 'TOP\s+\d+\s+' before passing the resulting string to Marpa.
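
As a rough sketch of that pre-processing step, something like the following could run before the string is handed to Marpa. The helper name strip_top_clause is hypothetical, and anchoring the pattern on SELECT is an added assumption; the substitution knows nothing about string literals or comments, so it would also rewrite a matching sequence inside them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Marpa or any MarpaX module):
# discard a SQL Server "TOP <n>" clause that directly follows SELECT.
# Assumption: the clause is always written as TOP, whitespace, digits.
sub strip_top_clause {
    my ($sql) = @_;

    # Keep the SELECT keyword and its trailing whitespace ($1),
    # drop 'TOP\s+\d+\s+' as suggested above; /i ignores case,
    # /g handles several statements in one string.
    $sql =~ s/\b(SELECT\s+)TOP\s+\d+\s+/$1/gi;

    return $sql;
}

my $input = 'SELECT TOP 5 column1, column2 FROM table1;';
print strip_top_clause($input), "\n";
# prints: SELECT column1, column2 FROM table1;
```

The cleaned string can then be passed to the recognizer unchanged; input without a TOP clause comes back untouched.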

Jeffrey Kegler

Dec 30, 2014, 10:33:39 PM
to Marpa Parser Mailing LIst
In sending you the link, I noticed the documentation is insufficient, and fixed that. It may take a while to make its way into a release, but here it is on GitHub. The "bail out" is what it sounds like: it occurs right in the middle of one of your actions, and as soon as you call it, the parse is over. It's totally destructive.

Your other suggestions are possible, but require some advanced trickery, so this may be something you wish to come back to later in your researches.

Zeev Atlas

Dec 31, 2014, 7:37:10 AM
to marpa-...@googlegroups.com
As I read more and begin to grasp the complexity of the Marpa solution, and the desire to work all issues from within the context of the grammar, actions (events, etc.), I can see why you guys insist on using rules and only rules, whether positive or negative, to parse the subject. Please let me give you a counter-argument:
I am not a native English speaker. Ever since I learned the language, when reading and parsing text, I may and do encounter words that I do not know. Instead of going to the dictionary, I encapsulate such words and substitute them with either a context-based approximation or with null, and try to continue parsing. Rarely, indeed very rarely, I cannot proceed.
What I would want is something similar, in which Marpa would tell me that it has encountered such an element and allow me to advise it to substitute it with something else, or nullify it, and continue.
I do not know how hard that is to implement, but I suspect that it should not be that hard. And the practical benefits would be enormous.
ZA

Durand Jean-Damien

Dec 31, 2014, 12:02:28 PM
to marpa-...@googlegroups.com
On Tuesday, December 30, 2014 at 23:32:26 UTC+1, Ron Savage wrote:
> o Change the grammar.
>
> If choosing the 2nd, I'd subclass MarpaX::Languages::SQL2003::AST and commit myself to adding as much MS-specific code to the grammar as needed to make the project work. The advantage is that you can get help here on the syntax for such grammar extensions. And it's immeasurably better if 1 module does this than if everyone tries to do it in parallel, since we all then have access to a new custom module.

This is a bit like the C grammar. By itself, it will never parse the preprocessed output of GNU CC or MS cl. So extensions are added explicitly in the grammar.
Extending and/or subclassing the grammar is one nice way; 'error rules' is another very nice way ;-)

Jeffrey Kegler

Dec 31, 2014, 12:24:58 PM
to Marpa Parser Mailing LIst
Zeev -- the sort of thing you suggest can be done, in various ways.  You can pause the parse, switch to manual parsing, and resume the input at an input location of your choice.

As an example of this sort of thing, I recently did my own example of delimiter handling which, when it encounters a missing delimiter, supplies it.

The reason I think you see the others (and myself) preferring a direct, rule-based, approach is that it makes for simpler, faster and more maintainable code, when it is possible.  And what with the wide variety of grammars that Marpa can parse, plus various tricks such as ranking rules, it often is possible.


Zeev Atlas

Dec 31, 2014, 1:30:58 PM
to marpa-...@googlegroups.com
I agree with you guys and would prefer a rule based solution.  I am not sure what would be the differences among the various ideas that have been explored here, but obviously (in layman terminology) the preferred solution would be to have the Microsoft (SQL Server), Oracle, MySQL and all other vendors' extensions present, preferably in their own subsections (I assume that this is what you guys mean by sub-grammars). 
As I've stated in this and past conversations, there are two obstacles:
1. Such grammars are not readily published/available (and perhaps are even kept quasi-secret or intentionally obfuscated), so they would have to be either developed or somehow extracted from obfuscated documents. And this brings me to my second, clearly stated, disadvantage, namely
2. Even if I managed to extract such grammars, I do not possess the knowledge of how to put them in correct BNF format and how to add them, as sub-grammars or otherwise, to the existing ones.
Again, I have added studying that stuff to my list, but the learning curve will be pretty steep. Also, I have not yet managed to find any place where Microsoft clearly publishes their extensions. A preliminary evaluation of Oracle's publications showed me that extracting their rules and discerning their extensions would be a project unto itself. And I have not even tried to look at MySQL and others.

For the current project in its current state, I will certainly look at Jeffrey's code and see how to adapt it.

Thank you and have a Happy New Year
ZA

Zeev Atlas

Jan 8, 2015, 2:25:25 PM
to marpa-...@googlegroups.com
The most important case that I have in mind is the SQL Server 'SELECT TOP 5 col1, col2, ...'. Wouldn't the Pre-lexeme event be better than the Rejection event in that particular case? I am reading the documentation as it is written now [http://search.cpan.org/~jkegl/Marpa-R2-2.101_000/pod/Event.pod#The_life_cycle_of_events]. The events and their use are not always clear from the documentation. You've mentioned that what I want to achieve could be done in various ways, and I assume that you meant by judicious use of the events. I need a bit of help here, such as a better, less formal and more example-oriented explanation of the events.
Thank you
ZA

Jeffrey Kegler

Jan 8, 2015, 2:28:54 PM
to Marpa Parser Mailing LIst
You might look at Ron Savage's recent code and writings -- his new modules are heavily event-driven, and he's written an article on one of them: http://savage.net.au/Ron/html/Fancy.Matching.of.Delimited.Text.html

Durand Jean-Damien

Jan 8, 2015, 2:34:47 PM
to marpa-...@googlegroups.com
Zeev,

... will that be OK for you?

Durand Jean-Damien

Jan 8, 2015, 3:14:52 PM
to marpa-...@googlegroups.com
Oops... copy/paste problems, sorry.

  • For the particular "TOP 5" case, this is just a grammar extension like GNU or MS did to the C language.

Ron Savage

Jan 8, 2015, 6:14:19 PM
to marpa-...@googlegroups.com

Zeev Atlas

Jan 9, 2015, 7:56:13 AM
to marpa-...@googlegroups.com
I will look at all the suggested articles on both subjects.
Thanks

Zeev Atlas

Jan 9, 2015, 12:51:06 PM
to marpa-...@googlegroups.com
Looked at the IRC comments. My gut feeling is that there is no need to mess with the grammar. A well-defined 'discard' event should suffice and is probably not so hard to implement. I do not believe that many people would want more than that, at least in the short run... Usually (in real life, from my experience), comments are used for documentation or for commenting out unused pieces of code, so they do not really need to be examined, maybe just emitted as, well, comments. In cases where comments are used as markers for a pre-processor or post-processor or whatever, you probably want to step outside the grammar anyway in order to deal with that pre-processor.
My intended use was a very specialized case of parsing Jet4 (a.k.a. MS-Access) queries. There is no CREATE VIEW clause in that language, so I thought about marking the name of the query in some comment and using that in building a schema-like analysis. Please do not tell me to add CREATE VIEW artificially; I want to interfere as little as possible with the original codebase. Adding a standard comment:
-- QUERY NAME qryXYZ
at the beginning of the source code is the least intrusive thing I could think of.
ZA
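
A minimal sketch of picking up such a marker outside the grammar, before the source is handed to the parser. It assumes the marker is the first non-blank line and that the query name contains no whitespace; extract_query_name is a hypothetical helper, not part of Marpa or MarpaX::Languages::SQL2003::AST.

```perl
use strict;
use warnings;

# Hypothetical helper: read the query name from a leading
# "-- QUERY NAME <name>" comment. Returns undef (in scalar context)
# when the source does not start with such a marker.
sub extract_query_name {
    my ($source) = @_;

    # \A anchors at the start of the string; leading blank lines are
    # tolerated. The name is taken to be the next run of non-whitespace.
    if ( $source =~ /\A\s*--[ \t]*QUERY[ \t]+NAME[ \t]+(\S+)/ ) {
        return $1;
    }
    return;
}

my $source = "-- QUERY NAME qryXYZ\nSELECT col1, col2 FROM table1;\n";
my $name   = extract_query_name($source);
print defined $name ? "query name: $name\n" : "no marker found\n";
# prints: query name: qryXYZ
```

The source itself is left unmodified, so the comment still reaches the parser and can be discarded there like any other comment.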