On Tuesday, April 23, 2019 at 7:39:02 AM UTC-5, Thiago Adams wrote:
> On Monday, April 22, 2019 at 10:44:35 PM UTC-3, luser droog wrote:
> > It has been alleged that I keep rambling on without actually explaining
> > what I'm talking about in these threads. And I balked at the suggestion
> > when it happened partly due to how keenly it cut to the truth. So herein
> > I will attempt to 'splain.
> >
<snip 'splaining>
> > This concludes the basic introduction to Parser Combinators.
> >
> > Summary: They are parsers, functions that can consume input and
> > perform pattern matching and return success or failure. They are
> > combinators, functions that receive other functions as arguments
> > and produce functions as results.
> >
> >
> > Any questions?
>
> - How to you insert actions, or in other words, how to add
> the code that must be executed when you parse something.
Good question. It was dealing with this very issue that led
to my first post in this series, "Propagating results from the
callbacks". The simple answer is "callbacks". You add another
function pointer in the parser structure and call it from the
handler function.
But, the solution I came to prefer once I began to understand it
is by using 'bind()', the Monadic sequencing combinator. Bind has
extra powers that become useful when the parsers are enhanced
with a few extra features I haven't mentioned yet.
An important aspect I ought to have explained earlier is how
the Parser itself works. The parser accepts input and produces
output. All I said earlier was that this output indicates
success or failure, somehow. My earliest version defined the
parser signature like this:
int parse( char *input, parser p );
So, the input can be just a pointer into the string (initially
the start of the string, but the parser can advance the pointer
when matching sequences). The output here was the length that
was matched. This appeared to me at the time to be the simplest
way to get a minimally-useful behavior. But this arrangement
is not flexibly enough. Particularly if the parser graph has
alternates, there may be more than one possible way to parse
the input. So we need a vector or tuple or list of results.
In the PC literature, this idea was heralded by a paper entitled,
"Replacing failure with a list of successes".
But when I rewrote everything to return lists of matched lengths,
it still wasn't flexible enough. The handler for the callbacks had
lots of extra work to do to extract strings for the callbacks to use.
So, the next refinement that I learned and applied was having the
parser return the matched input chars themselves.
With this arrangement, the parser signature is like this:
result_list parse( parser p, char *input );
The result_list can be NULL which indicates failure, or it can be
a pointer to the start of a linked list of result records.
... probably skipping a few steps here ... (questions welcome)
The magical answer is to use 'bind()' to attach callbacks and
also for connecting parsers together.
Bind attaches your callback to the parser but also handles
the possible multiple results from the parser. The callback
is called by doing a map over the list of results. If there
are 3 results, the callback gets called 3 times, getting one
result at a time. If there are zero results, the map doesn't
call the callback at all.
In my current code, the callback has to have the form of an
"operator function" as I define them:
typedef object fOperator( void *, object );
The 'void *' supplies any context the function needs (it was
bundled up with the function in an 'oper' structure). The 'object'
argument is (single) result from the (child) parser, and the function
must return an 'object' which will be (one of the) return value(s)
from the (compound or composed) parser which 'bind' returns.
For a very short example of a callback, this is a function from
my regular expression parser/constructor.
static parser on_chr( void *v, object o ){ return lit( o ); }
This is the callback or handler for regular characters found in
the regular expression. For each one, we pass the char (the 'object
o') to the 'lit()' constructor which creates a parser to match
a single character.
The callback gets installed or registered by calling 'bind', or
a simpler front-end to it I call 'using'.
parser using( parser p, fOperator *f );
In regex builder, the call looks like this:
parser chr_ = using( noneof( "*+?.|()" ), on_chr );
> - I believe you think that the main advantage is to be able
> to use some descriptive language to implement the parser.
> How do you compare with other alternatives that also have
> descriptive ways to create parsers like parser generators?
That would probably be a more productive way to go about making
a product quickly. But that's not my primary goal. I /am/ working
towards a result, but I'm also attempting to reinvent the world
and learn some fun stuff in the process.
But the downside of external tools is learning the tool itself,
and depending upon it. Having everything /in/ the language reduces
external dependencies (and the potential to become enslaved by
their support schemes).
> - How many tokens ahead do you need/use?
> Can we say that you have something like LL(1) parser?
Hmm. I haven't researched the definition of LL(1) recently but
with the input coming from a lazy list, parsers could easily
look ahead as far as they wish. It would affect the behavior
of other parsers in the graph. Again, without having checked my
definitions, I think the behavior I have implemented is more
like LL(0). Parsers grab at most one item off of the input list
and then do their work with it.
> - Do you have any impact on performance that you would had if
> doing my hand?
Yes, most certainly. My parsers are examining every possible
path through the graph. I believe it's somewhat like a large
Prolog program with no *cuts*. Everything backtracks all over
its own toes to deliver every result at every stage.
Handwritten code would likely begin several orders of magnitude
more efficient even with no special optimizations by the programmer.
But I think the code wouldn't look as nice. :)
[snip C++ comments]
C++ frightens me. I got burned once by STL code that wasn't
working and we couldn't figure it out.
>
> My comments:
>
> I believe this style can be more useful when the parser action is
> pre-defined. For instance, to create a regex. The action regex
> does is always the same that is create a parsing for searching.
>
> This style also can be applied not only for parsers but for building
> search algorithms.
Yes, indeed Eric Meijer was one of the authors of Hutton/Meijer
was my latest, best guide to parser combinators. He was involved
in designing and building LINQ.
It's Monads.
Sadly yes, that is my experience. But there's got to be a way to speed
them up and cut the fat.