Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Parser Combinators

603 views
Skip to first unread message

luser droog

unread,
May 30, 2017, 3:35:08 AM5/30/17
to
cf. https://en.wikipedia.org/wiki/Parser_combinator
& https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion

An exercise in using the Parser-Combinator idea in C. Comments welcome.


$ cat pc.c
#include <stdio.h>
#include <stdlib.h>


enum parser_type { TERM, ALT, SEQ };
typedef struct parser *parser;
struct parser {
enum parser_type type;
int c;
parser a, b;
};


parser term( int c );
parser alt( parser a, parser b );
parser seq( parser a, parser b );
int parse( char *input, parser p );

parser
term( int c ){
parser p = malloc( sizeof *p );
p->type = TERM;
p->c = c;
return p;
}

int parse_term( char *input, parser p ){
printf( "%c?\n", p->c );
return *input == p->c;
}

parser
alt( parser a, parser b ){
parser p = malloc( sizeof *p );
p->type = ALT;
p->a = a;
p->b = b;
return p;
}

int parse_alt( char *input, parser p ){
return parse( input, p->a ) || parse( input, p->b );
}

parser
seq( parser a, parser b ){
parser p = malloc( sizeof *p );
p->type = SEQ;
p->a = a;
p->b = b;
return p;
}

int parse_seq( char *input, parser p ){
return parse( input, p->a ) ? parse( input + 1, p->b ) : 0;
}

int
parse( char *input, parser p ){
switch( p->type ){
case TERM: return parse_term( input, p );
case ALT: return parse_alt( input, p );
case SEQ: return parse_seq( input, p );
}
}

int
main( void ){
parser p = seq( alt( term( 'H' ), term( 'h' ) ),
seq( term( 'e' ),
seq( term( 'l' ),
seq( term( 'l' ),
term( 'o' ) ) ) ) );
printf("%d\n", parse( "hello", p ) );
printf("%d\n", parse( "Hello", p ) );
printf("%d\n", parse( "YaMaMa", p ) );
return 0;
}

$ make pc && ./pc
cc pc.c -o pc
H?
h?
e?
l?
l?
o?
1
H?
e?
l?
l?
o?
1
H?
h?
0

luser droog

unread,
Jun 4, 2017, 10:34:22 PM6/4/17
to
On Tuesday, May 30, 2017 at 2:35:08 AM UTC-5, luser droog wrote:
> cf. https://en.wikipedia.org/wiki/Parser_combinator
> & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
>
> An exercise in using the Parser-Combinator idea in C. Comments welcome.
>
> $ make pc && ./pc
> cc pc.c -o pc
> H?
> h?
> e?
> l?
> l?
> o?


Developed further on the postscript side, but no idea
how to translate to C. Too lispy, perhaps.


Ben Bacarisse

unread,
Jun 5, 2017, 7:46:43 AM6/5/17
to
I was going to comment but I had no helpful suggestions on first
reading. I could not see how what you were doing related to combinators
but then I suppose that was the question -- how to do it in C.

C lacks the key components to write a parser in that style, so there
would be so much supporting gubbins that I doubt it would be worth it.
Your post made me want to have a go, but never got round to it.

--
Ben.

bartc

unread,
Jun 5, 2017, 10:06:08 AM6/5/17
to
On 30/05/2017 08:34, luser droog wrote:
> cf. https://en.wikipedia.org/wiki/Parser_combinator
> & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
>
> An exercise in using the Parser-Combinator idea in C. Comments welcome.

> enum parser_type { TERM, ALT, SEQ };
> typedef struct parser *parser;
> struct parser {
> enum parser_type type;
> int c;
> parser a, b;
> };
>
>
> parser term( int c );

> int
> parse( char *input, parser p ){
> switch( p->type ){
> case TERM: return parse_term( input, p );
> case ALT: return parse_alt( input, p );
> case SEQ: return parse_seq( input, p );
> }
> }

No idea about parser-combinators. But the pattern above calls out for a
function pointer in place of p->type, then it becomes:

return (p->type)(input, p);

However, gcc couldn't optimise that as well so was slower (when you
parse the same thing at least 10M times - without the print %c? line).

(Another issue, when you convert to a language that is case-insensitive,
is that TERM and term clash. But that's the usual C thing. Even looking
at the C, there was a bit of confusion as to what 'parser' was, as it's
not just a typedef version of 'struct parser' but a pointer to it.)

--
bartc

luser droog

unread,
Jun 5, 2017, 10:55:19 AM6/5/17
to
I think I agree. To to this with C, I think it'll need to be 2 stages.
Generating C-code from the first stage, then compiling and executing
it in the second stage.

That, or more of this sort of 'simulation' style. Maybe the generator
should be a different language, and just produce a parser in C code.

--
good word: gubbins

luser droog

unread,
Jun 5, 2017, 11:00:36 AM6/5/17
to
On Monday, June 5, 2017 at 9:06:08 AM UTC-5, Bart wrote:
> On 30/05/2017 08:34, luser droog wrote:
> > cf. https://en.wikipedia.org/wiki/Parser_combinator
> > & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
> >
> > An exercise in using the Parser-Combinator idea in C. Comments welcome.
>
> > enum parser_type { TERM, ALT, SEQ };
> > typedef struct parser *parser;
> > struct parser {
> > enum parser_type type;
> > int c;
> > parser a, b;
> > };
> >
> >
> > parser term( int c );
>
> > int
> > parse( char *input, parser p ){
> > switch( p->type ){
> > case TERM: return parse_term( input, p );
> > case ALT: return parse_alt( input, p );
> > case SEQ: return parse_seq( input, p );
> > }
> > }
>
> No idea about parser-combinators. But the pattern above calls out for a
> function pointer in place of p->type, then it becomes:
>
> return (p->type)(input, p);
>
> However, gcc couldn't optimise that as well so was slower (when you
> parse the same thing at least 10M times - without the print %c? line).

Thanks. I should have thought of that. The function pointer is nicer,
with less code. Not too concerned about efficiency at this point.

> (Another issue, when you convert to a language that is case-insensitive,
> is that TERM and term clash. But that's the usual C thing. Even looking
> at the C, there was a bit of confusion as to what 'parser' was, as it's
> not just a typedef version of 'struct parser' but a pointer to it.)
>

I'll apologize for the typedef'd pointer, but I'm not likely to stop.
'struct parser' is a struct, and 'parser' is the pointer to it. I know
many programmers do not like this. But I do. I try not to "hide it" by
showing the typedef right at the top.

luser droog

unread,
Jun 7, 2017, 12:55:02 AM6/7/17
to
On Monday, June 5, 2017 at 10:00:36 AM UTC-5, luser droog wrote:
> On Monday, June 5, 2017 at 9:06:08 AM UTC-5, Bart wrote:
> > On 30/05/2017 08:34, luser droog wrote:
> > > cf. https://en.wikipedia.org/wiki/Parser_combinator
> > > & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
> > >
> > > An exercise in using the Parser-Combinator idea in C. Comments welcome.

> > No idea about parser-combinators. But the pattern above calls out for a
> > function pointer in place of p->type, then it becomes:
> >
> > return (p->type)(input, p);
> >
> > However, gcc couldn't optimise that as well so was slower (when you
> > parse the same thing at least 10M times - without the print %c? line).
>
> Thanks. I should have thought of that. The function pointer is nicer,
> with less code. Not too concerned about efficiency at this point.


Rewritten more tersely. Same functionality but it fits on one screen now.

Norah@laptop ~
$ cat pc2.c
#include <stdio.h>
#include <stdlib.h>

typedef struct parser *parser;
typedef int (*parser_action)( parser p, char *input );

struct parser {
parser_action test;
int c;
parser a,b;
};


int parse( parser p, char *input ){
return p->test( p, input );
}


int parse_term( parser p, char *input ){
printf( "%c", p->c );
return *input == p->c;
}

int parse_alt( parser p, char *input ){
// printf(" ALT\n");
return parse( p->a, input ) || parse( p->b, input );
}

int parse_seq( parser p, char *input ){
// printf(" SEQ\n");
return parse( p->a, input ) ? parse( p->b, input + 1 ) : 0;
}


parser term( int c ){
parser p = malloc( sizeof *p );
return p? (*p = (struct parser){ .test = parse_term, .c = c }), p : p;
}

parser alt( parser a, parser b ){
parser p = malloc( sizeof *p );
return p? (*p = (struct parser){ .test = parse_alt, .a = a, .b = b }), p : p;
}

parser seq( parser a, parser b){
parser p = malloc( sizeof *p );
return p? (*p = (struct parser){ .test = parse_seq, .a = a, .b = b }), p : p;
}



int main( void ){
parser p = seq( alt( term( 'H' ), term( 'h' ) ),
seq( term( 'e' ),
seq( term( 'l' ),
seq( term( 'l' ),
term( 'o' ) ) ) ) );
printf("%d\n", p->test( p, "hello" ) );
printf("%d\n", p->test( p, "Hello" ) );
printf("%d\n", p->test( p, "YaMaMa" ) );
return 0;
}


Norah@laptop ~
$ make pc2 CFLAGS+='-std=c99 -g'
cc -std=c99 -g pc2.c -o pc2

Norah@laptop ~
$ ./pc2
Hhello1
Hello1
Hh0

luser droog

unread,
Jun 7, 2017, 1:32:20 AM6/7/17
to
On Tuesday, June 6, 2017 at 11:55:02 PM UTC-5, luser droog wrote:
> On Monday, June 5, 2017 at 10:00:36 AM UTC-5, luser droog wrote:
> > On Monday, June 5, 2017 at 9:06:08 AM UTC-5, Bart wrote:
> > > On 30/05/2017 08:34, luser droog wrote:
> > > > cf. https://en.wikipedia.org/wiki/Parser_combinator
> > > > & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
> > > >
> > > > An exercise in using the Parser-Combinator idea in C. Comments welcome.
>
> > > No idea about parser-combinators. But the pattern above calls out for a
> > > function pointer in place of p->type, then it becomes:
> > >
> > > return (p->type)(input, p);
> > >
> > > However, gcc couldn't optimise that as well so was slower (when you
> > > parse the same thing at least 10M times - without the print %c? line).
> >
> > Thanks. I should have thought of that. The function pointer is nicer,
> > with less code. Not too concerned about efficiency at this point.
>
>
> Rewritten more tersely. Same functionality but it fits on one screen now.
>

Slightly improved. New convenience function 'char_class( char * )'.
Norah is my computer's name.

Norah@laptop ~
$ cat pc2.c
#include <stdio.h>
#include <stdlib.h>

typedef struct parser *parser;
typedef int (*parser_handler)( parser p, char *input );

struct parser {
parser_handler test;
parser a, b;
int c;
};

int debug_parse_term = 1;


int parse( parser p, char *input ){ return p->test( p, input ); }

int parse_fail ( parser p, char *input ){ return 0; }
int parse_succeed( parser p, char *input ){ return 1; }

int parse_term( parser p, char *input ){
return debug_parse_term && printf( "%c", p->c ),
*input == p->c;
}
int parse_alt ( parser p, char *input ){
return parse( p->a, input ) || parse( p->b, input );
}
int parse_seq ( parser p, char *input ){
return parse( p->a, input ) ? parse( p->b, input + 1 ) : 0;
}


parser new_parser( struct parser ps ){
parser p = malloc( sizeof *p );
return p ? ( *p = ps ), p : p;
}

parser fail (){
return new_parser( (struct parser){ .test = parse_fail } );
}
parser succeed(){
return new_parser( (struct parser){ .test = parse_succeed } );
}

parser term( int c ){
return new_parser( (struct parser){ .test = parse_term, .c = c } );
}
parser alt ( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_alt, .a = a, .b = b } );
}
parser seq ( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_seq, .a = a, .b = b } );
}

parser char_class( char *str ){
return *str ? alt( term( *str ), char_class( str + 1 ) )
: fail();
}


int main( void ){
parser p = seq( char_class( "Hh" ),
seq( term( 'e' ),
seq( term( 'l' ),
seq( term( 'l' ),
term( 'o' ) ) ) ) );
printf("%d\n", parse( p, "hello" ) );
printf("%d\n", parse( p, "Hello" ) );
printf("%d\n", parse( p, "YaMaMa" ) );

luser droog

unread,
Jun 7, 2017, 2:03:38 AM6/7/17
to
On Wednesday, June 7, 2017 at 12:32:20 AM UTC-5, luser droog wrote:
> On Tuesday, June 6, 2017 at 11:55:02 PM UTC-5, luser droog wrote:
> > On Monday, June 5, 2017 at 10:00:36 AM UTC-5, luser droog wrote:
> > > On Monday, June 5, 2017 at 9:06:08 AM UTC-5, Bart wrote:
> > > > On 30/05/2017 08:34, luser droog wrote:
> > > > > cf. https://en.wikipedia.org/wiki/Parser_combinator
> > > > > & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
> > > > >
> > > > > An exercise in using the Parser-Combinator idea in C. Comments welcome.
> >
> > > > No idea about parser-combinators. But the pattern above calls out for a
> > > > function pointer in place of p->type, then it becomes:
> > > >
> > > > return (p->type)(input, p);
> > > >
> > > > However, gcc couldn't optimise that as well so was slower (when you
> > > > parse the same thing at least 10M times - without the print %c? line).
> > >
> > > Thanks. I should have thought of that. The function pointer is nicer,
> > > with less code. Not too concerned about efficiency at this point.
> >
> >
> > Rewritten more tersely. Same functionality but it fits on one screen now.
> >
>
> Slightly improved. New convenience function 'char_class( char * )'.
> Norah is my computer's name.
>

Even better. Usage is now pretty simple.

I thought I would need variadic functions and/or macros. I even
copied and #included ppnarg.h. But then string() just wrote itself.
I imagine it could be done with a macro that builds a compound
literal array which can then be counted and a function can work
with a pointer and a count pretty easily.

Next feature it needs is attaching callbacks or returning substrings
or something. Building up to something that can substitute for simple
scanf() uses, perhaps in conjunction with strtol() and strtod().

Perhaps, after very much development, it could even serve as an extensible
scanf() analogous to Tim Rentsch's printx(). maybe? possibly? ... putting
down the pipe now.... :)


Norah@laptop ~
$ make pc2 CFLAGS+='-std=c99 -g'
cc -std=c99 -g pc2.c -o pc2

Norah@laptop ~
$ ./pc2
Hhello1
Hello1
Hh0

parser alternate( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_alt, .a = a, .b = b } );
}
parser sequence ( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_seq, .a = a, .b = b } );
}


parser char_class( char *str ){
return *str ? alternate( term( *str ), char_class( str + 1 ) )
: fail();
}

parser string( char *str ){
return *str ? sequence( term(*str), string( str + 1 ) )
: succeed();
}


int main( void ){
parser p = sequence( char_class( "Hh" ), string( "ello" ) );

luser droog

unread,
Jun 7, 2017, 2:25:05 AM6/7/17
to
On Wednesday, June 7, 2017 at 1:03:38 AM UTC-5, luser droog wrote:
> On Wednesday, June 7, 2017 at 12:32:20 AM UTC-5, luser droog wrote:
> > On Tuesday, June 6, 2017 at 11:55:02 PM UTC-5, luser droog wrote:
> > > On Monday, June 5, 2017 at 10:00:36 AM UTC-5, luser droog wrote:
> > > > On Monday, June 5, 2017 at 9:06:08 AM UTC-5, Bart wrote:
> > > > > On 30/05/2017 08:34, luser droog wrote:
> > > > > > cf. https://en.wikipedia.org/wiki/Parser_combinator
> > > > > > & https://groups.google.com/d/topic/comp.lang.postscript/KiEFeJROOcM/discussion
> > > > > >
> > > > > > An exercise in using the Parser-Combinator idea in C. Comments welcome.
> > >

> Even better. Usage is now pretty simple.
>
> I thought I would need variadic functions and/or macros. I even
> copied and #included ppnarg.h. But then string() just wrote itself.
> I imagine it could be done with a macro that builds a compound
> literal array which can then be counted and a function can work
> with a pointer and a count pretty easily.
>
> Next feature it needs is attaching callbacks or returning substrings
> or something. Building up to something that can substitute for simple
> scanf() uses, perhaps in conjunction with strtol() and strtod().
>

Now this part I've actually implemented over in the
postscript thread. I won't bore you guys with the
gory details, but just the usage section, and then
the output which shows a dump of the number parser.
It generates code. The parser is built dynamically
like it was lisp or something. So I keep bragging
to my rubber ducks...

The usage reads almost like a grammar already, and
has the side effect of parsing the input according
to that description. And optionally, it could convert
the substrings with cvi (ps equivalent of strtol)
or cvn (converts a string to an interned name object,
ie. an atom or identifier).

Here one defines a parser-action pair in a style similar
to defining a regular postscript function which looks like

/funcname {
code for function
} def

A parser-action looks like

/parsername parser-description
{
parser action code
} def

Then the parser may be invoked or called upon a string and index

(my string to be parsed) 0 parsername

or to peek at its code and then execute, I sometimes do this

(my string to be parsed) 0 //parsername == exec

And I'm probably boring anyone still here at this point, so.
Paste.


/digit << (0) (1) (2) (3) (4) (5) (6) (7) (8) (9) >> def
/alpha << (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m)
(n) (o) (p) (q) (r) (s) (t) (u) (v) (w) (x) (y) (z)
(A) (B) (C) (D) (E) (F) (G) (H) (I) (J) (K) (L) (M)
(N) (O) (P) (Q) (R) (S) (T) (U) (V) (W) (X) (Y) (Z) >> def
/medial <<//digit //alpha>> def
/opt-space <<( ) true>>
{ % (string) i (str)# (str)
_
} build-parser-action def

/number [ %//opt-space
//digit << //digit true >> ]
{ % (string) i (str)
%cvi = / = % (string) i str#
4 1 roll
} build-parser-action def

//number ==

/id [ %//opt-space
//alpha //alpha //alpha ]
{
%cvn = / =
4 1 roll
} build-parser-action def


(22Foo47) [ zy 0 opt-space number opt-space id opt-space number _ _ ] ==
( 22 Foo47) [ zy 0 opt-space number opt-space id opt-space number _ _ ] ==
(22 Foo 47 ) [ zy 0 opt-space number opt-space id opt-space number _ _ ] ==

quit




Norah@laptop ~
$ gsnd -q pc2.ps
{++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ (0) test 0= {_ (1) test} {ret} ? 0= {_ (2) test} {ret} ? 0= {_ (3) test} {ret} ? 0= {_ (4) test} {ret} ? 0= {_ (5) test} {ret} ? 0= {_ (6) test} {ret} ? 0= {_ (7) test} {ret} ? 0= {_ (8) test} {ret} ? 0= {_ (9) test} {ret} ? 0^ {mov ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ (0) test 0= {_ (1) test} {ret} ? 0= {_ (2) test} {ret} ? 0= {_ (3) test} {ret} ? 0= {_ (4) test} {ret} ? 0= {_ (5) test} {ret} ? 0= {_ (6) test} {ret} ? 0= {_ (7) test} {ret} ? 0= {_ (8) test} {ret} ? 0= {_ (9) test} {ret} ? 0= {_ empty} {ret} ?} {fail} ? 0^ {{} @ ... z # zy 4 1 roll add y # y sub ... 0} {/parser-fail == ps} ?}
[(22) (Foo) (47)]
[(22) (Foo) (47)]
[(22) (Foo) (47)]

Norah@laptop ~
$

bartc

unread,
Jun 7, 2017, 7:22:40 AM6/7/17
to
On 07/06/2017 06:32, luser droog wrote:

> parser new_parser( struct parser ps ){
> parser p = malloc( sizeof *p );
> return p ? ( *p = ps ), p : p;
> }
>
> parser fail (){
> return new_parser( (struct parser){ .test = parse_fail } );
> }
> parser succeed(){
> return new_parser( (struct parser){ .test = parse_succeed } );
> }
>
> parser term( int c ){
> return new_parser( (struct parser){ .test = parse_term, .c = c } );
> }
> parser alt ( parser a, parser b ){
> return new_parser( (struct parser){ .test = parse_alt, .a = a, .b = b } );
> }
> parser seq ( parser a, parser b ){
> return new_parser( (struct parser){ .test = parse_seq, .a = a, .b = b } );
> }

I don't know what it is about compound literals, but they always make
things look a lot more cluttered.

Also, the compound literal goes to the trouble of passing the struct by
value, but the called function doesn't do anything with it other than
copy its contents. It might as well pass a pointer.

I prefer this:

parser new_parser(parser_handler fn, parser a, parser b, int c){
parser p = malloc( sizeof *p );
if (p==NULL) return NULL;

p->test = fn;
p->a = a;
p->b = b;
p->c = c;
return p;
}

parser fail (){
return new_parser(parse_fail, 0,0,0);
}
parser succeed(){
return new_parser(parse_succeed, 0,0,0);
}
parser term( int c ){
return new_parser(parse_term, 0,0, c);
}
parser alt ( parser a, parser b ){
return new_parser(parse_alt, a,b, 0);
}
parser seq ( parser a, parser b ){
return new_parser(parse_seq, a,b, 0);
}

(And that malloc check is a waste of time as the returned value from
new_parser() is not checked. It might as well abort here. Or use a
wrapper for malloc which will abort. Then no checks are needed at all in
this code, as it can assume it succeeds.)

--
bartc

Melzzzzz

unread,
Jun 7, 2017, 8:43:32 AM6/7/17
to
Heh this is made for Haskell ;)
data Parser = Term Char | Alt Parser Parser | Seq Parser Parser

parse (Term c) (input:_) = c == input
parse (Alt a b) input = parse a input || parse b input
parse (Seq a b) inp@(_:rest) = if parse a inp then parse b rest else False

main = do
let p = Seq (Alt (Term 'H') (Term 'h'))
(Seq (Term 'e') (Seq (Term 'l') (Seq (Term 'l') (Term 'o'))))
print $ parse p "hello"
print $ parse p "Hello"
print $ parse p "YaMaMa"

[bmaxa@maxa-pc haskell]$ ghc -O2 combinator.hs
[1 of 1] Compiling Main ( combinator.hs, combinator.o )
Linking combinator ...
[bmaxa@maxa-pc haskell]$ ./combinator
True
True
False


--
press any key to continue or any other to quit...

bartc

unread,
Jun 7, 2017, 9:43:07 AM6/7/17
to
On 07/06/2017 13:43, Melzzzzz wrote:
> On 2017-06-07, luser droog <luser...@gmail.com> wrote:

>> typedef struct parser *parser;
>> typedef int (*parser_action)( parser p, char *input );
>>
>> struct parser {
>> parser_action test;
>> int c;
>> parser a,b;
>> };

...

> Heh this is made for Haskell ;)

Yeah, but... it's in Haskell.


> data Parser = Term Char | Alt Parser Parser | Seq Parser Parser
>
> parse (Term c) (input:_) = c == input
> parse (Alt a b) input = parse a input || parse b input
> parse (Seq a b) inp@(_:rest) = if parse a inp then parse b rest else False
>
> main = do
> let p = Seq (Alt (Term 'H') (Term 'h'))
> (Seq (Term 'e') (Seq (Term 'l') (Seq (Term 'l') (Term 'o'))))
> print $ parse p "hello"
> print $ parse p "Hello"
> print $ parse p "YaMaMa"
>
> [bmaxa@maxa-pc haskell]$ ghc -O2 combinator.hs
> [1 of 1] Compiling Main ( combinator.hs, combinator.o )
> Linking combinator ...
> [bmaxa@maxa-pc haskell]$ ./combinator
> True
> True
> False

It's quite short but, as it seems to be allowed to cram things onto one
line, I had a go using my own little language, below.

About 15 non-blank lines (ignoring the linebreaks) compared to Haskell's
10. But it's conventional code, and should be as straightforward to
follow as the original C, and easier to port to another language.

----------------------------------------
record rparser=(var p,a,b,c)

proc start=
p :=seq(alt(term("H"),term("h")), seq(term("e"),
seq(term("l"), seq(term("l"), term("o")))))

println parse("hello",p)
println parse("Hello",p)
println parse("YaMaMa",p)
end

function parse_term(input,p)= return input[1]=p.c end
function parse_alt (input,p)= return parse(input,p.a) or
parse(input,p.b) end
function parse_seq (input,p)= return
(parse(input,p.a)|parse(input[2..$],p.b)|0) end

function term(c)= return rparser(parse_term,0,0,c) end
function alt(a,b)= return rparser(parse_alt,a,b,0) end
function seq(a,b)= return rparser(parse_seq,a,b,0) end

function parse(input,p)= return p.p^(input,p) end
----------------------------------------

Probably, I could squeeze it down to 12, but that would be
counter-productive; the syntax is not designed to be compact in that
way, although it could be arranged.

--
bartc

Ben Bacarisse

unread,
Jun 7, 2017, 3:57:43 PM6/7/17
to
luser droog <luser...@gmail.com> writes:
<snip>
Try:

parser p = sequence( string( "ab" ), term( 'c' ) );
printf("%d\n", parse( p, "abc" ) );

> return 0;
> }

--
Ben.

Melzzzzz

unread,
Jun 7, 2017, 8:55:41 PM6/7/17
to
Heh, Haskell version:
data Parser = Term Char | Alt Parser Parser | Seq Parser Parser

parse (Term c) (input:rest) = (c == input,rest)
parse (Alt a b) input = let a' = parse a input
b' = parse b input
in if fst a' then a' else b'
parse (Seq a b) input = let a' = parse a input
b' = parse b (snd a')
in if fst a' then b' else a'

string (x:[]) = Term x
string (x:xs) = Seq (Term x) (string xs)

char_class (x:[]) = Term x
char_class (x:xs) = Alt (Term x) (char_class xs)

main = do
let p = Seq (Alt (Term 'H') (Term 'h'))
(Seq (Term 'e') (Seq (Term 'l') (Seq (Term 'l') (Term 'o'))))
let q = Seq (char_class "Hh") (string "ello")
let r = Seq (string "ab") (Term 'c')
print $ parse p "helloh"
print $ parse p "Hello"
print $ parse p "YaMaMa"
print $ parse q "hello"
print $ parse q "Hello"
print $ parse q "YaMaMa"
print $ parse r "abc"

[bmaxa@maxa-pc haskell]$ ./combinator
(True,"h")
(True,"")
(False,"aMaMa")
(True,"")
(True,"")
(False,"aMaMa")
(True,"")

luser droog

unread,
Jun 7, 2017, 11:18:08 PM6/7/17
to
On Wednesday, June 7, 2017 at 2:57:43 PM UTC-5, Ben Bacarisse wrote:
> luser droog <luser...@gmail.com> writes:
> <snip>
> Try:
>
> parser p = sequence( string( "ab" ), term( 'c' ) );
> printf("%d\n", parse( p, "abc" ) );
>
> > return 0;
> > }
>

Thanks. That's a bug. I think I've fixed it by change the return
values from 0/1 to -1/num-chars-matched. Then parse_sequence()
can add the length from the left piece before trying the right
piece.

Also added quantifiers and more testing.

Norah@laptop ~
$ cat pc2.c
#include <stdio.h>
#include <stdlib.h>

typedef struct parser *parser;
typedef int (*parser_handler)( parser p, char *input );

struct parser {
parser_handler test;
parser a, b;
int c;
};

int debug_parse_term = 1;


int parse( parser p, char *input ){ return p->test( p, input ); }

int parse_fail ( parser p, char *input ){ return -1; }
int parse_succeed( parser p, char *input ){ return 0; }

int parse_term( parser p, char *input ){
return debug_parse_term && printf( "%c%c? ", p->c, *input ),
*input == p->c ? 1 : -1;
}
int parse_alternate ( parser p, char *input ){
int i;
return ( i = parse( p->a, input ) ) >= 0 ? i
: parse( p->b, input );
}
int parse_sequence ( parser p, char *input ){
int i;
return ( i = parse( p->a, input ) ) >= 0 ? i + parse( p->b, input + i )
: -1;
}


parser new_parser( struct parser ps ){
parser p = malloc( sizeof *p );
return p ? ( *p = ps ), p : p;
}

parser fails (){ return new_parser( (struct parser){ .test = parse_fail } ); }
parser succeeds(){ return new_parser( (struct parser){ .test = parse_succeed } ); }

parser term( int c ){
return new_parser( (struct parser){ .test = parse_term, .c = c } );
}
parser alternate2( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_alternate, .a = a, .b = b } );
}
parser sequence2 ( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_sequence, .a = a, .b = b } );
}

parser many( parser a ){
parser q = new_parser( (struct parser){ .test = parse_sequence, .a = a } );
q->b = q;
return new_parser( (struct parser){ .test = parse_alternate, .a = q, .b = succeeds() } );
}

parser some( parser a ){
parser q = new_parser( (struct parser){ .test = parse_sequence, .a = a } );
q->b = q;
return new_parser( (struct parser){ .test = parse_alternate, .a = a, .b = q } );
}

parser n_or_more( int n, parser a ){
return n == 0 ? many( a ) :
n == 1 ? some( a ) :
sequence2( a, n_or_more( n - 1, a ) );
}

parser char_class( char *str ){
return *str ? alternate2( term( *str ), char_class( str + 1 ) )
: fails();
}

parser string( char *str ){
return *str ? sequence2( term(*str), string( str + 1 ) )
: succeeds();
}

parser sequencen( int n, parser *pp ){
return n ? sequence2( *pp, sequencen( n - 1, pp + 1 ) )
: succeeds();
}

parser alternaten( int n, parser *pp){
return n ? alternate2( *pp, alternaten( n - 1, pp + 1 ) )
: fails();
}


int test( parser p, char *str ){
printf( "%s:", str );
fflush(0);
return printf( "%d\n", parse( p, str ) );
}

int main( void ){
parser p = sequencen( 7, (parser[]){
char_class( "Hhj" ), string( "ello " ),
char_class( "Wwm" ), string( "o" ), char_class( "ru" ), string( "ld" ),
char_class("!.") });

test( p, "hello World." );
test( p, "jello mould!" );
test( p, "YaMaMa" );

parser q = many( term( 'x' ) );
test( q, "y" );
test( q, "xy" );
test( q, "xxy" );

parser r = some( term( 'x' ) );
test( r, "y" );
test( r, "xy" );
test( r, "xxy" );

parser s = n_or_more( 3, term( 'a' ) );
test( s, "aa" );
test( s, "aaa" );
test( s, "aaaa" );
test( s, "aaaaa" );

return 0;
}


Norah@laptop ~
$ make pc2 CFLAGS+='-std=c99 -g'
cc -std=c99 -g pc2.c -o pc2

Norah@laptop ~
$ ./pc2
hello World.:Hh? hh? ee? ll? ll? oo? ? WW? oo? rr? ll? dd? !.? ..? 12
jello mould!:Hj? hj? jj? ee? ll? ll? oo? ? Wm? wm? mm? oo? ru? uu? ll? dd? !!? 12
YaMaMa:HY? hY? jY? -1
y:xy? 0
xy:xx? xy? 0
xxy:xx? xx? xy? 1
y:xy? xy? -1
xy:xx? 1
xxy:xx? 1
aa:aa? aa? a? a? 1
aaa:aa? aa? aa? 3
aaaa:aa? aa? aa? 3
aaaaa:aa? aa? aa? 3

luser droog

unread,
Jun 11, 2017, 3:29:56 AM6/11/17
to
On Wednesday, June 7, 2017 at 10:18:08 PM UTC-5, luser droog wrote:
> On Wednesday, June 7, 2017 at 2:57:43 PM UTC-5, Ben Bacarisse wrote:
> > luser droog <luser...@gmail.com> writes:
> > <snip>
> > Try:
> >
> > parser p = sequence( string( "ab" ), term( 'c' ) );
> > printf("%d\n", parse( p, "abc" ) );
> >
> > > return 0;
> > > }
> >
>
> Thanks. That's a bug. I think I've fixed it by change the return
> values from 0/1 to -1/num-chars-matched. Then parse_sequence()
> can add the length from the left piece before trying the right
> piece.
>
> Also added quantifiers and more testing.

This should be my last post so I can focus on other stuff, but I
fixed the quantifiers. Didn't want to leave the thread dangling
with broken code.

The idea for the Kleene star quantifier combinator is that it's
a sequence

[ x ]

where x is the input parser. But it could be zero, so wrap the
sequence in an alternation with succeeds.

<< [ x ] true >>

Then stitch it up into a loop by putting the alternate inside
the sequence.

y = << [ x y ] true >>

This will match 0 or more of x and return the length matched.



Norah@laptop ~
$ cat pc2.c
#include <stdio.h>
#include <stdlib.h>

typedef struct parser *parser;
typedef int (*parser_handler)( parser p, char *input );

struct parser {
parser_handler test;
parser a, b;
int c;
};

int debug_parse_term = 1;


int parse( parser p, char *input ){ return p->test( p, input ); }

int parse_fail ( parser p, char *input ){ return -1; }
int parse_succeed( parser p, char *input ){ return 0; }

int parse_term( parser p, char *input ){
return debug_parse_term && printf( "%c%c? ", p->c, *input ),
*input == p->c ? 1 : -1;
}
int parse_alternate ( parser p, char *input ){
int i;
return ( i = parse( p->a, input ) ) >= 0 ? i
: parse( p->b, input );
}
int parse_sequence ( parser p, char *input ){
int i,j;
return ( i = parse( p->a, input ) ) >= 0 ?
( j = parse( p->b, input + i ) ) >= 0 ? i + j
: -1
: -1;
}


parser new_parser( struct parser ps ){
parser p = malloc( sizeof *p );
return p ? ( *p = ps ), p : p;
}

parser fails (){ return new_parser( (struct parser){ .test = parse_fail } ); }
parser succeeds(){ return new_parser( (struct parser){ .test = parse_succeed } ); }

parser term( int c ){
return new_parser( (struct parser){ .test = parse_term, .c = c } );
}
parser alternate2( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_alternate, .a = a, .b = b } );
}
parser sequence2 ( parser a, parser b ){
return new_parser( (struct parser){ .test = parse_sequence, .a = a, .b = b } );
}

parser many( parser a ){
parser q = new_parser( (struct parser){ .test = parse_sequence, .a = a } );
parser r = new_parser( (struct parser){ .test = parse_alternate, .a = q, .b = succeeds() } );
q->b = r;
return r;
}

parser some( parser a ){
parser q = new_parser( (struct parser){ .test = parse_sequence, .a = a } );
parser r = new_parser( (struct parser){ .test = parse_alternate, .a = q, .b = succeeds() } );
q->b = r;
return q;
parser number = some( char_class("0123456789") );
test( number, "0 f" );
test( number, "f 0" );
test( number, "123f" );
test( number, "9999aaa" );

number = sequence2( many( term( ' ' ) ), number );
test( number, " 0 f" );
test( number, " f 0" );
test( number, " 123f" );
test( number, " 9999aaa" );
return 0;
}

Norah@laptop ~
$ make pc2 CFLAGS+='-std=c99 -g'
cc -std=c99 -g pc2.c -o pc2

Norah@laptop ~
$ ./pc2
hello World.:Hh? hh? ee? ll? ll? oo? ? WW? oo? rr? ll? dd? !.? ..? 12
jello mould!:Hj? hj? jj? ee? ll? ll? oo? ? Wm? wm? mm? oo? ru? uu? ll? dd? !!? 12
YaMaMa:HY? hY? jY? -1
y:xy? 0
xy:xx? xy? 1
xxy:xx? xx? xy? 2
y:xy? -1
xy:xx? xy? 1
xxy:xx? xx? xy? 2
aa:aa? aa? a? -1
aaa:aa? aa? aa? a? 3
aaaa:aa? aa? aa? aa? a? 4
aaaaa:aa? aa? aa? aa? aa? a? 5
0 f:00? 0 ? 1 ? 2 ? 3 ? 4 ? 5 ? 6 ? 7 ? 8 ? 9 ? 1
f 0:0f? 1f? 2f? 3f? 4f? 5f? 6f? 7f? 8f? 9f? -1
123f:01? 11? 02? 12? 22? 03? 13? 23? 33? 0f? 1f? 2f? 3f? 4f? 5f? 6f? 7f? 8f? 9f? 3
9999aaa:09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 0a? 1a? 2a? 3a? 4a? 5a? 6a? 7a? 8a? 9a? 4
0 f: ? 0? 00? 0 ? 1 ? 2 ? 3 ? 4 ? 5 ? 6 ? 7 ? 8 ? 9 ? 2
f 0: ? ? f? 0f? 1f? 2f? 3f? 4f? 5f? 6f? 7f? 8f? 9f? -1
123f: ? ? ? 1? 01? 11? 02? 12? 22? 03? 13? 23? 33? 0f? 1f? 2f? 3f? 4f? 5f? 6f? 7f? 8f? 9f? 6
9999aaa: ? ? ? ? 9? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 09? 19? 29? 39? 49? 59? 69? 79? 89? 99? 0a? 1a? 2a? 3a? 4a? 5a? 6a? 7a? 8a? 9a? 8

Norah@laptop ~
$

Ben Bacarisse

unread,
Jun 11, 2017, 7:11:19 AM6/11/17
to
luser droog <luser...@gmail.com> writes:
<snip>
> This should be my last post so I can focus on other stuff, but I
> fixed the quantifiers. Didn't want to leave the thread dangling
> with broken code.
>
> The idea for the Kleene star quantifier combinator is that it's
> a sequence
>
> [ x ]
>
> where x is the input parser. But it could be zero, so wrap the
> sequence in an alternation with succeeds.
>
> << [ x ] true >>
>
> Then stitch it up into a loop by putting the alternate inside
> the sequence.
>
> y = << [ x y ] true >>
>
> This will match 0 or more of x and return the length matched.

What about:

parser p = sequence2(some(term('x')), string("xy"));
test(p, "xxxxy");

?

--
Ben.

luser droog

unread,
Jun 12, 2017, 12:21:43 AM6/12/17
to
Touché. That doesn't work. The quantifiers I've supplied are strictly
greedy. I'm not sure how much work it would be to add backtracking
or some other strategy that would allow that parser to match. But OTOH
I don't immediately see the utility of it. I think I'll explicitly not
support that sort of usage for the time being.

The paper I'm following uses fp magic to return all possible matches
from a parser. http://eprints.nottingham.ac.uk/221/1/parsing.pdf

Ben Bacarisse

unread,
Jun 12, 2017, 7:25:10 AM6/12/17
to
luser droog <luser...@gmail.com> writes:

> On Sunday, June 11, 2017 at 6:11:19 AM UTC-5, Ben Bacarisse wrote:
>> luser droog <luser...@gmail.com> writes:
>> <snip>
>> > This should be my last post so I can focus on other stuff, but I
>> > fixed the quantifiers. Didn't want to leave the thread dangling
>> > with broken code.
>> >
>> > The idea for the Kleene star quantifier combinator is that it's
>> > a sequence
>> >
>> > [ x ]
>> >
>> > where x is the input parser. But it could be zero, so wrap the
>> > sequence in an alternation with succeeds.
>> >
>> > << [ x ] true >>
>> >
>> > Then stitch it up into a loop by putting the alternate inside
>> > the sequence.
>> >
>> > y = << [ x y ] true >>
>> >
>> > This will match 0 or more of x and return the length matched.
>>
>> What about:
>>
>> parser p = sequence2(some(term('x')), string("xy"));
>> test(p, "xxxxy");
>>
>> ?
>
> Touché. That doesn't work. The quantifiers I've supplied are strictly
> greedy.

Just a small point to avoid confusion: "strictly greedy" seems to be
your own term, at least I've not hear it before. The normal Kleene star
operator is greedy in the usual meaning of the term -- that it matches
the longest possible string /that still allows the whole pattern to
match/.

<snip>
--
Ben.

luser droog

unread,
Jun 12, 2017, 4:53:42 PM6/12/17
to
Thanks. A brief search suggests at least one person calls
this a "possessive" Kleene star.

https://cs.stackexchange.com/questions/40712/possessive-kleene-star-operator

luser droog

unread,
Jun 13, 2017, 7:44:07 AM6/13/17
to
Had some success on the postscript side. The missing ingredient is
having each parser return a list of results (lengths). So I'll probably
want to rewrite the whole thing with list management in mind.

But not right now.

luser droog

unread,
Jun 16, 2017, 5:22:15 AM6/16/17
to
Greedy, but not possessive.

Lists aren't so terrible, dealt with functionally.

Norah@laptop ~
$ cat pc3.c
#include <stdio.h>
#include <stdlib.h>

typedef struct parser *parser;
typedef struct result *result;
typedef result (*handler)( parser p, char *input );

struct parser {
handler test;
parser a, b;
int c;
};

struct result {
result next;
int length_matched;
};

#define new_function_for_(type) \
type new_##type ( struct type input ){ \
type local = calloc( sizeof *local, 1 ); \
return local ? ( *local = input ), local : local; \
}
new_function_for_(parser)
new_function_for_(result)


result parse( parser p, char *input ){ return p->test( p, input ); }
result parse_fail( parser p, char *input ){ return NULL; }
result parse_succeed( parser p, char *input ){
return new_result( (struct result){ .next = NULL, .length_matched = 0 } );
}
result parse_term( parser p, char *input ){
return *input == p->c ?
new_result( (struct result){ .next = NULL, .length_matched = 1 } ) :
NULL;
}

result parse_alternate( parser p, char *input ){
result res = parse( p->a, input );
if( res ){
result r = res;
while( r->next ) r = r->next;
r->next = parse( p->b, input );
return res;
} else return parse( p->b, input );
}

void amend_lengths( result r, int length ){
if( r ) r->length_matched += length;
if( r->next ) amend_lengths( r->next, length );
}

result parse_sequence_next( result r, parser p, char *input ){
if( ! r ) return NULL;
result res = parse( p->b, input + r->length_matched );
if( res ){
amend_lengths( res, r->length_matched );
result tail = res;
while( tail->next ) tail = tail->next;
tail->next = parse_sequence_next( r->next, p, input );
return res;
} else return parse_sequence_next( r->next, p, input );
}

result parse_sequence( parser p, char *input ){
result res = parse( p->a, input );
return parse_sequence_next( res, p, input );
}


parser fails(){ return new_parser( (struct parser){ .test = parse_fail } ); }
parser succeeds(){ return new_parser( (struct parser){ .test = parse_succeed } ); }

parser term( int c ){
return new_parser( (struct parser){ .test = parse_term, .c = c } );
}
parser alternate( parser a, parser b){
return new_parser( (struct parser){ .test = parse_alternate, .a = a, .b = b } );
}
parser sequence( parser a, parser b){
return new_parser( (struct parser){ .test = parse_sequence, .a = a, .b = b } );
}
parser many( parser a ){
parser q = new_parser( (struct parser){ .test = parse_sequence, .a = a } );
parser r = new_parser( (struct parser){ .test = parse_alternate, .a = q, .b = succeeds() } );
q->b = r;
return r;
}
parser some( parser a ){
parser q = new_parser( (struct parser){ .test = parse_sequence, .a = a } );
parser r = new_parser( (struct parser){ .test = parse_alternate, .a = q, .b = succeeds() } );
q->b = r;
return q;
}

parser char_class( char *str ){
return *str ? alternate( term( *str ), char_class( str + 1 ) ) : fails();
}
parser string( char *str ){
return *str ? sequence( term( *str ), string( str + 1 ) ) : succeeds();
}


void print_results( result res ){
if( res ) printf( "%d ", res->length_matched ),
print_results( res->next );
}

void test( parser p, char *str ){
printf( "%s:", str );
fflush(0);
result r = parse( p, str );
print_results( r );
printf( "\n" );
}

int main(){
parser p = string("hello");
test( p, "hello" );
test( p, "hell" );
test( p, "hello w" );
test( p, "xxxxx" );

parser q = many( term( 'x' ) );
test( q, "xy" );
test( q, "xx" );
test( q, "xxxy" );
test( q, "xxxxy" );
test( q, "xxxxx" );

parser r = sequence( many( term( 'x' ) ), string( "xy" ) );
test( r, "xy" );
test( r, "xx" );
test( r, "xxxy" );
test( r, "xxxxy" );
test( r, "xxxxx" );
return 0;
}

Norah@laptop ~
$ make pc3 CFLAGS+='-std=c99 -g'
cc -std=c99 -g pc3.c -o pc3

Norah@laptop ~
$ ./pc3
hello:5
hell:
hello w:5
xxxxx:
xy:1 0
xx:2 1 0
xxxy:3 2 1 0
xxxxy:4 3 2 1 0
xxxxx:5 4 3 2 1 0
xy:2
xx:
xxxy:4
xxxxy:5
xxxxx:

Norah@laptop ~
$

Melzzzzz

unread,
Jun 17, 2017, 12:05:12 AM6/17/17
to
On 2017-06-16, luser droog <luser...@gmail.com> wrote:
> On Tuesday, June 13, 2017 at 6:44:07 AM UTC-5, luser droog wrote:
>> On Monday, June 12, 2017 at 3:53:42 PM UTC-5, luser droog wrote:
>> > On Monday, June 12, 2017 at 6:25:10 AM UTC-5, Ben Bacarisse wrote:
>> > > luser droog <luser...@gmail.com> writes:
>
>> > > >> > This will match 0 or more of x and return the length matched.
>> > > >>
>> > > >> What about:
>> > > >>
>> > > >> parser p = sequence2(some(term('x')), string("xy"));
>> > > >> test(p, "xxxxy");
>> > > >>
>> > > >> ?
>> > > >
>> > > > Touché. That doesn't work. The quantifiers I've supplied are strictly
>> > > > greedy.
>> > >
>> > > Just a small point to avoid confusion: "strictly greedy" seems to be
>> > > your own term, at least I've not hear it before. The normal Kleene star
>> > > operator is greedy in the usual meaning of the term -- that it matches
>> > > the longest possible string /that still allows the whole pattern to
>> > > match/.
>> > >
>> >
>> > Thanks. A brief search suggests at least one person calls
>> > this a "possessive" Kleene star.
>> >
>> > https://cs.stackexchange.com/questions/40712/possessive-kleene-star-operator
>>
Take a look at Haskell solution ;)

[bmaxa@maxa-pc haskell]$ cat combinator.hs
data Parser = Term Char | Alt Parser Parser | Seq Parser Parser | Many Parser

parse (Term c) n@(input:rest)
| c == input = (True,rest)
| otherwise = (False,n)
parse (Alt a b) input = let a' = parse a input
b' = parse b input
in if fst a' then a' else b'
parse (Seq m@(Many _) b) input = let a' = parse m input
b' = parse' b (snd a') input
in if fst a' then b' else a'
where
parse' a s n@(_:input)
| s == n = parse a n
| otherwise =
let a' = parse a n
in if fst a' then a'
else parse' a s input
parse (Seq a b) input = let a' = parse a input
b' = parse b $ snd a'
in if fst a' then b' else a'
parse m@(Many a) input
| f = parse m s
| otherwise = (True,s)
where
(f,s) = parse a input

string (x:[]) = Term x
string (x:xs) = Seq (Term x) (string xs)

char_class (x:[]) = Term x
char_class (x:xs) = Alt (Term x) (char_class xs)

main = do
let p = Seq (Alt (Term 'H') (Term 'h'))
(Seq (Term 'e') (Seq (Term 'l') (Seq (Term 'l') (Term 'o'))))
let q = Seq (char_class "Hh") (string "ello")
let r = Seq (string "ab") (Term 'c')
let s = Seq (Term 'a') (string "bc")
let t = Seq (Many (Seq (char_class "Aa") (string "ab"))) (string "abc")
print $ parse p "helloh"
print $ parse p "Hello"
print $ parse p "YaMaMa"
print $ parse q "hello"
print $ parse q "Hello"
print $ parse q "YaMaMa"
print $ parse r "abc"
print $ parse s "abcd"
print $ parse t "aabAabcd"

[bmaxa@maxa-pc haskell]$ ./combinator
(True,"h")
(True,"")
(False,"YaMaMa")
(True,"")
(True,"")
(False,"YaMaMa")
(True,"")
(True,"d")
(True,"d")

luser droog

unread,
Jun 17, 2017, 2:05:58 AM6/17/17
to
On Friday, June 16, 2017 at 11:05:12 PM UTC-5, Melzzzzz wrote:
> On 2017-06-16, luser droog <luser...@gmail.com> wrote:
> > On Tuesday, June 13, 2017 at 6:44:07 AM UTC-5, luser droog wrote:
> >> On Monday, June 12, 2017 at 3:53:42 PM UTC-5, luser droog wrote:
> >> > On Monday, June 12, 2017 at 6:25:10 AM UTC-5, Ben Bacarisse wrote:
> >> > > luser droog <luser...@gmail.com> writes:
> >
> >> > > >> > This will match 0 or more of x and return the length matched.
> >> > > >>
> >> > > >> What about:
> >> > > >>
> >> > > >> parser p = sequence2(some(term('x')), string("xy"));
> >> > > >> test(p, "xxxxy");
> >> > > >>
> >> > > >> ?
> >> > > >
> >> > > > Touché. That doesn't work. The quantifiers I've supplied are strictly
> >> > > > greedy.
> >> > >
> >> > > Just a small point to avoid confusion: "strictly greedy" seems to be
> >> > > your own term, at least I've not hear it before. The normal Kleene star
> >> > > operator is greedy in the usual meaning of the term -- that it matches
> >> > > the longest possible string /that still allows the whole pattern to
> >> > > match/.
> >> > >
> >> >
> >> > Thanks. A brief search suggests at least one person calls
> >> > this a "possessive" Kleene star.
> >> >
> >> > https://cs.stackexchange.com/questions/40712/possessive-kleene-star-operator
> >>
> Take a look at Haskell solution ;)

I'm afraid I'm not qualified to judge the quality of your code due to
insufficient familiarity with Haskell; but it's immensely gratifying
that my posts have inspired at least one other person to try coding
this up.

On the third hand, it's not greatly surprising that Haskell can handle
this easily. AIUI from the paper I ripped off, the Parser-Combinator
concept was originally worked up in the Miranda language, an fp language
that's an ancestor of Haskell.

> [bmaxa@maxa-pc haskell]$ cat combinator.hs
> data Parser = Term Char | Alt Parser Parser | Seq Parser Parser | Many Parser
>
> parse (Term c) n@(input:rest)
> | c == input = (True,rest)
> | otherwise = (False,n)
> parse (Alt a b) input = let a' = parse a input
> b' = parse b input
> in if fst a' then a' else b'
> parse (Seq m@(Many _) b) input = let a' = parse m input
> b' = parse' b (snd a') input
> in if fst a' then b' else a'
> where
> parse' a s n@(_:input)
> | s == n = parse a n
> | otherwise =
> let a' = parse a n
> in if fst a' then a'
> else parse' a s input

While lacking any real certainty, I wonder if this part should not
be necessary. Am I right that this is special-casing a Seq whose
first member happens to be a Many? I feel there ought to be a way
to do it without a special case. Since you can I believe construct
infinite lists, the Many combinator can build its action out of
Seq and Alt ad infinitem.

I don't actually know how to do what I describe. But if I understand
Haskell half as well as I know I don't, then it's certainly definitely
probably possible, maybe.
The last one doesn't make sense to me. Is it matching /half/ of that inner Seq?

[All of my comments here are intended to be constructive and encouraging.
Any offense or missing-of-the-point is entirely unintentional.]

Melzzzzz

unread,
Jun 17, 2017, 2:16:45 AM6/17/17
to
Yes.

I feel there ought to be a way
> to do it without a special case.

Well, how I did it, it requires special case when Many is
first member.

Since you can I believe construct
> infinite lists, the Many combinator can build its action out of
> Seq and Alt ad infinitem.

Many shouldn't know about Seq and Alt, IMO, rather Seq should know about
Many.

>
> I don't actually know how to do what I describe. But if I understand
> Haskell half as well as I know I don't, then it's certainly definitely
> probably possible, maybe.

Well you can do it in many ways ;)
> The last one doesn't make sense to me. Is it matching /half/ of that inner Seq?
It is matching (Many [Aa]"ab") ending with (string "abc")
I return unparsed part .

tko...@gmail.com

unread,
Aug 10, 2017, 9:38:50 AM8/10/17
to
Well, I guess I'll jump onto the language bandwagon and post my C++ translation of the program (reduced to one file for your convenience). I couldn't implement `many` the way you did, since my code doesn't allow cycles in the parser structure.

#include <algorithm>
#include <functional>
#include <iostream>
#include <list>
#include <memory>

namespace uparse {
class Parser;
using Result = std::list<int>;
using PPtr = std::unique_ptr<Parser>;
class Parser {
public:
virtual Result parse(const char* s) = 0;
virtual ~Parser() {};
};
class Succeed : public Parser {
public:
Result parse(const char* s) override {
return { 0 };
}
};
class Fail : public Parser {
public:
Result parse(const char* s) override {
return {};
}
};
class Term : public Parser {
public:
Term(int c) : c(c) {}
Result parse(const char* s) override {
return s[0] == c ? std::list<int>{ 1 } : std::list<int>{};
}
int c;
};
class Alternate : public Parser {
public:
Alternate(std::unique_ptr<Parser> a, std::unique_ptr<Parser> b) :
a(std::move(a)), b(std::move(b)) {}
Result parse(const char* s) override {
Result res1 = a->parse(s);
Result res2 = b->parse(s);
res1.splice(res1.end(), res2);
return res1;
}
std::unique_ptr<Parser> a;
std::unique_ptr<Parser> b;
};
void amendLengths(Result& r, int length);
class Sequence : public Parser {
public:
Sequence(std::unique_ptr<Parser> a, std::unique_ptr<Parser> b) :
a(std::move(a)), b(std::move(b)) {}
Result parse(const char* s) override {
Result res = a->parse(s);
return parseNext(res, s);
}
std::unique_ptr<Parser> a;
std::unique_ptr<Parser> b;
private:
Result parseNext(Result& r, const char* s) {
Result result;
for (int len : r) {
Result res = b->parse(s + len);
amendLengths(res, len);
result.splice(result.end(), res); // append res to result
}
return result;
}
};
class Many : public Parser {
public:
Many(std::unique_ptr<Parser> a) :
a(std::move(a)) {}
Result parse(const char* s) override {
Result res = { 0 };
Result result;
while (!res.empty()) {
result.insert(result.end(), res.begin(), res.end());
res = parseNext(res, s);
}
return result;
}
std::unique_ptr<Parser> a;
private:
Result parseNext(Result& r, const char* s) {
Result result;
for (int len : r) {
Result res = a->parse(s + len);
amendLengths(res, len);
result.splice(result.end(), res); // append res to result
}
return result;
}
};
void test(Parser& p, const char* str);
inline void test(PPtr& p, const char* str) {
test(*p, str);
}
// Convenience functions
PPtr term(int c);
PPtr operator+(PPtr a, PPtr b);
PPtr operator|(PPtr a, PPtr b);
PPtr many(PPtr a);
PPtr succeed();
void amendLengths(Result& r, int length) {
for (int& len : r) len += length;
}
void test(Parser& p, const char* str) {
std::cout << str << ": ";
Result r = p.parse(str);
for (int len : r)
std::cout << len << ' ';
std::cout << '\n';
}
PPtr term(int c) {
return std::make_unique<Term>(c);
}
PPtr operator+(PPtr a, PPtr b) {
return std::make_unique<Sequence>(std::move(a), std::move(b));
}
PPtr operator|(PPtr a, PPtr b) {
return std::make_unique<Alternate>(std::move(a), std::move(b));
}
PPtr many(PPtr a) {
return std::make_unique<Many>(std::move(a));
}
PPtr succeed() {
return std::make_unique<Succeed>();
}
}


using uparse::term;
using uparse::many;
using uparse::PPtr;

int main() {
PPtr p = term('h') + term('e') + term('l') + term('l') + term('o');
test(p, "hello");
test(p, "hell");
test(p, "hello w");
PPtr p2 = (term('H') | term('h')) + term('e') + term('l') + term('l') + term('o');
test(p2, "Hello");
test(p2, "hell");
test(p2, "hello w");
PPtr p3 = many(term('x'));
test(p3, "xxx");
test(p3, "xxxxx");
test(p3, "xyzzy");
}

Steve Carroll

unread,
Aug 12, 2017, 4:08:01 PM8/12/17
to
This is something you should ask Marek. Time to blame the herd! Nobody gets it, I barely understand it. Allah doesn't have any idea what he is blubbering about.

--
What Every Entrepreneur Must Know
http://www.5z8.info/racist-raps_r6r3wf_best-russian-sites-for-bootleg-everything
http://www.5z8.info/hack-outlook_c6c9in_michaelangelo-virus
https://youtu.be/lF8yk7ul4Mw
Jonas Eklundh Communication

luser droog

unread,
Aug 20, 2017, 9:45:58 PM8/20/17
to
On Thursday, August 10, 2017 at 8:38:50 AM UTC-5, tko...@gmail.com wrote:
> On Friday, 16 June 2017 05:22:15 UTC-4, luser droog wrote:
> > Greedy, but not possessive.
> >
> > Lists aren't so terrible, dealt with functionally.
> >
<snip c program>
>
> Well, I guess I'll jump onto the language bandwagon and post my C++ translation of the program (reduced to one file for your convenience). I couldn't implement `many` the way you did, since my code doesn't allow cycles in the parser structure.
>

Melzzz had to do that one weirdly as well. And the record show the effort it took
to come with my twisty loop. There's something weird about the `many` combinator.

But I didn't see /why/ my twisty loop wouldn't work in C++. Could you explain
further how yours works for the 'C-only's?

In the C version, `parser many(parser p)` makes a sequence

seq( p, x )

where x points to an alteration

alt( y, true )

where y points back to the sequence. Is it because of inheritance that you
can't do it this way in C++ You can't have infinite inheritance?


0 new messages