Simple parsing problem

Eric Fowler

Jun 21, 2009, 11:09:55 PM

This should be easy practice for the experts ..
I am writing a bison grammar to parse strings coming from various
kinds of attached devices.

One of the strings is of the form:
$FOO,field1,field2, 0,a,1,b,3,c, ....<CRLF>

where there are a variable number of paired fields of the form
<number> COMMA <text> COMMA. The comma is always a delimiter here,
the text contains no commas.

This one was easy, I did it like this:
FOO opt_token opt_token dse_data_set
{mumble ...}

dse_data_set:
dse_data_pair dse_data_set
| dse_data_pair

dse_data_pair:
opt_token opt_token
{
...my code here ...
}
;

opt_token:
COMMA_DELIM TOKEN
{
memcpy($$, $2, sizeof($$)/sizeof(*$$) - 1);
}
| COMMA_DELIM
{ *$$ = 0;}
;

All well and good. Now I got this curveball - it is the same as the
other one, but it has another opt_token field after the variable
length list of pairs:

$FOOBAR,field1,field2,0,a,1,b,3,c....,field3<CRLF>

[An opt_token is just a comma delimited field that might be empty, BTW. ]

I am getting shift-reduce conflicts when I try to handle this like this:
FOOBAR opt_token opt_token dse_data_set opt_token

I can vaguely see why this is happening ... the parser can't tell the
diff between opt_tokens that are in pairs or 'in the wild'. But I am
not clear how to fix this.

I could, I suppose, define something equivalent to opt_token that does
the same thing, but is different, and use it in my dse_data_pair, or
alternatively use it for my last opt_token. But I am wondering if
there is a cleaner play.

Thanks very much

Eric
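
The conflict described above can be reproduced with a grammar along the following lines. This is only a sketch: the rule name `line`, the explicit CRLF token and the missing actions are assumptions, not taken from the post. After a complete dse_data_pair, with COMMA_DELIM as the single lookahead token, the parser would have to know whether one more field or two more fields follow, and one token is not enough to tell.

```yacc
/* Sketch only: `line`, CRLF and the absence of actions are assumptions. */
%token FOOBAR COMMA_DELIM TOKEN CRLF
%%
line:
    FOOBAR opt_token opt_token dse_data_set opt_token CRLF
  ;

/* Right recursion: after one dse_data_pair, seeing COMMA_DELIM, the parser
   must choose between reducing to dse_data_set (the trailing field follows)
   and shifting (another pair follows). Bison reports a shift/reduce
   conflict for this choice. */
dse_data_set:
    dse_data_pair dse_data_set
  | dse_data_pair
  ;

dse_data_pair:
    opt_token opt_token
  ;

opt_token:
    COMMA_DELIM TOKEN
  | COMMA_DELIM
  ;
%%
```

With Bison's default of resolving such a conflict in favour of the shift, the trailing field is always taken as the start of another pair, so the parse then fails at CRLF.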

Hans Aberg

Jun 23, 2009, 4:26:10 AM

Eric Fowler wrote:
> I am writing a bison grammar to parse strings coming from various
> kinds of attached devices.
>
> One of the strings is of the form:
> $FOO,field1,field2, 0,a,1,b,3,c, ....<CRLF>
>
>
> where there are a variable number of paired fields of the form
> <number> COMMA <text> COMMA. The comma is always a delimiter here,
> the text contains no commas.

How about:
%token COMMA ","
%token FOO CRLF field1 field2
%token NUMBER LETTER
%%
item:
FOO "," field1 "," field2 "," sequence CRLF
;

sequence:
/* empty */
| sequence NUMBER "," LETTER ","
;
%%

Each pair carries its own terminating ",", as in your sample string. If
the last LETTER of a "sequence" should not have a terminating ",", use
the rules 'pair' and 'sequence "," pair' instead, with pair defined as
'NUMBER "," LETTER'. If sequence must be non-empty, replace the empty
rule by 'NUMBER "," LETTER ","'. (field1 and field2 are tokens here only
to make the snippet Bison-compilable.)

Hans

Waldek Hebisch

Jun 23, 2009, 5:02:18 PM

Try:

dse_data_set:
dse_data_set dse_data_pair
| dse_data_pair

Your definition forces the parser to decide too early whether
dse_data_set is complete. The definition above allows extending
dse_data_set if it is followed by two tokens.

--
Waldek Hebisch
heb...@math.uni.wroc.pl
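
As a sketch of how the left-recursive rule might slot into the grammar from the original question (the rule name `line`, the explicit CRLF token and the missing actions are assumptions, not taken from the posts above):

```yacc
/* Sketch only: `line`, CRLF and the absence of actions are assumptions. */
%token FOOBAR COMMA_DELIM TOKEN CRLF
%%
line:
    FOOBAR opt_token opt_token dse_data_set opt_token CRLF
  ;

/* Left recursion: each pair is folded into dse_data_set as soon as it is
   complete, so the parser never has to guess where the list ends. */
dse_data_set:
    dse_data_set dse_data_pair
  | dse_data_pair
  ;

dse_data_pair:
    opt_token opt_token
  ;

/* A comma-delimited field that may be empty. */
opt_token:
    COMMA_DELIM TOKEN
  | COMMA_DELIM
  ;
%%
```

With this form, once a dse_data_set and one further opt_token have been read, the next token settles the matter: CRLF means the opt_token was the trailing field, COMMA_DELIM means it was the first half of another pair. One token of lookahead is therefore enough.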
