The language we are parsing is valid xml, so I could work out a method
of marking the start of each tag by just marking the offset whenever we
encounter a '<' character in the correct context. We could just reparse,
starting at the last '<' every time we execute a read, until we
encounter the '>' in the correct (close tag) context. However, this is
ugly and will add a lot of complexity, not to mention tons of
backtracking, potentially. Suggestions for alternatives would
be greatly appreciated.
I should add that I am a total lex and yacc novice, although I have
ready access to folks who have done significant work with them, although
in every case, it was some years ago...
Thanks in advance.
(please cc my email address, )
--sam
[It's certainly doable, although not pretty. People have hacked up both
lex and yacc to be reentrant, which is a start. -John]
In my opinion, what you really want to solve your problem is a lexer
and parser that keep their internal states in an object. That's the
way we implemented Yacc++ (specifically so we could use the lexer and
parser as call-back objects in event driven systems). This is all
part of handling "inverted flow-of-control" properly.
The problem with most implementations of yacc and lex (including (at
least at one time) Bison and Flex) is that they assume that the
program is implemented top-down the parser controls the show and calls
the lexer when it needs another token. The lexer does the same thing
with the input stream, telling the input stream when to give it
characters.
In Yacc++, this model is exactly reversed, the input object controls
the show and hands the lexer a buffer of one or more characters, which
the lexer tokenizes and if it recognizes a token calls the parser
with. The parser takes the tokens the lexer gives it and parses them.
At each point (i.e. after the parser has processed a token the lexer
has given it or the lexer has finished the buffer of characters that
it was given) the higher level objects have their entire state in the
object and they return to the caller.
First aside, this is one of the reasons Yacc++ does not implement a
recursive descent option. Recursive descent depends on the parser
being able to use the system call stack to handle certain parts of the
parser state. You can't do that and implement an inverted flow-of-
control scheme, because you can't return to the caller and keep your
information in the call stack at the same time. (Give me a language
with threaded co-routines so that the parser stack can be maintained
while still returning control to the lexer running on another thread,
and I will give you a recursive descent parser that works with
inverted flow-of-control.)
Second aside, the yacc model of parser control is so pervasive that to
make Yacc++ not seem "too wierd" we implemented some wrapper functions
in Yacc++ that allow it to look like it implements the normal yacc
flow-of-control. One wrapper function called yy_parse() (to match the
top level yyparse() function in yacc) simply loops on the input object
until the parser declares that it has accepted the input. That way
users can call yy_parse() just like they would have called yyparse()
and the same parsing sequence happens. Another (set of) function(s)
adapts the Yacc++ parser to talk to LEX (or Flex) lexers and maps the
yacc/LEX interface (where the lexer returns a token and stores the
associated semantic value in the variable yylval) to the Yacc++ one
where the parser is "called" with a token object that contains the
token type and semantic value.
Now, Yacc++ has been using this model since 1990, and there are other
yacc derivatives that have adopted similar schemes in the meantime.
In particular, you may be able to find versions called "Bison++" and
"Flex++" that keep their states in objects. I have read that such
versions have been developed and made available (ca. 1995). Whether
they truly support inverted flow-of-control, I don't know. (I've
never had a strong reason to find out.)
In any case, it should be possible to make them so. The tables that
run the engines are agnostic in which flow-of-control order is used.
Disclaimer: I co-wrote Yacc++, so I have a rather strong bias in this
area.
-Chris
*****************************************************************************
Chris Clark Internet : com...@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)