Hi back,
As i told you early i was planning to modify the code in order to solve the
problems that i had.
During that i found and fix what i think was a bug.
It is when we have an ambiguous grammar and it is asked to process the tree
with the ambiguous flag. The problem is that it call the grammar function
with list of symnode or rulenode (i dont remember) rather than with what
should be the kids. The problem was in the order that you checked the
lengths of the possibilities of the tree. As it is a tree now, what i did
was defined a new function to return the next grammar processed in a Depth
first tree manner. I didnt want to return the list of trees, because some
initialization probably has to be done before the next one start. I know
that that could be a function in the code section of the grammar, but i
keeped it simple.
Also, as i was parsing a huge message with a huge grammar, the proctree
reach the recursion limit, so i implement an iterative function for it. To
my surprise it runs a little bit slower than the recursive, i dont know if i
mess it up somewhere or is because how i m managing the stack. Anyways, i
stop taking a look on it, probably i ll do it later. but here is my little
contribution. I ll copy in the message but if you want me to send u the
file, let me know. This function should be in the helpers.py file (just in
case)
class TreeProc :
def __init__(self, gram, allowambig=0) :
self.gram = gram
self.allowambig = allowambig
self.pathschoose = []
self.exploringitem = 0
def nextproctree(self,t):
while len(self.pathschoose):
curpath,curposs = self.pathschoose.pop()
curpath += 1
if curpath != curposs:
self.pathschoose.append( (curpath,curposs) )
break
self.exploringitem = 0
if len(self.pathschoose):
return self.proctree(t)
else:
return None
def proctree(self,t):
rulestack = []
stack = []
value = []
rule = None
stack.append(t)
while len(rulestack) or len(stack):
if len(stack):
curr = stack.pop(0)
else: #elif len(rulestack):
newvalue = rule(value)
rule, value, stack = rulestack.pop()
value.append(newvalue)
continue
if isinstance(curr, glr.symnode) :
if len(curr.possibilities) == 0 :
# for terminals, return the token value
value.append( curr.sym[1] )
elif len(curr.possibilities) == 1 :
stack.insert(0,curr.possibilities[0])
elif self.allowambig :
# Save the path where is going if this area was not
explored earlier or
# continue by the path
if len(self.pathschoose) <= self.exploringitem:
self.pathschoose.append( (0,len(curr.possibilities))
)
nextpath = self.pathschoose[self.exploringitem][0]
self.exploringitem += 1
stack.insert(0,curr.possibilities[nextpath])
else: #elif len(t.possibilities) > 1 :
raise AmbigParseError("ambiguous parse of %s" % curr.sym
)
elif isinstance(curr, glr.rulenode) :
# return the result of the semantic action performed on the
# children.
rulestack.append( (rule, value, stack) )
sym,redlen,prodno = curr.rule
rule = self.gram.semactions[prodno]
stack = curr.elements[:]
value = []
else :
raise ApiError("Illegal node in tree: %s" % curr.__class__)
if len(value) > 1 :
raise error("Something wrong in the production of the rule :s
%s" % curr.__class__)
return value[0]
A second thing was, as i told you earlier, was for the following discussion:
> Taking a look in the code i found that when the glr.py call the token
> > > function, the latter even if it founds more than one token it only
> > > returns the first one. And in no place it cares about which are the
> > > possible next tokens.
> > Yes. The parser will handle ambiguous grammars, but the lexer
> > does not handle ambiguous lexing rules. However, in your case
> > you can disambiguate entirely in the lexer:
What i did was to call the token function with the list of allowed tokens
(as YAPPS approach) and it improves a lot in ambiguous lexers and grammars.
Still is not enough because of what i ll tell you later, but what i did is
not really clean for the moment. The thing is that the only way for a token
(in the lexer) to know the name of it is from the return value. So in order
to check which is the name of the token in the lexer i ve to run the action
to get the returned value. But the problem is that i m running the action
(some code can be set there and I ve side effects of course). So, for my
examples in which i dont put code except of the "return TOKEN", it works
perfectly. What i was thinking to do is to extend the grammar to be:
deflist ::= IDENT<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>ID
:
¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>regexp
¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
\n<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
deflist ::= deflist
IDENT<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
ID
: ¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>regexp
¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
EOL<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
or something like it rather than:
rulelist ::= ¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>rulepat
¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
SRCCODE<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
rulelist ::= rulelist
¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>rulepat
¨<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
SRCCODE<http://users.lava.net/%7Enewsham/pyggy/html/%3CpyGrammarToken%3E%3Cpy...>
An in that way when the lexer runs in a new file it can pre calculate the
name of the token of each expression and i dont ve to run the action in
order to know it and it keeps the full functionality u did for.
If you are interest in that approach let me know.
Now the second problem is even if i pass to the lexer the allowed list of
token, the lexer can still find more than one. For this problem i dont see
any better solution than to start a new active parser for each token
returned. But i didnt work on it yet.
Well, that it is.
Humberto