Issues with explicitly defined groups in GOLD Builder 5.2


Arsène von Wyss

Dec 18, 2012, 8:57:25 PM
to gold-pars...@googlegroups.com
I'm trying to define groups explicitly as per the documentation. Right now I'm trying to define a non-noise line block.

This is my grammar:

"Start Symbol" = <Program>
               
NL = {CR}? {LF}
NL @= { Type = Noise }
    
Whitespace = ({Space}|{HT})+

LnBlk @= { Type = Content }
LnBlk Start = '--'
LnBlk End = NL
LnBlk Block @= { Ending = Open }
                        
<Program> ::= LnBlk

As you can see, I deliberately don't use the names generated by GOLD for implicit comment group creation; neither the newline nor the group have a "special" name. The tables build fine, but I cannot parse my test line:

-- test

(Note that there is a newline after the "test" line.) I would expect to get a LnBlk token.

Digging deeper, I analyzed the symbol table. It reveals something interesting: the NL symbol exists twice, once as "Noise"/"Defined in Grammar", and once as "Lexical Group End"/"Implicitly Defined". I assume that this is the cause of the problem. Note that I'm following the documented example for the "Pascal" block comments as described here: http://goldparser.org/doc/grammars/example-group.htm

This points to a general issue with defining end tokens for groups that do not consume the end token: the end token can be any terminal that is meant to be reusable elsewhere. As such, the "Lexical Group End" type should not be used in this case.
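To make the expected behaviour concrete, here is a minimal Python sketch of a hand-written lexer for the LnBlk grammar above. It is not how GOLD's engine works internally: the names `tokenize` and `content_tokens` are hypothetical, and the group body is scanned character by character for simplicity. With `Ending = Open`, the group stops just before the newline, and that newline is then lexed as the ordinary, noise-typed NL terminal, so a single NL symbol serves both roles.

```python
import re

# Hypothetical toy lexer for the LnBlk grammar above; not GOLD's engine.
NL = re.compile(r"\r?\n")           # NL = {CR}? {LF}
WHITESPACE = re.compile(r"[ \t]+")  # Whitespace = ({Space}|{HT})+
LNBLK_START = "--"                  # LnBlk Start = '--'
NOISE = {"NL", "Whitespace"}


def tokenize(text):
    """Return (symbol, lexeme) pairs, noise symbols included."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text.startswith(LNBLK_START, pos):
            # Inside the group: look for the end terminal (NL).
            end = NL.search(text, pos)
            stop = end.start() if end else len(text)
            # Ending = Open: the NL is left in the input, so it is
            # tokenized afterwards as an ordinary NL terminal.
            tokens.append(("LnBlk", text[pos:stop]))
            pos = stop
            continue
        for name, pattern in (("NL", NL), ("Whitespace", WHITESPACE)):
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
    return tokens


def content_tokens(text):
    """What the parser sees: noise symbols are dropped."""
    return [t for t in tokenize(text) if t[0] not in NOISE]


print(content_tokens("-- test\n"))  # [('LnBlk', '-- test')]
```

The point of the sketch is that nothing forces NL to become a separate, implicitly defined "Lexical Group End" symbol: the same terminal ends the group and is then handled by its declared Noise type.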

In fact, even if the block consumes the end token it should not be a "Lexical Group End". Think of a grammar which allows arbitrary text blocks like so:

"Start Symbol" = <Statement>
               
End = 'end'
Begin = 'begin'
Statement = {AlphaNumeric}+
               
Message @= { Type = Content }
Message Start = 'message:'
Message End = End
Message Block @= { Advance = Character, Ending = Closed }
                        
<Statement> ::= Statement
             |  Message
             |  <Block>
             
<StatementList> ::= <Statement> ';' <StatementList>
                 |
           
<Block> ::= Begin <StatementList> End

It fails to generate the DFA states ("Cannot distinguish between: End End"). Okay, so let's not define "end" as terminal and try again...

"Start Symbol" = <Statement>
               
Statement = {AlphaNumeric}+
               
Message @= { Type = Content }
Message Start = 'message:'
Message End = end
Message Block @= { Advance = Character, Ending = Closed }
                        
<Statement> ::= Statement
             |  Message
             |  <Block>
             
<StatementList> ::= <Statement> ';' <StatementList>
                 |
           
<Block> ::= begin <StatementList> end

Now the tables are created fine. Inspecting the symbols, however, shows that "end" is a "Lexical Group End", even though it should really be "Content". Still, the test now parses fine (yeeha!):

begin
  statement;
  message: funky stuff end;
end
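A similar hypothetical sketch for the second grammar shows why `end` has to serve both as the Message group's end symbol and as the ordinary keyword closing `<Block>`. It assumes that `Advance = Character` means the group body is scanned as raw text for the literal `end`, and that literal keywords take priority over the `Statement` pattern; all names are illustrative only, not GOLD's API.

```python
import re

# Hypothetical toy lexer for the begin/end/Message grammar; not GOLD's engine.
KEYWORDS = {"begin", "end"}         # literal terminals
WORD = re.compile(r"[A-Za-z0-9]+")  # Statement = {AlphaNumeric}+
SPACE = re.compile(r"\s+")          # implicit Whitespace (noise)
GROUP_START, GROUP_END = "message:", "end"


def tokenize(text):
    """Return (symbol, lexeme) pairs with whitespace dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text.startswith(GROUP_START, pos):
            # Advance = Character: plain substring search for the end lexeme.
            end = text.find(GROUP_END, pos + len(GROUP_START))
            if end < 0:
                raise SyntaxError("unterminated Message group")
            # Ending = Closed: the 'end' lexeme becomes part of the Message token.
            stop = end + len(GROUP_END)
            tokens.append(("Message", text[pos:stop]))
            pos = stop
            continue
        m = SPACE.match(text, pos)
        if m:
            pos = m.end()
            continue
        m = WORD.match(text, pos)
        if m:
            word = m.group()
            # Outside a group, 'end' is just the keyword that closes <Block>.
            tokens.append((word if word in KEYWORDS else "Statement", word))
            pos = m.end()
            continue
        if text[pos] == ";":
            tokens.append((";", ";"))
            pos += 1
            continue
        raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
    return tokens


source = "begin\n  statement;\n  message: funky stuff end;\nend\n"
for token in tokenize(source):
    print(token)
```

The first `end` is swallowed into the Message token, while the final `end` comes out as a normal terminal, which is what the grammar needs. Note that this naive substring search would also stop at the "end" inside a word such as "blender"; the sketch makes no claim about how GOLD handles that case.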

So, in summary, I see the following two major issues:
  • Explicitly defined terminals cannot be used as group start/end symbols as described in the example. Trying to do so causes the symbol table to contain non-distinct names and leads to DFA table creation errors.
  • The "Lexical Group End" type is meaningless and should be dropped completely, so that those symbols are normal terminals of type "Content" or "Noise".
It would be great to have those fixed.

Thanks, Arsène

Devin

Jan 6, 2013, 8:59:31 PM
to gold-pars...@googlegroups.com
I'll take a look and try to get a fix asap.

- Devin

Devin

Jan 6, 2013, 9:48:42 PM
to gold-pars...@googlegroups.com
Ah. The problem is in how I assign the symbol types after the grammar is parsed. I'll fix it soon and release a patch.

The Group End type is designed to identify a symbol that is recognized but not used as a content symbol. The case you pointed out is a bug that I will fix.

Thanks for all your help!


- Devin


On Tuesday, December 18, 2012 5:57:25 PM UTC-8, Arsène von Wyss wrote:

Arsène von Wyss

May 8, 2013, 7:36:11 AM
to gold-pars...@googlegroups.com
Not that I know of... Devin?

Cheers, Arsène

On Tuesday, April 30, 2013 9:19:03 AM UTC+2, Neil Anderson wrote:
Hi,

Was this ever fixed?

Thanks,
Neil