Trouble with collecting multiple words at end of line into "notes"

Dave Collins

Aug 1, 2017, 7:54:13 AM
to Treetop Development
Hello,

  This is probably something easy I'm missing...  I want to have a grammar that matches certain words at the beginning of the line, but then any other words are just collected as "notes".

  I can get it to work but only with :consume_all_input set to false.  Whenever I set :consume_all_input to true it fails to parse because it expects another "word" when there is no more input.

Here is the simplest grammar to reproduce the issue:

grammar AnimalGrammar

  rule animals
    animal+ notes
  end

  rule animal
    "cat " / "dog "
  end

  rule notes
    word*
  end

  rule word
    ([a-zA-Z0-9]+ [ \t\n!.]+) {
      def content
        text_value.strip
      end
    }
  end

end


So, for the input "cat dog can get along" I would like it to parse and have "cat dog " as animals and "can get along" as notes, but it returns:
"Expected one of [a-zA-Z0-9], [ \\t\\n!.] at line 1, column 22 (byte 22) after along"

I'm sure I have to tell it that a "word" is done matching if there is no more input (which I tried to do with !.) but must be doing that wrong.

Other things I would like it to parse:

"dog cat"   (animal: "dog cat " notes empty)

Any help on how to deal with this "nothing left" condition would be most appreciated!

Thanks!
Dave
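
A note on the `!.` attempt: inside square brackets, `!` and `.` are ordinary members of the character class rather than the PEG "not any character" lookahead, which is why the error message lists `[ \t\n!.]` as a single expected set. A minimal sketch of the `word` rule with the lookahead moved outside the brackets, assuming the rest of the grammar stays as posted (not run against the inputs in this thread):

```
  rule word
    ([a-zA-Z0-9]+ ([ \t\n]+ / !.)) {
      def content
        text_value.strip
      end
    }
  end
```

The `animal` literals "cat " and "dog " carry their own trailing space, so an input such as "dog cat" would presumably still need the same treatment there.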

Clifford Heath

Aug 1, 2017, 8:34:35 AM
to treetop-dev@googlegroups.com Development
You need to make sure you match the final newline. Your parse fails because you don't consume all the input. 

Clifford Heath 

Dave Collins

Aug 1, 2017, 9:27:02 AM
to Treetop Development
Thanks for your reply Clifford, and I understand that is what must be the problem.  However, I have tried so many variations of the "word" rule with so many variations of \r\n, !., etc. that I just don't see how to consume that last newline.

Also, I'm not explicitly passing a newline:
parser.parse("cat dog can get along", :consume_all_input => true)

But I'm assuming that one is present or added somehow?

If you could be a bit more specific or point me at an example it would be most appreciated.

Thanks much,
Dave

mar...@reality.com

Aug 1, 2017, 11:34:25 AM
to Treetop Development

The problem is you are including whitespace delimiters in your definition
of words, and are including newlines in your definition of whitespace, so
notes just sucks up words (and whitespace, including newlines) until the
end of the file. If instead you defined:

rule word
  [a-zA-Z0-9]+
end

rule optional_whitespace
  [ \t]*
end

rule newline
  [\n]
end

rule notes
  (word optional_whitespace)*
end

...and then explicitly parsed newlines where you wanted them (instead of
implicitly in word) you should get better results.

-- Markus

--
Sent from PINE (GUI? Phooey!)

mar...@reality.com

Aug 1, 2017, 12:59:17 PM
to Treetop Development

> The problem is you are including whitespace delimiters in your definition
> of words, and are including newlines in your definition of whitespace, so
> notes just sucks up words (and whitespace, including newlines) until the
> end of the file.

In my hasty reading of your first email I may have misunderstood the
problem you were describing; in either case the change I proposed should
fix it, since the root issue is that your definition of word also requires
trailing whitespace, which your example input didn't have.
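
The root issue can be reproduced without Treetop. The following is only a plain-Ruby sketch: the regex `OLD_WORD` and the helper `collect_notes` are hypothetical stand-ins for the original `word` and `notes` rules, not anything Treetop generates. It also shows that the input is 21 characters long, so "column 22" in the error is simply the position just past the last character, with no newline involved.

```ruby
require 'strscan'

# Regex stand-in for the original `word` rule: letters/digits followed by
# at least one delimiter character. Illustration only, not Treetop itself.
OLD_WORD = /[a-zA-Z0-9]+[ \t\n!.]+/

# Repeatedly match OLD_WORD from the current position, like `word*`.
# Returns the stripped words and whatever input was left unconsumed.
def collect_notes(text)
  scanner = StringScanner.new(text)
  words = []
  while (match = scanner.scan(OLD_WORD))
    words << match.strip
  end
  [words, scanner.rest]
end

input = "cat dog can get along"
puts input.length   # 21, so column 22 is one past the end of the input

words, leftover = collect_notes("can get along")
p words      # ["can", "get"]
p leftover   # "along" (no trailing delimiter, so the last word never matches)

words, leftover = collect_notes("can get along\n")
p words      # ["can", "get", "along"]
p leftover   # ""
```

The leftover "along" is the part that :consume_all_input => true refuses to ignore.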

Dave Collins

Aug 1, 2017, 9:31:11 PM
to Treetop Development
Thanks so much Markus (and Clifford) for your responses.

 That is too funny because I had recently rewritten the grammar to move the spaces into the words.  The original version had a separate whitespace rule... All was working fine except for this "notes collection" issue.

  I'll convert it back and see if I can repro the problem with the small grammar.  Either way I'll post here.

Thanks again,
Dave

Dave Collins

Aug 1, 2017, 9:47:31 PM
to Treetop Development
Markus!  You were spot-on!  Changing the grammar to the optional_whitespace concept and moving the spaces and newlines out of the definition of word solved the issue.

Here is the version that works.  Also it works with or without a \n at the end of the line:

grammar AnimalGrammar

  rule animals
    (animal optional_ws)+ notes
  end

  rule animal
    "cat" / "dog"
  end

  rule notes
    (word optional_ws)*
  end

  rule word
    [a-zA-Z0-9]+ {
      def content
        text_value.strip
      end
    }
  end

  rule optional_ws
    [ \t\n]*
  end

end

Thanks so much for the suggestion and help.
Dave
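
For readers who want to see why separating the whitespace helps, here is a rough plain-Ruby sketch of the same matching order. The regexes and the helper `split_line` are hypothetical stand-ins for the rules above and do not use Treetop; the real parser returns syntax nodes rather than arrays.

```ruby
require 'strscan'

# Regex stand-ins for the rules in the working grammar (illustration only).
ANIMAL      = /cat|dog/
WORD        = /[a-zA-Z0-9]+/
OPTIONAL_WS = /[ \t\n]*/

# Mimics `(animal optional_ws)+ notes` with `notes` as `(word optional_ws)*`.
# Returns [animals, notes], or nil when no animal leads or input is left over.
def split_line(text)
  scanner = StringScanner.new(text)
  animals = []
  while (animal = scanner.scan(ANIMAL))
    animals << animal
    scanner.scan(OPTIONAL_WS)
  end
  return nil if animals.empty?

  notes = []
  while (word = scanner.scan(WORD))
    notes << word
    scanner.scan(OPTIONAL_WS)
  end
  scanner.eos? ? [animals, notes] : nil
end

p split_line("cat dog can get along")    # [["cat", "dog"], ["can", "get", "along"]]
p split_line("cat dog can get along\n")  # same result with a trailing newline
p split_line("dog cat")                  # [["dog", "cat"], []]
```

Because whitespace is optional after every token, the last word no longer depends on a delimiter being present, which matches the "with or without a \n" behaviour reported above.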