Need help debugging a weird grammar

Sen Han

Nov 18, 2014, 12:28:43 PM
to treet...@googlegroups.com

grammar AgencyCompany

  rule expression
    find_agency / find_company
  end

  rule find_agency
    agency space that space concrete_company space work_with
  end

  rule find_company
    company space that concrete_agency space work_with
  end

  rule work_with
    ("has" / "have") space ("partnered" / "partner" / "worked" / "work") space "with"
  end

  rule that
    "that" / "which"
  end

  rule company
    "companies" / "vendors" / "company" / "vendor"
  end

  rule agency
    "agency" / "agencies"
  end

  rule my_company
    "my company"
  end

  rule concrete_company
    entity_name
  end

  rule concrete_agency
    entity_name
  end

  rule entity_name
    (word (space word)?)+ / word
  end

  rule word
    [a-zA-Z]+
  end

  rule space
    [\s]+
  end

end

parser = (Treetop.load "./lib/bgov_brain/queries/agency_company.treetop").new
parser.parse("agency that locked martin has worked with") #=> works
parser.parse("agency that locked martin abc has worked with") #=> does not work
parser.parse("agency that locked  has worked with") #=> does not work
parser.parse("company that dod  has worked with") #=> does not work
parser.parse("company that department of education  has worked with") #=> does not work
 
I feel like I am missing a concept here.


Clifford Heath

Nov 18, 2014, 4:59:44 PM
to treet...@googlegroups.com
Sen,

As soon as your grammar tries entity_name, the repetition of words is unlimited, and consumes the rest of the input.
When your grammar requires other words after that, it will always fail.

Treetop, like other packrat & PEG parsers, will never retry with a reduced number of repetitions.
You must prevent the excessive repetition from occurring, possibly by using a negative lookahead.
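
As a loose analogy in plain Ruby (this is not Treetop itself): a regexp atomic group `(?>...)` behaves like a PEG repetition in that it never gives characters back once it has matched, and `(?!...)` is the regexp form of a negative lookahead. The patterns and the sample string below are made up for illustration only.

```ruby
INPUT = "locked martin has worked with"

# Ordinary regexp repetition backtracks: the word loop first swallows the
# whole string, then gives words back until the trailing literal can match.
BACKTRACKING = /\A(?:[a-z]+\s*)+has worked with\z/

# Atomic group: the loop keeps everything it consumed, much like a PEG
# repetition, so nothing is left for the trailing literal.
PEG_LIKE = /\A(?>(?:[a-z]+\s*)+)has worked with\z/

# Same atomic loop, but a negative lookahead stops it in front of "has".
WITH_LOOKAHEAD = /\A(?>(?:(?!has\b)[a-z]+\s*)+)has worked with\z/

puts BACKTRACKING.match?(INPUT)   # true
puts PEG_LIKE.match?(INPUT)       # false
puts WITH_LOOKAHEAD.match?(INPUT) # true
```

The second pattern fails for the same reason a PEG rule does: the repetition is never retried with fewer iterations. The third shows the usual remedy, which is to tell the repetition where to stop.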
In your case it looks like this rule might work:

rule entity_name
  (word (space !work_with word)?)+ / word
end

Clifford Heath.
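
For readers who want to trace the behaviour by hand, here is a minimal sketch of PEG-style matching in plain Ruby. It is not Treetop and not its API: the helpers `lit`, `pat`, `seq`, `alt`, `many`, `opt`, `neg` and `parses?` are hypothetical names invented for this illustration, and only the `find_agency` path of the posted grammar is modelled. Under these assumptions, the guarded rule above accepts one-word and two-word names, but a three-word name still appears to fail, because each pass of the `+` loop must start with a `word` and nothing consumes the space between passes. A variant written as `word (space !work_with word)*` seems to cover any number of words. Separately, `find_company` as posted has no `space` between `that` and `concrete_agency`, which would likely affect the "company that ..." inputs regardless of `entity_name`.

```ruby
# Hand-rolled PEG-style combinators (hypothetical helpers, not Treetop's API).
# Each parser is a lambda (input, pos) -> new position, or nil on failure.
def lit(str)
  ->(input, pos) { input[pos, str.size] == str ? pos + str.size : nil }
end

def pat(regex) # regex is expected to start with \A
  ->(input, pos) { (m = regex.match(input[pos..-1])) ? pos + m[0].size : nil }
end

def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

def alt(*parsers) # ordered choice: the first alternative that matches wins
  lambda do |input, pos|
    result = nil
    parsers.find { |parser| result = parser.call(input, pos) }
    result
  end
end

def many(parser, min) # greedy repetition; never gives back what it consumed
  lambda do |input, pos|
    count = 0
    while (nxt = parser.call(input, pos)) && nxt > pos
      pos = nxt
      count += 1
    end
    count >= min ? pos : nil
  end
end

def opt(parser)
  ->(input, pos) { parser.call(input, pos) || pos }
end

def neg(parser) # negative lookahead: succeeds without consuming input
  ->(input, pos) { parser.call(input, pos) ? nil : pos }
end

WORD  = pat(/\A[a-zA-Z]+/)
SPACE = pat(/\A\s+/)
WORK_WITH = seq(alt(lit("has"), lit("have")), SPACE,
                alt(lit("partnered"), lit("partner"), lit("worked"), lit("work")),
                SPACE, lit("with"))

# Three candidate definitions of entity_name.
ORIGINAL = alt(many(seq(WORD, opt(seq(SPACE, WORD))), 1), WORD)
GUARDED  = alt(many(seq(WORD, opt(seq(SPACE, neg(WORK_WITH), WORD))), 1), WORD)
STARRED  = seq(WORD, many(seq(SPACE, neg(WORK_WITH), WORD), 0))

# Models find_agency only; true when the whole input is consumed.
def parses?(entity_name, input)
  rule = seq(alt(lit("agency"), lit("agencies")), SPACE,
             alt(lit("that"), lit("which")), SPACE,
             entity_name, SPACE, WORK_WITH)
  rule.call(input, 0) == input.size
end

INPUTS = [
  "agency that locked martin has worked with",
  "agency that locked martin abc has worked with",
  "agency that locked  has worked with"
]

{ "original" => ORIGINAL, "guarded" => GUARDED, "starred" => STARRED }.each do |name, rule|
  puts "#{name}: #{INPUTS.map { |input| parses?(rule, input) }.inspect}"
end
# original: [true, false, false]
# guarded: [true, false, true]
# starred: [true, true, true]
```

In the original rule, the one-word case fails because the optional `(space word)` swallows "has" and is never retried as empty. The lookahead prevents exactly that.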

Sen Han

Nov 19, 2014, 11:37:42 AM
to treet...@googlegroups.com
Great. Thanks. It works. 