Parser infinite loop

Will Myers

Jan 20, 2013, 8:38:35 PM
to treet...@googlegroups.com
Hi all,

Hoping to get some help, as my parser keeps spinning forever.

I'm trying to parse a LaTeX document into HTML markup. I'm starting simply: for now I just want to parse input made of `tags` and `text`. When I run the following grammar with :root => :paragraph, I get an infinite loop on this input:

'friend \emph{hello}\n\n'

It seems to stall at the first '\n' character. I've been compiling the grammar into a .rb file and debugging it, but I can't work out why it never gets past that point.

grammar Latex
  rule document
    (paragraph)* {
      def content
        [:document, elements.map { |e| e.content }]
      end
    }
  end

  rule paragraph
    ( tag / text )* eop {
      def content
        [:paragraph, elements.map { |e| e.content } ]
      end
    }
  end

  rule text
    ( !( tag_start / eop) . )* {
      def content
        [:text, text_value ]
      end
    }
  end

  # Example: \tag{inner_text}
  rule tag
    tag_start tag_type "{" inner_text "}" {
      def content
        [tag_type, inner_text.content]
      end
    }
  end 

  # Example: \emph{inner_text}
  rule inner_text
    ( !'}' . )* {
      def content
        [:inner_text, text_value]
      end
    }
  end

  rule eop
    newline 2.. {
      def content
        [:newline, text_value]
      end
    }
  end

  rule tag_type
    "emph" / "texttt"
  end

  rule newline
    "\n"
  end

  rule tag_start
    "\\"
  end

end

Clifford Heath

Jan 20, 2013, 9:29:09 PM
to treet...@googlegroups.com
Will,

I can't spot anything straight away. In a situation like this, I tend to do the following:

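require 'ruby-debug'
Debugger.start

# On ^C (SIGINT), print the interrupted call stack one frame per line,
# then drop into the interactive debugger at that point
trap "INT" do
  puts caller * "\n\t"
  debugger
end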

Run your program and hit ^C.
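If ruby-debug isn't available, the backtrace half of this trick works on its own. The sketch below assumes MRI Ruby and is not Clifford's exact setup: instead of calling `debugger`, the INT handler raises a hypothetical `StackDumped` exception to get out of the loop, `spin_forever` is a hypothetical stand-in for the spinning parser, and a helper thread sends SIGINT so the sketch finishes without anyone pressing ^C.

```ruby
# Dump the Ruby call stack when the process gets SIGINT (^C).
# Sketch only: prints the backtrace, then unwinds with an exception
# instead of calling ruby-debug's `debugger`.
class StackDumped < StandardError; end

dumped = nil

Signal.trap("INT") do
  dumped = caller            # frames of whatever code was interrupted
  puts dumped * "\n\t"       # one frame per line, as in the snippet above
  raise StackDumped          # escape the busy loop
end

# Hypothetical stand-in for the parser that never terminates.
def spin_forever
  n = 0
  n += 1 while true
end

# Normally you press ^C yourself; here a helper thread sends SIGINT to
# this process after a short delay so the sketch stops on its own.
Thread.new do
  sleep 0.2
  Process.kill("INT", Process.pid)
end

begin
  spin_forever
rescue StackDumped
  puts "stopped after printing the stack"
end

Signal.trap("INT", "DEFAULT")  # restore normal ^C behaviour
```

The exception raised inside the trap handler surfaces in the main thread at the point where it was interrupted, which is why the `rescue` around `spin_forever` catches it.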

Clifford Heath.
> --
> You received this message because you are subscribed to the Google Groups "Treetop Development" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/treetop-dev/-/v9tvR5ToZPQJ.
> To post to this group, send email to treet...@googlegroups.com.
> To unsubscribe from this group, send email to treetop-dev...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/treetop-dev?hl=en.

Will Myers

Jan 22, 2013, 7:56:05 PM
to treet...@googlegroups.com
Thanks for the reply, Clifford. 

When I run this and hit ^C, I usually land in one of the following three places:

(eval):118:in `call'
(eval):118:in `_nt_text'
(eval):73:in `block in _nt_paragraph'
(eval):67:in `loop'
(eval):67:in `_nt_paragraph'

or 

(eval):206:in `call'
(eval):206:in `_nt_tag'
(eval):69:in `block in _nt_paragraph'
(eval):67:in `loop'
(eval):67:in `_nt_paragraph'

or 

(eval):68:in `call'
(eval):68:in `block in _nt_paragraph'
(eval):67:in `loop'
(eval):67:in `_nt_paragraph'

I'm afraid that doesn't clear anything up for me; I already knew from pry where it was spinning. Any other thoughts?

Clifford Heath

Jan 22, 2013, 10:01:14 PM
to treet...@googlegroups.com
Ahh, found it. Paragraph matches an unlimited number of tags or texts.
Text, however, can match zero-length input, so once the input runs out of
text you get an unlimited number of zero-length texts. Try using + instead
of * in text.
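To see why `( tag / text )*` never finishes, here is a small pure-Ruby sketch of PEG repetition that transcribes the thread's grammar rule by rule. The combinators (`lit`, `any_char`, `seq`, `alt`, `neg`, `star`, `plus`) are hypothetical helpers written for this illustration, not Treetop's generated code. A naive `*` loop would simply keep going when its body succeeds without consuming input; this `star` raises a hypothetical `ZeroLengthRepeat` error at that point instead, so the sketch reports the problem rather than hanging. It also assumes the input contains real newline characters.

```ruby
# Tiny PEG combinators. Each parser is a lambda (input, pos) -> new pos or nil.
class ZeroLengthRepeat < StandardError; end

def lit(str)
  ->(s, i) { s[i, str.length] == str ? i + str.length : nil }
end

def any_char
  ->(s, i) { i < s.length ? i + 1 : nil }
end

def seq(*ps)
  ->(s, i) { ps.reduce(i) { |pos, p| pos && p.(s, pos) } }
end

def alt(*ps)
  lambda do |s, i|
    ps.each do |p|
      r = p.(s, i)
      return r if r            # first alternative that succeeds wins
    end
    nil
  end
end

# PEG !e: succeeds without consuming input when e fails.
def neg(p)
  ->(s, i) { p.(s, i) ? nil : i }
end

# PEG e*: greedy repetition. A naive loop never ends if the body succeeds
# without consuming anything, so this version raises instead of spinning.
def star(p)
  lambda do |s, i|
    loop do
      r = p.(s, i)
      return i if r.nil?
      raise ZeroLengthRepeat, "repetition body matched zero characters at #{i}" if r == i
      i = r
    end
  end
end

# PEG e+ is one e followed by e*.
def plus(p)
  seq(p, star(p))
end

# The thread's grammar, transcribed rule by rule.
NEWLINE    = lit("\n")
EOP        = seq(NEWLINE, NEWLINE, star(NEWLINE))         # newline 2..
TAG_START  = lit("\\")
TAG_TYPE   = alt(lit("emph"), lit("texttt"))
INNER_TEXT = star(seq(neg(lit("}")), any_char))           # ( !'}' . )*
TAG        = seq(TAG_START, TAG_TYPE, lit("{"), INNER_TEXT, lit("}"))
TEXT_CHAR  = seq(neg(alt(TAG_START, EOP)), any_char)      # !( tag_start / eop) .
TEXT_STAR  = star(TEXT_CHAR)                              # original rule: can match ""
TEXT_PLUS  = plus(TEXT_CHAR)                              # suggested fix

def paragraph(text)
  seq(star(alt(TAG, text)), EOP)                          # ( tag / text )* eop
end

input = "friend \\emph{hello}\n\n"

begin
  paragraph(TEXT_STAR).(input, 0)
rescue ZeroLengthRepeat => e
  puts "with *: #{e.message}"
end
puts "with +: matched #{paragraph(TEXT_PLUS).(input, 0)} of #{input.length} chars"
```

With `*`, the outer repetition reaches offset 19 (just after `\emph{hello}`), where `tag` fails and `text` succeeds having consumed nothing, which matches Will's observation that it stalls at the first newline. With `+`, `text` fails there instead, the repetition stops, and `eop` consumes the two newlines, so all 21 characters are matched.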

Clifford Heath.

Will Myers

Jan 22, 2013, 11:24:27 PM
to treet...@googlegroups.com
Clifford, 

That was it! Many thanks.