HTML parser


Roberto Saccon

Nov 20, 2007, 9:29:18 PM
to MochiWeb
I checked out rev. 16 today and noticed that an HTML parser has been
added. After the initial excitement, I tried some ugly input to crash
it, and managed to do so with conditional comments like this:

<!--[if lt IE 7]>
<style type="text/css">
.no_ie { display: none; }
</style>
<![endif]-->

Any chance mochiweb_html will parse such constructs?

regards
Roberto

b...@redivi.com

Nov 21, 2007, 12:52:33 AM
to moch...@googlegroups.com
It's just a toy, and it's not finished. What are you intending to use it for?

Bob Ippolito

Nov 21, 2007, 1:35:28 AM
to moch...@googlegroups.com
I also can't reproduce that problem... This matches:

[{comment, "[if lt IE 7]>\n<style type=\"text/css\">\n.no_ie { display: none; }\n</style>\n<![endif]"}] =
    tokens("<!--[if lt IE 7]>\n<style type=\"text/css\">\n.no_ie { display: none; }\n</style>\n<![endif]-->"),

Can you give me some input that *actually* crashes?

Roberto Saccon

Nov 21, 2007, 2:46:52 AM
to MochiWeb
I was just experimenting with a kind of screen scraping. So I tried
again; here is the full file that caused the crash on my OS X
install of Erlang R11B:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Foo</title>
<link rel="stylesheet" type="text/css" href="/static/rel/dojo/resources/dojo.css" media="screen">
<link rel="stylesheet" type="text/css" href="/static/foo.css" media="screen">
<!--[if lt IE 7]>
<style type="text/css">
.no_ie { display: none; }
</style>
<![endif]-->
<link rel="icon" href="/static/images/favicon.ico" type="image/x-icon">
<link rel="shortcut icon" href="/static/images/favicon.ico" type="image/x-icon">
</head>
<body id="home" class="tundra">
</body>
</html>

--------------------------------------------------

9> P=mochiweb_html:parse(B).

=ERROR REPORT==== 21-Nov-2007::05:38:14 ===
Error in process <0.37.0> with exit value: {function_clause,[{mochiweb_html,tree,[[{comment,"[if lt IE 7]>\n <style type=\"text/css\">\n .no_ie { display: none; }\n </style>\n <![endif]"},{data,"\n ",true},{start_tag,"link",[{"rel","icon"},{"href"...

** exited: {function_clause,
[{mochiweb_html,
tree,
[[{comment,
"[if lt IE 7]>\n <style type=\"text/css\">\n .no_ie { display: none; }\n </style>\n <![endif]"},
{data,"\n ",true},

-------------------------------------------------

When I take out the conditional comment, I don't get the crash anymore:



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Foo</title>
<link rel="stylesheet" type="text/css" href="/static/rel/dojo/resources/dojo.css" media="screen">
<link rel="stylesheet" type="text/css" href="/static/foo.css" media="screen">
<link rel="icon" href="/static/images/favicon.ico" type="image/x-icon">
<link rel="shortcut icon" href="/static/images/favicon.ico" type="image/x-icon">
</head>
<body id="home" class="tundra">
</body>
</html>


Am I doing something wrong?

regards
Roberto

b...@redivi.com

Nov 21, 2007, 3:00:25 AM
to moch...@googlegroups.com
That looks right. I only tested the tokenizer, not the tree parser
(which uses the tokens as input). The fix is probably trivial; most
likely the tree parser just doesn't handle comments yet. The fact that
the comment is conditional doesn't matter: comments are tokenized correctly.
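
To illustrate the kind of gap described here, the following is a minimal sketch of a tree builder that folds a flat token list into nested elements. It is not mochiweb_html's actual code: the module name toy_tree, the helper functions, and the exact token shapes ({start_tag, Name, Attrs}, {end_tag, Name}, {data, Text, Whitespace}, {comment, Text}) are all assumptions made for the example. The point is only that a function whose clauses cover every token type except comments exits with function_clause as soon as a comment token arrives, and that one extra clause is enough to fix it.

```erlang
-module(toy_tree).
-export([tree/1]).

%% Toy tree builder, NOT mochiweb_html's implementation.
%% Token shapes and all names here are assumed for illustration.
%% The stack holds open elements as {Name, Attrs, ReversedChildren}.
tree(Tokens) ->
    tree(Tokens, [{"root", [], []}]).

%% Input exhausted: close whatever is still open, return the root's children.
tree([], Stack) ->
    {_Name, _Attrs, Children} = close_all(Stack),
    Children;
%% A start tag opens a new element on top of the stack.
tree([{start_tag, Name, Attrs} | Rest], Stack) ->
    tree(Rest, [{Name, Attrs, []} | Stack]);
%% An end tag closes the top element and attaches it to its parent.
tree([{end_tag, _Name} | Rest], [Top, Parent | Stack]) ->
    tree(Rest, [append(finish(Top), Parent) | Stack]);
%% A stray end tag with nothing open is ignored.
tree([{end_tag, _Name} | Rest], [_Root] = Stack) ->
    tree(Rest, Stack);
%% Text becomes a child of the element that is currently open.
tree([{data, Text, _Whitespace} | Rest], [Top | Stack]) ->
    tree(Rest, [append(Text, Top) | Stack]);
%% Without this clause, a comment token matches no clause of tree/2
%% and the call exits with function_clause.
tree([{comment, _Text} = Comment | Rest], [Top | Stack]) ->
    tree(Rest, [append(Comment, Top) | Stack]).

%% Prepend a child to an open element (children are kept reversed).
append(Child, {Name, Attrs, Acc}) ->
    {Name, Attrs, [Child | Acc]}.

%% Put an element's children back into document order.
finish({Name, Attrs, Acc}) ->
    {Name, Attrs, lists:reverse(Acc)}.

%% Close every element still open, innermost first.
close_all([Root]) ->
    finish(Root);
close_all([Top, Parent | Stack]) ->
    close_all([append(finish(Top), Parent) | Stack]).
```

With the comment clause in place, toy_tree:tree([{start_tag, "head", []}, {comment, "[if lt IE 7]>"}, {end_tag, "head"}]) returns [{"head", [], [{comment, "[if lt IE 7]>"}]}]. Removing that clause makes the same call fail with function_clause. The sketch does not treat void elements such as link specially, so it only shows the clause-coverage issue, not real HTML tree construction.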


Bob Ippolito

Nov 21, 2007, 3:57:44 AM
to moch...@googlegroups.com
That's what it was; it looks like it works fine now in r18. I used that
document as a test.

Roberto Saccon

Nov 21, 2007, 4:50:49 AM
to MochiWeb
Great, thanks.
