HTML parser
From: Roberto Saccon <rsac...@gmail.com>
Date: Tue, 20 Nov 2007 18:29:18 -0800 (PST)
Local: Tues, Nov 20 2007 9:29 pm
Subject: HTML parser
I checked out rev. 16 today and noticed that an HTML parser has been
added. After the initial excitement, I tried some ugly input to crash
it, and managed to do so with conditional comments like this:

    <!--[if lt IE 7]>
    <style type="text/css">
      .no_ie { display: none; }
    </style>
    <![endif]-->

Any chance mochiweb_html will parse such constructs?

regards
Roberto


From: b...@redivi.com
Date: Tue, 20 Nov 2007 21:52:33 -0800
Local: Wed, Nov 21 2007 12:52 am
Subject: Re: [mochiweb] HTML parser
It's just a toy, and it's not finished. What are you intending to use it for?

On 11/20/07, Roberto Saccon <rsac...@gmail.com> wrote:


From: "Bob Ippolito" <b...@redivi.com>
Date: Tue, 20 Nov 2007 22:35:28 -0800
Local: Wed, Nov 21 2007 1:35 am
Subject: Re: [mochiweb] HTML parser
I also can't reproduce that problem... This matches:

[{comment, "[if lt IE 7]>\n<style type=\"text/css\">\n.no_ie { display: none; }\n</style>\n<![endif]"}] =
        tokens("<!--[if lt IE 7]>\n<style type=\"text/css\">\n.no_ie { display: none; }\n</style>\n<![endif]-->"),

Can you give me some input that *actually* crashes?

On 11/20/07, b...@redivi.com <b...@redivi.com> wrote:


From: Roberto Saccon <rsac...@gmail.com>
Date: Tue, 20 Nov 2007 23:46:52 -0800 (PST)
Local: Wed, Nov 21 2007 2:46 am
Subject: Re: HTML parser
I was just experimenting with a kind of screen scraping. So I tried
again; here is the full file that caused the crash on my OS X
installation of Erlang R11B:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Foo</title>
    <link rel="stylesheet" type="text/css" href="/static/rel/dojo/resources/dojo.css" media="screen">
    <link rel="stylesheet" type="text/css" href="/static/foo.css" media="screen">
    <!--[if lt IE 7]>
    <style type="text/css">
      .no_ie { display: none; }
    </style>
    <![endif]-->
    <link rel="icon" href="/static/images/favicon.ico" type="image/x-icon">
    <link rel="shortcut icon" href="/static/images/favicon.ico" type="image/x-icon">
  </head>
  <body id="home" class="tundra">
  </body>
</html>

--------------------------------------------------

9> P=mochiweb_html:parse(B).

=ERROR REPORT==== 21-Nov-2007::05:38:14 ===
Error in process <0.37.0> with exit value: {function_clause,
[{mochiweb_html,tree,[[{comment,"[if lt IE 7]>\n    <style type=\"text/
css\">\n      .no_ie { display: none; }\n    </style>\n    <![endif]"},
{data,"\n    ",true},{start_tag,"link",[{"rel","icon"},{"href"...

** exited: {function_clause,
               [{mochiweb_html,
                    tree,
                    [[{comment,
                          "[if lt IE 7]>\n    <style type=\"text/css\">
\n      .no_ie { display: none; }\n    </style>\n    <![endif]"},
                      {data,"\n    ",true},

-------------------------------------------------

When I take out the conditional comment, I no longer get the crash:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Foo</title>
    <link rel="stylesheet" type="text/css" href="/static/rel/dojo/resources/dojo.css" media="screen">
    <link rel="stylesheet" type="text/css" href="/static/foo.css" media="screen">
    <link rel="icon" href="/static/images/favicon.ico" type="image/x-icon">
    <link rel="shortcut icon" href="/static/images/favicon.ico" type="image/x-icon">
  </head>
  <body id="home" class="tundra">
  </body>
</html>

Am I doing something wrong?

regards
Roberto


From: b...@redivi.com
Date: Wed, 21 Nov 2007 00:00:25 -0800
Local: Wed, Nov 21 2007 3:00 am
Subject: Re: [mochiweb] Re: HTML parser
That looks right. I only tested the tokenizer, not the tree parser
(which uses the tokens as input). The fix is probably trivial; most
likely the tree parser just doesn't handle comments yet. The fact that
the comment is conditional doesn't matter, since comments are tokenized
correctly.
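
To illustrate the kind of failure described here, the following is a minimal sketch and not the actual mochiweb_html source. It assumes a simplified token format ({start_tag, Name, Attrs}, {end_tag, Name}, {data, Text, Whitespace}, {comment, Text}), and the module and function names (tree_sketch, strict, tolerant, demo) are hypothetical. It shows how a token consumer with no clause for comment tokens exits with function_clause, and how adding one clause avoids that.

```erlang
-module(tree_sketch).
-export([strict/1, tolerant/1, demo/0]).

%% Hypothetical, simplified token consumer; NOT mochiweb_html's code.
%% For brevity the "tree" is just a flat list of nodes in document order.

%% strict/1 has no clause for {comment, _}, so a comment token
%% makes the call fail with function_clause.
strict(Tokens) -> strict(Tokens, []).

strict([], Acc) ->
    lists:reverse(Acc);
strict([{start_tag, Name, Attrs} | Rest], Acc) ->
    strict(Rest, [{Name, Attrs} | Acc]);
strict([{end_tag, _Name} | Rest], Acc) ->
    strict(Rest, Acc);
strict([{data, Text, _Whitespace} | Rest], Acc) ->
    strict(Rest, [Text | Acc]).

%% tolerant/1 is the same walk plus one clause that keeps comments as nodes.
tolerant(Tokens) -> tolerant(Tokens, []).

tolerant([], Acc) ->
    lists:reverse(Acc);
tolerant([{comment, _Text} = Comment | Rest], Acc) ->
    tolerant(Rest, [Comment | Acc]);
tolerant([{start_tag, Name, Attrs} | Rest], Acc) ->
    tolerant(Rest, [{Name, Attrs} | Acc]);
tolerant([{end_tag, _Name} | Rest], Acc) ->
    tolerant(Rest, Acc);
tolerant([{data, Text, _Whitespace} | Rest], Acc) ->
    tolerant(Rest, [Text | Acc]).

%% Runs both versions on a token list that contains a comment.
demo() ->
    Tokens = [{start_tag, "head", []},
              {comment, "[if lt IE 7]><![endif]"},
              {data, "\n    ", true},
              {end_tag, "head"}],
    %% The strict version exits because no clause matches the comment token.
    {'EXIT', {function_clause, _Stack}} = (catch strict(Tokens)),
    %% The tolerant version keeps the comment as an ordinary node.
    [{"head", []}, {comment, "[if lt IE 7]><![endif]"}, "\n    "] =
        tolerant(Tokens),
    ok.
```

Calling tree_sketch:demo() in the shell after compiling the module should return ok. Whether the real fix keeps or drops comment nodes is not stated in this thread; the sketch keeps them only to make the difference visible.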

On 11/20/07, Roberto Saccon <rsac...@gmail.com> wrote:


From: "Bob Ippolito" <b...@redivi.com>
Date: Wed, 21 Nov 2007 00:57:44 -0800
Local: Wed, Nov 21 2007 3:57 am
Subject: Re: [mochiweb] Re: HTML parser
That's what it was, looks like it works fine now in r18. I used that
document as a test.

On 11/21/07, b...@redivi.com <b...@redivi.com> wrote:


From: Roberto Saccon <rsac...@gmail.com>
Date: Wed, 21 Nov 2007 01:50:49 -0800 (PST)
Local: Wed, Nov 21 2007 4:50 am
Subject: Re: HTML parser
Great, thanks.

On Nov 21, 6:57 am, "Bob Ippolito" <b...@redivi.com> wrote:

