
Re: A HTML Parser Bug?


Gisle Aas

Sep 9, 2013, 5:45:17 PM
to Michael Song, libwww
It's quite a lot of effort to try to distil what's wrong from your report. Can you try to reduce the HTML to the minimal amount of code that still shows the claimed difference between Firefox and HTML::Parser? Sample code driving the parser would also be helpful.

Regards,
Gisle


On Sun, Sep 8, 2013 at 4:50 PM, Michael Song <michael...@gmail.com> wrote:
Hello All,
I've attached my HTML source code to demonstrate an incompatible parsing behavior between HTML::Parser and Firefox with Firebug.
If you open the attached file in Firefox with Firebug, you will see that
<div class="gd-grid-6 product-pricing"> is inside the <div class="listing-page-bucket"> element,
but when you parse it, that relationship does not appear in the tree.
Is there any way I can get around this problem?


Thanks






Michael Song

Sep 16, 2013, 1:32:38 PM
to lib...@perl.org, Gisle Aas
Hi Gisle,
I figured out the root cause of the problem.
This line
<META content=3.9 itemprop="rating">
in the HTML body triggers a START event on the Perl side,

but it never gets a matching END event, which causes TreeBuilder to start a new branch.
My quick fix is to ignore the START event if the tag is 'meta'. The code in hparser.c works fine; I stepped through it.
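
A minimal script along these lines should show the event stream and the quick fix. It is only a sketch: the HTML string is a reduced stand-in for the attached page (an assumed structure, not the original file), and tag_events is a hypothetical helper name, not part of HTML::Parser.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Collect start/end tag events for an HTML string.
# $skip is an optional hash ref of tag names whose START events are dropped
# (the quick fix: pass { meta => 1 }).
sub tag_events {
    my ($html, $skip) = @_;
    $skip ||= {};
    my @events;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { push @events, "START $_[0]" unless $skip->{ $_[0] } }, 'tagname' ],
        end_h   => [ sub { push @events, "END $_[0]" }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    return @events;
}

# Reduced stand-in for the attached page (assumption, not the original file).
my $html = '<div class="listing-page-bucket">'
         . '<META content=3.9 itemprop="rating">'
         . '<div class="gd-grid-6 product-pricing"></div>'
         . '</div>';

# Without the fix: a START for meta shows up, but no END for it.
print join(', ', tag_events($html)), "\n";

# With the fix: the meta START is dropped before it reaches a tree builder.
print join(', ', tag_events($html, { meta => 1 })), "\n";
```

The first line of output should contain a START meta with no END meta; the second should contain only the four div events.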

A future-proof fix would require adding logic to TreeBuilder to close any still-open child elements when their parent is closed.
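
Roughly what I have in mind, as a standalone sketch in plain Perl. This is not TreeBuilder's actual code; build_tree and outline are hypothetical names, and the event list is hard-coded to match the meta case above.

```perl
use strict;
use warnings;

# Build a nested tree from [type, tag] events. On an END event, everything
# opened after the matching START is closed implicitly, so an unclosed child
# (such as <meta>) cannot outlive its parent.
sub build_tree {
    my @events = @_;
    my $root  = { tag => 'root', children => [] };
    my @stack = ($root);
    for my $ev (@events) {
        my ($type, $tag) = @$ev;
        if ($type eq 'start') {
            my $node = { tag => $tag, children => [] };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node;
        }
        elsif ($type eq 'end') {
            # Find the nearest open element with this tag name.
            my ($idx) = grep { $stack[$_]{tag} eq $tag } reverse 1 .. $#stack;
            next unless defined $idx;    # stray end tag: ignore it
            splice @stack, $idx;         # close the match and anything still open inside it
        }
    }
    return $root;
}

# Render the tree as indented tag names, one per line.
sub outline {
    my ($node, $depth) = @_;
    my @lines;
    for my $child (@{ $node->{children} }) {
        push @lines, ('  ' x $depth) . $child->{tag}, outline($child, $depth + 1);
    }
    return @lines;
}

my $tree = build_tree(
    [ start => 'div' ], [ start => 'meta' ], [ start => 'div' ],
    [ end   => 'div' ], [ end   => 'div' ],
);
print "$_\n" for outline($tree, 0);
```

With this logic the inner div stays inside the outer div, although it ends up nested under the unclosed meta. Treating meta as an empty element that never takes children would be the cleaner rule, but that goes beyond the change proposed here.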

Thanks

 





