Issue 45 in fizzler: HTML that doesn't parse correctly (but doesn't fail either)

fiz...@googlecode.com

Apr 6, 2011, 3:37:51 PM

to fizzler...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 45 by portman....@gmail.com: HTML that doesn't parse correctly
(but doesn't fail either)
http://code.google.com/p/fizzler/issues/detail?id=45

I've been using Fizzler with great success, but today I came across some
HTML that silently failed to parse correctly.

I was selecting all of the <a> elements and noticed that one was being
ignored. Here are the repro steps:

1. Load the HTML from http://pastebin.com/T1Lsr6w6 (this is the "View
Source" for
http://www.diapers.com/product/productdetail.aspx?productid=16913)
2. Try to query the selector "#pdp"
3. Example code (assuming String html has the HTML above)

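using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // QuerySelector extension methods

var doc = new HtmlDocument();
doc.LoadHtml(html);
var dom = doc.DocumentNode;
var pdpElement = dom.QuerySelector("#pdp"); // null when nothing matches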

What is the expected output? What do you see instead?
Expect pdpElement to be an HtmlNode of <a
href="http://c1.diapers.com/images/products/p/pg/pg-256_1z.jpg"
class="MagicZoomPlus" id="pdp" title="Pampers Sensitive Thick Baby Wipes
Refill 360ct." target="_blank">

Instead, it doesn't find a match.

What version of the product are you using? On what operating system?
Fizzler 0.9

Please provide any additional information below.

fiz...@googlecode.com

Apr 6, 2011, 4:00:06 PM

to fizzler...@googlegroups.com

Comment #1 on issue 45 by portman....@gmail.com: HTML that doesn't parse
correctly (but doesn't fail either)
http://code.google.com/p/fizzler/issues/detail?id=45

I narrowed down the error slightly.

Using VisualFizzler (neat tool!) I can see that everything up to line 282
is selectable (for example "#siteNav").

But after line 283, I can't select anything (for example "div.topToolBox").

So the issue has to do with long lines like on line 283 of that pastebin
example.
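
A quick way to confirm which lines are unusually long is a small scan like the sketch below. FindLongLines and its threshold parameter are hypothetical names used for illustration only; they are not part of Fizzler or HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;

static class LongLineFinder
{
    // Hypothetical helper: returns (1-based line number, length) pairs for
    // every line whose length exceeds the given threshold.
    public static List<KeyValuePair<int, int>> FindLongLines(string html, int threshold)
    {
        var hits = new List<KeyValuePair<int, int>>();
        var lines = html.Split('\n');
        for (var i = 0; i < lines.Length; i++)
        {
            // Ignore a trailing carriage return from Windows line endings.
            var length = lines[i].TrimEnd('\r').Length;
            if (length > threshold)
                hits.Add(new KeyValuePair<int, int>(i + 1, length));
        }
        return hits;
    }
}
```

Calling LongLineFinder.FindLongLines(html, 1024) on the pastebin HTML should list line 283, assuming the line numbering matches what VisualFizzler shows.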

fiz...@googlecode.com

Apr 6, 2011, 4:09:19 PM

to fizzler...@googlegroups.com

Comment #2 on issue 45 by portman....@gmail.com: HTML that doesn't parse
correctly (but doesn't fail either)
http://code.google.com/p/fizzler/issues/detail?id=45

Sure enough, when I remove this line (#283) from the HTML, everything works
perfectly. It's pathologically long (51,553 characters in fact!!) so this
is probably a defect in one of the underlying framework classes that
Fizzler is using.

In the meantime, I've changed my code to chop long lines at 1024 characters
before handing off to Fizzler, and everything is working again. But you
still might want to investigate what precisely is going wrong on that long
line, so I'll keep the issue open.
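
For anyone who wants the same workaround, here is a minimal sketch of the line chopping, assuming "chop" means inserting a line break every 1024 characters rather than truncating. ChopLongLines is a hypothetical helper name, not something Fizzler or HtmlAgilityPack provides.

```csharp
using System;
using System.Text;

static class LongLineWorkaround
{
    // Hypothetical helper: inserts a '\n' after every maxLength characters
    // of any line longer than maxLength. A break can land inside a tag or
    // attribute value, so this is a blunt workaround, not a real fix.
    public static string ChopLongLines(string html, int maxLength)
    {
        if (maxLength <= 0) throw new ArgumentOutOfRangeException("maxLength");
        var result = new StringBuilder(html.Length + html.Length / maxLength + 1);
        var column = 0; // characters emitted so far on the current line
        foreach (var ch in html)
        {
            if (ch == '\n')
            {
                column = 0;          // existing line break: start a new line
            }
            else if (column == maxLength)
            {
                result.Append('\n'); // line is full: force a break first
                column = 0;
            }
            result.Append(ch);
            if (ch != '\n') column++;
        }
        return result.ToString();
    }
}
```

The chopped text is then passed to the parser in place of the original, for example doc.LoadHtml(LongLineWorkaround.ChopLongLines(html, 1024)).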

fiz...@googlecode.com

Apr 7, 2011, 9:49:13 AM

to fizzler...@googlegroups.com

Comment #3 on issue 45 by colinramsay1980: HTML that doesn't parse
correctly (but doesn't fail either)
http://code.google.com/p/fizzler/issues/detail?id=45

We're using HTMLAgilityPack so it's probably an issue there, but it should
be fairly trivial to swap out HTMLAgilityPack for another parser. It could
also be that this issue has been fixed by a more recent version of
HTMLAgilityPack than the one in the download.

fiz...@googlecode.com

Aug 23, 2015, 9:47:38 AM

to fizzler...@googlegroups.com

Comment #4 on issue 45 by azizatif: HTML that doesn't parse correctly (but
doesn't fail either)
https://code.google.com/p/fizzler/issues/detail?id=45

This issue has been migrated to:
https://github.com/atifaziz/Fizzler/issues/45
The conversation continues there.
DO NOT post any further comments to the issue tracker on Google Code as it
is shutting down.
You can also just subscribe to the issue on GitHub to receive notifications
of any further development.

--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings
