Issue 41 in fizzler: Class selector return duplicated node collection

fiz...@googlecode.com

Nov 18, 2010, 3:04:20 PM
to fizzler...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 41 by aprilkacau: Class selector return duplicated node collection
http://code.google.com/p/fizzler/issues/detail?id=41

What steps will reproduce the problem?
1. Load http://read.mangashare.com/20th-Century-Boys into an HtmlDocument.
2. Run: doc.DocumentNode.QuerySelectorAll("table.datalist > tr.datarow")
3. Or run: doc.DocumentNode.QuerySelectorAll(".datalist > .datarow")

What is the expected output? What do you see instead?
I expect to get a node collection without any duplicates, because plain
HtmlAgilityPack returns exactly that, although it is more difficult to use.
Instead, I see a lot of duplicated nodes in the returned collection.


What version of the product are you using? On what operating system?
Fizzler 1.0.0, Win7 x64, VS 2010

Attachments:
Untitled.png 17.3 KB

fiz...@googlecode.com

Nov 18, 2010, 6:57:05 PM
to fizzler...@googlegroups.com

Comment #1 on issue 41 by azizatif: Class selector return duplicated node
collection
http://code.google.com/p/fizzler/issues/detail?id=41

This issue has not been successfully reproduced. A simple initial test in
the IronPython interpreter shows that node selection with HtmlAgilityPack
and with Fizzler produces an identical count of nodes:

IronPython 2.6 (2.6.10920.0) on .NET 2.0.50727.4952
Type "help", "copyright", "credits" or "license" for more information.
>>> import clr
>>> clr.AddReference('HtmlAgilityPack')
>>> clr.AddReference('Fizzler')
>>> clr.AddReference('Fizzler.Systems.HtmlAgilityPack')
>>> from HtmlAgilityPack import HtmlDocument
>>> from Fizzler.Systems.HtmlAgilityPack.HtmlNodeSelection import *
>>> from System.Net import WebClient
>>> wc = WebClient()
>>> html = wc.DownloadString('http://read.mangashare.com/20th-Century-Boys')
>>> hd = HtmlDocument()
>>> hd.LoadHtml(html)
>>> doc = hd.DocumentNode
>>> rows = QuerySelectorAll(doc, '.datalist > .datarow')
>>> len(list(rows))
249
>>> doc.SelectNodes("//*[@class='datalist']/*[@class='datarow']").Count
249


If Fizzler were returning duplicates, its count would be twice the number
returned by XPath-based selection in HtmlAgilityPack. Do you see an
oversight in the test or a misrepresentation of the problem?

Attached is the HTML source to http://read.mangashare.com/20th-Century-Boys
at the time the test was conducted.
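
The count comparison above can be complemented by a direct check that
counts repeated objects in a result collection. The following is only a
minimal sketch in plain Python: duplicate_count and the Node class are
hypothetical names introduced for illustration, with Node standing in for
an HtmlNode. It compares entries by object identity, on the assumption
that a duplicated node would be the same object appearing more than once.

```python
def duplicate_count(nodes):
    """Return how many entries in `nodes` repeat an object already seen."""
    seen = set()
    duplicates = 0
    for node in nodes:
        key = id(node)  # compare by identity, not by value equality
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


class Node(object):
    """Hypothetical stand-in for an HtmlNode; not part of any library."""
    def __init__(self, name):
        self.name = name


rows = [Node("tr") for _ in range(3)]
print(duplicate_count(rows))             # 0
print(duplicate_count(rows + rows[:2]))  # 2
```

Pass a materialized list (for example list(rows) in the session above)
rather than a lazy sequence, so that every object stays alive while its
identity is being compared.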


Attachments:
20th-Century-Boys.html 149 KB

fiz...@googlecode.com

Nov 19, 2010, 12:39:16 AM
to fizzler...@googlegroups.com

Comment #2 on issue 41 by aprilkacau: Class selector return duplicated node
collection
http://code.google.com/p/fizzler/issues/detail?id=41

Attached is the complete VS 2010 solution that I've been working on. Maybe
there's something wrong with my code, but I would greatly appreciate it if
you could take a look at it.

Regards,

pilus

Attachments:
Zeus.rar 349 KB

fiz...@googlecode.com

Nov 19, 2010, 2:33:45 AM
to fizzler...@googlegroups.com
Updates:
Status: Invalid

Comment #3 on issue 41 by azizatif: Class selector return duplicated node
collection
http://code.google.com/p/fizzler/issues/detail?id=41

The problem is how your ParseHTML2 is written. Here is a proposed fix:

void ParseHTML2() {
    var doc = new Agi.HtmlDocument();
    doc.LoadHtml(textBox2.Text);
    var rows = from row in doc.DocumentNode
                   .QuerySelectorAll("table.datalist > tr.datarow")
               let cells = row.QuerySelectorAll("td").ToArray()
               select new object[] {
                   /* Date      */ cells[0].InnerText,
                   /* Chapter   */ cells[1].InnerText,
                   /* Scanlator */ cells[2].InnerText,
                   /* Link      */ cells[3].QuerySelector("a")
                                       .GetAttributeValue("href", null),
               };
    foreach (var row in rows)
        dataGridView1.Rows.Add(row);
}

Closing this issue as invalid because it reports a bug in user code, not in
Fizzler.


fiz...@googlecode.com

Nov 19, 2010, 2:38:01 AM
to fizzler...@googlegroups.com

Comment #4 on issue 41 by aprilkacau: Class selector return duplicated node
collection
http://code.google.com/p/fizzler/issues/detail?id=41

oh ok then, I'll try it later, thx for all your reply then ... :D
