New issue 41 by aprilkacau: Class selector return duplicated node collection
http://code.google.com/p/fizzler/issues/detail?id=41
What steps will reproduce the problem?
1. Load http://read.mangashare.com/20th-Century-Boys into an HtmlDocument.
2. Run: doc.DocumentNode.QuerySelectorAll("table.datalist > tr.datarow")
3. Or run: doc.DocumentNode.QuerySelectorAll(".datalist > .datarow")
What is the expected output? What do you see instead?
I expect to get a node collection without any duplicates, because plain
HtmlAgilityPack returns exactly that, although it is more difficult to use.
Instead, I see many duplicated nodes in the returned collection.
What version of the product are you using? On what operating system?
Fizzler 1.0.0, Win7 x64, VS 2010
Attachments:
Untitled.png 17.3 KB
This issue has not been successfully reproduced. A simple initial test in
the IronPython interpreter shows that selecting nodes with HtmlAgilityPack
and with Fizzler produces identical node counts:
IronPython 2.6 (2.6.10920.0) on .NET 2.0.50727.4952
Type "help", "copyright", "credits" or "license" for more information.
>>> import clr
>>> clr.AddReference('HtmlAgilityPack')
>>> clr.AddReference('Fizzler')
>>> clr.AddReference('Fizzler.Systems.HtmlAgilityPack')
>>> from HtmlAgilityPack import HtmlDocument
>>> from Fizzler.Systems.HtmlAgilityPack.HtmlNodeSelection import *
>>> from System.Net import WebClient
>>> wc = WebClient()
>>> html = wc.DownloadString('http://read.mangashare.com/20th-Century-Boys')
>>> hd = HtmlDocument()
>>> hd.LoadHtml(html)
>>> doc = hd.DocumentNode
>>> rows = QuerySelectorAll(doc, '.datalist > .datarow')
>>> len(list(rows))
249
>>> doc.SelectNodes("//*[@class='datalist']/*[@class='datarow']").Count
249
If Fizzler were returning duplicates, its count would exceed the number
returned by the XPath-based selection in HtmlAgilityPack (twice as high if
every node were returned twice). Do you see an oversight in the test or a
misrepresentation of the problem?
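Comparing counts only shows that the two selections are the same size. A more direct check is to count how many entries in a result repeat an earlier node object. The following is a minimal sketch of that idea in plain Python, assuming the standard library's xml.etree.ElementTree as a stand-in for HtmlAgilityPack and Fizzler; the sample markup and the duplicate_count helper are invented for illustration and are not part of either library.

```python
import xml.etree.ElementTree as ET

# Invented sample markup, loosely shaped like the table discussed in this issue.
SAMPLE = """
<table class="datalist">
  <tr class="datarow"><td>a</td></tr>
  <tr class="datarow"><td>b</td></tr>
  <tr class="header"><td>h</td></tr>
</table>
"""


def duplicate_count(nodes):
    """Return how many entries in `nodes` repeat an earlier node object."""
    nodes = list(nodes)  # keep every node alive so id() values stay unique
    seen = set()
    duplicates = 0
    for node in nodes:
        if id(node) in seen:
            duplicates += 1
        else:
            seen.add(id(node))
    return duplicates


table = ET.fromstring(SAMPLE)
# Direct <tr> children of the table whose class attribute is exactly "datarow".
rows = table.findall("./tr[@class='datarow']")

print(len(rows), duplicate_count(rows))
# A deliberately doubled list shows what a duplicated selection would look like.
print(duplicate_count(rows + rows))
```

The same identity-based check could be applied to the sequence returned by QuerySelectorAll in the session above, for example by comparing the length of the list with the number of distinct node objects in it.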
Attached is the HTML source to http://read.mangashare.com/20th-Century-Boys
at the time the test was conducted.
Attachments:
20th-Century-Boys.html 149 KB
Attached is the complete VS 2010 solution that I've been working on. Maybe
there's something wrong with my code, but I would greatly appreciate it if
you could take a look at it.
Regards,
pilus
Attachments:
Zeus.rar 349 KB
Comment #3 on issue 41 by azizatif: Class selector return duplicated node
collection
http://code.google.com/p/fizzler/issues/detail?id=41
The problem is how your ParseHTML2 is written. Here is a proposed fix:
void ParseHTML2() {
    var doc = new Agi.HtmlDocument();
    doc.LoadHtml(textBox2.Text);
    var rows = from row in doc.DocumentNode
                              .QuerySelectorAll("table.datalist > tr.datarow")
               let cells = row.QuerySelectorAll("td").ToArray()
               select new object[] {
                   /* Date      */ cells[0].InnerText,
                   /* Chapter   */ cells[1].InnerText,
                   /* Scanlator */ cells[2].InnerText,
                   /* Link      */ cells[3].QuerySelector("a")
                                           .GetAttributeValue("href", null),
               };
    foreach (var row in rows)
        dataGridView1.Rows.Add(row);
}
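The fix above selects the rows once and then queries the cells from each row rather than from the whole document. The original ParseHTML2 is not shown in this thread, so the unscoped variant below is only a hypothetical illustration of one way repeated entries can arise in user code, not a diagnosis of the attached solution. It is a minimal sketch in plain Python, assuming xml.etree.ElementTree as a stand-in for HtmlAgilityPack and Fizzler; the sample data and both function names are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the real page is not reproduced here.
SAMPLE = """<table class="datalist">
<tr class="datarow"><td>2010-01-01</td><td>Ch 1</td><td>GroupA</td><td><a href="/c1">read</a></td></tr>
<tr class="datarow"><td>2010-01-02</td><td>Ch 2</td><td>GroupB</td><td><a href="/c2">read</a></td></tr>
</table>"""


def extract_rows_scoped(table):
    """One record per row; cells are looked up inside that row only."""
    records = []
    for row in table.findall("./tr[@class='datarow']"):
        cells = row.findall("td")  # scoped to the current row
        link = cells[3].find("a")
        href = link.get("href") if link is not None else None
        records.append([cells[0].text, cells[1].text, cells[2].text, href])
    return records


def extract_cells_unscoped(table):
    """Hypothetical mistake: cells are looked up from the table on every pass."""
    cells = []
    for _row in table.findall("./tr[@class='datarow']"):
        # Searches the whole table, so every cell is collected once per row.
        cells.extend(table.iter("td"))
    return cells


table = ET.fromstring(SAMPLE)
scoped = extract_rows_scoped(table)
unscoped = extract_cells_unscoped(table)
print(len(scoped), len(unscoped))
```

With two rows of four cells each, the scoped version yields two records, while the unscoped version collects every cell once per row and so contains repeats.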
Closing this issue as invalid because it reports a bug in user code, not in
Fizzler.
Oh OK then, I'll try it later. Thanks for all your replies ... :D