HtmlAgilityPack Cooperation


Jeff Klawiter

Nov 30, 2009, 5:13:27 PM
to Fizzler
I just wanted to drop a line to the developers of Fizzler. I've taken
over development of HAP and am interested in hearing your thoughts on
it, what you need in it and such.

The newest version now has some basic LINQ support, and I'm working on
designing a fluent API.

Jake Scott

Dec 1, 2009, 5:58:59 AM
to fiz...@googlegroups.com, tagn...@gmail.com
Does HAP have support for XHTML? I ended up having to use SgmlReader to tidy up the document after modifying it with Agility Pack's HtmlDocument, which was a bit of a pain because SgmlReader is slooow :)













Atif Aziz

Dec 1, 2009, 2:05:25 PM
to fiz...@googlegroups.com
Hi Jeff,

I'm one of the main developers on Fizzler but I'm also on vacation right
now. :) If you can bear with a delay in response for another week or
so, I'll get in touch shortly after my return. Meanwhile good to see
HAP regaining life again.

- Atif

Sent from my iPhone

Atif Aziz

Dec 8, 2009, 6:18:47 PM
to fiz...@googlegroups.com
Hi Jeff,
 
am interested in hearing your thoughts on
it, what you need in it and such.
 
It would be nice to see the following issue have a more elegant solution:
 
 
Although we've addressed it as fixed in Fizzler via a workaround (you'll find all the details in comments), it's something better handled through non-static options. It's even reported as a HAP issue (albeit closed); see issue 21782.
 
The newest version now has some basic LINQ support
 
I couldn't exactly find this. Could you point me in the right direction?
 
Thanks,
- Atif

Atif Aziz

Dec 15, 2009, 4:32:14 AM
to Jeff Klawiter, fiz...@googlegroups.com
Hi Jeff,
 
The LINQ support is basically getting HAP updated to use generic collections instead of array lists and implementing IEnumerator. So now all the System.Linq extension methods are available. The new 1.4.0 Beta 2 has this.

OK, I found this in the 1.4 release branch. Earlier I was expecting the trunk to represent the latest code and so never went looking in the branches.
 
I've also added a bunch of functions like Descendants, Elements modeled off of LINQ to XML. All these methods are basically yield returns to maintain compatibility with .NET 2.0. But with these functions and the update on the collections it is now easier to write LINQ queries against the node tree.
 
Nice. When HAP lacked such support, I went ahead and added it as extension methods. See HtmlNodeExtensions.cs. Do you have a target for a release?
 
One concern I have is that, since HAP lacks tests, is there anything helping you maintain backward compatibility through refactorings, or to know where you are consciously making a breaking change?
 
configuring the parsing engine all around I do agree it needs to be more elegant. I've been mulling over a couple possible solutions. While yes you can pass in the configuration as a HASH table now it is unintuitive and ugly.
 
The biggest problem right now is not so much whether it is elegant or not but rather that it is statically configured. The first order of clean up would be to make it configurable per parsing. This will make it thread safe and more flexible.
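
To make the per-parse idea concrete, here is a minimal sketch, assuming hypothetical ParserOptions and Parser types that do not exist in HAP. It only illustrates the shape of moving element flags from a shared static table (such as HAP's HtmlNode.ElementsFlags) onto an options object handed to each parser:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flag set, loosely inspired by HAP's element flags.
[Flags]
enum ElementFlags { None = 0, Empty = 1, CanOverlap = 2, Closed = 4 }

// Hypothetical per-parse options; nothing here is HAP's real API.
sealed class ParserOptions
{
    // Each options instance owns its own table, so parsers do not share state.
    public Dictionary<string, ElementFlags> Flags { get; } =
        new Dictionary<string, ElementFlags>(StringComparer.OrdinalIgnoreCase);
}

// Hypothetical parser that reads its configuration from the instance it was given.
sealed class Parser
{
    private readonly ParserOptions _options;

    public Parser(ParserOptions options) { _options = options; }

    // True when this parser's options mark the tag as having no children.
    public bool TreatsAsEmpty(string tag)
    {
        ElementFlags flags;
        return _options.Flags.TryGetValue(tag, out flags)
            && (flags & ElementFlags.Empty) != 0;
    }
}

static class Program
{
    static void Main()
    {
        var legacy = new ParserOptions();
        legacy.Flags["form"] = ElementFlags.Empty | ElementFlags.CanOverlap;

        var modern = new ParserOptions(); // no special case for <form>

        // Two parsers with different settings can coexist in one process.
        Console.WriteLine(new Parser(legacy).TreatsAsEmpty("form")); // True
        Console.WriteLine(new Parser(modern).TreatsAsEmpty("form")); // False
    }
}
```

Because no parser reads from a static table, two threads can parse with different settings, provided an options instance is not mutated while a parse is using it.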
 
I also think it might be useful and possible to use the DocType declarations and load defaults based on that.
 
I love this idea and two thumbs up for that! In fact, you could even have canned configurations hanging off a static class, based on HTML version, conformance and strictness, which you could just pass along to the parser. I think these two will be far more used than some fluent API for configuration.
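
As a rough illustration of canned configurations selected from the DOCTYPE, the sketch below assumes hypothetical HtmlProfile and HtmlProfiles types. Neither exists in HAP, and the settings stored in each preset are placeholders rather than a statement of what each HTML version requires:

```csharp
using System;

// Hypothetical preset: a named bundle of parser settings.
sealed class HtmlProfile
{
    public string Name { get; }
    public bool OutputAsXml { get; } // illustrative setting only

    public HtmlProfile(string name, bool outputAsXml)
    {
        Name = name;
        OutputAsXml = outputAsXml;
    }
}

// Hypothetical static class holding the canned configurations.
static class HtmlProfiles
{
    public static readonly HtmlProfile Html32 = new HtmlProfile("HTML 3.2", false);
    public static readonly HtmlProfile Html401 = new HtmlProfile("HTML 4.01", false);
    public static readonly HtmlProfile Xhtml10 = new HtmlProfile("XHTML 1.0", true);

    // Picks a preset by looking for a known marker in the DOCTYPE text.
    // Falls back to Html401 when the DOCTYPE is missing or unrecognised.
    public static HtmlProfile FromDocType(string docType)
    {
        if (string.IsNullOrEmpty(docType)) return Html401;
        if (Contains(docType, "XHTML 1.0")) return Xhtml10;
        if (Contains(docType, "HTML 3.2")) return Html32;
        return Html401;
    }

    private static bool Contains(string text, string value)
    {
        return text.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}

static class Program
{
    static void Main()
    {
        var profile = HtmlProfiles.FromDocType(
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">");
        Console.WriteLine(profile.Name); // prints "XHTML 1.0"
    }
}
```

A caller who knows what they want can skip detection and pass a preset such as HtmlProfiles.Html32 directly.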
 
I am probably going to change the default settings to address the most common complaints and questions. HTML has moved on since HAP was originally written, so adhering to HTML 3.2 is less of an issue.
 
My only concern here would be breaking existing code out there unless you're calling this version 2 of HAP rather than 1.4.
 
- Atif

On 9 Dec 2009, at 00:37, Jeff Klawiter <tagn...@gmail.com> wrote:

Atif,

The LINQ support is basically getting HAP updated to use generic collections instead of array lists and implementing IEnumerator. So now all the System.Linq extension methods are available. The new 1.4.0 Beta 2 has this.

I've also added a bunch of functions like Descendants and Elements, modeled off of LINQ to XML. All these methods are basically yield returns to maintain compatibility with .NET 2.0. But with these functions and the update on the collections it is now easier to write LINQ queries against the node tree. Though it's not as compact as using XPath, XPath is just a pain to use sometimes. I did add a new property that will expose the direct XPath to a node or attribute.
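
For readers unfamiliar with the pattern, here is a minimal sketch of yield-based Descendants and Elements methods over a hypothetical Node class. It is a stand-in for illustration, not HAP's HtmlNode or its actual implementation. The iterators themselves need only C# 2.0; the LINQ call in Main needs System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an HTML node.
class Node
{
    public string Name { get; }
    public List<Node> Children { get; }

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children = new List<Node>(children);
    }

    // Direct children with the given name, in the spirit of LINQ to XML's Elements(name).
    public IEnumerable<Node> Elements(string name)
    {
        foreach (Node child in Children)
            if (child.Name == name)
                yield return child;
    }

    // All descendants in document order (depth first), produced lazily.
    public IEnumerable<Node> Descendants()
    {
        foreach (Node child in Children)
        {
            yield return child;
            foreach (Node descendant in child.Descendants())
                yield return descendant;
        }
    }
}

static class Program
{
    static void Main()
    {
        // Models <div><p><a/></p><a/></div>
        var root = new Node("div",
            new Node("p", new Node("a")),
            new Node("a"));

        // The tree is only walked when Count enumerates the sequence.
        int links = root.Descendants().Count(n => n.Name == "a");
        Console.WriteLine(links); // prints 2
    }
}
```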

As for the form element and configuring the parsing engine in general, I do agree it needs to be more elegant. I've been mulling over a couple of possible solutions. While yes, you can pass in the configuration as a hash table now, it is unintuitive and ugly. I'd be more inclined to make it a bit more strongly typed and maybe add a fluent interface. Like
Parser.Tags.Add("form").CanOverLap().IsClosed();
Also possibly adding in xml config (or app.config) support.
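
A minimal sketch of how such a fluent interface could be wired up, assuming hypothetical TagRule, TagRules and Parser types. The method names follow the one-liner above; everything else is invented for illustration, and the rules hang off a parser instance rather than a static member so that each parser carries its own settings:

```csharp
using System;
using System.Collections.Generic;

[Flags]
enum TagFlags { None = 0, CanOverlap = 1, Closed = 2 }

// Hypothetical rule for one tag. Each method sets a flag and returns
// the same rule so that calls can be chained.
sealed class TagRule
{
    public string Name { get; }
    public TagFlags Flags { get; private set; }

    public TagRule(string name) { Name = name; }

    public TagRule CanOverLap() { Flags |= TagFlags.CanOverlap; return this; }
    public TagRule IsClosed() { Flags |= TagFlags.Closed; return this; }
}

// Hypothetical strongly typed replacement for the hash table of settings.
sealed class TagRules
{
    private readonly Dictionary<string, TagRule> _rules =
        new Dictionary<string, TagRule>(StringComparer.OrdinalIgnoreCase);

    // Registers (or replaces) the rule for a tag and returns it for chaining.
    public TagRule Add(string name)
    {
        var rule = new TagRule(name);
        _rules[name] = rule;
        return rule;
    }

    public TagRule this[string name] { get { return _rules[name]; } }
}

sealed class Parser
{
    public TagRules Tags { get; } = new TagRules();
}

static class Program
{
    static void Main()
    {
        var parser = new Parser();
        parser.Tags.Add("form").CanOverLap().IsClosed();

        Console.WriteLine(parser.Tags["form"].Flags); // prints "CanOverlap, Closed"
    }
}
```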

I also think it might be useful and possible to use the DocType declarations and load defaults based on that.

I'm curious what you think would be easier for handling the configuration of the parsing settings. I am probably going to change the default settings to address the most common complaints and questions. HTML has moved on since HAP was originally written, so adhering to HTML 3.2 is less of an issue.

--
Jeff Klawiter
http://nerdery.com/people/jk
tagn...@gmail.com
MCPD: Web, MCTS: Windows