is there an HTML parsing library that creates a DOM from a page?
Günther
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
tagsoup produces trees ( http://hackage.haskell.org/package/tagsoup )
I use it with hxt ( http://hackage.haskell.org/package/hxt )
to tree-walk HTML pages.
J.W.
> Hi all,
>
> is there an HTML parsing library that creates a DOM from a page?
I've got the month of October off, and one of the things I've been
planning on working on is a compliant HTML5 parser for Haskell --
something which is sorely needed! I will ping the list back if/when I
get it finished.
G
--
Gregory Collins <gr...@gregorycollins.net>
I've heard that some of the existing HTML parsers in Haskell were
already HTML5 compliant (this topic came up when I was complaining
that there were some algorithms that you absolutely had to have
state for, because that was how they were specified.) I never
verified this assertion though.
Edward
> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>> I've got the month of October off, and one of the things I've been
>> planning on working on is a compliant HTML5 parser for Haskell --
>> something which is sorely needed! I will ping the list back if/when I
>> get it finished.
>
> I've heard that some of the existing HTML parsers in Haskell were
> already HTML5 compliant (this topic came up when I was complaining
> that there were some algorithms that you absolutely had to have
> state for, because that was how they were specified.) I never
> verified this assertion though.
If there's already a library which *correctly* parses html5 documents
into DOM trees, could someone please let me know so I can use it instead
of wasting a bunch of time writing one?
Thanks,
G
--
Gregory Collins <gr...@gregorycollins.net>
> As far as I know, Neil Mitchell's tagsoup[1] parses according to the
> HTML 5 parsing rules, but it just generates a list of Tags[2], so
> you'd have to build the DOM tree up from there. I personally have had
> great experience with tagsoup. It's even the core of HTML-scraping
> technology powering searchonce[3].
Yep, someone else wrote me privately to say this (that tagsoup respects
the html5 lexing rules). So I'll be using this as the basis of an html5
DOM parser. Stay tuned!
As far as I know, Neil Mitchell's tagsoup[1] parses according to the
HTML 5 parsing rules, but it just generates a list of Tags[2], so
you'd have to build the DOM tree up from there. I personally have had
great experience with tagsoup. It's even the core of HTML-scraping
technology powering searchonce[3].
Michael
[1] http://hackage.haskell.org/package/tagsoup
[2] http://hackage.haskell.org/packages/archive/tagsoup/0.11.1/doc/html/Text-HTML-TagSoup.html#t:Tag
[3] http://www.search-once.com/
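
For what it's worth, the "build the DOM tree up from there" step could be sketched roughly as below. This is only a minimal sketch under some assumptions, not tagsoup's own code: the Tag type here is a simplified, hypothetical stand-in for tagsoup's (open, close and text constructors only), and Node, siblings and buildTree are names made up for illustration. It nests elements by matching close tags and is lenient about mismatches; it does not implement the HTML5 tree construction rules (void elements such as <br>, implied end tags, and so on).

```haskell
module Main where

-- Hypothetical, simplified stand-in for tagsoup's Tag type
-- (assumption: only open, close and text tags matter here).
data Tag
  = TagOpen String [(String, String)]  -- element name and attributes
  | TagClose String                    -- element name
  | TagText String
  deriving (Eq, Show)

-- Illustrative tree type; not part of any library.
data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- Parse a run of sibling nodes. Stops at the end of input or at the
-- first close tag it does not own, and returns the unconsumed tags.
siblings :: [Tag] -> ([Node], [Tag])
siblings [] = ([], [])
siblings ts@(TagClose _ : _) = ([], ts)
siblings (TagText s : ts) =
  let (ns, rest) = siblings ts
  in (Text s : ns, rest)
siblings (TagOpen name attrs : ts) =
  let (kids, afterKids) = siblings ts
      -- Consume the close tag only if it matches this element;
      -- otherwise leave it for an enclosing element to claim.
      afterClose = case afterKids of
        (TagClose n : rest) | n == name -> rest
        other                           -> other
      (ns, rest') = siblings afterClose
  in (Element name attrs kids : ns, rest')

-- Build a forest from a flat tag list, skipping stray close tags
-- that no open element claims.
buildTree :: [Tag] -> [Node]
buildTree ts = case siblings ts of
  (ns, [])       -> ns
  (ns, _ : rest) -> ns ++ buildTree rest

main :: IO ()
main = print (buildTree example)
  where
    example =
      [ TagOpen "p" [("class", "intro")]
      , TagText "Hello "
      , TagOpen "b" []
      , TagText "world"
      , TagClose "b"
      , TagClose "p"
      ]
```

With the real library you would presumably start from the output of parseTags and map it onto something like the type above. tagsoup also has a Text.HTML.TagSoup.Tree module with a tagTree function that does a similar kind of nesting, which may be enough if full HTML5 tree construction is not required.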
Thanks, Neil
2010/10/7 Gregory Collins <gr...@gregorycollins.net>: