
On the possibility of using the HTML5 parser from C++

JDE

Feb 14, 2013, 10:48:27 AM

Dear all:

Sorry if this is not the right group, but it seemed to me the most
appropriate. If not, could someone please redirect me to the right one?

My question is about the possibility of using the library contained
in the Mozilla global tree under parser/htmlparser to parse an HTML5 file
(local file, not downloaded) from a C++ program and getting the DOM
tree so that my program can transform the HTML file.

I have successfully built Firefox, and consequently all the files in
parser/htmlparser. My questions are:

-In which library has their object code been included? Is it possible to
link an independent program that uses it?

-Is there any documentation from the author(s) of the library (apart from
the comments in the files themselves) that documents the API?

Thank you very much in advance for your responses.

Juan

Boris Zbarsky

Feb 14, 2013, 11:32:38 AM

On 2/14/13 10:48 AM, JDE wrote:
> My question is about the possibility of using the library contained
> in the Mozilla global tree under parser/htmlparser to parse an HTML5 file
> (local file, not downloaded) from a C++ program and getting the DOM
> tree so that my program can transform the HTML file.

First of all, parser/htmlparser is not an HTML5 parser. It's the old
tagsoup parser. The HTML5 parser is in parser/html.

Second, neither one is really designed as a standalone drop-in parser,
though the HTML5 parser is maybe closer to it...

> -In which library has their object code been included?

libxul.

> Is it possible to link an independent program that uses it?

Yes, probably, but you'll be pulling in all of libxul.

You may want to talk to Henri Sivonen about using the translate-to-C++
bits or http://about.validator.nu/htmlparser/ as part of your project...
that's basically what parser/html is.

-Boris
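
Whichever parser ends up producing the tree, the transformation step Juan
describes comes down to walking a tree of nodes and rewriting some of them.
The following is only a sketch of that step: the Node type and the helper
functions are hypothetical and are not the libxul, parser/html or
validator.nu API. It assumes C++14 and, for brevity, ignores attributes and
character escaping.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified DOM node (illustration only, not a Mozilla type).
struct Node {
    std::string name;  // element name; empty for a text node
    std::string text;  // character data; used only by text nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Build an element node with the given name.
std::unique_ptr<Node> element(const std::string& name) {
    auto n = std::make_unique<Node>();
    n->name = name;
    return n;
}

// Build a text node holding the given character data.
std::unique_ptr<Node> textNode(const std::string& text) {
    auto n = std::make_unique<Node>();
    n->text = text;
    return n;
}

// Rename every element called `from` to `to`, visiting the tree depth first.
void renameElements(Node& node, const std::string& from, const std::string& to) {
    assert(!from.empty() && !to.empty());  // empty names denote text nodes
    if (node.name == from) node.name = to;
    for (auto& child : node.children) renameElements(*child, from, to);
}

// Write the tree back out as markup (no attributes, no escaping).
std::string serialize(const Node& node) {
    if (node.name.empty()) return node.text;
    std::string out = "<" + node.name + ">";
    for (const auto& child : node.children) out += serialize(*child);
    return out + "</" + node.name + ">";
}

// Build <p>Hello <b>world</b></p> by hand, rewrite b to strong, serialize.
std::string demo() {
    auto p = element("p");
    p->children.push_back(textNode("Hello "));
    auto b = element("b");
    b->children.push_back(textNode("world"));
    p->children.push_back(std::move(b));
    renameElements(*p, "b", "strong");
    return serialize(*p);
}
```

Here the tree is built by hand in demo(); in a real program it would come
from whatever parser is chosen, and the node type would be that parser's own.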