Myk Melez wrote:
> Folks (particularly extension developers) regularly ask for a way to
> parse HTML into a document object, which is currently hard and hacky to do.
So as I see it, the steps to get this working are:
1) Decide what problem we're solving. Specifically, how should <noscript>,
<noframes>, and similar elements be parsed in these documents? Keep in mind
that depending on user settings (like whether script is enabled) we create
different DOMs from the same source.
2) Decide what the plan is for charsets. Currently we depend on having a
docshell to handle charset autodetection and, in some cases, <meta> tags,
because we have to throw away the document and reparse.
3) Go through the HTML content sink and HTML document, and make sure all the
places that use the docshell or window can survive without one.
4) Do whatever we decided to do for charsets.
5) Make DOMParser parse HTML.
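
To make item 1 concrete, here is a toy sketch in Python (not Gecko code) of how
one source can yield two different trees. It assumes that <noscript> content is
kept as plain text when script is enabled and parsed as markup when it is
disabled; ToySink, parse and the scripting_enabled flag are names made up for
this illustration.

```python
from html.parser import HTMLParser


class ToySink(HTMLParser):
    """Toy content sink: records which tags would become elements."""

    def __init__(self, scripting_enabled):
        super().__init__(convert_charrefs=True)
        self.scripting_enabled = scripting_enabled  # hypothetical user setting
        self.elements = []       # tag names that would become DOM elements
        self.noscript_text = []  # <noscript> content kept as raw text
        self._in_noscript = False

    def _as_text(self):
        # Assumption: with script enabled, <noscript> content is not parsed.
        return self.scripting_enabled and self._in_noscript

    def handle_starttag(self, tag, attrs):
        if self._as_text():
            self.noscript_text.append(self.get_starttag_text())
            return
        self.elements.append(tag)
        if tag == "noscript":
            self._in_noscript = True

    def handle_endtag(self, tag):
        if tag == "noscript":
            self._in_noscript = False
        elif self._as_text():
            self.noscript_text.append("</%s>" % tag)

    def handle_data(self, data):
        if self._as_text():
            self.noscript_text.append(data)


def parse(source, scripting_enabled):
    """Return (element names, text kept inside <noscript>)."""
    sink = ToySink(scripting_enabled)
    sink.feed(source)
    sink.close()
    return sink.elements, "".join(sink.noscript_text)
```

Calling parse() twice on the same string, once with each flag value, gives
different element lists. A DOMParser without a docshell has no such setting to
consult, so it would have to pick one behavior.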
> 1. Will things get better in Gecko 1.9/Firefox 3 (i.e. are there
> concrete plans or promising developments in this area)?
I'm not aware of significant changes in this area since 1.8, and I'm not sure
anyone is working on this actively. I strongly suspect that given our existing
code, once item #1 above is sorted out, handling items #3 and #5 should not
be that bad -- a few days' work at most. Items #2 and #4 I'm really not sure
about; I guess in large part it depends on what we decide to do about #2.
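
For reference, the "throw away the document and reparse" behavior from item 2
can be sketched roughly as below. This is a simplified Python illustration, not
how Gecko implements it: it assumes a default of windows-1252, looks only at a
charset attribute on <meta> (not the http-equiv form), and restarts at most
once. CharsetChanged, MetaSniffer and parse_bytes are hypothetical names.

```python
import codecs
from html.parser import HTMLParser


class CharsetChanged(Exception):
    """Raised mid-parse when a <meta> tag contradicts the charset in use."""

    def __init__(self, charset):
        super().__init__(charset)
        self.charset = charset


def normalize(label):
    """Map a charset label to Python's canonical codec name, or None."""
    if not label:
        return None
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


class MetaSniffer(HTMLParser):
    """Toy sink that records tags and watches for <meta charset=...>."""

    def __init__(self, charset, allow_switch):
        super().__init__()
        self.charset = charset
        self.allow_switch = allow_switch
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag != "meta" or not self.allow_switch:
            return
        declared = normalize(dict(attrs).get("charset") or "")
        if declared and declared != self.charset:
            raise CharsetChanged(declared)


def parse_bytes(raw, default_charset="windows-1252"):
    """Return (charset used, tags seen, whether a reparse happened)."""
    charset = codecs.lookup(default_charset).name
    restarted = False
    while True:
        sink = MetaSniffer(charset, allow_switch=not restarted)
        try:
            sink.feed(raw.decode(charset, errors="replace"))
            sink.close()
        except CharsetChanged as change:
            # Discard everything built so far and start over.
            charset, restarted = change.charset, True
            continue
        return charset, sink.tags, restarted
```

The open question for a docshell-less parser is who owns that restart loop, or
whether the caller is simply required to supply the charset up front.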