parsing HTML into a document object in Fx3
Newsgroups: mozilla.dev.tech.layout
From: Myk Melez <m...@mozilla.org>
Date: Thu, 16 Nov 2006 12:16:43 -0800
Local: Thurs, Nov 16 2006 3:16 pm
Subject: parsing HTML into a document object in Fx3
Folks (particularly extension developers) regularly ask for a way to
parse HTML into a document object, which is currently hard and hacky to do.

bzbarsky suggested last year that things may get better in Gecko 1.9
[1], and shaver recently started a wiki page on the subject [2].

My questions are:

1. Will things get better in Gecko 1.9/Firefox 3 (i.e. are there
concrete plans or promising developments in this area)?

2. If not, is it worth turning the MicrosummaryResource object [3],
which does this (hackily, but perhaps as well as currently possible),
into an XPCOM component usable by other code?

[1]
http://groups-beta.google.com/group/netscape.public.mozilla.dom/msg/a...

[2] http://developer.mozilla.org/en/docs/Parsing_HTML_From_Chrome

[3]
http://lxr.mozilla.org/mozilla/source/browser/components/microsummari...


 
Newsgroups: mozilla.dev.tech.layout
From: Boris Zbarsky <bzbar...@mit.edu>
Date: Thu, 16 Nov 2006 18:15:38 -0600
Local: Thurs, Nov 16 2006 7:15 pm
Subject: Re: parsing HTML into a document object in Fx3

Myk Melez wrote:
> Folks (particularly extension developers) regularly ask for a way to
> parse HTML into a document object, which is currently hard and hacky to do.

So as I see it, the steps to get this working are:

1)  Decide what the problem we're solving is.  Specifically, how should
noscript, noframes, and such be parsed in these documents?  Keep in mind that
depending on user settings (like whether script is enabled) we create different
DOMs from the same source.

2)  Decide what the plan is for charsets (currently we depend on having a
docshell to handle charset autodetect and in some cases <meta> tags, because we
have to throw away the document and reparse).

3)  Go through the HTML content sink and HTML document, and make sure all the
places that use the docshell or window can survive without one.

4)  Do whatever we decided to do for charsets.

5)  Make DOMParser parse HTML.
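
To make item 2 concrete, here is a minimal Python sketch of the "parse, notice a <meta> charset, throw the result away and reparse" pattern. It is an illustration only, not Gecko code: the names MetaCharsetFinder and decode_html and the windows-1252 fallback are assumptions made for this example, and real charset autodetection is considerably more involved.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared by a <meta> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr_map = dict(attrs)
        # Short form: <meta charset="...">
        if attr_map.get("charset"):
            self.charset = attr_map["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="Content-Type" content="text/html; charset=...">
        http_equiv = (attr_map.get("http-equiv") or "").lower()
        content = (attr_map.get("content") or "").lower()
        if http_equiv == "content-type" and "charset=" in content:
            self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


def decode_html(raw: bytes, fallback: str = "windows-1252") -> Tuple[str, str]:
    """Return (text, charset_used), reparsing once if a <meta> tag disagrees."""
    # First pass: decode with the assumed fallback and scan for a declaration.
    tentative = raw.decode(fallback, errors="replace")
    finder = MetaCharsetFinder()
    finder.feed(tentative)
    declared = finder.charset
    if declared is None or declared == fallback:
        return tentative, fallback
    try:
        # Second pass: discard the tentative text and decode again.
        return raw.decode(declared, errors="replace"), declared
    except LookupError:
        # Unknown charset label: keep the first-pass result.
        return tentative, fallback
```

The point of the sketch is that the second pass needs nothing but the raw bytes, so in principle a parser without a docshell could keep the bytes around and restart itself; whether that is the right plan for Gecko is exactly the open question in items 2 and 4.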

> 1. Will things get better in Gecko 1.9/Firefox 3 (i.e. are there
> concrete plans or promising developments in this area)?

I'm not aware of significant changes in this area since 1.8, and I'm not sure
anyone is working on this actively.  I strongly suspect that given our existing
code, once item #1 above is sorted out handling item #3 and item #5 should not
be that bad -- a few days' work at most.  Items #2 and #4 I'm really not sure
about; I guess in large part it depends on what we decide to do about #2.

-Boris


 
Newsgroups: mozilla.dev.tech.layout
From: Myk Melez <m...@mozilla.org>
Date: Tue, 21 Nov 2006 16:58:25 -0800
Local: Tues, Nov 21 2006 7:58 pm
Subject: Re: parsing HTML into a document object in Fx3

Boris Zbarsky wrote:
> Myk Melez wrote:
>> Folks (particularly extension developers) regularly ask for a way to
>> parse HTML into a document object, which is currently hard and hacky
>> to do.

> So as I see it, the steps to get this working are:

Ok, I posted your comments to bug 102699, and I also requested
blocking1.9 on the bug, since it seems to me that Firefox's microsummary
service would really benefit from it, not to mention extension authors
and other Gecko consumers.

I also filed a dependent bug 361449 to have the microsummary service use
DOMParser instead of hidden iframes to parse HTML once DOMParser can do
so.  And I added a comment to bug 102699 about potentially turning
MicrosummaryResource into an XPCOM component if that bug doesn't get
fixed in Gecko 1.9.

-myk


 
Newsgroups: mozilla.dev.tech.layout
From: Boris Zbarsky <bzbar...@mit.edu>
Date: Tue, 21 Nov 2006 22:25:17 -0600
Local: Tues, Nov 21 2006 11:25 pm
Subject: Re: parsing HTML into a document object in Fx3

Myk Melez wrote:
> Ok, I posted your comments to bug 102699, and I also requested
> blocking1.9 on the bug, since it seems to me that Firefox's microsummary
> service would really benefit from it, not to mention extension authors
> and other Gecko consumers.

Right.  We just need to make some decisions here...

> I also filed a dependent bug 361449 to have the microsummary service use
> DOMParser instead of hidden iframes to parse HTML once DOMParser can do
> so.  And I added a comment to bug 102699 about potentially turning
> MicrosummaryResource into an XPCOM component if that bug doesn't get
> fixed in Gecko 1.9.

I don't think the "find some random chrome window and parse in an iframe in
there" approach is really something we want to turn into an "XPCOM component"...
For one thing, it doesn't work if no window is open (think Mac).

-Boris


 