Why can't Firefox parse HTML?
Matthew Gertner  
Newsgroups: netscape.public.mozilla.dom
From: "Matthew Gertner" <matt...@acepoint.cz>
Date: 17 May 2005 06:37:37 -0700
Local: Tues, May 17 2005 9:37 am
Subject: Why can't Firefox parse HTML?
I stumbled on a previous thread in this group:
http://groups-beta.google.com/group/netscape.public.mozilla.dom/brows...
which claims that it is impossible to create a new HTML DOM document
from a Firefox script without displaying it in a new window. This means
that HTML screenscraping using XMLHttpRequest is not possible.

In a fit of pique I ranted about this on my blog
(http://www.allpeers.com/blog/?p=136). I was trying to be funny, but
the issue is serious. I'm probably missing something, but can someone
explain to me why the appropriate interfaces are not exposed to
scripters using XPCOM?

Cheers,
Matt


 
Boris Zbarsky  
Newsgroups: netscape.public.mozilla.dom
From: Boris Zbarsky <bzbar...@mit.edu>
Date: Tue, 17 May 2005 10:16:48 -0500
Local: Tues, May 17 2005 11:16 am
Subject: Re: Why can't Firefox parse HTML?

Matthew Gertner wrote:
> In a fit of pique I ranted about this on my blog
> (http://www.allpeers.com/blog/?p=136). I was trying to be funny, but
> the issue is serious. I'm probably missing something, but can someone
> explain to me why the appropriate interfaces are not exposed to
> scripters using XPCOM?

Because HTML content model construction is so tied to having a window.  As one
simple example, it assumes the existence of a window it can reload to handle
charset autodetection and <meta> charset declarations that are not in the first
chunk of data we get from the document.  For XML this is not an issue, of
course, since the problem simply cannot arise.
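
To make the charset point concrete, here is a minimal sketch in Python. It is not Gecko's code: the helper names `sniff_meta_charset` and `parse_in_chunks`, the regular expression, and the `windows-1252` fallback are all assumptions made for illustration. It shows a parser that has to commit to a charset before the `<meta>` declaration arrives, and then has to start over once a later chunk contradicts that guess.

```python
import re

# Hypothetical, simplified pattern for a charset declaration inside a <meta> tag.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def sniff_meta_charset(data: bytes):
    """Return the declared charset name, or None if none has been seen yet."""
    match = META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None


def parse_in_chunks(chunks, fallback="windows-1252"):
    """Consume chunks in arrival order and report how often a restart is needed.

    The charset is chosen from the first chunk (or the assumed fallback).
    If a later chunk declares a different charset, everything decoded so far
    is suspect; this sketch only counts that event, whereas a browser would
    reload the document, which is where the window comes in.
    """
    charset = sniff_meta_charset(chunks[0]) or fallback
    seen = b""
    restarts = 0
    for chunk in chunks:
        seen += chunk
        declared = sniff_meta_charset(seen)
        if declared and declared != charset:
            charset = declared
            restarts += 1
    return seen.decode(charset), charset, restarts
```

With a first chunk that holds UTF-8 text but no declaration, and a second chunk that carries `<meta charset="utf-8">`, the sketch starts with the fallback and reports one restart. An XML document avoids this because its encoding is declared at the very start.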

There are other issues; for example the HTML parser needs the window to find out
whether scripts and frames are enabled (for parsing <noscript> and <noframes>
tags).  This is not an issue in XML, again, because the _parsing_ doesn't depend
on anything.  In HTML it does.  And since frames and scripts can be
enabled/disabled on a per-window basis, this is a bit of a problem.
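
The `<noscript>` dependency can be sketched the same way. This is a toy model in Python, not Gecko's parser: the function name `noscript_children` and the regular expressions are assumptions, and the rule it encodes (content is opaque text when scripting is on, ordinary markup when it is off) follows the behaviour later written down in the HTML specification.

```python
import re

# Hypothetical, simplified patterns; a real HTML tokenizer is far more involved.
NOSCRIPT = re.compile(r"<noscript>(.*?)</noscript>", re.IGNORECASE | re.DOTALL)
START_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)")


def noscript_children(html: str, scripting_enabled: bool):
    """Return the element names a toy parser would create inside <noscript>."""
    match = NOSCRIPT.search(html)
    if match is None:
        return []
    if scripting_enabled:
        # Scripting on: the content is kept as raw text, so no child elements exist.
        return []
    # Scripting off: the content is parsed as markup, so start tags become elements.
    return [name.lower() for name in START_TAG.findall(match.group(1))]
```

The same bytes therefore produce two different trees depending on a setting that, as described above, lives on the window. That is why the parser has to ask the window before it can decide what to build.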

Some work has been done to make the parsing not require a window, but a lot more
needs to be done, especially if people want it to work like it would with a
window around.

-Boris


 
Matthew Gertner  
Newsgroups: netscape.public.mozilla.dom
From: "Matthew Gertner" <matt...@acepoint.cz>
Date: 18 May 2005 09:49:44 -0700
Local: Wed, May 18 2005 12:49 pm
Subject: Re: Why can't Firefox parse HTML?
Boris,

Many thanks for the reply. I understand the issue much better now. Two
more questions:

1) You mention that "a lot more needs to be done." Is there an active
effort to break the remaining dependence of the HTML parser on the
existence of a window?
2) Wouldn't it be a viable workaround, in the meantime, to associate an
HTML document retrieved with XMLHttpRequest with an invisible window?

Matt


 
Boris Zbarsky  
Newsgroups: netscape.public.mozilla.dom
From: Boris Zbarsky <bzbar...@mit.edu>
Date: Wed, 18 May 2005 12:02:43 -0500
Local: Wed, May 18 2005 1:02 pm
Subject: Re: Why can't Firefox parse HTML?

Matthew Gertner wrote:
> 1) You mention that "a lot more needs to be done." Is there an active
> effort to break the remaining dependence of the HTML parser on the
> existence of a window?

Not very active right now, since Gecko is in 1.8 freeze, more or less.  There
may be more work on it in the 1.9 cycle.

> 2) Wouldn't it be a viable workaround, in the meantime, to associate an
> HTML document retrieved with XMLHttpRequest with an invisible window?

That would execute scripts in the document in question, load stylesheets, etc,
etc.  That seems undesirable (especially executing scripts).

-Boris


 
Matthew Gertner  
Newsgroups: netscape.public.mozilla.dom
From: "Matthew Gertner" <matt...@acepoint.cz>
Date: 19 May 2005 01:30:56 -0700
Local: Thurs, May 19 2005 4:30 am
Subject: Re: Why can't Firefox parse HTML?
Ok. Personally I think this functionality is important enough to merit
a short-term workaround involving an invisible window with scripts
disabled. I can't believe that this would be a huge programming effort.
At the same time, I can see how this approach could meet with
resistance since it's obviously a hack and might have other unintended
side effects.

Is there a Bugzilla report related to this that you know of? I had a
look around but couldn't find anything.

Cheers,
Matt


 
Boris Zbarsky  
Newsgroups: netscape.public.mozilla.dom
From: Boris Zbarsky <bzbar...@mit.edu>
Date: Thu, 19 May 2005 09:52:25 -0500
Local: Thurs, May 19 2005 10:52 am
Subject: Re: Why can't Firefox parse HTML?

Matthew Gertner wrote:
> Ok. Personally I think this functionality is important enough to merit
> a short-term workaround involving an invisible window with scripts
> disabled.

Patches accepted....

> Is there a Bugzilla report related to this that you know of?

Not that I know of, no.

-Boris


 
Matthew Gertner  
Newsgroups: netscape.public.mozilla.dom
From: "Matthew Gertner" <matt...@acepoint.cz>
Date: 19 May 2005 09:01:03 -0700
Local: Thurs, May 19 2005 12:01 pm
Subject: Re: Why can't Firefox parse HTML?

> Patches accepted....

Fair enough, I'll take a crack at it.

Cheers,
Matt


 