Issues 1520, 314, 960 innerText & innerHTML [WAS: Issue 1520 - DOM.setInnerHTML is incorrect on IE]

71 views
Skip to first unread message

Fred Sauer

unread,
Feb 7, 2008, 1:29:56 PM2/7/08
to Joel Webber, Matthew Mastracci, Google-Web-Tool...@googlegroups.com

On Feb 7, 2008 11:04 AM, Matthew Mastracci <mmas...@gmail.com> wrote:

FWIW, I think that GWT should guarantee a sane behaviour if and only
if we can guarantee that it's possible to offer it without a
significant hit across all browsers.  I'd say that for methods with
varying behaviour, principal of least surprise (as noted by fred) is
key.

For these particular bugs:

 - 1520: LGTM, the IE solution is nearly as optimal as raw innerHTML

In addition, IMO, maximizing cross browser consistency is a big deal.

 

- 314: I'd still rather see an append method on label vs.
guaranteeing roundtripping on innerText/innerHTML.  It's certainly not
a pattern we should encourage (it's probably O(N^2) or worse for most
browsers).

I don't mind backing away from roundtrip desires. However, I still see benefits in maximizing cross-browser consistency.

I still think removing the special IE version of getInnerText and just using a common one that is much more consistent across browsers is a good thing.

@Joel: Could we change the issue subject on 314 to "inconsistent getInnerText() results in IE" ? Or should I open a new issue for that and close 314?


 
- 960: Agreed that IE is the exception - would using createTextNode()
fix this?  I think that innerText should set the node's text content
directly, as if you had placed the text escaped in a <span> element.
If you want non-breaking space, use the hex version, \u00a0, in your
strings.

Thanks, Matthew. I attached a patch to 960 which reverts IE to use the existing createTextNode() solution. The patch also simplifies the setInnerText's common implementation.

--
Fred Sauer
fr...@allen-sauer.com

Matthew Mastracci

unread,
Feb 7, 2008, 2:11:25 PM2/7/08
to Fred Sauer, Joel Webber, Google-Web-Tool...@googlegroups.com
Fred Sauer wrote:
> I don't mind backing away from roundtrip desires. However, I still see
> benefits in maximizing cross-browser consistency.
>
> I still think removing the special IE version of getInnerText and just
> using a common one that is much more consistent across browsers is a
> good thing.
>
> @Joel: Could we change the issue subject on 314 to "*inconsistent
> getInnerText() results in IE*" ? Or should I open a new issue for that
> and close 314?
Thinking more about it, I agree with Fred that IE should be using the
same recursive innerText method that the rest of the browsers are
using. It's guaranteed to be slow on some browsers right now, so
developers shouldn't be using it on more than simple elements. It
should be reasonably fast on elements that have simple textnode
children, regardless.

The API consistency across browsers does seem important enough to
warrant this change.

Matt.

Sam Gross

unread,
Feb 7, 2008, 5:19:31 PM2/7/08
to Google-Web-Tool...@googlegroups.com, Fred Sauer, Joel Webber, Matthew Mastracci
getInnerText (issue 314):
I don't think using the recursive getInnerText in IE6 will achieve consistency.  IE6 clobbers the whitespace of statically created HTML elements, as well as whitespace set in the innerHTML property.

For example for a page containing the static HTML:
<span>abc     def</span>

The recursive getInnerText on the span returns "abc def" in IE6, but it returns "abc     def" in other browsers.

I don't think DOM#getInnerText needs to be consistent, but I think it is important to make Label#getText consistent.

Can we use elem.textContent instead of the recursive method in Gecko/Safari/Opera?  It seems to work in Firefox 2, Safari 3, and Opera 9.25 at least.

setInnerText (issue 960):
Whitespace is not collapsed in IE6 when using the DOMImpl#setInnerText, although it doesn't translate spaces into &nbsp;.  I think this is because IE6 relies on clobbering whitespace before the text nodes are created.

I think adding elem.innerHTML = elem.innerHTML; after the line elem.appendChild($doc.createTextNode(text)); in IE6 only may resolve this.

This also makes it more difficult to create a consistent getInnerText.

Label#getText (issue 314)
I'm not sure of the best way to implement Label#getText because of the HTML subclass.  I think getText should return the same value that was passed to setText.  If the widget's text was set using HTML#setHTML, then getText should probably return a best-effort attempt using DOM#getInnerText.

-Sam

Matthew Mastracci

unread,
Feb 7, 2008, 5:54:05 PM2/7/08
to Google Web Toolkit Contributors
Comments inline. Thinking about it some more, I believe that the
guarantees and caveats that GWT should offer are:

- Text set via setInnerText() will be returned exactly as went in
when calling getInnerText() (solves the Label problem).

- HTML set via setInnerHTML() will preserve whitespace as best as
possible going in and will contain the same whitespace coming out.
Browsers may parse HTML differently, so developers should be cautious
to use only simple HTML if they expect results to match across the
board.

- HTML structure returned via getInnerHTML() will mostly resemble the
HTML set via setInnerHTML(), but no guarantees can be made about tag
casing, attribute casing, attribute order or other parsing/formatting
issues. The text content of the HTML returned will be identical to
the text content that was set.

On Feb 7, 3:19 pm, "Sam Gross" <colesb...@gmail.com> wrote:

> getInnerText (issue 314):
> I don't think using the recursive getInnerText in IE6 will achieve
> consistency.  IE6 clobbers the whitespace of statically created HTML
> elements, as well as whitespace set in the innerHTML property.

[...]

> I don't think DOM#getInnerText needs to be consistent, but I think it is
> important to make Label#getText consistent.

I think it's more for ensuring that whitespace is better preserved in
dynamic HTML. The patch that Fred has posted in 314 now correctly
maintains whitespace when setting "long space" as innerHTML.
There's certainly nothing we can do for the HTML parsed at page
download time and I believe it would be acceptable to document that
limitation of the API.

Fred's point of view has probably rubbed off on me. :)

> Can we use elem.textContent instead of the recursive method in
> Gecko/Safari/Opera?  It seems to work in Firefox 2, Safari 3, and
> Opera 9.25at least.

How does it behave with comments, scripts and style elements? I seem
to recall that it differs in some ways. The current DOM
implementation probably includes some of these items in getInnerText()
right now, but we might be able to do something about this.

> setInnerText (issue 960):
> Whitespace is not collapsed in IE6 when using the DOMImpl#setInnerText,
> although it doesn't translate spaces into &nbsp;.  I think this is because
> IE6 relies on clobbering whitespace before the text nodes are created.
>
> I think adding elem.innerHTML = elem.innerHTML; after the line
> elem.appendChild($doc.createTextNode(text)); in IE6 only may resolve this.
>
> This also makes it more difficult to create a consistent getInnerText.

I think Fred was aiming for innerText to preserve whitespace by using
createTextNode() instead of innerText's setter. Ideally, setInnerText
shouldn't be creating &nbsp; nodes to avoid surprising the developer
with unexpected characters (IMHO, at least). I'm leaning towards the
whitespace-preserving approach myself - I think I'd expect that the
text I call the method with will be the exact value of a text node
underneath the element.

As I mentioned in a previous note, if the developer wishes non-
breaking spaces to appear in the text, they can always use the unicode
escape (\u00a0) in Java or the hex escape (\xa0) in JS to put them in
there.

> Label#getText (issue 314)
> I'm not sure of the best way to implement Label#getText because of the HTML
> subclass.  I think getText should return the same value that was passed to
> setText.  If the widget's text was set using HTML#setHTML, then getText
> should probably return a best-effort attempt using DOM#getInnerText.

Since labels could be arbitrarily large, keeping a copy of the text
set via setText() could become an issue on pages with many labels and/
or large labels. I think it might be best for labels to use a "best
effort" approach to return the text, only guaranteeing 100% fidelity
if setText() was initially used.

>
> -Sam

Fred Sauer

unread,
Feb 8, 2008, 2:41:03 PM2/8/08
to Google-Web-Tool...@googlegroups.com
On Feb 7, 2008 3:54 PM, Matthew Mastracci <mmas...@gmail.com> wrote:

Comments inline.  Thinking about it some more, I believe that the
guarantees and caveats that GWT should offer are:

 - Text set via setInnerText() will be returned exactly as went in
when calling getInnerText() (solves the Label problem).

 - HTML set via setInnerHTML() will preserve whitespace as best as
possible going in and will contain the same whitespace coming out.
Browsers may parse HTML differently, so developers should be cautious
to use only simple HTML if they expect results to match across the
board.

 - HTML structure returned via getInnerHTML() will mostly resemble the
HTML set via setInnerHTML(), but no guarantees can be made about tag
casing, attribute casing, attribute order or other parsing/formatting
issues.  The text content of the HTML returned will be identical to
the text content that was set.

I agree.

Moreover, I see the demands on GWT coming doing to three basic requirements, in order:

  1st) Cross browser consistency
  2nd) Whitespace preservation in injected node content
  3rd) Whitespace preservation in extracted node content

I think we can accomplish all three goals, within reason. At a minimum, we need to achieve #1.

 
To give a couple of examples:

If I write:
    HTML#setHTML("   foo   ");
I expect to get in all four browsers:
    <div class='gwt-HTML'>   foo   </div>
and not this in IE:
    <div class='gwt-HTML'>foo</div>
because I might want to:
    DIV { white-space: pre; }
or:
    DIV { border: 1px solid red; display: inline; }



If I write:
    Label#setText("Answer to Life, the Universe, and Everything");
I expect to get in all four browsers:
    <div class='gwt-Label'>Answer to Life, the Universe, and Everything</div>
and not this version in IE:
    <div&nbsp;class='gwt-Label'>Answer&nbsp;to&nbsp;Life,&nbsp;the&nbsp;Universe,&nbsp;and&nbsp;Everything</div>
because I might want my text to wrap.



On Feb 7, 3:19 pm, "Sam Gross" <colesb...@gmail.com> wrote:
> Can we use elem.textContent instead of the recursive method in
> Gecko/Safari/Opera?  It seems to work in Firefox 2, Safari 3, and
> Opera 9.25at least.

How does it behave with comments, scripts and style elements?  I seem
to recall that it differs in some ways.  The current DOM
implementation probably includes some of these items in getInnerText()
right now, but we might be able to do something about this.

Seems to work on Safari/Firefox. On opera textContent slurps up HTML comments.



> setInnerText (issue 960):
> Whitespace is not collapsed in IE6 when using the DOMImpl#setInnerText,
> although it doesn't translate spaces into &nbsp;.  I think this is because
> IE6 relies on clobbering whitespace before the text nodes are created.
>
> I think adding elem.innerHTML = elem.innerHTML; after the line
> elem.appendChild($doc.createTextNode(text)); in IE6 only may resolve this.
>
> This also makes it more difficult to create a consistent getInnerText.

I think Fred was aiming for innerText to preserve whitespace by using
createTextNode() instead of innerText's setter.  Ideally, setInnerText
shouldn't be creating &nbsp; nodes to avoid surprising the developer
with unexpected characters (IMHO, at least).  I'm leaning towards the
whitespace-preserving approach myself - I think I'd expect that the
text I call the method with will be the exact value of a text node
underneath the element.

Exactly. I don't like the unexpected &nbsp; in just one of the four browsers, and and I equally dislike loosing whitespace.



Sam Gross

unread,
Feb 8, 2008, 2:59:31 PM2/8/08
to Fred Sauer, google-web-tool...@googlegroups.com
[+gwtc]

Fred,

The patch attached to issue 960 doesn't fully work on IE6 for me. I'm testing the patch in trunk r1805 on IE 6.0.2900.2180. My test code is:

  public void onModuleLoad() {

    Element div = DOM.createElement("div");

    DOM.appendChild(RootPanel.getBodyElement(), div);

    DOM.setInnerText(div, "<abc>      <def>");

  }


In IE6 the page displays:
<abc>      <def>

It is as if the <div> tag were treated like a <pre> tag - white space is preserved unexpectedly instead of collapsed.  Changing the white-space CSS attribute has no visible effect. Firefox and Safari behave as expected.

Adding the line


    Window.alert(DOM.getInnerHTML(div));


shows that IE6 is *not* translating spaces into non-breaking spaces.


My theory is that IE relies on clobbering white space before creating DOM elements, as opposed to collapsing white space when displaying the element.  If this is the case, it doesn't bode well for cross-browser consistency.

Thanks for catching Opera's peculiar behavior regarding textContent.

-Sam

On Feb 8, 2008 2:30 AM, Fred Sauer <fr...@allen-sauer.com> wrote:
Sam,

On Feb 7, 2008 3:19 PM, Sam Gross <cole...@gmail.com> wrote:
getInnerText (issue 314):
I don't think using the recursive getInnerText in IE6 will achieve consistency.  IE6 clobbers the whitespace of statically created HTML elements, as well as whitespace set in the innerHTML property.

For example for a page containing the static HTML:
<span>abc     def</span>

The recursive getInnerText on the span returns "abc def" in IE6, but it returns "abc     def" in other browsers.

There's not much we can do about the static elements, but we can address the innerHTML assignments (=issue 1520). Ad we AJAX being the name of the game, hopefully dynamic content is a much more common use case than static.




I don't think DOM#getInnerText needs to be consistent, but I think it is important to make Label#getText consistent.

I'd love to see Label#getText be consistent, but I'd also like for us to also maximize the consistency of DOM#getInnerText.


 
Can we use elem.textContent instead of the recursive method in Gecko/Safari/Opera?  It seems to work in Firefox 2, Safari 3, and Opera 9.25 at least.

Good idea. I agree that it seems to work on Safari/Gecko, but in Opera the HTML comments get pulled in as well (which was unexepected). I attached a new patch (issue314-getInnerText-r1805.patch) based on this and a couple of other changes here:
  http://code.google.com/p/google-web-toolkit/issues/detail?id=314#c10
 



setInnerText (issue 960):
Whitespace is not collapsed in IE6 when using the DOMImpl#setInnerText, although it doesn't translate spaces into &nbsp;.  I think this is because IE6 relies on clobbering whitespace before the text nodes are created.

Should I assume you mean *does* translate spaces into &nbsp;'....?

I think adding elem.innerHTML = elem.innerHTML; after the line elem.appendChild($doc.createTextNode(text)); in IE6 only may resolve this.

This also makes it more difficult to create a consistent getInnerText.

Setting either innerHTML/innerText (or outerHTML/outerText for that matter) clobbers whitespace in IE (with the possible exception of certain uses of PRE elements), so I don't think that's an option. I believe the patch (issue960-setInnerText-r1790.patch) on 960, comment 7 works:
  http://code.google.com/p/google-web-toolkit/issues/detail?id=960#c7

Or is it missing something in IE?




Label#getText (issue 314)
I'm not sure of the best way to implement Label#getText because of the HTML subclass.  I think getText should return the same value that was passed to setText.  If the widget's text was set using HTML#setHTML, then getText should probably return a best-effort attempt using DOM#getInnerText.

I'm okay with best effort in this case.


Since these three issues are really interrelated, I might propose applying all three (i.e. the latest patch on 314,960,1520) at once:
  issue314-getInnerText-r1805.patch
  issue960-setInnerText-r1790.patch
  issue1520-ie-setInnerHTML-null-check-r1790.patch

That seems to make it easier to run through some of these test cases.

Thanks--

Fred Sauer

unread,
Feb 8, 2008, 4:51:25 PM2/8/08
to Google Web Toolkit Contributors
Sam,

Thanks for the new test case. I expanded it a bit to:

  String[] ws = {"normal", "pre"};
  for (int i = 0; i < ws.length; i++) {

    Element div = DOM.createElement("div");
    DOM.setStyleAttribute(div, "whiteSpace", ws[i]);
    DOM.appendChild(RootPanel.getBodyElement(), div);
    DOM.setInnerText(div, "   <abc>      <def> " + ws[i]);
  }

The desired output is for the first to line to have whitespace collapsed, and the second to show expanded whitespace (but without resorting to &nbsp;):
<abc>_<def>_normal
___<abc>______<def>_pre


In IE using...
elem.innerHTML = '';
if (text != null) {
elem.appendChild($doc.createTextNode(text));
}

...I (correctly) get leading whitespace collapsed on the first line, but (incorrectly) no internal whitespace collapse...
<abc>______<def>_normal
___<abc>______<def>_pre



Using the innerHTML reassignment...
    elem.innerHTML = '';
    if (text != null) {
      elem.appendChild($doc.createTextNode(text));
      elem.innerHTML = elem.innerHTML;
    }

...(incorrectly) produces collapsed (lost) whitespace on both...
<abc>_<def>_normal
<abc>_<def>_pre




Using the old innerText implementation...
    elem.innerText = text;

...(incorrectly) produces expanded whitespace on both...
___<abc>______<def>_normal
___<abc>______<def>_pre




I even tried...
elem.innerHTML = '';
if (text != null) {
var re = new RegExp(/((\s+)|(\S+))/g);
var chunk;
while(chunk = re.exec(text)) {
elem.appendChild($doc.createTextNode(chunk[0]));
}
}

...but this does no better than the patch...
<abc>______<def>_normal
___<abc>______<def>_pre



In summary:
  • We can get leading & trailing whitespace behavior in IE to match the other browsers, which is better than what we have today
  • I'm at a loss as to how to deal with the internal whitespace at the moment, but maybe someone else reading along has another idea...?

Thanks
Fed
--
Fred Sauer
fr...@allen-sauer.com

Matthew Mastracci

unread,
Feb 8, 2008, 5:40:50 PM2/8/08
to Google Web Toolkit Contributors
Sam/Fred,

This is absolutely horrible. :( I didn't realize that IE treated
white-space like that - thanks for running those tests, guys.

I hacked a quick test to see if you can detect white-space at
runtime. It turns out that it behaves a little differently in quirks
vs. strict mode, but you can detect what IE will be doing with the
whitespace.

Quirks mode:

In IE6, setting "white-space: pre" on an whitespace-collapsing element
doesn't have any effect at runtime or parse time: whitespace is always
collapsed. elem.currentStyle.whiteSpace is "normal", regardless.

In quirks mode, <pre> and <textarea> elements have
elem.currentStyle.whiteSpace == "pre".

Strict mode.

In strict mode, setting white-space on a div (or any other element)
works as expected and elem.currentStyle.whiteSpace == "pre".

A div nested in a <pre> always has currentStyle.whiteSpace == "pre" in
strict or quirks mode. The whitespace in the div is preserved.

Based on this, it might be possible to have a runtime test in IE to
determine whether whitespace should go through the normalization
(innerHTML/innerText) or the preserving (createTextNode/<pre>-wrapped)
codepath.
>    - We can get leading & trailing whitespace behavior in IE to match the
>    other browsers, which is better than what we have today
>    - I'm at a loss as to how to deal with the internal whitespace at the
>    moment, but maybe someone else reading along has another idea...?
>
> Thanks
> Fed
>
> On Feb 8, 2008 12:59 PM, Sam Gross <colesb...@gmail.com> wrote:
>
>
>
> > [+gwtc]
>
> > Fred,
>
> > The patch attached to issue 960 doesn't fully work on IE6 for me. I'm
> > testing the patch in trunk r1805 on IE 6.0.2900.2180. My test code is:
>
> >   public void onModuleLoad() {
>
> >     Element div = DOM.createElement("div");
>
> >     DOM.appendChild(RootPanel.getBodyElement(), div);
>
> >     DOM.setInnerText(div, "<abc>      <def>");
>
> >   }
>
> > In IE6 the page displays:
>
> > <abc>      <def>
>
> > It is as if the <div> tag were treated like a <pre> tag - white space is
> > preserved unexpectedly instead of collapsed.  Changing the white-space CSS
> > attribute has no visible effect. Firefox and Safari behave as expected.
>
> > Adding the line
>
> >     Window.alert(DOM.getInnerHTML(div));
>
> > shows that IE6 is *not* translating spaces into non-breaking spaces.
>
> > I put my test at
> >http://web.mit.edu/sgross/www/com.google.gwt.sample.hello.Hello/Hello...
> > .
>
> > My theory is that IE relies on clobbering white space before creating DOM
> > elements, as opposed to collapsing white space when displaying the element.
> >  If this is the case, it doesn't bode well for cross-browser consistency.
>
> > Thanks for catching Opera's peculiar behavior regarding textContent.
>
> > -Sam
>
> > On Feb 8, 2008 2:30 AM, Fred Sauer <f...@allen-sauer.com> wrote:
>
> > > Sam,
>
> > > *  issue314-getInnerText-r1805.patch
> > >   issue960-setInnerText-r1790.patch
> > >   issue1520-ie-setInnerHTML-null-check-r1790.patch
> > > *
> > > That seems to make it easier to run through some of these test cases.
>
> > > Thanks--
>
> > > Fred Sauer
> > > f...@allen-sauer.com
>
> --
> Fred Sauer
> f...@allen-sauer.com

Fred Sauer

unread,
May 5, 2008, 2:15:09 AM5/5/08
to Google-Web-Tool...@googlegroups.com
Bringing this thread back to life before 1.5 is released as it would be a shame to persist white-space inconsistencies for another release...
--
Fred Sauer
fr...@allen-sauer.com

Ranxerox

unread,
May 14, 2008, 7:02:16 PM5/14/08
to Google Web Toolkit Contributors
This looks like what I'm seeing with the HTML class. It appears that:

new HTML(" something");

will throw away the space at the beginning of the string. It's quite
annoying.

Using 1.5 M2.
Reply all
Reply to author
Forward
0 new messages