Issue 2408 in v8: String.prototype.trim() trims 0x200b ZERO WIDTH SPACE

codesite...@google.com

Nov 14, 2012, 9:17:02 AM
to v8-...@googlegroups.com
Status: New
Owner: ----

New issue 2408 by erik...@google.com: String.prototype.trim() trims
0x200b ZERO WIDTH SPACE
http://code.google.com/p/v8/issues/detail?id=2408

V8 follows JSC in trimming the zero width space (0x200b) in the trim()
method. According to

https://bugs.webkit.org/show_bug.cgi?id=26590

this follows an early draft of the trim() spec, but the ES5 spec specifies
only a few characters in addition to the Unicode space category (which
does not include the zero width space).

Firefox follows ES5 on this one.

Repro:

alert("\u200b".trim().length); // Should alert "1".

V8 also strips 0x85, the NEL (next line) character, and is alone in doing
so. 0x85 is not mentioned in runtime.cc, so it must come from the Unicode
tables, which may be out of date.

Search for 200b in runtime.cc.
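
The repro above relies on alert(), which only exists in a browser. A minimal console-friendly sketch of the same check, extended to the NEL character, might look like the following. The helper name trimmedLength is hypothetical and not part of V8 or the issue.

```javascript
// Hypothetical helper: length of the string after String.prototype.trim().
function trimmedLength(s) {
  return s.trim().length;
}

// Per ES5, neither character is WhiteSpace or a LineTerminator, so a
// conforming engine should leave both in place and report a length of 1.
console.log("U+200B:", trimmedLength("\u200b"));
console.log("U+0085:", trimmedLength("\u0085"));
```

An engine showing the behaviour described in this issue would report 0 for both lines instead.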

codesite...@google.com

Nov 14, 2012, 9:18:02 AM
to v8-...@googlegroups.com
Updates:
Status: Assigned
Owner: mstar...@chromium.org
Cc: christia...@gmail.com
Labels: ES5

Comment #1 on issue 2408 by erik.corry: String.prototype.trim() trims
(No comment was entered for this change.)

codesite...@google.com

Nov 14, 2012, 10:11:42 AM
to v8-...@googlegroups.com
Updates:
Labels: Type-Bug Priority-Medium

Comment #2 on issue 2408 by mstar...@chromium.org:
String.prototype.trim() trims 0x200b ZERO WIDTH SPACE
http://code.google.com/p/v8/issues/detail?id=2408

So I checked the Unicode 6.2 character database and the following
characters belong to the "Zs" category.

0020;SPACE;Zs;0;WS;;;;;N;;;;;
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
1680;OGHAM SPACE MARK;Zs;0;WS;;;;;N;;;;;
180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;
2000;EN QUAD;Zs;0;WS;2002;;;;N;;;;;
2001;EM QUAD;Zs;0;WS;2003;;;;N;;;;;
2002;EN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2003;EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2004;THREE-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2005;FOUR-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2006;SIX-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
2008;PUNCTUATION SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
2009;THIN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
200A;HAIR SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
205F;MEDIUM MATHEMATICAL SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;

On top of that, ES5.1 explicitly specifies the following characters as
WhiteSpace or LineTerminator.

0009, Tab
000A, Line Feed
000B, Vertical Tab
000C, Form Feed
000D, Carriage Return
0020, Space
00A0, No-break space
2028, Line separator
2029, Paragraph separator
FEFF, Byte Order Mark
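
Taken together, the two lists above describe the set of characters trim() is expected to strip. A minimal sketch of a reference trim built only from those lists could look like this. The names ZS_SINGLES, ZS_RANGE, ES5_EXPLICIT, isTrimmable and referenceTrim are made up for illustration, the Zs data is copied from the Unicode 6.2 listing in this comment (so it includes 180E), and this is not how V8 implements trim().

```javascript
// Zs code points from the Unicode 6.2 listing above: individual values
// plus the contiguous block 2000..200A.
const ZS_SINGLES = [0x0020, 0x00A0, 0x1680, 0x180E, 0x202F, 0x205F, 0x3000];
const ZS_RANGE = [0x2000, 0x200A];

// Characters ES5.1 names explicitly as WhiteSpace or LineTerminator.
const ES5_EXPLICIT = [
  0x0009, 0x000A, 0x000B, 0x000C, 0x000D,
  0x0020, 0x00A0, 0x2028, 0x2029, 0xFEFF,
];

// True if the code unit belongs to either list.
function isTrimmable(code) {
  return (code >= ZS_RANGE[0] && code <= ZS_RANGE[1]) ||
         ZS_SINGLES.includes(code) ||
         ES5_EXPLICIT.includes(code);
}

// Strip trimmable characters from both ends. All listed characters are in
// the BMP, so working on UTF-16 code units is sufficient here.
function referenceTrim(str) {
  let start = 0;
  let end = str.length;
  while (start < end && isTrimmable(str.charCodeAt(start))) start++;
  while (end > start && isTrimmable(str.charCodeAt(end - 1))) end--;
  return str.slice(start, end);
}

console.log(referenceTrim("\ufeff \u3000abc\u2003\n")); // "abc"
console.log(referenceTrim("\u200b").length);            // 1, 200B is in neither list
```

Neither 200B nor 0085 appears in either list, so this sketch leaves both untouched.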

I still need to figure out where the NEL character is coming from.

codesite...@google.com

Nov 14, 2012, 10:25:44 AM
to v8-...@googlegroups.com

Comment #3 on issue 2408 by mstar...@chromium.org:
String.prototype.trim() trims 0x200b ZERO WIDTH SPACE
http://code.google.com/p/v8/issues/detail?id=2408

Yep, both Unicode 6.1 (which we are currently based on) and Unicode 6.2
(the most recent) mark 0x0085 as having the "White_Space" property but not
being in the "Zs" category. So actually the problem is that our Unicode
tables are based on properties instead of categories. But 0x0085 is the
only character where that's a problem. So the quick fix is to just special
case it for String.prototype.trim(), as we already do.
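
The difference between the White_Space property and the Zs category can be checked from script, assuming an engine that supports Unicode property escapes in regular expressions. The sketch below is only an illustration: the helper name propertyOnlyWhiteSpace is hypothetical, and the result reflects whatever Unicode version the running engine ships, not necessarily the 6.1 tables mentioned in this comment.

```javascript
// Characters ES5.1 names explicitly as WhiteSpace or LineTerminator
// (copied from the list in comment #2).
const EXPLICIT_ES5_CODE_POINTS = new Set([
  0x0009, 0x000A, 0x000B, 0x000C, 0x000D,
  0x0020, 0x00A0, 0x2028, 0x2029, 0xFEFF,
]);

// Hypothetical helper: code points that carry the White_Space property but
// are neither in category Zs nor named explicitly by ES5.1.
function propertyOnlyWhiteSpace() {
  const hasProperty = /^\p{White_Space}$/u;
  const inCategory = /^\p{Zs}$/u;
  const found = [];
  for (let cp = 0; cp <= 0x10FFFF; cp++) {
    const ch = String.fromCodePoint(cp);
    if (hasProperty.test(ch) && !inCategory.test(ch) &&
        !EXPLICIT_ES5_CODE_POINTS.has(cp)) {
      found.push(cp);
    }
  }
  return found;
}

// Print the result as zero-padded hex values.
console.log(
  propertyOnlyWhiteSpace().map((cp) => "0x" + cp.toString(16).padStart(4, "0"))
);
```

With recent Unicode data this should list only 0x0085, which matches the observation above that NEL is the single character affected by using the property instead of the category.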

codesite...@google.com

Feb 7, 2014, 8:38:24 AM
to v8-...@googlegroups.com
Updates:
Status: Fixed

Comment #4 on issue 2408 by yan...@chromium.org: String.prototype.trim()
Fixed in r19196.

codesite...@google.com

Feb 10, 2014, 3:25:04 AM
to v8-...@googlegroups.com

Comment #5 on issue 2408 by mathi...@opera.com: String.prototype.trim()
Fix was reverted: https://code.google.com/p/v8/source/detail?r=19199

codesite...@google.com

Feb 10, 2014, 3:31:29 AM
to v8-...@googlegroups.com
Updates:
Status: Assigned
Owner: yan...@chromium.org

Comment #6 on issue 2408 by yan...@chromium.org: String.prototype.trim()
Right. Improved CL coming up.

codesite...@google.com

Feb 10, 2014, 7:52:43 AM
to v8-...@googlegroups.com
Updates:
Status: Fixed

Comment #7 on issue 2408 by yan...@chromium.org: String.prototype.trim()
Fixed in r19222.