JavaScript Functions


Bruce A. Julseth

Dec 26, 2008, 9:37:41 PM
I've been developing in PHP and now want to learn the client side with
JavaScript.

One of the first problems I have come across is what string functions are
available. Right now I'm looking for a "trim" function. Is there one? If not,
how can I do it?

Also, when my browser (IE, Firefox, Safari, Opera) finds a syntax problem,
it just plain quits executing with no error message. Is there a "switch" I
need to turn on in my browsers?

Thank you....


L.@canberra Trevor Lawrence

Dec 26, 2008, 11:42:18 PM
"Bruce A. Julseth" <julebj...@bellsouth.net> wrote in message
news:rqg5l.13290$n_5....@bignews7.bellsouth.net...


Hmm, an interesting question.
I had a look at http://www.w3schools.com/jsref/jsref_obj_string.asp
and there does not appear to be a trim function.

A nice long-winded way to left trim is to search for the first occurrence of
a non-blank character and take the substring from that point, e.g.
<script type="text/javascript">
// x defaults to str.length so that an all-blank string trims to ""
var str = " Hello world!", i, x = str.length;
for (i = 0; i < str.length; i++) {
  if (str.substr(i, 1) != " ") {
    x = i;   // index of the first non-blank character
    break;
  }
}
document.write(str.substr(x));
</script>

It would be similar for a right trim, using a reverse search, but I am
having trouble doing it

In any case, the experts here will no doubt find a better way to do it

Re syntax errors,
In IE7 try clicking the error icon on the bottom left. It doesn't always
help a great deal though.
Firefox has a plug-in named Firebug. Look for it on the FF site.
--
Trevor Lawrence
Canberra
Web Site http://trevorl.mvps.org


SAM

Dec 27, 2008, 12:02:17 AM
On 12/27/08 3:37 AM, Bruce A. Julseth wrote:

> I've been developing in PHP and now want to learn the client side with
> JavaScript..
>
> One of the first problems I have come across is what string functions are
> available. Right now I'm looking for a "trim" funciton. Is the one? If not,
> how can I do it?

Regular expressions?

// remove whitespace characters from the beginning and end

function trim(strg) {
return strg.replace(/^\s+|\s+$/g, '');
}

<https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Creating_a_Regular_Expression>

> Also, when my browser (IE, FireFox, Safari, Opera) finds a syntext problem,
> it just plain quits executing with no error message. Is there a "switch" I
> need to turn on in my browers?

Firefox: menu Tools / Error Console
Safari: Preferences / Advanced / [X] enable the Develop menu,
then: menu Develop / show the console
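
Besides the browser consoles, a syntax problem can also be surfaced from script itself. The following is a minimal sketch that is not part of the thread: `checkSyntax` is a hypothetical helper name, and it assumes the Function constructor is allowed to run (some environments restrict it) and throws a SyntaxError for source text that does not parse.

```javascript
// Hypothetical helper: compile source text without running it and
// return the syntax error message, or null when the text parses.
function checkSyntax(source) {
  try {
    new Function(source); // parses the body; the function is never called
    return null;
  } catch (e) {
    return e.name + ': ' + e.message;
  }
}

var good = checkSyntax('var a = 1;'); // parses, so null
var bad = checkSyntax('var a = ;');   // a string starting with "SyntaxError"
```

The exact wording of the message differs between engines, so it is best treated as a hint rather than something to match against.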

--
sm

SAM

Dec 27, 2008, 12:06:39 AM
On 12/27/08 6:02 AM, SAM wrote:

>
> // delete/suppress blank characters in beginning and end
>
> function trim(strg) {
> return strg.replace(/^\s+|\s+$/g, '');
> }


function trimLeft(strg) {
return strg.replace(/^\s+/, '');
}

function trimRight(strg) {
return strg.replace(/\s+$/, '');
}

--
sm

RobG

Dec 27, 2008, 2:52:00 AM
"Bruce A. Julseth" <julebj...@bellsouth.net> wrote:
> I've been developing in PHP and now want to learn the client side with
>
> JavaScript..
>
> One of the first problems I have come across is what string functions
> are
> available.

The authoritative reference for properties and methods of built-in
objects (such as String) is the ECMA-262 specification.

> Right now I'm looking for a "trim" function. Is there one?

Not a built-in function.

> If not,
> how can I do it?

There is one in the FAQ, which also has links to useful resources.


--
Rob

Gregor Kofler

Dec 27, 2008, 8:45:10 AM
Bruce A. Julseth wrote:

> I've been developing in PHP and now want to learn the client side with
> JavaScript..
>
> One of the first problems I have come across is what string functions are
> available. Right now I'm looking for a "trim" funciton. Is the one? If not,
> how can I do it?

No. Besides it would be a method of the string (prototype) object.

function trim(yourString) {
return yourString.replace(/^\s+\s+$/, "");
}

> Also, when my browser (IE, FireFox, Safari, Opera) finds a syntext problem,
> it just plain quits executing with no error message. Is there a "switch" I
> need to turn on in my browers?

Do yourself a favour and get proper add-ons (or activate them). FF has
Firebug, Opera has Dragonfly (Extras->(whatever that menu entry is
called in English)->Developer Tools), Safari has somewhat less capable
but still sufficient tools (activated somewhere in the options dialog,
being on Linux I can't test that right now).

Gregor

Thomas 'PointedEars' Lahn

Dec 27, 2008, 9:18:12 AM
Gregor Kofler wrote:
> Bruce A. Julseth meinte:

>> One of the first problems I have come across is what string functions are
>> available. Right now I'm looking for a "trim" funciton. Is the one? If not,
>> how can I do it?
>
> No. Besides it would be a method of the string (prototype) object.
>
> function trim(yourString) {
> return yourString.replace(/^\s+\s+$/, "");
> }

Apparently you forgot a | between the two \s+, and the global flag. See the
FAQ entry.
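
To make the two omissions concrete, here is a small sketch comparing the pattern as posted, a version with only the | added, and a version with both the | and the global flag. The function names are hypothetical and chosen only for this illustration.

```javascript
// As posted: without the |, the pattern only matches when the WHOLE
// string is whitespace, so ordinary input comes back unchanged.
function trimAsPosted(s) {
  return s.replace(/^\s+\s+$/, '');
}

// With the | but no g flag: replace() stops after the first match,
// which is the leading run of whitespace.
function trimNoGlobal(s) {
  return s.replace(/^\s+|\s+$/, '');
}

// With both the | and the g flag: leading and trailing runs are removed.
function trimFixed(s) {
  return s.replace(/^\s+|\s+$/g, '');
}

var sample = '  abc  ';
var results = [trimAsPosted(sample), trimNoGlobal(sample), trimFixed(sample)];
// results is ['  abc  ', 'abc  ', 'abc']
```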

>> Also, when my browser (IE, FireFox, Safari, Opera) finds a syntext problem,
>> it just plain quits executing with no error message. Is there a "switch" I
>> need to turn on in my browers?
>

> Do yourself a favour and get proper add-ons (or activate) them. [...]


> Opera has Dragonfly (Extras->(whatever that menu entry is called in

> English)->Developer Tools), [...]

Speaking of which, does anyone know what I can do about always getting a
"Select a runtime" message when entering, say, 1 in the Command Line pane of
Dragonfly's Script tab in Opera/9.63 (X11; Linux i686; U; en) Presto/2.1.1?
So far I have found nothing relevant in the Settings (button with tools
icon, lower right corner of the Dragonfly pane).


TIA

PointedEars

Thomas 'PointedEars' Lahn

Dec 27, 2008, 9:30:15 AM
Thomas 'PointedEars' Lahn wrote:
> Gregor Kofler wrote:
>> Bruce A. Julseth meinte:
>>> Also, when my browser (IE, FireFox, Safari, Opera) finds a syntext problem,
>>> it just plain quits executing with no error message. Is there a "switch" I
>>> need to turn on in my browers?
>> Do yourself a favour and get proper add-ons (or activate) them. [...]
>> Opera has Dragonfly (Extras->(whatever that menu entry is called in
>> English)->Developer Tools), [...]
>
> Speaking of which, does anyone know what I can do about always getting a
> "Select a runtime" message when entering, say, 1 in the Command Line pane of
> Dragonfly's Script tab in Opera/9.63 (X11; Linux i686; U; en) Presto/2.1.1?
> So far I have found nothing relevant in the Settings (button with tools
> icon, lower right corner of the Dragonfly pane).

I think I've got it. For Scripts/Command Line to work properly, you have to
select a window/tab as global execution context under Scripts/Scripts first;
unlike Firebug, with Dragonfly the current tab is not automatically
selected. (I'd rather they changed that.)


PointedEars

Gregor Kofler

Dec 27, 2008, 10:20:05 AM
Thomas 'PointedEars' Lahn wrote:

> Gregor Kofler wrote:
>> Bruce A. Julseth meinte:
>>> One of the first problems I have come across is what string functions are
>>> available. Right now I'm looking for a "trim" funciton. Is the one? If not,
>>> how can I do it?
>> No. Besides it would be a method of the string (prototype) object.
>>
>> function trim(yourString) {
>> return yourString.replace(/^\s+\s+$/, "");
>> }
>
> Apparently you forgot a | between the two \s+, and the global flag. See the
> FAQ entry.

Oops. Yes, I should re-read my posts...

Gregor

SAM

Dec 27, 2008, 11:42:59 AM
On 12/27/08 4:20 PM, Gregor Kofler wrote:

> Thomas 'PointedEars' Lahn meinte:
>> Gregor Kofler wrote:
>>> Bruce A. Julseth meinte:
>>>> Right now I'm looking for a "trim" funciton.
>>>
>>> function trim(yourString) {
>>> return yourString.replace(/^\s+\s+$/, "");
>>> }
>>
>> Apparently you forgot a | between the two \s+, and the global flag.
>> See the
>> FAQ entry.
>
> Oops. Yes, I should re-read my posts...

And the other ones?
(I had already given it, while yours, once corrected, is still not complete.)

<http://jibbering.com/faq/#trimString>

--
sm

kangax

Dec 27, 2008, 4:30:44 PM
RobG wrote:

[...]

> If not,
>> how can I do it?
>
> There is one in the FAQ, which also has links to useful resources.
>
>

Perhaps the FAQ should mention that the RegExp whitespace character class
does not conform to the specification (see: 15.10.2.12) in some browsers
and that `trim` (as it is right now in the FAQ) will fail to remove some
of those characters.


A simple example would be:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title></title>
</head>
<body>
<script type="text/javascript">
(function(){
function trim(s) {
return s.replace(/^\s+|\s+$/g, '');
}
var s = ' \n\r\t\x0B\f\xA0 hello \xA0\n \r\t\f\x0B';
// should be "5"
document.write(trim(s).length);
})();
</script>
</body>
</html>

This fails in IE, Safari (although it is fixed in WebKit nightlies) and Chrome.

A workaround is, of course, simple:

var strip = (function(){
var wspClass = '[\\x09\\x0B\\x0C\\x20\\xA0\\x0A\\x0D\\u2028\\u2029]';
var leadingSpace = new RegExp('^' + wspClass + '+');
var trailingSpace = new RegExp(wspClass + '+$');
return function(s) {
return s.replace(leadingSpace, '').replace(trailingSpace, '');
}
})();
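
An alternative to always using an explicit class is to test the engine once and patch only when needed. The sketch below assumes the only deficiency worth handling is \s failing to match the no-break space (U+00A0); `nbspIsWhitespace`, `trimPattern` and `trimCompat` are hypothetical names, and other Unicode spaces are deliberately left out.

```javascript
// Does this engine's \s match the no-break space (U+00A0)?
var nbspIsWhitespace = /\s/.test('\xA0');

// Choose the pattern once, at load time.
var trimPattern = nbspIsWhitespace
  ? /^\s+|\s+$/g
  : /^[\s\xA0]+|[\s\xA0]+$/g;

function trimCompat(s) {
  return String(s).replace(trimPattern, '');
}

// The same test string as in the example page above.
var len = trimCompat(' \n\r\t\x0B\f\xA0 hello \xA0\n \r\t\f\x0B').length;
```

Either branch gives a length of 5 for this string, since every character around "hello" is covered by \s, by \xA0, or by both.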


--
kangax

SAM

Dec 27, 2008, 5:52:20 PM
On 12/27/08 10:30 PM, kangax wrote:

> RobG wrote:
>
> [...]
>
>> If not,
>>> how can I do it?
>>
>> There is one in the FAQ, which also has links to useful resources.
>>
>
> Perhaps, FAQ should mention that RegExp whitespace character class does
> not conform to specification (see: 15.10.2.12) in some of the browsers
> and that `trim` (as it is right now in the FAQ) will fail to remove some
> of those characters.

Is &nbsp; or &#160; or \xA0 a 'blank' (or white-space) character ?

My Firefox.3 and Opera.9 seem to think it is.
My iCab.4 and Safari.3 think it is not.

> var s = ' \n\r\t\x0B\f\xA0 hello \xA0\n \r\t\f\x0B';
> // should be "5"
> document.write(trim(s).length);

> fails in IE, Safari (although, fixed in webkit nightlies) and Chrome.

In IE ... I don't know.

> A workaround is, of course, simple:
>
> var strip = (function(){
> var wspClass = '[\\x09\\x0B\\x0C\\x20\\xA0\\x0A\\x0D\\u2028\\u2029]';
> var leadingSpace = new RegExp('^' + wspClass + '+');
> var trailingSpace = new RegExp(wspClass + '+$');
> return function(s) {
> return s.replace(leadingSpace, '').replace(trailingSpace, '');
> }
> })();

It remains to be decided whether we really want to delete this non-breaking character.

--
sm

L.@canberra Trevor Lawrence

Dec 27, 2008, 6:55:43 PM
"Trevor Lawrence" <Trevor L.@Canberra> wrote in message
news:newscache$heqick$vo7$1...@news.grapevine.com.au...

While I appreciate the elegance of the RegExp solutions, there seemed to be
some problems with defining what is white space, so I wondered if the simple
functions below would do the job just as well

function ltrim(str) {
for (var i = 0; i < str.length; i++) {
if(str.substr(i,1)!=" ")
break;
}
return str.substr(i);
}

function rtrim(str) {
for (var i = str.length-1; i >= 0; i--) {
if(str.substr(i,1)!=" ")
break;
}
return str.substr(0,i+1);
}

function trim(str) {
return ltrim(rtrim(str));
}

They can be tested by
var stringToTrim = " Hello world! " ;
document.write("String: |" + stringToTrim + '|<br>' )
document.write("ltrim: |" + ltrim(stringToTrim) + '|<br>' )
document.write("rtrim: |" + rtrim(stringToTrim) + '|<br>' )
document.write("trim: |" + trim(stringToTrim) + '|<br>' )

I have placed the '|' delimiter at start and end of the string so that the
result can be clearly seen. This is what prints
String: | Hello world! |
ltrim: |Hello world! |
rtrim: | Hello world!|
trim: |Hello world!|

kangax

Dec 27, 2008, 7:22:30 PM
SAM wrote:

[...]

>> Perhaps, FAQ should mention that RegExp whitespace character class
>> does not conform to specification (see: 15.10.2.12) in some of the
>> browsers and that `trim` (as it is right now in the FAQ) will fail to
>> remove some of those characters.
>
> Is &nbsp; or &#160; or \xA0 a 'blank' (or white-space) character ?

I was only talking about the WhiteSpace (7.2) and LineTerminator (7.3)
productions. The former includes '\xA0' and so, yes, it should be
matched by /\s/. I filed a bug with Chrome [1] but only afterwards noticed
that nightly WebKit does not exhibit this (so it's only a matter of
updating Chrome to the newer version).

[...]

[1] http://code.google.com/p/chromium/issues/detail?id=5206

Thomas 'PointedEars' Lahn

Dec 28, 2008, 4:20:33 AM
"Trevor Lawrence" wrote:
> While I appreciate the elegance of the RegExp solutions, there seemed
> to be
> some problems with defining what is white space, so I wondered if the
> simple
> functions below would do the job just as well
> [...]

Dear green beginners,

simple functions like these are how this thing started about a decade
ago. Since they turned out to be inadequate (they handle too few cases, are
inefficient from the outset, and do not scale well), RegExp matching and
replacing was introduced.

That some ECMAScript implementations might not match all specified
white-space characters and line terminators with \s really is no reason
at all for us to prefer Wheel 0.1 that could only jump 1 cm (handle only
one white-space character, the space, inefficiently, and so on), because
RegExp allows us to define our own character classes.

Thank you in advance.


PointedEars

Dr J R Stockton

Dec 28, 2008, 7:51:40 AM
In comp.lang.javascript message <4956b1a5$0$9375$ba4a...@news.orange.fr
>, Sat, 27 Dec 2008 23:52:20, SAM <stephanemor...@wanadoo.fr.in
valid> posted:

>Rest to know if really we want to delete this unbreakable character ?

The FAQ entry should list the characters that the simple code ought to
delete, and the known exceptions in reasonably common browsers. There,
"ought" could be according to ISO/IEC 16262 or according to the majority
of usage.


The FAQ at Jibbering has not been changed for a month and is dated a
fortnight earlier. The auto-posted FAQ in this group is dated a month
earlier than that.

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk IE7 FF2 Op9 Sf3
news:comp.lang.javascript FAQ <URL:http://www.jibbering.com/faq/index.html>.
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.

SAM

Dec 28, 2008, 1:56:01 PM
On 12/28/08 12:55 AM, Trevor Lawrence wrote:

>
> While I appreciate the elegance of the RegExp solutions, there seemed to be
> some problems with defining what is white space, so I wondered if the simple
> functions below would do the job just as well

They probably find and remove only the plain space character: ' '
and that doesn't solve the problem of identifying the other spaces a
"trim" function has to handle (what should be cut off?).

function trimWhiteSpace(strg) {
return strg.replace(/^ +| +$/g, '');
}

document.write('['+trimWhiteSpace(" Hello world! ")+']');

--> [Hello world!]

function lTrimWS(strg) { return strg.replace(/^ +/, ''); }

function rTrimWS(strg) { return strg.replace(/ +$/, ''); }


If we refer to the PHP trim function (1), we would have to strip the following
invisible characters from the beginning and end:
* " " (ASCII 32 (0x20)), an ordinary space.
* "\t" (ASCII 9 (0x09)), a tab.
* "\n" (ASCII 10 (0x0A)), a new line (line feed).
* "\r" (ASCII 13 (0x0D)), a carriage return.
* "\0" (ASCII 0 (0x00)), the NUL-byte.
* "\x0B" (ASCII 11 (0x0B)), a vertical tab.
from which the non-breaking space is absent.

That is roughly what a regular expression (2) is supposed to match with \s,
which would have to be equivalent to:

Gecko :
[\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]
(where \u00a0 is the unbreakable space)

Microsoft :
[\f\n\r\t\v] (and they forget ' ' !)


So we could have :

function trimer( strg, extendedBlanks ) {
var reg = '[\f\n\r\t\v ' + ( extendedBlanks? '\xa0' : '' ) + ']';
reg = new RegExp( '^' + reg + '+|' + reg + '+$', 'g' );
return strg.replace(reg,'');
}

And with the string :
var strng = ' \n\t \n \xa0 hello \xa0 \n \n';
We can get :
document.write('['+ trimer(strng) +']'); // [ hello ]
document.write('['+ trimer(strng,1) +']'); // [hello]

Tested : FF.3, Safari.3, Opera.9, iCab.4
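
For comparison, the PHP default list quoted above can be mirrored with an explicit character class. This is only a sketch of that idea: `PHP_TRIM_CLASS`, `phpTrimPattern` and `phpLikeTrim` are hypothetical names, and the class contains just the six characters from the quoted list (so no form feed and no non-breaking space).

```javascript
// The six characters from the PHP list quoted above:
// space, tab, line feed, carriage return, NUL and vertical tab.
var PHP_TRIM_CLASS = '[ \\t\\n\\r\\x00\\x0B]';
var phpTrimPattern = new RegExp(
  '^' + PHP_TRIM_CLASS + '+|' + PHP_TRIM_CLASS + '+$', 'g');

function phpLikeTrim(s) {
  return String(s).replace(phpTrimPattern, '');
}

var a = phpLikeTrim('\0\t hello \x0B\r\n'); // every listed character is removed
var b = phpLikeTrim('\xA0hello\xA0');       // the non-breaking spaces stay
```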

(1)
<http://fr.php.net/manual/en/function.trim.php>
(2)
<http://msdn.microsoft.com/en-us/library/se61087k(VS.85).aspx>
<https://developer.mozilla.org/En/Core_JavaScript_1.5_Reference:Objects:RegExp>

--
sm

Dr J R Stockton

Dec 28, 2008, 1:45:46 PM
In comp.lang.javascript message <newscache$us7kck$gbg$1...@news.grapevine.c
om.au>, Sat, 27 Dec 2008 23:55:43, Trevor Lawrence <Trevor@L.?.invalid>
posted:

>
>While I appreciate the elegance of the RegExp solutions, there seemed to be
>some problems with defining what is white space, so I wondered if the simple
>functions below would do the job just as well
>
>function ltrim(str) {
>for (var i = 0; i < str.length; i++) {
> if(str.substr(i,1)!=" ")
> break;
>}
>return str.substr(i);
>}

No point in coding like that. One can just modify the RegExp routines
in the FAQ by replacing "\s" with " " and changing "whitespace" to
"space(s)" in the description.

AFAIK, all common browsers include in their "whitespace character = \s"
all the characters which normally need to be handled. The differing
characters can only appear if entered by a programmer, who can make
appropriate tests, or by an end user - and an end user who contrives to
enter an esoteric separator deserves whatever he gets.

The question about \xA0, however, is a good one that should be
considered for the FAQ entry.

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)

Garrett Smith

Feb 12, 2009, 3:57:53 AM
kangax wrote:
> SAM wrote:
>
> [...]
>
>>> Perhaps, FAQ should mention that RegExp whitespace character class
>>> does not conform to specification (see: 15.10.2.12) in some of the
>>> browsers and that `trim` (as it is right now in the FAQ) will fail to
>>> remove some of those characters.
>>

Good point. That should definitely be in the FAQ.

>> Is &nbsp; or &#160; or \xA0 a 'blank' (or white-space) character ?
>

Is this fixed in recent JScript in IE8?

Question: Why not:-

function trimString(s){
return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
}

?

> I was only talking about WhiteSpace (7.2) and LineTerminator (7.3)
> productions. Former one includes '\xA0' and so, yes, it should be
> matched by /\s/. I filed a bug with Chrome [1] but only after noticed
> that nightly webkit does not exhibit this (so it's only a matter of
> updating Chrome to the newer version)
>
> [...]
>
> [1] http://code.google.com/p/chromium/issues/detail?id=5206

http://code.google.com/p/chromium/issues/detail?id=5206#c6
| This is fixed in V8 bleeding edge, so it's on its way to Chromium.

Good post. I missed this one.

Garrett


--
comp.lang.javascript FAQ <URL: http://jibbering.com/faq/ >

Matthias Watermann

Feb 12, 2009, 8:26:14 AM
On Thu, 12 Feb 2009 00:57:53 -0800, Garrett Smith wrote:

> Question: Why not:-
>
> function trimString(s){
> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
> }

Why would you consider the pipe char ("|") as whitespace?
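
The point can be checked directly: inside a character class the | is an ordinary character, not alternation. A short sketch, with hypothetical function names:

```javascript
// Inside [...] the | is a literal vertical bar, so it gets trimmed too.
function trimWithPipe(s) {
  return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
}

// The same class without the stray |.
function trimWithoutPipe(s) {
  return s.replace(/^[\s\xA0]+|[\s\xA0]+$/g, '');
}

var input = ' |a|b| ';
var withPipe = trimWithPipe(input);       // 'a|b'   (outer bars removed as well)
var withoutPipe = trimWithoutPipe(input); // '|a|b|' (only the spaces removed)
```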


--
Matthias
/"\
\ / ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
X - AGAINST M$ ATTACHMENTS
/ \

kangax

Feb 12, 2009, 9:26:50 AM
Garrett Smith wrote:
[...]

> Is this fixed in recent JScript in IE8?

I'll check this later, will let you know.

>
> Question: Why not:-
>
> function trimString(s){
> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
> }

That would work, sure. Incidentally, Google Groups, which was recently
criticized here, actually uses exactly such a `trim` [1]

...
S.string.trim=function(a){
return a.replace(/^[\s\xa0]+|[\s\xa0]+$/g,"")
};
...

[1]
http://groups.google.com/groups/static/release/g2_common-2cddf002493e87d5abe24a2765ad49a6.js


--
kangax

Peter May

Feb 12, 2009, 9:52:30 AM
Garrett Smith wrote:
[...]

> function trimString(s){
> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
> }

Have a look at this interesting article about trim:
http://blog.stevenlevithan.com/archives/faster-trim-javascript
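
One variation in that spirit is to strip the leading whitespace with a regular expression and then walk backwards from the end with a loop. The sketch below is an illustration of that idea only and is not presented as the article's code; `trimLoop` is a hypothetical name, and whether it is any faster depends on the engine and on the input, so it would need measuring.

```javascript
function trimLoop(s) {
  var ws = /\s/;
  var str = String(s).replace(/^\s+/, ''); // leading run via regexp
  var i = str.length;
  // Step back while the character just before index i is whitespace.
  while (i > 0 && ws.test(str.charAt(i - 1))) {
    i--;
  }
  return str.slice(0, i);
}
```

It still relies on \s, so it inherits whatever set of characters the engine's \s matches.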

--
Peter

Lasse Reichstein Nielsen

Feb 12, 2009, 5:07:11 PM
kangax <kan...@gmail.com> writes:

> I was only talking about WhiteSpace (7.2) and LineTerminator (7.3)
> productions. Former one includes '\xA0' and so, yes, it should be
> matched by /\s/. I filed a bug with Chrome [1] but only after noticed
> that nightly webkit does not exhibit this (so it's only a matter of
> updating Chrome to the newer version)

RegExp actually doesn't come with WebKit, which Chrome and Safari
share, but with the underlying JavaScript implementation.
Chrome uses V8 and Safari uses SquirrelFish (Extreme in nightlies).

The released versions both use the same adaptation of the PCRE library
for regexps, but it's being phased out. SFX is implementing WREC
and V8 has just released a new regexp engine as well. It's only
available in the Chrome developer channel releases so far.

So, you are right, the newest "release" of Chrome (but not the
newest stable release!) does implement \s correctly, but it's
completely unrelated to the version of WebKit being used.

/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

Dr J R Stockton

Feb 12, 2009, 10:43:28 AM
In comp.lang.javascript message <gn0o99$dtm$1...@news.motzarella.org>, Thu,
12 Feb 2009 00:57:53, Garrett Smith <dhtmlk...@gmail.com> posted:

>
>Good point. That should definitely be in the FAQ.
>
>>> Is &nbsp; or &#160; or \xA0 a 'blank' (or white-space) character ?
>
>Is this fixed in recent JScript in IE8?

The FAQ should not presume that \xA0 should be considered, by the
programmer, as RegExp whitespace. That is application-dependent. For
example, a paragraph-packer should treat it largely as if it were a
letter (if a long word MUST be broken, it might be best to do it before
a \xA0, for visibility). In that case, it may be necessary to use not
\s but a more detailed expression.

It seems likely that [ \f\n\r\t\v] will be equivalent to or a subset
of \s , at least in anything which implements \f \n \r \t \v .
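
That subset claim is easy to check in any given engine. A minimal sketch, with hypothetical variable names; the vertical tab is written as '\x0B' rather than '\v' on the assumption that not every engine understands the \v escape in a string literal:

```javascript
// The six characters of the explicit class [ \f\n\r\t\v].
var asciiWhitespace = [' ', '\f', '\n', '\r', '\t', '\x0B'];

// Collect the code of any character that this engine's \s fails to match.
var missing = [];
for (var i = 0; i < asciiWhitespace.length; i++) {
  if (!/\s/.test(asciiWhitespace[i])) {
    missing.push(asciiWhitespace[i].charCodeAt(0));
  }
}
// An empty array means the explicit class is a subset of \s here.
```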

Browser RegExps ought to be in accordance with ISO/IEC 16262 Sec 7.2 for
*Source* *Text*, since Sec 15.10 refers to that for "whitespace".

Richard Cornford

Feb 12, 2009, 6:59:49 PM
kangax wrote:
> Garrett Smith wrote:
> [...]
>> Is this fixed in recent JScript in IE8?
>
> I'll check this later, will let you know.

If that question was whether \s in a regular expression matches '\u00A0'
in IE 8 then it appears that the answer is no.

>> Question: Why not:-
>>
>> function trimString(s){
>> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
>> }
>
> That would work, sure.

<snip>

By "work" I assume you mean that it corrects the fact that some
javascript engines do not match '\u00A0' (the non-breaking space
character) when \s is used in a regular expression. But is this an
attempt to create methods that conform to the ECMA (3rd Ed.)
specification, and thus are consistent across platforms, or is it a
desire to arbitrarily include the non-breaking space in the set of
matched characters for a trim function?

The former seems the more reasonable goal, but in that case this trim
function still falls short as where '\u00A0' is not matched the odds are
extremely good that '\u2003' (EM space), to name but one, is not matched
either. So the above function will still produce inconsistent results
between javascript engines.

The ECMA spec wants \s to match javascript's white space characters and
its line terminator characters, but the definition of whitespace includes
all of Unicode's Zs group, and JScript, for example, does not match the
majority of those.

Try this out:-

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">


<html>
<head>
<title></title>
</head>
<body>

<pre>
<script type="text/javascript">

var WhiteSpace = [
{
cp:"9", codePoint:"0x0009", character :"\u0009",
name:"<control>[ASCII Tab]", group:"Cc"
},
{
cp:"11", codePoint:"0x000B", character :"\u000B",
name:"<control>[ASCII Vertical Tab]", group:"Cc"
},
{
cp:"12", codePoint:"0x000C", character :"\u000C",
name:"<control>[ASCII Form Feed]", group:"Cc"
},
{
cp:"32", codePoint:"0x0020", character :"\u0020",
name:"SPACE", group:"Zs"
},
{
cp:"160", codePoint:"0x00A0", character :"\u00A0",
name:"NO-BREAK SPACE", group:"Zs"
},
{
cp:"5760", codePoint:"0x1680", character :"\u1680",
name:"OGHAM SPACE MARK", group:"Zs"
},
{
cp:"6158", codePoint:"0x180E", character :"\u180E",
name:"MONGOLIAN VOWEL SEPARATOR", group:"Zs"
},
{
cp:"8192", codePoint:"0x2000", character :"\u2000",
name:"EN QUAD", group:"Zs"
},
{
cp:"8193", codePoint:"0x2001", character :"\u2001",
name:"EM QUAD", group:"Zs"
},
{
cp:"8194", codePoint:"0x2002", character :"\u2002",
name:"EN SPACE", group:"Zs"
},
{
cp:"8195", codePoint:"0x2003", character :"\u2003",
name:"EM SPACE", group:"Zs"
},
{
cp:"8196", codePoint:"0x2004", character :"\u2004",
name:"THREE-PER-EM SPACE", group:"Zs"
},
{
cp:"8197", codePoint:"0x2005", character :"\u2005",
name:"FOUR-PER-EM SPACE", group:"Zs"
},
{
cp:"8198", codePoint:"0x2006", character :"\u2006",
name:"SIX-PER-EM SPACE", group:"Zs"
},
{
cp:"8199", codePoint:"0x2007", character :"\u2007",
name:"FIGURE SPACE", group:"Zs"
},
{
cp:"8200", codePoint:"0x2008", character :"\u2008",
name:"PUNCTUATION SPACE", group:"Zs"
},
{
cp:"8201", codePoint:"0x2009", character :"\u2009",
name:"THIN SPACE", group:"Zs"
},
{
cp:"8202", codePoint:"0x200A", character :"\u200A",
name:"HAIR SPACE", group:"Zs"
},
{
cp:"8239", codePoint:"0x202F", character :"\u202F",
name:"NARROW NO-BREAK SPACE", group:"Zs"
},
{
cp:"8287", codePoint:"0x205F", character :"\u205F",
name:"MEDIUM MATHEMATICAL SPACE", group:"Zs"
},
{
cp:"12288",codePoint:"0x3000", character :"\u3000",
name:"IDEOGRAPHIC SPACE", group:"Zs"
}
];

var LineTerminator = [
{
cp:"8232", codePoint:"0x2028", character :"\u2028",
name:"LINE SEPARATOR", group:"Zl"
},
{
cp:"8233", codePoint:"0x2029", character :"\u2029",
name:"PARAGRAPH SEPARATOR", group:"Zp"
},
{
cp:"10", codePoint:"0x000A", character :"\u000A",
name:"<control>[ASCII Line Feed]", group:"Cc"
},
{
cp:"13", codePoint:"0x000D", character :"\u000D",
name:"<control>[ASCII Carriage Return]", group:"Cc"
}
];

var NonWhiteSpace = [
{
cp:"8203", codePoint:"0x200B", character :"\u200B",
name:"ZERO WIDTH SPACE (will be whitespace in ES 3.1)",
group:"Cf"
},
{
cp:"8204", codePoint:"0x200C", character :"\u200C",
name:"ZERO WIDTH NON-JOINER", group:"Cf"
},
{
cp:"8205", codePoint:"0x200D", character :"\u200D",
name:"ZERO WIDTH JOINER", group:"Cf"
},
{
cp:"65279", codePoint:"0xFEFF", character :"\uFEFF",
name:"ZERO WIDTH NO-BREAK SPACE (will be whitespace in ES 3.1)",
group:"Cf"
}
];

var testAr = [WhiteSpace, LineTerminator];

var testRX = /\s/g;
function testWS(chObj){
var char, stripped, matches;
if(!(char = String.fromCharCode(+chObj.cp))){
return 'Anomaly: Non-empty string is false.\n';
}


/* First verify that the String.fromCharCode result matches the
Unicode escape sequence based string literal for the character.
*/
if(char != chObj.character){
return 'Anomaly in String.fromCharCode('+chObj.cp+')\n';
}
if((stripped = char.replace(testRX, '')) == chObj.character){
matches = false;
}else{
matches = true;
}
return (
matches+
'\t"'+stripped+'"\tcp = '+chObj.codePoint+'\tname = '+
chObj.name+' group = '+chObj.group+'\n'
);
}

var c, cp, list, ctr;
document.write('Should be matched by \\s\n');
for(c = 0;c < testAr.length;++c){
list = testAr[c];
for(cp = 0;cp < list.length;++cp){
ctr = list[cp];
document.write(testWS(ctr)
.replace('true', 'GOOD')
.replace('false', 'FAIL'));

}
}

document.write('\n\nMust not be matched by \\s (in ES 3)\n');
for(cp = 0;cp < NonWhiteSpace.length;++cp){
ctr = NonWhiteSpace[cp];
document.write(testWS(ctr)
.replace('true', 'FAIL')
.replace('false', 'GOOD'));

}
</script>
</pre>
</body>
</html>

Richard.

kangax

Feb 12, 2009, 10:28:10 PM
Richard Cornford wrote:
> kangax wrote:
>> Garrett Smith wrote:
>> [...]
>>> Is this fixed in recent JScript in IE8?
>>
>> I'll check this later, will let you know.
>
> If that question was whether \s in a regular expression matches '\u00A0'
> in IE 8 then it appears that the answer is no.

Oh well.

>
>>> Question: Why not:-
>>>
>>> function trimString(s){
>>> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
>>> }

I missed this first time, but `|` is not needed here (inside the
character class), is it?

>>
>> That would work, sure.
> <snip>
>
> By "work" I assume you mean that it corrects the fact that some
> javascript engines do not match '\u00A0' (the non-breaking space
> character) when \s is used in a regular expression. But is this an
> attempt to create methods that conform to the ECMA (3rd Ed.)
> specification, and thus are consistent across platforms, or is it a
> desire to arbitrarily include the non-breaking space in the set of
> matched characters for a trim function.

As I understand it, there are practical implications to regex whitespace
character class not matching \xA0 (NBSP) characters in context of `trim`
function.

A simple example demonstrating the issue would be:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
<title></title>
</head>
<body>

<script type="text/javascript">
(function(){

function trim(s) {
return s.replace(/^\s+/, '')
.replace(/\s+$/, '');
}

function trim_xA0(s) {
return s.replace(/^[\s\xA0]+/, '')
.replace(/[\s\xA0]+$/, '');
}

var el = document.createElement('div');
el.innerHTML = '&nbsp;foo&nbsp;';

document.write('trim: ' +
trim(el.firstChild.nodeValue).length + '<br>');

document.write('trim_xA0: ' +
trim_xA0(el.firstChild.nodeValue).length);

})();
</script>
</body>
</html>

In Firefox and Opera, the "simple" trim removes "nbsp" characters. IE and
Safari need the addition of \xA0. Considering that \xA0 is part of the
WhiteSpace production (as per ES3) and that such major browsers as IE
and Safari have faulty implementations, it would seem to be a good idea
to use the `\xA0`-patched `trim`.

>
> The former seems the more reasonable goal, but in that case this trim
> function still falls short as where '\u00A0' is not matched the odds are
> extremely good that '\u2003' (EM space), to name but one, is not matched
> either. So the above function will still produce inconsistent results
> between javascript engines.
>
> The ECMA spec wants \s to match javascript's whit space characters and
> it line terminator characters, but the definition of whitespace includes
> all of Unicode's Zs group, and JScript, for example, does not match the
> majority of those.

True. I missed this part of the specs (about Unicode whitespace).

>
> Try this out:-
>

[snip test]

Interesting. There are quite a few failures in all browsers I could test.

--
kangax

kangax

Feb 12, 2009, 10:34:04 PM
Lasse Reichstein Nielsen wrote:
[...]

> RegExp actually doesn't come with WebKit, which Chrome and Safari
> are sharing, but with the underlying Javascript implementation.
> Chrome uses V8 and Safari uses Squirrelfish (Extreme in nighlies).

Ah, so WebKit is a rendering engine and has nothing to do with ES
implementation. Thanks, I wasn't aware of this : )

[...]

--
kangax

kangax

Feb 12, 2009, 11:01:20 PM
kangax wrote:
> Richard Cornford wrote:
[...]

> [snip test]
>
> Interesting. There's quite a bit of failures in all browsers I could test.
>

Actually, I just downloaded a nightly WebKit (rev. 40931) and it passes all
tests (!)

--
kangax

Dr J R Stockton

Feb 13, 2009, 12:44:37 PM
In comp.lang.javascript message <iYOdnUKRy682rAnUnZ2dnUVZ_vjinZ2d@gigane
ws.com>, Thu, 12 Feb 2009 09:26:50, kangax <kan...@gmail.com> posted:

>Garrett Smith wrote:
>[...]
>> Is this fixed in recent JScript in IE8?
>
>I'll check this later, will let you know.
>
>> Question: Why not:-
>> function trimString(s){
>> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
>> }
>
>That would work, sure. Accidentally, google groups, which was recently
>criticized here, actually uses exactly such `strim` [1]
>
>...
>S.string.trim=function(a){
> return a.replace(/^[\s\xa0]+|[\s\xa0]+$/g,"")
>};

That is not "exactly such". The first also trims leading and trailing
vertical bars; the second should not.

kangax

Feb 13, 2009, 2:55:46 PM
Dr J R Stockton wrote:

[snip]

> That is not "exactly such". The first also trims leading and trailing
> vertical bars; the second should not.
>

Yep, I noticed it later on. See my follow-up to Richard's post.

--
kangax

Richard Cornford

Feb 13, 2009, 5:19:50 PM
kangax wrote:
> Richard Cornford wrote:
>> kangax wrote:
>>> Garrett Smith wrote:
<snip>

>>>> function trimString(s){
>>>> return s.replace(/^[\s|\xA0]+|[\s|\xA0]+$/g, '');
>>>> }
>
> I missed this first time, but `|` is not needed here
> (inside the character class), is it?

No. But my interest was mostly with the justification for including
\u00A0 but not, say, \u202F.

>>> That would work, sure.
>> <snip>
>>
>> By "work" I assume you mean that it corrects the fact that
>> some javascript engines do not match '\u00A0' (the
>> non-breaking space character) when \s is used in a regular
>> expression. But is this an attempt to create methods that
>> conform to the ECMA (3rd Ed.) specification, and thus are
>> consistent across platforms, or is it a desire to arbitrarily
>> include the non-breaking space in the set of matched
>> characters for a trim function.
>
> As I understand it, there are practical implications to
> regex whitespace character class not matching \xA0 (NBSP)
> characters in context of `trim` function.

But there must also be practical implications in its not matching
\u202F. There may be a diminishing likelihood of any given - trim -
implementation encountering particular whitespace characters as they
become increasingly obscure, but we are already well into the obscure
when handling the non-breaking space, as users are going to be pretty hard
pressed to enter that character into, say, an <INPUT type="text">
element. (That is, it is likely a deliberate act on the part of web
developers to include \u00A0 in a string, and so it is maybe only the
developers who chose that design who will have to deal with the issue.)

<snip>
> [snip test]
>
> Interesting. There's quite a bit of failures in all browsers I
> could test.

Yes, so a general - trim - that is going to be consistent across
browsers is going to have to either do quite a bit of work to compensate
for the inconsistencies in \s or abandon the use of \s in favour of a
predictable explicit character class definition.
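
As a sketch of the second option, the character class can be written out from the WhiteSpace and LineTerminator tables in the test page earlier in this thread, so that the result no longer depends on the engine's \s. The names `ES3_WS`, `es3TrimPattern` and `trimExplicit` are hypothetical, and the list reflects those tables as posted rather than any later revision of the specification or of Unicode.

```javascript
// WhiteSpace and LineTerminator code points as tabulated in this thread:
// TAB, LF, VT, FF, CR, SPACE, NBSP, the Zs characters, and U+2028/U+2029.
// U+200B and U+FEFF are left out, as they are not whitespace in ES3.
var ES3_WS =
  '[\\x09\\x0A\\x0B\\x0C\\x0D\\x20\\xA0\\u1680\\u180E' +
  '\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]';
var es3TrimPattern = new RegExp('^' + ES3_WS + '+|' + ES3_WS + '+$', 'g');

function trimExplicit(s) {
  return String(s).replace(es3TrimPattern, '');
}

var trimmed = trimExplicit('\u2003\u00A0 foo\u202F\n'); // 'foo'
var kept = trimExplicit('\u200Bfoo');                   // unchanged
```

Because the class is explicit, changing what counts as whitespace for a particular application is a matter of editing one string.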

Richard.