var escapeJSONChar=function ()
{
var escapeChars = ["\b","\t","\n","\f","\r"];
return function(c){
// Need to do these first as their ascii values are > 32 (34 & 92)
if(c == "\"" || c == "\\")
{
return "\\"+c;
}
//Otherwise it doesn't need escaping
if(c.charCodeAt(0)>=32)
{
return c;
}
// Otherwise it is has a code < 32 and may need escaping.
for(var i=0;i<escapeChars.length;i++)
{
if(c==escapeChars[i])
{
return "\\" + c;
}
}
// it was a character from 0-31 that wasn't one of the escape
chars
return c;
};
}();
this function can not return correct quoted char .
this line : return "\\" + c; is wrong , originalchar returnd
It might be better with the 'chained if' we had in the earlier version
of that function. e.g.
if (c == "\b") return "\\b";
else if (c == "\f") return "\\f";
else if (c == "\n") return "\\n";
else if (c == "\r") return "\\r";
else if (c == "\t") return "\\t";
It is most likely faster as there is no need for loop local variable
initialization and loop iterator testing, etc.
From looking, the current version does not hex encode control chars
as required by the JSON spec (bare control chars are not allowed).
This is based on an older version of the function (without the bug)
but I have not tested it yet.
escapeJSONChar =
function escapeJSONChar(c)
{
var code = c.charCodeAt(0);
if (code >= 32) return c;
else if (c == "\"" || c == "\\") return "\\" + c;
else if (c == "\b") return "\\b";
else if (c == "\f") return "\\f";
else if (c == "\n") return "\\n";
else if (c == "\r") return "\\r";
else if (c == "\t") return "\\t";
var hex = code.toString(16);
if (hex.length == 1) return "\\u000" + hex;
else return "\\u00" + hex;
};
The current jabsorb coding style doesn't allow for concise expression of
chained if, so this would have to be modified before we can make a patch.
Let us know if this works for you and I'll translate it (personally I
prefer a more concise coding style). grumble grumble. :)
lxb...@gmail.com wrote:
> var escapeJSONChar=function ()
> {
> var escapeChars = ["\b","\t","\n","\f","\r"];
> return function(c){
> // Need to do these first as their ascii values are > 32 (34 & 92)
> if(c == "\"" || c == "\\")
> {
> return "\\"+c;
> }
> //Otherwise it doesn't need escaping
> if(c.charCodeAt(0)>=32)
> {
> return c;
> }
> // Otherwise it is has a code < 32 and may need escaping.
> for(var i=0;i<escapeChars.length;i++)
> {
> if(c==escapeChars[i])
> {
> return "\\" + c;
> }
> }
> // it was a character from 0-31 that wasn't one of the escape
> chars
> return c;
> };
> }();
> this function can not return correct quoted char .
> this line : return "\\" + c; is wrong , originalchar returnd
Will and I went round and round on this function early on...
The reason for this function is of course to convert a native Javascript
String into a json version of the String for transport.
That's why we took out the hex conversion. It's not strictly needed as the
String is still legal json (and a lot smaller and more quickly processed) if
we only escape characters that *must* be escaped. (see the rail diagram for
string at http://json.org)
Otherwise doing this hex conversion wastes extra CPU and makes the escaped
string a lot larger.
All we need to do is quickly find characters that *must* be escaped for
proper json conversion and transfer.
As far as the supposed *bug* found. I don't see how that is a bug. Can you
explain further?
The purpose of that code is to escape a double quote char and a backslash
chars if found.
If this is a bug, please explain how you found it, and how it's hurting you.
As far as the chained if goes: How about using a switch statement
instead... wouldn't that be faster than both the chained if and the array
scan?
On Wed, Nov 26, 2008 at 11:39 PM, Michael Clark <mich...@metaparadigm.com>wrote:
> It might be better with the 'chained if' we had in the earlier version
> of that function. e.g.
> if (c == "\b") return "\\b";
> else if (c == "\f") return "\\f";
> else if (c == "\n") return "\\n";
> else if (c == "\r") return "\\r";
> else if (c == "\t") return "\\t";
> It is most likely faster as there is no need for loop local variable
> initialization and loop iterator testing, etc.
> From looking, the current version does not hex encode control chars
> as required by the JSON spec (bare control chars are not allowed).
> This is based on an older version of the function (without the bug)
> but I have not tested it yet.
> escapeJSONChar =
> function escapeJSONChar(c)
> {
> var code = c.charCodeAt(0);
> if (code >= 32) return c;
> else if (c == "\"" || c == "\\") return "\\" + c;
> else if (c == "\b") return "\\b";
> else if (c == "\f") return "\\f";
> else if (c == "\n") return "\\n";
> else if (c == "\r") return "\\r";
> else if (c == "\t") return "\\t";
> var hex = code.toString(16);
> if (hex.length == 1) return "\\u000" + hex;
> else return "\\u00" + hex;
> };
> The current jabsorb coding style doesn't allow for concise expression of
> chained if, so this would have to be modified before we can make a patch.
> Let us know if this works for you and I'll translate it (personally I
> prefer a more concise coding style). grumble grumble. :)
> lxb...@gmail.com wrote:
> > var escapeJSONChar=function ()
> > {
> > var escapeChars = ["\b","\t","\n","\f","\r"];
> > return function(c){
> > // Need to do these first as their ascii values are > 32 (34 & 92)
> > if(c == "\"" || c == "\\")
> > {
> > return "\\"+c;
> > }
> > //Otherwise it doesn't need escaping
> > if(c.charCodeAt(0)>=32)
> > {
> > return c;
> > }
> > // Otherwise it is has a code < 32 and may need escaping.
> > for(var i=0;i<escapeChars.length;i++)
> > {
> > if(c==escapeChars[i])
> > {
> > return "\\" + c;
> > }
> > }
> > // it was a character from 0-31 that wasn't one of the escape
> > chars
> > return c;
> > };
> > }();
> > this function can not return correct quoted char .
> > this line : return "\\" + c; is wrong , originalchar returnd
After I wrote this, I looked into string escaping a little more.
Technically, we are supposed to escape "control characters".
What is exactly meant by a "control character" isn't rigidly defined in the
rail diagram, but the reference implementation at:
> Will and I went round and round on this function early on...
> The reason for this function is of course to convert a native Javascript
> String into a json version of the String for transport.
> That's why we took out the hex conversion. It's not strictly needed as the
> String is still legal json (and a lot smaller and more quickly processed) if
> we only escape characters that *must* be escaped. (see the rail diagram for
> string at http://json.org)
> Otherwise doing this hex conversion wastes extra CPU and makes the escaped
> string a lot larger.
> All we need to do is quickly find characters that *must* be escaped for
> proper json conversion and transfer.
> As far as the supposed *bug* found. I don't see how that is a bug. Can
> you explain further?
> The purpose of that code is to escape a double quote char and a backslash
> chars if found.
> If this is a bug, please explain how you found it, and how it's hurting
> you.
> As far as the chained if goes: How about using a switch statement
> instead... wouldn't that be faster than both the chained if and the array
> scan?
> On Wed, Nov 26, 2008 at 11:39 PM, Michael Clark <mich...@metaparadigm.com>wrote:
>> Good spotting!
>> It might be better with the 'chained if' we had in the earlier version
>> of that function. e.g.
>> if (c == "\b") return "\\b";
>> else if (c == "\f") return "\\f";
>> else if (c == "\n") return "\\n";
>> else if (c == "\r") return "\\r";
>> else if (c == "\t") return "\\t";
>> It is most likely faster as there is no need for loop local variable
>> initialization and loop iterator testing, etc.
>> From looking, the current version does not hex encode control chars
>> as required by the JSON spec (bare control chars are not allowed).
>> This is based on an older version of the function (without the bug)
>> but I have not tested it yet.
>> escapeJSONChar =
>> function escapeJSONChar(c)
>> {
>> var code = c.charCodeAt(0);
>> if (code >= 32) return c;
>> else if (c == "\"" || c == "\\") return "\\" + c;
>> else if (c == "\b") return "\\b";
>> else if (c == "\f") return "\\f";
>> else if (c == "\n") return "\\n";
>> else if (c == "\r") return "\\r";
>> else if (c == "\t") return "\\t";
>> var hex = code.toString(16);
>> if (hex.length == 1) return "\\u000" + hex;
>> else return "\\u00" + hex;
>> };
>> The current jabsorb coding style doesn't allow for concise expression of
>> chained if, so this would have to be modified before we can make a patch.
>> Let us know if this works for you and I'll translate it (personally I
>> prefer a more concise coding style). grumble grumble. :)
>> lxb...@gmail.com wrote:
>> > var escapeJSONChar=function ()
>> > {
>> > var escapeChars = ["\b","\t","\n","\f","\r"];
>> > return function(c){
>> > // Need to do these first as their ascii values are > 32 (34 & 92)
>> > if(c == "\"" || c == "\\")
>> > {
>> > return "\\"+c;
>> > }
>> > //Otherwise it doesn't need escaping
>> > if(c.charCodeAt(0)>=32)
>> > {
>> > return c;
>> > }
>> > // Otherwise it is has a code < 32 and may need escaping.
>> > for(var i=0;i<escapeChars.length;i++)
>> > {
>> > if(c==escapeChars[i])
>> > {
>> > return "\\" + c;
>> > }
>> > }
>> > // it was a character from 0-31 that wasn't one of the escape
>> > chars
>> > return c;
>> > };
>> > }();
>> > this function can not return correct quoted char .
>> > this line : return "\\" + c; is wrong , originalchar returnd
The reason it has 2 bugs is because the routine needs to return
an escaped version of the ctrl char. e.g.
if (c == "\b") return "\\b";
Whereas the current array version just returns the original
ctrl char unmodified without any escaping (as the original
poster noted). You would need a second array with the unescaped
versions of each of the control codes to make it work.
Although for ultimate performance, I have been thinking about it
and instead of the "chained if" I had originally, I propose we
modify your array approach into a 32 element lookup table. It adds
a small amount of bulk to the client - but is the ultimate in
performance. It also fits within the coding convention in a small
amount of space.
};
Arthur Blake wrote:
> After I wrote this, I looked into string escaping a little more.
> Technically, we are supposed to escape "control characters".
> What is exactly meant by a "control character" isn't rigidly defined > in the rail diagram, but the reference implementation at:
> defines it. Speaking of the reference implementation- It's pretty > well written. I've long thought we should convert jabsorb to use this > at some point.
> On Thu, Nov 27, 2008 at 9:13 AM, Arthur Blake <arthur.bl...@gmail.com > <mailto:arthur.bl...@gmail.com>> wrote:
> Will and I went round and round on this function early on...
> The reason for this function is of course to convert a native
> Javascript String into a json version of the String for transport.
> That's why we took out the hex conversion. It's not strictly
> needed as the String is still legal json (and a lot smaller and
> more quickly processed) if we only escape characters that *must*
> be escaped. (see the rail diagram for string at http://json.org)
> Otherwise doing this hex conversion wastes extra CPU and makes the
> escaped string a lot larger.
> All we need to do is quickly find characters that *must* be
> escaped for proper json conversion and transfer.
> As far as the supposed *bug* found. I don't see how that is a
> bug. Can you explain further?
> The purpose of that code is to escape a double quote char and a
> backslash chars if found.
> If this is a bug, please explain how you found it, and how it's
> hurting you.
> As far as the chained if goes: How about using a switch statement
> instead... wouldn't that be faster than both the chained if and
> the array scan?
> On Wed, Nov 26, 2008 at 11:39 PM, Michael Clark
> <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>> wrote:
> Good spotting!
> It might be better with the 'chained if' we had in the earlier
> version
> of that function. e.g.
> if (c == "\b") return "\\b";
> else if (c == "\f") return "\\f";
> else if (c == "\n") return "\\n";
> else if (c == "\r") return "\\r";
> else if (c == "\t") return "\\t";
> It is most likely faster as there is no need for loop local
> variable
> initialization and loop iterator testing, etc.
> From looking, the current version does not hex encode control
> chars
> as required by the JSON spec (bare control chars are not allowed).
> This is based on an older version of the function (without the
> bug)
> but I have not tested it yet.
> escapeJSONChar =
> function escapeJSONChar(c)
> {
> var code = c.charCodeAt(0);
> if (code >= 32) return c;
> else if (c == "\"" || c == "\\") return "\\" + c;
> else if (c == "\b") return "\\b";
> else if (c == "\f") return "\\f";
> else if (c == "\n") return "\\n";
> else if (c == "\r") return "\\r";
> else if (c == "\t") return "\\t";
> var hex = code.toString(16);
> if (hex.length == 1) return "\\u000" + hex;
> else return "\\u00" + hex;
> };
> The current jabsorb coding style doesn't allow for concise
> expression of
> chained if, so this would have to be modified before we can
> make a patch.
> Let us know if this works for you and I'll translate it
> (personally I
> prefer a more concise coding style). grumble grumble. :)
> lxb...@gmail.com <mailto:lxb...@gmail.com> wrote:
> > var escapeJSONChar=function ()
> > {
> > var escapeChars = ["\b","\t","\n","\f","\r"];
> > return function(c){
> > // Need to do these first as their ascii values are > 32
> (34 & 92)
> > if(c == "\"" || c == "\\")
> > {
> > return "\\"+c;
> > }
> > //Otherwise it doesn't need escaping
> > if(c.charCodeAt(0)>=32)
> > {
> > return c;
> > }
> > // Otherwise it is has a code < 32 and may need escaping.
> > for(var i=0;i<escapeChars.length;i++)
> > {
> > if(c==escapeChars[i])
> > {
> > return "\\" + c;
> > }
> > }
> > // it was a character from 0-31 that wasn't one of the
> escape
> > chars
> > return c;
> > };
> > }();
> > this function can not return correct quoted char .
> > this line : return "\\" + c; is wrong , originalchar returnd
OK, I see the bug now- thanks for explaining it to me.
(I was looking at the first instance of 'return "\\" + c;' near the top of
the function)
There may be even more escape characters to worry about..
witness the json2.js (Douglas Crockford's) version:
var cx = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u20 60-\u206f\ufeff\ufff0-\uffff]/g,
escapable =
/[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\ u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
gap,
indent,
meta = { // table of character substitutions
'\b': '\\b',
'\t': '\\t',
'\n': '\\n',
'\f': '\\f',
'\r': '\\r',
'"' : '\\"',
'\\': '\\\\'
},
rep;
function quote(string) {
// If the string contains no control characters, no quote characters, and no
// backslash characters, then we can safely slap some quotes around it.
// Otherwise we must also replace the offending characters with safe escape
// sequences.
escapable.lastIndex = 0;
return escapable.test(string) ?
'"' + string.replace(escapable, function (a) {
var c = meta[a];
return typeof c === 'string' ? c :
'\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
}) + '"' :
'"' + string + '"';
}
I don't know if it's faster or not- because it uses regular expressions...
On Thu, Nov 27, 2008 at 7:48 PM, Michael Clark <mich...@metaparadigm.com>wrote:
> The reason it has 2 bugs is because the routine needs to return
> an escaped version of the ctrl char. e.g.
> if (c == "\b") return "\\b";
> Whereas the current array version just returns the original
> ctrl char unmodified without any escaping (as the original
> poster noted). You would need a second array with the unescaped
> versions of each of the control codes to make it work.
> Although for ultimate performance, I have been thinking about it
> and instead of the "chained if" I had originally, I propose we
> modify your array approach into a 32 element lookup table. It adds
> a small amount of bulk to the client - but is the ultimate in
> performance. It also fits within the coding convention in a small
> amount of space.
> Arthur Blake wrote:
> > After I wrote this, I looked into string escaping a little more.
> > Technically, we are supposed to escape "control characters".
> > What is exactly meant by a "control character" isn't rigidly defined
> > in the rail diagram, but the reference implementation at:
> > defines it. Speaking of the reference implementation- It's pretty
> > well written. I've long thought we should convert jabsorb to use this
> > at some point.
> > On Thu, Nov 27, 2008 at 9:13 AM, Arthur Blake <arthur.bl...@gmail.com
> > <mailto:arthur.bl...@gmail.com>> wrote:
> > Will and I went round and round on this function early on...
> > The reason for this function is of course to convert a native
> > Javascript String into a json version of the String for transport.
> > That's why we took out the hex conversion. It's not strictly
> > needed as the String is still legal json (and a lot smaller and
> > more quickly processed) if we only escape characters that *must*
> > be escaped. (see the rail diagram for string at http://json.org)
> > Otherwise doing this hex conversion wastes extra CPU and makes the
> > escaped string a lot larger.
> > All we need to do is quickly find characters that *must* be
> > escaped for proper json conversion and transfer.
> > As far as the supposed *bug* found. I don't see how that is a
> > bug. Can you explain further?
> > The purpose of that code is to escape a double quote char and a
> > backslash chars if found.
> > If this is a bug, please explain how you found it, and how it's
> > hurting you.
> > As far as the chained if goes: How about using a switch statement
> > instead... wouldn't that be faster than both the chained if and
> > the array scan?
> > On Wed, Nov 26, 2008 at 11:39 PM, Michael Clark
> > <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>> wrote:
> > Good spotting!
> > It might be better with the 'chained if' we had in the earlier
> > version
> > of that function. e.g.
> > if (c == "\b") return "\\b";
> > else if (c == "\f") return "\\f";
> > else if (c == "\n") return "\\n";
> > else if (c == "\r") return "\\r";
> > else if (c == "\t") return "\\t";
> > It is most likely faster as there is no need for loop local
> > variable
> > initialization and loop iterator testing, etc.
> > From looking, the current version does not hex encode control
> > chars
> > as required by the JSON spec (bare control chars are not
> allowed).
> > This is based on an older version of the function (without the
> > bug)
> > but I have not tested it yet.
> > escapeJSONChar =
> > function escapeJSONChar(c)
> > {
> > var code = c.charCodeAt(0);
> > if (code >= 32) return c;
> > else if (c == "\"" || c == "\\") return "\\" + c;
> > else if (c == "\b") return "\\b";
> > else if (c == "\f") return "\\f";
> > else if (c == "\n") return "\\n";
> > else if (c == "\r") return "\\r";
> > else if (c == "\t") return "\\t";
> > var hex = code.toString(16);
> > if (hex.length == 1) return "\\u000" + hex;
> > else return "\\u00" + hex;
> > };
> > The current jabsorb coding style doesn't allow for concise
> > expression of
> > chained if, so this would have to be modified before we can
> > make a patch.
> > Let us know if this works for you and I'll translate it
> > (personally I
> > prefer a more concise coding style). grumble grumble. :)
Arthur Blake wrote:
> OK, I see the bug now- thanks for explaining it to me.
> (I was looking at the first instance of 'return "\\" + c;' near the > top of the function)
> There may be even more escape characters to worry about..
> witness the json2.js (Douglas Crockford's) version:
The table approach will obviously be the fastest but it will not escape these additional unicode chars (the required table would be too big).
The original version of the function from json-rpc-java also escaped unicode chars (in fact all above > 127).
I think the original bug you fixed was that it was not escaping \ and " from looking at the code (as it was called only for chars < 32 or > 127 in the escapeString function)
The escaping of unicode chars, although not strictly neccessary, was due to extensive browser testing, it was necessary so that it would work on pages that were not themselves UTF-8 and where the xmlhttprequest implementation didn't allow setting of the charset.
IIRC this was for webkit/konq - although this browser bug may be long since fixed.
The original test case I used was sending unicode chars from code on a page that itself has a different charset (I believe the unicode test pages is utf8 so we would have to temporarily change it to something else to run a test).
Feel free to bring in Douglas's version, i'm okay with that. We just will need to add in copyright and license (include the text of his MIT style license in the client) or get permission from him to redistribute under ASL 2.0.
> // If the string contains no control characters, no quote characters, and no
> // backslash characters, then we can safely slap some quotes around it.
> // Otherwise we must also replace the offending characters with safe escape
> // sequences.
> escapable.lastIndex = 0;
> return escapable.test(string) ?
> '"' + string.replace(escapable, function (a) {
> var c = meta[a];
> return typeof c === 'string' ? c :
> I don't know if it's faster or not- because it uses regular expressions...
> On Thu, Nov 27, 2008 at 7:48 PM, Michael Clark > <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>> wrote:
> The reason it has 2 bugs is because the routine needs to return
> an escaped version of the ctrl char. e.g.
> if (c == "\b") return "\\b";
> Whereas the current array version just returns the original
> ctrl char unmodified without any escaping (as the original
> poster noted). You would need a second array with the unescaped
> versions of each of the control codes to make it work.
> Although for ultimate performance, I have been thinking about it
> and instead of the "chained if" I had originally, I propose we
> modify your array approach into a 32 element lookup table. It adds
> a small amount of bulk to the client - but is the ultimate in
> performance. It also fits within the coding convention in a small
> amount of space.
> Arthur Blake wrote:
> > After I wrote this, I looked into string escaping a little more.
> > Technically, we are supposed to escape "control characters".
> > What is exactly meant by a "control character" isn't rigidly defined
> > in the rail diagram, but the reference implementation at:
> > defines it. Speaking of the reference implementation- It's pretty
> > well written. I've long thought we should convert jabsorb to
> use this
> > at some point.
> > On Thu, Nov 27, 2008 at 9:13 AM, Arthur Blake
> <arthur.bl...@gmail.com <mailto:arthur.bl...@gmail.com>
> > <mailto:arthur.bl...@gmail.com <mailto:arthur.bl...@gmail.com>>>
> wrote:
> > Will and I went round and round on this function early on...
> > The reason for this function is of course to convert a native
> > Javascript String into a json version of the String for
> transport.
> > That's why we took out the hex conversion. It's not strictly
> > needed as the String is still legal json (and a lot smaller and
> > more quickly processed) if we only escape characters that *must*
> > be escaped. (see the rail diagram for string at
> http://json.org)
> > Otherwise doing this hex conversion wastes extra CPU and
> makes the
> > escaped string a lot larger.
> > All we need to do is quickly find characters that *must* be
> > escaped for proper json conversion and transfer.
> > As far as the supposed *bug* found. I don't see how that is a
> > bug. Can you explain further?
> > The purpose of that code is to escape a double quote char and a
> > backslash chars if found.
> > If this is a bug, please explain how you found it, and how it's
> > hurting you.
> > As far as the chained if goes: How about using a switch
> statement
> > instead... wouldn't that be faster than both the chained if and
> > the array scan?
> > On Wed, Nov 26, 2008 at 11:39 PM, Michael Clark
> > <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>
> <mailto:mich...@metaparadigm.com
> <mailto:mich...@metaparadigm.com>>> wrote:
> > Good spotting!
> > It might be better with the 'chained if' we had in the
> earlier
> > version
> > of that function. e.g.
> > if (c == "\b") return "\\b";
> > else if (c == "\f") return "\\f";
> > else if (c == "\n") return "\\n";
> > else if (c == "\r") return "\\r";
> > else if (c == "\t") return "\\t";
> > It is most likely faster as there is no need for loop local
> > variable
> > initialization and loop iterator testing, etc.
> > From looking, the current version does not hex encode
> control
> > chars
> > as required by the JSON spec (bare control chars are not
> allowed).
> > This is based on an older version of the function
> (without the
> > bug)
> > but I have not tested it yet.
> > escapeJSONChar =
> > function escapeJSONChar(c)
> > {
> > var code = c.charCodeAt(0);
> > if (code >= 32) return c;
> > else if (c == "\"" || c == "\\") return "\\" + c;
> > else if (c == "\b") return "\\b";
> > else if (c == "\f") return "\\f";
> > else if (c == "\n") return "\\n";
> > else if (c == "\r") return "\\r";
> > else if (c == "\t") return "\\t";
> > var hex = code.toString(16);
> > if (hex.length == 1) return "\\u000" + hex;
> > else return "\\u00" + hex;
> > };
> > The current jabsorb coding style doesn't allow for concise
> > expression of
> > chained if, so this would have to be modified before we can
> > make a patch.
> > Let us know if this works for you and I'll translate it
> > (personally I
> > prefer a more concise coding style). grumble grumble. :)
Below is a minimal patch for 1.3 based on a lookup table-based proposal.
I have tested in firefox and run unit tests on it.
Douglas's version I see is not character by character so it may well
be faster. Where exactly is it from Arthur? Do you have a link to it?
(he has lots of links to his various JSON code on his site and I couldn't
find the particular one).
I see some of his work is public domain but some has a MIT style license.
I'll let you and William decide on how you want to handle it unless
you are both happy with the fix below.
This patch can at least be used in the interim for those who need a fix.
Thanks to the original poster for spotting the bug.
- return function(c){
// Need to do these first as their ascii values are > 32 (34 & 92)
if(c == "\"" || c == "\\")
{
return "\\"+c;
}
- //Otherwise it doesn't need escaping
- if(c.charCodeAt(0)>=32)
+ // Escape control characters 0 to 31
+ var code = c.charCodeAt(0);
+ if (code < 32)
{
- return c;
+ return ctrlChars[code];
}
- // Otherwise it is has a code < 32 and may need escaping.
- for(var i=0;i<escapeChars.length;i++)
- {
- if(c==escapeChars[i])
- {
- return "\\" + c;
- }
- }
- // it was a character from 0-31 that wasn't one of the escape chars
+ // Otherwise we don't escape (although we need to look into
+ // escaping certain unicode chars here)
return c;
};
}();
>> OK, I see the bug now- thanks for explaining it to me.
>> (I was looking at the first instance of 'return "\\" + c;' near the >> top of the function)
>> There may be even more escape characters to worry about..
>> witness the json2.js (Douglas Crockford's) version:
> The table approach will obviously be the fastest but it will not escape > these additional unicode chars (the required table would be too big).
> The original version of the function from json-rpc-java also escaped > unicode chars (in fact all above > 127).
> I think the original bug you fixed was that it was not escaping \ and " > from looking at the code (as it was called only for chars < 32 or > 127 > in the escapeString function)
> The escaping of unicode chars, although not strictly neccessary, was due > to extensive browser testing, it was necessary so that it would work on > pages that were not themselves UTF-8 and where the xmlhttprequest > implementation didn't allow setting of the charset.
> IIRC this was for webkit/konq - although this browser bug may be long > since fixed.
> The original test case I used was sending unicode chars from code on a > page that itself has a different charset (I believe the unicode test > pages is utf8 so we would have to temporarily change it to something > else to run a test).
> Feel free to bring in Douglas's version, i'm okay with that. We just > will need to add in copyright and license (include the text of his MIT > style license in the client) or get permission from him to redistribute > under ASL 2.0.
>> function quote(string) {
>> // If the string contains no control characters, no quote characters, and no
>> // backslash characters, then we can safely slap some quotes around it.
>> // Otherwise we must also replace the offending characters with safe escape
>> // sequences.
>> escapable.lastIndex = 0;
>> return escapable.test(string) ?
>> '"' + string.replace(escapable, function (a) {
>> var c = meta[a];
>> return typeof c === 'string' ? c :
>> I don't know if it's faster or not- because it uses regular expressions...
>> On Thu, Nov 27, 2008 at 7:48 PM, Michael Clark >> <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>> wrote:
>> The reason it has 2 bugs is because the routine needs to return
>> an escaped version of the ctrl char. e.g.
>> if (c == "\b") return "\\b";
>> Whereas the current array version just returns the original
>> ctrl char unmodified without any escaping (as the original
>> poster noted). You would need a second array with the unescaped
>> versions of each of the control codes to make it work.
>> Although for ultimate performance, I have been thinking about it
>> and instead of the "chained if" I had originally, I propose we
>> modify your array approach into a 32 element lookup table. It adds
>> a small amount of bulk to the client - but is the ultimate in
>> performance. It also fits within the coding convention in a small
>> amount of space.
>> Arthur Blake wrote:
>> > After I wrote this, I looked into string escaping a little more.
>> > Technically, we are supposed to escape "control characters".
>> > What is exactly meant by a "control character" isn't rigidly defined
>> > in the rail diagram, but the reference implementation at:
>> > defines it. Speaking of the reference implementation- It's pretty
>> > well written. I've long thought we should convert jabsorb to
>> use this
>> > at some point.
>> > On Thu, Nov 27, 2008 at 9:13 AM, Arthur Blake
>> <arthur.bl...@gmail.com <mailto:arthur.bl...@gmail.com>
>> > <mailto:arthur.bl...@gmail.com <mailto:arthur.bl...@gmail.com>>>
>> wrote:
>> > Will and I went round and round on this function early on...
>> > The reason for this function is of course to convert a native
>> > Javascript String into a json version of the String for
>> transport.
>> > That's why we took out the hex conversion. It's not strictly
>> > needed as the String is still legal json (and a lot smaller and
>> > more quickly processed) if we only escape characters that *must*
>> > be escaped. (see the rail diagram for string at
>> http://json.org)
>> > Otherwise doing this hex conversion wastes extra CPU and
>> makes the
>> > escaped string a lot larger.
>> > All we need to do is quickly find characters that *must* be
>> > escaped for proper json conversion and transfer.
>> > As far as the supposed *bug* found. I don't see how that is a
>> > bug. Can you explain further?
>> > The purpose of that code is to escape a double quote char and a
>> > backslash chars if found.
>> > If this is a bug, please explain how you found it, and how it's
>> > hurting you.
>> > As far as the chained if goes: How about using a switch
>> statement
>> > instead... wouldn't that be faster than both the chained if and
>> > the array scan?
>> > On Wed, Nov 26, 2008 at 11:39 PM, Michael Clark
>> > <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>
>> <mailto:mich...@metaparadigm.com
>> <mailto:mich...@metaparadigm.com>>> wrote:
>> > Good spotting!
>> > It might be better with the 'chained if' we had in the
>> earlier
>> > version
>> > of that function. e.g.
> Below is a minimal patch for 1.3 based on a lookup table-based proposal.
> I have tested in firefox and run unit tests on it.
> Douglas's version I see is not character by character so it may well
> be faster. Where exactly is it from Arthur? Do you have a link to it?
> (he has lots of links to his various JSON code on his site and I couldn't
> find the particular one).
> I see some of his work is public domain but some has a MIT style license.
> I'll let you and William decide on how you want to handle it unless
> you are both happy with the fix below.
> This patch can at least be used in the interim for those who need a fix.
> Thanks to the original poster for spotting the bug.
> - return function(c){
> // Need to do these first as their ascii values are > 32 (34 & 92)
> if(c == "\"" || c == "\\")
> {
> return "\\"+c;
> }
> - //Otherwise it doesn't need escaping
> - if(c.charCodeAt(0)>=32)
> + // Escape control characters 0 to 31
> + var code = c.charCodeAt(0);
> + if (code < 32)
> {
> - return c;
> + return ctrlChars[code];
> }
> - // Otherwise it is has a code < 32 and may need escaping.
> - for(var i=0;i<escapeChars.length;i++)
> - {
> - if(c==escapeChars[i])
> - {
> - return "\\" + c;
> - }
> - }
> - // it was a character from 0-31 that wasn't one of the escape chars
> + // Otherwise we don't escape (although we need to look into
> + // escaping certain unicode chars here)
> return c;
> };
> }();
> Michael Clark wrote:
> > Arthur Blake wrote:
> >> OK, I see the bug now- thanks for explaining it to me.
> >> (I was looking at the first instance of 'return "\\" + c;' near the
> >> top of the function)
> >> There may be even more escape characters to worry about..
> >> witness the json2.js (Douglas Crockford's) version:
> > The table approach will obviously be the fastest but it will not escape
> > these additional unicode chars (the required table would be too big).
> > The original version of the function from json-rpc-java also escaped
> > unicode chars (in fact all above > 127).
> > I think the original bug you fixed was that it was not escaping \ and "
> > from looking at the code (as it was called only for chars < 32 or > 127
> > in the escapeString function)
> > The escaping of unicode chars, although not strictly neccessary, was due
> > to extensive browser testing, it was necessary so that it would work on
> > pages that were not themselves UTF-8 and where the xmlhttprequest
> > implementation didn't allow setting of the charset.
> > IIRC this was for webkit/konq - although this browser bug may be long
> > since fixed.
> > The original test case I used was sending unicode chars from code on a
> > page that itself has a different charset (I believe the unicode test
> > pages is utf8 so we would have to temporarily change it to something
> > else to run a test).
> > Feel free to bring in Douglas's version, i'm okay with that. We just
> > will need to add in copyright and license (include the text of his MIT
> > style license in the client) or get permission from him to redistribute
> > under ASL 2.0.
> >> function quote(string) {
> >> // If the string contains no control characters, no quote characters,
> and no
> >> // backslash characters, then we can safely slap some quotes around it.
> >> // Otherwise we must also replace the offending characters with safe
> escape
> >> // sequences.
> >> escapable.lastIndex = 0;
> >> return escapable.test(string) ?
> >> '"' + string.replace(escapable, function (a) {
> >> var c = meta[a];
> >> return typeof c === 'string' ? c :
> >> I don't know if it's faster or not- because it uses regular
> expressions...
> >> On Thu, Nov 27, 2008 at 7:48 PM, Michael Clark
> >> <mich...@metaparadigm.com <mailto:mich...@metaparadigm.com>> wrote:
> >> The reason it has 2 bugs is because the routine needs to return
> >> an escaped version of the ctrl char. e.g.
> >> if (c == "\b") return "\\b";
> >> Whereas the current array version just returns the original
> >> ctrl char unmodified without any escaping (as the original
> >> poster noted). You would need a second array with the unescaped
> >> versions of each of the control codes to make it work.
> >> Although for ultimate performance, I have been thinking about it
> >> and instead of the "chained if" I had originally, I propose we
> >> modify your array approach into a 32 element lookup table. It adds
> >> a small amount of bulk to the client - but is the ultimate in
> >> performance. It also fits within the coding convention in a small
> >> amount of space.
> >> Shall we test and commit something like this?
> >> Arthur Blake wrote:
> >> > After I wrote this, I looked into string escaping a little more.
> >> > Technically, we are supposed to escape "control characters".
> >> > What is exactly meant by a "control character" isn't rigidly
> defined
> >> > in the rail diagram, but the reference implementation at:
> >> > defines it. Speaking of the reference implementation- It's
> pretty
> >> > well written. I've long thought we should convert jabsorb to
> >> use this
> >> > at some point.
> >> > On Thu, Nov 27, 2008 at 9:13 AM, Arthur Blake
> >> <arthur.bl...@gmail.com <mailto:arthur.bl...@gmail.com>
> >> > <mailto:arthur.bl...@gmail.com <mailto:arthur.bl...@gmail.com>>>
> >> wrote:
> >> > Will and I went round and round on this function early on...
> >> > The reason for this function is of course to convert a native
> >> > Javascript String into a json version of the String for
> >> transport.
> >> > That's why we took out the hex conversion. It's not strictly
> >> > needed as the String is still legal json (and a lot smaller
> and
> >> > more quickly processed) if we only escape characters that
> *must*
> >> > be escaped. (see the rail diagram for string at
> >> http://json.org)
> >> > Otherwise doing this hex conversion wastes extra CPU and
> >> makes the
> >> > escaped string a lot larger.
> >> > All we need to do is quickly find characters that *must* be
> >> > escaped for proper json conversion and transfer.
> >> > As far as the supposed *bug* found. I don't see how that is a
> >> > bug. Can you explain further?
> >> > The purpose of that code is to escape a double quote char and
> a
> >> > backslash chars if found.
> >> > If this is a bug, please explain how you found it, and how
> it's
> >> > hurting you.