function escapeJSONChar problem

162 views
Skip to first unread message

lxb...@gmail.com

unread,
Nov 26, 2008, 10:43:12 PM11/26/08
to jabsorb-user
var escapeJSONChar=function ()
{
var escapeChars = ["\b","\t","\n","\f","\r"];

return function(c){
// Need to do these first as their ascii values are > 32 (34 & 92)
if(c == "\"" || c == "\\")
{
return "\\"+c;
}
//Otherwise it doesn't need escaping
if(c.charCodeAt(0)>=32)
{
return c;
}
// Otherwise it is has a code < 32 and may need escaping.
for(var i=0;i<escapeChars.length;i++)
{
if(c==escapeChars[i])
{
return "\\" + c;
}
}
// it was a character from 0-31 that wasn't one of the escape
chars
return c;
};
}();


this function can not return correct quoted char .
this line : return "\\" + c; is wrong , originalchar returnd

Michael Clark

unread,
Nov 26, 2008, 11:39:39 PM11/26/08
to jabsor...@googlegroups.com
Good spotting!

It might be better with the 'chained if' we had in the earlier version
of that function. e.g.

if (c == "\b") return "\\b";
else if (c == "\f") return "\\f";
else if (c == "\n") return "\\n";
else if (c == "\r") return "\\r";
else if (c == "\t") return "\\t";

It is most likely faster as there is no need for loop local variable
initialization and loop iterator testing, etc.

From looking, the current version does not hex encode control chars
as required by the JSON spec (bare control chars are not allowed).

This is based on an older version of the function (without the bug)
but I have not tested it yet.

escapeJSONChar =
function escapeJSONChar(c)
{
var code = c.charCodeAt(0);
if (code >= 32) return c;
else if (c == "\"" || c == "\\") return "\\" + c;
else if (c == "\b") return "\\b";
else if (c == "\f") return "\\f";
else if (c == "\n") return "\\n";
else if (c == "\r") return "\\r";
else if (c == "\t") return "\\t";
var hex = code.toString(16);
if (hex.length == 1) return "\\u000" + hex;
else return "\\u00" + hex;
};

The current jabsorb coding style doesn't allow for concise expression of
chained if, so this would have to be modified before we can make a patch.
Let us know if this works for you and I'll translate it (personally I
prefer a more concise coding style). grumble grumble. :)

Arthur Blake

unread,
Nov 27, 2008, 9:13:04 AM11/27/08
to jabsor...@googlegroups.com
Will and I went round and round on this function early on...

The reason for this function is of course to convert a native Javascript String into a json version of the String for transport.

That's why we took out the hex conversion.  It's not strictly needed as the String is still legal json (and a lot smaller and more quickly processed) if we only escape characters that *must* be escaped.  (see the rail diagram for string at http://json.org)

Otherwise doing this hex conversion wastes extra CPU and makes the escaped string a lot larger.
All we need to do is quickly find characters that *must* be escaped for proper json conversion and transfer.

As far as the supposed *bug* found.  I don't see how that is a bug.  Can you explain further?
The purpose of that code is to escape a double quote char and a backslash chars if found.

If this is a bug, please explain how you found it, and how it's hurting you.

As far as the chained if goes:  How about using a switch statement instead... wouldn't that be faster than both the chained if and the array scan?

Arthur Blake

unread,
Nov 27, 2008, 9:47:57 AM11/27/08
to jabsor...@googlegroups.com
After I wrote this, I looked into string escaping a little more.  Technically, we are supposed to escape "control characters".
What is exactly meant by a "control character" isn't rigidly defined in the rail diagram, but the reference implementation at:

http://www.json.org/json2.js

defines it.  Speaking of the reference implementation-  It's pretty well written.  I've long thought we should convert jabsorb to use this at some point.

Michael Clark

unread,
Nov 27, 2008, 7:48:12 PM11/27/08
to jabsor...@googlegroups.com
The reason it has 2 bugs is because the routine needs to return
an escaped version of the ctrl char. e.g.

if (c == "\b") return "\\b";

Whereas the current array version just returns the original
ctrl char unmodified without any escaping (as the original
poster noted). You would need a second array with the unescaped
versions of each of the control codes to make it work.

Although for ultimate performance, I have been thinking about it
and instead of the "chained if" I had originally, I propose we
modify your array approach into a 32 element lookup table. It adds
a small amount of bulk to the client - but is the ultimate in
performance. It also fits within the coding convention in a small
amount of space.

Shall we test and commit something like this?

var ctrlChars = [ "\u0000", "\u0001", "\u0002", "\u0003",
"\u0004", "\u0005", "\u0006", "\u0007",
"\b", "\t", "\n", "\u000b",
"\f", "\r", "\u000e", "\u000f",
"\u0010", "\u0011", "\u0012", "\u0013",
"\u0014", "\u0015", "\u0016", "\u0017",
"\u0018", "\u0019", "\u001a", "\u001b",
"\u001c", "\u001d", "\u001e", "\u001f" ];

escapeJSONChar =
function escapeJSONChar(c)
{
var code = c.charCodeAt(0);
if (code >= 32)
{
return c;
}
else
{
return ctrlChars[code];
}
};

Arthur Blake

unread,
Nov 27, 2008, 8:13:14 PM11/27/08
to jabsor...@googlegroups.com
OK, I see the bug now- thanks for explaining it to me.
(I was looking at the first instance of 'return "\\" + c;' near the top of the function)

There may be even more escape characters to worry about..
witness the json2.js (Douglas Crockford's) version:

    var cx = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
escapable = /[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
gap,
indent,
meta = { // table of character substitutions
'\b': '\\b',
'\t': '\\t',
'\n': '\\n',
'\f': '\\f',
'\r': '\\r',
'"' : '\\"',
'\\': '\\\\'
},
rep;


function quote(string) {

// If the string contains no control characters, no quote characters, and no
// backslash characters, then we can safely slap some quotes around it.
// Otherwise we must also replace the offending characters with safe escape
// sequences.

escapable.lastIndex = 0;
return escapable.test(string) ?
'"' + string.replace(escapable, function (a) {
var c = meta[a];
return typeof c === 'string' ? c :
'\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
}) + '"' :
'"' + string + '"';
}


I don't know if it's faster or not- because it uses regular expressions...

Michael Clark

unread,
Nov 27, 2008, 8:37:37 PM11/27/08
to jabsor...@googlegroups.com


Arthur Blake wrote:
> OK, I see the bug now- thanks for explaining it to me.
> (I was looking at the first instance of 'return "\\" + c;' near the
> top of the function)
>
> There may be even more escape characters to worry about..
> witness the json2.js (Douglas Crockford's) version:
>
> var cx = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
> escapable = /[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g,
>
> gap,
> indent,
> meta = { // table of character substitutions
> '\b': '\\b',
> '\t': '\\t',
> '\n': '\\n',
>
> '\f': '\\f',
> '\r': '\\r',
> '"' : '\\"',
> '\\': '\\\\'
> },
> rep;
>

The table approach will obviously be the fastest but it will not escape
these additional unicode chars (the required table would be too big).

The original version of the function from json-rpc-java also escaped
unicode chars (in fact all above > 127).

I think the original bug you fixed was that it was not escaping \ and "
from looking at the code (as it was called only for chars < 32 or > 127
in the escapeString function)

The escaping of unicode chars, although not strictly neccessary, was due
to extensive browser testing, it was necessary so that it would work on
pages that were not themselves UTF-8 and where the xmlhttprequest
implementation didn't allow setting of the charset.

IIRC this was for webkit/konq - although this browser bug may be long
since fixed.

The original test case I used was sending unicode chars from code on a
page that itself has a different charset (I believe the unicode test
pages is utf8 so we would have to temporarily change it to something
else to run a test).

Feel free to bring in Douglas's version, i'm okay with that. We just
will need to add in copyright and license (include the text of his MIT
style license in the client) or get permission from him to redistribute
under ASL 2.0.
> > <mailto:arthur...@gmail.com <mailto:arthur...@gmail.com>>>
> <mailto:mic...@metaparadigm.com

Michael Clark

unread,
Nov 27, 2008, 10:14:13 PM11/27/08
to jabsor...@googlegroups.com, Jabsorb Developer List, lxb...@gmail.com
Below is a minimal patch for 1.3 based on a lookup table-based proposal.
I have tested in firefox and run unit tests on it.

Douglas's version I see is not character by character so it may well
be faster. Where exactly is it from Arthur? Do you have a link to it?
(he has lots of links to his various JSON code on his site and I couldn't
find the particular one).

I see some of his work is public domain but some has a MIT style license.
I'll let you and William decide on how you want to handle it unless
you are both happy with the fix below.

This patch can at least be used in the interim for those who need a fix.

Thanks to the original poster for spotting the bug.

--- webapps/jsonrpc/jsonrpc.js (revision 436)
+++ webapps/jsonrpc/jsonrpc.js (working copy)
@@ -26,30 +26,32 @@

/* escape a character */

+var ctrlChars = [ "\\u0000", "\\u0001", "\\u0002", "\\u0003",
+ "\\u0004", "\\u0005", "\\u0006", "\\u0007",
+ "\\b", "\\t", "\\n", "\\u000b",
+ "\\f", "\\r", "\\u000e", "\\u000f",
+ "\\u0010", "\\u0011", "\\u0012", "\\u0013",
+ "\\u0014", "\\u0015", "\\u0016", "\\u0017",
+ "\\u0018", "\\u0019", "\\u001a", "\\u001b",
+ "\\u001c", "\\u001d", "\\u001e", "\\u001f" ];
+
var escapeJSONChar=function ()
{
- var escapeChars = ["\b","\t","\n","\f","\r"];
+ return function(c){

- return function(c){
// Need to do these first as their ascii values are > 32 (34 & 92)
if(c == "\"" || c == "\\")
{
return "\\"+c;
}
- //Otherwise it doesn't need escaping
- if(c.charCodeAt(0)>=32)
+ // Escape control characters 0 to 31
+ var code = c.charCodeAt(0);
+ if (code < 32)
{
- return c;
+ return ctrlChars[code];
}
- // Otherwise it is has a code < 32 and may need escaping.
- for(var i=0;i<escapeChars.length;i++)
- {
- if(c==escapeChars[i])
- {
- return "\\" + c;
- }
- }
- // it was a character from 0-31 that wasn't one of the escape chars
+ // Otherwise we don't escape (although we need to look into
+ // escaping certain unicode chars here)
return c;
};
}();

Arthur Blake

unread,
Nov 28, 2008, 9:38:03 AM11/28/08
to jabsor...@googlegroups.com, Jabsorb Developer List, lxb...@gmail.com
It is public domain-- from here:

http://www.json.org/json2.js

So I think we are safe hacking up pieces of it and using it.
Reply all
Reply to author
Forward
0 new messages