node.encodeUtf8


Connor Dunn

Aug 2, 2009, 11:55:00 AM
to nodejs
node.encodeUtf8 doesn't seem to work correctly with multibyte
characters. I think this is because String.fromCharCode wants Unicode
character codes (UTF-16 code units) rather than individual UTF-8 byte
values. I adapted the function from
https://cixar.com/tracs/javascript/browser/trunk/src/crypt/utf8.js?rev=286&format=txt
which seems to do the job.

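// Despite its name, this turns an array of UTF-8 byte values into a string.
// Handles 1-, 2- and 3-byte sequences only.
function encodeUtf8(array) {
  var string = "";
  var i = 0;
  var c = 0, c2 = 0, c3 = 0; // all declared locally so none leak as globals

  while (i < array.length) {
    c = array[i];

    if (c < 128) {
      // 1 byte: 0xxxxxxx
      string += String.fromCharCode(c);
      i++;
    } else if ((c > 191) && (c < 224)) {
      // 2 bytes: 110xxxxx 10xxxxxx
      c2 = array[i + 1];
      string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
      i += 2;
    } else {
      // anything else is treated as 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      c2 = array[i + 1];
      c3 = array[i + 2];
      string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
      i += 3;
    }
  }

  return string;
}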

ryan dahl

Aug 3, 2009, 5:27:17 AM
to nod...@googlegroups.com
Wow. Okay - so a charCode is different from a UTF-8 byte value. I
guess that makes sense, since a UTF-8 character can span multiple
integers. Do you have some strings you're using to test with? I can
write a test and replace encodeUtf8().
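
To make the difference concrete with one character: Σ is U+03A3, which UTF-8 stores as the two bytes 206, 163. The sketch below is only an illustration; the helper name decodeTwoByte is hypothetical and not part of node. It shows that passing the raw bytes to String.fromCharCode yields two unrelated characters, while combining the payload bits first yields the intended one.

```javascript
// Hypothetical helper (not a node API): combine one 2-byte UTF-8 sequence
// into a single character code.
function decodeTwoByte(b1, b2) {
  // 110xxxxx 10yyyyyy -> xxxxxyyyyyy
  return ((b1 & 0x1f) << 6) | (b2 & 0x3f);
}

// Each byte is treated as its own character code: two characters, not Σ.
var wrong = String.fromCharCode(206, 163);

// The bytes are combined first: one character, U+03A3.
var right = String.fromCharCode(decodeTwoByte(206, 163));
```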

Connor Dunn

Aug 3, 2009, 9:24:49 AM
to nod...@googlegroups.com
Ok,

[116,101,115,116,32,206,163,207,131,207,128,206,177,32,226,161,140,226,160,129,226,160,167,226,160,145]

has 4x 1-byte characters, a space, 4x 2-byte, a space, and 4x 3-byte.
It should look like "test Σσπα ⡌⠁⠧⠑". Also

[240,144,129,128,240,144,129,129,240,144,130,139,240,144,129,131]

which I believe is 4x 4-byte characters. However, my function won't
decode these, and receiving them as utf8 in node rather than raw does
not seem to work either. I grabbed them from
http://en.wikipedia.org/wiki/Linear_B
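
For reference, 4-byte sequences encode code points above U+FFFF, which a JavaScript string can only hold as a surrogate pair. A minimal sketch of a decoder that covers that case follows; the name decodeUtf8Bytes is hypothetical, this is not the code that was applied to node, and it assumes well-formed input (no validation of continuation bytes or truncated sequences).

```javascript
// Sketch only: decode an array of UTF-8 byte values into a string,
// including 4-byte sequences. Assumes the input is well-formed UTF-8.
function decodeUtf8Bytes(bytes) {
  var out = "";
  var i = 0;

  while (i < bytes.length) {
    var c = bytes[i];
    var cp;

    if (c < 0x80) {
      // 1 byte: 0xxxxxxx
      cp = c;
      i += 1;
    } else if (c < 0xe0) {
      // 2 bytes: 110xxxxx 10xxxxxx
      cp = ((c & 0x1f) << 6) | (bytes[i + 1] & 0x3f);
      i += 2;
    } else if (c < 0xf0) {
      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      cp = ((c & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f);
      i += 3;
    } else {
      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      cp = ((c & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }

    if (cp > 0xffff) {
      // Above the 16-bit range: split into a high and a low surrogate.
      cp -= 0x10000;
      out += String.fromCharCode(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      out += String.fromCharCode(cp);
    }
  }

  return out;
}

var linearB = decodeUtf8Bytes(
  [240,144,129,128,240,144,129,129,240,144,130,139,240,144,129,131]);
```

Each of the four characters ends up as two UTF-16 code units, so linearB.length is 8 even though it holds 4 characters.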

Connor


ryan dahl

Aug 6, 2009, 7:38:40 AM
to nod...@googlegroups.com
Thanks. Applied in 9b3baf3d508dbef355b2952b15b5dd3f854f92cb.