CouchDB encoding issue


Dan Phiffer

Dec 22, 2010, 1:24:26 AM
to nod...@googlegroups.com
Hello,

I'm getting tripped up on a simple character encoding issue. I must be missing something obvious here...

var http = require('http');

var db = new http.createClient(5984, '127.0.0.1');
var test = '{"hello":"“hello world”"}'; // Note the use of smartquotes

// Attempt to upload to CouchDB
var req = db.request('PUT', '/db/doc', {
  'Content-Type': 'application/json',
  'Content-Length': test.length
});
req.on('response', function(res) {
  res.setEncoding('utf8');
  res.on('data', function(chunk) {
    console.log(chunk); // {"error":"bad_request","reason":"invalid UTF-8 JSON"}
  });
});
req.write(test, 'utf8');
req.end();

Here's what is being logged on the CouchDB end:

1> [debug] [<0.3718.0>] Invalid JSON: <<123,34,104,101,108,108,111,34,58,34,226,128,156,104,101,108,
108,111,32,119,111,114,108,100,226>>

Thanks for any clues,
-Dan

Marak Squires

Dec 22, 2010, 1:31:05 AM
to nod...@googlegroups.com
You are setting a content length but not sending any data?




Fedor Indutny

Dec 22, 2010, 1:33:15 AM
to nod...@googlegroups.com
Nope, he is sending data, but it's wrong.

Try this:
var http = require('http');

var db = new http.createClient(5984, '127.0.0.1');
var test = '{"hello":“hello world"}'; // Note the use of smartquotes

// Attempt to upload to CouchDB
var req = db.request('PUT', '/db/doc', {
 'Content-Type': 'application/json',
 'Content-Length': test.length
});
req.on('response', function(res) {
 res.setEncoding('utf8');
 res.on('data', function(chunk) {
   console.log(chunk); // {"error":"bad_request","reason":"invalid UTF-8 JSON"}
 });
});
req.write(test, 'utf8');
req.end();



Dan Phiffer

Dec 22, 2010, 8:01:31 AM
to nod...@googlegroups.com
On Dec 21, 2010, at 10:33 PM, Fedor Indutny wrote:

var test = '{"hello":“hello world"}'; // Note the use of smartquotes

That is definitely invalid JSON! Here's a different test case that also fails:

var test = JSON.stringify({
  anythingNonASCII: "ñ"
});

BTW, this is with node 0.3.2 and CouchDBX 1.0.1.1.

Jacob Chapel

Dec 22, 2010, 8:19:39 AM
to nod...@googlegroups.com
I just do objects in Node and send it with JSON.stringify.

var bar = {key: 'value'}

req.write(JSON.stringify(bar))

Also, you shouldn't have to pass 'utf8' to the write function: if the chunk is a string, it is sent as UTF-8 by default.

var http = require('http');

var db = new http.createClient(5984, '127.0.0.1');
var test = {hello: 'hello world'}; // Making it an object instead of a string.


// Attempt to upload to CouchDB
var req = db.request('PUT', '/db/doc', {
 'Content-Type': 'application/json',
 'Content-Length': test.length
});
req.on('response', function(res) {
 res.setEncoding('utf8'); // Not needed

 res.on('data', function(chunk) {
   console.log(chunk); // {"error":"bad_request","reason":"invalid UTF-8 JSON"}
 });
});
req.write(JSON.stringify(test)); // So you can guarantee it is being sent as JSON.
req.end();

Dan Phiffer

Dec 22, 2010, 8:35:05 AM
to nod...@googlegroups.com
On Dec 22, 2010, at 5:19 AM, Jacob Chapel wrote:

I just do objects in Node and send it with JSON.stringify.

var bar = {key: 'value'}

Try with something like "vålué" and I think this will fail in the same way my code does.

I should point out this is a simplified test case for code that's already running successfully. My users were pointing out that certain content wasn't getting saved into the database, and I've tracked it down to the very broad category "anything non-ASCII".

Am I supposed to \UXXXX encode my strings? Is there a library for that by any chance?

Thanks,
-Dan

Jorge

Dec 22, 2010, 8:53:59 AM
to nod...@googlegroups.com
On 22/12/2010, at 14:01, Dan Phiffer wrote:
> On Dec 21, 2010, at 10:33 PM, Fedor Indutny wrote:
>
>> var test = '{"hello":“hello world"}'; // Note the use of smartquotes
>
> That is definitely invalid JSON!

It isn't invalid:

JSON.parse('{"hello":"hello world"}')
-> Object
• hello: "hello world"
• __proto__: Object

> Here's a different test case that also fails:
>
> var test = JSON.stringify({
> anythingNonASCII: "ñ"
> });

But it's valid JSON too:

JSON.stringify({ anythingNonASCII: "ñ" })
-> "{"anythingNonASCII":"ñ"}"

Jorge.

Jorge

Dec 22, 2010, 8:59:03 AM
to nod...@googlegroups.com
On 22/12/2010, at 14:35, Dan Phiffer wrote:
> On Dec 22, 2010, at 5:19 AM, Jacob Chapel wrote:
>
>> I just do objects in Node and send it with JSON.stringify.
>>
>> var bar = {key: 'value'}
>
> Try with something like "vålué" and I think this will fail in the same way my code does.
>
> I should point out this is a simplified test case for code that's already running successfully. My users were pointing out that certain content wasn't getting saved into the database, and I've tracked it down to the very broad category "anything non-ASCII".
>
> Am I supposed to \UXXXX encode my strings? Is there a library for that by any chance?

You can, but you don't have to. JSON strings can contain almost any Unicode character. From http://json.org, a string character is "any Unicode character except " or \ or control character", or one of the backslash escapes:

char
    any-Unicode-character-except-"-or-\-or-control-character
    \"
    \\
    \/
    \b
    \f
    \n
    \r
    \t
    \u four-hex-digits
--
Jorge.
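
To illustrate the optional \uXXXX route, here is a minimal sketch that escapes every non-ASCII character in the output of JSON.stringify. The helper name toAsciiJson is made up for this example, and padStart assumes a newer Node than the 0.3.2 used in this thread. The result is plain ASCII, so its character count and its byte count are the same.

```javascript
// Hypothetical helper: serialize to JSON, then replace each non-ASCII
// UTF-16 code unit with its \uXXXX escape. Characters outside the BMP
// become two escapes (a surrogate pair), which JSON parsers accept.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

const original = { key: 'vålué' };
const escaped = toAsciiJson(original);

console.log(escaped);
console.log(escaped.length, Buffer.byteLength(escaped, 'utf8'));
```

As noted above, this is not required. Sending the raw UTF-8 works just as well, provided the Content-Length header counts bytes.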

Dan Phiffer

Dec 22, 2010, 9:23:56 AM
to nod...@googlegroups.com

On Dec 22, 2010, at 5:53 AM, Jorge wrote:

> On 22/12/2010, at 14:01, Dan Phiffer wrote:
>> On Dec 21, 2010, at 10:33 PM, Fedor Indutny wrote:
>>
>>> var test = '{"hello":“hello world"}'; // Note the use of smartquotes
>>
>> That is definitely invalid JSON!
>
> It isn't invalid:
>
> JSON.parse('{"hello":"hello world"}')
> -> Object
> • hello: "hello world"
> • __proto__: Object

I was just pointing out that Fedor was using a left double quote (a so-called smartquote) to delimit a string, which is invalid. Your example is different and valid!


>> Here's a different test case that also fails:
>>
>> var test = JSON.stringify({
>> anythingNonASCII: "ñ"
>> });
>
> But it's valid JSON too:
>
> JSON.stringify({ anythingNonASCII: "ñ" })
> -> "{"anythingNonASCII":"ñ"}"
>

Yes, it's valid JSON but if I try to send it to CouchDB with a ClientRequest it turns wonky somehow in transit. I've tested sending similar JSON objects (i.e., those with strings that contain non-ASCII characters) with jquery.couch.js in a browser and that works fine. So I'm fairly certain this isn't a problem originating from CouchDB.

Thanks,
-Dan

Jorge

Dec 22, 2010, 9:32:46 AM
to nod...@googlegroups.com
On 22/12/2010, at 15:23, Dan Phiffer wrote:
> Yes, it's valid JSON but if I try to send it to CouchDB with a ClientRequest it turns wonky somehow in transit. I've tested sending similar JSON objects (i.e., those with strings that contain non-ASCII characters) with jquery.couch.js in a browser and that works fine. So I'm fairly certain this isn't a problem originating from CouchDB.
>

Perhaps the problem is Content-Length. You're sending the number of characters, not the number of bytes. You could convert the string with test = new Buffer(test); instead of using it as a string, so that test.length gives the proper length (in bytes, not chars).
--
Jorge.
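
The mismatch can be checked directly with the payload from the first message. This is a small sketch, assuming a newer Node where Buffer.from is available (on 0.3.2 it would be new Buffer(test)). It prints the character count, the UTF-8 byte count, and the bytes a server would see if it stopped reading at the character count.

```javascript
// Dan's original payload, with curly quotes inside the JSON string value.
const test = '{"hello":"“hello world”"}';

const chars = test.length;                      // UTF-16 code units
const bytes = Buffer.byteLength(test, 'utf8');  // bytes actually on the wire

console.log('chars:', chars, 'bytes:', bytes);

// A server that trusts Content-Length: chars reads only this much of the body.
const truncated = Array.from(Buffer.from(test, 'utf8').subarray(0, chars));
console.log(truncated.join(','));
```

Each curly quote takes three bytes in UTF-8, so the body is four bytes longer than test.length. The truncated list ends in a lone 226, the first byte of the closing curly quote, which is exactly where the CouchDB debug log in the first message stops.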

Dan Phiffer

Dec 22, 2010, 9:43:51 AM
to nod...@googlegroups.com

That was the problem, thanks so much Jorge!

var req = db.request('PUT', '/db/doc', {
  'Content-Type': 'application/json',
  'Content-Length': Buffer.byteLength(test, 'utf8')
});
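
For completeness, here is a sketch of the whole request with the corrected header. It assumes a newer Node where http.request is the client API, rather than the http.createClient call used earlier in this thread. The helpers jsonHeaders and putDoc are names invented for this example, and the host, port and path are the ones from the original post.

```javascript
const http = require('http');

// Build headers whose Content-Length counts UTF-8 bytes, not characters.
function jsonHeaders(body) {
  return {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body, 'utf8')
  };
}

// Hypothetical helper: PUT a document to the local CouchDB from this thread.
function putDoc(path, doc, callback) {
  const body = JSON.stringify(doc);
  const req = http.request({
    host: '127.0.0.1',
    port: 5984,
    method: 'PUT',
    path: path,
    headers: jsonHeaders(body)
  }, function (res) {
    let text = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { text += chunk; });
    res.on('end', function () { callback(null, text); });
  });
  req.on('error', callback);
  req.end(body, 'utf8');
}

// Usage (needs a running CouchDB with an existing database named "db"):
// putDoc('/db/doc', { anythingNonASCII: 'ñ' }, console.log);
```

Nothing is sent until putDoc is called, so the block can be loaded without a database running.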

Jorge

Dec 22, 2010, 9:55:04 AM
to nod...@googlegroups.com
On 22/12/2010, at 15:43, Dan Phiffer wrote:
> On Dec 22, 2010, at 6:32 AM, Jorge wrote:
>>
>>
>> Perhaps the problem is content-length. You're sending the number of chars not the number of bytes. Perhaps you should convert test= new Buffer(test); instead of using it as a string, so it'll give the proper length (in bytes, not chars).
>
> That was the problem, thanks so much Jorge!
>
> var req = db.request('PUT', '/db/doc', {
> 'Content-Type': 'application/json',
> 'Content-Length': Buffer.byteLength(test, 'utf8')
> });


Glad to know. But perhaps it would be better to convert it to a buffer just *once*: test = new Buffer(test); and then use 'Content-Length': test.length and req.end(test)?
--
Jorge.
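
A minimal sketch of that encode-once idea follows. It assumes a newer Node, where Buffer.from is the usual replacement for new Buffer(string). The variable names are illustrative, and the request itself is left as a comment because it needs a live server.

```javascript
// Encode the JSON text to UTF-8 a single time.
const json = JSON.stringify({ key: 'vålué' });
const payload = Buffer.from(json, 'utf8');

// A Buffer's length is already a byte count, so it can go straight
// into the header without a separate Buffer.byteLength call.
const headers = {
  'Content-Type': 'application/json',
  'Content-Length': payload.length
};

console.log(json.length, payload.length);

// The same Buffer is then what gets written, e.g. req.end(payload);
```

The header and the body come from the same encoded bytes, so the two cannot drift apart.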

Dan Phiffer

Dec 22, 2010, 11:05:11 AM
to nod...@googlegroups.com

I'm sure there are some advantages to this approach, but ultimately for me it comes down to adhering to API documentation:

http://nodejs.org/docs/v0.3.2/api/http.html#request.write
"The chunk argument should be an array of integers or a string."

The Buffer object surely implements toString(), but I think Postel's Law is still appropriate here: "be conservative in what you send; be liberal in what you accept."

Thanks,
-Dan

Dan Phiffer

Dec 22, 2010, 12:01:41 PM
to nod...@googlegroups.com
In case anyone else runs into this problem in the future, I've patched node-couchdb-min to handle non-ASCII strings:

https://github.com/dphiffer/node-couchdb-min/commit/7d97455c367d18a6ad6e4dec22a2ca16b22a3c35
