Issue 81 in couchdb-python: Encoding is not quite right.


codesite...@google.com

Jul 20, 2009, 8:39:29 PM
to couchdb...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 81 by ian.schenck: Encoding is not quite right.
http://code.google.com/p/couchdb-python/issues/detail?id=81

What steps will reproduce the problem?
1. Create a view (in Python) that yields a document.
2. Create a document that has a string anywhere in it with characters
outside the ASCII range.
3. The view will explode when it processes that document.

What is the expected output? What do you see instead?

If I re-implement the same exact view in javascript, it works fine.

What version of the product are you using? On what operating system?

CouchDB 0.9.0. Tried CouchDB-Python 0.6 and then revision 185.

Please provide any additional information below.

Really, it's not the view code. But here they are anyways:

# Python
def by_customer(doc):
    if doc.get('type', None) == 'internetvideo':
        yield doc.get('customer', None), doc

This breaks the second I add a document with some non-ASCII Unicode in it.
However, I can make that view "pass" if I do something like...

# Python
def by_customer(doc):
    if doc.get('type', None) == 'internetvideo':
        doc['description'] = doc['description'].encode('utf-8')
        yield doc.get('customer', None), doc


The corresponding JavaScript version works:

function(doc) {
    if (doc.type == 'internetvideo') {
        emit(doc.customer, doc);
    }
}

I do not believe it is acceptable to manually encode every string on every
possible object.

codesite...@google.com

Jul 20, 2009, 8:57:33 PM
to couchdb...@googlegroups.com

Comment #1 on issue 81 by ian.schenck: Encoding is not quite right.
http://code.google.com/p/couchdb-python/issues/detail?id=81

I made a tweak to the view server and this may (or may not) be correct.
It seems to fix the problem at least, though. Hope it's helpful.

Attachments:
encode_patch.diff 509 bytes

codesite...@google.com

Jul 21, 2009, 7:04:37 PM
to couchdb...@googlegroups.com

Comment #2 on issue 81 by matt.goodall: Encoding is not quite right.
http://code.google.com/p/couchdb-python/issues/detail?id=81

I must admit, I was quite surprised by this patch as I would not have
expected the JSON to need encoding. However, it seems that couchdb-python
tries to keep JSON as unicode internally by using simplejson.dump's
ensure_ascii=False option.
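For readers unfamiliar with that option, here is a minimal sketch of what ensure_ascii changes. It uses the stdlib json module on a current Python as a stand-in for simplejson, and the sample document is made up for illustration; it is not the view server's actual code.

```python
import json

# Hypothetical document with one non-ASCII character (e with acute accent).
doc = {"customer": "acme", "description": "caf\u00e9"}

# Default (ensure_ascii=True): non-ASCII characters are written as \uXXXX
# escapes, so the output is pure ASCII and safe for any byte stream.
escaped = json.dumps(doc)

# ensure_ascii=False: the characters are kept as-is. The result is text,
# not bytes, so it still has to be encoded before it is written out.
raw = json.dumps(doc, ensure_ascii=False)

# Explicit encoding step, which is what the default leaves implicit.
payload = raw.encode("utf-8")
```

With the default, the output could be written to an ASCII-only stream unchanged; with ensure_ascii=False, something has to perform the final encode, which appears to be the step the view server was missing.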

So, the patch is probably correct if you're using simplejson (or the
stdlib's json), although the BOM is probably unnecessary.
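As a small illustration of the BOM remark (a sketch on a current Python, not the contents of the attached patch, which are not shown here): encoding with "utf-8" emits no byte order mark, while the "utf-8-sig" codec prepends one. The sample string is hypothetical.

```python
import codecs

# Hypothetical JSON text containing one non-ASCII character.
text = '{"description": "caf\u00e9"}'

# Plain UTF-8: no byte order mark is added.
plain = text.encode("utf-8")

# "utf-8-sig" prepends the three-byte UTF-8 BOM (EF BB BF).
with_bom = text.encode("utf-8-sig")

# A reader that decodes as plain UTF-8 sees the BOM as a leading U+FEFF
# character, which a JSON parser may not accept.
decoded_with_bom = with_bom.decode("utf-8")
```

UTF-8 has no byte-order ambiguity, so the plain encoding is normally enough for a line-oriented JSON stream.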

Unfortunately, I suspect it's *only* going to work with simplejson right
now, as I believe cjson always encodes to UTF-8. And that's going to affect
how things are handled internally.
