Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 07:59:08 +0000
Local: Sat, May 14 2011 3:59 am
Subject: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 179 by heshim...@gmail.com: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

What steps will reproduce the problem?
1. Create a document in CouchDB whose id contains Chinese characters, such as "文档"
2. Run couchdb-dump on the database

What is the expected output? What do you see instead?
couchdb-dump crashes upon reaching this document. Here are the last lines of the trace:
   File "/pylonsenv/lib/python2.6/site-packages/couchdb/multipart.py", line 122, in __init__
     self._write_headers(headers)
   File "/pylonsenv/lib/python2.6/site-packages/couchdb/multipart.py", line 175, in _write_headers
     self.fileobj.write(headers[name])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

What version of the product are you using? On what operating system?
couchdb-python 0.8 against couchdb 1.0.1 on Ubuntu.
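
The failure in the trace above can be illustrated with a minimal sketch. The thread concerns Python 2.6, where writing a unicode value to a byte-oriented file triggers an implicit ASCII encode; the sketch below is written for Python 3 and makes that encode step explicit, so it is an analogy rather than the original code path. The helper name `to_ascii_header` is hypothetical.

```python
# Python 3 sketch; the original report is from Python 2.6.
doc_id = "文档"  # example id taken from the report


def to_ascii_header(value):
    """Hypothetical helper: encode a header value as strict ASCII,
    which is what the implicit coercion in Python 2 amounted to."""
    return value.encode("ascii")


try:
    to_ascii_header(doc_id)
    failed = False
except UnicodeEncodeError as exc:
    # Same exception type as in the reported traceback.
    failed = True
    print("encode failed:", exc.reason)

# Encoding explicitly as UTF-8 succeeds and yields bytes.
utf8_bytes = doc_id.encode("utf-8")
```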


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 08:21:19 +0000
Local: Sat, May 14 2011 4:21 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #1 on issue 179 by heshim...@gmail.com: couchdb-dump cannot deal  
with unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

I just needed a quick solution to dump the database and reload it in  
another environment, so I made some changes to multipart.py to get past  
this utf-8 problem. It worked.

However, I understand that other parts are using multipart.py too. This  
probably won't fit the MIME standard. If I have time, I'll investigate  
further and provide a patch that does satisfy the MIME standard.

Attachments:
        utf-8_dump_load.patch  1.1 KB


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 08:25:21 +0000
Local: Sat, May 14 2011 4:25 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #2 on issue 179 by kxepal: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Confirmed. There is also an invalid test case for how the multipart module  
handles unicode data: StringIO can handle mixed "str" and "unicode" values,  
but files require "str" only.


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 09:21:33 +0000
Local: Sat, May 14 2011 5:21 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #3 on issue 179 by kxepal: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Sorry, I was wrong about the tests - StringIO confused me(: Don't rush, sit  
down and think about...yes(:
There is no need to fix the multipart module, only the dump tool, because it  
passes a unicode document id to the multipart writer. That is what  
dump-tool.patch addresses.

dump-tool-2.patch solves the same problem, but respects the Content-Type  
header and its charset. I suppose that would be the more correct solution.

Attachments:
        dump-tool.patch  648 bytes
        dump-tool-2.patch  1.5 KB


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 10:43:51 +0000
Local: Sat, May 14 2011 6:43 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #4 on issue 179 by heshim...@gmail.com: couchdb-dump cannot deal  
with unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Ah, that's much smarter. Thanks!


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 10:48:51 +0000
Local: Sat, May 14 2011 6:48 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #5 on issue 179 by heshim...@gmail.com: couchdb-dump cannot deal  
with unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Hmm... another thing. I was under the impression that utf-8 encoded strings  
aren't valid ascii. Currently, isn't multipart.py expecting strict ascii  
strings as header?


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 11:14:56 +0000
Local: Sat, May 14 2011 7:14 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #6 on issue 179 by kxepal: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Actually, only the first 128 characters of the utf-8 encoding are valid  
ascii. The problem was not which characters are in the headers, but the type  
of string that multipart tries to write into the output stream. Files and  
streams don't expect pure unicode strings; they favor the strings called  
"bytes" in Python 3 terminology, and the multipart module expects this  
behavior.

But there was a "hack" which adds the document id to the headers; it is used  
by the couchdb-load tool to help create a document with the same id value.  
Since a document id can be unicode, this "hack" breaks that expectation and  
makes multipart crash.

You could try reverting the patch and changing, in dump.py, the default value  
of the output argument of the dump_db function from sys.stdout to  
StringIO.StringIO. The error would not occur, because StringIO can handle  
both str and unicode values.
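
The distinction between text and byte streams described above can be sketched as follows. This is a Python 3 approximation, not the code from the thread: Python 2's `StringIO.StringIO` accepted both str and unicode, whereas Python 3 splits the roles between `io.StringIO` (text only) and `io.BytesIO` (bytes only). The stream names are illustrative.

```python
import io

doc_id = "文档"  # example id taken from the report

# A text stream accepts str values directly.
text_stream = io.StringIO()
text_stream.write(doc_id)

# A byte stream, like a file opened in binary mode, rejects str.
byte_stream = io.BytesIO()
try:
    byte_stream.write(doc_id)
    rejected = False
except TypeError:
    rejected = True

# Encoding to bytes before writing works for any byte-oriented output.
byte_stream.write(doc_id.encode("utf-8"))
```

Encoding at the point where the header value is produced, rather than inside the writer, matches the suggestion in this comment that the dump tool, not the multipart module, should change.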


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 12:25:10 +0000
Local: Sat, May 14 2011 8:25 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #7 on issue 179 by djc.ocht...@gmail.com: couchdb-dump cannot deal  
with unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

IMO the correct way to have non-ASCII strings in MIME headers would be to  
use RFC 2047 encoding for any non-ascii header values.
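
As a rough illustration of the RFC 2047 approach suggested here, the standard library's `email.header` module can produce and parse encoded-words. This is a Python 3 sketch under the assumption that the header value is the document id from the report; it is not the patch that was later attached.

```python
from email.header import Header, decode_header, make_header

doc_id = "文档"  # example id taken from the report

# Encode the value as an RFC 2047 encoded-word; the result is pure ASCII
# and therefore safe to place in a MIME header.
encoded = Header(doc_id, charset="utf-8").encode()
print(encoded)

# A reader (for example a load tool) can reverse the encoding.
decoded = str(make_header(decode_header(encoded)))
```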


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 14 May 2011 12:51:15 +0000
Local: Sat, May 14 2011 8:51 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #8 on issue 179 by kxepal: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Correct, but it looks like overhead in this case, because it would be applied  
to only one header while the others should follow RFC 822. Wouldn't it be  
better to use base64 encoding?


 
From: couchdb-pyt...@googlecode.com
Date: Thu, 02 Jun 2011 06:47:19 +0000
Local: Thurs, Jun 2 2011 2:47 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #9 on issue 179 by heshim...@gmail.com: couchdb-dump cannot deal  
with unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Hmm... I'd like to make a note here that kxepal's dump-tool-2.patch  
actually generated some invalid multipart boundaries.


 
From: couchdb-pyt...@googlecode.com
Date: Fri, 21 Sep 2012 08:33:25 +0000
Local: Fri, Sep 21 2012 4:33 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
Updates:
        Owner: kxepal

Comment #10 on issue 179 by djc.ochtman: couchdb-dump cannot deal with  
unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

(No comment was entered for this change.)


 
From: couchdb-pyt...@googlecode.com
Date: Sat, 22 Sep 2012 00:45:28 +0000
Local: Fri, Sep 21 2012 8:45 pm
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
Updates:
        Labels: Milestone-0.9

Comment #11 on issue 179 by wickedg...@gmail.com: couchdb-dump cannot deal  
with unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

(No comment was entered for this change.)


 
From: couchdb-pyt...@googlecode.com
Date: Mon, 22 Oct 2012 11:27:37 +0000
Local: Mon, Oct 22 2012 7:27 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #12 on issue 179 by djc.ochtman: couchdb-dump cannot deal with  
unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Any progress on this?


 
From: couchdb-pyt...@googlecode.com
Date: Mon, 22 Oct 2012 11:33:14 +0000
Local: Mon, Oct 22 2012 7:33 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids

Comment #13 on issue 179 by kxepal: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Yes, I will submit a patch with tests this week. I agree with you about the  
RFC 2047 specification, so I am diving into it.


 
From: couchdb-pyt...@googlecode.com
Date: Wed, 24 Apr 2013 17:20:24 +0000
Local: Wed, Apr 24 2013 1:20 pm
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
Updates:
        Status: Accepted

Comment #14 on issue 179 by kxepal: couchdb-dump cannot deal with unicode  
characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Patch attached. Non-ascii headers are now encoded following RFC 2047.  
Actually, I feel like rewriting the multipart module to base it on top of the  
email package, but that would probably be another issue - it needs  
workarounds for some email-specific features to keep backward compatibility.
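
For readers curious what building on the email package might look like, here is a small Python 3 sketch. It is not the attached patch and not the couchdb-python multipart module; the `Content-ID` header name, the text/plain part type, and the variable names are assumptions made for illustration only.

```python
import email
from email.header import Header, decode_header, make_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

doc_id = "文档"  # example id taken from the report

# Build a multipart container with one part per document.
outer = MIMEMultipart("mixed")
part = MIMEText('{"_id": "%s"}' % doc_id, "plain", "utf-8")
# Assumed header name; a Header object is serialized as an
# RFC 2047 encoded-word when the message is flattened.
part["Content-ID"] = Header(doc_id, "utf-8")
outer.attach(part)

# The serialized message is bytes and can go to any binary stream.
raw = outer.as_bytes()

# Reading it back: parse, take the first part, decode the header.
parsed = email.message_from_bytes(raw)
first = parsed.get_payload()[0]
restored_id = str(make_header(decode_header(first["Content-ID"])))
```

The email package also adds headers such as MIME-Version on its own, which may be the kind of email-specific behavior the comment says would need working around.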

Attachments:
        couchdb-python_485.patch  3.9 KB

--
You received this message because this project is configured to send all  
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings


 
From: couchdb-pyt...@googlecode.com
Date: Thu, 25 Apr 2013 10:09:49 +0000
Local: Thurs, Apr 25 2013 6:09 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
Updates:
        Status: Fixed

Comment #16 on issue 179 by djc.ochtman: couchdb-dump cannot deal with  
unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

Pushed a slightly changed patch as rce40fd77ae8d, thanks!



 
From: couchdb-pyt...@googlecode.com
Date: Thu, 25 Apr 2013 11:17:56 +0000
Local: Thurs, Apr 25 2013 7:17 am
Subject: Re: Issue 179 in couchdb-python: couchdb-dump cannot deal with unicode characters in doc ids
Updates:
        Labels: -Milestone-0.9

Comment #17 on issue 179 by djc.ochtman: couchdb-dump cannot deal with  
unicode characters in doc ids
http://code.google.com/p/couchdb-python/issues/detail?id=179

(No comment was entered for this change.)



 