UnicodeDecodeError with Visual Studio files, RB 1.5.4, RBTools-0.3.2 (Python 2.7.1)

Craig A

Mar 23, 2011, 7:35:41 PM

to reviewboard

I am getting an error similar to what was reported some time ago in
this post: http://groups.google.com/group/reviewboard/browse_thread/thread/56fb450ceaef45c1

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
1285: ordinal not in range(128)

>> I then went in to the rbtools postreview.py file, and changed:
>> return content_type, content.encode('utf-8')
>> to:
>> return content_type, content#.encode('utf-8')
>>
>> The user's post-review worked fine after that.

I too am seeing the problem only with Visual Studio 2008 .config
and .csproj files. I examined the postreview.py change that was
reported to fix the problem in the other post, and my version of
postreview.py seems to already have this fix (at the end of
_encode_multipart_formdata):

>> return content_type, content

The server is Ubuntu (although that likely does not matter since it is
blowing up client-side), installed following the directions from the
documentation.

Client OS is Windows 7 x64, Python info:

Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'

We are loving Review Board but the manual process of creating reviews
via the web is too cumbersome, given the relative difficulty of
determining the 'Base Path' (it can be different for different
changesets) for the .patch files created by TortoiseSVN. We got this
patch http://reviews.reviewboard.org/r/2099/diff/ working great to
allow post-review to work with SVN changelists, so our last remaining
hurdle is getting past this encoding error. Thanks for any help you
can provide.

An example problem file can be found at this location:
http://craigandliz.homedns.org/cw/ReviewBoard?action=AttachFile&do=get&target=web.data.config

Chris Clark

Mar 25, 2011, 1:24:56 PM

to revie...@googlegroups.com

Craig A wrote:
> I am getting an error similar to what was reported some time ago in
> this post: http://groups.google.com/group/reviewboard/browse_thread/thread/56fb450ceaef45c1
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> 1285: ordinal not in range(128)
>

Specifically you are hitting:

http://groups.google.com/group/reviewboard/msg/13225461e83d7743

The BOM (byte order mark) at the start of the file. Thanks for making
the file available so this could be confirmed. If this is a new file
the diff will include the BOM.
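
As a rough illustration of why the BOM matters (a sketch only, not code from RBTools): the UTF-8 BOM is the three bytes EF BB BF, and anything that decodes those bytes as ASCII fails on the first one, 0xef. The diff line below is made up for the example, and `try_ascii` is a hypothetical helper.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8

# Hypothetical diff line: a '+' marker followed by a BOM-prefixed XML declaration.
diff_line = b"+" + BOM + b'<?xml version="1.0" encoding="utf-8"?>\n'


def try_ascii(data):
    """Return the decoded text, or the UnicodeDecodeError message on failure."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


message = try_ascii(diff_line)
print(message)  # the message names byte 0xef, like the error in this thread
```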

The error above usually occurs when code assumes it is handling a
Unicode string, whereas file diff contents should be treated as bytes
(i.e. they shouldn't go through ANY string/unicode processing).

Can you post the complete traceback for the error, along with the
version you are using?

You mention you've customized postreview (pretty common, my company does
too) but it is possible that your new code has the Unicode assumption in
it rather than the original RBtools. It is worth your while doing a
quick test with a virgin version of the latest RBTools (that traceback
would be the most helpful to people on the list).

Chris

Craig A

Mar 25, 2011, 6:15:03 PM

to reviewboard

Hi Chris, thanks for taking time to look at this issue.

>> If this is a new file the diff will include the BOM.

The file is not new, but in fact the BOM is being introduced as part
of the change (the config file was rewritten using C# XML helper
classes, which introduces the BOM in the new file).

Here is a visual verification of that:
http://craigandliz.homedns.org/cw/ReviewBoard?action=AttachFile&do=get&target=BOMDiff.png
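
For anyone who would rather check this without a screenshot, here is a minimal sketch of the same comparison in Python. The helper names and the sample bytes are made up for illustration; in practice the two byte strings would come from reading the original and modified files in binary mode.

```python
import codecs


def starts_with_utf8_bom(data):
    """True if the byte string begins with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def bom_was_introduced(old, new):
    """True if the new content has a BOM that the old content lacked."""
    return not starts_with_utf8_bom(old) and starts_with_utf8_bom(new)


# Hypothetical stand-ins for the before/after file contents.
original = b'<?xml version="1.0"?>\n<configuration/>\n'
modified = codecs.BOM_UTF8 + original

print(bom_was_introduced(original, modified))
```

With real files, something like open(path, "rb").read() would supply the bytes; opening in text mode would defeat the purpose, since the BOM could be decoded or stripped before the check.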

>> Thanks for making the file available so this could be confirmed.

Just to be complete, here are before and after files:

original file: http://craigandliz.homedns.org/cw/ReviewBoard?action=AttachFile&do=get&target=web.data.config.original
modified file: http://craigandliz.homedns.org/cw/ReviewBoard?action=AttachFile&do=get&target=web.data.config

>> Can you post the error, the complete traceback? Along with the version you are using.

Sure, the version of RBTools is 0.3.2. Here is a link to my modified
postreview.py (it is the stock 0.3.2 version with this diff applied:
http://reviews.reviewboard.org/r/2099/diff/ ) and the full traceback
is at the bottom of this post.
http://craigandliz.homedns.org/cw/ReviewBoard?action=AttachFile&do=get&target=postreview.py

>> You mention you've customized postreview ...
>> is it possible that your new code has the Unicode assumption in
>> it rather than the original RBtools.

I don't think so, the relevant line of code looks like this:
>> return content_type, content

I don't see any call to .encode('utf-8') - in fact nowhere in the
entire postreview.py is there any encoding from what I can see. I
probably need to go out and brush up on character encoding issues and
figure it out for myself - I was just hoping someone else might have
run into this and solved it already :)

>> It is worth your while doing a quick test with a virgin version of the latest RBTools
>> (that traceback would be the most helpful to people on the list).

The latest version is 0.3.2 from what I can see at
http://downloads.reviewboard.org/releases/RBTools/0.3/ . I agree, I
will try reinstalling, perhaps with Python 2.6.6 this time.

It is probably worth mentioning that (like in the other posting) I can
upload a TortoiseSVN-generated patch file and Review Board handles it
fine that way. Here is a link to the patch file:
http://craigandliz.homedns.org/cw/ReviewBoard?action=AttachFile&do=get&target=TortoiseSVN-Gen.patch

Traceback (most recent call last):
  File "C:\apps\python\2.7.1\scripts\post-review-script.py", line 8, in <module>
    load_entry_point('RBTools==0.3.2', 'console_scripts', 'post-review')()
  File "C:\apps\python\2.7.1\lib\site-packages\rbtools-0.3.2-py2.7.egg\rbtools\postreview.py", line 3800, in main
    submit_as=options.submit_as)
  File "C:\apps\python\2.7.1\lib\site-packages\rbtools-0.3.2-py2.7.egg\rbtools\postreview.py", line 3466, in tempt_fate
    parent_diff_content)
  File "C:\apps\python\2.7.1\lib\site-packages\rbtools-0.3.2-py2.7.egg\rbtools\postreview.py", line 769, in upload_diff
    fields, files)
  File "C:\apps\python\2.7.1\lib\site-packages\rbtools-0.3.2-py2.7.egg\rbtools\postreview.py", line 982, in api_post
    return self.process_json(self.http_post(path, fields, files))
  File "C:\apps\python\2.7.1\lib\site-packages\rbtools-0.3.2-py2.7.egg\rbtools\postreview.py", line 903, in http_post
    data = urllib2.urlopen(r).read()
  File "C:\apps\python\2.7.1\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\apps\python\2.7.1\lib\urllib2.py", line 392, in open
    response = self._open(req, data)
  File "C:\apps\python\2.7.1\lib\urllib2.py", line 410, in _open
    '_open', req)
  File "C:\apps\python\2.7.1\lib\urllib2.py", line 370, in _call_chain
    result = func(*args)
  File "C:\apps\python\2.7.1\lib\urllib2.py", line 1186, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\apps\python\2.7.1\lib\urllib2.py", line 1155, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "C:\apps\python\2.7.1\lib\httplib.py", line 941, in request
    self._send_request(method, url, body, headers)
  File "C:\apps\python\2.7.1\lib\httplib.py", line 975, in _send_request
    self.endheaders(body)
  File "C:\apps\python\2.7.1\lib\httplib.py", line 937, in endheaders
    self._send_output(message_body)
  File "C:\apps\python\2.7.1\lib\httplib.py", line 795, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 689: ordinal not in range(128)

Chris Clark

Mar 25, 2011, 6:50:37 PM

to revie...@googlegroups.com

Craig A wrote:
>>> You mention you've customized postreview ...
>>> is it possible that your new code has the Unicode assumption in
>>> it rather than the original RBtools.
>>>
>
> I don't think so, the relevant line of code looks like this:
>
>>> return content_type, content
>>>
>
>

The traceback does not appear to show that line of code. There appears
to be a concatenation of unicode and (2.x) str (bytes) types.

This may well be "fixed" by trying Python 2.6 (or even 2.5) as you
suggested. You'd need to debug this through to see where the Unicode
string is coming from; 2.7 is the crossover version between 2.x and 3,
and it does try to deal with Unicode types when possible. Of course JSON
data is supposed to be utf8 encoded, so this may be where it is falling
down.
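
To make the unicode/str point concrete, here is a rough sketch. It is written for Python 3, so it only imitates what Python 2 does implicitly: when a unicode object is concatenated with a str, Python 2 first decodes the str using the default (ascii) codec. The function names, header text, and body are all hypothetical, and this is not the httplib code itself; it only mirrors the kind of `msg += message_body` step where the traceback ends.

```python
def py2_style_concat(header_text, body_bytes):
    """Imitate Python 2's `unicode + str`: the bytes are decoded as ASCII first."""
    return header_text + body_bytes.decode("ascii")


def bytes_only_concat(header_text, body_bytes):
    """Encode the headers once, then stay in bytes so the body is never decoded."""
    return header_text.encode("ascii") + body_bytes


# Hypothetical request pieces: text headers and a body that starts with a BOM.
headers = "Content-Type: multipart/form-data\r\n\r\n"
body = b"\xef\xbb\xbf<configuration/>"

try:
    py2_style_concat(headers, body)
except UnicodeDecodeError as exc:
    print("mixed concat failed:", exc)

request = bytes_only_concat(headers, body)
print(len(request))
```

The second function shows the general direction of a fix under these assumptions: make sure everything that reaches the concatenation is already bytes, so the diff content is passed through untouched.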

I've left the traceback below for reference.

Craig A

Apr 5, 2011, 12:31:51 PM

to reviewboard

> This may well be "fixed" by trying python 2.6 (or even 2.5) as you
> suggested.

I tried Python 2.6.6 (could not find an msi installer for 2.5.5) and
voila! It works now, no more problems with BOM characters.

Thanks for your help Chris, I am glad this is working now.

In case this helps anyone else, I tweaked the Windows install script
InstallRBTools.cmd from here:
http://groups.google.com/group/reviewboard/browse_thread/thread/b46411ccd9f2e517?fwc=1

...to add x64/x86 support and the patch for svn changelist support:

@echo off
%~d0
cd %~dp0

REM These are the files that are part of this install script
REM \InstallRbTools.cmd
REM \postreview.py.changelistpatched
REM \resourcekit\pathman.exe
REM \diffutils\cmp.exe
REM \diffutils\diff.exe
REM \diffutils\diff3.exe
REM \diffutils\libiconv2.dll
REM \diffutils\libintl3.dll
REM \diffutils\sdiff.exe
REM \installers\python-2.6.6.amd64.msi
REM \installers\python-2.6.6.msi
REM \installers\setuptools-0.6c11.win32-py2.6.exe

echo This install script will install components for post-review,
echo the command-line client for ReviewBoard
echo.
echo Installing Python 2.6.6 into C:\apps\python\2.6.6

REM Set correct python installer path for x86/x64
SET PYTHONINSTALLER=python-2.6.6.amd64.msi
IF "%PROCESSOR_ARCHITECTURE%"=="x86" SET PYTHONINSTALLER=python-2.6.6.msi

msiexec /passive /log %TEMP%\python-install.log TARGETDIR=C:\apps\python\2.6.6 /i installers\%PYTHONINSTALLER%

echo.
echo Installing Python 2.6 SetupTools (accept all defaults during install)
echo.
pause
installers\setuptools-0.6c11.win32-py2.6.exe

echo Copying "diff" command and dependent libraries
xcopy /y /s diffutils\*.* C:\apps\python\2.6.6\*.*

echo Copying pathman.exe (from Windows 2003 Server Resource Kit)
xcopy /y resourcekit\*.* C:\apps\python\2.6.6\*.*

echo Adding C:\apps\bin and Python to User PATH environment variable/Windows registry
set PATH=%PATH%;C:\apps\python\2.6.6;C:\apps\python\2.6.6\scripts
pathman.exe /au C:\apps\python\2.6.6;C:\apps\python\2.6.6\scripts

echo Installing Review Board RBTools Python Package
easy_install -Z -U RBTools

REM This step patches postreview to support SVN changelists
REM http://reviews.reviewboard.org/r/2099/diff/
xcopy /y postreview.py.changelistpatched "C:\apps\python\2.6.6\Lib\site-packages\RBTools-0.3.2-py2.6.egg\rbtools\postreview.py"

echo.
echo Installation Complete
echo To use post-review, start a new command prompt in the source folder
echo where the changes to be reviewed are. Ensure the changes are organized into
echo a changelist that is named the same as the bug number.
echo Use 'post-review --label [changelistname]' to post the review to the ReviewBoard
echo.

pause

Chris Clark

Apr 5, 2011, 4:42:52 PM

to revie...@googlegroups.com

Craig A wrote:
>> This may well be "fixed" by trying python 2.6 (or even 2.5) as you
>> suggested.
>>
>
> I tried Python 2.6.6 (could not find an msi installer for 2.5.5) and
> voila! It works now, no more problems with BOM characters.
>
> Thanks for your help Chris, I am glad this is working now.
>

Whilst this impacted a Windows user (Windows Python defaults to 7 bit
US-ASCII encoding) with a utf8 encoded (diff) file, it could easily
impact any user on any platform where the encoding of the file is not
valid for the (default) Python site string encoding. For example, this
is likely to impact users of Python 2.7 if:

* diff contains a single latin1 character (e.g. copyright symbol, an
"e" with an accent, etc.) and the site encoding is not latin1.
This would impact all Unix/Linux platforms where the Python
installation tends to default to utf8
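
A small sketch of the latin1 case, for illustration only (the diff line and the `decodes_as` helper are made up): a lone 0xa9 byte, the latin1 copyright sign, cannot be decoded as ASCII and is not valid UTF-8 either, so only a latin1 decode accepts it.

```python
# Hypothetical diff line containing a latin1-encoded copyright symbol (0xa9).
copyright_line = b"+// Copyright \xa9 2011\n"


def decodes_as(data, encoding):
    """True if the bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


results = {enc: decodes_as(copyright_line, enc)
           for enc in ("ascii", "utf-8", "latin-1")}
print(results)
```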

It looks like the Python 3.x byte/string difference that is in 2.7 will
need to be looked at. Possibly via an encoding flag to postreview.

I'm snowed under at the moment otherwise I'd look at this myself :-(

I have a number of Windows users (with non-ASCII diffs) so I'm at risk
for this problem, BUT this isn't an issue for me as I supply them with an
exe generated with Python 2.4 and py2exe. Craig, this may be your best
option.

Chris
