RBTools UnicodeDecodeError on fresh VS2K8 project from fresh Win7 x64 + Py26 + RBTools 0.2rc1

Pv

Feb 23, 2010, 7:20:49 PM
to reviewboard
I installed a fresh copy of Win7 x64 and VS2K8.
I then installed Py2.6 + RBTools 0.2rc1.
I created a new C# project, ran svn add, and then ran post-review on
the project.
I get a "UnicodeDecodeError" on the solution's and project's XML/config
files.
Each of these files begins with the byte signature "\xef\xbb\xbf" (the
UTF-8 byte order mark).

So I do this:
c:\> svn diff --diff-cmd=diff > output.txt
c:\> python.exe
>>> f = open('output.txt')
>>> s = f.read()
>>> unicode(s)
...
UnicodeDecodeError: ...

Note that s.encode('utf8') seems to work fine, but if I alter RBTools
to do that, the Review Board server croaks on the upload.
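
For readers following along, the failure can be reproduced without Visual Studio. The sketch below is written for Python 3 (the thread itself uses Python 2.6), and the sample XML content is made up for illustration. It shows that data starting with the UTF-8 BOM cannot be decoded as ASCII, which is what Python 2's unicode(s) attempts by default, while the "utf-8" and "utf-8-sig" codecs both accept it.

```python
import codecs

# Hypothetical stand-in for a Visual Studio project file: UTF-8 BOM + XML.
data = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>\n'

# The first three bytes are the signature reported above.
print(data[:3] == b"\xef\xbb\xbf")

# Decoding as ASCII fails on the very first byte, 0xef.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print("ascii decode failed:", exc.reason)

# "utf-8" keeps the BOM as the character U+FEFF; "utf-8-sig" drops it.
with_bom = data.decode("utf-8")
without_bom = data.decode("utf-8-sig")
print(repr(with_bom[0]), repr(without_bom[0]))
```

Whether the BOM should be kept or dropped before upload is a separate question; the point here is only that the bytes are valid UTF-8 and invalid ASCII.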

Has anyone seen this issue recently?

Pv

Feb 23, 2010, 7:26:00 PM
to reviewboard
Note: if I browse to my Review Board server and upload the diff
manually, it is accepted without a problem.

I am not doing anything to encode these files beyond the default
VS2K8 encoding.

Pv

Christian Hammond

Feb 23, 2010, 8:27:38 PM
to revie...@googlegroups.com
Hi Pv,

Which version of Review Board is this?

Christian

--
Christian Hammond - chi...@chipx86.com
Review Board - http://www.reviewboard.org
VMware, Inc. - http://www.vmware.com


--
Want to help the Review Board project? Donate today at http://www.reviewboard.org/donate/
Happy user? Let us know at http://www.reviewboard.org/users/
-~----------~----~----~----~------~----~------~--~---
To unsubscribe from this group, send email to reviewboard...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/reviewboard?hl=en

Pv

Feb 24, 2010, 7:55:42 PM
to reviewboard
1.1 alpha 2 (dev)

I am pretty sure this aborts in RBTools itself before it ever gets to
the server.
Again, a manual upload of the diff file to the server works fine.

Pv

On Feb 23, 5:27 pm, Christian Hammond <chip...@chipx86.com> wrote:
> Hi Pv,
>
> Which version of Review Board is this?
>
> Christian

Pv

Feb 25, 2010, 2:15:37 PM
to reviewboard
FYI: I ran an older version of post-review on the same code and it
uploaded just fine.
I then ran "easy_install -U rbtools" followed by "post-review -r #",
and it failed with the UnicodeDecodeError.
Again, the files post-review has trouble with use the default encoding
of Visual Studio 2008.

After running into the problem with the default encoding, I tried
saving the culprit files with various other encodings, but nothing
seems to make post-review happy.

Pv

Thilo-Alexander Ginkel

Feb 25, 2010, 3:25:27 PM
to revie...@googlegroups.com
On Thursday 25 February 2010 01:55:42 Pv wrote:
> I am pretty sure this aborts in RBTools itself before it ever gets to
> the server.
> Again, a manual upload of the diff file to the server works fine.

Could http://reviews.reviewboard.org/r/1298/ have caused this?

Regards,
Thilo

Pv

Feb 25, 2010, 3:43:34 PM
to reviewboard
Yes, I just commented that out and the upload was successful.

Pv


Christian Hammond

Feb 25, 2010, 3:45:42 PM
to revie...@googlegroups.com
Hi,

We'll have to figure out a new solution to that bug then. By any chance, can you reproduce this against the RBTools or Review Board repository, so we can make a test case?

Christian




Pv

Feb 25, 2010, 3:48:01 PM
to reviewboard
It should have been:
return content_type, content.encode('utf-8', 'ignore')

Pv

Pv

Feb 25, 2010, 4:01:46 PM
to reviewboard
Actually, that doesn't work either.

This stuff has always blown my mind a bit:
>>> s = 'La Pe\xf1a'
>>> print s
La Pe±a
>>> s.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 5: ordinal not in range(128)
>>> s.encode('utf8','ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 5: ordinal not in range(128)
>>> s.encode('utf8','replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 5: ordinal not in range(128)
>>> u = unicode(s, 'utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 5-6: unexpected end of data
>>> u = unicode(s, 'utf8', 'ignore')
>>> u
u'La Pe'
>>> u = unicode(s, 'utf8', 'replace')
>>> u
u'La Pe\ufffd'
>>>

I don't know what the best final result for the upload would be.
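
A note on why all three encode() calls in the session above fail the same way: in Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, so the 'ignore' and 'replace' arguments never get a chance to apply. The sketch below restates the experiment in Python 3, where bytes and text are separate types. Treating the input as Latin-1 is an assumption made for illustration (0xf1 is "ñ" in Latin-1 and cp1252), and the exact output of the error handlers may differ from the Python 2.5 session.

```python
raw = b"La Pe\xf1a"  # the same bytes as `s` in the session above

# Strict UTF-8 decoding rejects the lone 0xf1 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Error handlers avoid the exception but lose the character.
replaced = raw.decode("utf-8", "replace")
ignored = raw.decode("utf-8", "ignore")

# Decoding with the codec the bytes were actually written in recovers the
# text. Latin-1 is an assumption here, not something the thread confirms.
text = raw.decode("latin-1")

# Only after a correct decode is encoding to UTF-8 meaningful.
utf8_bytes = text.encode("utf-8")
print(strict_ok, repr(replaced), repr(ignored), repr(text), utf8_bytes)
```

The takeaway is that an encoding has to be known, or guessed, before bytes can be turned into text; no error handler recovers a character decoded with the wrong codec.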

Pv

Message has been deleted

Christian Hammond

Feb 26, 2010, 2:13:58 PM
to revie...@googlegroups.com
Mine too. We really need some example diffs that break things so we can put them into the unit test suite and verify when we fix it that other diffs don't break.

Christian



On Thu, Feb 25, 2010 at 1:03 PM, Pv <p...@swooby.com> wrote:
> Sorry for the poor formatting (couldn't find how to edit the previous
> post):
> Pv


Pv

Mar 3, 2010, 2:01:23 PM
to reviewboard
Since every new source file created by Visual Studio 2008 starts with
a Unicode BOM, any diff of source files from a default VS install
should do.

I am surprised that *anyone* who uses VS can use post-review.
I set up a new user on the latest RBTools and had them modify a
checked-in file and run post-review.
post-review failed with the same error.

I deleted their rbtools egg file and re-installed RBTools using the
following command:
easy_install -Z -U rbtools

I then edited the rbtools postreview.py file and changed:
return content_type, content.encode('utf-8')
to:
return content_type, content#.encode('utf-8')

The user's post-review worked fine after that.
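
One way to read the workaround above: the diff is already a byte string, so calling .encode('utf-8') on it can only fail or alter it. The following is a minimal sketch in Python 3 terms, not the actual RBTools code; the helper name as_upload_bytes and the sample data are hypothetical. It encodes text and passes bytes through untouched.

```python
def as_upload_bytes(content):
    """Return content as bytes suitable for a request body.

    Hypothetical helper, not part of RBTools: text is encoded as UTF-8,
    while data that is already bytes (such as a diff read from disk) is
    returned unchanged, BOM included.
    """
    if isinstance(content, bytes):
        return content
    return content.encode("utf-8")


diff_bytes = b"\xef\xbb\xbf<Project>\n"  # made-up diff fragment with a BOM
print(as_upload_bytes(diff_bytes) == diff_bytes)
print(as_upload_bytes("summary: caf\u00e9"))
```

This keeps the diff byte-for-byte identical to what svn produced, which matches the observation that a manual upload of the same file works.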

Notably, this explains why *all* of my VS source file reviews show a
red rectangle around some character(s) at the beginning of the
diff(s).
That is the BOM, which Review Board doesn't like.
The post-review "utf8" patch was intended to remove it, but I don't
think removing it is the best solution.
It would be best if Review Board itself could just display the Unicode
characters, optionally without boxing them in a red rectangle.

It would also be nice if Review Board diffs didn't put a red
rectangle around the initial BOM.
The presence of a BOM is normal and should be gracefully/silently
ignored.
A red box suggests an error or warning of some sort.
If the BOM differs between the two files, then that should be
gracefully indicated.
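
To make the last suggestion concrete, here is a small sketch of how a BOM difference between two file versions could be detected on raw bytes. It is illustrative only and says nothing about how Review Board or Pygments work internally; the function names split_bom and bom_change are hypothetical, only the UTF-8 BOM is handled, and the code targets Python 3.

```python
import codecs


def split_bom(data):
    """Split bytes into (has_bom, body), checking for the UTF-8 BOM only."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


def bom_change(old, new):
    """Describe how the BOM differs between two versions of a file."""
    old_bom, _ = split_bom(old)
    new_bom, _ = split_bom(new)
    if old_bom == new_bom:
        return "unchanged"
    return "added" if new_bom else "removed"


old = b"using System;\n"                    # made-up file contents
new = codecs.BOM_UTF8 + b"using System;\n"
print(bom_change(old, new))
print(split_bom(new))
```

A viewer could then render the body without the BOM and show a short note only when bom_change reports "added" or "removed".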

Pv

On Feb 26, 11:13 am, Christian Hammond <chip...@chipx86.com> wrote:
> Mine too. We really need some example diffs that break things so we can put
> them into the unit test suite and verify when we fix it that other diffs
> don't break.
>
> Christian

Christian Hammond

Mar 3, 2010, 6:59:35 PM
to revie...@googlegroups.com
The red rectangle doesn't come from us. This is a Pygments thing, so you'd need to talk to them about changing that. We have no control over it.

It seems that the change to post-review that encodes the content as UTF-8 breaks a lot of things, so I'm going to remove it and look into an alternative fix.

Christian


