"\xC2\x80" of latin1 fails in hglib.tounicode(str) (Re: shellext dlls upload (Re: alpha quality installers))
From: Toshi MARUYAMA <marutosi...@yahoo.co.jp>
Date: Mon, 30 Aug 2010 09:19:12 -0700 (PDT)
Subject: "\xC2\x80" of latin1 fails in hglib.tounicode(str) (Re: shellext dlls upload (Re: alpha quality installers))

Sorry for the typo 'shitf-jis'.

Adrian Buehlmann wrote (2010-08-28 21:11):

I confirmed that this logic fails for "\xC2\x80" in
latin1 ('iso-8859-1').
This logic is used in hglib.tounicode(str).

*********************************

diff --git a/tests/hglib_encoding_test.py b/tests/hglib_encoding_test.py
--- a/tests/hglib_encoding_test.py
+++ b/tests/hglib_encoding_test.py
@@ -93,3 +93,16 @@
 def test_toutf_fallback():
     assert_equals(JAPANESE_KANA_I.encode('utf-8'),
                   hglib.toutf(JAPANESE_KANA_I.encode('euc-jp')))
+
+@with_encoding('iso-8859-1')
+def test_latin1_1():
+    str = "\41\x42"
+    assert_equals(str,
+                  hglib.fromunicode(hglib.tounicode(str)))
+
+@with_encoding('iso-8859-1')
+def test_latin1_2():
+    str = "\xC2\x80"
+    assert_equals(str,
+                  hglib.fromunicode(hglib.tounicode(str)))
+

*********************************

$ /r/Python26/Scripts/nosetests.exe tests/hglib_encoding_test.py
..............F
======================================================================
FAIL: hglib_encoding_test.test_latin1_2
----------------------------------------------------------------------
Traceback (most recent call last):
  File "r:\Python26\lib\site-packages\nose-0.11.4-py2.6.egg\nose\case.py", line 186, in runTest
    self.test(*self.arg)
  File "C:\WEB-DOWN\tortoisehg\tests\hglib_encoding_test.py", line 108, in test_latin1_2
    hglib.fromunicode(hglib.tounicode(str)))
AssertionError: '\xc2\x80' != '\x80'

----------------------------------------------------------------------
Ran 15 tests in 0.000s

FAILED (failures=1)


 
From: Toshi MARUYAMA <marutosi...@yahoo.co.jp>
Date: Mon, 30 Aug 2010 09:41:05 -0700 (PDT)
Subject: Re: "\xC2\x80" of latin1 fails in hglib.tounicode(str) (Re: shellext dlls upload (Re: alpha quality installers))

On Aug 31, 1:19, Toshi MARUYAMA <marutosi...@yahoo.co.jp> wrote:

This logic fails for the following byte combinations:

first byte  : 0xc2 - 0xdf (30 values)
second byte : 0x80 - 0xbf (64 values)

Total: 1920 (= 30 * 64)
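
The count can be checked mechanically. The following is only a minimal sketch in Python 3 (the thread itself uses Python 2.6), and the helper name ambiguous_pairs is made up for illustration. It enumerates every two-byte sequence whose first byte is non-ASCII and keeps those that also decode as strict UTF-8:

```python
def ambiguous_pairs():
    """Return two-byte sequences, starting with a non-ASCII byte,
    that are also valid UTF-8 (hypothetical helper, not part of hglib)."""
    pairs = []
    for first in range(0x80, 0x100):
        for second in range(0x00, 0x100):
            data = bytes([first, second])
            try:
                data.decode("utf-8")  # strict decoding raises on invalid input
            except UnicodeDecodeError:
                continue
            pairs.append(data)
    return pairs


pairs = ambiguous_pairs()
print(len(pairs))                  # number of ambiguous sequences
print(pairs[0], pairs[-1])         # first and last sequence found
```

Every sequence in the result is valid in iso-8859-1 as two characters and valid in UTF-8 as one character, so a UTF-8-first decoder cannot round-trip any of them.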


 
From: Yuya Nishihara <y...@tcha.org>
Date: Tue, 31 Aug 2010 01:42:41 +0900
Subject: Re: [thg-dev] "\xC2\x80" of latin1 fails in hglib.tounicode(str) (Re: shellext dlls upload (Re: alpha quality installers))

It looks like "\xC2\x80" is decoded as 'utf-8' without any error.
I'm not sure why 'utf-8' takes precedence over the locale encoding in tounicode().

IMHO, trial and error is reasonable for *showing* file contents, names,
commit messages, etc., but dangerous for identifying something.

Yuya,


 
From: Toshi MARUYAMA <marutosi...@yahoo.co.jp>
Date: Mon, 30 Aug 2010 09:54:30 -0700 (PDT)
Subject: Re: "\xC2\x80" of latin1 fails in hglib.tounicode(str) (Re: shellext dlls upload (Re: alpha quality installers))

On Aug 31, 1:42 am, Yuya Nishihara <y...@tcha.org> wrote:

"\xC2\x80" is two chars in latin-1.
"\xC2\x80" is valid utf-8. "\xC2\x79" is invalid utf-8.
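
These statements can be checked directly. A minimal sketch in Python 3, using bytes literals in place of the Python 2 str from the thread; is_valid_utf8 is a hypothetical helper, not an hglib function:

```python
def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# In latin-1 every byte is one character, so the sequence is two characters.
as_latin1 = b"\xC2\x80".decode("latin-1")
print(len(as_latin1))                # 2

# 0x80 is a UTF-8 continuation byte, 0x79 ('y') is not.
print(is_valid_utf8(b"\xC2\x80"))    # True
print(is_valid_utf8(b"\xC2\x79"))    # False
```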


 
From: Yuya Nishihara <y...@tcha.org>
Date: Tue, 31 Aug 2010 23:47:37 +0900
Subject: Re: [thg-dev] Re: "\xC2\x80" of latin1 fails in hglib.tounicode(str) (Re: shellext dlls upload (Re: alpha quality installers))

Indeed.
I think tounicode() should respect the locale encoding before trying 'utf-8'.
It's used for conversion from Mercurial string to Qt unicode string.
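
To illustrate the difference in ordering, here is a rough sketch in Python 3. It is not the actual hglib code: the function names, the explicit local_encoding parameter, and the 'replace' fallbacks are all assumptions made for this example.

```python
def tounicode_utf8_first(data, local_encoding):
    """Hypothetical: try UTF-8 first, then the locale encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(local_encoding, "replace")


def tounicode_locale_first(data, local_encoding):
    """Hypothetical: try the locale encoding first, then UTF-8."""
    try:
        return data.decode(local_encoding)
    except UnicodeDecodeError:
        return data.decode("utf-8", "replace")


def fromunicode(text, local_encoding):
    """Hypothetical inverse: encode back to the locale encoding."""
    return text.encode(local_encoding, "replace")


raw = b"\xC2\x80"
enc = "iso-8859-1"

utf8_first_ok = fromunicode(tounicode_utf8_first(raw, enc), enc) == raw
locale_first_ok = fromunicode(tounicode_locale_first(raw, enc), enc) == raw
print(utf8_first_ok)     # False: comes back as b'\x80', as in the failing test
print(locale_first_ok)   # True
```

Note that iso-8859-1 can decode any byte sequence, so with that locale the UTF-8 fallback in the second variant is never reached. With a multi-byte locale encoding such as euc-jp the fallback can still trigger.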

Yuya,



 
Discussion subject changed to "shellext dlls upload (Re: alpha quality installers)" by Toshi MARUYAMA
From: Toshi MARUYAMA <marutosi...@yahoo.co.jp>
Date: Wed, 01 Sep 2010 20:20:51 +0900
Subject: Re: [thg-dev] Re: shellext dlls upload (Re: alpha quality installers)
Adrian Buehlmann wrote (2010-08-28 21:11):

I tried Stefan's work at
http://bitbucket.org/tortoisehg/stable/issue/672/shell-extension-unic... .
But this is not a solution.
Fixutf8 sets encoding.encoding = 'utf8' at
http://bitbucket.org/stefanrusek/hg-fixutf8/src/baf283ab9f92/fixutf8.... .
This affects all repositories, regardless of whether fixutf8
is activated.
It is the same problem that Steve described in his mail "managing extensions":
http://groups.google.com/group/thg-dev/browse_frm/thread/a5119a56a9c2...

> Is "try and error" really correct?

> What happens if the filenames are in some other encoding? e.g. 'latin1'
> with a 'ü'?

As described in my previous post, this is incorrect.

> And what is your logic to write .hg/thgstatus? In what encoding are the
> filenames in .hg/thgstatus written?

I don't touch the OverlayServer Python code, which writes .hg/thgstatus.
It simply reads .hg/dirstate and writes .hg/thgstatus.
Because the path separator in .hg/dirstate is '/', there is no 0x5c problem.

If fixutf8 is activated, the encoding of .hg/dirstate and .hg/thgstatus is utf-8.
If fixutf8 is not activated, the encoding of .hg/dirstate and .hg/thgstatus is CP_ACP (shift-jis).

> If win32mbcs is activated, in what encoding is .hg/thgstatus written?
> shift-jis?

It is shift-jis.
Win32mbcs does not change the repository encoding.
It simply hooks Mercurial functions.
Regardless of whether win32mbcs is activated,
the repository encoding is shift-jis in Japan and big5 in China (Taiwan).

I have given up on making shellext support fixutf8,
and I have removed the assumption that filenames are utf-8.
My current shellext only converts from CP_ACP to Unicode (wide char).
This is compatible with the mainstream shellext,
and it resolves the 0x5c problem.
It is a big improvement for Japanese and Chinese (Taiwanese) thg users.
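
The thread does not spell out the 0x5c problem. The sketch below assumes it refers to the well-known issue that some double-byte shift-jis (cp932) characters have 0x5c, the ASCII backslash, as their second byte, so splitting a Windows path at the byte level can cut a character in half. It is written in Python 3 for illustration only (the shellext itself is not Python), and both helper names are hypothetical.

```python
def split_path_bytes(raw):
    """Naive: split the encoded path on the backslash byte 0x5c."""
    return raw.split(b"\\")


def split_path_decoded(raw, encoding="cp932"):
    """Decode to Unicode first, then split on the backslash character."""
    return raw.decode(encoding).split("\\")


# U+8868 is a kanji whose cp932 encoding is 0x95 0x5c.
raw = "dir\\\u8868.txt".encode("cp932")

print(split_path_bytes(raw))      # three parts: the kanji is cut in half
print(split_path_decoded(raw))    # two parts: 'dir' and the intact file name
```

Converting to Unicode before any path handling, as described above, avoids the problem because the backslash is then compared against whole characters rather than individual bytes.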

I finished merging and resolving conflicts, and pushed to my bitbucket repository.

Normal changeset:
http://bitbucket.org/marutosi/tortoisehg/changeset/18a682ecdfb1

MQ:
http://bitbucket.org/marutosi/tortoisehg-shellext-mq/changeset/98513b...

I uploaded Windows shellext dlls (ThgShellx86.dll and ThgShellx64.dll).

http://bitbucket.org/marutosi/tortoisehg/downloads
http://bitbucket.org/marutosi/tortoisehg/downloads/ThgShellx86.201009...
http://bitbucket.org/marutosi/tortoisehg/downloads/ThgShellx64.201009...

I don't have 64-bit Windows now, so I can't confirm that the 64-bit dll runs.
You can replace the existing dll with the new one by following the instructions at this link:
http://bitbucket.org/tortoisehg/stable/src/386a21068b48/win32/shellex...

I hope my shellext will be pulled into the thg mainstream.


 
From: Adrian Buehlmann <adr...@cadifra.com>
Date: Wed, 01 Sep 2010 13:47:09 +0200
Subject: Re: [thg-dev] Re: shellext dlls upload (Re: alpha quality installers)
On 01.09.2010 13:20, Toshi MARUYAMA wrote:

I hope this is *not* pushed.

Doesn't meet minimal quality standards and I give up trying to dissect
these things.

For a start, http://mercurial.selenic.com/wiki/ContributingChanges might
be helpful.


 