BUG: TortoiseMerge automatically converts ANSI file to UTF8 w/o BOM


debose

Sep 9, 2008, 9:41:50 AM
to us...@tortoisesvn.tigris.org
I'm using TortoiseSVN with Delphi 6, but Delphi still cannot deal with
Unicode, so all sources are saved in ANSI. I found that after applying
TortoiseMerge's "Use this text block" feature, the text block in the
Working Copy becomes UTF-8 encoded, even if the original text block (in
Working Base) is ANSI.

Here's an illustration: http://i38.tinypic.com/ilkln6.jpg
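
To show why such a conversion matters for a tool that reads sources as ANSI, here is a minimal Python sketch. It assumes the ANSI code page is Windows-1252 (cp1252); the thread does not say which code page the sources actually use, and the sample string is made up for illustration.

```python
# Assumption: "ANSI" here means Windows-1252; the real code page may differ.
ANSI_CODEPAGE = "cp1252"

text = "Größe"  # hypothetical source text with two non-ASCII characters

ansi_bytes = text.encode(ANSI_CODEPAGE)  # one byte per character
utf8_bytes = text.encode("utf-8")        # two bytes each for 'ö' and 'ß'

# What an ANSI-only tool ends up with when it reads the UTF-8 bytes.
misread = utf8_bytes.decode(ANSI_CODEPAGE)

print(ansi_bytes)                 # the bytes an ANSI file would contain
print(utf8_bytes)                 # the bytes after a conversion to UTF-8
print(len(text), len(misread))    # the misread text has extra characters
```

The same characters produce different byte sequences, so a block that was silently re-encoded no longer reads back correctly in an ANSI-only editor or compiler.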

Version info:
TortoiseMerge 1.5.3, Build 13783 - 32 Bit , 2008/08/30 20:59:46
libsvn_diff 1.5.2,
apr 1.2.12
apr-utils 1.2.12

---------------------------------------------------------------------
To unsubscribe, e-mail: users-un...@tortoisesvn.tigris.org
For additional commands, e-mail: users...@tortoisesvn.tigris.org

Stefan Küng

Sep 9, 2008, 11:28:15 AM
to us...@tortoisesvn.tigris.org
debose wrote:
> I'm using TortoiseSVN with Delphi6, but Delphi still can not deal with
> unicode, so all sources are saved in ANSI. I found that after applying
> TortoiseMerge's feature "Use this text block", text block in Working
> Copy becomes UTF8 encoded, even if original text block(in Working
> Base) is ANSI.
>
> here's an illustration: http://i38.tinypic.com/ilkln6.jpg

Try a nightly build please:
http://nightlybuilds.tortoisesvn.net/1.5.x/

Stefan


--
       ___
  oo  // \\      "De Chelonian Mobile"
 (_,\/ \_/ \     TortoiseSVN
   \ \_/_\_/>    The coolest Interface to (Sub)Version Control
   /_/   \_\    http://tortoisesvn.net


debose

Sep 9, 2008, 1:38:48 PM
to us...@tortoisesvn.tigris.org
Thank you for the quick response, Stefan.

In the nightly build (13872), the Working Copy is no longer converted
to UTF-8 if it was in ANSI.

But if the Working Copy already contains any text in UTF-8 without BOM,
then all text subsequently copied with the "Use this text block"
feature in TortoiseMerge becomes UTF-8 encoded.

Is it possible to disable the automatic encoding conversion with an option?

On Sep 9, 6:28 pm, Stefan Küng <tortoise...@gmail.com> wrote:
>
> Try a nightly build please:http://nightlybuilds.tortoisesvn.net/1.5.x/
>
> Stefan

Stefan Küng

Sep 9, 2008, 2:27:10 PM
to us...@tortoisesvn.tigris.org
debose wrote:
> Thank you for quick response, Stefan.
>
> In night build(13872) Working Copy is not converted to UTF8 encoding,
> if it was in ANSI.
>
> But, if Working Copy already contained any text in UTF8 w/o BOM, then,
> all the following text that is copied using "Use this text block"
> feature in TortoiseMerge becomes UTF8 encoded.

Sure, if you have UTF-8 encoded text, then the file is UTF-8 encoded.

> Isn't that possible to disable auto-encoding in any option?

No, that's not possible. Sorry.


debose

Sep 11, 2008, 7:30:34 AM
to us...@tortoisesvn.tigris.org
Build 13872 still detects UTF-8 incorrectly.
For example, files containing only ASCII (no locale-dependent
characters) are still treated as UTF8 CRLF, but they are in fact ANSI
files. Worst of all, after editing these files in TortoiseMerge
(adding any locale-dependent character), they actually become UTF-8,
which is not acceptable.
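
The ambiguity described here can be reproduced outside TortoiseMerge. The following Python sketch assumes the ANSI code page is Windows-1252 (the thread does not name it) and uses a made-up Delphi-like snippet; it shows that a pure-ASCII file reads identically under both encodings, and that the label only starts to matter once a non-ASCII character is added and the file is saved.

```python
ANSI_CODEPAGE = "cp1252"  # assumption: the thread does not name the code page


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0..127)."""
    return all(b < 128 for b in data)


source = b"procedure Foo;\r\nbegin\r\nend;\r\n"  # hypothetical file content

# For pure ASCII, both interpretations yield exactly the same text ...
same_text = source.decode(ANSI_CODEPAGE) == source.decode("utf-8")

# ... so the choice only shows once a non-ASCII character is added and saved.
edited = source.decode("ascii") + "// Größe\r\n"
saved_as_ansi = edited.encode(ANSI_CODEPAGE)
saved_as_utf8 = edited.encode("utf-8")

print(is_pure_ascii(source), same_text, saved_as_ansi == saved_as_utf8)
```

A detector therefore cannot tell "ANSI" from "UTF-8 without BOM" for such a file by looking at its bytes; whichever one it picks is a default, and that default decides how later edits are written out.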

Fisher, John

Sep 11, 2008, 9:53:00 AM
to us...@tortoisesvn.tigris.org
Maybe you should use a separate app like WinMerge or Beyond Compare, since you can configure Tortoise to use them instead of TortoiseMerge. Have you tried that yet?

-----Original Message-----
From: debose [mailto:ex.d...@gmail.com]
Sent: Thursday, September 11, 2008 6:32 AM
To: us...@tortoisesvn.tigris.org
Subject: Re: BUG: TortoiseMerge automatically converts ANSI file to UTF8 w/o BOM

13872 build still incorrectly detects UTF8 encoding.
For example, files containing only ASCII(without locale-dependent
symbols) are still treated as UTF8 CRLF. But in fact they are ANSI
files. The worst is that, after editing these files in
TortoiseMerge(adding any locale-dependent symbol), they will actually
become UTF8, which is not acceptable.

debose

Sep 11, 2008, 11:00:29 AM
to us...@tortoisesvn.tigris.org
I tried WinMerge about a year ago, but after some time I went back
to TortoiseDiff. I don't remember what the reasons were.

But if the TortoiseSVN team does not confirm this as a bug, I'll try
other diff tools =)

On Sep 11, 4:53 pm, "Fisher, John" <jfis...@forthrightsolutions.com>
wrote:


> Maybe you should use a separate app like Win Merge or Beyond Compare, since you can configure Tortoise to use them instead of TortoiseMerge.  Have you tried that, yet?
>


Stefan Küng

Sep 11, 2008, 11:43:52 AM
to us...@tortoisesvn.tigris.org
debose wrote:

> On Sep 9, 9:27 pm, Stefan Küng <tortoise...@gmail.com> wrote:
>> Sure, it you have utf8 encoded text, then the file is utf8 encoded.
>
> Seems like that build, incorrectly detects UTF8 w/o BOM. F.e. every
> file that contains ONLY ASCII(plain English, without locale-dependent
> symbols) is treated in TortoiseMerge as UTF8 CRLF, while in fact it is
> ANSI.
> And after editing in TortoiseMerge, it will become UTF8, which is not
> acceptable for me. =(

I added a registry key to set the default:
HKCU\Software\TortoiseMerge\UseUTF8
(DWORD value)

If set to 1, TortoiseMerge will default to UTF-8 when loading files
that contain only chars < 128.

This was done in r13918.
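
As a rough illustration of how the value could be set, here is a small Python sketch that builds a command line for the standard Windows `reg add` tool. The key path, value name, and DWORD type come from the message above; the helper names `build_reg_command` and `apply_setting` are made up for this example, and the setting only has an effect in builds that include r13918.

```python
import subprocess
import sys

# Key and value name as given in the message above.
KEY = r"HKCU\Software\TortoiseMerge"
VALUE_NAME = "UseUTF8"


def build_reg_command(use_utf8: bool) -> list:
    """Build a `reg add` command that writes the DWORD value (hypothetical helper)."""
    data = "1" if use_utf8 else "0"
    return ["reg", "add", KEY, "/v", VALUE_NAME, "/t", "REG_DWORD", "/d", data, "/f"]


def apply_setting(use_utf8: bool) -> None:
    """Run the command; only meaningful on Windows (hypothetical helper)."""
    if sys.platform != "win32":
        raise RuntimeError("the registry is only available on Windows")
    subprocess.run(build_reg_command(use_utf8), check=True)


# Print the command that keeps ASCII-only files as ANSI (value 0).
print(" ".join(build_reg_command(False)))
```

Calling `apply_setting(False)` on a Windows machine would write the value 0, which, going by the descriptions in this thread, is also the default.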

Stefan


Simon Large

Sep 11, 2008, 3:46:44 PM
to us...@tortoisesvn.tigris.org
2008/9/11 Stefan Küng <torto...@gmail.com>:

> debose wrote:
>> On Sep 9, 9:27 pm, Stefan Küng <tortoise...@gmail.com> wrote:
>>> Sure, it you have utf8 encoded text, then the file is utf8 encoded.
>>
>> Seems like that build, incorrectly detects UTF8 w/o BOM. F.e. every
>> file that contains ONLY ASCII(plain English, without locale-dependent
>> symbols) is treated in TortoiseMerge as UTF8 CRLF, while in fact it is
>> ANSI.
>> And after editing in TortoiseMerge, it will become UTF8, which is not
>> acceptable for me. =(
>
> I added a registry key to set the default:
> HKCU\Software\TortoiseMerge\UseUTF8
> (DWORD value)
>
> if set to 1, then TortoiseMerge will default to utf8 when loading files
> which only have chars < 127 in it.

The code looks different from this description. It appears that it
will use UTF8 in the following conditions:
a) if there is already a UTF8 BOM, or
b) if there are non-ascii chars AND the UseUTF8 key is non-zero.

The UseUTF8 key defaults to 0, so by default there is no conversion.

Is that correct?

Simon


Stefan Küng

Sep 11, 2008, 4:27:56 PM
to us...@tortoisesvn.tigris.org
Simon Large wrote:
> 2008/9/11 Stefan Küng <torto...@gmail.com>:
>> debose wrote:
>>> On Sep 9, 9:27 pm, Stefan Küng <tortoise...@gmail.com> wrote:
>>>> Sure, it you have utf8 encoded text, then the file is utf8 encoded.
>>> Seems like that build, incorrectly detects UTF8 w/o BOM. F.e. every
>>> file that contains ONLY ASCII(plain English, without locale-dependent
>>> symbols) is treated in TortoiseMerge as UTF8 CRLF, while in fact it is
>>> ANSI.
>>> And after editing in TortoiseMerge, it will become UTF8, which is not
>>> acceptable for me. =(
>> I added a registry key to set the default:
>> HKCU\Software\TortoiseMerge\UseUTF8
>> (DWORD value)
>>
>> if set to 1, then TortoiseMerge will default to utf8 when loading files
>> which only have chars < 127 in it.
>
> The code looks different from this description. It appears that it
> will use UTF8 in the following conditions:
> a) if there is already a UTF8 BOM, or
> b) if there are non-ascii chars AND the UseUTF8 key is non-zero.
>
> The UseUTF8 key defaults to 0, so by default there is no conversion.

As I said: if the registry key is set to 1 (UseUTF8 is 1, not the
default 0), then an ANSI file which only has chars < 128 is treated as
UTF-8 (if there are no chars > 127, then the encoding can be either ANSI
or UTF-8; it's the same).
But that registry key is of course only used if the automatic encoding
detection isn't able to determine the encoding.
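
The behaviour described in this message can be sketched as follows. This is not TortoiseMerge's actual code (Simon reads the source differently above); it is a hypothetical Python reconstruction of the description only, and the function name `guess_encoding`, its return labels, and the "valid UTF-8" heuristic for files with bytes above 127 are all assumptions.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(data: bytes, use_utf8_default: bool = False) -> str:
    """Guess a file's encoding; `use_utf8_default` stands in for the UseUTF8 key."""
    # A BOM settles the question immediately.
    if data.startswith(UTF8_BOM):
        return "utf-8-bom"
    # Only 7-bit ASCII: detection cannot decide, so the default applies.
    if all(b < 128 for b in data):
        return "utf-8" if use_utf8_default else "ansi"
    # Bytes above 127: assume UTF-8 only if the data is valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ansi"
    return "utf-8"


print(guess_encoding(b"begin end;"))                         # default applies
print(guess_encoding(b"begin end;", use_utf8_default=True))  # default applies
print(guess_encoding(b"Gr\xf6\xdfe"))                        # not valid UTF-8
print(guess_encoding(b"Gr\xc3\xb6\xc3\x9fe"))                # valid UTF-8
```

Under this reading, the key changes the outcome only in the ASCII-only branch, which is consistent with the earlier report that a file already containing UTF-8 text without BOM is treated as UTF-8 regardless.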

Stefan


Craig McQueen

Sep 11, 2008, 8:19:58 PM
to us...@tortoisesvn.tigris.org
Stefan Küng wrote:
> debose wrote:
>> On Sep 9, 9:27 pm, Stefan Küng <tortoise...@gmail.com> wrote:
>>> Sure, it you have utf8 encoded text, then the file is utf8 encoded.
>>
>> Seems like that build, incorrectly detects UTF8 w/o BOM. F.e. every
>> file that contains ONLY ASCII(plain English, without locale-dependent
>> symbols) is treated in TortoiseMerge as UTF8 CRLF, while in fact it is
>> ANSI.
>> And after editing in TortoiseMerge, it will become UTF8, which is not
>> acceptable for me. =(
>
> I added a registry key to set the default:
> HKCU\Software\TortoiseMerge\UseUTF8
> (DWORD value)
>
> if set to 1, then TortoiseMerge will default to utf8 when loading files
> which only have chars < 127 in it.
>
> This was done in r13918.
>
> Stefan

This situation begs for an svn property to store the file's character encoding, in the same way that svn:mime-type stores the MIME type. I've done a web search and found discussion about svn:charset.
This seems like a great idea. Is it likely to become mainstream?

Regards,
Craig McQueen

Konstantin Kolinko

Sep 12, 2008, 5:22:43 AM
to us...@tortoisesvn.tigris.org
2008/9/12 Craig McQueen <mcqu...@edsrd1.yzk.co.jp>:

Well, there is
http://subversion.tigris.org/issues/show_bug.cgi?id=2329
but it was closed due to "lack of consensus".

Personally, I wonder why the charset isn't simply included in the
existing svn:mime-type property, e.g.
svn:mime-type=text/plain;charset=ISO-8859-9
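
If a client did want to read a charset from such a property value, it could be split off with a standard header parser. The Python sketch below uses the standard library's `email.message.Message` for that; the helper name `split_mime_type` is made up, and nothing in this thread says that Subversion or TortoiseMerge interprets the parameter this way.

```python
from email.message import Message


def split_mime_type(value: str):
    """Split a MIME type string into (type, charset); charset is None if absent."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


# The property value from the example above.
mime_type, charset = split_mime_type("text/plain;charset=ISO-8859-9")
print(mime_type, charset)
```

The parser lower-cases the charset name, and a value without a charset parameter yields `None`, in which case a tool would have to fall back to detection.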

Have a nice day!

Best regards,
Konstantin Kolinko

debose

Sep 12, 2008, 8:31:10 AM
to us...@tortoisesvn.tigris.org
Thank you, Stefan.

On Sep 11, 6:43 pm, Stefan Küng <tortoise...@gmail.com> wrote:
>
> I added a registry key to set the default:
> HKCU\Software\TortoiseMerge\UseUTF8
> (DWORD value)
>
> if set to 1, then TortoiseMerge will default to utf8 when loading files
> which only have chars < 127 in it.
>
> This was done in r13918.
>
> Stefan


Craig McQueen

Sep 14, 2008, 8:43:55 PM
to us...@tortoisesvn.tigris.org
If we set the charset in the svn:mime-type, will TSVN use it instead of auto-detecting?

BTW auto-detecting character encoding sounds like amazing magic.

Craig

Stefan Küng

Sep 15, 2008, 3:15:31 AM
to us...@tortoisesvn.tigris.org
On Mon, Sep 15, 2008 at 02:43, Craig McQueen <mcqu...@edsrd1.yzk.co.jp> wrote:
> If we set the charset in the svn:mime-type, will TSVN use it instead of
> auto-detecting?

No. A MIME type usually doesn't contain the charset, and TMerge
doesn't link against libsvn_wc, so it can't read properties.

Stefan
