CVS and UTF-8 encoded files.

Thomas Eliassson

Dec 5, 2001, 9:01:12 AM

Hi!

We're writing Java code and have Swedish characters in the source.
We've been trying to keep the source code in CVS, but we get warnings
when adding a file containing Swedish characters. Our platform is W2k
clients and a Solaris 8 server. The warning looks like this:
Warning : 'fdafda.txt' has some escape characters in it (0x00-0x20,
0x80-0xFF), you should correct it first

Files are saved in UTF-8 format on our W2k clients. Since each Swedish
special character is represented by two bytes (e.g. å is C3 A5 hex), the
first byte is considered an escape character by CVS.
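
To make the byte counts concrete, here is a small Python sketch (not part of the original post) that shows how each Swedish letter is encoded in UTF-8 and in ISO 8859-1. The helper name `hex_bytes` is made up for illustration.

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Return the encoded form of ch as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


if __name__ == "__main__":
    # Print the code point rather than the letter itself so the output
    # stays pure ASCII on any console.
    for ch in "åäöÅÄÖ":
        print(
            f"U+{ord(ch):04X}",
            "UTF-8:", hex_bytes(ch, "utf-8"),
            "Latin-1:", hex_bytes(ch, "latin-1"),
        )
```

Each of these letters becomes two bytes in UTF-8, and both bytes are at or above 0x80, so both fall in the 0x80-0xFF range the warning mentions. In ISO 8859-1 each letter is a single byte, also in that range.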

How does CVS handle escape characters?
Can I safely ignore this message?
It seems to work fine.

Thanks for your response
/Thomas

Jesus Manuel NAVARRO LOPEZ

Dec 5, 2001, 12:47:53 PM

Hi, Thomas:

Thomas Eliassson wrote:

> Hi!
>
> We're writing Java code, and have swedish characters in the source.
> We've been trying to keep the sourcecode in CVS, but we get warnings
> when adding a file containing swedish characters. Our platform is W2k
> clients and a Solaris 8 server. The warning looks like this:
> Warning : 'fdafda.txt' has some escape characters in it (0x00-0x20,
> 0x80-0xFF), you should correct it first
>


CVS itself has trouble treating anything other than 7-bit US-ASCII as
*text*, so UTF-8 encoded data is not *text* as CVS understands it; hence
the warnings.


> Files are saved in UTF-8 format on our W2k clients. Since each Swedish
> special character is represented by two bytes (e.g. å is C3 A5 hex), the
> first byte is considered an escape character by CVS.
>
> How does CVS handle escape characters?
> Can I safely ignore this message?
> It seems to work fine.
>


Yes. The only problems may come up occasionally when resolving
conflicts, since CVS *might* not show you diffs properly. Since you
*know* those files are mergeable, and they are indeed mergeable by the
algorithm CVS uses, the worst that can happen is some occasional
annoyance (unless someone can correct me).


...on the other hand, it is arguable whether you really *need* special
characters in your source code: you'd be better off with US-ASCII only
(you might need to enforce English comments and English-ish variable
names). If in the future you need to share code with other people (not
even foreigners), you might find that it pays off. Obviously, the
"internationalization" part of your program is another matter (user
and/or GUI strings should be in whatever character set fits).
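
If the sources were kept pure US-ASCII as suggested, Swedish text in Java string literals could still be written with `\uXXXX` Unicode escapes, which the Java compiler accepts in source files. A minimal sketch of such a conversion, in Python; the function name `to_java_escapes` is hypothetical, and the JDK has historically shipped a `native2ascii` tool for this kind of job.

```python
def to_java_escapes(text: str) -> str:
    """Replace every non-ASCII character with Java \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain 7-bit ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Characters beyond the BMP need a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)
```

For example, `to_java_escapes('String s = "å";')` yields `String s = "\u00e5";`, which contains only 7-bit characters.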
--
SALUD,
Jesús
***
jesus_...@promofinarsa.es
***

Jorgen Grahn

Dec 5, 2001, 2:33:55 PM

On Wed, 05 Dec 2001 18:47:53 +0100, Jesus Manuel NAVARRO LOPEZ <jesus_...@promofinarsa.es> wrote:
> Hi, Thomas:
>
> Thomas Eliassson wrote:
>
>> Hi!
>>
>> We're writing Java code, and have swedish characters in the source.
>> We've been trying to keep the sourcecode in CVS, but we get warnings
>> when adding a file containing swedish characters. Our platform is W2k
>> clients and a Solaris 8 server. The warning looks like this:
>> Warning : 'fdafda.txt' has some escape characters in it (0x00-0x20,
>> 0x80-0xFF), you should correct it first
>>
>
>
> CVS per se has problems understanding as *text* anything else than 7bit
> US ASCII, so UTF encoding is not *text* as CVS understands it, here the
> warnings.

Not true in my experience; ISO 8859-1 (Latin1) works well for me (and
incidentally, that's also by far the most common character set for encoding
Swedish text).

(Unless the Windows client breaks something. I don't do Windows.)

Actually, I wonder about the "has some escape characters" message. I don't
think I've ever seen it, and I have had text files with escape characters
under version control, as text files. I've seen 'cvs diff' become confused
by it, but not commit/checkout.

If I were the OP, I'd switch encoding to plain 8-bit text, in Latin-1. If he
doesn't need to write in non-western languages, that's the most portable way
to represent plain text. And CVS handles it.
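
A switch from UTF-8 to Latin-1 along these lines could be sketched as follows, in Python. The function name `utf8_to_latin1` is made up for illustration, and decoding with `utf-8-sig` assumes that some Windows editors may have written a byte order mark at the start of the file.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO 8859-1 (Latin-1).

    A leading UTF-8 byte order mark, if present, is dropped.
    Raises UnicodeEncodeError if the text contains a character that
    Latin-1 cannot represent, rather than silently losing data.
    """
    text = data.decode("utf-8-sig")
    return text.encode("latin-1")


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python convert.py Source.java Converted.java
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    # Work on raw bytes so line endings are left exactly as they were.
    dst.write_bytes(utf8_to_latin1(src.read_bytes()))
```

After such a conversion the Java compiler has to read the files as Latin-1 as well; `javac -encoding ISO-8859-1` does that when the platform default encoding is something else.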

/Jorgen

--
// Jorgen Grahn <jgrahn@ ''Battle ye not with monsters,
\X/ algonet.se> lest ye become a monster''

Marcin Kasperski

Dec 6, 2001, 8:46:34 AM

>>
>>CVS per se has problems understanding as *text* anything else than 7bit
>>US ASCII, so UTF encoding is not *text* as CVS understands it, here the
>>warnings.
>>
>
> Not true in my experience; ISO 8859-1 (Latin1) works well for me (and
> incidentally, that's also by far the most common character set for encoding
> Swedish text).


ISO 8859-2 (Latin2) works well too.

Marcin Kasperski

Dec 6, 2001, 9:00:14 AM

>>>The warning looks like this:
>>>Warning : 'fdafda.txt' has some escape characters in it (0x00-0x20,
>>>0x80-0xFF), you should correct it first
>>>
>>>
>>
> Actually, I wonder about the "has some escape characters" message. I don't
> think I've ever seen it, and I have had text files with escape characters
> under version control, as text files. I've seen 'cvs diff' become confused
> by it, but not commit/checkout.

Just an idea: maybe the warning is generated by the WinCVS client, not by
CVS itself?

WinCVS generates a warning about the possibly binary nature of files
being committed. Usually this is helpful and reminds you that, say, Word
files should be marked as binary.
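
A check of this general kind could look like the Python sketch below. It is only an illustration of the idea, not WinCVS's actual test: the function name `suspicious_bytes` and the choice of which control bytes count as normal text are assumptions.

```python
# Control bytes that are normal in text files: tab, line feed, carriage return.
TEXT_CONTROL_BYTES = {0x09, 0x0A, 0x0D}


def suspicious_bytes(data: bytes) -> list:
    """Return (offset, byte value) pairs for bytes that are not plain 7-bit text.

    A byte is flagged if it is 0x80 or above, or if it is a control byte
    below 0x20 other than tab, line feed, or carriage return.
    """
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if byte >= 0x80 or (byte < 0x20 and byte not in TEXT_CONTROL_BYTES)
    ]
```

Plain ASCII text produces an empty list, while a UTF-8 or Latin-1 file with Swedish letters reports every byte of every such letter, which would be enough to trigger a "possibly binary" warning even though the file is ordinary text.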

Arthur Barrett

Dec 6, 2001, 4:30:33 PM
to Thomas Eliassson

If you are using Windows...

Recent versions of CVSNT have some special handling of UTF-8 (most of
which I ignored since it doesn't concern me - yet).

Check out:
http://www.cvsnt.com

There is also a separate mailing list (details on the site).

Regs,


Arthur Barrett

Thomas Eliassson

Dec 7, 2001, 8:07:02 AM

Yes, after investigating I found out that it's WinCVS that produces the
warning message.
I guess we can live with that.
/Thomas