Filename issues, linux --> Fat32

Peter Chant

Aug 2, 2011, 3:17:28 AM

I can't find an answer by googling, though this must be an age-old question:

I'm trying to copy files from Linux to a FAT32 drive (phone SD card),
specifically MP3s which I ripped on the Linux machine. However, some of
the file names contain characters which FAT32 does not like, such as ':'
or accented letters. This makes transferring anything by Bjork or Royksopp
(sorry, no umlauts on this keyboard) tricky. I can of course write a script
to sort this out, but surely there is already a tool to do this that I am
not aware of?

Pete

--
http://www.petezilla.co.uk

Eef Hartman

Aug 2, 2011, 4:11:53 AM

Peter Chant <pet...@mpeteozilla.vco.uke> wrote:
> I'm trying to copy files from linux to a FAT32 drive (phone SD card). The
> latter is FAT32. Specifically MP3's which I ripped on the linux machine.
> However, this now contains some characters which FAT32 does not like in the
> file names, such as those containing ':' or accented letters.

A : is a no-no in FAT, as it's the device separator. It isn't a good idea
(although allowed) in *ix either, as it's used as the remote host terminator
in the rcp/scp (and some other) network commands.
As for the accented letters: are you using UTF-8 on the Linux machine
or one of the ISO 8859 charsets?
UTF-8 should transfer some (not all) of them OK; 8859 is a wholly different
story, as it's a DIFFERENT charset (in the upper 128 chars) than what
FAT is using and the translations aren't all that trivial.
Furthermore there are quite a few chars that are not allowed (at all)
in FAT filenames:
chars / ? < > \ : * | " and any character you can type with the Ctrl
key; in addition to those illegal characters the caret ^ is also not
permitted in FAT (the first set isn't permitted in NTFS either).
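
The list above can be turned into a small filter. What follows is only a minimal sketch, assuming Python 3: the name `fat_safe` and the choice of '_' as the replacement are made up for illustration, it works on a single file name rather than a whole path, and the caret is deliberately left out of the set (an assumption, since VFAT long names generally seem to accept it; add it if your device objects).

```python
# Characters rejected in FAT/VFAT file names, per the list above.
FAT_FORBIDDEN = '/?<>\\:*|"'

def fat_safe(name, replacement="_"):
    """Return one file name (not a path) with forbidden characters replaced."""
    cleaned = []
    for ch in name:
        # ord(ch) < 32 covers the control characters typed with Ctrl
        if ch in FAT_FORBIDDEN or ord(ch) < 32:
            cleaned.append(replacement)
        else:
            cleaned.append(ch)
    # Trailing dots and spaces tend to be dropped by Windows-style readers
    return "".join(cleaned).rstrip(". ") or replacement

print(fat_safe("Debut: Human Behaviour?.mp3"))
```

Accented letters are passed through untouched here; whether they survive depends on the locale and mount options rather than on this list.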

> surely there is already a tool to do this that I am not aware of?

If that exists I'm not aware of it either.
But I must admit, I never try to store Linux files (and filenames)
on a FAT filesystem; if I really must, I put them in a .tar file
to preserve all *ix attributes and names.
--
******************************************************************
** Eef Hartman, Delft University of Technology, dept. SSC/ICT **
** e-mail: E.J.M....@tudelft.nl - phone: +31-15-27 82525 **
******************************************************************

goarilla

Aug 2, 2011, 5:33:20 AM

Internationalization bites again !

Aragorn

Aug 2, 2011, 6:07:25 AM

On Tuesday 02 August 2011 11:33 in alt.os.linux.slackware, goarilla
enlightened humanity with the following words...:

> On Tue, 02 Aug 2011 10:11:53 +0200, Eef Hartman wrote:
>
>> [...] But I must admit, I never try to store Linux files (and
>> filenames) on a FAT filesystem, if I really must, I put them in a .tar
>> file to preserve all *ix attributes and names.

+1

> Internationalization bites again !

It would be more correct to state that it's the braindead designs which
are biting again, because these are outdated designs that Microsoft not
only has continued to support but even continued to push until at least
the year 2000 - which is when Windows ME was released - and which
everyone seems to still want to support because Microsoft is considered
sacred somehow.

"Kneel and bow before your God, Billgatus of Borg, you worm!" <eg>

--
Aragorn
(registered GNU/Linux user #223157)

Henrik Carlqvist

Aug 2, 2011, 6:43:36 AM

I did a quick google on "sanitize file names" and got the following page
as the second hit. I haven't tried it, but it seems to be a useful perl
script for this purpose.

http://www.karakas-online.de/forum/viewtopic.php?t=10430

Even though the script does not seem to replace ":", adding yet another
line for that seems easy.

regards Henrik
--
The address in the header is only to prevent spam. My real address is:
hc123(at)poolhem.se Examples of addresses which go to spammers:
root@localhost postmaster@localhost

Eef Hartman

Aug 2, 2011, 6:43:42 AM

Aragorn <str...@telenet.be.invalid> wrote:
> It would be more correct to state that it's the braindead designs which
> are biting again, because these are outdated designs that Microsoft not
> only has continued to support but even continued to push until at least

The problem is that it isn't just M$ but especially the vendors of memory
sticks, camera memory cards etc. that are still using FAT (in the vfat,
i.e. Windows 9x or ME, way). Microsoft itself has now completely switched
to NTFS.

Ralph Spitzner

Aug 2, 2011, 8:09:44 AM

Eef Hartman wrote:
[...]

> The problem is it isn't just M$ but especially the vendors of memory sticks,
> camera memory cards etc that still are using FAT (in the vfat, Windows-9x
> cq -ME, way). MicroSoft itself has now completely switched to ntfs.

Well, the stick/CF doesn't really care what you do to it.
RAW/XFS/extX etc. are all go; the problem is more likely
the limited resources of, say, a camera, so a stupid small
braindead filesystem is easier to implement on that
hardware...

-rasp

Eef Hartman

Aug 2, 2011, 9:15:49 AM

Ralph Spitzner <ra...@spitzner.org> wrote:
> Well the Stick/CF doesn't really care what you do to it.

A pure stick, no (although my DaneElec 16 GB stick keeps on
listing a "1440 MB floppy" device, even though I reclaimed
its space into the ext3 partition I created on it:

Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor:          Model: USB DISK Pro     Rev: PMAP
  Type:   Direct-Access                    ANSI SCSI revision: 00
Host: scsi7 Channel: 00 Id: 00 Lun: 01
  Vendor:          Model: USB DISK Pro     Rev: PMAP
  Type:   Direct-Access                    ANSI SCSI revision: 00

the first is the 16 GB ext3 partition, the second a FAT "floppy image",
now 0 MB in size, that the firmware keeps on showing), but MP3 and other
media players especially are often braindead as to the filesystems they
support for the files they can play.
I once had an MP3 player that didn't even support "files in sub
directories"; all the playable music HAD to be in the root dir.
Luckily it is broken now and my current player DOES support them.

Paweł Wlaź

Aug 2, 2011, 12:17:40 PM

What are your locale settings? I had some problems with filenames like
yours, but the problems disappeared with UTF-8 locales set on
Slackware. Now I can copy files with names like:

1990 Gling-Glo - 14 Börnin Við Tjörnina.mp3

from Slackware to the phone (mounted as a USB stick, vfat filesystem) and
the device reports the filename without any change.

But you did not mention the way you mount the device. For me, mounting
in Dolphin when using fluxbox, or auto-mounting with KDE, works.

PW

Martin

Aug 2, 2011, 2:32:21 PM

Peter Chant wrote:

For converting character sets and sometimes unescaping filenames I use a
script called "convmv". I do not recall where I got it from, but it starts
with

#!/usr/bin/perl
# convmv 1.14 - converts filenames from one encoding to another
# Copyright © 2003-2008 Bjoern JACKE <bjo...@j3e.de>
#
# This program comes with ABSOLUTELY NO WARRANTY; it may be copied or modified
# under the terms of the GNU General Public License version 2 or 3 as
# published by the Free Software Foundation.

However, I am not sure it deals with ':' and such like characters.

Martin

Aug 2, 2011, 2:34:45 PM

Martin wrote:

> for converting character sets and sometimes unescaping filenames i use a
> script called "conmv". I do not recall where i got it ftom but it starts
> with
>
> #!/usr/bin/perl
> # convmv 1.14 - converts filenames from one encoding to another
> # Copyright © 2003-2008 Bjoern JACKE <bjo...@j3e.de>
> #
> # This program comes with ABSOLUTELY NO WARRANTY; it may be copied or
> modified
> # under the terms of the GNU General Public License version 2 or 3 as
> # published by the Free Software Foundation.
>
> However, I am not sure it deals with ':' and such like characters.
>
> Martin

third google hit: http://www.j3e.de/linux/convmv/

Helmut Hullen

Aug 2, 2011, 2:49:00 PM

Hello, Martin,

You wrote on 02.08.11:

> For converting character sets and sometimes unescaping filenames I
> use a script called "convmv". I do not recall where I got it from, but
> it starts with

> #!/usr/bin/perl
> # convmv 1.14 - converts filenames from one encoding to another
> # Copyright © 2003-2008 Bjoern JACKE <bjo...@j3e.de>

http://slackfind.net/

There you can find "convmv-1.14-noarch" Slackware packages from several
places.

Best regards
Helmut

"Ubuntu" - an African word, meaning "Slackware is too hard for me".

jim dorey

Aug 2, 2011, 6:24:56 PM

Eef Hartman wrote:

>
> If that exist I'm not aware of it either.
> But I must admit, I never try to store Linux files (and filenames)
> on a FAT filesystem, if I really must, I put them in a .tar file
> to preserver all *ix attributes and names.

I put the names in the note section of the MP3 tags; it keeps the
Cyrillic from my little overseas collection intact.

goarilla

Aug 3, 2011, 3:30:21 AM

On Tue, 02 Aug 2011 15:15:49 +0200, Eef Hartman wrote:

> Ralph Spitzner <ra...@spitzner.org> wrote:
>> Well the Stick/CF doesn't really care what you do to it.
>
> A pure stick, no (although my DaneElec 16 GB stick keeps on listing a
> "1440 MB floppy" device, even though I reclaimed its space into the ext3
> partition I created on it: Host: scsi7 Channel: 00 Id: 00 Lun: 00
> Vendor: Model: USB DISK Pro Rev: PMAP Type:
> Direct-Access ANSI SCSI revision: 00
> Host: scsi7 Channel: 00 Id: 00 Lun: 01
> Vendor: Model: USB DISK Pro Rev: PMAP Type:
> Direct-Access ANSI SCSI revision: 00
> the first is the 16 GB ext3 partition, the 2nd a now 0 MB in size fat
> "floppy image" the firmware keeps on showing), but especially MP3 cq
> media players are often braindead as of the fs'es they support for the
> files they can play.
> I once had a MP3 player that didn't even support "files in sub
> directories", all the playable music HAD to be in the root dir. Luckily
> it is broken now and my current player DOES support them.

Whoo I smell a murder mystery.

Ralph Spitzner

Aug 3, 2011, 4:33:55 AM

Eef Hartman wrote:
> Ralph Spitzner<ra...@spitzner.org> wrote:

[...]

> I once had a MP3 player that didn't even support "files in sub
> directories", all the playable music HAD to be in the root dir.
> Luckily it is broken now and my current player DOES support them.

Remove the 'Firmware' and make it 'Flat' :-)

Sorry, couldn't resist...

-rasp

Aragorn

Aug 3, 2011, 5:51:40 AM

On Wednesday 03 August 2011 09:30 in alt.os.linux.slackware, goarilla

enlightened humanity with the following words...:

> On Tue, 02 Aug 2011 15:15:49 +0200, Eef Hartman wrote:

Nah, everyone knows the butler did it. ;-)

Peter Chant

Aug 3, 2011, 2:31:02 PM

Eef Hartman wrote:

> Peter Chant <pet...@mpeteozilla.vco.uke> wrote:
>> I'm trying to copy files from linux to a FAT32 drive (phone SD card).
>> The
>> latter is FAT32. Specifically MP3's which I ripped on the linux machine.
>> However, this now contains some characters which FAT32 does not like in
>> the file names, such as those containing ':' or accented letters.
>
> A : is a nono in FAT, as it's the device separator. It isn't a good idea

I'm fairly up on this, having gone from a ZX Spectrum+ to a PC (12MHz 286 no
less).

> (although allowed) in *ix either, as it's used as remote host terminator
> in the rcp/scp (and some other) network commands.
> As for the accented letters: are you using UTF-8 on the Linux machine
> or one of the iso 8859 charsets?

Up to now I never knew or cared. I've just changed in
/etc/profile.d/lang.sh from en_US to en_GB, but as I've not rebooted that is
fairly irrelevant.

Does FAT32 support UTF-8 or 8859 in any case?

--
http://www.petezilla.co.uk

Peter Chant

Aug 3, 2011, 2:44:56 PM

Henrik Carlqvist wrote:

> I did a quick google on "sanitize file names" and got the following page
> as the second hit. I haven't tried it, but it seems to be a useful perl
> script for this purpose.
>
> http://www.karakas-online.de/forum/viewtopic.php?t=10430
>
> Even though the script does not seem to replace ":" adding yet another
> line for that seems easy.

Hmm, I feel a bit of python coding coming on.

--
http://www.petezilla.co.uk

Peter Chant

Aug 3, 2011, 2:33:06 PM

Aragorn wrote:

>
> It would be more correct to state that it's the braindead designs which
> are biting again, because these are outdated designs that Microsoft not
> only has continued to support but even continued to push until at least
> the year 2000 - which is when Windows ME was released - and which
> everyone seems to still want to support because Microsoft is considered
> sacred somehow.
>

It would be an advantage if all these devices supported something like ext2
(there must be a better choice than that now) and just made sure that the
windroids got drivers pushed towards them left, right and centre. The
chances of MS adding it are low.

--
http://www.petezilla.co.uk

Peter Chant

Aug 3, 2011, 2:43:44 PM

Paweł Wlaź wrote:
    ^    ^

Evidence that at least knode does not have any issue?

> What are your locale settings. I had some problems with filenames like
> yours - but the problems disspeared with utf8 locales set on
> slackware. Now I can copy files with names like:
>

Do I do that in /etc/profile.d/lang.sh?

At the moment I get:

bash-4.1$ echo $LANG
en_US

bash-4.1$ locale -a | grep en_GB
en_GB
en_GB.utf8
bash-4.1$

Do I set LANG to en_GB.utf8?

#!/bin/sh
# Set the system locale. (no, we don't have a menu for this ;-)
# For a list of locales which are supported by this machine, type:
# locale -a

# en_US is the Slackware default locale:
#export LANG=en_US
export LANG=en_GB

# 'C' is the old Slackware (and UNIX) default, which is 127-bit
# ASCII with a charmap setting of ANSI_X3.4-1968. These days,
# it's better to use en_US or another modern $LANG setting to
# support extended character sets.
#export LANG=C

# There is also support for UTF-8 locales, but be aware that
# some programs are not yet able to handle UTF-8 and will fail to
# run properly. In those cases, you can set LANG=C before
# starting them. Still, I'd avoid UTF unless you actually need it.
#export LANG=en_US.UTF-8

# Another option for en_US:
#export LANG=en_US.ISO8859-1

# One side effect of the newer locales is that the sort order
# is no longer according to ASCII values, so the sort order will
# change in many places. Since this isn't usually expected and
# can break scripts, we'll stick with traditional ASCII sorting.
# If you'd prefer the sort algorithm that goes with your $LANG
# setting, comment this out.
export LC_COLLATE=C

# End of /etc/profile.d/lang.sh

--
http://www.petezilla.co.uk

Peter Chant

Aug 3, 2011, 2:34:08 PM

Eef Hartman wrote:

> If that exist I'm not aware of it either.
> But I must admit, I never try to store Linux files (and filenames)
> on a FAT filesystem, if I really must, I put them in a .tar file
> to preserver all *ix attributes and names.

Like my phone can read .tar files.

Actually it's android, so if I put the effort in...

--
http://www.petezilla.co.uk

Peter Chant

Aug 3, 2011, 2:49:26 PM

Helmut Hullen wrote:

>
> There you can find "convmv-1.14-noarch" slackware packets from several
> places.
>

>

> "Ubuntu" - an African word, meaning "Slackware is too hard for me".

Not in my case, I just built convmv from source...

(actually it was trivial to build it from source)

--
http://www.petezilla.co.uk

Paweł Wlaź

Aug 3, 2011, 4:03:59 PM

On Wed, 3 Aug 2011, Peter Chant wrote:

[ . . . ]

>
>> What are your locale settings. I had some problems with filenames like
>> yours - but the problems disspeared with utf8 locales set on
>> slackware. Now I can copy files with names like:
>>
>
> Do I do that in /etc/profile.d/lang.sh?
>

Yes, that's the place.

> At the moment I get:
>
>

[ . . . ]

>
> Do I set LANG to en_GB.utf8?
>

I think so. I have (in /etc/profile.d/lang.sh)

export LANG=pl_PL.utf8
export LC_ALL=pl_PL.utf8

but for you en_GB.utf8 is probably better :)

Pawel

Eef Hartman

Aug 4, 2011, 4:00:27 AM

Peter Chant <pet...@mpeteozilla.vco.uke> wrote:
> /etc/profile.d/lang.sh from en_US to en_GB,

No .UTF-8 suffix, so you're using iso 8859 (probably -1, the
"Latin-1" charset).
My LANG settings here (work, openSUSE) is: en_US.UTF-8
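
To check which case you are in, the codeset part of a locale name can be picked apart like this. This is a rough sketch, assuming the usual language_TERRITORY.codeset@modifier layout; `codeset_of` is a made-up helper name, and a result of None only means the name carries no explicit suffix, so the locale's own default charset applies.

```python
import os

def codeset_of(locale_name):
    """Return the codeset part of a locale name, or None if there is none."""
    # Drop an optional @modifier first, e.g. de_DE.ISO8859-15@euro
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1]

lang = os.environ.get("LANG", "")
print("LANG=%r -> codeset %r" % (lang, codeset_of(lang)))
```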

> Does FAT32 support UTF-8 or 8859 in any case?

Certainly not iso 8859, I _think_ the vfat extensions that came in
with one of the service packs of Win-95 do support Unicode (UCS-2).

PS: FAT32 came in with - I believe - Win-98, although according to
a web page I found it was also available in OEM Service Release 2
of Win-95 (an OEM Windows release you only got WITH a new PC; it
was never sold separately, i.e. as an upgrade for older Win-95 systems).

Eef Hartman

Aug 4, 2011, 4:07:01 AM

Peter Chant <pet...@mpeteozilla.vco.uke> wrote:
> Do I do that in /etc/profile.d/lang.sh?

Yes. If anyone uses tcsh, also change lang.csh the same way.

> Do I set LANG to en_GB.utf8?

Not quite, the extension should be .UTF-8 (note the caps), like in
the example here:
> #export LANG=en_US.UTF-8

Just change the US into GB and remove the #

> # Another option for en_US:
> #export LANG=en_US.ISO8859-1

This is the way to explicitly choose your 8859 charset (-1 is
good enough for GB, most EU countries will probably prefer -15,
the extended charset _with_ the Euro char included).

> export LC_COLLATE=C

And I would leave this in, so filenames still sort in ASCII order.

Peter Chant

Aug 5, 2011, 2:28:58 PM

Peter Chant wrote:

Here is the Python script I wrote. It is not behaving entirely as expected
but it gets 95% of the job done. I like Python, despite having, for some
reason, the feeling that I am not coding Python in the most efficient way.

No doubt someone will suggest parts of it are not very pythonic.

Of course, if it were a Perl script someone would suggest an alternative that
was half the length and four times more cryptic... ;-)

#!/usr/bin/env python
from __future__ import print_function  # print() also works on Python 2

import os, re, shutil

# Anything outside this set marks a name as unsafe. The hyphen sits last so
# that it is a literal; written as '\ -_' it formed a range from space to
# underscore, which silently accepted ':', '?', '*' and friends.
UNSAFE = re.compile(r'[^0-9a-zA-Z _./-]+')

def SanitiseCopy(fullFilename, newPath):
    fullFilename = fullFilename.strip()
    print(fullFilename)
    match = UNSAFE.search(fullFilename)
    if match is None:
        print('SAFE: ', fullFilename)
        new = fullFilename
    else:
        print('UNSAFE: ', fullFilename)
        print(' MATCH:', match.group(0), "|")
        # Convert unsafe - replace the offending characters one by one.
        new = ''
        for c in fullFilename:
            if ord(c) > 127:
                c = '_'
            if c == ':':
                c = '-'
            if c in "?*,(){}'":
                c = '_'
            new = new + c
        print(' NOW:', new)

    # Work out the destination; new[2:] drops the leading './' from os.walk
    to = os.path.join(newPath, new[2:])
    base = os.path.dirname(to)
    # Check to see if directory exists
    if not os.path.exists(base):
        print(" CREATING:", base)
        os.makedirs(base)
    print('COPY: source ', fullFilename, ' \n -> ', to)
    # For some reason it has to be a file copy, links and symlinks do not work.
    shutil.copyfile(fullFilename, to)

print('Make sanitised MP3 directory that vfat does not complain about.')

# Make a directory
if os.path.exists('../mp3-sanitised'):
    shutil.rmtree('../mp3-sanitised')
os.mkdir('../mp3-sanitised')

# walk
for dirpath, dirnames, filenames in os.walk('.'):
    print('DIR: ', dirpath)
    for filename in filenames:
        fullFilename = os.path.join(dirpath, filename)
        SanitiseCopy(fullFilename, '../mp3-sanitised')

--
http://www.petezilla.co.uk

Peter Chant

Aug 5, 2011, 2:34:05 PM

Eef Hartman wrote:

> Peter Chant <pet...@mpeteozilla.vco.uke> wrote:
>> Do I do that in /etc/profile.d/lang.sh?
>
> Yes. If anyone uses tcsh, also change lang.csh the same way.
>
>> Do I set LANG to en_GB.utf8?
>
> Not quite, the extension should be .UTF-8 (note the caps), like in
> the example here:
>> #export LANG=en_US.UTF-8
>
> Just change the US into GB and remove the #

I have en_GB.UTF-8. Now the non-standard - for the UK - characters come out
as strange question mark symbols. Not tried re-ripping the files to see how
that goes. However, this works:

bash-4.1$ touch @?ee�????��
bash-4.1$ ls @*
@?ee�????��
bash-4.1$

Would not want to try writing that to vfat though?

> This is the way to explicitly choose your 8859 charset (-1 is
> good enough for GB, most EU countries will probably prefere -15,
> the extended charset _with_ the Euro char included).
>

So perhaps I should have had:

export LANG=en_GB.ISO8859-15

So I can handle anything around here?

>> export LC_COLLATE=C
>
> And I would leave this in, so filenames still sort in ASCII order.

--
http://www.petezilla.co.uk

Martin

Aug 6, 2011, 4:26:04 AM

Peter Chant wrote:

> I have en_GB.UTF-8.

Personally I think UTF-8 encoding is not a bad choice even for English
speakers because it enables you to communicate with the non-English-speaking
world. I realize that in the prevailing Anglo-American hegemony this is not
a big incentive, but it had to be said. ;-)

> Now the non-standard - for the UK - characters come
> out as strange question mark symbols.

I can only assume this happens for files that have been created before your
switch to utf-8 and that are consequently encoded in a different (now:
wrong) way. You can use the previously mentioned convmv script to convert
those file names.
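
For illustration, the kind of conversion involved looks roughly like this. It is a sketch only, assuming Python 3 and that the old names were written as ISO 8859-1; `latin1_name_to_utf8` is a made-up name and this is not how convmv itself is implemented.

```python
def latin1_name_to_utf8(raw_name):
    """Re-encode a file name given as bytes from ISO 8859-1 to UTF-8."""
    try:
        raw_name.decode("utf-8")
        return raw_name  # already valid UTF-8 (or plain ASCII): leave alone
    except UnicodeDecodeError:
        return raw_name.decode("latin-1").encode("utf-8")

old = b"Bj\xf6rk.mp3"   # 'Björk.mp3' as stored under an 8859-1 locale
new = latin1_name_to_utf8(old)
print(old, "->", new)
```

The function only computes the new name. To apply it you would list a directory as bytes (os.listdir(b'.')) and os.rename each entry, preferably after a dry run that just prints what would change.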

> Not tried re-ripping the files to see how
> that goes.

That should work, too.

> However, this works:
>
> bash-4.1$ touch @?ee¶????øþ
> bash-4.1$ ls @*
> @?ee¶????øþ

> bash-4.1$
>
> Would not want to try writing that to vfat though?

No, start with a correctly encoded file name and run a test to see what
happens. ;-)

> So perhaps I should have had:
>
> export LANG=en_GB.ISO8859-15
>
> So I can handle anything around here?
>

I prefer utf-8.

Martin

Aragorn

Aug 6, 2011, 7:50:19 AM

On Saturday 06 August 2011 10:26 in alt.os.linux.slackware, Martin

enlightened humanity with the following words...:

> Peter Chant wrote:

>
>> I have en_GB.UTF-8.
>
> Personally I think utf-8 encoding is not a bad choice even for
> english- speakers because it enables you to communicate with the
> non-english-speaking world. I realize in the prevailing anglo-american
> hegemony this is not a big incentive but it had to be said. ;-)

I fully agree with that. Besides, most GNU/Linux distributions now use
UTF-8 out of the box.

>> So perhaps I should have had:
>>
>> export LANG=en_GB.ISO8859-15

iso8859-15 is the same as iso8859-1, but with the Euro symbol added

> I prefer utf-8.

+1

notbob

Aug 6, 2011, 1:29:28 PM

On 2011-08-02, Eef Hartman <E.J.M....@tudelft.nl> wrote:

> The problem is it isn't just M$ but especially the vendors of memory sticks,
> camera memory cards etc that still are using FAT (in the vfat, Windows-9x
> cq -ME, way). MicroSoft itself has now completely switched to ntfs.

I don't see the problem. I have several SanDisk keys, which are all
vfat. I mount 'em, and copy whole Linux data dirs to them with no
problem. Certainly, I don't have any kumquats or other zany
restricted chars. Why would I?

If keeping them loquats and other bizarreness is necessary, wipe the
key with dd and put a Linux extN FS on it. I've done the same with an
SDHC card, going back and forth between vfat and ext2, using a card
reader. I don't think the storage media is the problem, but the
filenames.

nb

--
vi ...the heart of evil

Peter Chant

Aug 7, 2011, 7:04:08 AM

Aragorn wrote:

>>> So perhaps I should have had:
>>>
>>> export LANG=en_GB.ISO8859-15
>
> iso8859-15 is the same as iso8859-1, but with the Euro symbol added
>
>> I prefer utf-8.
>
> +1
>

Would it not be better to have one character set to do everything, and
everything standardise on that?

--
http://www.petezilla.co.uk

Peter Chant

Aug 7, 2011, 6:57:18 AM

notbob wrote:

> On 2011-08-02, Eef Hartman <E.J.M....@tudelft.nl> wrote:
>
>> The problem is it isn't just M$ but especially the vendors of memory
>> sticks, camera memory cards etc that still are using FAT (in the vfat,
>> Windows-9x cq -ME, way). MicroSoft itself has now completely switched to
>> ntfs.
>
> I don't see the problem. I have several Scandisk keys, which are all
> vfat. I mount 'em, and copy whole Linux data dirs to them with no
> problem. Certainly, I don't have any kumquats or other zany
> restricted chars. Why would I?

I've had no problems until now, but don't normally have any concern about
characters that are not used in English.

>
> If keeping them loquats and other bizarreness is necessary, wipe the
> key with dd and put in a Linux extn FS. I've done the same with an
> SDHD card, going back and forth between vfat to ext2, using a card
> reader. I don't think the storage media is the problem, but the
> filenames.
>

The issue there is that not all devices support ext2 or similar. I very much
doubt that my camera supports ext2. My phone _ought_ to, but I'd put money
on it not working if I tried it.

Pete

--
http://www.petezilla.co.uk

Helmut Hullen

Aug 7, 2011, 7:51:00 AM

Hello, Peter,

You wrote on 07.08.11:

>>>> export LANG=en_GB.ISO8859-15

>> iso8859-15 is the same as iso8859-1, but with the Euro symbol added

>>> I prefer utf-8.

> Would it not be better to have one character set to do everything,

> and everything standardise on that?

Surely! Let's take ASCII (the real 7-bit stuff)!

Best regards
Helmut

Martin

Aug 7, 2011, 10:12:49 AM

Peter Chant wrote:

> Would it not be better to have one character set to do everything, and
> everything standardise on that?
>

I believe that's the idea behind Unicode, of which UTF-8 is a fairly
7-bit-ASCII-compatible encoding. ;-)

Martin

Aragorn

Aug 7, 2011, 1:11:38 PM

On Sunday 07 August 2011 13:04 in alt.os.linux.slackware, Peter Chant

enlightened humanity with the following words...:

> Aragorn wrote:

That's what Unicode was designed for. ;-)

http://en.wikipedia.org/wiki/Unicode

Henrik Carlqvist

Aug 8, 2011, 4:14:19 PM

Aragorn <str...@telenet.be.invalid> wrote:
> That's what Unicode was designed for. ;-)

Unfortunately most users of Unicode use UTF-8, which has some nasty
properties, such as the number of bytes required not being proportional to
the length of the string. For programmers this means that the good old
strlen function can no longer be used to count the number of characters in
a string.
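
The difference is easy to see from Python 3, used here only as a convenient way to look at the bytes. The escape \u00f6 is the precomposed 'ö'; C's strlen would report the byte count.

```python
name = "R\u00f6yksopp"            # 'Röyksopp', 8 characters
as_utf8 = name.encode("utf-8")
as_latin1 = name.encode("latin-1")

print(len(name))        # 8: characters (code points)
print(len(as_utf8))     # 9: bytes in UTF-8, 'ö' takes two
print(len(as_latin1))   # 8: bytes in ISO 8859-1, one per character
```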

Yep, I much prefer iso-8859-1 for this reason, but I am very much aware of
its shortcomings for foreign languages like Chinese.

regards Henrik
--
The address in the header is only to prevent spam. My real address is:
hc123(at)poolhem.se Examples of addresses which go to spammers:
root@localhost postmaster@localhost

Aragorn

Aug 8, 2011, 5:41:13 PM

On Monday 08 August 2011 22:14 in alt.os.linux.slackware, Henrik
Carlqvist enlightened humanity with the following words...:

> Aragorn <str...@telenet.be.invalid> wrote:
>
>> That's what Unicode was designed for. ;-)
>
> Unfortunately most users of unicode use utf-8 which has some nasty
> properties like the number of required bytes are not proportional to
> the length of the string. For programmers this means that the good old
> strlen function no longher can be used to count the number of
> characters in a string.

You can't have your cake and eat it. <grin>

Besides, apparently most programmers seem to have found a way around
this, because I haven't come across any application yet that hasn't been
built with Unicode support.

> Yep, I prefer iso-8859-1 much for this reason, but I am very much
> aware of its shortcomings for foreign languages like chinese.

Not to mention that you won't even get to see accented characters
properly, nor the Euro symbol, which is not part of iso-8859-1.

Peter Chant

Aug 9, 2011, 3:16:45 AM

Aragorn wrote:

> You can't have your cake and eat it. <grin>
>
> Besides, apparently most programmers seem to have found a way around
> this, because I haven't come across any application yet that hasn't been
> built with Unicode support.
>
>> Yep, I prefer iso-8859-1 much for this reason, but I am very much
>> aware of its shortcomings for foreign languages like chinese.
>
> Not to mention that you won't even get to see accented characters
> properly, nor the Euro symbol, which is not part of iso-8859-1.
>

At some point I need to read up on this.

--
http://www.petezilla.co.uk

Eef Hartman

Aug 9, 2011, 4:27:02 AM

Aragorn <str...@telenet.be.invalid> wrote:
> iso8859-15 is the same as iso8859-1, but with the Euro symbol added

Not quite, a few other changes have been made:
(about -1 in the man page):
> However, it lacks the EURO symbol and does not fully cover Finnish
> and French. ISO 8859-15 is a modification of ISO 8859-1 that covers
> these needs

But it still is an 8-bit charset, so it can never cover all the characters
that are around. UTF-8 _is_ a different encoding that can extend into
multiple-byte chars, so "one charset covers all". But it DOES mean
that the codes between 128 and 255 have different meanings in
ISO 8859 and UTF-8.
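
A small demonstration of that last point, sketched in Python 3 (the variable names are only for illustration): the same bytes read under the wrong charset give either a replacement mark or the familiar two-character garbage.

```python
text = "Bj\u00f6rk"                      # 'Björk'

latin1_bytes = text.encode("latin-1")    # b'Bj\xf6rk'
utf8_bytes = text.encode("utf-8")        # b'Bj\xc3\xb6rk'

# ISO 8859-1 bytes read as UTF-8: 0xf6 is not valid there
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
# UTF-8 bytes read as ISO 8859-1: each byte becomes its own character
misread_as_latin1 = utf8_bytes.decode("latin-1")

print(misread_as_utf8)
print(misread_as_latin1)
```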

Eef Hartman

Aug 9, 2011, 4:36:44 AM

Henrik Carlqvist <Henrik.C...@deadspam.com> wrote:
> Aragorn <str...@telenet.be.invalid> wrote:
>> That's what Unicode was designed for. ;-)
>
> Unfortunately most users of unicode use utf-8 which has some nasty
> properties like the number of required bytes are not proportional to the
> length of the string.

That's true for full Unicode too (a Unicode char in UCS-4 can be up to
4 bytes). The Windows usage (UCS-2, Unicode Character Set 2)
circumvents it by making ALL chars 2 bytes (the original 7 bit ASCII
ones are then: <zero byte> <ascii value>),
UTF-8 does it another way by keeping the original 7-bits chars in a
single byte and only using extended bytes for chars, starting with a
0xc0 thru 0xfd byte, thus creating a full 31-bit (2 G) charset.
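
To make the byte counts concrete, here is a sketch in Python 3 with a made-up helper `encoded_sizes`. Two caveats that are my additions rather than part of the post above: UTF-8 as standardised today (RFC 3629) stops at 4 bytes per character, and the 2-byte form needs a pair of 2-byte units for characters outside the first 65536.

```python
def encoded_sizes(ch):
    """Return (utf8_length, utf16_length, first_utf8_byte) for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # big-endian, no byte-order mark
    return len(utf8), len(utf16), utf8[0]

for ch in ["A", "\u00f6", "\u20ac", "\U0001d11e"]:
    n8, n16, lead = encoded_sizes(ch)
    print("U+%04X  utf-8: %d bytes (first byte 0x%02x)  utf-16: %d bytes"
          % (ord(ch), n8, lead, n16))
```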

Wolfgang Schelongowski

Aug 10, 2011, 10:29:20 AM

Peter Chant <pet...@MpeteOzilla.Vco.ukE> writes:

>Aragorn wrote:
>
>> You can't have your cake and eat it. <grin>
>>
>> Besides, apparently most programmers seem to have found a way around
>> this, because I haven't come across any application yet that hasn't been
>> built with Unicode support.
>>
>>> Yep, I prefer iso-8859-1 much for this reason, but I am very much
>>> aware of its shortcomings for foreign languages like chinese.
>>
>> Not to mention that you won't even get to see accented characters
>> properly, nor the Euro symbol, which is not part of iso-8859-1.

<fx: flourishes his HP-LJ4 manual of 1992>

I beg to differ, Sir :^) Page B-7 of the above manual has the codepage
of ISO-8859-1. A (prolonged) look at it reveals that it supports
Spanish, French, and most, if not all Nordic languages.

Oh, and German, too.

The slavic languages and Greek have their own 8859 code pages.

>At some point I need to read up on this.

So did I, but I knew where to find 8859-1.
--
The first entry of Sin into the mind occurs when, out of cowardice or
conformity or vanity, the Real is replaced by a comforting lie.
-- Integritas, Consonantia, Claritas

Eef Hartman

Aug 10, 2011, 11:05:01 AM

Wolfgang Schelongowski <Wolfgang.Sc...@gmx.de> wrote:
> I beg to differ, Sir :^) Page B-7 of the above manual has the codepage
> of ISO-8859-1. A (prolonged) look at it reveals that it supports
> Spanish, French, and most, if not all Nordic languages.

man iso-8859-1
would have worked too (and yes, the other variants of iso 8859
have their own man pages too).
The "high" chars will only be shown correctly, of course, when
the window in which you run the man command is using that charset,
but the octal, decimal and hex values are correct in any window.

From the header OF that man page:
>The ISO 8859 standard includes several 8-bit extensions to the ASCII
>character set (also known as ISO 646-IRV). Especially important is
>ISO 8859-1, the "Latin Alphabet No. 1", which has become widely
>implemented and may already be seen as the de-facto standard ASCII
>replacement.

>ISO 8859-1 supports the following languages: Afrikaans, Basque,
>Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician,
>German, Icelandic, Irish, Italian, Norwegian, Portuguese, Scottish,
>Spanish, and Swedish.
(What it doesn't say is that there are holes in the French and Finnish
charsets, which have been fixed in ISO 8859-15.)

And further on it gives the complete set:
>ISO 8859-1 West European languages (Latin-1)
>ISO 8859-2 Central and East European languages (Latin-2)
>ISO 8859-3 Southeast European and miscellaneous languages (Latin-3)
>ISO 8859-4 Scandinavian/Baltic languages (Latin-4)
>ISO 8859-5 Latin/Cyrillic
>ISO 8859-6 Latin/Arabic
>ISO 8859-7 Latin/Greek
>ISO 8859-8 Latin/Hebrew
>ISO 8859-9 Latin-1 modification for Turkish (Latin-5)
>ISO 8859-10 Lappish/Nordic/Eskimo languages (Latin-6)
>ISO 8859-11 Latin/Thai
>ISO 8859-13 Baltic Rim languages (Latin-7)
>ISO 8859-14 Celtic (Latin-8)
>ISO 8859-15 West European languages (Latin-9)
>ISO 8859-16 Romanian (Latin-10)

Peter Chant

Aug 10, 2011, 5:20:32 PM

Wolfgang Schelongowski wrote:

>
> I beg to differ, Sir :^) Page B-7 of the above manual has the codepage
> of ISO-8859-1. A (prolonged) look at it reveals that it supports
> Spanish, French, and most, if not all Nordic languages.

If you are going to get all geeky on us then I must mention that I wrote my
own raster printer driver for a plotting program I wrote - all in FORTRAN.
:-)

--
http://www.petezilla.co.uk

Wolfgang Schelongowski

Aug 19, 2011, 2:33:14 PM

Eef Hartman <E.J.M....@tudelft.nl> writes:

>Wolfgang Schelongowski <Wolfgang.Sc...@gmx.de> wrote:
>> I beg to differ, Sir :^) Page B-7 of the above manual has the codepage
>> of ISO-8859-1. A (prolonged) look at it reveals that it supports
>> Spanish, French, and most, if not all Nordic languages.
>
>man iso-8859-1

Good old Dead Tree Technology was (and is) faster for me _right here_.

[from the above man page, not me:]

>>ISO 8859-1 supports the following languages: Afrikaans, Basque,
>>Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician,
>>German, Icelandic, Irish, Italian, Norwegian, Portuguese, Scottish,
>>Spanish, and Swedish.
>(what it doesn't say that there are holes in the French and Finnish
>charsets, which have been fixes in ISO 8859_15).

[From "man iso-8859-15", about ISO 8859-1:]

  However, it lacks the EURO symbol and does not fully cover Finnish
  and French. ISO 8859-15 is a modification of ISO 8859-1 that covers
  these needs.

diff tells me that -15 has the ligature 'oe' which is missing from -1.
Well, this just goes to show that my French is rather rusty, whereas
my Finnish is non-existent.
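
The same comparison can be done without the man pages. This is a sketch assuming Python 3 and its bundled codecs: it walks all 256 byte values and lists where the two charsets disagree.

```python
diffs = []
for code in range(256):
    raw = bytes([code])
    in_1 = raw.decode("iso8859-1")
    in_15 = raw.decode("iso8859-15")
    if in_1 != in_15:
        diffs.append((code, in_1, in_15))

# Print the byte value and the character it stands for in -1 and in -15
for code, in_1, in_15 in diffs:
    print("0x%02X  %s -> %s" % (code, in_1, in_15))
```

Only a handful of positions differ: besides the Euro sign and the oe ligature, -15 brings in S and Z with caron and a capital Y with diaeresis.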

Wolfgang Schelongowski

Aug 19, 2011, 4:49:04 PM

Peter Chant <pet...@MpeteOzilla.Vco.ukE> writes:

What I wrote wasn't geeky at all, and neither is writing a printer
driver - it's merely a filter, and doesn't do e.g. interrupt handling.