I'm running FreeBSD, and after years of hanging on to it,
have finally left the "C" locale and its nice collations
for the LANG=en_US.UTF-8 locale.
To somewhat complicate matters, over the years, I have ripped a
number of CDs (classical ones are the most problematic in this
respect) where the names of the tracks or the artists involve
European letters. It appears that in those cases, the filenames
generated are encoded in iso8859-1 and stored that way in the
Unix directory entries.
This causes me occasional problems as the "ls" command now
(after my locale shift) displays those files with '?' characters
for the characters over 0x7f, and sometimes other operations fail
as well.
I would like to write a Tcl script to look at all the files in a
directory and rename them from the iso8859-1 encoding to the UTF-8
encoding.
I'm a little confused about what "glob {*}" does with files
whose names are not valid in the current system locale.
I wrote a brief test script to see what glob returns, and it does
return an entry for each file in the directory. Furthermore, I can
transcode that entry and then create a new (empty) file with the
same name, but in valid UTF-8:
#!/usr/local/bin/tclsh8.6
proc main {} {
    foreach f [glob {*}] {
        set f [encoding convertfrom iso8859-1 $f]
        close [open "/tmp/t/$f" w]
        puts $f
    }
    exit 0
}
main
The generated files in /tmp/t have names which display correctly
with "ls".
So:
1) Should glob be telling me it's giving me names that aren't
valid in the current encoding?
2) If not, is there some way I can test a name to see whether it
is valid UTF-8? (Ideally, I would like to run a recursive script
that renames only the files whose names are iso8859-1 encoded.)
3) Is "file rename" going to work reliably if I give it the name
from glob on the left and the transcoded name on the right?
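To make (2) and (3) concrete, here is the sort of script I have in
mind. It's only a sketch: the valid-UTF-8 check is the usual byte-range
regexp, the script assumes any name that isn't valid UTF-8 came from an
iso8859-1 rip (and that glob's lenient decoding gave me one character
per on-disk byte), and whether "file rename" actually finds the old
file under the name glob returned is exactly question (3):

#!/usr/local/bin/tclsh8.6
# Sketch only -- assumes a name that is not valid UTF-8 is iso8859-1.

# Byte-level valid-UTF-8 test (the standard byte-range pattern).
proc validUtf8 {bytes} {
    regexp {^(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*$} $bytes
}

foreach f [glob -nocomplain *] {
    # Recover the raw name bytes, assuming the lenient decoder gave us
    # one character per on-disk byte.
    set bytes [encoding convertto iso8859-1 $f]
    if {[validUtf8 $bytes]} continue             ;# name is already fine
    set new [encoding convertfrom iso8859-1 $f]  ;# same transcode as my test
    if {![file exists $new]} {
        file rename -- $f $new                   ;# the step in question (3)
    }
}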
--
------
columbiaclosings.com
What's not in Columbia anymore..