>>> The disk is formatted using the FAT32 filesystem and thus does not use Unicode for filenames

This isn't correct. The disk may be formatted as FAT32, but it also includes long filename support (VFAT). The character encoding used for long filenames is Unicode; I believe the actual encoding is UCS-2 (a predecessor to UTF-16), so there should be no need to restrict yourself to just US-ASCII. Because the character encoding is Unicode, the filenames should be displayed correctly on any operating system that supports VFAT long filenames.

The invalid characters that cannot be used are the ones that have some meaning to the operating system/file system, such as < > " / \ | ? * : (% and ^ are legal in VFAT names but have special meaning in some shells). See https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words for more information.
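A minimal Python sketch of that check, assuming the reserved set is the one Windows rejects in long filenames (" * / : < > ? \ |) plus control characters below 0x20. The set and the function name `invalid_chars` are illustrative assumptions, not anything the BB1 itself uses. Note that accented letters pass, because only the reserved characters are a problem.

```python
# Characters a VFAT long filename cannot contain (assumed: the Windows-reserved set).
VFAT_RESERVED = set('"*/:<>?\\|')


def invalid_chars(name: str) -> list:
    """Return the characters in `name` that a VFAT long filename cannot contain."""
    # Control characters (code points below 0x20) are rejected as well.
    return [c for c in name if c in VFAT_RESERVED or ord(c) < 0x20]


print(invalid_chars("AC/DC: Back in Black.flac"))
print(invalid_chars("Motörhead - Ace of Spades.flac"))
```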
You are correct that FAT12, FAT16, and FAT32 file systems use the OEM character set and that on their own they only support 8.3 filenames; however, the VFAT long filename extensions are encoded as Unicode characters. As you can see from the following output from my BB1, it is using VFAT and therefore it is using Unicode.
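To make the UCS-2 point concrete, here is a simplified Python sketch of how a long name is laid out, assuming the usual VFAT scheme: each long-filename directory entry holds 13 16-bit code units, a 0x0000 terminator follows the name if the last entry is not full, and the remainder is padded with 0xFFFF. The function `lfn_entries` is hypothetical, and it ignores the sequence numbers, short-name checksum and reverse on-disk ordering of real entries. Python's "utf-16-le" codec is used, so a character outside the Basic Multilingual Plane would take two code units (UCS-2 proper cannot represent it at all).

```python
from typing import List

LFN_UNITS_PER_ENTRY = 13  # 16-bit code units stored in one VFAT long-filename entry


def lfn_entries(name: str) -> List[List[int]]:
    """Split a long filename into per-entry lists of UTF-16LE code units (simplified)."""
    raw = name.encode("utf-16-le")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    if len(units) % LFN_UNITS_PER_ENTRY:
        units.append(0x0000)  # terminator, only when the last entry is not already full
    while len(units) % LFN_UNITS_PER_ENTRY:
        units.append(0xFFFF)  # padding for the unused slots
    return [units[i:i + LFN_UNITS_PER_ENTRY] for i in range(0, len(units), LFN_UNITS_PER_ENTRY)]


for entry in lfn_entries("Café.mp3"):
    print([hex(u) for u in entry])
```

The "é" is stored as the single code unit 0xE9, the same value it has in Unicode, independent of any OEM code page.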
# df -T
Filesystem       Type     1K-blocks      Used  Available  Use%  Mounted on
/dev/root        ext4       1761268    207704    1446048   13%  /
devtmpfs         devtmpfs    200328         0     200328    0%  /dev
tmpfs            tmpfs       204520         0     204520    0%  /dev/shm
tmpfs            tmpfs       204520       644     203876    0%  /tmp
tmpfs            tmpfs       204520        48     204472    0%  /run
/dev/mmcblk0p1   vfat         31640      7270      24370   23%  /boot
/dev/mmcblk0p3   vfat      29297152  16711344   12585808   57%  /media/sd3
/dev/b2_usba_p1  vfat      31932416  16206288   15726128   51%  /media/usba
#
If you plugged one of these devices into a system that supports FAT32 but not the VFAT extensions, you would still be able to access the files on that device, but you would only see the 8.3 short filenames, displayed with whatever default character set that system uses.
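As a rough guide to which names survive unchanged on such a system, a simplified Python sketch can test whether a name already fits an 8.3 short entry: at most 8 characters, an optional dot and at most 3 more, using only A-Z, 0-9 and the punctuation FAT allows in short names. `is_plain_83` is a hypothetical helper; it deliberately ignores OEM code page characters at 0x80 and above and the Windows NT lowercase flags, so it is stricter than a real implementation.

```python
# Characters allowed in a FAT short (8.3) name besides A-Z and 0-9.
SHORT_NAME_SPECIALS = set("!#$%&'()-@^_`{}~")


def _short_name_char_ok(c: str) -> bool:
    return c.isascii() and (c.isupper() or c.isdigit() or c in SHORT_NAME_SPECIALS)


def is_plain_83(name: str) -> bool:
    """Simplified check: does `name` fit an 8.3 short entry unchanged?"""
    base, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        base, ext = name, ""
    if not (1 <= len(base) <= 8 and len(ext) <= 3):
        return False
    return all(_short_name_char_ok(c) for c in base + ext)


print(is_plain_83("TRACK01.MP3"), is_plain_83("Track 01 - Intro.mp3"))
```

Any name that fails this check needs a long-filename entry, and a FAT-only system would show only its generated short alias.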
Note that the article you refer to points out that it uses the OEM character set. This is NOT US-ASCII, and a filename created using one OEM character set may not display correctly on a system using a different one, which is one of the reasons for Unicode in the first place. The following example shows a simple text file that uses only 7-bit character codes, displayed first with the default code page (850 on my system, but I believe the default in the US is 437) and then with code page 20106 (7-bit German), where the results are very different. This is an extreme example, but it does highlight that there are no guarantees of portability even when using only 7-bit ASCII characters.
$ chcp
Active code page: 850
$ type Test.txt
{ a[i] = '\n'; }
$ chcp 20106
Active code page: 20106
$ type Test.txt
ä aÄiÜ = 'Ön'; ü
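The substitution above matches the ISO 646-DE (DIN 66003) table, which code page 20106 appears to implement: eight US-ASCII code points are reassigned to German characters. A short Python sketch reproduces the effect. The `ISO646_DE` mapping is written out by hand here (an assumption based on that standard, not taken from a Windows codec), and `as_iso646_de` is a hypothetical name.

```python
# ISO 646-DE (DIN 66003): the same 7-bit codes, different glyphs.
ISO646_DE = {"@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
             "{": "ä", "|": "ö", "}": "ü", "~": "ß"}


def as_iso646_de(text: str) -> str:
    """Show how 7-bit US-ASCII text would appear on an ISO 646-DE display."""
    return text.translate(str.maketrans(ISO646_DE))


print(as_iso646_de("{ a[i] = '\\n'; }"))  # same bytes as Test.txt
```

The bytes in the file never change; only the interpretation of each 7-bit code does, which is exactly the portability problem described above.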
Regarding the examples you give, the first two appear to show correct operation and confirm that Unicode is supported. The third highlights a limitation of the BusyBox implementation of the ls command; I believe this is by design, to prevent it from trying to display non-printable characters, which can cause various problems, and it does not affect the normal operation of the BB1. The last two examples appear to show that non-Unicode applications, which have no connection at all with the Brennan, can't handle Unicode. Using this as a reason for stating that the Brennan can't handle Unicode, or that Unicode is bad, seems a bit perverse.
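To illustrate the kind of filtering involved, here is a Python sketch, assuming the filename reaches ls as UTF-8 bytes and that a build without Unicode support treats every byte outside printable 7-bit ASCII as unprintable. This approximates the behaviour; it is not BusyBox's actual code, and `ls_safe_name` is a hypothetical name.

```python
def ls_safe_name(raw: bytes) -> str:
    """Replace each byte that is not printable 7-bit ASCII with '?' (approximation)."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


# "é" is two bytes in UTF-8 (0xC3 0xA9), so it is displayed as two question marks.
print(ls_safe_name("Café.mp3".encode("utf-8")))
```

Only the display is affected; the name stored on the card is unchanged, which is why the BB1 can still play and list the file normally.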
Both freeDB and MusicBrainz use Unicode, so if you use these databases to name your files you are going to get non-ASCII characters in your filenames. I have numerous files like this which I have copied between the BB1, Windows, a couple of different NAS implementations, an iPad and an iPhone, and used with various applications without issue.
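If you do prefer US-ASCII-only names, a small Python sketch can list which files and folders a freeDB or MusicBrainz tagger has given non-ASCII characters, rather than guessing. `non_ascii_names` is a hypothetical helper; it assumes the card or share is mounted at the path you pass in and needs Python 3.7 or later for `str.isascii`.

```python
import os
from typing import List


def non_ascii_names(root: str) -> List[str]:
    """Return sorted paths under `root` whose file or folder name is not pure US-ASCII."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not name.isascii():
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    for path in non_ascii_names("."):
        print(path)
```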
You may have perfectly good reasons for using only characters from ‘US-ASCII’, but your statement that the Brennan does not support Unicode is still incorrect.
On Friday, 12 February 2021 at 12:18:33 UTC Mark Fishman wrote:
By all means, name your files any way you like. If you ever use them on more than one system, I have found that for maximum portability and safety the only reliable character mapping that displays exactly the same way on every hardware/software/OS/filesystem is the 7-bit character set called US-ASCII. Personally I'd rather not wonder if something will break someday, somewhere, because of an accented character or punctuation. But that's just me. Cheers -- m.