Import ID3 info?


N Tucker

Feb 8, 2021, 4:45:43 AM
to Brennan Forum
I have a large collection of music which, due to filesystem limitations, is not stored with filenames matching the metadata.  Examples include song titles with question marks and apostrophes in them.  The ID3 is very complete in these files, but the actual files on disk have had their names munged to avoid these sorts of things; generally replaced with underscores or eliminated.  I would like to import this into my B2 without losing the proper metadata, but it seems B2 only supports basing its own metadata on the FAT32 filenames rather than the obvious choice of reading the info from the ID3 in the files.  Is there any way around this?  The approach B2 takes seems really deficient from a data quality sense.

Barring this, I have investigated the db format that B2 uses, and it's very simple.  I could write a utility which populates this DB completely based on the ID3 provided in the files uploaded, but it would need to run on the B2 itself, and would probably need to do something ugly like kill the 'b2' process in order to restart it (it doesn't seem to support HUP).  If the answer is that Brennan isn't going to properly support ID3, I may offer this as an open source utility.

Edwyn Corteen

Feb 8, 2021, 5:08:06 AM
to Brennan Forum
Hi, the B2 was conceived as a CD ripper, with album and track data fetched from online sources, so ID3 tags were not used. You can export and import the database, so if you are able to match things up, making the changes on your laptop may work. In the Maintenance menu you will find:

b2db to USB C (Copies music index to drive in USB C)
load b2db (Copies music index from drive in USB C)

Peter Lowham

Feb 8, 2021, 8:57:18 AM
to Brennan Forum
Hi ntu,

I think that you can achieve your goal relatively easily, if I have understood your query correctly.  The B2 only looks at file names, track names and metadata when it is ripping / compressing from a CD.  If you 'Import' your collection from a USB device using the prescribed format, then the B2 will preserve your artist names, album names, track names and metadata as they are now.  During the 'Import' run, the B2 will build its internal database (b2db) using your current <Artistname>, <Albumname> and <Trackname>.  So you would not need to kill the b2 process.

I have attached a PDF file which gives the required structure of the 'Import' data set as well as the rules to work with to enable the B2 to successfully complete the 'Import'.

Regards,
Peter.
Export & Import Rules for Brennan.pdf

N Tucker

Feb 8, 2021, 3:42:04 PM
to Brennan Forum
I understand the current limitation, but "it's just a CD ripper so it doesn't do that" is a really lousy answer to still be giving in 2021 (note: the website advertises it also as a "jukebox" and "Complete audio system"), and so is "just spend a few days on the manual labor of renaming files."  It's not like ID3 is some exotic thing that's hard to understand, and encoding all the metadata in FAT32 filenames and directory names is asinine.  This problem was literally solved over 20 years ago and when I bought the device, better support for ID3 was mentioned in the FAQ as something they might do in the future.

Brennan, please fix this deficiency.  Just read the ID3 tags for track title, artist, album, and track number, and fall back to filenames if that's not available.  How hard is this?
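To make "read the tag, fall back to the filename" concrete, here is a minimal sketch in Python. It only understands the old ID3v1 trailer (the fixed 128-byte block at the end of the file), because that can be parsed with the standard library alone; most modern files carry ID3v2, which would need a proper tag library. The function names read_id3v1 and track_title are mine for illustration and have nothing to do with how the B2 works internally.

```python
import os

def read_id3v1(path):
    """Return a dict of ID3v1 fields, or None if the file has no ID3v1 tag."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return None
        f.seek(-128, os.SEEK_END)
        block = f.read(128)
    if block[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    tag = {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "track": None,
    }
    # ID3v1.1 keeps the track number in the last comment byte when the
    # byte before it is zero.
    if block[125] == 0 and block[126] != 0:
        tag["track"] = block[126]
    return tag


def track_title(path):
    """Prefer the title from the tag; fall back to the file name."""
    tag = read_id3v1(path)
    if tag and tag["title"]:
        return tag["title"]
    return os.path.splitext(os.path.basename(path))[0]
```

With this, a file stored on disk as Who_Are_You_.mp3 can still report its real title, question mark included, while an untagged file simply reports its name.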

BTW, I tried importing my whole music library from USB and it seems to have succeeded (although the metadata quality is very low because it was taken from filenames, which have a mix of naming conventions from various music sources over the years, so not only do I have "The Pretenders" but also "The_Pretenders" entries), but now the web UI is completely broken because /b2gci.fcgi is returning 500 Internal Server Error.

PMB

Feb 9, 2021, 4:29:44 AM
to Brennan Forum
Hi ntu...

If the web UI is playing up try closing it >> clear browsing data >> open a new browser page >> enter the IP Address manually.

Paul
Brennan Support.

N Tucker

Feb 9, 2021, 7:34:48 PM
to Brennan Forum
It's something in the db.  I ssh'd in and truncated the b2db file to about half its original size and the 500 errors went away immediately, so I suspect something about one or more of the filenames in the second half of the db is triggering a bug.  I'll do a manual binary search when I have the time and see if I can pinpoint it.
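The binary search can be sketched in a few lines of Python. This assumes exactly one entry is responsible and that any subset of the db containing it still triggers the error. The is_broken callback is hypothetical: in practice it would have to write the candidate entries to the b2db file and check whether the web UI still returns a 500.

```python
def find_bad_entry(entries, is_broken):
    """Return the index of the single entry that makes is_broken() true.

    Assumes exactly one bad entry, and that any subset containing it is
    broken. Returns None if the full list is not broken at all.
    """
    if not is_broken(entries):
        return None
    lo, hi = 0, len(entries)          # the bad entry lies in entries[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(entries[lo:mid]):
            hi = mid                  # it is in the first half
        else:
            lo = mid                  # it must be in the second half
    return lo
```

Each round halves the candidate range, so even a db with tens of thousands of entries needs only about fifteen to seventeen tests.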

PMB

Feb 10, 2021, 4:59:25 AM
to Brennan Forum
Hi ntu...

Have a look for illegal characters ?/ etc.

Paul
Brennan Support.

N Tucker

Feb 10, 2021, 11:54:43 AM
to Brennan Forum
The fact that song metadata may contain "Illegal" characters is a perfect illustration as to why using filenames as authoritative on metadata is stupid and amateurish.  Of course my song titles contain question marks and slashes.  If this is the reason it's broken I'm giving up on this thing entirely.

Mark Fishman

Feb 10, 2021, 4:58:00 PM
to Brennan Forum
I'm going to start with a riddle that Abraham Lincoln used to ask his staff: If you call a tail a leg, how many legs does a dog have?

The answer is FOUR: calling a tail a leg does not make it one.

It's perfectly all right for metadata tags to contain characters that would be illegal in a filesystem. The problems arise when you try to create filenames that contain characters not permitted in the filesystem.

The filenames are NOT metadata. They are NAMES. There's a difference. The metadata -- data about data -- is stored inside the files. What you CALL the FILES is completely independent of what is IN the files.

Martin Brennan has used filenames to find and play files containing music for many years. It's one (but only one) of the reasons that I didn't buy the previous model, JB7. The Brennan devices do not use metadata to identify/describe music: the files are identified by their names.

I agree with you that it would be more flexible, and in some ways preferable, for the device to use information stored inside the files in tags. It would also mean much more device complexity: for one thing, to use/search/display all that information, it would have to be stored in a database structure -- not a flat text file laughingly called a "db" -- so you'd probably lose the ability to see and modify what is in the device as if it were a networked drive. If that's really what you want, there are already devices on the market that do that sort of thing: Cocktail Audio has several, and the prices go up from there.

You already have files whose names do not match their ID3 tags, for the very reasons you already cite: the filesystem doesn't permit it. If you put the tag information into the b2db file, Brennan devices will not find the files, because the b2db file contents have to match the filesystem contents: file NAMES.

Stop trying to make the device do things it is designed NOT TO DO. If what it does isn't what you want, get something else. You've explained your reasons. Vote with your wallet.

Daniel Taylor

Feb 10, 2021, 5:14:38 PM
to Brennan Forum
Interesting.  I'd never thought about it before, but if a search had to open each track file and read the pertinent data, it would take more than twice as long as just looking at the filename.

N Tucker

Feb 10, 2021, 11:29:34 PM
to Brennan Forum
It really needn't cause any performance impact.  You would just store those things in the "db" as is already done; you'd just get them out of the MP3 as they are imported, in which case you're already reading the entire file.  I have personally written code to extract ID3 info from my entire music collection and it's plenty fast, even on a Raspberry Pi.

@Brennan folks: can you give me a spec for what characters are "illegal"?  Anything beyond those mentioned?  If I'm going to put effort into code that transforms my music library so I can avoid the bugs I'd prefer not to trial-and-error it for days to rid it of all the forbidden characters one at a time.  My plan is to write a simple utility which will take an arbitrary set of MP3s and copy them into the folder structure you require.

Mark Fishman

Feb 11, 2021, 5:46:27 AM
to Brennan Forum
Not an official "Brennan" answer, but based on experience using multiple operating systems -- feel free to disregard:
If you want to avoid causing trouble for yourself, take a look at https://www.mtu.edu/umc/services/digital/writing/characters-avoid/

The Brennan devices run Linux. The disk is formatted using the FAT32 filesystem and thus does not use Unicode for filenames. Character mappings on Linux, MS-Windows, macOS, and other operating systems diverge for 8-bit characters, so for safety and portability you should restrict the character set to US-ASCII. Brennan's own music-player code breaks for filenames longer than (approx.) 170 characters. Entries in the b2db file must match actual filenames or directory names.
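As a rough illustration of those rules, a small Python check along these lines could flag names before an import. The 170-character limit is the approximate figure from experience given above, not an official one, and the reserved set is the usual Windows/FAT list, which is my assumption about what the B2 objects to.

```python
MAX_NAME_LENGTH = 170             # approximate, from experience; not official
RESERVED = set('<>:"/\\|?*')      # usual Windows/FAT reserved characters (assumed)

def name_problems(name):
    """Return a list of human-readable problems found in a file name."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("too long")
    if not name.isascii():
        problems.append("non-ASCII characters")
    bad = sorted(set(name) & RESERVED)
    if bad:
        problems.append("reserved characters: " + "".join(bad))
    if name.endswith((".", " ")):
        problems.append("trailing dot or space")
    return problems
```

An empty list means the name passed every check; anything else tells you what to look at before copying the file across.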

That's a start. -- m.


AJ

Feb 11, 2021, 8:49:48 PM
to Brennan Forum
>>> The disk is formatted using the FAT32 filesystem and thus does not use Unicode for filenames

This isn't correct. The disk may be formatted as FAT32, but it also includes long filename support (VFAT).  The character encoding used for long filenames is Unicode; I believe that the actual encoding is UCS-2 (a predecessor to UTF-16). Therefore there should be no need to restrict yourself to just US-ASCII.  Because the character encoding is Unicode, the filenames should be correctly displayed on any operating system.

The invalid characters that cannot be used are the ones that have some meaning to the operating system/file system. These are < > " / \ | ? * % ^ and the colon; see https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words for more information.

The following is some simple python code that I use to replace invalid characters.


invalid_characters = r'<>"/\|?*%^:'

#
# Replace any characters that are invalid in file names.  The output should be a
# string that can be used as file or directory name but still maintain a reasonable
# level of readability.
#
def replace_invalid_characters(old_string):
    old_string = str(old_string)

    #
    # Remove any full stops at the end of the name.
    #
    stripped = old_string.rstrip('.')
    new_string = stripped

    #
    # Replace double quoted string with left/right double quotes.  Work from the
    # stripped string so the trailing full stops stay removed.
    #
    double_quote_count = stripped.count('"')

    if double_quote_count != 0 and (double_quote_count % 2) == 0:
        new_string = ''
        new_char = '“'

        for old_char in stripped:
            if old_char == '"':
                new_string += new_char
                new_char = '”' if new_char == '“' else '“'
            else:
                new_string += old_char

    #
    # Replace characters that have a fairly obvious substitute
    #
    new_string = new_string.replace('<', '[')
    new_string = new_string.replace('>', ']')
    new_string = new_string.replace('"', "'")
    new_string = new_string.replace('*', '!')

    #
    # Just remove question marks
    #
    new_string = new_string.replace('?', '')

    #
    # Replace the other invalid characters with the sequence <space><middle dot><space>
    #
    for c in invalid_characters:
        new_string = new_string.replace(c, ' · ')

    #
    # Remove any sequence of multiple spaces (including one at the very start).
    #
    while '  ' in new_string:
        new_string = new_string.replace('  ', ' ')

    #
    # Remove any sequence of multiple middle dots.
    #
    while '· ·' in new_string:
        new_string = new_string.replace('· ·', '·')

    #
    # Add any full stops back if it makes sense.  The length check avoids an
    # IndexError on empty or single character names.
    #
    if old_string.endswith('...'):
        new_string += '...'
    elif (len(old_string) > 1 and old_string.endswith('.')
            and not (old_string[-2].islower() or old_string[-2].isdigit())):
        new_string += '.'

    return new_string

Mark Fishman

Feb 12, 2021, 7:18:33 AM
to Brennan Forum
According to Microsoft (who as we know is never wrong and always documents things clearly) at 
"NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set. For more information, see Code Pages." Also, "Windows code page and OEM code page character sets used on Japanese-language operating systems contain the Yen symbol (¥) instead of a backslash (\). Thus, the Yen symbol is a prohibited character for NTFS and FAT file systems."

Here is a track title with an accented character, as it appears on Windows 10 (in a command window or the File Explorer window):
Pièces de clavecin_ No. 19, Rondeau in D major _Les tourbillons_ (Rameau) (Anton Heiller soloist).mp3

Here it is in the b2db file (as viewed using vi on the B2, in an ssh session):
Pièces de clavecin_ No. 19, Rondeau in D major _Les tourbillons_ (Rameau) (Anton Heiller soloist).mp3

Here it is as it appears on the B2 (looking at the directory entry in an ssh session):
Pi??ces de clavecin_ No. 19, Rondeau in D major _Les tourbillons_ (Rameau) (Anton Heiller soloist).mp3

Here it is in the b2db file as viewed with QB64 or in the VEDIT text editor on Windows 10:
Pièces de clavecin_ No. 19, Rondeau in D major _Les tourbillons_ (Rameau) (Anton Heiller soloist).mp3

Here it is on my b2Export directory as viewed with QB64 on Windows 10:
PiŠces de clavecin_ No. 19, Rondeau in D major _Les tourbillons_ (Rameau) (Anton Heiller soloist).mp3
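For what it's worth, all of these variants can be reproduced from the same six characters by encoding them one way and decoding them another. The following Python sketch is my reconstruction; which tool assumes which code page is a guess on my part, not something I have verified for each program.

```python
title = "Pièces"

utf8_bytes = title.encode("utf-8")   # two bytes for the accented letter
oem_bytes = title.encode("cp437")    # one byte in the US OEM code page

# UTF-8 bytes read by a tool that assumes Windows-1252:
as_cp1252 = utf8_bytes.decode("cp1252")
# OEM (CP437) bytes read by a tool that assumes Windows-1252:
oem_as_cp1252 = oem_bytes.decode("cp1252")
# UTF-8 bytes shown by a tool that prints '?' for every non-ASCII byte
# (assumed here to be what the directory listing over ssh does):
ascii_only = "".join(chr(b) if 32 <= b < 127 else "?" for b in utf8_bytes)

print(as_cp1252)      # PiÃ¨ces
print(oem_as_cp1252)  # PiŠces
print(ascii_only)     # Pi??ces
```

In every case the bytes on disk are unchanged; only the interpretation differs.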

I haven't taken this disk to my wife's Mac, but experience tells me it will probably have Yet Another appearance depending on the tool I use to look at the files. That's also likely on the iPod(s) I have, on my car's audio system display, and so on. On occasion in the past, I have had files that couldn't be renamed or deleted (except over the network using a client with a different OS from the one running on the host server) because what should have been readable filenames had apparently scrambled a directory display.

By all means, name your files any way you like. If you ever use them on more than one system, I have found that for maximum portability and safety the only reliable character mapping that displays exactly the same way on every hardware/software/OS/filesystem is the 7-bit character set called US-ASCII. Personally I'd rather not wonder if something will break someday, somewhere, because of an accented character or punctuation.

But that's just me. Cheers -- m.





N Tucker

Feb 12, 2021, 5:51:28 PM
to Brennan Forum
If your users are reading the spec for FAT32 to get an import to work correctly, you're doing consumer products wrong.  Especially when there's a really, really simple solution.

AJ

Feb 12, 2021, 6:29:56 PM
to Brennan Forum

You are correct that the FAT12, FAT16, and FAT32 file systems use the OEM character set; they also only support 8.3 filenames. However, the VFAT long filename extensions are encoded as Unicode characters.  As you can see from the following output from my BB1, it is using VFAT and therefore it is using Unicode.

# df -T
Filesystem           Type       1K-blocks      Used Available Use% Mounted on
/dev/root            ext4         1761268    207704   1446048  13% /
devtmpfs             devtmpfs      200328         0    200328   0% /dev
tmpfs                tmpfs         204520         0    204520   0% /dev/shm
tmpfs                tmpfs         204520       644    203876   0% /tmp
tmpfs                tmpfs         204520        48    204472   0% /run
/dev/mmcblk0p1       vfat           31640      7270     24370  23% /boot
/dev/mmcblk0p3       vfat        29297152  16711344  12585808  57% /media/sd3
/dev/b2_usba_p1      vfat        31932416  16206288  15726128  51% /media/usba
#

If you plugged one of these devices into a system that supports FAT32 but not the VFAT extensions you would still be able to access the files on that device but you would only see the 8.3 short filenames, displayed with whatever default character set that system used. 

Note that the article you refer to points out that it uses the OEM character set. This is NOT US-ASCII, and it is possible that a filename created using one OEM character set will not display correctly on a system using a different one, which is one of the reasons for Unicode in the first place.  The following example shows a simple text file that only uses 7-bit character codes, displayed with the default code page (850 on my system, but I believe the default in the US is 437). When I display the same file with code page 20106 (7-bit German), the results are very different.  This is an extreme example, but it does highlight that there are no guarantees of portability even when using just 7-bit ASCII characters.

 

$ chcp
Active code page: 850
$ type Test.txt
{ a[i] = '\n'; }

$ chcp 20106
Active code page: 20106
$ type Test.txt
ä aÄiÜ = 'Ön'; ü
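The same effect can be reproduced without changing the console code page. Python has no built-in codec for code page 20106 as far as I know, so this sketch spells out the national replacement characters of the German 7-bit set (DIN 66003) by hand; the table name GERMAN_7BIT is just for illustration.

```python
# Positions where the German 7-bit set differs from US-ASCII.
GERMAN_7BIT = str.maketrans({
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
})

line = "{ a[i] = '\\n'; }"            # the contents of Test.txt
print(line.translate(GERMAN_7BIT))    # ä aÄiÜ = 'Ön'; ü
```

The bytes in the file never change; the same seven-bit values are simply drawn as different glyphs.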


Regarding the examples you give, the first two appear to show the correct operation and confirm that Unicode is supported.  The third highlights a limitation of the Busybox implementation of the ls command; I believe this is by design, to prevent it from trying to display non-printable characters, which can cause various problems. It does not affect the normal operation of the BB1.  The last two examples appear to show that non-Unicode applications which have no connection at all with the Brennan can’t handle Unicode. To use this as a reason for stating that the Brennan can’t handle Unicode, or that Unicode is bad, seems a bit perverse.

Both freeDB and Musicbrainz use Unicode, so if you use these databases to name your files you are going to get non-ASCII characters in your filenames. I have numerous files like this, which I have copied between the BB1, Windows, a couple of different NAS implementations, iPad and iPhone, and used with various applications without issue.

You may have perfectly good reasons for using only characters from ‘US-ASCII’, but your statement that Brennan does not support Unicode is still incorrect.

Mark Fishman

Feb 12, 2021, 6:47:51 PM
to Brennan Forum
My statement (twice) was "Character mappings on Linux, MS-Windows, macOS, and other operating systems diverge for 8-bit characters, so for safety and portability you should restrict the character set to US-ASCII."

Having spent many hours recovering filesystems that were written to from multiple operating systems, and read from by multiple operating systems, and having transported many files among many different devices, I stand by that statement. For safety and portability, the most common subset of Unicode -- whether UTF-8 or UTF-16 -- and other types of character mappings is likely to be ASCII, which does not provide accented characters. One should also avoid characters that have "special" meanings or functions in the command shells of various operating systems, e.g., some of the shifted number characters on most keyboards and the two types of slashes.

I don't make the rules. I certainly am in no position to tell you or anyone else what you must, or even should, do. I can only relate my own experience and conclusions. But, as the exasperated parent said to the child, "If you fall off that rock and break both legs, don't come running to me."

-- m.


AJ

Feb 12, 2021, 7:20:20 PM
to Brennan Forum
I don’t disagree that 8-bit character sets diverge, but so do 7-bit character sets.  The whole purpose of Unicode was to clear up this mess so that a file created on any system can be correctly read on any other system.  In my experience this is what it does, and even if you limit yourself to ‘US-ASCII’ characters when naming files, the names will still be encoded as Unicode.

In your earlier post you stated that ‘The disk is formatted using the FAT32 filesystem and thus does not use Unicode for filenames’.  All I am trying to do is clarify that it uses VFAT and it does support Unicode.  Whether Unicode is a good idea or not we will have to disagree on. 
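To show what that means at the byte level, here is a small Python sketch comparing one name in an OEM code page and in little-endian UTF-16, which is my understanding of how the long filename entries hold their characters. It only counts the bytes for the characters themselves and ignores how the directory entries are laid out.

```python
name = "Pièces de clavecin.mp3"

oem = name.encode("cp437")        # one byte per character in an OEM code page
lfn = name.encode("utf-16-le")    # two bytes per character for these characters

print(len(name), len(oem), len(lfn))   # 22 22 44
```

A pure ASCII name behaves the same way: it still occupies two bytes per character in the long filename form, so it is stored as Unicode regardless.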

Mark Fishman

Feb 23, 2021, 8:10:43 AM
to Brennan Forum
I have finally found a Microsoft-related source that confirms your statement about long filenames being in Unicode instead of the local character set, even on FAT32, and I apologize for my error. (I still think that if one is interested in portability one should take that into account when naming, but I was wrong about LFN support on FAT32.)

"Random musings on the introduction of long file names on FAT" states:
"One last historical note: The designers of this system didn’t really expect Windows NT to adopt long file names on FAT, since Windows NT already had its own much-better file system, namely, NTFS. If you wanted long file names on Windows NT, you’d just use NTFS and call it done. Nevertheless, the decision was made to store the file names in Unicode on disk, breaking with the long-standing practice of storing FAT file names in the OEM character set. The decision meant that long file names would take up twice as much space (and this was back in the days when disk space was expensive), but the designers chose to do it anyway 'because it’s the right thing to do.' And then Windows NT added support for long file names on FAT and the decision taken years earlier to use Unicode on disk proved eerily clairvoyant."

Me My Self and I

Mar 2, 2021, 1:35:08 PM
to Brennan Forum
It's a relatively simple procedure to load a folder or disk's worth of mp3 files into mp3Tag and then ask it to create a filename/folder structure from the mp3 ID3 tags. Those files and folders can then be used as the import structure for the Brennan. mp3Tag can also be used to create any album artwork from the artwork embedded in the ID3 tags.

Personally, having used a music server for a large number of years I have always exported the mp3 files from iTunes using an AppleScript which replaces any non-alphanumeric characters with underscores. That (as pointed out eloquently by Mark Fishman above) ensures maximum compatibility with Android, iPod, HiFidelio, smart TVs, in car media devices, everything. For the Brennan I have modified this script to allow spaces in folders and filenames as it seems it can cope with these and it keeps things tidy.
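For anyone not on a Mac, the same idea is a one-liner with a regular expression. This is a Python sketch of the approach, not a translation of my AppleScript; the function name underscore_name is made up, and it expects the name without its extension so that the dot before "mp3" is not replaced.

```python
import re

def underscore_name(text, allow_spaces=True):
    """Replace everything except ASCII letters, digits (and spaces) with '_'."""
    pattern = r"[^A-Za-z0-9 ]" if allow_spaces else r"[^A-Za-z0-9]"
    return re.sub(pattern, "_", text)

print(underscore_name("Who Are You?") + ".mp3")       # Who Are You_.mp3
print(underscore_name("AC/DC", allow_spaces=False))   # AC_DC
```

Accented letters are replaced as well, which is the price of sticking to plain alphanumerics.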