Identifying characters unrecognizable to Windows


Lara Friedman-Shedlov

Feb 16, 2022, 2:52:30 PM
Hi all,

We frequently ingest born-digital collections that were created on an OS that allows characters in folder and file names that are not recognized by Windows. These show up in some contexts as a diamond with a question mark, or sometimes as a small bullet point:


I'd like to be able to get a list of any files or folders in a directory that have this issue. Does anyone know a way to search for these characters wherever they appear in file or folder names in a given directory?  

Lara Friedman-Shedlov

Lara D. Friedman-Shedlov (she / they)
Digital Records Archivist | Archives & Special Collections
University of Minnesota Libraries | 612.626.7972

I acknowledge that the University of Minnesota is located on the traditional, ancestral and contemporary lands of Indigenous people and was built with money from slaveholders

Kieran O Leary

Feb 17, 2022, 4:31:57 AM

Yes, this is a bit of a pain. I find that the easiest way to detect these is to look at a directory listing, whether that's a bag checksum manifest or something else. I just tested this with a bag manifest, a TeraCopy manifest and an FTK Imager CSV listing, and it worked in each case.
You could open these listings in something like Notepad++, or any text editor that allows you to search with regular expressions. I'm purely giving Windows instructions here, as opposed to BitCurator itself.
If you search for [^\x00-\x7F]+ as a regular expression, you should be able to detect those characters, since the pattern matches any run of characters outside the basic ASCII range. Hopefully this screenshot works:


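If you would rather check the folder itself than a listing, the same regular expression can be applied to the names directly. This is only a minimal sketch, assuming Python 3 is available; the function name find_non_ascii_names is made up for illustration. Note that it flags every non-ASCII character, including perfectly legitimate accented letters, so the results still need a human look.

```python
import os
import re

# Same pattern as in the text editor search:
# one or more characters outside the basic ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii_names(root):
    """Return paths under root whose own file or folder name
    contains at least one non-ASCII character."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check folder names and file names alike.
        for name in dirnames + filenames:
            if NON_ASCII.search(name):
                hits.append(os.path.join(dirpath, name))
    return hits
```

You would call find_non_ascii_names() with the top-level folder of the accession and loop over the result. Printing each path with ascii(path) shows the odd characters as escape codes, which avoids errors on consoles that cannot display them.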
Kieran O'Leary
Digital Preservation Manager
National Library of Ireland
