Identifying characters unrecognizable to Windows


Lara Friedman-Shedlov

Feb 16, 2022, 2:52:30 PM
to bitcurat...@googlegroups.com
Hi all,

We frequently ingest born-digital collections that were created on an OS that allows characters in folder and file names that Windows does not recognize. These show up in some contexts as a diamond with a question mark, or sometimes as a small bullet point:

[Attached image: image.png]

I'd like to be able to get a list of any files or folders in a directory that have this issue. Does anyone know a way to search for these characters wherever they appear in file or folder names in a given directory?  

Thanks,
Lara Friedman-Shedlov




--
Lara D. Friedman-Shedlov    (she / they)  (hear my name)
Digital Records Archivist | Archives & Special Collections 
University of Minnesota Libraries | lib.umn.edu | 612.626.7972

I acknowledge that the University of Minnesota is located on the traditional, ancestral and contemporary lands of Indigenous people and was built with money from slaveholders

Kieran O Leary

Feb 17, 2022, 4:31:57 AM
to bitcurat...@googlegroups.com
Hi,

Yes, this is a bit of a pain. I find that the easiest way to detect these is by viewing a directory listing, whether that's a bag checksum manifest or something else. I just tested this with a bag manifest, a TeraCopy manifest and an FTK Imager CSV listing, and it worked.
You could open these listings in something like Notepad++, or any text editor that lets you search with regular expressions. I'm purely giving Windows instructions here, as opposed to anything specific to BitCurator itself.
If you search for [^\x00-\x7F]+ as a regular expression, you might be able to detect those characters: the pattern matches any run of characters outside the 7-bit ASCII range. Hopefully this screenshot works:
[Attached image: image.png]
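
If you would rather script this than search by hand, the same pattern can be applied in Python. This is only a rough sketch, not something tested against the manifests mentioned above: the function names are made up for illustration, and it assumes that a listing opens as UTF-8 text and that checking names directly on disk is acceptable.

```python
import os
import re

# Same pattern as above: one or more characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def non_ascii_lines(lines):
    """Yield (line_number, line) for each listing line with a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        if NON_ASCII.search(line):
            yield number, line.rstrip("\n")


def non_ascii_paths(root):
    """Yield the path of each file or folder under root whose own name
    contains a non-ASCII character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if NON_ASCII.search(name):
                yield os.path.join(dirpath, name)


# Example use (the paths are placeholders):
#
#   with open("manifest-md5.txt", encoding="utf-8", errors="replace") as listing:
#       for number, line in non_ascii_lines(listing):
#           print(number, ascii(line))
#
#   for path in non_ascii_paths(r"D:\accessions\example"):
#       print(ascii(path))
```

Printing with ascii() shows the offending characters as escapes such as \xe9, so the output stays readable even where the console cannot display them. Bear in mind that the pattern flags every non-ASCII character, including perfectly valid accented letters, so the results are candidates to review rather than a list of definite problems.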

Best,

Kieran O'Leary
Digital Preservation Manager
National Library of Ireland
