--
You received this message because you are subscribed to the Google Groups "Digital Curation" group.
It's not an aside; it's front and center :-)
BagIt allows the use of any character encoding as long as it's UTF-8, and any character in the Unicode repertoire except for byte order marks (BOMs) [asterisk: end-of-line markers].
Bagger uses Java strings, which are UTF-16 internally and which, unless otherwise specified, are decoded from bytes using a configurable default charset. Multiple charsets can be used in the same application, but wrangling file name translation gets a little trickier. Not impossible, but trickier.
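To make the decoding point concrete, here is a minimal sketch (not Bagger's actual code; the class and method names are hypothetical) that decodes the same bytes with two explicit charsets. It assumes the bytes are a UTF-8 encoded file name, and shows what happens when they are read with the wrong decoder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from Bagger or the BagIt library.
class FilenameDecodingSketch {

    // Decode raw bytes with an explicitly chosen charset instead of
    // relying on whatever the JVM default happens to be.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The UTF-8 bytes for "caf" followed by e-acute (U+00E9).
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};

        // The default depends on the JVM version, platform, and settings.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Correct decoder: four characters, ending in U+00E9.
        String good = decode(raw, StandardCharsets.UTF_8);
        System.out.println("UTF-8 length: " + good.length());

        // Wrong decoder: each byte becomes one character, so the
        // two-byte sequence for e-acute turns into two characters.
        String bad = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 length: " + bad.length());
    }
}
```

If the two lengths differ on the bagging machine's own file names, that is a hint the names are being read with a decoder that does not match how they were written. The JVM default can usually be changed with the file.encoding system property at startup.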
Important diagnostic information:
1. Is there a stack trace for where the error is being thrown?
2. What type of file system did the files come from?
3. What type of file system are the files unpacked onto?
4. Do the filenames show up properly when looking at them on the bagging machine (with accents)?
5. Is there a sample disk image with problematic files that you can provide?
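For question 4, one way to check what Java actually sees is to print each file name alongside its Unicode code points. The following is a rough sketch under the assumption that the files sit in a single directory passed as the first argument; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringJoiner;

// Hypothetical diagnostic helper; not part of Bagger.
class FilenameCodePoints {

    // Render a string as space-separated code points, e.g. "U+0063 U+0061".
    static String codePoints(String name) {
        StringJoiner joiner = new StringJoiner(" ");
        name.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                System.out.println(name + " -> " + codePoints(name));
            }
        }
    }
}
```

An accented name that shows U+FFFD (the replacement character) or an unexpected pair of code points where one accented letter should be suggests the name was decoded with the wrong charset somewhere between the source file system and the JVM.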
Simon