to OneCommander
Background
Historically, ZIP tools have stored filenames in the system's local encoding (e.g., CP936 for Chinese, Shift_JIS for Japanese) rather than UTF-8. This causes cross-language compatibility issues: for example, filenames created by Japanese users (Shift_JIS) appear as garbled text when extracted by Chinese users (whose systems decode them as GBK/CP936), and vice versa.
This issue is common in content-creation communities that rely heavily on international exchange, such as UTAU singing voice synthesis and MMD animation.
Example
I'm using Windows zh-CN. Here is a zip file from Japan. All the filenames in the zip file are in Japanese.
These are the files extracted with OneCommander (the filenames are incorrectly decoded as GB2312, resulting in garbled Chinese text)
This is what the filenames are supposed to be (decoded as Shift-JIS)
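To make the example above concrete, here is a minimal Python sketch of the same effect at the byte level. The filename `テスト.wav` is a made-up illustration, not one of the files from the archive above, and Python's `gbk` codec stands in for the GB2312/CP936 decoding that Windows zh-CN applies.

```python
# Hypothetical filename, chosen only for illustration.
original = "テスト.wav"

# What a Japanese archiver writes into the zip: Shift_JIS bytes.
raw = original.encode("shift_jis")

# What a zh-CN system shows when it assumes its own local encoding.
# errors="replace" keeps the sketch running even if a byte pair is unmapped.
garbled = raw.decode("gbk", errors="replace")

# What the user gets when the correct encoding is chosen.
recovered = raw.decode("shift_jis")

print("bytes:    ", raw.hex(" "))
print("as GBK:   ", garbled)
print("as SJIS:  ", recovered)
```

The bytes in the archive are never damaged; only the choice of decoder differs, which is why letting the user pick the encoding is enough to fix the names.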
Things to do
Support extracting a zip file with user-chosen encoding
Auto-detect the encoding
When creating a zip file, use UTF-8 encoding by default (so the filenames won't get garbled when I send the archive to someone abroad)
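The first and third items can be sketched with Python's standard `zipfile` module. This is only an illustration of the idea, not how OneCommander or Files would implement it; the helper names `decode_name`, `list_names` and `make_utf8_zip` are hypothetical. It relies on two behaviours of `zipfile`: entries without the UTF-8 flag (general purpose bit 11) are decoded as CP437 by default, so the raw bytes can be recovered by re-encoding to CP437, and non-ASCII names are written as UTF-8 with that flag set.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the filename is stored as UTF-8


def decode_name(info, encoding):
    """Return the entry name, decoding legacy names with a user-chosen encoding."""
    if info.flag_bits & UTF8_FLAG:
        return info.orig_filename  # already decoded as UTF-8 by zipfile
    # zipfile decoded the raw bytes as CP437 (its default for unflagged
    # names); undo that and apply the encoding the user picked instead.
    return info.orig_filename.encode("cp437").decode(encoding)


def list_names(archive, encoding):
    """List entry names of an archive (path or file object) using `encoding`."""
    with zipfile.ZipFile(archive) as zf:
        return [decode_name(info, encoding) for info in zf.infolist()]


def make_utf8_zip(files):
    """Build a zip in memory from {name: bytes}; non-ASCII names get the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    data = make_utf8_zip({"テスト.txt": b"hello"})
    # A flagged UTF-8 archive reads correctly whatever encoding is chosen.
    print(list_names(io.BytesIO(data), "shift_jis"))
```

Because flagged entries bypass the user-chosen encoding, an archive created this way should display correctly on any system whose extractor honours the flag, while legacy archives still work once the right encoding is selected.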
Proposed technical implementation details
I've implemented the features above in Files, another third-party file manager for Windows written in C#. Here are my PRs:
https://github.com/files-community/Files/pull/17022 (using SharpZipLib, which supports extracting with a specified encoding)
https://github.com/files-community/Files/pull/17026 (adds the parameter "cu on" when adding files to an archive)
https://github.com/files-community/Files/pull/17045 (using UTF.Unknown to detect the encoding)
Why I'm working on this
I'm a developer of OpenUtau, an open-source C# implementation of UTAU, a singing voice synthesis system from Japan. Because of the zip encoding issue, when a non-Japanese user downloads a Japanese voicebank, its filenames become garbled. So OpenUtau introduced a built-in voicebank extractor that lets the user choose the encoding, and a voicebank publisher that creates UTF-8 zip archives for global release.
So I'm checking existing file management tools for their zip globalization status, and contributing to these projects when they're written in a programming language I know.
References
Wikipedia: https://en.wikipedia.org/wiki/ZIP_(file_format)#Internationalization_issues
ZipUnicode, a tool for detecting the encoding of a zip file: https://pypi.org/project/ZipUnicode/