Note: please ignore my earlier post "Script for handling UTF-16 files". Use this method instead.
Microsoft programs have mostly standardized on the UTF-16 encoding (the successor to UCS-2) for their Unicode files. Sadly, none of the common open source version control systems (Git, Mercurial and Bazaar) handles this encoding properly. These programs all treat UTF-16 files as binary, because they usually contain a lot of NUL bytes.
My first attempt to solve this problem involved using a diff textconv conversion to display UTF-16 files in diffs. But this approach is limited to diffs only.
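To make the NUL issue concrete, here is a small shell sketch (an illustration only, assuming iconv, od, tr and grep are available on the PATH). It encodes a short ASCII string as UTF-16LE and counts the zero bytes. The string and the variable name nul_count are made up for this example.

```shell
# Encode three ASCII characters ("h", "i", newline) as UTF-16LE.
# Each one becomes two bytes, and the second byte of each pair is 0x00.
nul_count=$(printf 'hi\n' \
  | iconv -f UTF-8 -t UTF-16LE \
  | od -An -v -tx1 \
  | tr -s ' ' '\n' \
  | grep -c '^00$')

# One NUL byte per ASCII character is expected here.
echo "NUL bytes: $nul_count"
```

Plain ASCII text stored as UTF-16 is therefore roughly half zero bytes, which is why a tool that looks for NUL bytes to detect binary content will flag it.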
Fortunately, Git provides a much better fix: a clean & smudge filter. This allows Git to treat your UTF-16 files as text in most cases: merge, git-grep, gitattributes (eol-conversion, ident-replacement, built-in diff patterns...).
Here is how to set up this much nicer handling of UTF-16 files in msysGit for Windows:
- Get GNU libiconv and install it
- Ensure that the libiconv directory (usually "C:\Program Files\GnuWin32\bin") is in your %PATH%
- Add the following to ~\Git\etc\gitconfig:
[filter "winutf16"]
clean = iconv -f utf-16 -t utf-8
smudge = iconv -f utf-8 -t utf-16
required
- Add this line to your global ~/Git/etc/gitattributes or local ~/.gitattributes:
*.txt filter=winutf16
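Before relying on the filter, it may be worth checking that the two iconv commands really are inverses on your machine. The sketch below is only a rough sanity check, not part of the setup itself: it pipes a sample string through the smudge conversion and then the clean conversion outside of Git. The sample text and the variable names original and roundtrip are hypothetical.

```shell
# Sample UTF-8 text standing in for what Git stores in the repository.
original='Hello, UTF-16'

# smudge direction (UTF-8 -> UTF-16), then clean direction (UTF-16 -> UTF-8).
roundtrip=$(printf '%s\n' "$original" \
  | iconv -f utf-8 -t utf-16 \
  | iconv -f utf-16 -t utf-8)

# The text should survive the round trip unchanged.
if [ "$roundtrip" = "$original" ]; then
  echo "round trip OK"
else
  echo "round trip FAILED"
fi
```

Inside a repository, running git check-attr filter on one of your .txt files should then report winutf16 as the filter for that path if the attributes line was picked up.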
That's it! UTF-16 files should look and work normally for most Git functions.
Thanks to Karsten Blees and Erik Faye-Lund for their help!
-Ken