Comment #1 on issue 339 by sro...@gmail.com: Online Decoder - Charset / Character encoding issue
http://code.google.com/p/zxing/issues/detail?id=339
Yes, it has to do with the decoder having to guess the right encoding for
the text, since there's not really proper support for encoding non-Latin
characters in QR codes.
That encoder always uses UTF-8. Not a bad choice, but it's not something
QR codes technically support. The decoder recognizes that the bytes aren't
ISO-8859-1 and has to guess at the encoding. The only plausible
alternatives* are UTF-8 and Shift_JIS. Shift_JIS is "usually" the right
answer, since tons of QR codes in Japan use this encoding in byte mode
(even when they ought to use Kanji mode, but oh well). So the guessing
procedure heavily favors Shift_JIS, and that's what happens here. The
first one is long enough that the heuristics decide it's not Shift_JIS,
guess UTF-8, and all is well. The second one is short, so there isn't
enough evidence to confidently guess UTF-8, and it falls back to Shift_JIS.
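
To see why a short string is ambiguous, here is a minimal Python sketch. It
is not the project's actual heuristic; the function name and the candidate
list are made up for illustration. It only checks which of the two candidate
encodings can decode the bytes without error, and shows that a short UTF-8
string is also perfectly valid Shift_JIS.

```python
# Illustration only: NOT the decoder's real guessing procedure.
# CANDIDATES and plausible_encodings are hypothetical names.
CANDIDATES = ("utf-8", "shift_jis")


def plausible_encodings(data):
    """Return the candidate encodings that decode `data` without error."""
    ok = []
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict decoding: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


short = "café".encode("utf-8")         # the bytes a UTF-8 encoder would store
print(plausible_encodings(short))      # both candidates decode without error
print(short.decode("shift_jis"))       # 0xC3 0xA9 read as half-width katakana
```

With both candidates valid, the decoder has nothing but statistics to go on,
and a handful of bytes doesn't give it much.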
The real solution is to encode with ISO-8859-1. You can do that with the
project's encoder code, where you can force the encoding.
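
Before forcing ISO-8859-1 it's worth checking that the text actually fits in
it. A minimal sketch in plain Python, not using the project's encoder API
(the helper name is hypothetical):

```python
# Hypothetical helper: checks whether text is representable in ISO-8859-1.
def to_latin1_or_none(text):
    """Return the ISO-8859-1 bytes for `text`, or None if it doesn't fit."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


print(to_latin1_or_none("café"))   # one byte per character, 0xE9 for the e-acute
print(to_latin1_or_none("日本"))    # None: not representable in ISO-8859-1
```

If this returns None, ISO-8859-1 isn't an option for that text and you're
back to the guessing problem described above.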
The guessing procedure could always be better, but I'm unlikely to put much
effort into it. It's always going to get something wrong, and it's too easy
to mess up some Japanese QR codes.
* Well, unless you specify an ECI segment in the code. That's the right
way, and it's supported. Unfortunately ECI is not really documented, so
it's useless in practice.
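
For what it's worth, here is a rough sketch of what an ECI segment header
looks like at the bit level, going by my reading of the QR Code
specification: a 4-bit mode indicator (0111), followed, for assignment
numbers 0 to 127, by an 8-bit designator with a leading 0 bit. The
assignment numbers below are the commonly cited ones; treat them, and the
helper name, as assumptions to verify against the spec.

```python
# Assumption: commonly cited ECI assignment numbers; verify before relying on them.
ECI_ASSIGNMENTS = {"ISO-8859-1": 3, "Shift_JIS": 20, "UTF-8": 26}

ECI_MODE_INDICATOR = "0111"  # 4-bit mode indicator for an ECI segment


def eci_header_bits(charset):
    """Return the ECI header (mode indicator + 8-bit designator) as a bit string."""
    value = ECI_ASSIGNMENTS[charset]
    if not 0 <= value <= 127:
        # Larger assignment numbers use a longer designator, not covered here.
        raise ValueError("only single-byte ECI designators are handled")
    return ECI_MODE_INDICATOR + format(value, "08b")


print(eci_header_bits("UTF-8"))
```

The byte-mode segment carrying the actual text would follow this header.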