Issue 339 in zxing: Online Decoder - Charset / Character encoding issue

zx...@googlecode.com

Feb 4, 2010, 10:26:11 AM

to zx...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 339 by juergen.treml: Online Decoder - Charset / Character
encoding issue
http://code.google.com/p/zxing/issues/detail?id=339

I'm experiencing a weird issue using the online decoder.

I've created two QR codes with contact information
1) MECARD:N:Jürgen;TEL:1234567890;URL:www.test.com;EMAIL:te...@test.com;;
(http://chart.apis.google.com/chart?cht=qr&chs=350x350&chl=MECARD%3AN%3AJ%C3%BCrgen%3BTEL%3A1234567890%3BURL%3Awww.test.com%3BEMAIL%3Atest%40test.com%3B%3B)
2) MECARD:N:Jürgen;;
(http://chart.apis.google.com/chart?cht=qr&chs=350x350&chl=MECARD%3AN%3AJ%C3%BCrgen%3B%3B)

The first one decodes correctly, showing the name as "Jürgen". The second
one decodes with the name displayed as "Jﾃｼrgen", i.e. with some Chinese/
Japanese characters.

This is only one example, but I've experienced the same issue in other
cases, too. Sometimes non-ASCII characters decode correctly, sometimes not.
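A minimal stdlib-only Java sketch that reproduces the garbled name, assuming (as the reply below explains) that the chart encoder wrote UTF-8 bytes and the decoder read them back as Shift_JIS. The class and method names are made up for illustration; this is not zxing code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8 (what the chart encoder does), then decode the
    // very same bytes as Shift_JIS (what the decoder guessed for the short code).
    static String utf8ReadAsShiftJis(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("Shift_JIS"));
    }

    // Render bytes as space-separated hex so the UTF-8 sequence is visible.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "J\u00FCrgen"; // "Juergen" with u-umlaut, escaped to keep the source ASCII
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8))); // 4A C3 BC 72 67 65 6E
        System.out.println(utf8ReadAsShiftJis(name));
    }
}
```

The UTF-8 form of "ü" is the two bytes C3 BC. In Shift_JIS both of those bytes are single-byte half-width katakana, U+FF83 (ﾃ) and U+FF7C (ｼ), so the second line comes out as "Jﾃｼrgen", the same string the reporter saw.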

--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings

zx...@googlecode.com

Feb 4, 2010, 11:36:05 AM

to zx...@googlegroups.com

Updates:
Status: NotABug
Labels: -Priority-Medium Priority-Low

Comment #1 on issue 339 by sro...@gmail.com: Online Decoder - Charset /
Character encoding issue
http://code.google.com/p/zxing/issues/detail?id=339

Yes, it has to do with the decoder having to guess the right encoding for
the text, since there's not really proper support for encoding non-Latin
characters in QR codes.

That encoder is using UTF-8, always. Not a bad choice, but it's not
something supported by QR codes technically. The decoder recognizes that
the bytes aren't ISO-8859-1 and has to guess at the encoding. The only
plausible alternatives* are UTF-8 and Shift_JIS. Shift_JIS is "usually" the
right answer, since tons of QR codes in Japan use this encoding in byte
mode (even when they ought to use Kanji mode, but oh well).

So the guessing procedure heavily favors Shift_JIS, and that's what happens
here. The first one is long enough that the heuristics think it's not
Shift_JIS, guess UTF-8, and all is well. The second one is short, so the
decoder doesn't have enough confidence to guess UTF-8.
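To make "heavily favors Shift_JIS" concrete, here is a toy Java guesser, not zxing's actual heuristic. It assumes the bytes have already been judged not to be ISO-8859-1, accepts a charset only if the bytes decode in it without errors, and lets UTF-8 override the Shift_JIS default only past a hypothetical `MIN_UTF8_SEQUENCES` threshold that I picked for illustration. It does not model the length-dependent confidence described above: both of the reporter's codes contain a single "ü", so this toy would pick Shift_JIS for both.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ToyEncodingGuesser {

    // Hypothetical threshold (not from zxing): how many multi-byte UTF-8
    // sequences are needed before we override the Shift_JIS default.
    static final int MIN_UTF8_SEQUENCES = 2;

    // True if the bytes decode in the given charset with no malformed or
    // unmappable input.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Count lead bytes of multi-byte UTF-8 sequences (bit pattern 11xxxxxx).
    static int countUtf8LeadBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) == 0xC0) count++;
        }
        return count;
    }

    // Pick between the two "plausible alternatives", biased toward Shift_JIS.
    static String guess(byte[] bytes) {
        boolean utf8Ok = decodesCleanly(bytes, StandardCharsets.UTF_8);
        boolean sjisOk = decodesCleanly(bytes, Charset.forName("Shift_JIS"));
        if (sjisOk && (!utf8Ok || countUtf8LeadBytes(bytes) < MIN_UTF8_SEQUENCES)) {
            return "Shift_JIS";
        }
        return utf8Ok ? "UTF-8" : "ISO-8859-1"; // last-resort fallback
    }

    public static void main(String[] args) {
        byte[] shortCode = "MECARD:N:J\u00FCrgen;;".getBytes(StandardCharsets.UTF_8);
        byte[] twoUmlauts = "MECARD:N:J\u00FCrgen M\u00FCller;;".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(shortCode));  // Shift_JIS
        System.out.println(guess(twoUmlauts)); // UTF-8
    }
}
```

The point of the sketch is the asymmetry: UTF-8 bytes for Latin letters with diacritics are very often also valid Shift_JIS (as half-width katakana), so a decoder that prefers Shift_JIS whenever it is valid will mis-read short UTF-8 text.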

The real solution is to encode with ISO-8859-1. You can do that with the
project's encoder code, where you can force the encoding.
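In the zxing Java encoder, forcing the charset should be a matter of passing the `EncodeHintType.CHARACTER_SET` hint. The stdlib-only sketch below (hypothetical class name, no zxing dependency) shows why ISO-8859-1 avoids the guessing problem for a name like "Jürgen": "ü" becomes the single byte 0xFC and round-trips exactly. It also checks the limitation: text with no Latin-1 representation, such as Japanese, cannot be encoded this way at all.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {

    // True if every character of the text has an ISO-8859-1 representation,
    // so it could be stored in a QR byte segment without any charset guessing.
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    // Encode as ISO-8859-1 and decode back with the same charset.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "J\u00FCrgen";
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length); // 6: u-umlaut is the single byte 0xFC
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);      // 7: u-umlaut is C3 BC
        System.out.println(roundTripLatin1(name).equals(name));                // true
        System.out.println(fitsLatin1("\u65E5\u672C"));                        // false: Japanese has no Latin-1 form
    }
}
```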

The guessing procedure could always be better, but I'm unlikely to put much
effort into it... it's always going to get something wrong, and it's too
easy to mess up some Japanese QR codes.

* Well, unless you specify an ECI segment in the code. That's the right
way, and it's supported. Unfortunately, ECI is not really documented, so
it's useless in practice.
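For reference, the layout of an ECI header as given in the QR specification (ISO/IEC 18004) is fairly small: the 4-bit mode indicator 0111, followed by the ECI assignment number in one, two, or three bytes depending on its size. The Java sketch below (hypothetical names, not zxing's encoder) builds only that header as a bit string; the byte-mode segment that would follow it (mode indicator, character count, data) is not shown. Assignment 26 is UTF-8 and 20 is Shift_JIS.

```java
public class EciHeader {

    // QR mode indicator for an ECI segment (4 bits).
    static final String ECI_MODE = "0111";

    // Binary form of value, left-padded with zeros to the given width.
    static String bits(int value, int width) {
        String s = Integer.toBinaryString(value);
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) sb.append('0');
        return sb.append(s).toString();
    }

    // Mode indicator followed by the ECI assignment number in 1, 2 or 3 bytes.
    static String eciHeaderBits(int assignment) {
        if (assignment < 0 || assignment > 999_999) {
            throw new IllegalArgumentException("ECI assignment out of range: " + assignment);
        }
        if (assignment < 128) {
            return ECI_MODE + "0" + bits(assignment, 7);     // 0xxxxxxx
        } else if (assignment < 16384) {
            return ECI_MODE + "10" + bits(assignment, 14);   // 10xxxxxx xxxxxxxx
        } else {
            return ECI_MODE + "110" + bits(assignment, 21);  // 110xxxxx xxxxxxxx xxxxxxxx
        }
    }

    public static void main(String[] args) {
        System.out.println(eciHeaderBits(26)); // UTF-8: 011100011010
        System.out.println(eciHeaderBits(20)); // Shift_JIS: 011100010100
    }
}
```

With a header like this in front of the data, a decoder no longer has to guess between UTF-8 and Shift_JIS, which is why the comment calls ECI the right way to do it.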
