Issue 103 in zxing: QR Code Chinese Characters in UTF8

codesite...@google.com

Nov 6, 2008, 11:03:46 PM

to zx...@googlegroups.com

Issue 103: QR Code Chinese Characters in UTF8
http://code.google.com/p/zxing/issues/detail?id=103

New issue report by ysakaed:
What steps will reproduce the problem?
1. Create the image from quickmark website.
2. Test with QR Code image

What is the expected output? What do you see instead?
expected : 123測試的文字
what i see is : 123貂ｬ隧ｦ逧�枚蟄�

What version of the product are you using? On what operating system?
Online decoder

Please provide any additional information below.
Does the decoder support Chinese characters in UTF-8? Do QR Code images
store encoding information in the image? I have tried the cpp version, and
when it guesses the encoding, the bytes never actually start with a
UTF-8 byte order mark.

Attachments:
081007115124fg1.png 1.1 KB

Issue attributes:
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

codesite...@google.com

Nov 8, 2008, 9:35:11 AM

to zx...@googlegroups.com

Comment #1 by srowen:
This is a tough one.

Really, you can't use UTF-8 in a QR Code. QR Code 'byte mode' assumes
ISO-8859-1 by default. QR Code allows for defining the character encoding
by use of ECI segments (see the spec). The decoder does support these. But
I do not know of a character set ECI that selects UTF-8.

The decoder does try to guess the encoding though, since in practice many
QR Codes just use Shift_JIS in byte mode (rather than Kanji mode) instead
of ISO-8859-1. The decoder even tries to guess UTF-8.

The decoder will guess UTF-8 if the bytes start with a UTF-8 byte order
mark, but this one doesn't.
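
The byte order mark check described above can be sketched in a few lines of
Python. This is only an illustration, not the zxing implementation; the
function name `starts_with_utf8_bom` is hypothetical.

```python
import codecs

def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the payload begins with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# The text from the issue report, encoded as plain UTF-8 without a byte order mark.
payload = "123測試的文字".encode("utf-8")

print(starts_with_utf8_bom(payload))                    # False
print(starts_with_utf8_bom(codecs.BOM_UTF8 + payload))  # True
```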

The problem is that this short message encoded in UTF-8 is also a valid
encoding of a string in Shift_JIS, so that is what is guessed in this case.
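
A small Python sketch of this ambiguity, using Python's `shift_jis` codec as
a stand-in for the decoder's own guessing logic (an assumption; it is not how
zxing works internally). It only looks at the first Chinese character of the
reported text, whose three UTF-8 bytes are also accepted by a strict
Shift_JIS decoder.

```python
# UTF-8 bytes of the first Chinese character in the reported text.
utf8_bytes = "測".encode("utf-8")  # three bytes: E6 B8 AC

# A strict Shift_JIS decoder accepts the same bytes: E6 B8 is read as one
# double-byte character and AC as a single-byte half-width katakana,
# so one intended character turns into two unrelated ones.
as_shift_jis = utf8_bytes.decode("shift_jis")

print(utf8_bytes.hex(" "))
print(len(as_shift_jis), as_shift_jis != "測")
```

Nothing in the bytes themselves says which reading was intended, which is why
a guess without a byte order mark or an ECI segment can go wrong.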

So, I am saying this symbol is not correctly encoded. To address it:
1) add a UTF-8 byte order mark at the start (EF BB BF) or
2) specify UTF-8 with an ECI segment (and then let me know what it is so we
   can support it, since I've not found this value yet!) or
3) use an alternate encoding for Chinese, one that is supported by ECI, and
   define the character set via ECI in the encoding
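
Option 1 can be sketched in Python as follows, assuming the QR Code encoder
accepts raw bytes for byte mode (the encoder itself is not shown, and the
variable names are illustrative only).

```python
import codecs

text = "123測試的文字"

# Prefix the UTF-8 bytes with the byte order mark EF BB BF before
# handing them to a QR Code encoder in byte mode.
payload = codecs.BOM_UTF8 + text.encode("utf-8")

# A reader that recognizes the mark can decode the rest as UTF-8;
# Python's "utf-8-sig" codec strips the mark while decoding.
decoded = payload.decode("utf-8-sig")
print(decoded == text)  # True
```

The cost is three extra bytes in the symbol, and readers that do not strip
the mark may show it as a stray character at the start of the text.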