Imager::QRCode-ing octet sequences vs. zbarimg(1)

Ivan Shmakov

Mar 13, 2013, 7:40:23 AM

[AIUI, discussion of Perl modules is more appropriate for
news:comp.lang.perl.modules. Yet it appears to be abandoned,
so I'm cross-posting to news:comp.lang.perl.misc. Cross-posting
to news:alt.barcodes, too, just in case.]

I wonder if QR codes are suitable for encoding arbitrary octet
sequences (AKA 8-bit data)? I've tried the following Perl code,
but it appears that the resulting transformations aren't "8-bit
clean." Somehow, I suspect an Imager::QRCode fault, although
zbarimg(1) may be responsible. (Unfortunately, the Perl module
itself doesn't provide a decoder.)

Any idea what may be going on?

TIA.

(The leading 51522d436f64653a and the trailing 0a after
"Decoded:" are "QR-Code:" and a newline, respectively. In the
first example, the first three octets in the output, 621d4f,
appear to match the input. Incidentally, the fourth octet has
its most significant bit set.)

$ perl \
89br96tnpoogun68sfh1jkj1sb.perl # "use bytes;" commented out
Blob: 621d4f87d3ae92b60932c96b7f81f3a916faff9b03ae54f97d8163987dc8733df1bd8f8b92fb5317657ee2a0a97eed1f12423cdbfa1a73b3166a39cb4b1c0f43
Image: 123 by 123
Decoded: 51522d436f64653a621d4fc287c393c2aec292c2b60932c3896b7fc281c3b3c2a916c3bac3bfc29b03c2ae54c3b97dc28163c2987dc388733dc3b1c2bdc28fc28bc292c3bb5317657ec3a2c2a0c2a97ec3ad1f12423cc39bc3ba1a73c2b3166a39c38b4b1c0f430a
scanned 1 barcode symbols from 1 images in 0.02 seconds

$ perl \
89br96tnpoogun68sfh1jkj1sb.perl # "use bytes;" in place
Blob: 8abdab3e25ae4e44fbc50d9aedcadfb34b1eb959f78ca306bff1182f00024d1ca9e5d7db8827fdd4ab8169a18130cc3de3b31da82150bff080fe57d591f909cf
Image: 99 by 99
Decoded: 51522d436f64653ac28ac2bdc2ab3e25c2ae4e44c3bbc3850dc29ac3adc38ac39fc2b34b1ec2b959c3b7c28cc2a306c2bfc3b1182f0a
scanned 1 barcode symbols from 1 images in 0.02 seconds

$ LC_ALL=C perl \
89br96tnpoogun68sfh1jkj1sb.perl # "use bytes;" in place
Blob: aba7c3b1e7721a22660308e7a7a7f6cfdb48b18fb2143d823021ece0bb2dde2ed0fe2d4b06fb56c4167e867a1f0ef4f495a46a6efb2ce76621fb58b5bd817605
Image: 123 by 123
Decoded: 51522d436f64653ac2abc2a7c383c2b1c3a7721a22660308c3a7c2a7c2a7c3b6c38fc39b48c2b1c28fc2b2143dc2823021c3acc3a0c2bb2dc39e2ec390c3be2d4b06c3bb56c384167ec2867a1f0ec3b4c3b4c295c2a46a6ec3bb2cc3a76621c3bb58c2b5c2bdc28176050a
scanned 1 barcode symbols from 1 images in 0.03 seconds

$ cat < 89br96tnpoogun68sfh1jkj1sb.perl
use bytes;
use common::sense;
use English;

require Imager::QRCode;
require IPC::Open2;

sub rand_blob (;$) {
    my ($len) = @_;
    $len
        //= 24;
    open (my $f, "<", "/dev/urandom")
        or die ($OS_ERROR);
    binmode ($f);
    my $s;
    die ($OS_ERROR)
        unless (read ($f, $s, $len) == $len);
    ## .
    $s;
}

my $blob
    = rand_blob (64);
print ("Blob: ", unpack ("H*", $blob), "\n");

my $qr
    = Imager::QRCode->new (qw (mode 8-bit casesensitive 1));
my $img
    = $qr->plot ($blob);
print ("Image: ", $img->getwidth (),
       " by ", $img->getheight (), "\n");

my ($in, $out);
my $pid
    = IPC::Open2::open2 ($in, $out, qw (zbarimg -- -))
    or die ($OS_ERROR);
binmode ($in);
binmode ($out);

$img->write ("fh" => $out, "type" => "pnm")
    or die ($img->errstr ());
close ($out);
## read all of zbarimg's output, not just the first line: the
## decoded data may itself contain newline octets
my $dec
    = do { local $INPUT_RECORD_SEPARATOR; <$in>; };
waitpid ($pid, 0);
print ("Decoded: ", unpack ("H*", $dec), "\n");
$
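A small sketch for making that comparison mechanical: it strips the "QR-Code:" prefix and the trailing newline that zbarimg(1) adds (as noted above) and reports the offset of the first octet that differs from the input. The helper names are hypothetical, and the sample data are just the leading octets of the first run above, written out by hand.

```perl
use strict;
use warnings;

# Strip zbarimg's "QR-Code:" symbology prefix and the trailing newline.
sub strip_zbar_line {
    my ($line) = @_;
    $line =~ s/\AQR-Code://;
    $line =~ s/\n\z//;
    return $line;
}

# Return the offset of the first differing octet, or -1 if both strings match.
sub first_mismatch {
    my ($want, $got) = @_;
    my $n = length($want) < length($got) ? length($want) : length($got);
    for my $i (0 .. $n - 1) {
        return $i if substr($want, $i, 1) ne substr($got, $i, 1);
    }
    return length($want) == length($got) ? -1 : $n;
}

# Leading octets of the first run: input blob and raw zbarimg output.
my $blob    = pack "H*", "621d4f87d3";
my $decoded = strip_zbar_line(pack("H*", "51522d436f64653a621d4fc287c3") . "\n");
printf "first mismatch at offset %d\n", first_mismatch($blob, $decoded);
```

For the first run this points at offset 3, i.e., the first octet with its most significant bit set.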

--
FSF associate member #7257

Ben Morrow

Mar 13, 2013, 12:27:42 PM

Quoth Ivan Shmakov <onei...@gmail.com>:

>
> I wonder if QR codes are suitable for encoding arbitrary octet
> sequences (AKA 8-bit data)? I've tried the following Perl code,
> but it appears that the resulting transformations aren't "8-bit
> clean." Somehow, I suspect an Imager::QRCode fault, although
> zbarimg(1) may be responsible. (Unfortunately, the Perl module
> itself doesn't provide a decoder.)

There is a Perl decoder based on zbar (Barcode::ZBar), though presumably
it would behave the same as zbarimg.

[...]

>
> (The leading 51522d436f64653a and the trailing 0a after
> "Decoded:" are "QR-Code:" and a newline, respectively. In the
> first example, the first three octets in the output, 621d4f,
> appear to match the input. Incidentally, the fourth octet has
> its most significant bit set.)
>
> $ perl \
> 89br96tnpoogun68sfh1jkj1sb.perl # "use bytes;" commented out
> Blob:
> 621d4f87d3ae92b60932c96b7f81f3a916faff9b03ae54f97d8163987dc8733df1bd
> 8f8b92fb5317657ee2a0a97eed1f12423cdbfa1a73b3166a39cb4b1c0f43
> Image: 123 by 123
> Decoded:
> 51522d436f64653a621d4fc287c393c2aec292c2b60932c3896b7fc281c3b3c2a916
> c3bac3bfc29b03c2ae54c3b97dc28163c2987dc388733dc3b1c2bdc28fc28bc292c3
> bb5317657ec3a2c2a0c2a97ec3ad1f12423cc39bc3ba1a73c2b3166a39c38b4b1c0f
> 430a
> scanned 1 barcode symbols from 1 images in 0.02 seconds

~% perl -MEncode -E'say unpack "H*", encode "utf8", pack "H*", "621d4f87d3ae92b60932c96b7"'
621d4fc287c393c2aec292c2b60932c3896b70

So you have a UTF-8 problem somewhere. (c2 and c3 (or Â and Ã) showing
up unexpectedly is the giveaway here.) Looking at the code, I think it's
zbar which is converting 8859-1 to UTF-8; one way to test this is to
create a QR code containing 17 0xffs at ECC level L; this is the maximum
number of characters that will fit into a 21x21 QR code, so if the code
comes out bigger than that you know there are extra bytes in there
somewhere.
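The 17-octet figure can be checked with a little arithmetic. Assuming the capacity tables of the QR code specification (ISO/IEC 18004), a version 1 symbol at ECC level L has 26 codewords, 7 of them for error correction, leaving 19 data codewords (152 bits); a byte-mode segment costs a 4-bit mode indicator, an 8-bit character count (versions 1 to 9), and 8 bits per octet. The sketch below models version 1-L only; the function names are hypothetical.

```perl
use strict;
use warnings;

# Version 1 at ECC level L: 26 codewords total, 7 for error correction,
# leaving 19 data codewords. This constant applies to 1-L only.
use constant DATA_BITS_V1_L => 19 * 8;

# Byte-mode segment: 4-bit mode indicator, 8-bit character count
# (versions 1 to 9), then 8 bits per octet.
sub byte_mode_bits {
    my ($n_octets) = @_;
    return 4 + 8 + 8 * $n_octets;
}

sub fits_version_1_l {
    my ($n_octets) = @_;
    return byte_mode_bits($n_octets) <= DATA_BITS_V1_L;
}

for my $n (17, 18, 34) {
    printf "%2d octets: %3d bits, %s\n", $n, byte_mode_bits($n),
        fits_version_1_l($n) ? "fits 21x21" : "needs a larger symbol";
}
```

Here 34 stands for the same 17 octets after ISO-8859-1 to UTF-8 recoding (each 0xff becoming c3 bf); if the encoder itself did that recoding, the symbol would have to be larger than 21x21.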

However, it's not unlikely that other QR code readers will do similar
conversions to UTF-8, or other stupid things. Depending on what you're
doing it might be safer to explicitly UTF-8-encode your data (all 8-bit
data can be represented in UTF-8) and then decode it on the other end.
Of course, this will make the codes a little larger than they need to
be.
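A minimal sketch of that suggestion in Perl, using the core Encode module: treat each octet as a code point 0 to 255, UTF-8 encode the result before handing it to the QR encoder, and reverse both steps after decoding. The function names are hypothetical. Octets 0x80 to 0xff double in size, which is the size cost mentioned above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Octets -> UTF-8 payload: each octet is taken as a code point 0..255.
sub octets_to_payload {
    my ($octets) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $octets));
}

# UTF-8 payload -> octets; dies on malformed UTF-8 or code points above 0xff.
sub payload_to_octets {
    my ($payload) = @_;
    my $chars = decode('UTF-8', $payload, Encode::FB_CROAK());
    return encode('ISO-8859-1', $chars, Encode::FB_CROAK());
}

my $blob    = join '', map { chr int rand 256 } 1 .. 64;
my $payload = octets_to_payload($blob);
printf "%d octets -> %d-octet payload\n", length $blob, length $payload;
printf "%s\n", payload_to_octets($payload) eq $blob ? "round trip ok" : "mismatch";
```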

[...]

> $ cat < 89br96tnpoogun68sfh1jkj1sb.perl
> use bytes;

You should not use 'bytes'. It doesn't ever do anything useful and
sometimes lets you look at parts of the perl internals you shouldn't be
looking at. In previous versions of perl the documentation was
unfortunately not very clear about that. The current version says

NOTICE

This pragma reflects early attempts to incorporate Unicode into perl
and has since been superseded. It breaks encapsulation (i.e. it
exposes the innards of how the perl executable currently happens to
store a string), and use of this module for anything other than
debugging purposes is strongly discouraged.

> use common::sense;
> use English;

You should not use English; it makes your code harder to read for anyone
who knows Perl, and teaches you bad habits.

>
> require Imager::QRCode;
> require IPC::Open2;
>
> sub rand_blob (;$) {

You should not use prototypes unless you need the special parsing
effects they cause.

> my ($len) = @_;
> $len
> //= 24;
> open (my $f, "<", "/dev/urandom")
> or die ($OS_ERROR);
> binmode ($f);
> my $s;
> die ($OS_ERROR)
> unless (read ($f, $s, $len) == $len);
> ## .
> $s;
> }

Unless you need cryptographic randomness (and since you're using
urandom, you don't), it would be better to use something like

	sub rand_blob {
	    my ($len) = @_;
	    $len //= 24;

	    return join "", map chr rand 256, 1..$len;
	}

Ben

Ivan Shmakov

Mar 13, 2013, 1:28:00 PM

>>>>> Ben Morrow <b...@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <onei...@gmail.com>:

[Dropping news:comp.lang.perl.modules and news:alt.barcodes from
Followup-To:.]

[...]

>> $ cat < 89br96tnpoogun68sfh1jkj1sb.perl

>> use bytes;

> You should not use 'bytes'. It doesn't ever do anything useful and
> sometimes lets you look at parts of the perl internals you shouldn't
> be looking at.

Indeed, I've read the documentation. It was my understanding
that, in a nutshell, the "bytes" pragma makes Perl operate
strictly on octet sequences for its strings, instead of allowing
either strings of octets /or/ strings of Unicode characters.

Frankly, I do not see any harm in using this pragma /provided/
that the code doesn't switch it on and off at will.

The question of which setting the loaded modules use remains
open, but for the specific example I've given (which uses no
text-processing modules) I'd expect the chances of running into
issues to be quite low.
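For what it's worth, the encapsulation problem the notice refers to can be shown without any modules: two strings that compare equal can report different lengths under "use bytes", depending on how perl happens to store them internally. The sketch uses only the core utf8::upgrade (), which changes the internal representation and nothing else.

```perl
use strict;
use warnings;

my $plain    = "\xff";          # one character, stored internally as one octet
my $upgraded = "\xff";
utf8::upgrade($upgraded);       # same character, stored internally as UTF-8

printf "%s\n", ($plain eq $upgraded ? "equal strings" : "different strings");
printf "length: %d vs %d\n", length $plain, length $upgraded;

{
    use bytes;
    # Under the pragma, length() reports the internal storage size instead.
    printf "bytes::length: %d vs %d\n", length $plain, length $upgraded;
}
```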

[...]

>> use English;

> You should not use English, it makes your code harder to read for
> anyone who knows Perl, and teaches you bad habits.

? Perhaps I have a bit too much of a Lisp background, but I've
always considered something_that_one_can_read to be a far better
identifier for a global than, say, ~.

Besides, there's a chance that the code I write will be read by
someone who doesn't know Perl well.

[...]

>> sub rand_blob (;$) {

> You should not use prototypes unless you need the special parsing
> effects they cause.

Is there a practical reason to forgo the compile-time argument
checking they offer? For me, code that fails to compile is
better than code that suddenly dies after running for hours.
(Which is still better than code that dies in the wrong
place, or doesn't die but silently gives a wrong result.)
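Note, though, that a Perl prototype is not a type check in the usual sense: "(;$)" asks the parser to evaluate an optional argument in scalar context, and calls made with "&" (or as methods) ignore the prototype altogether. A minimal illustration with a hypothetical takes_len ():

```perl
use strict;
use warnings;

# Hypothetical function with the same prototype as rand_blob.
sub takes_len (;$) {
    my ($len) = @_;
    return $len // 24;
}

my @lengths = (64, 128, 256);

# (;$) puts the argument in scalar context: this passes 3 (the count), not 64.
printf "%d\n", takes_len(@lengths);

# An &-call bypasses the prototype: @lengths is flattened and $len gets 64.
printf "%d\n", &takes_len(@lengths);
```

To be fair, the prototype does make takes_len (1, 2) a compile-time error ("Too many arguments"), so it catches some mistakes; it just doesn't check what kind of value is passed.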

[...]

> Unless you need cryptographic randomness (and since you're using
> urandom, you don't), it would be better to use something like

> sub rand_blob {
> my ($len) = @_;
> $len //= 24;
> return join "", map chr rand 256, 1..$len;
> }

ACK, thanks. (Although my guess is that even if urandom(4) is
worse than random(4), Perl's rand is worse still,
randomness-wise.)

Ivan Shmakov

Mar 14, 2013, 4:25:52 PM

>>>>> Ben Morrow <b...@morrow.me.uk> writes:

[...]

> There is a Perl decoder based on zbar (Barcode::ZBar), though
> presumably it would behave the same as zbarimg.

... Or it may not. It's definitely worth checking out.

[...]

> So you have a UTF-8 problem somewhere. (c2 and c3 (or Â and Ã)
> showing up unexpectedly is the giveaway here.) Looking at the code,
> I think it's zbar which is converting 8859-1 to UTF-8; one way to
> test this is to create a QR code containing 17 0xffs at ECC level L;
> this is the maximum number of characters that will fit into a 21x21
> QR code, so if the code comes out bigger than that you know there are
> extra bytes in there somewhere.

ACK, thanks! With qw (level L margin 0 size 2) being added to
the parameters, the code now gives (also using $ zbarimg --raw):

Blob: ffffffffffffffffffffffffffffffffff
Image: 42 by 42
Decoded: c3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bf0a
scanned 1 barcode symbols from 1 images in 0.05 seconds

Thus, unless there's some magic in the resulting QR code saying
that it's an ISO-8859-1-encoded string (I'm not familiar with QR
encoding, so I can't tell whether that's a sensible guess), zbarimg(1)
is indeed to blame, and perhaps the underlying library, too.
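Both halves of that result can be checked by hand. Assuming "size" is the number of pixels per module and with margin 0, the 42x42 image is a 21-module symbol; since a version-v symbol is 17 + 4v modules wide, that is version 1, so the encoder stored 17 octets. The decoded data, meanwhile, is exactly the 17 input octets recoded from ISO-8859-1 to UTF-8. A small sketch of both checks (the helper name is hypothetical):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumes margin 0 (as in the run above) and that "size" is pixels per
# module; a version-v symbol is 17 + 4v modules wide.
sub qr_version_from_width {
    my ($pixels, $pixels_per_module) = @_;
    my $modules = $pixels / $pixels_per_module;
    return ($modules - 17) / 4;
}

my $blob    = "\xff" x 17;
my $decoded = pack "H*", "c3bf" x 17;    # the --raw output, minus the newline

printf "version: %d\n", qr_version_from_width(42, 2);
my $recoded = $decoded eq encode('UTF-8', decode('ISO-8859-1', $blob));
printf "%s\n", $recoded
    ? "decoded data is the input, recoded from ISO-8859-1 to UTF-8"
    : "decoded data is something else";
```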

> However, it's not unlikely that other QR code readers will do similar
> conversions to UTF-8, or other stupid things. Depending on what
> you're doing it might be safer to explicitly UTF-8-encode your data
> (all 8-bit data can be represented in UTF-8) and then decode it on
> the other end. Of course, this will make the codes a little larger
> than they need to be.

In this case, there'd indeed be some benefit from using the
smallest possible image. OTOH, I do not expect the
interoperability problem to arise anytime soon.

[...]

Ivan Shmakov

Mar 17, 2013, 1:57:58 PM

>>>>> Ben Morrow <b...@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <onei...@gmail.com>:

>> I wonder if QR codes are suitable for encoding arbitrary octet
>> sequences (AKA 8-bit data)? I've tried the following Perl code, but
>> it appears that the resulting transformations aren't "8-bit clean."
>> Somehow, I suspect an Imager::QRCode fault, although zbarimg(1) may be
>> responsible. (Unfortunately, the Perl module itself doesn't provide
>> a decoder.)

> There is a Perl decoder based on zbar (Barcode::ZBar), though
> presumably it would behave the same as zbarimg.

... Indeed it does, which made me file Debian Bug#703234 [1].

Now, however, given that the Wikipedia article mentions
ISO-8859-1 as the default (?) encoding for 8-bit QR codes, the
problems with zbarimg(1) and with Barcode::ZBar may be better
considered separately.

Taking into account that different symbologies may (and do) use
different character-to-code mappings, it may be sensible for
libzbar to recode the barcode read into a UTF-8 string. Better
still, Perl supports UTF-8 as its native character string
representation. What's wrong, however, is that the UTF-8 string
returned by libzbar to Perl is not properly marked as such,
resulting in the observed (and incorrect) behavior.

(The obvious workaround is to Encode::decode_utf8 () the
symbol's data returned by ->get_data ().)
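A minimal sketch of that workaround, assuming (as described above) that ->get_data () returns UTF-8 octets without perl's UTF-8 flag, and that the symbol originally held ISO-8859-1 data. To avoid depending on the exact Barcode::ZBar calls, the hypothetical helper takes the string returned by ->get_data (); the sample input is the start of the first run's decoded output.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode);

# $data stands for the string returned by a symbol's ->get_data () in
# Barcode::ZBar: UTF-8 octets without perl's UTF-8 flag set.
sub zbar_data_to_octets {
    my ($data) = @_;
    my $chars = decode_utf8($data);    # octets -> characters
    # If libzbar recoded ISO-8859-1 data, every character is <= 0xff, and
    # encoding back to ISO-8859-1 restores the original octets; anything
    # else (e.g. U+FFFD from malformed input) makes this die.
    return encode('ISO-8859-1', $chars, Encode::FB_CROAK());
}

# Simulated input: the first octets of the first run, as zbar returned them.
my $data = pack "H*", "621d4fc287c393c2aec292c2b6";
print unpack("H*", zbar_data_to_octets($data)), "\n";
```

This relies on libzbar having recoded the data; per the follow-up below, octets that already happen to form valid UTF-8 may be passed through unchanged, in which case this helper gives the wrong answer or dies.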

OTOH, zbarimg(1) should probably respect the current locale's
encoding, instead of using UTF-8 unconditionally.

[1] http://bugs.debian.org/703234

[...]

Ben Morrow

Mar 18, 2013, 7:42:38 PM

Quoth Ivan Shmakov <onei...@gmail.com>:

> >>>>> Ben Morrow <b...@morrow.me.uk> writes:
> >>>>> Quoth Ivan Shmakov <onei...@gmail.com>:
>
> >> I wonder if QR codes are suitable for encoding arbitrary octet
> >> sequences (AKA 8-bit data)? I've tried the following Perl code, but
> >> it appears that the resulting transformations aren't "8-bit clean."
> >> Somehow, I suspect an Imager::QRCode fault, although zbarimg(1) may be
> >> responsible. (Unfortunately, the Perl module itself doesn't provide
> >> a decoder.)
>
> > There is a Perl decoder based on zbar (Barcode::ZBar), though
> > presumably it would behave the same as zbarimg.
>
> ... Indeed it does, which made me file Debian Bug#703234 [1].

<pet peeve> The correct place to file a bug in a Perl module is in its
CPAN bug tracker, or, in this case, in the zbar SourceForge tracker.
Filing a bug with some random distro is Not Helpful, since such reports
frequently don't find their way upstream.

> Now, however, given that the Wikipedia article mentions
> ISO-8859-1 as the default (?) encoding for 8-bit QR codes, the
> issues zbarimg(1) and Barcode::ZBar have may be considered
> separately.

The zbar source implies that some QR codes contain something called an
ECI which explicitly indicates the charset in use. It's not clear to me
without reading the spec (which apparently isn't freely available, grr)
how 'binary' QR codes with no ECI are supposed to be interpreted.

> Taking into account that different symbologies may (and do) use
> different character to code mappings, it may be sensible for
> libzbar to recode the barcode read into an UTF-8 string.

Well, that's only sensible if the bytes are always supposed to represent
characters. libzbar also does more than just recode 8859-1 -> UTF-8: if
I'm reading it right, it tries to guess the encoding, and if the encoded
data is already valid UTF-8 it will leave it alone.
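As a rough model of the behaviour described above (a simplification, not libzbar's actual code): keep the data if it is already valid UTF-8, otherwise read it as ISO-8859-1 and recode it to UTF-8. One consequence is that different inputs can produce the same output, which by itself means the path is not 8-bit clean. The function name is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simplified model, not libzbar's code: pass valid UTF-8 through,
# otherwise treat the octets as ISO-8859-1 and recode them to UTF-8.
sub guess_to_utf8 {
    my ($octets) = @_;
    my $ok = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $octets if $ok;
    return encode('UTF-8', decode('ISO-8859-1', $octets));
}

# Two different inputs, one result.
my $from_latin1 = guess_to_utf8("\xff");
my $from_utf8   = guess_to_utf8("\xc3\xbf");
printf "%s %s\n", unpack("H*", $from_latin1), unpack("H*", $from_utf8);
```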

> Better
> still is that Perl supports UTF-8 as its native character string
> representation. What's wrong, however, is that the UTF-8 string
> returned by libzbar to Perl is not properly marked as such, thus
> resulting in the observed (and incorrect) behavior.

This is a bug, yes.

> (The obvious workaround is to Encode::decode_utf8 () the
> symbol's data returned by ->get_data ().)
>
> OTOH, zbarimg(1) should probably respect the current locale's
> encoding, instead of using UTF-8 unconditionally.

I don't know about that: what if the data can't be represented in that
charset?
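To make the objection concrete: if zbarimg(1) recoded its output to, say, an ISO-8859-1 locale, any symbol carrying characters outside that repertoire would have no faithful representation, and the converter has to either substitute or fail. A sketch with the core Encode module (the sample string is arbitrary):

```perl
use strict;
use warnings;
use Encode qw(encode);

# e-acute exists in ISO-8859-1; U+263A (a smiley) does not.
my $text = "caf\x{e9} \x{263a}";

# Default: unrepresentable characters are replaced by a substitution character.
my $lossy = encode('ISO-8859-1', $text);
print "lossy: ", unpack("H*", $lossy), "\n";

# With FB_CROAK the conversion fails instead (LEAVE_SRC keeps $text intact).
my $strict = eval {
    encode('ISO-8859-1', $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
print defined $strict ? "encoded\n" : "failed: $@";
```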

Ben

Ivan Shmakov

Mar 30, 2013, 7:02:41 AM

>>>>> Ben Morrow <b...@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <onei...@gmail.com>:

>>>>> Ben Morrow <b...@morrow.me.uk> writes:

(Thanks for the comments regarding ZBar, BTW. I have yet to check
its sources myself, but I've also discovered that it behaves
strangely not only for octets with the most significant bit
set, but also for the "plain old" \x0D (\r).)

[...]

>>> There is a Perl decoder based on zbar (Barcode::ZBar), though
>>> presumably it would behave the same as zbarimg.

>> ... Indeed it does, which made me file Debian Bug#703234 [1].

> <pet peeve> The correct place to file a bug in a Perl module is in
> its CPAN bug tracker, or, in this case, in the zbar SourceForge
> tracker.

BTW, there's a longstanding bug filed at the CPAN RT [2] (along
with a patch.) However, it appears to be filed against
libwww-perl, while it actually belongs to Net-HTTP.

The question is: how do I reassign it?

[2] https://rt.cpan.org/Public/Bug/Display.html?id=29468

> Filing a bug with some random distro is Not Helpful, since such
> reports frequently don't find their way upstream.

Yes. As long as an ideal world is considered, that is.

There are a few things to note, however. The general problems
with upstream may include:

* there's effectively no upstream;

* the code in the distribution may be extensively modified, or
improperly built, or be alleged to be; the upstream then may
discourage the users of "non-authorized" builds to report bugs
directly to them; consider, e. g.:

--cut: http://foo2zjs.rkkda.com/ --
*** DON'T USE the foo2zjs package from:

Ubuntu, SUSE, Mandrake/Manrivia, Debian, RedHat, Fedora, Gentoo,
Xandros, EEE PC, Linpus, MacOSX, or BSD!

*** Download it here and follow the directions below.
--cut: http://foo2zjs.rkkda.com/ --

(or the Joerg Schilling, albeit sufficiently different, case);

* the issue may indeed be specific to the distribution's build
(naturally, building from the upstream sources for every bug
I report, just to check that it wasn't introduced by the
packagers, is hardly an option.)

Personally, I tend to prefer either the Debian BTS or the
CPAN RT, for these make it possible to file bugs via email,
/and/ are more compatible with Lynx (which happens to be my
primary browser) than most of the other BTS currently in use.
(I'm particularly fond of RT, although the version installed at
CPAN has some surprising issues when it comes to
compatibility with non-ECMAScript-enabled browsers.)

Alas, even for Perl modules, the CPAN RT is not always the
preferred bug tracker. Consider, e. g.:

--cut: https://rt.cpan.org/Public/Bug/Display.html?id=79999 --
Please report issues via github at
https://github.com/gbarr/perl-Convert-ASN1/issues
--cut: https://rt.cpan.org/Public/Bug/Display.html?id=79999 --

Lastly, given the developer- and user-base of Debian (especially
if the derivatives are included), I'd not call it "random."
That being said, I tend to agree that when the D-M in charge
fails to forward the request upstream, the reporter
generally should try to do it him- or herself.

(OTOH, even if the D-M forwards the request, it may not have the
desired effect. Consider, e. g., Debian Bug#691221 [3].)

[3] http://bugs.debian.org/691221

[...]
