a UTF-8 complaint

9 views
Skip to first unread message

Rainer Weikusat

unread,
Dec 17, 2021, 2:16:13 PM12/17/21
to
perl absolutely urgently needs a "stop fucking with my binary data,
unicode morons, I need it like it is" pragma.

Background: UNIX filenames are abitrary sequences of bytes whose values
are neither 47 nor 0. And - coincidentally - UNIX filenames sometimes
need to be handled in applications. Imagine that.

E. Choroba

unread,
Dec 19, 2021, 5:42:15 PM12/19/21
to
Can you provide an example? Perl doesn't change from bytes to unicode without being told to do so.

Rainer Weikusat

unread,
Dec 20, 2021, 10:18:01 AM12/20/21
to
Bug in my code in this case. That's why I cancelled the
posting. Nevertheless, a way to mark something as "leave alone under all
circumstances" would be helpful. I've had cases of silent conversion of
binary data to UTF-8 because a string 'touched' another string with the
utf8 flag set but without bytes outside of 0 - 127 in the past.

Rainer Weikusat

unread,
Dec 20, 2021, 10:33:29 AM12/20/21
to
Contrived example showing that:

--------
my $bin = "\x80\x98";

my $un = "\N{U+16a2}";
$un =~ s/^./a/;

use Devel::Peek;

Dump($bin);

$bin .= $un; # should be possible to get a warning or error here
Dump($bin);
--------

In the actual code where this bit me, one of the strings had the utf8
flag set because it came (via DBD::Pg) from a postrgres table.
Reply all
Reply to author
Forward
0 new messages