Mojo::Exception and UTF-8 - again...


Peter Valdemar Mørch

Nov 7, 2016, 6:35:34 AM
to Mojol...@googlegroups.com
Hi,

Today we encountered an exception in a source file that wasn't in UTF-8. This first caused a warning and then an error in the morbo output. Apache2 showed a "502 Proxy Error: Error reading from remote server" in our apache2+morbo setup.

The file on disk is "binary" because it uses a Perl source filter; once it reaches the parser, it *is* UTF-8. The same situation could have occurred in a 3rd-party library encoded in anything other than UTF-8 or ASCII. Something similar was discussed in an old thread, "Molo::Exception generate warnings while read non-utf8 file", but no solution was presented there.

This occurs when an exception is encountered in a source file, and "Mojo::Exception->throw($@)" is called because of Mojolicious.pm's

local $SIG{__DIE__}
  = sub { ref $_[0] ? CORE::die $_[0] : Mojo::Exception->throw(shift) };

First, Mojo::Exception's "sub inspect" gave warnings because it guesses the source file from parsing $@ and opens the source file with:

next unless -r $file->[0] && open my $handle, '<:utf8', $file->[0];
$self->_context($file->[1], [[<$handle>]]);

Next we get a "Malformed UTF-8 character" fatal error from Mojolicious (when generating output?) because of a s/// in Mojo::Util's xml_escape:

sub xml_escape {
  return $_[0] if ref $_[0] && ref $_[0] eq 'Mojo::ByteStream';
  my $str = shift // '';
  $str =~ s/([&<>"'])/$XML{$1}/ge;
  return $str;
}

It seems that reading the non-UTF-8 data only causes a warning, but the later s/// on that data causes a fatal exception. The behavior I see is replicated by this snippet:

#!/usr/bin/perl -w
use strict;
open my $handle, '<:utf8', "binary.bin";
my $str = join('', <$handle>);
$str =~ s/([&<>"'])/foo/;
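For anyone who wants to try this without a binary.bin at hand, here is a self-contained sketch of the same steps. The file content (a single stray 0xE9 byte) is only an assumption for illustration and is not the original file. Whether the substitution actually dies may depend on the perl version and on the data, so the script traps it and reports either outcome:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a file containing a byte sequence that is not valid UTF-8.
# The content is an illustrative assumption, not the original binary.bin.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/binary.bin";
open my $out, '>:raw', $path or die "Cannot write $path: $!";
print {$out} "# caf\xE9\nprint '<ok>';\n";
close $out or die "Cannot close $path: $!";

# Collect warnings instead of printing them to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Same steps as the snippet above: read through the :utf8 layer, then s///
open my $handle, '<:utf8', $path or die "Cannot read $path: $!";
my $str = join '', <$handle>;
close $handle;

# Trap a possible fatal error so that both outcomes can be reported
my $ok = eval { $str =~ s/([&<>"'])/foo/; 1 };

printf "warnings collected: %d\n", scalar @warnings;
print $ok ? "substitution succeeded\n" : "substitution died: $@";
```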

I'd like to suggest that reading a source file with UTF-8 problems be treated as if the file was unreadable. To that end, I suggest this patch (against current master HEAD 49dd3e7):

londo@peter:~/work/mojo> git diff
diff --git a/lib/Mojo/Exception.pm b/lib/Mojo/Exception.pm
index b218ee0..08759f5 100644
--- a/lib/Mojo/Exception.pm
+++ b/lib/Mojo/Exception.pm
@@ -20,7 +20,14 @@ sub inspect {
   # Search for context in files
   for my $file (@files) {
     next unless -r $file->[0] && open my $handle, '<:utf8', $file->[0];
-    $self->_context($file->[1], [[<$handle>]]);
+    # If there are UTF-8 problems in the source file, don't store any context
+    eval {
+      use warnings 'FATAL' => 'utf8';
+      $self->_context($file->[1], [[<$handle>]]);
+    };
+    if ($@) {
+      next;
+    }
     return $self;
   }
 

For ASCII or UTF-8 encoded source files there is no change. Files with UTF-8 problems simply get no context stored via _context(), but they also no longer cause the entire application to crash. The best of both worlds.

Would that be acceptable? If so, should I just create a GitHub PR? If not, can somebody suggest an alternative, given that some source files are in fact not UTF-8?

Sincerely,

Peter

Peter Valdemar Mørch

Nov 7, 2016, 7:02:14 AM
to Mojol...@googlegroups.com
I tried to create a short patch. Perhaps it became too short and this is better:

londo@peter:~/work/mojo> git diff
diff --git a/lib/Mojo/Exception.pm b/lib/Mojo/Exception.pm
index b218ee0..64a6899 100644
--- a/lib/Mojo/Exception.pm
+++ b/lib/Mojo/Exception.pm
@@ -20,7 +20,15 @@ sub inspect {
   # Search for context in files
   for my $file (@files) {
     next unless -r $file->[0] && open my $handle, '<:utf8', $file->[0];
-    $self->_context($file->[1], [[<$handle>]]);
+    # If there are UTF-8 problems in the source file, don't store any context
+    my @lines = eval {
+      use warnings 'FATAL' => 'utf8';
+      <$handle>;
+    };
+    if ($@) {
+      next;
+    }
+    $self->_context($file->[1], [\@lines]);
     return $self;
   }


Now we're not ignoring any potentially unrelated errors from $self->_context(). I'll shut up for now. :-)
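The same idea can be tried outside Mojo::Exception with a small standalone sketch. The helper name read_utf8_lines and the temp-file contents are made up for illustration and are not part of Mojolicious. It assumes, as the patch does, that making the utf8 warning category fatal turns the read-time warning into an exception that eval can trap:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Mojolicious): returns an array reference
# of lines, or undef if the file cannot be opened or is not valid UTF-8.
sub read_utf8_lines {
  my $path = shift;
  open my $handle, '<:utf8', $path or return undef;
  my @lines = eval {
    # Promote utf8 warnings to exceptions inside this block only
    use warnings FATAL => 'utf8';
    <$handle>;
  };
  return $@ ? undef : \@lines;
}

# Write raw bytes to a temporary file and return its path
sub write_bytes {
  my $bytes = shift;
  my ($fh, $path) = tempfile(UNLINK => 1);
  binmode $fh;
  print {$fh} $bytes;
  close $fh or die "Cannot close $path: $!";
  return $path;
}

my $good = write_bytes("caf\xC3\xA9\nsecond line\n");    # valid UTF-8
my $bad  = write_bytes("caf\xE9\nsecond line\n");        # stray Latin-1 byte

for my $path ($good, $bad) {
  my $lines = read_utf8_lines($path);
  print defined $lines
    ? "read " . scalar(@$lines) . " lines\n"
    : "skipped: not valid UTF-8\n";
}
```

Only the readline sits inside the eval, so anything that goes wrong later while processing the lines still propagates normally.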

Peter