sub xml_escape {
return $_[0] if ref $_[0] && ref $_[0] eq 'Mojo::ByteStream';
my $str = shift // '';
$str =~ s/([&<>"'])/$XML{$1}/ge;
return $str;
}
It seems that reading the non-UTF-8 data only causes a warning, but the later s/// throws a fatal exception. The behavior I see is reproduced by this snippet:
#!/usr/bin/perl -w
use strict;
open my $handle, '<:utf8', "binary.bin";
my $str = join('', <$handle>);
$str =~ s/([&<>"'])/foo/;
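To try the snippet without a binary file at hand, a test file can be generated first. This is only a minimal sketch: the helper name write_invalid_utf8_file and the byte values are my own choices for illustration, not anything from Mojolicious.

```perl
use strict;
use warnings;

# Hypothetical helper: write a file whose second line starts with the bytes
# 0xFF 0xFE, which can never occur in well-formed UTF-8.
sub write_invalid_utf8_file {
  my $path = shift;

  # Write raw bytes so that Perl does not encode anything on output
  open my $out, '>:raw', $path or die "Cannot write $path: $!";
  print {$out} "valid line\n", "\xff\xfe<&>\n";
  close $out or die "Cannot close $path: $!";

  return $path;
}

write_invalid_utf8_file('binary.bin');
```

Running the snippet above against this file should exercise the same code path, although the exact messages may differ between Perl versions.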
I'd like to suggest that a source file with UTF-8 problems be treated as if it were unreadable. To that end, I propose this patch (against current master HEAD 49dd3e7):
londo@peter:~/work/mojo> git diff
diff --git a/lib/Mojo/Exception.pm b/lib/Mojo/Exception.pm
index b218ee0..08759f5 100644
--- a/lib/Mojo/Exception.pm
+++ b/lib/Mojo/Exception.pm
@@ -20,7 +20,14 @@ sub inspect {
# Search for context in files
for my $file (@files) {
next unless -r $file->[0] && open my $handle, '<:utf8', $file->[0];
- $self->_context($file->[1], [[<$handle>]]);
+ # If there are UTF-8 problems in the source file, don't store any context
+ eval {
+ use warnings 'FATAL' => 'utf8';
+ $self->_context($file->[1], [[<$handle>]]);
+ };
+ if ($@) {
+ next;
+ }
return $self;
}
For ASCII or UTF-8 encoded source files nothing changes. Files with UTF-8 problems are simply not stored via _context(), and they no longer crash the entire application: the best of both worlds.
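The same idea, treating a malformed file as if it had no usable content, can also be sketched outside of Mojo::Exception. This is an illustration under my own assumptions rather than the patch itself: the helper name read_utf8_lines is hypothetical, and it uses Encode's strict decoding with FB_CROAK instead of fatal utf8 warnings.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Mojolicious): return an array reference
# with the lines of a file decoded as strict UTF-8, or undef if the file
# cannot be opened or contains malformed UTF-8.
sub read_utf8_lines {
  my $path = shift;

  # Read raw bytes so that no lenient decoding happens in an I/O layer
  open my $handle, '<:raw', $path or return undef;
  my $bytes = do { local $/; <$handle> };
  close $handle;
  $bytes //= '';

  # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $bytes intact
  my $chars = eval {
    Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
  };
  return undef unless defined $chars;

  # Split after each newline so the line endings are kept, as <$handle> would
  return [split /(?<=\n)/, $chars];
}
```

A caller would then skip the context for any file where this returns undef, which matches the "next" in the patch above.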
Would that be acceptable? If so, should I just create a GitHub PR? If not, can somebody suggest an alternative, given that some source files are in fact not UTF-8?