----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding
http://redmine.ruby-lang.org/issues/5486
Author: Nikolai Weibull
Status: Open
Priority: High
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]
rb_stat() overrides the input string’s encoding and re-encodes it through rb_str_encode_ospath(), which applies one of several encodings. This may be convenient for certain kinds of user input, or for input from a source file in a different encoding, but it isn’t appropriate for other kinds of user input or for input from other functions, such as Dir.entries.
If Ruby wants us to be explicit about encodings, then Ruby shouldn’t change them behind our backs.
I suspect that this is an issue that may appear in various other functions as well.
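To make the complaint concrete, here is a rough Ruby approximation of the re-encoding step. `to_ospath` is a hypothetical helper, not part of Ruby's API. The real rb_str_encode_ospath() is C code whose rules differ by platform and version. This sketch assumes only that a path tagged with a non-binary encoding is transcoded to the filesystem encoding before the system call, regardless of what the caller intended.

```ruby
# Hypothetical approximation of what rb_str_encode_ospath() does to a path
# before rb_stat() hands it to the OS. NOT the real implementation.
def to_ospath(path)
  fs = Encoding.find("filesystem")
  # Assumption: binary strings are passed through untouched.
  return path if path.encoding == Encoding::ASCII_8BIT
  # Strings already in the filesystem encoding need no conversion.
  return path if path.encoding == fs
  # Everything else is transcoded, whatever the caller intended.
  path.encode(fs)
end

name = "t/a"
# Encoding the caller chose vs. the encoding that reaches the OS.
p [name.encoding, to_ospath(name).encoding]
```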
Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?
> Sorry, I can't understand your point.
> If you think there is a bug, would you show us the bug by code?
That’s hard to do, but name a file in an encoding other than
'filesystem' on an NTFS filesystem. (I accidentally created a file
whose name was encoded in UTF-16.) Then run
Dir.entries('dir').each { |e| printf "%p: %s\n", e, File.file?(e) },
where 'dir' is the directory containing this file. File.file?(e) will
return false for this file, even though it is a file. The problem, as
explained, lies in rb_stat(), which re-encodes its argument in the
'filesystem' encoding.
Actually, it’s probably easier than that. It can be done on a HFS+
filesystem (and probably any other, as well) just as easily
% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D
% ruby --version
ruby 2.0.0dev (2011-10-26 trunk 33526) [x86_64-darwin10.8.0]
% ruby a.rb
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, false
I guess the problem is that Ruby assumes that it can apply an encoding
to something that it gets from the filesystem when it would probably
be better to not do so. It should probably be BINARY or ASCII-8BIT
instead of UTF-8.
(It turns out that this example gave the same results in 1.8.7 (minus
the e.encoding), so perhaps I’m doing something else wrong.)
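The BINARY/ASCII-8BIT suggestion can be illustrated in plain Ruby. Re-tagging a name with `force_encoding` keeps its bytes exactly as they were, whereas `encode` rewrites the bytes. This is only a sketch of the trade-off being proposed, not a description of what Ruby actually does with `Dir.entries` results.

```ruby
# A UTF-8 name as it might come back from a directory listing.
name = "\u00E5" # "å"

# Re-tagging: same bytes, different label. Nothing is converted.
raw = name.dup.force_encoding(Encoding::ASCII_8BIT)

# Transcoding: the bytes themselves change (UTF-8 -> UTF-16LE).
wide = name.encode(Encoding::UTF_16LE)

p name.bytes # => [195, 165]
p raw.bytes  # => [195, 165]
p wide.bytes # => [229, 0]
```

With re-tagging, the bytes a filesystem returned are exactly the bytes that would be handed back to it; only the label attached to them changes.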
Trying to do
p File.file?('t/å'.encode('UTF-16LE'))
results in
in `file?': path name must be ASCII-compatible (UTF-16LE): "t/\u00E5" (Encoding::CompatibilityError)
I give up.
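The error comes from Ruby refusing path strings whose encoding is not ASCII-compatible, which `Encoding#ascii_compatible?` reports. A hedged workaround is to transcode such a path before passing it to `File.file?`. It assumes UTF-8 is an acceptable intermediate encoding. `file_with_any_encoding?` is a hypothetical helper name.

```ruby
# UTF-16LE cannot be used directly as a path: it is not ASCII-compatible.
p Encoding::UTF_16LE.ascii_compatible? # => false
p Encoding::UTF_8.ascii_compatible?    # => true

# Hypothetical helper: transcode non-ASCII-compatible paths to UTF-8
# first, so File.file? receives a string it accepts.
def file_with_any_encoding?(path)
  path = path.encode(Encoding::UTF_8) unless path.encoding.ascii_compatible?
  File.file?(path)
end

p file_with_any_encoding?("t/\u00E5".encode("UTF-16LE"))
```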
(11/10/28 15:35), Nikolai Weibull wrote:
> Actually, it’s probably easier than that. It can be done on a HFS+
> filesystem (and probably any other, as well) just as easily
It's not true.
> % echo $LC_CTYPE
> UTF-8
> % mkdir t
> % touch t/å
> % cat > a.rb
> # -*- coding: utf-8 -*-
> Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
> File.file?(e) }
> ^D
`e' doesn't have the directory prefix "t/", so it can't be stat'ed.
$ ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 2.0.0dev (2011-10-25 trunk 33523) [universal.x86_64-darwin11.2.0]
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, true
--
Nobu Nakada
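Following that point, a name from `Dir.entries` is relative to the listed directory. It has to be joined with that directory, or the process must change into it as `-C t` does, before it can be stat'ed. Below is a minimal sketch. `entry_kinds` is a hypothetical name, and the temporary directory merely stands in for `t`.

```ruby
require "tmpdir"

# Returns { entry => File.file?(full path) } for every entry in dir.
def entry_kinds(dir)
  Dir.entries(dir).each_with_object({}) do |e, acc|
    # Joining restores the "t/" prefix that a bare entry lacks.
    acc[e] = File.file?(File.join(dir, e))
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.txt"), "")
  Dir.mkdir(File.join(dir, "sub"))
  p entry_kinds(dir)
end
```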
> (11/10/28 15:35), Nikolai Weibull wrote:
>> Actually, it’s probably easier than that. It can be done on a HFS+
>> filesystem (and probably any other, as well) just as easily
>
> It's not true.
>
>> % echo $LC_CTYPE
>> UTF-8
>> % mkdir t
>> % touch t/å
>> % cat > a.rb
>> # -*- coding: utf-8 -*-
>> Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
>> File.file?(e) }
>> ^D
>
> `e' doesn't have the directory prefix "t/", so it can't be stat'ed.
Ouch, of course. How stupid of me. That explains why it didn’t work
under 1.8.7 either.
The point still remains valid on Windows, however:
% mkdir t
% touch t/→
% ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]
".", #<Encoding:Windows-1252>, false
"..", #<Encoding:Windows-1252>, false
"?", #<Encoding:Windows-1252>, false
Hm, I guess here the result of Dir.foreach is broken.
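One plausible reading of the `"?"` entry is offered here as an assumption, not a confirmed diagnosis. If the name is converted to Windows-1252, which has no `→`, the character is replaced by `?`. A file literally named `?` does not exist, so `File.file?` returns false. Ruby's own `String#encode` shows the same replacement:

```ruby
# U+2192 RIGHTWARDS ARROW has no Windows-1252 equivalent.
arrow = "\u2192"

# With undef: :replace, unmappable characters become "?" in the target.
lossy = arrow.encode("Windows-1252", undef: :replace)
p lossy          # => "?"
p lossy.encoding # => #<Encoding:Windows-1252>

# Without it, the conversion fails loudly instead.
begin
  arrow.encode("Windows-1252")
rescue Encoding::UndefinedConversionError => e
  puts "error: #{e.class}"
end
```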
Here’s another case:
% ruby -v -rfind -e 'Find.find("t").each{ |e| printf "%p, %s, %p, %p\n", e, e.dump, e.encoding, File.file?(e)}'
"t", "t", #<Encoding:UTF-8>, false
"t/?", "t/?", #<Encoding:ASCII-8BIT>, false
Equally broken, I guess.
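Looking at the raw bytes helps separate a genuinely lossy name from a mere display problem. `String#dump` escapes non-ASCII characters, so a dumped `"t/?"` means the string really contains a literal `?` (0x3F), not a mis-rendered character. Below is a small diagnostic sketch. `describe_path` is a hypothetical helper.

```ruby
# Summarize what Ruby actually holds for a path string.
def describe_path(path)
  {
    dump: path.dump,              # escapes non-ASCII characters
    encoding: path.encoding.name,
    valid: path.valid_encoding?,  # false if bytes don't fit the tag
    bytes: path.bytes.map { |b| format("%02X", b) }.join(" "),
    file: File.file?(path)
  }
end

p describe_path("t/?")
p describe_path("t/\u2192")
```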
Status changed from Open to Assigned
Assignee set to Nobuyoshi Nakada
----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding
https://bugs.ruby-lang.org/issues/5486
Author: Nikolai Weibull
Status: Assigned
Priority: High
Assignee: Nobuyoshi Nakada
Category: core
Target version:
ruby -v: -
Category changed from core to M17N
Status changed from Assigned to Feedback
Priority changed from High to Low
Does this issue still occur?
----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding
https://bugs.ruby-lang.org/issues/5486
Author: Nikolai Weibull
Status: Feedback
Priority: Low
Assignee: Nobuyoshi Nakada
Category: M17N
Target version:
ruby -v: -
Yes, it still occurs against trunk:
In message "[ruby-core:43260] Re: [ruby-core:43236] [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
on Mar.13,2012 18:03:04, <n...@bitwi.se> wrote:
> Yes, it still occurs against trunk:
>
> ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]
It's not trunk...
It seems too old.
Regards,
--
U.Nakamura <u...@garbagecollect.jp>
How can you say that? I just tested it and got the same results. I
showed you my version string above; that’s trunk.
Hi Nikolai,
try compiling an updated version of trunk from this repository:
svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3
Your version indicates it's from last year. Here's a version string from
a recent compilation on my system:
% ./ruby --version
ruby 1.9.3p163 (2012-03-14 revision 35012) [x86_64-darwin11.3.0]
Does that help at all?
That is the ruby_1_9_3 branch, not trunk.
For trunk, use:
svn co http://svn.ruby-lang.org/repos/ruby/trunk
--
NARUSE, Yui <nar...@airemix.jp>
Argh, sorry. I ran the test with the incorrect PATH, after all. Yes,
this issue has been resolved. You can close it.
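The stale result came from running a different interpreter than intended. Having the script report which Ruby is actually executing may help avoid that. `RUBY_DESCRIPTION` and `RbConfig.ruby` are standard Ruby. The `which_ruby` wrapper is only an illustrative name.

```ruby
require "rbconfig"

# Report the interpreter that is actually running this script,
# independent of what the shell's PATH was expected to find.
def which_ruby
  {
    description: RUBY_DESCRIPTION, # the same string `ruby -v` prints
    executable: RbConfig.ruby      # absolute path of this ruby binary
  }
end

info = which_ruby
puts info[:description]
puts info[:executable]
```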
Status changed from Feedback to Closed
----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding
https://bugs.ruby-lang.org/issues/5486#change-24600
Author: Nikolai Weibull
Status: Closed
Priority: Low
Assignee: Nobuyoshi Nakada
Category: M17N
Target version:
ruby -v: -