[ruby-core:40412] [ruby-trunk - Bug #5486][Open] rb_stat() doesn’t respect input encoding

Nikolai Weibull

Oct 26, 2011, 9:00:14 AM
to ruby...@ruby-lang.org

Issue #5486 has been reported by Nikolai Weibull.

----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding
http://redmine.ruby-lang.org/issues/5486

Author: Nikolai Weibull
Status: Open
Priority: High
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]


rb_stat() overrides the input string’s encoding and applies one of various encodings through rb_str_encode_ospath(). This may be convenient for certain kinds of user input or input from a source file in a different encoding, but it isn’t good for other kinds of user input or input from other functions, such as Dir.entries.

If Ruby wants us to be explicit about encodings, then Ruby shouldn’t change them behind our backs.

I suspect that this is an issue that may appear in various other functions as well.
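
The distinction the report relies on, between relabeling a string and transcoding it, can be sketched in plain Ruby. This is only an illustration under stated assumptions: it does not call or reproduce rb_str_encode_ospath() (a C-level internal), and the helper name describe_path_bytes is hypothetical.

```ruby
# Sketch only: a String's encoding is a tag on its bytes.
# force_encoding swaps the tag and keeps the bytes; encode rewrites the bytes.
# describe_path_bytes is a hypothetical helper, not part of Ruby.
def describe_path_bytes(name)
  utf8       = name.encode("UTF-8")
  relabeled  = utf8.dup.force_encoding("Windows-1252") # same bytes, new tag
  transcoded = utf8.encode("Windows-1252")             # new bytes, new tag
  {
    utf8:       utf8.bytes,
    relabeled:  relabeled.bytes,
    transcoded: transcoded.bytes
  }
end

p describe_path_bytes("\u00E5")

# The encoding Ruby associates with file names on this system:
p Encoding.find("filesystem")
```

If a path-handling function transcodes a name whose tag does not match its actual bytes, the bytes handed to the operating system may no longer match the name on disk, which is the kind of mismatch the report describes.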


--
http://redmine.ruby-lang.org

Usaku NAKAMURA

Oct 28, 2011, 1:28:51 AM
to ruby...@ruby-lang.org

Issue #5486 has been updated by Usaku NAKAMURA.


Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?

Nikolai Weibull

Oct 28, 2011, 2:15:00 AM
to ruby...@ruby-lang.org
On Fri, Oct 28, 2011 at 07:28, Usaku NAKAMURA <red...@ruby-lang.org> wrote:

> Sorry, I can't understand your point.
> If you think there is a bug, would you show us the bug by code?

That’s hard to do, but name a file in an encoding other than
'filesystem' on an NTFS filesystem. What I did was accidentally
create a file whose name was encoded in UTF-16. Then, do

Dir['dir'].entries.each{ |e| printf "%p: %s\n", e, File.file?(e) }

where 'dir' is the directory containing this file. File.file?(e) will
return false for this file, even though it’s a file. The problem is,
as explained, in rb_stat(), as it re-encodes its argument in the
'filesystem' encoding.

Nikolai Weibull

Oct 28, 2011, 2:35:55 AM
to ruby...@ruby-lang.org

Actually, it’s probably easier than that. It can be done on a HFS+
filesystem (and probably any other, as well) just as easily

% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D
% ruby --version
ruby 2.0.0dev (2011-10-26 trunk 33526) [x86_64-darwin10.8.0]
% ruby a.rb
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, false

I guess the problem is that Ruby assumes that it can apply an encoding
to something that it gets from the filesystem when it would probably
be better to not do so. It should probably be BINARY or ASCII-8BIT
instead of UTF-8.

(It turns out that this example gave the same results in 1.8.7 (minus
the e.encoding), so perhaps I’m doing something else wrong.)
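
For the suggestion that names read from the filesystem be left as BINARY/ASCII-8BIT, here is a minimal sketch. It assumes a Ruby version in which Dir.entries accepts the encoding: keyword; the helper name entries_as_binary is hypothetical.

```ruby
require "tmpdir"

# Sketch only: ask Dir.entries to tag the names it returns as BINARY
# (ASCII-8BIT) instead of the filesystem encoding.
# entries_as_binary is a hypothetical helper name.
def entries_as_binary(dir)
  Dir.entries(dir, encoding: Encoding::BINARY)
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "\u00E5"), "")
  entries_as_binary(dir).sort.each do |name|
    printf "%p, %p\n", name, name.encoding
  end
end
```

The names come back untranscoded and tagged as binary, so any interpretation of the bytes is left to the caller.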

Trying to do

p File.file?('t/å'.encode('UTF-16LE'))

results in

in `file?': path name must be ASCII-compatible (UTF-16LE): "t/\u00E5"
(Encoding::CompatibilityError)

I give up.
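
The error above can be reproduced without touching the filesystem, since the path is rejected before any stat call is made. A small sketch, assuming current Ruby behaves the same way as the version in the thread; file_check is a hypothetical helper.

```ruby
# Sketch only: File.file? rejects path strings whose encoding is not
# ASCII-compatible, which is why the UTF-16LE attempt raises instead of
# returning true or false. file_check is a hypothetical helper name.
def file_check(path)
  File.file?(path)
rescue Encoding::CompatibilityError => e
  e.class
end

utf16 = "t/\u00E5".encode("UTF-16LE")
p utf16.encoding.ascii_compatible?
p file_check(utf16)
```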

Nobuyoshi Nakada

Oct 28, 2011, 3:20:59 AM
to ruby...@ruby-lang.org
Hi,

(11/10/28 15:35), Nikolai Weibull wrote:
> Actually, it’s probably easier than that. It can be done on a HFS+
> filesystem (and probably any other, as well) just as easily

It's not true.

> % echo $LC_CTYPE
> UTF-8
> % mkdir t
> % touch t/å
> % cat > a.rb
> # -*- coding: utf-8 -*-
> Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
> File.file?(e) }
> ^D

`e' doesn't have directory prefix, "t/". It can't stat.

$ ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 2.0.0dev (2011-10-25 trunk 33523) [universal.x86_64-darwin11.2.0]
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, true

--
Nobu Nakada
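
The missing-prefix point is independent of encoding and can be checked with an ASCII file name. A minimal sketch; the helper file_flags and the file name are hypothetical.

```ruby
require "tmpdir"

# Sketch only: Dir.entries returns bare names, so File.file?(name) resolves
# them against the current directory, not against dir.
# file_flags is a hypothetical helper name.
def file_flags(dir)
  Dir.entries(dir).sort.map do |name|
    [name, File.file?(name), File.file?(File.join(dir, name))]
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "bug5486-example.txt"), "")
  file_flags(dir).each { |row| p row }
end
```

Each row holds the name, the result for the bare name, and the result for the name joined with its directory. Running the script with -C t, as in the command above, has the same effect as joining because it makes t the current directory.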

Nikolai Weibull

Oct 28, 2011, 6:09:58 AM
to ruby...@ruby-lang.org
On Fri, Oct 28, 2011 at 09:20, Nobuyoshi Nakada <no...@ruby-lang.org> wrote:

> (11/10/28 15:35), Nikolai Weibull wrote:
>> Actually, it’s probably easier than that.  It can be done on a HFS+
>> filesystem (and probably any other, as well) just as easily
>
> It's not true.
>
>> % echo $LC_CTYPE
>> UTF-8
>> % mkdir t
>> % touch t/å
>> % cat > a.rb
>> # -*- coding: utf-8 -*-
>> Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
>> File.file?(e) }
>> ^D
>
> `e' doesn't have directory prefix, "t/".  It can't stat.

Ouch, of course. How stupid of me. That explains why it didn’t work
under 1.8.7 either.

The point still remains valid on Windows, however:

% mkdir t
% touch t/→
% ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

".", #<Encoding:Windows-1252>, false
"..", #<Encoding:Windows-1252>, false
"?", #<Encoding:Windows-1252>, false

Hm, I guess here the result of Dir.foreach is broken.

Here’s another case:

% ruby -v -rfind -e 'Find.find("t").each{ |e| printf "%p, %s, %p, %p\n", e, e.dump, e.encoding, File.file?(e)}'
"t", "t", #<Encoding:UTF-8>, false
"t/?", "t/?", #<Encoding:ASCII-8BIT>, false

Equally broken, I guess.
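
One plausible reading of the "?" in the output above, which the thread does not confirm, is that the name was converted to the Windows-1252 code page, which has no character for "→". The sketch below only shows that such a conversion is lossy; to_ansi is a hypothetical helper and does not reproduce what Ruby's Windows port actually does.

```ruby
# Sketch only (assumption: the name passed through a lossy conversion to the
# ANSI code page). Characters missing from the target encoding are replaced,
# and the replacement for a non-Unicode target is "?".
# to_ansi is a hypothetical helper name.
def to_ansi(name, codepage = "Windows-1252")
  name.encode(codepage, undef: :replace)
end

p to_ansi("\u2192").bytes # U+2192 RIGHTWARDS ARROW has no Windows-1252 mapping
p to_ansi("\u00E5").bytes # U+00E5 does have one
```

Once the name has become "?", the original bytes cannot be recovered, so a later stat on that string cannot find the file.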

Koichi Sasada

Mar 11, 2012, 3:27:13 AM
to ruby...@ruby-lang.org

Issue #5486 has been updated by Koichi Sasada.

Status changed from Open to Assigned
Assignee set to Nobuyoshi Nakada


----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding

https://bugs.ruby-lang.org/issues/5486

Author: Nikolai Weibull
Status: Assigned
Priority: High
Assignee: Nobuyoshi Nakada
Category: core
Target version:
ruby -v: -



Nobuyoshi Nakada

Mar 11, 2012, 5:41:30 PM
to ruby...@ruby-lang.org

Issue #5486 has been updated by Nobuyoshi Nakada.

Category changed from core to M17N
Status changed from Assigned to Feedback
Priority changed from High to Low

Does this issue still occur?


----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding

https://bugs.ruby-lang.org/issues/5486

Author: Nikolai Weibull
Status: Feedback
Priority: Low
Assignee: Nobuyoshi Nakada
Category: M17N
Target version:
ruby -v: -


Nikolai Weibull

Mar 13, 2012, 5:03:04 AM
to ruby...@ruby-lang.org
On Sun, Mar 11, 2012 at 22:41, Nobuyoshi Nakada <no...@ruby-lang.org> wrote:
>
> Issue #5486 has been updated by Nobuyoshi Nakada.
>
> Category changed from core to M17N
> Status changed from Assigned to Feedback
> Priority changed from High to Low
>
> Does this issue still occur?

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

U.Nakamura

Mar 14, 2012, 9:49:25 PM
to ruby...@ruby-lang.org
Hello,

In message "[ruby-core:43260] Re: [ruby-core:43236] [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
    on Mar.13,2012 18:03:04, <n...@bitwi.se> wrote:
> Yes, it still occurs against trunk:
>
> ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

It's not trunk...
It seems too old.


Regards,
--
U.Nakamura <u...@garbagecollect.jp>


Nikolai Weibull

Mar 15, 2012, 12:24:04 AM
to ruby...@ruby-lang.org
2012/3/15 U.Nakamura <u...@garbagecollect.jp>:

> Hello,
>
> In message "[ruby-core:43260] Re: [ruby-core:43236] [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
>    on Mar.13,2012 18:03:04, <n...@bitwi.se> wrote:
>> Yes, it still occurs against trunk:
>>
>> ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]
>
> It's not trunk...
> It seems too old.

How can you say that? I just tested it and got the same results. I
showed you my version string above; that’s trunk.

Trevor Wennblom

Mar 15, 2012, 1:13:55 AM
to ruby...@ruby-lang.org

hi Nikolai,

try compiling an updated version of trunk from this repository:
svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3

your version indicates it's from last year. here's a version string from
a recent compilation on my system:
% ./ruby --version
ruby 1.9.3p163 (2012-03-14 revision 35012) [x86_64-darwin11.3.0]

does that help at all?

NARUSE, Yui

Mar 15, 2012, 1:57:29 AM
to ruby...@ruby-lang.org
2012/3/15 Trevor Wennblom <tre...@well.com>:

> try compiling an updated version of trunk from this repository:
>  svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3

It is ruby_1_9_3 branch, not trunk.
For trunk,
svn co http://svn.ruby-lang.org/repos/ruby/trunk

--
NARUSE, Yui  <nar...@airemix.jp>

Nikolai Weibull

Mar 15, 2012, 3:33:16 AM
to ruby...@ruby-lang.org

Argh, sorry. I ran the test with the incorrect PATH, after all. Yes,
this issue has been resolved. You can close it.
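
A mix-up like this can be ruled out by having the test script report which interpreter is running it. A minimal sketch; interpreter_report is a hypothetical helper name.

```ruby
require "rbconfig"

# Sketch only: report the interpreter actually executing this script, so a
# stale build picked up through PATH is easy to spot.
# interpreter_report is a hypothetical helper name.
def interpreter_report
  {
    description: RUBY_DESCRIPTION, # the same string that `ruby -v` prints
    executable:  RbConfig.ruby     # full path of the running ruby binary
  }
end

interpreter_report.each { |key, value| puts "#{key}: #{value}" }
```

Comparing the description line with the revision date expected for a fresh trunk build shows at a glance whether the right binary was used.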

Yui NARUSE

Mar 15, 2012, 3:43:08 AM
to ruby...@ruby-lang.org

Issue #5486 has been updated by Yui NARUSE.

Status changed from Feedback to Closed


----------------------------------------
Bug #5486: rb_stat() doesn’t respect input encoding

https://bugs.ruby-lang.org/issues/5486#change-24600

Author: Nikolai Weibull
Status: Closed
Priority: Low
Assignee: Nobuyoshi Nakada
Category: M17N
Target version:
ruby -v: -

