[webgen-users] Problems with non-USASCII file names

Mikhail Yakshin

Oct 7, 2013, 11:27:52 AM
to webgen...@rubyforge.org
Hello,

I'd like to report that using a non-USASCII file name for a .page file results in:

webgen encountered a problem:
invalid byte sequence in US-ASCII

It fails on File.fnmatch here:

def self.matches_pattern?(path, pattern,
                          options = File::FNM_DOTMATCH|File::FNM_CASEFOLD|File::FNM_PATHNAME)
  pattern += '/' if path =~ /\/$/ && pattern !~ /\/$|^$/
  (path.to_s.include?('#') ? pattern.include?('#') : true) &&
    File.fnmatch(pattern, path, options)
end
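
[Editor's sketch, not part of the original message.] One way such an error can arise is a path whose bytes are UTF-8 but whose encoding tag is US-ASCII. The snippet below only illustrates that mismatch; it assumes nothing about webgen internals. The helper name `mislabel_as_ascii` and the constant `FLAGS` are hypothetical.

```ruby
# Same flags as the default options of matches_pattern? above.
FLAGS = File::FNM_DOTMATCH | File::FNM_CASEFOLD | File::FNM_PATHNAME

# Hypothetical helper: returns a copy of +name+ with identical bytes
# but tagged as US-ASCII, to imitate a mislabeled path.
def mislabel_as_ascii(name)
  name.dup.force_encoding(Encoding::US_ASCII)
end

# The \u escape makes this literal UTF-8 regardless of the source encoding.
name       = "troph\u00E9e.page"
mislabeled = mislabel_as_ascii(name)

puts "#{name.encoding}: valid=#{name.valid_encoding?}"
puts "#{mislabeled.encoding}: valid=#{mislabeled.valid_encoding?}"

# With a correctly tagged UTF-8 name the match itself is unproblematic.
puts File.fnmatch('*.page', name, FLAGS)
```

The bytes of `name` and `mislabeled` are identical; only the tag differs, and bytes above 0x7F are never valid US-ASCII.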

--
WBR, Mikhail Yakshin
_______________________________________________
webgen-users mailing list
webgen...@rubyforge.org
http://rubyforge.org/mailman/listinfo/webgen-users

Thomas Leitner

Oct 7, 2013, 3:16:25 PM
to webgen...@rubyforge.org
Hi,

On 2013-10-07 19:27 +0400 Mikhail Yakshin wrote:
> I'd like to report that using non-USASCII file names for .page file
> results in:
>
> webgen encountered a problem:
> invalid byte sequence in US-ASCII

Could you please post the sample filename? I have tried with the
filename "hällöchen.page" which contains UTF-8 characters and it works
just fine.

What is the external encoding of your platform? You can find this out
by running

ruby -e 'puts Encoding.default_external'

I have also found the Ruby bug report
https://bugs.ruby-lang.org/issues/7911 which might be the reason.
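
[Editor's sketch, not part of the original message.] Beyond the one-liner above, it can help to see how Ruby tags the file names it reads from disk. The following is a small diagnostic under the assumption that the affected files sit in the current directory; `encoding_report` is a hypothetical helper name.

```ruby
# Hypothetical helper: lists each entry of +dir+ with the encoding Ruby
# assigned to its name and whether the bytes are valid in that encoding.
def encoding_report(dir)
  Dir.entries(dir).reject { |n| n == '.' || n == '..' }.sort.map do |n|
    { name: n, encoding: n.encoding.name, valid: n.valid_encoding? }
  end
end

puts "default_external: #{Encoding.default_external}"
puts "filesystem:       #{Encoding.find('filesystem')}"
encoding_report(Dir.pwd).each { |row| puts row.inspect }
```

Any row with `valid: false` points at a name whose tag does not match its bytes.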

Best regards,
Thomas

Mikhail Yakshin

Oct 7, 2013, 4:57:42 PM
to webgen...@rubyforge.org
Hi,

>> I'd like to report that using non-USASCII file names for .page file
>> results in:
>>
>> webgen encountered a problem:
>> invalid byte sequence in US-ASCII
>
> could you please post the sample filename? I have tried with the
> filename "hällöchen.page" which contains UTF-8 characters and it works
> just fine.

I've attached a micro test case. "trophée" yields this error, although
"hällöchen" yields it too for me.

Tried on two Debian boxes with different Ruby versions:

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
ruby 1.9.3p448 (2013-06-27 revision 41675) [x86_64-linux]

> What is the external encoding of your platform? You can find this out
> by running
>
> ruby -e 'puts Encoding.default_external'

It's UTF-8.

> I have also found the Ruby bug report
> https://bugs.ruby-lang.org/issues/7911 which might be the reason.

It might be the case. If I understand correctly, it's only fixed in
Ruby 2.0.x. If that's the problem, maybe you should mention this in
requirements?

--
WBR, Mikhail Yakshin
webgen-bad-ascii-site.tar.bz2

Thomas Leitner

Oct 10, 2013, 3:08:16 PM
to webgen...@rubyforge.org
On 2013-10-08 00:57 +0400 Mikhail Yakshin wrote:
> It might be the case. If I understand correctly, it's only fixed in
> Ruby 2.0.x. If that's the problem, maybe you should mention this in
> requirements?

I have fixed this now by encoding the result of Path.append so that
this works on 1.9.3, too.
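
[Editor's sketch, not part of the original message.] The actual change to webgen's Path.append is not shown in this thread. As a rough illustration of the idea only, the hypothetical helpers below re-tag each piece as UTF-8 before joining, so the result can be handed to File.fnmatch. The names `retag` and `append_path` and the choice of UTF-8 as the target encoding are assumptions.

```ruby
# Hypothetical helper: returns a copy of +str+ tagged with +encoding+
# (bytes are left untouched) and rejects strings that are still invalid.
def retag(str, encoding = Encoding::UTF_8)
  fixed = str.dup.force_encoding(encoding)
  raise ArgumentError, "not valid #{encoding}: #{fixed.inspect}" unless fixed.valid_encoding?
  fixed
end

# Hypothetical stand-in for a path-joining method; NOT webgen's Path.append.
def append_path(base, name)
  "#{retag(base).chomp('/')}/#{retag(name)}"
end

# A name with UTF-8 bytes but a US-ASCII tag, as in the reported failure.
mislabeled = "troph\u00E9e.page".dup.force_encoding(Encoding::US_ASCII)
path = append_path('/src', mislabeled)

puts "#{path.encoding}: valid=#{path.valid_encoding?}"
puts File.fnmatch('/src/*.page', path, File::FNM_PATHNAME)
```

Note that `force_encoding` only changes the tag; it would be the wrong tool if the bytes were genuinely in another encoding, where a real conversion with `String#encode` is needed instead.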

-- Thomas

Mikhail Yakshin

Oct 10, 2013, 9:28:03 PM
to webgen...@rubyforge.org
Hi,

>> It might be the case. If I understand correctly, it's only fixed in
>> Ruby 2.0.x. If that's the problem, maybe you should mention this in
>> requirements?
>
> I have fixed this now by encoding the result of Path.append so that
> this works on 1.9.3, too.

Thanks! Hope to see this fix on github :)

--
WBR, Mikhail Yakshin

Thomas Leitner

Oct 11, 2013, 3:23:16 AM
to webgen...@rubyforge.org
On 2013-10-11 05:28 +0400 Mikhail Yakshin wrote:
> >> It might be the case. If I understand correctly, it's only fixed in
> >> Ruby 2.0.x. If that's the problem, maybe you should mention this
> >> in requirements?
> >
> > I have fixed this now by encoding the result of Path.append so that
> > this works on 1.9.3, too.
>
> Thanks! Hope to see this fix on github :)

Done!

-- Thomas