Unicode roadmap?
Roman Hausner  
Newsgroups: comp.lang.ruby
From: Roman Hausner <roman.haus...@gmail.com>
Date: Wed, 14 Jun 2006 06:13:03 +0900
Local: Tues, Jun 13 2006 5:13 pm
Subject: Unicode roadmap?
In my opinion, Ruby is practically useless for many applications without
proper Unicode support. How a modern language can ignore this issue is
really beyond me.

Is there a plan to get Unicode support into the language anytime soon?

--
Posted via http://www.ruby-forum.com/.


Yukihiro Matsumoto
From: Yukihiro Matsumoto <m...@ruby-lang.org>
Date: Wed, 14 Jun 2006 07:25:20 +0900
Local: Tues, Jun 13 2006 6:25 pm
Subject: Re: Unicode roadmap?
Hi,

In message "Re: Unicode roadmap?"
    on Wed, 14 Jun 2006 06:13:03 +0900, Roman Hausner <roman.haus...@gmail.com> writes:
|In my opinion, Ruby is practically useless for many applications without
|proper Unicode support. How a modern language can ignore this issue is
|really beyond me.

Define "proper Unicode support" first.

|Is there a plan to get Unicode support into the language anytime soon?

I'm planning to enhance Unicode support in 1.9 in a year or so
(finally).  But I'm not sure that conforms to your definition of "proper
Unicode support".  Note that 1.8 handles Unicode (UTF-8) if your
string operations are based on Regexp.

                                                        matz.
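
A minimal sketch of what "string operations based on Regexp" can look like. The helper names char_count and char_slice are hypothetical and not part of Ruby or of this thread; the /u flag asks the regexp engine to treat the text as UTF-8, and the byte count is taken with unpack so it does not depend on what String#length returns in a given Ruby version.

```ruby
# Sketch only: character-aware helpers built on Regexp.
# char_count and char_slice are hypothetical names used for illustration.

# Count characters by collecting every single-character match.
def char_count(str)
  str.scan(/./mu).size
end

# Return len characters starting at character offset start.
def char_slice(str, start, len)
  str.scan(/./mu)[start, len].to_a.join
end

text = "Grüße"                        # 5 characters, 7 bytes in UTF-8
byte_count = text.unpack('C*').size   # number of raw bytes

puts char_count(text)        # 5
puts byte_count              # 7
puts char_slice(text, 2, 2)  # üß
```

The point is only that the regexp engine, not String itself, is what knows about character boundaries here.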


Pete
From: Pete <pe...@gmx.org>
Date: Wed, 14 Jun 2006 07:34:59 +0900
Local: Tues, Jun 13 2006 6:34 pm
Subject: Re: Unicode roadmap?

> Define "proper Unicode support" first.

Having a Unicode equivalent for all methods of class String,

like size, slice, upcase.

E.g. I tried the unicode plugin... but, alas, who wants to write stuff
like 'normalize_KC' etc. if you just want the frickin' substring of a string?!

You need to read books on Unicode just to properly use the plugin...

aargg :-((

Best regards
Peter

Yukihiro Matsumoto wrote:


Logan Capaldo
From: Logan Capaldo <logancapa...@gmail.com>
Date: Wed, 14 Jun 2006 07:48:36 +0900
Local: Tues, Jun 13 2006 6:48 pm
Subject: Re: Unicode roadmap?

On Jun 13, 2006, at 6:34 PM, Pete wrote:

>> Define "proper Unicode support" first.

> having an unicode-equivalent for all methods of class String

> like size, slice, upcase

> E.g. I tried the unicode plugin... but, alas, who want's to write  
> stuff like 'normalize_KC' etc. if you just want the frickin'  
> substring of a string?!

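# Return len characters starting at character offset start (nil if out of range).
def substring(str, start, len)
   # /m lets "." match newlines too, so multi-line strings work
   md = str.match(/\A.{#{start}}(.{#{len}})/m)
   md && md[1]
end

# Count characters by counting regexp matches of a single character.
def strlength(str)
   str.scan(/./m).size
end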

See! Regexps do everything!

Just, you know, set $KCODE and use these methods and you are set!

(I am kidding... btw)


Pete
From: Pete <pe...@gmx.org>
Date: Wed, 14 Jun 2006 07:58:07 +0900
Local: Tues, Jun 13 2006 6:58 pm
Subject: Re: Unicode roadmap?
From a theoretical point of view this is quite interesting. I also
understand the humor :-)

Performance and memory consumption should be breathtaking using regexp
just everywhere...

Also there are a ____few____ methods left :-)

As I am German the 'missing' unicode support is one of the greatest
obstacles for me (and probably all other Germans doing their stuff
seriously)...

Logan Capaldo wrote:


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 08:11:49 +0900
Local: Tues, Jun 13 2006 7:11 pm
Subject: Re: Unicode roadmap?
From: Pete [mailto:pe...@gmx.org]
Sent: Wednesday, June 14, 2006 1:58 AM

> As I am German the 'missing' unicode support is one of the greatest
> obstacles for me (and probably all other Germans doing their stuff
> seriously)...

The same goes for Russians/Ukrainians. In our programming communities the question
"does the programming language support Unicode natively?" has very high
priority.

BTW, this is one of the things where Python beats Ruby completely.

V.


James Moore
From: "James Moore" <bans...@banshee.com>
Date: Wed, 14 Jun 2006 08:56:29 +0900
Local: Tues, Jun 13 2006 7:56 pm
Subject: Re: Unicode roadmap?
I suspect the Japanese posters on this list can answer better than I can,
but my impression is that Unicode is, shall we say, not highly thought of
outside Europe and North America.  The way they dealt with "Chinese"
characters was apparently more than a bit of a hack, and just doesn't work
very well in the real world.  Reading some of the explanations for glyphs
versus characters in Unicode just makes you shake your head.  What were they
thinking?  Sure doesn't pass the smell test, although I'll be the first to
admit I haven't exactly thought deeply about the subject.

There's another problem with Japanese - I've got a friend who's been dealing
with some issues around the fact that Japanese apparently innovates new
characters on a regular basis, and everyone is expected to use the new
characters.  (I believe this is called gaiji).  The concept of a fixed
character set apparently just isn't a good idea to start with.

[Awaiting corrections from people who actually know something about this
topic :-)...]

 - James Moore


David Balmain
From: "David Balmain" <dbalmain...@gmail.com>
Date: Wed, 14 Jun 2006 09:12:15 +0900
Local: Tues, Jun 13 2006 8:12 pm
Subject: Re: Unicode roadmap?
On 6/14/06, James Moore <bans...@banshee.com> wrote:

There is a good summary of the han unification controversy on wikipedia;

    http://en.wikipedia.org/wiki/Han_unification


Mat Schaffer
From: Mat Schaffer <scha...@gmail.com>
Date: Wed, 14 Jun 2006 10:15:58 +0900
Local: Tues, Jun 13 2006 9:15 pm
Subject: Re: Unicode roadmap?
On Jun 13, 2006, at 7:56 PM, James Moore wrote:

> There's another problem with Japanese - I've got a friend who's  
> been dealing
> with some issues around the fact that Japanese apparently innovates  
> new
> characters on a regular basis, and everyone is expected to use the new
> characters.  (I believe this is called gaiji).  The concept of a fixed
> character set apparently just isn't a good idea to start with.

> [Awaiting corrections from people who actually know something about  
> this
> topic :-)...]

I have one Japanese person here who's never heard of this gaiji  
concept.  But it could be new and behind a generation gap of some  
kind.  They do sure like to add symbols where they can, though.  
Especially graphical star characters.  I see that a lot.
-Mat

Yukihiro Matsumoto
From: Yukihiro Matsumoto <m...@ruby-lang.org>
Date: Wed, 14 Jun 2006 11:37:19 +0900
Local: Tues, Jun 13 2006 10:37 pm
Subject: Re: Unicode roadmap?
Hi,

In message "Re: Unicode roadmap?"
    on Wed, 14 Jun 2006 08:11:49 +0900, "Victor Shepelev" <vshepe...@imho.com.ua> writes:

|From: Pete [mailto:pe...@gmx.org]
|Sent: Wednesday, June 14, 2006 1:58 AM
|> As I am German the 'missing' unicode support is one of the greatest
|> obstacles for me (and probably all other Germans doing their stuff
|> seriously)...
|
|The same is for Russians/Ukrainians. In our programming communities question
|"does the programming language supports Unicode as 'native'?" has very high
|priority.

Alright, then what specific features are you (both) missing?  I don't
think it is a method to get the number of characters in a string.  It
can't be THAT crucial.  I do want to cover "your missing features" in
the future M17N support in Ruby.

                                                        matz.


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 14:26:30 +0900
Local: Wed, Jun 14 2006 1:26 am
Subject: Re: Unicode roadmap?
From: Yukihiro Matsumoto [mailto:m...@ruby-lang.org]
Sent: Wednesday, June 14, 2006 5:37 AM

I suppose all we (non-English writers) need is to have all string-related
methods working. For now, I am thinking of plainly testing each string
method; also, some other classes can be affected by Unicode (possibly
regexps and paths). Regexps seem to work fine (in my 1.9), but paths do
not: File.open with Russian letters in the path doesn't find the file.

More generally, it could make sense to have Unicode as the "base" mode, with
non-Unicode remaining as the "old, compatibility" mode.

Something like this.

V.


Pål Bergström
Newsgroups: comp.lang.ruby
From: "Pål Bergström" <p...@palbergstrom.com>
Date: Wed, 14 Jun 2006 14:55:14 +0900
Local: Wed, Jun 14 2006 1:55 am
Subject: Re: Unicode roadmap?

Roman Hausner wrote:
> In my opinion, Ruby is practically useless for many applications without
> proper Unicode support. How a modern language can ignore this issue is
> really beyond me.

> Is there a plan to get Unicode support into the language anytime soon?

I also think that this is very important.

--
Posted via http://www.ruby-forum.com/.


Yukihiro Matsumoto
From: Yukihiro Matsumoto <m...@ruby-lang.org>
Date: Wed, 14 Jun 2006 15:34:43 +0900
Local: Wed, Jun 14 2006 2:34 am
Subject: Re: Unicode roadmap?
Hi,

In message "Re: Unicode roadmap?"
    on Wed, 14 Jun 2006 14:26:30 +0900, "Victor Shepelev" <vshepe...@imho.com.ua> writes:

|I suppose, all we (non-English-writers) need is to have all string-related
|methods working. Just for now, I think about plain testing each string
|method;

In that sense, _I_ am one of the non-English writers, so I suppose
I know what we need.  And I have no problem with the current
UTF-8 support.  Maybe that's because Japanese characters don't have
case.  Or maybe I'm missing something.  Can you show us your
concrete problems caused by Ruby's lack of "proper" Unicode support?

|also, some other classes can be affected by Unicode (possibly
|regexps, and pathes). Regexps seems to work fine (in my 1.9), but pathes are
|not: File.open with Russian letters in path don't finds the file.

Strange.  Ruby does not convert encodings, so there should be no
problem opening files if you are using strings in the encoding your OS
expects.  If they differ, you have to specify (and convert) them
properly, regardless of Unicode support.

                                                        matz.
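
A hedged sketch of the "specify (and convert) them properly" step. It uses String#encode, which only exists in Ruby 1.9 and later (at the time of this thread the conversion would have gone through Iconv). The file name is made up, and whether a particular OS call really expects Windows-1251 is an assumption that depends on the machine's locale.

```ruby
# Sketch only: converting a UTF-8 file name to a legacy one-byte encoding.
# "тест.txt" is a hypothetical file name used for illustration.
utf8_name   = "тест.txt"
cp1251_name = utf8_name.encode("Windows-1251")

puts utf8_name.bytesize      # 12 (each Cyrillic letter takes 2 bytes in UTF-8)
puts cp1251_name.bytesize    # 8  (one byte per character in Windows-1251)

# Converting back gives the original string, so nothing was lost.
round_trip = cp1251_name.encode("UTF-8")
puts round_trip == utf8_name # true
```

The same characters have different byte sequences in the two encodings, which is why a path in the wrong encoding simply names a file that does not exist.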


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 15:56:02 +0900
Local: Wed, Jun 14 2006 2:56 am
Subject: Re: Unicode roadmap?
From: Yukihiro Matsumoto [mailto:m...@ruby-lang.org]
Sent: Wednesday, June 14, 2006 9:35 AM

> Hi,

> In message "Re: Unicode roadmap?"
>     on Wed, 14 Jun 2006 14:26:30 +0900, "Victor Shepelev"
> <vshepe...@imho.com.ua> writes:

> |I suppose, all we (non-English-writers) need is to have all string-
> related
> |methods working. Just for now, I think about plain testing each string
> |method;

> In that sense, _I_ am one of the non-English-writers,

Sorry, Matz, I know, of course. But I know too little about Japanese to see
how close our tasks are. Instead of "non-English writers" I should perhaps have said
"European languages" or so, which have common punctuation, LTR writing,
"words" and "whitespace" and so on. I have almost no knowledge of what
Japanese, Korean, Arabic, or Hebrew users need.

> so that I can
> suppose I know what we need.  And I have no problem with the current
> UTF-8 support.  Maybe that's because Japanese don't have cases in our
> characters.  Or maybe I'm missing something.  

Just what I've said above.

> Can you show us your
> concrete problems caused by Ruby's lack of "proper" Unicode support?

As mentioned in this topic, it's String#length, upcase, downcase,
capitalize.

BTW, does String#length work well for you?

Moreover, there seem to be some huge problems with paths containing Russian
letters, but I'm not really convinced that Ruby has to handle this.

> |also, some other classes can be affected by Unicode (possibly
> |regexps, and pathes). Regexps seems to work fine (in my 1.9), but pathes
> are
> |not: File.open with Russian letters in path don't finds the file.

> Strange.  Ruby does not convert encoding, so that there should be no
> problem opening files, if you are using strings in the encoding your OS
> expect.  If they are differ, you have to specify (and convert) them
> properly, no matter how Unicode support is.

Oh, that's a bit of a hard topic for me. I know Windows XP must support Unicode
file names; I see my file names in Russian, but I know too little about
system internals to say whether they are really Unicode.

Leaving those problems aside, only the String problems remain, but
those are such basic core methods!

V.


Michael Glaesemann
From: Michael Glaesemann <g...@seespotcode.net>
Date: Wed, 14 Jun 2006 16:08:20 +0900
Local: Wed, Jun 14 2006 3:08 am
Subject: Re: Unicode roadmap?

On Jun 14, 2006, at 15:56 , Victor Shepelev wrote:

> As mentioned in this topic, it's String#length, upcase, downcase,
> capitalize.

Just to chime in, aren't upcase, downcase, and capitalize a locale/
localization issue rather than a Unicode-only issue per se? For  
example, different languages will have different rules for  
capitalization. Or am I wrong? Does Unicode in and of itself address  
these issues?

Granted, proper support for upcase, downcase, and capitalize is  
important, but I think it's a separate issue, part of m17n as a whole  
rather than support for Unicode in particular.

Michael Glaesemann
grzm seespotcode net
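
To illustrate the point that case conversion depends on language rules, here is a small sketch using String#upcase as it behaves in Ruby 2.4 and later, which is much newer than the Ruby versions discussed in this thread. The :turkic option is part of that later API; nothing like it existed in 1.8.

```ruby
# Sketch only: requires Ruby 2.4+, where upcase/downcase are Unicode-aware.
german  = "straße".upcase      # German sharp s has no single-letter uppercase here
default = "i".upcase           # default mapping
turkish = "i".upcase(:turkic)  # Turkic languages map i to dotted capital İ
russian = "Я".downcase         # Cyrillic has plain one-to-one case pairs

puts german    # STRASSE
puts default   # I
puts turkish   # İ
puts russian   # я
```

The same input character "i" has two different correct answers depending on the language, which is why this reads as a localization question rather than a pure encoding one.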


Vincent Isambart
From: "Vincent Isambart" <vincent.isamb...@gmail.com>
Date: Wed, 14 Jun 2006 16:14:23 +0900
Local: Wed, Jun 14 2006 3:14 am
Subject: Re: Unicode roadmap?
Hi,

> As mentioned in this topic, it's String#length, upcase, downcase,
> capitalize.

> BTW, does String#length works good for you?

To have the length of a Unicode string, just do str.split(//).length,
or "require 'jcode'" at the beginning of your code.
For the other functions, try looking at the unicode library
http://www.yoshidam.net/Ruby.html#unicode

> > |also, some other classes can be affected by Unicode (possibly
> > |regexps, and pathes). Regexps seems to work fine (in my 1.9), but pathes
> > are
> > |not: File.open with Russian letters in path don't finds the file.

> > Strange.  Ruby does not convert encoding, so that there should be no
> > problem opening files, if you are using strings in the encoding your OS
> > expect.  If they are differ, you have to specify (and convert) them
> > properly, no matter how Unicode support is.

> Oh, it's a bit hard theme for me. I know Windows XP must support Unicode
> file names; I see my filenames in Russian, but I have low knowledge of
> system internals to say, are they really Unicode?

Windows XP does support Unicode file names, but I'm not sure you can
use them with Ruby (I do not use Ruby much under Windows). Try
converting the file names to your current locale, it should work if
the file names can be converted to it. What I mean is that Russian
file names encoded in the Windows Russian encoding should work on a
Russian PC.

Hope this helps,

Cheers,
Vincent ISAMBART
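
A short sketch of the str.split(//).length idiom mentioned above. As written it runs on current Ruby, where strings know their encoding; on 1.8 the same idiom relied on $KCODE being set to 'u', as noted elsewhere in the thread. The sample word is only an illustration.

```ruby
# Sketch only: counting characters by splitting on the empty pattern.
str   = "Привет"
chars = str.split(//)                # one array element per character
bytes = str.unpack('C*')             # one array element per raw byte

puts chars.length   # 6
puts bytes.length   # 12
```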


Yukihiro Matsumoto
From: Yukihiro Matsumoto <m...@ruby-lang.org>
Date: Wed, 14 Jun 2006 16:20:11 +0900
Local: Wed, Jun 14 2006 3:20 am
Subject: Re: Unicode roadmap?
Hi,

In message "Re: Unicode roadmap?"
    on Wed, 14 Jun 2006 15:56:02 +0900, "Victor Shepelev" <vshepe...@imho.com.ua> writes:

|> Can you show us your
|> concrete problems caused by Ruby's lack of "proper" Unicode support?
|
|As mentioned in this topic, it's String#length, upcase, downcase,
|capitalize.

OK. Case is the problem.  I understand.

|BTW, does String#length works good for you?

I don't remember the last time I needed the length method to count
characters.  Actually I don't count string length at all, either
in bytes or in characters, in my string processing.  Maybe this is a
special case.  I am too optimized for Ruby string operations using
Regexp.

|Oh, it's a bit hard theme for me. I know Windows XP must support Unicode
|file names; I see my filenames in Russian, but I have low knowledge of
|system internals to say, are they really Unicode?

Windows 32 path encoding is a nightmare.  Our Win32 maintainers are often
troubled by unexpected OS behavior.  I am sure we _can_ handle Russian
path names, but we need help from Russian people to improve.

                                                        matz.


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 16:21:53 +0900
Local: Wed, Jun 14 2006 3:21 am
Subject: Re: Unicode roadmap?
From: Vincent Isambart [mailto:vincent.isamb...@gmail.com]
Sent: Wednesday, June 14, 2006 10:14 AM

> > As mentioned in this topic, it's String#length, upcase, downcase,
> > capitalize.

> > BTW, does String#length works good for you?

> To have the length of a Unicode string, just do str.split(//).length,
> or "require 'jcode'" at the beginning of your code.
> For the other functions, try looking at the unicode library
> http://www.yoshidam.net/Ruby.html#unicode

I know about it. But, theoretically speaking, such "core" methods must be
in the core. No?

Yes, they work. But I can't settle one question: does Ruby's Unicode support
need to include file name operations?

V.


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 16:24:44 +0900
Local: Wed, Jun 14 2006 3:24 am
Subject: Re: Unicode roadmap?
From: Michael Glaesemann [mailto:g...@seespotcode.net]
Sent: Wednesday, June 14, 2006 10:08 AM

> On Jun 14, 2006, at 15:56 , Victor Shepelev wrote:

> > As mentioned in this topic, it's String#length, upcase, downcase,
> > capitalize.

> Just to chime in, aren't upcase, downcase, and capitalize a locale/
> localization issue rather than a Unicode-only issue per se? For
> example, different languages will have different rules for
> capitalization.

Really? I know about two cases: European capitalization and no
capitalization.

But really, you may be right. I suppose Florian Gross can say something
about German-specific capitalization issues.

> Granted, proper support for upcase, downcase, and capitalize is
> important, but I think it's a separate issue, part of m17n as a whole
> rather than support for Unicode in particular.

Maybe. Generally, sometimes I want Unicode, and sometimes (for "quick and dirty"
scripts) I'd prefer capitalization and regexps to "just work" with
Windows-1251 (a one-byte Russian encoding).

V.


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 16:29:15 +0900
Local: Wed, Jun 14 2006 3:29 am
Subject: Re: Unicode roadmap?
From: Yukihiro Matsumoto [mailto:m...@ruby-lang.org]
Sent: Wednesday, June 14, 2006 10:20 AM

I can confirm. But I'm afraid that some libraries I rely on use #length and
can break when #length doesn't work.

> |Oh, it's a bit hard theme for me. I know Windows XP must support Unicode
> |file names; I see my filenames in Russian, but I have low knowledge of
> |system internals to say, are they really Unicode?

> Windows 32 path encoding is a nightmare.  Our Win32 maintainers often
> troubled by unexpected OS behavior.  I am sure we _can_ handle Russian
> path names, but we need help from Russian people to improve.

In the Russian encoding (Win-1251) on a Russian PC everything works well. In Unicode
it doesn't, but I'm not convinced it must.

In any case, I'm ready to spend my time helping Ruby community (especially
in Russian/Ukrainian localization issues), because I really love the
language.

V.


Marcus Andersson
From: Marcus Andersson <m-li...@bristav.se>
Date: Wed, 14 Jun 2006 16:42:18 +0900
Local: Wed, Jun 14 2006 3:42 am
Subject: Re: Unicode roadmap?
Yukihiro Matsumoto wrote:
> Hi,

> In message "Re: Unicode roadmap?"
>     on Wed, 14 Jun 2006 06:13:03 +0900, Roman Hausner <roman.haus...@gmail.com> writes:
> |In my opinion, Ruby is practically useless for many applications without
> |proper Unicode support. How a modern language can ignore this issue is
> |really beyond me.

> Define "proper Unicode support" first.

I won't define "proper Unicode support" here.

But there must be a problem somewhere, since pure-Ruby Ferret doesn't
support UTF-8. You need to use the C extension of Ferret to have it
support UTF-8 (which doesn't work on Windows yet :( ). I don't know if
that is just a sucky implementation in Ferret or if it's Ruby that makes it so.

Maybe Dave Balmain can enlighten us as to why UTF-8 doesn't work in the pure
Ruby version and what is needed from Ruby to make it work (if it's
actually Ruby's fault, that is)?

My personal belief is that it should just work in a case like this if
the input data is UTF-8 and the search strings are UTF-8, without the lib author
and/or user having to do anything very special to make it work (apart
from specifying the encoding). Am I wrong in this?

Regards,

Marcus


Dmitry Severin
From: "Dmitry Severin" <dmitry.seve...@gmail.com>
Date: Wed, 14 Jun 2006 17:19:44 +0900
Local: Wed, Jun 14 2006 4:19 am
Subject: Re: Unicode roadmap?

Almost all typical Unicode tasks can be handled with UTF-8 support in
Regexp, Iconv, jcode and $KCODE=u, and the unicode[1] library (as in
unicode_hack[2]) :)
(But case-insensitive regexps don't work for non-ASCII chars in Ruby 1.8;
that can probably be solved using the latest Oniguruma.)

But if you're looking for a deeper level of "Unicode support", e.g. as
described in the Unicode FAQ[3], those problems aren't about handling Unicode
per se, but are rather L10N/I18N problems, such as locale-dependent text
breaks, collation, formatting etc.
To deal with them from Ruby, take a look at the somewhat broken wrappers to the ICU
library icu4r[4], g11n[5] and Ruby/CLDR[6].

And if you want Unicode as the default String encoding and want to use national
chars in names for your vars/functions/classes in Ruby code, I believe it
will never happen. :)

Links:
[1] http://www.yoshidam.net/Ruby.html
[2] http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/
[3] http://www.unicode.org/faq/basic_q.html#13
[4] http://rubyforge.org/projects/icu4r
[5] http://rubyforge.org/projects/g11n
[6] http://www.yotabanana.com/hiki/ruby-cldr.html
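
For comparison with the Ruby 1.8 limitation on case-insensitive regexps described above, a small sketch of the same match on a current Ruby. This assumes Ruby 2.4 or later (String#casecmp? does not exist before that) and UTF-8 source text; it says nothing about how 1.8 or a patched Oniguruma build would behave.

```ruby
# Sketch only: case-insensitive matching of non-ASCII text on Ruby 2.4+.
pattern    = /привет/i
upper      = "ПРИВЕТ"

regexp_hit = !(pattern =~ upper).nil?    # does /i fold Cyrillic case?
folded_eq  = upper.casecmp?("привет")    # Unicode case-folded comparison

puts regexp_hit   # true
puts folded_eq    # true
```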


Eric Hodel
From: Eric Hodel <drbr...@segment7.net>
Date: Wed, 14 Jun 2006 17:22:21 +0900
Local: Wed, Jun 14 2006 4:22 am
Subject: Re: Unicode roadmap?
On Jun 13, 2006, at 10:26 PM, Victor Shepelev wrote:

> Regexps seems to work fine (in my 1.9), but pathes are
> not: File.open with Russian letters in path don't finds the file.

On OS X multibyte filenames work:

$ cat x.rb
$KCODE = 'u'

puts File.read('Cyrillic_Я.txt')
$ cat Cyrillic_\320\257.txt
test file with Я!
$ ruby x.rb
test file with Я!
$ uname -a
Darwin kaa.jijo.segment7.net 8.6.0 Darwin Kernel Version 8.6.0: Tue Mar  7 16:58:48 PST 2006; root:xnu-792.6.70.obj~1/RELEASE_PPC Power Macintosh powerpc
$ ruby -v
ruby 1.8.4 (2006-05-18) [powerpc-darwin8.6.0]
$

--
Eric Hodel - drbr...@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com


Victor Shepelev
From: "Victor Shepelev" <vshepe...@imho.com.ua>
Date: Wed, 14 Jun 2006 17:26:58 +0900
Local: Wed, Jun 14 2006 4:26 am
Subject: Re: Unicode roadmap?
From: Dmitry Severin [mailto:dmitry.seve...@gmail.com]
Sent: Wednesday, June 14, 2006 11:20 AM

> To: ruby-talk ML
> Subject: Re: Unicode roadmap?

> Almost all typical tasks on Unicode can be handled with UTF8 support in
> Regexp,  Iconv, jcode and $KCODE=u, and unicode[1] library (as in
> unicode_hack[2]) :)
> (but case-insensitive regexp don't work for non ASCII chars in Ruby 1.8,
> that can be probably solved using latest Oniguruma).

> But if you're looking for deeper level of "Unicode support", e.g. as
> described in Unicode FAQ[3], those problems  aren't about handling Unicode
> per se, but are rather L10N/I18N problems, such as locale dependent text
> breaks,collation, formatting etc.
> To deal with them from Ruby  take look at somewhat broken wrappers to ICU
> library icu4r[4], g11n[5] and Ruby/CLDR[6].

Thanks Dmitry!

> And if you want Unicode as default String encoding and want to use
> national
> chars in names for your vars/functions/classes in Ruby code, I believe, it
> will never happen. :)

Hmmm.. I thought Unicode IS the default String encoding when $KCODE=u.
No?

V.


Dmitry Severin
From: "Dmitry Severin" <dmitry.seve...@gmail.com>
Date: Wed, 14 Jun 2006 17:38:40 +0900
Local: Wed, Jun 14 2006 4:38 am
Subject: Re: Unicode roadmap?

On 6/14/06, Victor Shepelev <vshepe...@imho.com.ua> wrote:

> Hmmm.. I've think Unicode IS defaul String encoding when $KCODE=u
> Not?

No. Current String implementation has no notion of "encoding" (Ruby String
is just a sequence of bytes) and $KCODE is just a hint for methods to change
their behaviour (e.g. in Regexp) and treat those bytes as text represented
in some encoding.
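
The "sequence of bytes" view can be made concrete with a small sketch. It uses String#unpack, which reads the raw bytes ('C*') or decodes them as UTF-8 code points ('U*') no matter what the string itself claims to be. The octal output matches the \320\257 file name shown in the OS X transcript earlier in the thread. Note that later Ruby versions (1.9 and up) do attach an encoding to every String, so this describes the 1.8 model only.

```ruby
# Sketch only: one character, seen as raw bytes and as a code point.
s = "Я"

bytes      = s.unpack('C*')   # the raw bytes stored in the string
codepoints = s.unpack('U*')   # the same bytes interpreted as UTF-8
octal      = bytes.map { |b| format('\\%o', b) }.join

puts bytes.inspect        # [208, 175]
puts codepoints.inspect   # [1071]
puts octal                # \320\257
```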
