
Unicode in Ruby and a Ruby Reference


Mike McGavin

Dec 14, 2004, 2:33:17 AM
Hi everyone.

I'm making my second big attempt at getting into Ruby, and I have a
couple of questions. I hope they don't sound too trivial.

1. I was wondering about the state of Unicode support in Ruby.
For instance, I'm coming mostly from Python, which has a special
Unicode type that can be translated to various encodings on request.
I can't seem to find anything similar in Ruby. Does it exist
anywhere, is it standard to deal with Unicode in a completely
different way, or is it something that hasn't been developed at this
point?

2. What are the most definitive references for Ruby and the standard
libraries that are available? I've found the reference at RubyCentral
to be very helpful (http://www.rubycentral.com/ref/), but it also
seems to be missing things here and there. On the other hand, it's
possible that I'm completely misreading it.

For instance, I found out about the Singleton module completely by
chance while reading this group. It certainly appears to work in my
Ruby 1.8.1 interpreter, but I can't seem to find it formally described
anywhere. I know what I've seen of it, but I don't know what else it
offers. I also wonder about all of the other things I could
be missing out on.
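
The Singleton module ships in Ruby's standard library as 'singleton'. A minimal sketch of typical usage follows, assuming a current Ruby; the AppConfig class and its settings hash are hypothetical names used only for illustration.

```ruby
require 'singleton'

# Hypothetical class, used only to illustrate the Singleton module.
class AppConfig
  include Singleton   # makes .new private and adds a class-level .instance

  attr_reader :settings

  def initialize
    @settings = { "encoding" => "UTF-8" }
  end
end

a = AppConfig.instance
b = AppConfig.instance
puts a.equal?(b)          # true: both names refer to the one shared instance

begin
  AppConfig.new           # .new is private once Singleton is included
rescue NoMethodError => e
  puts "new is not allowed: #{e.class}"
end
```

The design choice is that the class itself controls construction, so every caller that asks for AppConfig.instance gets the same object.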


Thanks in advance for the help. I really like Ruby as a language and
I hope I'll be able to use it for some things later on. I'm just
interested to find out if these things are still in early stages of
development, or if I'm simply missing things.

Thanks.
Mike.


Yukihiro Matsumoto

Dec 14, 2004, 2:49:44 AM
Hi,

In message "Re: Unicode in Ruby and a Ruby Reference"


on Tue, 14 Dec 2004 16:33:17 +0900, Mike McGavin <iiz...@gmail.com> writes:

|1. I was wondering what the state is of Ruby and support for Unicode?
| For instance, I'm coming mostly from Python which has a special
|Unicode type that can be translated to various encodings on request.
|I can't seem to find anything similar in Ruby. Does it exist
|anywhere, or is it standard to deal with Unicode in a completely
|different way, or is it something that hasn't been developed at this
|point?

Handling Unicode (UTF-8) is OK. Ruby's strings can contain any
sequence of bytes. The regex engine is aware of UTF-8, so you can
use pattern matching against Unicode characters. For encoding
conversion, the iconv library is your friend.
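
A minimal sketch of the three points above, assuming Ruby 2.0 or later (where source files default to UTF-8). It is not 1.8 code: iconv was dropped from the standard library in Ruby 2.0, and String#encode is used here as the modern stand-in for the conversion step.

```ruby
# Assumes Ruby 2.0+, where source files default to UTF-8.
s = "naïve café"

# A String is a sequence of bytes; non-ASCII characters take 2+ bytes in UTF-8.
puts s.bytesize           # 12
puts s.length             # 10 (characters, not bytes)

# Regex matching works on characters: "." consumes the whole two-byte "é".
puts s[/caf./]            # café
p s.scan(/\p{L}+/)        # ["naïve", "café"]

# Encoding conversion: String#encode plays the role iconv played in Ruby 1.8.
latin1 = s.encode("ISO-8859-1")
puts latin1.bytesize      # 10 (one byte per character in Latin-1)
puts latin1.encode("UTF-8") == s   # true
```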

This is weaker than Python, but it does most jobs. We are working
on M17N Ruby (M17N stands for multilingualization), in which you can
handle many encodings (e.g. UTF-8, UTF-16, Big5, GBK, and many more)
without conversion.

matz.
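
The M17N work described above shipped in Ruby 1.9, where every String carries its own Encoding. A short sketch, assuming Ruby 2.0 or later: the same text held in three encodings side by side, each reporting character and byte counts in its own encoding without first being converted to a common form.

```ruby
# Assumes Ruby 2.0+: every String is tagged with an Encoding object.
text = "日本語"

# The same characters stored under three different encodings.
variants = ["UTF-8", "UTF-16BE", "Shift_JIS"].map { |enc| text.encode(enc) }

variants.each do |str|
  # length counts characters in the string's own encoding; bytesize counts raw bytes.
  puts "#{str.encoding}: #{str.length} chars, #{str.bytesize} bytes"
end

# Raw bytes can be re-tagged (not converted) with force_encoding.
raw = text.b                        # copy of the bytes, tagged as binary
puts raw.length                     # 9: a binary string counts bytes
puts raw.force_encoding("UTF-8") == text   # true
```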


Gavin Sinclair

Dec 14, 2004, 3:18:45 AM
On Tuesday, December 14, 2004, 6:33:17 PM, Mike wrote:

> 2. What are the most definitive references for Ruby and the standard
> libraries that are available? I've found the reference at RubyCentral
> to be very helpful (http://www.rubycentral.com/ref/), but it also
> seems to be missing things here and there. On the other hand, it's
> possible that I'm completely misreading it.

What you are reading online is "Programming Ruby" (1st edition), a book
by Dave Thomas and Andy Hunt. The second edition hit the shelves
recently, but there's no online version. It's a purchase you won't
regret: it describes all the standard libraries by example, and all the
built-in classes in detail (up to date with the latest Ruby).

Information about the standard library is also housed at

http://ruby-doc.org/stdlib

Cheers,
Gavin

Florian Gross

Dec 14, 2004, 4:35:38 AM
Yukihiro Matsumoto wrote:

> In message "Re: Unicode in Ruby and a Ruby Reference"
> on Tue, 14 Dec 2004 16:33:17 +0900, Mike McGavin <iiz...@gmail.com> writes:
>
> |1. I was wondering what the state is of Ruby and support for Unicode?
> | For instance, I'm coming mostly from Python which has a special
> |Unicode type that can be translated to various encodings on request.
> |I can't seem to find anything similar in Ruby. Does it exist
> |anywhere, or is it standard to deal with Unicode in a completely
> |different way, or is it something that hasn't been developed at this
> |point?
>
> Handling Unicode (UTF-8) is OK. Ruby's strings can contain any
> sequence of bytes. The regex engine is aware of UTF-8, so you can
> use pattern matching against Unicode characters. For encoding
> conversion, the iconv library is your friend.

However, I think this awareness only extends to where a code point
begins and ends. This might have changed with Oniguruma, but
"Ä"[/ä/i] used to return nil.
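
For comparison with the behaviour reported here, a minimal check assuming Ruby 2.4 or later with a UTF-8 source file. Oniguruma became Ruby's regex engine in 1.9 (Ruby 2.0 moved to its fork, Onigmo), and full Unicode case mapping for String#downcase arrived in 2.4.

```ruby
# Assumes Ruby 2.4+ with a UTF-8 source file.
# Case-insensitive regex matching folds non-ASCII letters too.
p "Ä"[/ä/i]               # "Ä"

# String methods apply Unicode case mapping from Ruby 2.4 on.
p "Ä".downcase            # "ä"
p "Ä".casecmp?("ä")       # true
# casecmp (without "?") still folds ASCII letters only.
p "Ä".casecmp("ä") == 0   # false
```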

Yukihiro Matsumoto

Dec 14, 2004, 5:16:55 AM
Hi,

In message "Re: Unicode in Ruby and a Ruby Reference"

on Tue, 14 Dec 2004 18:37:21 +0900, Florian Gross <fl...@ccan.de> writes:

|However, I think this awareness only extends to where a code point
|begins and ends. This might have changed with Oniguruma, but
|"Ä"[/ä/i] used to return nil.

Oniguruma should be aware of it, although I found a bug there.
I will fix it soon. Thank you.

matz.


Austin Ziegler

Dec 14, 2004, 11:04:35 AM
On Tue, 14 Dec 2004 19:12:18 +0900, Giulio Piancastelli
<giulio.pi...@gmail.com> wrote:
> How can a literal Unicode character be inserted in a Ruby String? I
> recall Java having the \uNNNN escaping, for example, but I wasn't able
> to find a similar mechanism for Ruby. (On the other hand, I'm aware of
> escaping for octal and hex character codes, e.g. \NNN and \xNN.)

Java's \u4321 names a UTF-16 code unit (here the code point U+4321), so
in Ruby you would need to know the equivalent UTF-8 byte sequence,
e.g., \xe4\x8c\xa1.

-austin
--
Austin Ziegler * halos...@gmail.com
* Alternate: aus...@halostatue.ca
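
A short sketch of both routes, assuming Ruby 2.0 or later to run as written. Ruby 1.9 and later accept \uNNNN (and \u{...}) escapes directly in string literals; on 1.8, Array#pack("U") was one way to build the UTF-8 bytes from a code point without spelling them out by hand.

```ruby
# Assumes Ruby 2.0+ (UTF-8 source encoding by default).
escaped = "\u4321"              # Unicode escape, available since Ruby 1.9
braced  = "\u{4321}"            # brace form, accepts 1 to 6 hex digits
bytes   = "\xe4\x8c\xa1"        # the UTF-8 bytes written out by hand
packed  = [0x4321].pack("U")    # code point -> UTF-8 string (also worked in 1.8)

p escaped == braced             # true
p escaped == bytes              # true
p escaped == packed             # true
p escaped.bytes.map { |b| format("%02x", b) }   # ["e4", "8c", "a1"]
p escaped.ord.to_s(16)          # "4321"
```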


Mohammad Khan

Dec 14, 2004, 3:22:01 PM

You can also buy the PDF version of this book from:
http://pragmaticprogrammer.com/shopsite_sc/store/html/index.html
which will cost $25.00, I think.

Thanks,
Mohammad

--

[mkhan@localhost local]$ make love
make: *** No rule to make target `love'. Stop.


Mike McGavin

Dec 14, 2004, 6:56:03 PM
Hi again.

On Tue, 14 Dec 2004 20:33:14 +1300, Mike McGavin <iiz...@gmail.com> wrote:

> I'm making my second big attempt in getting into Ruby, and I have a
> couple of questions.

> [--snip--]

I just wanted to say thanks for all of the feedback from everyone
following my questions about the Ruby reference documentation and the
Unicode questions. It's been very helpful, and I'll continue to
monitor the thread.

Thanks.
Mike.


Alexander Kellett

Dec 15, 2004, 4:04:49 AM
Just FYI:
http://www.rubycentral.com/book/lib_patterns.html
The patterns and standard library sections contain some pretty nifty stuff.
Also, "ri" totally rocks :)
Alex

Giulio Piancastelli

Dec 14, 2004, 5:07:31 AM
How can a literal Unicode character be inserted in a Ruby String? I
recall Java having the \uNNNN escaping, for example, but I wasn't able
to find a similar mechanism for Ruby. (On the other hand, I'm aware of
escaping for octal and hex character codes, e.g. \NNN and \xNN.)
--
G.P.
