Individual char values in a Unicode string


Tim Bray

Sep 2, 2006, 12:45:28 AM
I'm trying to figure out how to use String#[] or jconv or something
to get at the actual code-point values in a Unicode/UTF-8
string. For example, how can I write f such that

f('tö中') ==> [ 0x74, 0xf6, 0x4e2d ]

(hex just for clarity of course, I want numbers).

-Tim

Paul Lutus

Sep 2, 2006, 1:07:29 AM
Tim Bray wrote:

Hex numbers are numbers. :)

To answer your question, you can extract bytes from a string:

#!/usr/bin/ruby

s = "this is a test"

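i = 0
while (i < s.size)
  # Ruby 1.8: s[i] is the byte value (a number), not a character.
  # Ruby 1.9+: s[i] is a one-character string; use s.getbyte(i) there.
  puts s[i]
  i += 1
end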

But I don't think Ruby recognizes characters, Unicode or otherwise. So it may
not be able to interpret UTF-8-encoded Unicode without explicit
code from the programmer.
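
To make the byte-versus-character point concrete, here is a small sketch using the string from the original question. It assumes the source file and the string are UTF-8; the variable name byte_values is only for illustration.

```ruby
# encoding: utf-8

s = "tö中"

# Collect the raw bytes of the UTF-8 encoding.
# "t" takes 1 byte, "ö" takes 2, and "中" takes 3.
byte_values = []
s.each_byte { |b| byte_values << b }

p byte_values
```

Three characters come out as six numbers, so indexing by byte does not by itself give the code points asked for.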

--
Paul Lutus
http://www.arachnoid.com

Daniel Berger

Sep 2, 2006, 1:39:51 AM

'tö中'.unpack("U*") => [116, 246, 20013]

Regards,

Dan
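
For reference, the unpack("U*") answer above can be wrapped into the f from the original question. This is only a sketch: f is the hypothetical name used in the question, and the string is assumed to be valid UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper, named after the f in the original question.
# The "U*" directive decodes the string as UTF-8 and returns
# every character as an integer code point.
def f(str)
  str.unpack("U*")
end

result = f("tö中")
p result
p result == [0x74, 0xf6, 0x4e2d]
```

On Ruby 1.9 and later, "tö中".codepoints.to_a should give the same list without unpack.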
