= Abstract
ICU4R is an attempt to provide better Unicode support for Ruby, based
on the ICU library.
Project Site: http://rubyforge.org/projects/icu4r/
Download: http://rubyforge.org/frs/download.php/8116/icu4r-0.1.0.tar.gz
RDoc: http://icu4r.rubyforge.org/
= Install Notes
To build ICU4R you'll need GCC and ICU v3.4 libraries, which can be
downloaded from
http://ibm.com/software/globalization/icu/downloads.jsp
Build and install:
  ruby extconf.rb && make && make check && make install
= Features
ICU4R is a Ruby C-extension binding for the ICU library.
It does NOT mirror the full ICU object hierarchy; rather, it is a set of
simple interfaces to some practically useful functionality. It provides:
- UString : String-like class with internal UTF-16 storage;
- UCA rules for UString comparisons (<=>, casecmp);
- Unicode regular expressions;
- encoding (codepage) conversion;
- Unicode normalization;
- access to resource bundles, including ICU locale data;
- transliteration, including rule-based.
A bunch of locale-sensitive functions:
- upcase/downcase;
- string collation;
- string search;
- iterators over text line/word/char/sentence breaks;
- message formatting (number/currency/string/time);
- date and number parsing.
== DISCLAIMER ==
The code is still slow and inefficient, and may have security holes, memory
leaks, bugs, inconsistent documentation, and an incomplete test suite. Use it
at your own risk.
Criticism, bug reports, and feature requests are welcome :)
WBR, Nikolai Lugovoi <meadow...@gmail.com>
Thanks, this is really interesting - I hadn't heard of the ICU library before.
There have been a few threads on Ruby + Unicode recently. The answer 'it's not broken' is true in that Ruby won't mess with your low-level UTF-8/16 bytes, but the absence of support for the semantics of characters is a big hindrance when writing multilingual text-handling apps. What's missing is things like character classes such as [:alpha:] and methods like String#upcase that actually work on non-ASCII text. It looks like ICU4R could address this.
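To make that concrete, here is a minimal sketch in plain Ruby of the kind of semantics meant here. It assumes a current interpreter (Ruby 2.4 or later), where these operations are Unicode-aware out of the box; on the 1.8 series discussed in this thread they are not. It does not use ICU4R, whose API isn't shown in this thread, and `unicode_semantics` is a hypothetical helper name chosen only for illustration.

```ruby
# Hypothetical helper (not part of ICU4R): collects the byte-level and
# character-level views of a UTF-8 string side by side.
# Assumes Ruby 2.4+, where String#upcase maps non-ASCII letters too.
def unicode_semantics(text)
  {
    bytes: text.bytesize,                          # raw storage size
    chars: text.length,                            # number of code points
    upcased: text.upcase,                          # case mapping beyond ASCII
    alphabetic: text.match?(/\A[[:alpha:]]+\z/)    # POSIX class on Unicode letters
  }
end

# French word for "summer", written with \u escapes so the source file's
# own encoding does not matter.
sample = "\u00e9t\u00e9"
p unicode_semantics(sample)
```

The byte count and the character count differ for this sample, which is exactly the gap between "Ruby leaves your bytes alone" and "Ruby understands your text".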
But... I couldn't try it, as the build failed on OS X 10.3. I installed ICU to /usr/local without a hitch and ran extconf.rb without problems, but make died with:
SCIPIUS:~/installers/ruby/icu4r alex$ make
gcc -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c ustring.c
ustring.c: In function `icu_ustr_new_set':
ustring.c:169: warning: assignment discards qualifiers from pointer target type
ustring.c: In function `icu_reg_get_replacement':
ustring.c:1854: warning: passing arg 4 of `ustr_splice_units' discards qualifiers from pointer target type
ustring.c:1864: warning: passing arg 4 of `ustr_splice_units' discards qualifiers from pointer target type
ustring.c: In function `icu_ustr_substr':
ustring.c:2296: warning: unused variable `n'
g++ -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c fmt.cpp
cc -dynamic -bundle -undefined suppress -flat_namespace -licuuc -licui18n -licudata -L"/usr/local/lib" -o ustring.bundle ustring.o fmt.o -ldl -lobjc
ld: multiple definitions of symbol _rb_cUString
ustring.o definition of _rb_cUString in section (__DATA,__common)
fmt.o definition of _rb_cUString in section (__DATA,__common)
make: *** [ustring.bundle] Error 1
SCIPIUS:~/installers/ruby/icu4r alex$ gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)
HTH
alex
Could you try the 0.1.1 release?
http://rubyforge.org/frs/download.php/8168/icu4r-0.1.1.tar.gz
(Sorry for the late response.)
> Could you try 0.1.1 release ?
> http://rubyforge.org/frs/download.php/8168/icu4r-0.1.1.tar.gz
Thanks for this. It compiles fine on OS X 10.3 (see below), but segfaults when I run the Ruby test, with:
dyld: ruby Undefined symbols:
___gxx_personality_v0
Trace/BPT trap
Let's take it off-list unless this rings any bells for anyone.
alex
SCIPIUS:~/icu4r alex$ make clean; make; ruby test/test_ustring.rb
gcc -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c ustring.c
Usually this means C++ code compiled by different versions of gcc has been
linked together. Check that the extension and all the libraries it links
against were compiled with the same gcc.
Thanks
Michal