iconv problems with different machines

Raymond O'Connor

Dec 5, 2007, 5:25:16 AM

Hi,

I have the following piece of code:

require 'iconv'
ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
puts ic.iconv("Aüthor")

1. on my local machine (OSX 10.5) when I run this, I get the output:
A"uthor

2. when I run this same code on my debian server (via rake executed
through a capistrano task) I get the output: A?thor

3. when I run this same code on my debian server (via irb), I get:
Author

Both 1 and 3 are acceptable output to me; however, I can't figure out how
to get my program to output the correct result on my server when I run
it through a capistrano task. Is there some environment variable I need
to set? From reading other posts, I've tried adding this at the top of my
file:
$KCODE = "u"
require 'jcode'
ENV['LANG'] = 'en_US.UTF-8'
ENV['LC_CTYPE'] = 'en_US.UTF-8'

That still doesn't fix the issue. Any help would be greatly appreciated.

Thanks,
Ray
--
Posted via http://www.ruby-forum.com/.
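
One way to narrow down a difference like the one between cases 2 and 3 is to print the locale-related environment from inside the program in each context and compare the results. The following is only a sketch of that idea; locale_report is a made-up helper name, and the three variable names are the usual POSIX ones rather than anything confirmed in this thread to be the cause.

```ruby
# Report the locale-related environment variables the process actually sees.
# locale_report is a hypothetical helper name, not part of any library.
# The env parameter defaults to the real environment; a plain Hash also works.
def locale_report(env = ENV)
  %w[LANG LC_ALL LC_CTYPE].map { |name| "#{name}=#{env[name].inspect}" }
end

puts locale_report
```

If the capistrano run prints nil or "C" where the irb session prints a UTF-8 locale, that would at least suggest the locale is involved, though it does not prove it.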

Raymond O'Connor

Dec 5, 2007, 6:14:24 AM

Actually I found some other posts about this same issue from a while
ago... It appears there's no solution.

I stopped using the iconv library and instead switched to the iconv
system command and that seems to work. Not the best solution, but at
least it works....
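
A rough sketch of what shelling out to the iconv command might look like, assuming an iconv binary on the PATH that understands //TRANSLIT. The post does not show the actual code used, so the helper name translit_via_command and the use of Open3 are illustrative choices only.

```ruby
require 'open3'

# Pipe a UTF-8 string through the external iconv command and return its output.
# translit_via_command is a hypothetical name; this assumes `iconv` is on PATH.
def translit_via_command(str)
  out, status = Open3.capture2(
    'iconv', '-f', 'UTF-8', '-t', 'US-ASCII//TRANSLIT',
    stdin_data: str
  )
  raise "iconv exited with status #{status.exitstatus}" unless status.success?
  out
end
```

Calling translit_via_command("Aüthor") returns whatever the command-line tool produces on that machine, so the result can still differ between systems.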

Xavier Noria

Dec 5, 2007, 6:27:14 AM

On Dec 5, 2007, at 12:14 PM, Raymond O'Connor wrote:

> Actually I found some other posts about this same issue from awhile
> ago... Appears there's no solution.
>
> I stopped using the iconv library and instead switched to the iconv
> system command and that seems to work. Not the best solution, but at
> least it works....

I have not been able to work out exactly where the difference lies,
but it looks like the transliteration tables simply differ depending on
the system, the iconv version, or something similar. At ASPgems we wrote
this hand-crafted normalizer, which we know is portable (note that it
uses Rails #chars and does a bit more stuff, but you get the idea):

def self.normalize(str)
  return '' if str.nil?
  # Rails' multibyte-aware #chars proxy takes care of downcasing non-ASCII text
  n = str.chars.downcase.strip.to_s
  # ą is a-ogonek, so it belongs with the other variants of 'a'
  n.gsub!(/[àáâãäåāăą]/, 'a')
  n.gsub!(/æ/, 'ae')
  n.gsub!(/[ďđ]/, 'd')
  n.gsub!(/[çćčĉċ]/, 'c')
  n.gsub!(/[èéêëēęěĕė]/, 'e')
  n.gsub!(/ƒ/, 'f')
  n.gsub!(/[ĝğġģ]/, 'g')
  n.gsub!(/[ĥħ]/, 'h')
  # į (i-ogonek) and ı (dotless i) are variants of 'i', not 'j'
  n.gsub!(/[ìíîïīĩĭįı]/, 'i')
  n.gsub!(/ĳ/, 'ij')
  n.gsub!(/ĵ/, 'j')
  n.gsub!(/[ķĸ]/, 'k')
  n.gsub!(/[łľĺļŀ]/, 'l')
  n.gsub!(/[ñńňņŉŋ]/, 'n')
  n.gsub!(/[òóôõöøōőŏ]/, 'o')
  n.gsub!(/œ/, 'oe')
  n.gsub!(/[ŕřŗ]/, 'r')
  n.gsub!(/[śšşŝș]/, 's')
  n.gsub!(/[ťţŧț]/, 't')
  n.gsub!(/[ùúûüūůűŭũų]/, 'u')
  n.gsub!(/ŵ/, 'w')
  n.gsub!(/[ýÿŷ]/, 'y')
  n.gsub!(/[žżź]/, 'z')
  # collapse runs of whitespace into a single space
  n.gsub!(/\s+/, ' ')
  # drop anything that is not a space, a-z, 0-9, '_', '/' or '-'
  n.delete!('^ a-z0-9_/\\-')
  n
end

-- fxn
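
For comparison, the same goal can be approached in plain Ruby without Rails by decomposing characters and dropping the accents. This is a different technique from the table above, not the ASPgems code: it assumes Ruby 2.2 or later for String#unicode_normalize, and the helper name ascii_fold and its small fallback table are invented for illustration and far from exhaustive.

```ruby
# Fold accented Latin letters to ASCII via Unicode decomposition.
# ascii_fold is a hypothetical helper; requires Ruby 2.2+ (unicode_normalize).
def ascii_fold(str)
  return '' if str.nil?

  # Letters with no canonical decomposition need explicit entries.
  # Illustrative only; a real table would be much longer.
  special = { 'æ' => 'ae', 'œ' => 'oe', 'ø' => 'o', 'ł' => 'l', 'đ' => 'd', 'ß' => 'ss' }

  decomposed = str.unicode_normalize(:nfd)   # split base letters from accents
  stripped   = decomposed.gsub(/\p{Mn}/, '') # remove the combining marks
  # Map whatever non-ASCII remains through the table, or to '?' if unknown.
  stripped.gsub(/[^[:ascii:]]/) { |ch| special.fetch(ch, '?') }
end

puts ascii_fold("Aüthor")
```

Unlike the normalizer above, this sketch does not downcase, collapse whitespace, or remove punctuation; it only handles the accent folding step.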

Raymond O'Connor

Dec 5, 2007, 3:05:58 PM

Hi Xavier,

I like that solution even better. Thanks for sharing!

Best,
Ray

marc

Dec 5, 2007, 4:43:15 PM

Raymond O'Connor said...

I've found a lot of bugs with the MRI Iconv and now only use it with
JRuby - which, I suspect, uses the Java SE convertors.

--
Cheers,
Marc