French sentences appearing weird in Rails Website


UA

May 15, 2013, 7:30:14 AM
to Ruby on Rails: Talk
I have a Rails app. One of my clients is importing French text, which
is displaying incorrectly. See the example below:

1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il
au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"

Can someone assist please?

I am thinking along the following lines:

2. str = str.gsub('"', '')

3. **Need to add a line which replaces \\ in the str above with just \**

4. str = str.force_encoding("iso-8859-1")

5. str = str.encode('UTF-8')

In step 3, I was thinking of something like

str = str.gsub(/\\\\/, "\\")

Or, if possible, somehow capture the output of puts (or a similar
function) back into str. For example:

> puts str

---

French: 3. Combien de r\xC3\xA9gions y a-t-il au Cameroon?

English: 3. How many regions are there in Cameroon?

Even that would work for me. Can someone please assist?

tamouse mailing lists

May 16, 2013, 4:46:12 AM
to rubyonra...@googlegroups.com
On Wed, May 15, 2013 at 6:30 AM, UA <ritv...@gmail.com> wrote:
> I have a Rails app. One of my clients is importing French Text which
> is appearing weirdly. Check below example:
>
> 1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il
> au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"
>
> Can someone assist please?

Wow, this took a while to suss out. I really hate character encodings
and translations, but here we are.

So, the problem basically lies in the fact that the encoded character
is doubly escaped:

irb(main):159:0> '\xC3\xA9'
=> "\\xC3\\xA9"

whereas the other characters are escaped just once:

irb(main):160:0> "\n"
=> "\n"
irb(main):161:0> "\""
=> "\""

What I came up with seems sort of kludgy:

1. Double escape the singly-escaped characters:

irb(main):166:0> new_str = str.gsub(/\"/,'\\"').gsub(/\n/,'\\n')
=> "--- \\nFrench: \\\"3. Combien de r\\xC3\\xA9gions y a-t-il au
Cameroon?\\\"\\nEnglish: 3. How many regions are there in
Cameroon?\\n"

2. Run it through an eval:

irb(main):167:0> eval "new_str = \"#{new_str}\""
=> "--- \nFrench: \"3. Combien de régions y a-t-il au
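
A possible alternative that avoids eval on imported data, offered as a sketch rather than a tested fix for the OP's app: turn each literal \xHH sequence into the byte it names, then tell Ruby the result is UTF-8. The helper name unescape_hex is hypothetical, and the sketch assumes every \xHH in the input really is one byte of UTF-8 text (the OP's wrapped line is joined with a space here).

```ruby
# The OP's string: the French text contains the literal characters
# backslash, x, C, 3, ... rather than the bytes 0xC3 0xA9.
str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\n" \
      "English: 3. How many regions are there in Cameroon?\n"

# unescape_hex is a hypothetical helper, not part of Ruby or Rails.
def unescape_hex(text)
  text.b                                              # binary copy, safe for raw bytes
      .gsub(/\\x([0-9A-Fa-f]{2})/) { $1.hex.chr }     # literal \xC3 -> byte 0xC3
      .force_encoding(Encoding::UTF_8)                # reinterpret the bytes as UTF-8
end

fixed = unescape_hex(str)
puts fixed
puts fixed.valid_encoding?
```

Unlike the eval approach, this never executes the imported text as Ruby code, which matters when the data comes from a client upload. It would also rewrite any \xHH that was meant to stay as literal text, so check valid_encoding? on the result before trusting it.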

Matt Jones

May 16, 2013, 9:23:44 AM
to rubyonra...@googlegroups.com


On Wednesday, 15 May 2013 07:30:14 UTC-4, UA wrote:
> I have a Rails app. One of my clients is importing French Text which
> is appearing weirdly. Check below example:
>
> 1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il
> au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"
>
> Can someone assist please?


Where is this text coming from? Because that string looks like YAML, complete with the opening "---". \xC3\xA9 is the UTF-8 encoding of codepoint U+00E9, "small letter e with acute", something you'd expect in French text.

If you do `YAML.load(str)` in 1.9 or higher, this is what appears:

irb: YAML.load(str)
===> {"French"=>"3. Combien de régions y a-t-il  au Cameroon?", "English"=>"3. How many regions are there in Cameroon?"}
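
Since the result may depend on which YAML parser a given Ruby uses, a quick way to check what your own setup returns is to load the string and list the non-ASCII characters that come back. This is only a diagnostic sketch (the OP's wrapped line is joined with a space); it makes no claim about what you will see:

```ruby
require "yaml"

str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\n" \
      "English: 3. How many regions are there in Cameroon?\n"

data   = YAML.load(str)   # parses the two "key: value" lines into a Hash
french = data["French"]

# Collect the characters outside ASCII and print their code points.
non_ascii = french.each_char.reject(&:ascii_only?)

puts french.encoding
puts non_ascii.map { |ch| format("U+%04X", ch.ord) }.join(" ")
```

A single U+00E9 means the escape was decoded as the UTF-8 bytes of "é"; anything else means the parser read the two \xHH escapes differently.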

--Matt Jones

tamouse mailing lists

May 16, 2013, 11:30:01 PM
to rubyonra...@googlegroups.com

That's what I thought originally, too.

When I copied the OP's string as written, and fed it to YAML.load, it flubbed the translation, reversing the byte order. As far as I can tell, I have UTF-8 set everywhere.

So I'm not sure why it works for you but not for me...
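
One possible explanation, which is an assumption and not something this thread confirms: the YAML specification defines \xNN inside a double-quoted scalar as an escape for the code point U+00NN, so a parser that follows it would return the two characters U+00C3 and U+00A9 ("Ã©") instead of the single character "é". If that is what happened, a round trip through ISO-8859-1 can recover the intended text. The helper name repair_mojibake below is hypothetical:

```ruby
# repair_mojibake is a hypothetical helper, not part of Ruby or Rails.
# It assumes the damage is "UTF-8 bytes read as one Latin-1 character each".
def repair_mojibake(text)
  # Map each character back to a single byte, then read those bytes as UTF-8.
  candidate = text.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  # Keep the repaired version only if it is well-formed UTF-8.
  candidate.valid_encoding? ? candidate : text
rescue EncodingError
  # Characters outside Latin-1 cannot be this kind of mojibake; leave as-is.
  text
end

damaged = "r\u00C3\u00A9gions"   # what a code-point reading of \xC3\xA9 yields
correct = "r\u00E9gions"         # already fine

puts repair_mojibake(damaged)
puts repair_mojibake(correct)
```

Already-correct text comes back unchanged because its Latin-1 bytes do not form valid UTF-8. The check is a heuristic, though: text that legitimately contains a sequence such as "Ã©" would be altered too.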
