Help with str.sub(pattern, replacement) and French characters


Mitchell Gould

Jan 23, 2011, 10:04:58 AM
to rails...@googlegroups.com
I am trying to search my database, which has product names with French
accents. They are encoded using HTML entity codes such as &eacute;
etc.

If a user enters a word with a French accent in the search box, I must
convert it to the HTML entity code so it can be found in the database.

So I thought to use str.sub(pattern, replacement) => new_str

However, when I try this using product.sub('é','&eacute;'), for example
in the following call:

find(:all, :select => 'product_id, name', :order => "name", :conditions
=> ["name like ? and locale =?", "%#{product.sub('é','&eacute;')}%",
I18n.locale])

and I enter 'é' in the search box, I get the following:

SELECT product_id, name FROM `product_descriptions` WHERE (name like
'%é%' and locale ='en')

so it does not replace the 'é' with '&eacute;'.

But if I change the letter in the pattern from 'é' to 'e' and do a
search for 'e', I get the following:

SELECT product_id, name FROM `product_descriptions` WHERE (name like
'%&eacute;%' and locale ='en')

so the replacement works.

Can anyone explain why it won't work for the accented character?

Thank you in advance.
Mitch

--
Posted via http://www.ruby-forum.com/.

Henrik Nyh

Jan 23, 2011, 10:27:32 AM
to rails...@googlegroups.com
Ideally, you shouldn't have HTML entities in the database. If you need them in your HTML (and you don't, if you set an explicit encoding, except for things like &<>) then you should add them outside the database.

If you do have "é" stored in the database, not as an entity, I believe MySQL's "LIKE" will be accent-insensitive by default, unless you use "COLLATE utf8_bin" (google for details).

Note that if you use "sub", you will only replace the first occurrence in the string. You probably want "gsub".
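To illustrate the difference, here is a minimal sketch. The product name "été" is a made-up value chosen only because it contains the accented letter twice; it is not from the original poster's data.

```ruby
# Hypothetical search term containing the accented letter twice.
name = "été"

# sub replaces only the first match.
first_only = name.sub("é", "&eacute;")

# gsub replaces every match.
all_matches = name.gsub("é", "&eacute;")

puts first_only   # &eacute;té
puts all_matches  # &eacute;t&eacute;
```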

And if you do something like "bléh".sub("é", "&eacute;") and it doesn't replace the "é", the issue could be how the "é" is represented. In UTF-8, accented characters can be represented either composed as a single glyph ("latin small letter e with acute") or decomposed as two glyphs: "latin small letter e" + "combining acute accent". So if your string contains the first type of é and your sub/gsub tries to replace the other type, it won't work. You can normalize the string to ensure everything is composed or decomposed, but it would be better not to have entities in the database.
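A rough sketch of the composed/decomposed mismatch described above, for anyone who wants to check it on their own strings. It assumes Ruby 2.2 or later, where String#unicode_normalize is available (that is newer than this thread; older Rails apps would need a different normalization helper). The sample strings are made up for illustration.

```ruby
# Two strings that look identical on screen but differ in code points.
composed   = "bl\u00E9h"   # U+00E9: latin small letter e with acute
decomposed = "ble\u0301h"  # "e" followed by U+0301: combining acute accent

puts composed == decomposed    # false
puts composed.bytes.length     # 5
puts decomposed.bytes.length   # 6

# A pattern written in composed form does not match the decomposed string.
unchanged = decomposed.gsub("\u00E9", "&eacute;")

# Normalizing to NFC (composed form) first makes the pattern match.
replaced = decomposed.unicode_normalize(:nfc).gsub("\u00E9", "&eacute;")

puts unchanged == decomposed   # true
puts replaced                  # bl&eacute;h
```

Comparing the output of String#bytes for the search term and for the pattern literal is a quick way to see whether this is what is happening in a given case.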




Mitchell Gould

Jan 23, 2011, 6:14:43 PM
to rails...@googlegroups.com
Henrik --- wrote in post #976928:

> Ideally, you shouldn't have HTML entities in the database. If you need
> them in your HTML (and you don't, if you set an explicit encoding, except
> for things like &<>) then you should add them outside the database.
>
> If you do have "é" stored in the database, not as an entity, I believe
> MySQL's "LIKE" will be accent-insensitive by default, unless you use
> "COLLATE utf8_bin" (google for details).
>
> Note that if you use "sub", you will only replace the first occurrence
> in the string. You probably want "gsub".
>
> And if you do something like "bléh".sub("é", "&eacute;") and it doesn't
> replace the "é", the issue could be how the "é" is represented. In UTF-8,
> accented characters can be represented either composed as a single glyph
> ("latin small letter e with acute") or decomposed as two glyphs: "latin
> small letter e" + "combining acute accent". So if your string contains
> the first type of é and your sub/gsub tries to replace the other type, it
> won't work. You can normalize the string to ensure everything is composed
> or decomposed, but it would be better not to have entities in the database.

OK, I will take your advice and remove the HTML entities from the
database.
The reason I put them in was that even with explicit encoding I was
not getting the characters to show properly. I was getting a black
triangle with a question mark.

Could you assist me on how to encode the web page so that it shows the
accents? I thought you just use UTF-8?

Thanks for your help. I really appreciate it.

Andrés Mejía

Jan 23, 2011, 6:19:04 PM
to rails...@googlegroups.com
Yes, encode the file in UTF-8 and add a tag like this in your head section:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


Henrik Nyh

Jan 23, 2011, 6:20:55 PM
to rails...@googlegroups.com
On Mon, Jan 24, 2011 at 00:14, Mitchell Gould <li...@ruby-forum.com> wrote:
> Ok I will take your advise and remove the html entities from the
> database.
> The reason I put them in was because even with explicit encoding I was
> not getting the characters to show properly. I was getting a black
> triangle with a question mark.
>
> Could you assist me on how to encode the web page so that it shows the
> accents. I thought you just use UTF-8?

Check what your browser thinks the encoding is. Check that UTF-8 is declared in the HTTP headers or a meta element (and if they disagree, I'm not entirely sure what goes - research that). http://htmlpurifier.org/docs/enduser-utf8.html has some info.

Also ensure the font you're using can handle that glyph. I would guess most fonts can display é. But if everything else looks right, try some standard font like Times and see what happens.

Mitchell Gould

Jan 24, 2011, 2:39:12 AM
to rails...@googlegroups.com
I removed some HTML entities from my database to test the effect. I made
sure my web page is UTF-8 encoded.

Now instead of "électronique" I get name: "\xC9lectronic", where
the "\xC9" displays as a black triangle with a "?" in it.

I also changed the font to Times.

I read up and learned that MySQL might be delivering the characters in a
format other than UTF-8.

I changed my database, table, and field to be UTF-8.

I still get the same problem as stated above.

What gives?

Thanks in advance

Mitch
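For readers hitting the same symptom: a lone "\xC9" byte is how ISO-8859-1 (Latin-1) encodes "É", and it is not a valid UTF-8 sequence on its own, which would explain the replacement character in the browser. The sketch below assumes the bytes arrive over a Latin-1 connection; that is a guess consistent with the output above, not something confirmed in this message. The sample value is made up.

```ruby
# Hypothetical raw bytes as they might arrive from a Latin-1 connection.
raw = "\xC9lectronique".b

# Interpreted as UTF-8, the byte sequence is invalid, so a browser
# shows the replacement character instead of the letter.
as_utf8 = raw.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?   # false

# Interpreted as ISO-8859-1 and transcoded, the expected text comes back.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed                     # Électronique
puts fixed.valid_encoding?     # true
```

Transcoding like this is only a diagnostic; the cleaner fix is to have the database connection deliver UTF-8 in the first place.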

Mitchell Gould

Jan 24, 2011, 2:55:03 AM
to rails...@googlegroups.com
Hi,
I figured it all out. I need to explicitly tell Rails that the database
is using utf8 encoding by putting the following in the database.yml file:

encoding: utf8

Now it displays perfectly.

I hope this is still in line with best practices as I don't want to mess
this up again.

Thanks

Mitch
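For context, a small sketch of where that key would sit. The environment name, adapter, and database name below are placeholders invented for illustration; only the encoding line comes from the message above. The snippet parses the YAML from Ruby just to show that the key is read as part of one environment's settings.

```ruby
require "yaml"

# Hypothetical database.yml contents. Everything except the
# "encoding: utf8" line is a placeholder, not the poster's real config.
CONFIG = <<~YAML
  development:
    adapter: mysql2
    database: shop_development
    encoding: utf8
YAML

settings = YAML.safe_load(CONFIG)

# The encoding key lives alongside the other per-environment settings.
puts settings["development"]["encoding"]   # utf8
```

One thing to keep in mind: MySQL's utf8 character set stores at most three bytes per character, so it covers French accents but not every Unicode character; utf8mb4 is the full four-byte variant.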
