Foreign characters behaving oddly

Matthew White

Jul 16, 2007, 5:31:14 PM
Hello,
I have a website that is supposed to take a French word and return the
English translation. The front-end has an AJAX script that dynamically
POSTs the value to the backend:

function post() {
var string = document.getElementById("string").value;
var poststr = "string=" + encodeURI( string );
makePOSTRequest('dict.eng.php', poststr);
}

Then the backend takes the string and queries a database for the 30 words
most like that word:

$query = "SELECT * FROM dictionary WHERE fr like ('" . $string . "%') ORDER
BY fr LIMIT 30";
$query = mysql_query($query);

If I enter in a word like "bonjour", the script returns the words that are
most like bonjour. A word with a special character, like "français", will
return no values, even though it is in the dictionary. The page is in
UTF-8, and the database, tables, and fields are all utf8_bin. Can anyone
please point me in the right direction?

Matt


eisenstein

Jul 17, 2007, 4:44:51 AM

First try setting the connection encoding to utf8 (SET NAMES utf8)
before doing any DB transactions:

mysql_query("SET NAMES 'utf8'");

Also be aware that not all PHP functions can handle Unicode strings.

eisenstein

Markus

Jul 17, 2007, 1:01:37 PM
Matthew White wrote:

Your Ajax function does encodeURI( string ) - do you decode it somewhere
before you do the database query? You can check this with var_dump($string).

Anyway, as you are doing a POST request, I would actually try to go without
URL-encoding the string (this is needed with the GET method).
JavaScript's encodeURI()/decodeURI() and PHP's urlencode()/urldecode()
may behave differently. I'd rather try to have your function send
the data as UTF-8. In a normal form, this would be done with the
accept-charset="UTF-8" attribute in the form tag; I don't know whether
this also works when sending data with your method.
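
To make the encoding question concrete, here is a minimal sketch that runs in Node.js or a browser console. It is not the original poster's code: buildPostBody is a hypothetical helper, the sample inputs are made up, and only the field name "string" comes from the post. It assumes the body is sent as application/x-www-form-urlencoded, in which case the value still needs percent-encoding, and encodeURIComponent() is usually the safer choice because encodeURI() leaves "&" and "=" untouched:

```javascript
// Hypothetical helper; "string" is the field name used in the original post.
function buildPostBody(value) {
  return "string=" + encodeURIComponent(value);
}

// encodeURI() keeps reserved characters such as "&" and "=" as they are,
// so a value containing them would be split into extra form fields.
const withEncodeURI = "string=" + encodeURI("thé & café");
const withComponent = buildPostBody("thé & café");

console.log(withEncodeURI); // string=th%C3%A9%20&%20caf%C3%A9
console.log(withComponent); // string=th%C3%A9%20%26%20caf%C3%A9
console.log(buildPostBody("français")); // string=fran%C3%A7ais
```

Both functions percent-encode "ç" as the UTF-8 bytes %C3%A7, so for a word like "français" the two produce the same body.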

HTH
Markus

Matthew White

Jul 17, 2007, 2:31:56 PM
I added that query right after connecting to the database, and the search now
works fine, but here is a new problem: "français" returns three matches:
franÃ§ais
franÃ§aise
franÃ§aises

Why is "Ã§" being substituted for "ç", even when I pass each returned string
through htmlentities()?

Matt

"eisenstein" <stefan....@gmail.com> wrote in message
news:1184661891.7...@d30g2000prg.googlegroups.com...

Matthew White

Jul 17, 2007, 2:33:05 PM
Well, the AJAX script passes the string correctly, because the PHP script
picks it up without problem. The issue seems to be with MySQL (see
eisenstein's post above).

"Markus" <derernst@NO#SP#AMgmx.ch> wrote in message
news:469cedd6$1...@news.cybercity.ch...

Allodoxaphobia

Jul 17, 2007, 7:25:19 PM
On Mon, 16 Jul 2007 21:31:14 GMT, Matthew White posted:

> Subject: Foreign characters behaving oddly

I need to mention that around here the foreign characters behave quite
normally. It's the local characters that seem to be behaving oddly.
Don't *even* get me started about the elected characters.

Rik

Jul 17, 2007, 8:02:53 PM
On Tue, 17 Jul 2007 20:31:56 +0200, Matthew White <mgw...@msn.com> wrote:

> I added that query right after calling the database, and it now works
> fine,
> but here is a problem- "français" returns three matches:
> franÃ§ais
> franÃ§aise
> franÃ§aises
>
> Why is "Ã§" being substituted for "ç", even when I pass each returned
> string
> through htmlentities()?

Well, it's clearly not interpreted as UTF8 as it should be. Maybe use
iconv to ensure all internal encoding is in utf8?

http://www.php.net/iconv
--
Rik Wasmus

Matthew White

Jul 17, 2007, 8:29:41 PM
I tried the iconv, both for internal and external, but to no avail. I also
added in the mysql_query that set UTF-8, and I have also set htmlentities
with the third argument of "utf-8". The output is still corrupted.

Matt

"Rik" <luiheid...@hotmail.com> wrote in message
news:op.tvmvq3jkqnv3q9@metallium...

Markus

Jul 18, 2007, 3:19:50 AM
Matthew White wrote:

> "Rik" <luiheid...@hotmail.com> wrote in message
> news:op.tvmvq3jkqnv3q9@metallium...
>> On Tue, 17 Jul 2007 20:31:56 +0200, Matthew White <mgw...@msn.com> wrote:
>>
>>> I added that query right after calling the database, and it now works
>>> fine,
>>> but here is a problem- "français" returns three matches:
>>> franÃ§ais
>>> franÃ§aise
>>> franÃ§aises
>>>
>>> Why is "Ã§" being substituted for "ç", even when I pass each returned
>>> string
>>> through htmlentities()?
>>
>> Well, it's clearly not interpreted as UTF8 as it should be. Maybe use
>> iconv to ensure all internal encoding is in utf8?
>>
>> http://www.php.net/iconv

> I tried the iconv, both for internal and external, but to no avail. I
> also added in the mysql_query that set UTF-8, and I have also set
> htmlentities with the third argument of "utf-8". The output is still
> corrupted.

It looks like your string is in UTF-8 encoding, but the output is
converted to Latin-1 or whatever. Check the following points:

1. All scripts (PHP, HTML) are in UTF-8 encoding

2. Send UTF-8 header to the browser:
header('Content-Type: text/html; charset=UTF-8');

3. Set also the appropriate Meta tag in the HTML source (should not be
necessary if correct header is sent, but you never know about browsers):
<meta http-equiv="content-type" content="text/html;charset=UTF-8">


BTW, Please get used to bottom-posting when you correspond with
newsgroups and mailing lists (add your answer below the text you quote,
rather than above as you do in normal e-mail).

HTH
Markus

Matthew White

Jul 18, 2007, 10:07:35 AM
"Markus" <derernst@NO#SP#AMgmx.ch> wrote in message
news:469db6f9$1...@news.cybercity.ch...

I had already made sure of the first and last, but I did add the header() to
my PHP file. It has made no difference in the output.

Matt

Good Man

Jul 18, 2007, 11:58:53 AM
"Matthew White" <mgw...@msn.com> wrote in
news:H8pni.8062$fj5.7565@trnddc08:

> "Markus" <derernst@NO#SP#AMgmx.ch> wrote in message
> news:469db6f9$1...@news.cybercity.ch...
>> Matthew White wrote:
>>
>>> "Rik" <luiheid...@hotmail.com> wrote in message
>>> news:op.tvmvq3jkqnv3q9@metallium...
>>>> On Tue, 17 Jul 2007 20:31:56 +0200, Matthew White <mgw...@msn.com>
>>>> wrote:
>>>>
>>>>> I added that query right after calling the database, and it now
>>>>> works fine,
>>>>> but here is a problem- "français" returns three matches:
>>>>> franÃ§ais
>>>>> franÃ§aise
>>>>> franÃ§aises
>>>>>
>>>>> Why is "Ã§" being substituted for "ç", even when I pass each
>>>>> returned string
>>>>> through htmlentities()?
>

> I had already made sure of the first and last, but I did add the
> header() to my PHP file. It has made no difference in the output.

Sorry to see this struggle go on for days!

I know some versions of MySQL were buggy when mixing collation types,
and perhaps that is a clue to your problem. Have you looked into using
COLLATE in your SQL query? Not sure if it's the right tree to bark up,
but hey, it's another tree:
http://dev.mysql.com/doc/refman/5.1/en/charset-collations.html

and then further back,
http://dev.mysql.com/doc/refman/5.1/en/charset.html

good luck


Markus

Jul 19, 2007, 8:29:07 AM
Matthew White wrote:

> "Markus" <derernst@NO#SP#AMgmx.ch> wrote in message
> news:469db6f9$1...@news.cybercity.ch...
>> Matthew White wrote:
>>
>>> "Rik" <luiheid...@hotmail.com> wrote in message
>>> news:op.tvmvq3jkqnv3q9@metallium...
>>>> On Tue, 17 Jul 2007 20:31:56 +0200, Matthew White <mgw...@msn.com>
>>>> wrote:
>>>>
>>>>> I added that query right after calling the database, and it now
>>>>> works fine,
>>>>> but here is a problem- "français" returns three matches:
>>>>> franÃ§ais
>>>>> franÃ§aise
>>>>> franÃ§aises
>>>>>
>>>>> Why is "Ã§" being substituted for "ç", even when I pass each
>>>>> returned string
>>>>> through htmlentities()?
[...]

>> It looks like your string is in UTF-8 encoding, but the output is
>> converted to Latin-1 or whatever. Check the following points:
>>
>> 1. All scripts (PHP, HTML) are in UTF-8 encoding
>>
>> 2. Send UTF-8 header to the browser:
>> header('Content-Type: text/html; charset=UTF-8');
>>
>> 3. Set also the appropriate Meta tag in the HTML source (should not be
>> necessary if correct header is sent, but you never know about browsers):
>> <meta http-equiv="content-type" content="text/html;charset=UTF-8">
[...]

>
> I had already made sure of the first and last, but I did add the
> header() to my PHP file. It has made no difference in the output.

Hum... if you don't find the solution in the links posted by Good Man,
you could try to add

ini_set('default_charset', 'utf-8');

to your PHP script (somewhere at the top); but I also think it is rather
a MySQL issue now. BTW, which MySQL version do you use?

One possible reason is that the DB contents that existed before you
added mysql_query("SET NAMES 'utf8'") are now returned distorted: you
entered them without telling the DB that they are UTF-8, so "ç" was
stored as "Ã§", which is now returned in proper UTF-8 encoding. To
test this, run the same test with data you entered after you added the
"SET NAMES" query.

Anyway, if this is the case, it is likely that your original problem
will reappear for all data entered with the proper SET NAMES setting!
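
The double-encoding effect described above can be reproduced without a database. The following is only a sketch, assuming Node.js (Buffer is a Node API) and assuming that the connection treated each incoming UTF-8 byte as one Latin-1 character; the variable names are made up for illustration:

```javascript
// Node.js sketch; Buffer is not available in browsers.
const original = "français";

// UTF-8 bytes read as if they were latin1: every byte becomes one character,
// so the two bytes of "ç" (0xC3 0xA7) turn into "Ã" and "§".
const misread = Buffer.from(original, "utf8").toString("latin1");
console.log(misread); // franÃ§ais

// Reversing the mistake: map the characters back to single bytes,
// then decode those bytes as UTF-8.
const repaired = Buffer.from(misread, "latin1").toString("utf8");
console.log(repaired); // français
```

If rows were stored in the misread form, a search for "français" sent over a correctly configured connection would not match them as written, which is why re-importing the data with the right connection encoding is the cleaner fix.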

Nis Jørgensen

Jul 19, 2007, 8:34:52 AM
Matthew White wrote:

> I added that query right after calling the database, and it now works fine,
> but here is a problem- "français" returns three matches:
> franÃ§ais
> franÃ§aise
> franÃ§aises
>
> Why is "Ã§" being substituted for "ç", even when I pass each returned
> string
> through htmlentities()?

By default, htmlentities() will interpret what comes from the database as
ISO-8859-1, while it is in fact UTF-8.

Either use
htmlentities($myvar, ENT_QUOTES, 'utf-8')
or
htmlspecialchars($myvar)

I recommend the second option - if your output is utf-8, you should
hardly ever need htmlentities.
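
As a rough illustration of why the second option is enough, here is a JavaScript sketch of roughly what htmlspecialchars() does when called with ENT_QUOTES. The name escapeHtml is hypothetical and this is not PHP's implementation; the point is only that the five HTML-significant ASCII characters are replaced, while "ç" passes through untouched and is displayed correctly as long as the page is served as UTF-8:

```javascript
// Hypothetical helper: escapes only the characters that are special in HTML.
function escapeHtml(text) {
  const map = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;",
  };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

console.log(escapeHtml("<b>français</b>")); // &lt;b&gt;français&lt;/b&gt;
```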

Nis

Matthew White

Jul 19, 2007, 3:02:29 PM
"Matthew White" <mgw...@msn.com> wrote in message
news:CsRmi.2399$s25.1211@trndny04...

Retracing my steps, I opened up the MySQL database, only to find those
values were corrupted. After adding in mysql_query("SET NAMES 'utf8'") to
the script that parses the dictionary file, I was able to make everything
work well. Thanks for everyone's help!

Matt
