function post() {
var string = document.getElementById("string").value;
var poststr = "string=" + encodeURI( string );
makePOSTRequest('dict.eng.php', poststr);
}
The backend then takes the string and queries a database for the 30 words
most like that word:
$query = "SELECT * FROM dictionary WHERE fr like ('" . $string . "%') ORDER
BY fr LIMIT 30";
$query = mysql_query($query);
If I enter in a word like "bonjour", the script returns the words that are
most like bonjour. A word with a special character, like "français", will
return no values, even though it is in the dictionary. The page is in
UTF-8, and the database, tables, and fields are all utf8_bin. Can anyone
please point me in the right direction?
Matt
Try defining the connection encoding as utf8 first (SET NAMES utf8),
before doing any DB transactions:
mysql_query("SET NAMES 'utf8'");
Be aware that not all PHP functions can handle Unicode strings.
eisenstein
Your Ajax function does encodeURI( string ) - do you decode it somewhere
before you do the database query? You can check this with var_dump($string).
Anyway, as you do a POST request, I would actually try to go without
urlencoding the string (this is needed with the GET method).
Javascript's encodeURI()/decodeURI() and PHP's urlencode()/urldecode()
may have different behaviours. I'd rather try to have your function send
the data as UTF-8. In a normal form, this would be done with the
accept-charset="UTF-8" attribute in the form tag; I don't know whether
this also works when sending data with your method.
HTH
Markus
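A minimal sketch of what the percent-encoding step produces, in JavaScript
like the original post() function. It assumes makePOSTRequest (not shown in
the thread) sends the body as application/x-www-form-urlencoded, in which
case PHP decodes the value into $_POST on its own. buildPostBody is a
hypothetical helper, not part of the poster's code.

```javascript
// Hypothetical helper: builds the POST body for a single "string" field.
function buildPostBody(value) {
  // encodeURIComponent also escapes & = and +, which encodeURI leaves alone.
  return "string=" + encodeURIComponent(value);
}

const word = "fran\u00e7ais"; // "français"
const body = buildPostBody(word);
console.log(body); // string=fran%C3%A7ais

// Decoding the value restores the original characters.
const decoded = decodeURIComponent(body.slice("string=".length));
console.log(decoded === word); // true

// encodeURI keeps reserved characters, so such a value would split the field.
const looseEncoded = encodeURI("a&b=c");           // a&b=c
const strictEncoded = encodeURIComponent("a&b=c"); // a%26b%3Dc
console.log(looseEncoded, strictEncoded);
```

The escapes are built from the UTF-8 bytes of the string, so "ç" travels as
%C3%A7 regardless of the page encoding.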
Why is "Ã§" being substituted for "ç", even when I pass each returned string
through htmlentities()?
Matt
> Subject: Foreign characters behaving oddly
I need to mention that around here the foreign characters behave quite
normally. It's the local characters that seem to be behaving oddly.
Don't *even* get me started about the elected characters.
> I added that query right after calling the database, and it now works
> fine,
> but here is a problem- "français" returns three matches:
> français
> française
> françaises
>
> Why is "Ã§" being substituted for "ç", even when I pass each returned
> string
> through htmlentities()?
Well, it's clearly not interpreted as UTF8 as it should be. Maybe use
iconv to ensure all internal encoding is in utf8?
http://www.php.net/iconv
--
Rik Wasmus
>> On Tue, 17 Jul 2007 20:31:56 +0200, Matthew White <mgw...@msn.com> wrote:
> I tried the iconv, both for internal and external, but to no avail. I
> also added in the mysql_query that set UTF-8, and I have also set
> htmlentities with the third argument of "utf-8". The output is still
> corrupted.
It looks like your string is in UTF-8 encoding, but the output is
converted to Latin-1 or whatever. Check the following points:
1. All scripts (PHP, HTML) are in UTF-8 encoding
2. Send UTF-8 header to the browser:
header('Content-Type: text/html; charset=UTF-8');
3. Set also the appropriate Meta tag in the HTML source (should not be
necessary if correct header is sent, but you never know about browsers):
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
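To illustrate why the declared charset matters, here is a small sketch
(JavaScript, assuming Node.js for its Buffer class) that decodes the same
bytes twice: once as UTF-8 and once as ISO-8859-1. It only mimics what a
browser does when it guesses the wrong charset; it is not taken from the
poster's script.

```javascript
// UTF-8 bytes of "français"; ç (U+00E7) becomes the two bytes C3 A7.
const bytes = Buffer.from("fran\u00e7ais", "utf8");
console.log(bytes.length); // 9

// Decoded with the charset the bytes were written in.
const asUtf8 = bytes.toString("utf8");

// Decoded as ISO-8859-1: every byte is treated as one character.
const asLatin1 = bytes.toString("latin1");

console.log(asUtf8);   // français
console.log(asLatin1); // franÃ§ais
```

If the page shows "Ã§" where "ç" is expected, the bytes are probably
correct UTF-8 and only the declared or assumed charset is off.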
BTW, Please get used to bottom-posting when you correspond with
newsgroups and mailing lists (add your answer below the text you quote,
rather than above as you do in normal e-mail).
HTH
Markus
I had already made sure of the first and last, but I did add the header() to
my PHP file. It has made no difference in the output.
Matt
Sorry to see this struggle go on for days!
I know some versions of MySQL were buggy with mixing collation types,
and perhaps that is a clue to your problem. Have you looked into using
COLLATE in your SQL query? Not sure if it's the right tree to bark up,
but hey, it's another tree:
http://dev.mysql.com/doc/refman/5.1/en/charset-collations.html
and then further back,
http://dev.mysql.com/doc/refman/5.1/en/charset.html
good luck
Hum... if you don't find the solution in the links posted by Good Man,
you could try to add
ini_set('default_charset', 'utf-8');
to your PHP script (somewhere at the top); but I also think it is rather
a MySQL issue now. BTW, which MySQL version do you use?
One possible reason is that the db contents, that existed before you
added mysql_query("SET NAMES 'utf8'"), are now returned distorted, as
you entered them without telling the DB they are UTF-8, so "ç" was
stored as "Ã§", which will now be returned in proper UTF-8 encoding. To
test this, make the same test with data you entered after you added the
"SET NAMES" query.
Anyway, if this is the case, it is likely that your original problem
re-arises with all data entered with proper SET NAMES setting!
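The double-encoding scenario described here can be sketched as follows
(JavaScript, assuming Node.js). ISO-8859-1 stands in for MySQL's latin1
connection default, which is an assumption about the poster's setup; the
variable names are illustrative only.

```javascript
const original = "fran\u00e7ais"; // "français"

// Step 1: the client sends UTF-8 bytes, but the connection is assumed to be
// latin1, so each byte is taken for one character.
const misread = Buffer.from(original, "utf8").toString("latin1");

// Step 2: those characters are converted to UTF-8 for storage in the column.
const stored = Buffer.from(misread, "utf8");
console.log(stored.length); // 11 bytes instead of 9

// Reading the column back over a proper UTF-8 connection returns the mojibake.
console.log(stored.toString("utf8")); // franÃ§ais

// Repair: undo step 2, then undo step 1.
const repaired = Buffer.from(stored.toString("utf8"), "latin1").toString("utf8");
console.log(repaired === original); // true
```

Rows written this way stay corrupted until they are repaired or
re-imported over a connection with the right charset.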
Htmlentities will interpret what comes from the database as iso-8859-1,
while it is in fact utf-8.
Either use
htmlentities($myvar, ENT_QUOTES, 'utf-8')
or
htmlspecialchars($myvar)
I recommend the second option - if your output is utf-8, you should
hardly ever need htmlentities.
Nis
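A rough JavaScript illustration of the difference Nis describes (assuming
Node.js). escapeHtml approximates what htmlspecialchars() does with
ENT_QUOTES, and entitiesAssumingLatin1 mimics an entity encoder that
mistakes UTF-8 bytes for ISO-8859-1 characters. Both functions are
hypothetical stand-ins, and numeric references are used where PHP would
emit named entities.

```javascript
// Escapes only the five markup characters; everything else passes through.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#039;" };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Treats each UTF-8 byte as one ISO-8859-1 character, then turns every
// non-ASCII character into a numeric character reference.
function entitiesAssumingLatin1(utf8Text) {
  const misread = Buffer.from(utf8Text, "utf8").toString("latin1");
  return misread.replace(/[\u0080-\u00ff]/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
}

console.log(escapeHtml("<b>fran\u00e7ais</b>"));      // &lt;b&gt;français&lt;/b&gt;
console.log(entitiesAssumingLatin1("fran\u00e7ais")); // fran&#195;&#167;ais
```

The second output renders as "franÃ§ais" in the browser, which matches the
symptom reported earlier in the thread.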
Retracing my steps, I opened up the MySQL database, only to find those
values were corrupted. After adding in mysql_query("SET NAMES 'utf8'") to
the script that parses the dictionary file, I was able to make everything
work well. Thanks for everyone's help!
Matt