Character encoding


Kosta Kontos

Jul 16, 2008, 3:03:14 PM
to Php Object Generator
Hi guys

I've read through all the posts I can find in this group that cover
specifying a character encoding.

Most notably, you suggest executing "SET NAMES 'UTF8'" at the start of
every client connection, which can be done by editing the
class.database.php file.

Although this solution is great for people who don't have access to
the mysql server settings, I'm wondering how you would advise this
problem be solved by those of us who do have access.

For example, I went and edited my.cnf and added the following under
[client]:

...
[client]
...
default-character-set = utf8
...

Then I restarted mysql server, and when I check all the server
settings for character sets, I see that character_set_client,
character_set_connection and character_set_results are all set to
utf8.

Adding an additional line under [mysqld]:

...
[mysqld]
...
character_set_server = utf8
...

also changed character_set_server and character_set_database to utf8,
effectively enforcing utf8 across the board.

Does this mean that I do not need to put that line in the
class.database.php file? I hope this is the case, but could you please
confirm.

Thanks for an awesome piece of software.

Kosta Kontos

Jul 16, 2008, 7:23:15 PM
to Php Object Generator
OK, after plenty of testing with international characters, I can safely
say that one still needs to add the following line of code to
class.database.php just after the connection is established:

mysql_query('SET CHARACTER SET utf8;');

... otherwise your UTF8 encoded character(s) will end up getting
encoded twice since you've now configured mysql's
default-character-set to UTF8.


I was testing this by pasting international characters in my form and
submitting (which in turn uses POG in the background to persist to the
database). All my UTF8 characters above 127 were being UTF8 encoded
for a second time, resulting in funky new characters. This not only
meant my database was storing seemingly garbage data, but I also had
to decode information every time I pulled it out of the database for
display in my PHP.
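
For anyone who wants to see the double encoding described above in isolation, here is a minimal sketch that needs no database. It assumes the mbstring extension is loaded, and it uses ISO-8859-1 as a stand-in for a Latin-1 style connection charset, which is an assumption on my part and not something confirmed in this thread. The variable names are made up for illustration.

```php
<?php
// "é" encoded as UTF-8 is the two bytes C3 A9.
$original = "\xC3\xA9";

// Simulate a layer that wrongly treats those bytes as Latin-1 text
// and converts them to UTF-8 a second time before storing them.
$stored = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n"; // c3a9
echo bin2hex($stored), "\n";   // c383c2a9 (shows up as two characters)

// Undoing the extra conversion recovers the original bytes. This is
// the "decode on the way out" workaround mentioned in the post.
$repaired = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

echo bin2hex($repaired), "\n"; // c3a9
```

Each byte above 127 turns into two bytes, which matches the "funky new characters" symptom.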

But using the above mysql_query solved the problem. Now my mysql
server uses UTF8 by default whenever I create a new database / table /
field. And my updated POG database class initiates each connection
with UTF8 charset so there is no double-encoding taking place. In
other words, what you submit in your form is what gets stored in the
database (unless you submit a \ or ' or " or > or <, which all get
escaped, correctly so). Then when you pull it out, you may want to use
htmlspecialchars($string, ENT_QUOTES) to escape & or ' or " or < or >
(it leaves \ untouched) - but otherwise that's it. Every other
international character gets displayed perfectly.
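
A small sketch of the output step, assuming the value read back from the database is already correct UTF-8. The sample string and variable names are hypothetical, and passing 'UTF-8' as the third argument is my addition so the result does not depend on the PHP version's default charset.

```php
<?php
// Sample value as it might come back from the database.
// "\xC3\xAB" is the UTF-8 encoding of "ë".
$fromDb = "Zo\xC3\xAB said: <b>\"it's\" 5 > 3 & C:\\temp</b>";

// ENT_QUOTES converts both single and double quotes in addition
// to &, < and >. Multi-byte characters and \ are left untouched.
$safe = htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// Zoë said: &lt;b&gt;&quot;it&#039;s&quot; 5 &gt; 3 &amp; C:\temp&lt;/b&gt;
```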

I hope this proves to be useful to someone. I am still wrapping my
brain around this issue so I hope I haven't done more harm than good
by trying to explain my findings :-)

Joel

Jul 18, 2008, 10:44:38 AM
to Php Object Generator
thanks for sharing...