I pulled this from an article about Unicode support; it may be
useful to you:
On the MySQL side, keep in mind:
* Which character set and collation are used on the database, the
tables and the fields (remember, they can be heterogeneous)
* Which character set is used during a client session (and remember it
can be changed with SET NAMES)
On the PHP side:
* PHP doesn't take encoding into account. A string in PHP is and will
always be a sequence of bytes.
* It's easy when you're using strings coming from the database (we
saw how to control the encoding above). But when you type strings
directly in the code, which encoding is used?
* The answer is: the encoding of the file, and this is controlled by
your favorite code editor.
* A good thing to do now: go to the preferences of your text editor and
make sure that the encoding used is UTF-8.
* You should be able to tell at every moment which encoding is used by
the string you're manipulating in PHP.
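To make the "sequence of bytes" point concrete, here is a small sketch. It is written in Python rather than PHP purely for illustration (my choice, not from the article), and the sample word "café" is arbitrary. It shows that the same text becomes different bytes depending on the encoding, which is why a byte-oriented length such as PHP's strlen() changes with the encoding of the file or the database.

```python
# The same text, stored under two different encodings.
text = "café"
utf8_bytes = text.encode("utf-8")      # "é" becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # "é" becomes one byte: 0xE9

# Character count vs. byte counts: the byte count depends on the encoding.
print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding bytes with the wrong assumption raises no error;
# it silently produces garbled text (mojibake).
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)
```

If you don't know which encoding a given byte string is in, nothing in the bytes themselves will tell you for sure, hence the advice to keep everything in UTF-8.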
On the Apache and HTML side:
When a browser receives some HTML content, how does it know which
encoding is used? There are 3 steps:
1. Check the Content-Type HTTP header: this header gives the type
of the content, and can also specify which encoding is used
   * Apache is in charge of sending the HTTP Content-Type header
   * Good news, we can manipulate this using the header() function
   in PHP
   * Even better, we can set a default value in the php.ini file
   (the default_charset setting) for all content sent by it
2. If the header is missing, the browser then checks the Content-Type
meta tag in the HTML document
   * We need to be sure that no special character appears before
   this tag...
   * But we can easily modify this in the templates
3. If that is missing too, the browser will try to auto-detect the
character set used in the page (UTF-8 is easily auto-detected because
of its properties)
* Note that web applications often serve different types of
content (CSV, PDF, etc.) through Apache
* For these special outputs, the HTTP header is the only
solution, and it should be taken care of manually in the code.
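The lookup order described above can be sketched roughly as follows. This is a simplified illustration in Python, not how any real browser is implemented: the function names, the 1024-byte window for the meta tag, and the "windows-1252" fallback are all assumptions made for the example.

```python
import re

def charset_from_header(content_type):
    """Step 1: charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None

def charset_from_meta(html_bytes):
    """Step 2: charset declared in a meta tag near the start of the page."""
    # Only the first bytes are scanned (1024 is an arbitrary choice here);
    # non-ASCII bytes are replaced so the scan itself never fails.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([^\s"\';>]+)',
                      head, re.IGNORECASE)
    return match.group(1).lower() if match else None

def guess_charset(html_bytes):
    """Step 3: crude auto-detection. Bytes that decode as UTF-8 are taken
    to be UTF-8; otherwise use a hypothetical legacy default."""
    try:
        html_bytes.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"

def pick_charset(content_type, html_bytes):
    """Apply the three steps in order; the first one that answers wins."""
    return (charset_from_header(content_type)
            or charset_from_meta(html_bytes)
            or guess_charset(html_bytes))

page = b'<html><head><meta charset="iso-8859-1"></head></html>'
print(pick_charset("text/html; charset=UTF-8", page))  # header wins
print(pick_charset("text/html", page))                 # falls back to meta
```

The point of the sketch is the priority: a header that names a charset overrides whatever the page says about itself, which is why it is worth setting it explicitly.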
The other part concerning HTML is user input (forms). We need to be
able to control the encoding that will be used for sending the data
of a form back to the server. Here is how the browser decides:
* It looks for an accept-charset attribute on the form tag; if
it's set, it will use this value as the encoding to send the data
* If it's not set, the browser will use the encoding of the page
to send the data
* Best practice in this case is to always specify the expected
encoding in the accept-charset attribute of the form, even though
it's not mandatory
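To see why the encoding of submitted form data matters, here is a small sketch using Python's standard urllib.parse module (again Python only for illustration; the field name "q" and the value "café" are made up). It encodes the same form value the way a browser would under two different encodings, then shows what the server gets when it decodes with the wrong one.

```python
from urllib.parse import urlencode, parse_qs

form = {"q": "café"}

# The same field, percent-encoded under two different encodings.
as_utf8 = urlencode(form, encoding="utf-8")
as_latin1 = urlencode(form, encoding="latin-1")
print(as_utf8)    # q=caf%C3%A9
print(as_latin1)  # q=caf%E9

# Server side: decoding with the encoding that was actually used works...
right = parse_qs(as_utf8, encoding="utf-8")
# ...while decoding UTF-8 data as Latin-1 yields garbled text, with no error.
wrong = parse_qs(as_utf8, encoding="latin-1")
print(right["q"][0], wrong["q"][0])
```

Nothing in the request body says which encoding was used, so the server has to know in advance, and declaring it on the form removes the guesswork.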
Hope it's useful ^^