utf-8 data

Question Boy

Nov 8, 2009, 10:12:50 PM
If I wish to save user input into a utf-8 database (tables) how would
I go about that? Do I need to convert the input? Does this also mean
I need to convert it back when displaying database info to the user?

Does anyone know of a good web article that would give me a better
introduction to the subject?

Thank you for your guidance on the matter!

QB

"Álvaro G. Vicario"

Nov 9, 2009, 4:13:09 AM

Question Boy wrote:

> If I wish to save user input into a utf-8 database (tables) how would
> I go about that? Do I need to convert the input?

Only if it's *not* UTF-8 already.


> Does this also mean
> I need to convert it back when displaying database info to the user?

Only if your page does not use UTF-8.
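
A minimal sketch of "convert only if it's not UTF-8 already", assuming the mbstring extension is loaded. The helper name toUtf8() and the ISO-8859-1 fallback are illustrative assumptions, not something from this thread; use whatever encoding your non-UTF-8 input really has.

```php
<?php
// Hypothetical helper: returns $input as UTF-8, converting only when needed.
// $fallback is an assumed source encoding for input that is not valid UTF-8.
function toUtf8(string $input, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8: store it untouched
    }
    // Not valid UTF-8: treat the bytes as $fallback and convert them.
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}
```

Note that the check only proves the bytes are well-formed UTF-8. It cannot tell you which legacy encoding a non-UTF-8 string was in, so the fallback stays a guess unless you control the page that sends the form.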


> Does anyone know of a good web article that would give me a better
> introduction to the subject?
>
> Thank you for your guidance on the matter!

--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My satin-finish humor site: http://www.demogracia.com
--

Willem Bogaerts

Nov 9, 2009, 4:46:45 AM

> Does anyone know of a good web article that would give me a better
> introduction to the subject?

I found this one useful:
http://www.joelonsoftware.com/articles/Unicode.html

Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/

Erwin Moller

Nov 9, 2009, 6:24:16 AM

Question Boy wrote:


Hi,

A few things that spring to mind:
(Based on Postgres)

1) Create the tables in the database to store the TEXT as UTF-8.
(In Postgres this is done by setting the encoding to UTF-8 when
creating the database. The encoding applies to the whole database; it
cannot be set per table or per column.)
2) Deliver your HTML documents in UTF-8 encoding by setting the right
headers in PHP (Header, not META tag, although it is wise to set the
META tag too. In case somebody saves the document, the META tag will
tell the unknown client the encoding.)
3) Your forms inside that document will inherit the UTF-8 encoding, but
if you want you can even set the encoding on a per-form basis with the
accept-charset attribute.
4) When receiving the data in PHP, you can simply input it in your
database straight from the $_POST (of course, make it safe to insert
first, e.g. with pg_escape_string() or a parameterized query via
pg_query_params(), to avoid problems and SQL injection.)
5) If you must validate the data (before inserting/updating), be sure
you run the right functions. So instead of strlen(), which counts bytes,
you must use mb_strlen(), which counts characters.
Here are a few:
http://nl3.php.net/manual-lookup.php?pattern=mb_*&lang=en

6) Set internal encoding for the project (websitewide) to UTF-8.
Do this in php.ini or do it above every script with:
mb_internal_encoding("UTF-8");
7) When you return text data (in UTF-8) to the client (e.g. a browser),
make sure you run it through htmlspecialchars() with the third parameter
set to "UTF-8".
8) I don't save the PHP source code in UTF-8. It gave me more trouble
than fun when I tried doing that.
9) As always, try to validate your code at W3C.
http://validator.w3.org/
That has little to do with UTF-8 in itself, but it sometimes helped me
find problems.
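
Steps 2, 5, 6 and 7 above could look roughly like this in one script. This is only a sketch under assumptions: mbstring is loaded, and the helper names isValidName() and e() and the 50-character limit are made up for illustration. The non-ASCII letter is written as a byte escape so the example does not depend on how the source file itself is saved.

```php
<?php
// Step 6: make UTF-8 the default encoding for the mb_* functions.
mb_internal_encoding('UTF-8');

// Step 2: announce the encoding in the HTTP header, before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Step 5 (hypothetical helper): validate by characters, not bytes.
function isValidName(string $name, int $maxChars = 50): bool
{
    return mb_check_encoding($name, 'UTF-8')
        && mb_strlen($name) >= 1
        && mb_strlen($name) <= $maxChars;
}

// Step 7 (hypothetical helper): escape for HTML, declaring the text as UTF-8.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$name = "Zo\xC3\xAB";            // "Zoë": the e-diaeresis takes two bytes in UTF-8
$byteCount = strlen($name);      // counts bytes
$charCount = mb_strlen($name);   // counts characters
$safe = e($name . ' <b>');       // markup characters become entities
```

The difference between $byteCount and $charCount is the reason for step 5: a length check written with strlen() rejects or truncates text earlier than intended as soon as it contains non-ASCII characters.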

The above approach worked like a charm for me on Postgres.
Many people use mysql, in that case make sure you do the right things
for mysql too, but I cannot help you with that.

Regards,
Erwin Moller

--
"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare

Captain Paralytic

Nov 9, 2009, 6:34:14 AM

On 9 Nov, 11:24, Erwin Moller

The manual search does not support wildcards (i.e. *) so your search
is better as
http://nl3.php.net/manual-lookup.php?pattern=mb_&lang=en

Erwin Moller

Nov 9, 2009, 7:11:13 AM

Captain Paralytic wrote:

Hi,

It does for me.
???

PHP site even suggested that * search itself (to my surprise).
When I was typing mb_ in the searchbox, it suggested mb_* which I selected.

Now I am curious. If you click:
http://nl3.php.net/manual-lookup.php?pattern=mb_*&lang=en
what do you get as a result?

Gordon

Nov 9, 2009, 7:51:57 AM

If the HTML page that contains the form and the HTML page that
displays the output are also using UTF-8 as their encoding then no
conversion should be necessary. As UTF-8 is a superset of ASCII,
converting Latin-1 pages to UTF-8 should be fairly trivial: only the
characters above 127 (accented letters and the like) change.
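
To illustrate the ASCII point, here is a small sketch, assuming the mbstring extension is available. The sample strings are invented; the Latin-1 text is written with byte escapes so the example does not depend on the encoding of the source file.

```php
<?php
// "naive cafe" with i-diaeresis and e-acute, as ISO-8859-1 (Latin-1) bytes.
$latin1 = "na\xEFve caf\xE9";

// Convert the Latin-1 bytes to UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Pure ASCII text is byte-for-byte identical in both encodings.
$ascii = "plain text";
$asciiConverted = mb_convert_encoding($ascii, 'UTF-8', 'ISO-8859-1');

$latin1Bytes = strlen($latin1); // one byte per character in Latin-1
$utf8Bytes = strlen($utf8);     // the two accented letters now take two bytes each
```

The ASCII letters keep their byte values, and each character above 127 becomes a two-byte sequence, which is why a page that is mostly ASCII converts so easily.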

Captain Paralytic

Nov 9, 2009, 8:06:35 AM

On 9 Nov, 12:11, Erwin Moller

<Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> Captain Paralytic schreef:
>
> > On 9 Nov, 11:24, Erwin Moller
> > <Since_humans_read_this_I_am_spammed_too_m...@spamyourself.com> wrote:
> >> Here are a few: http://nl3.php.net/manual-lookup.php?pattern=mb_*&lang=en

>
> > The manual search does not support wildcards (i.e. *) so your search
> > is better as
> > http://nl3.php.net/manual-lookup.php?pattern=mb_&lang=en

>
> Hi,
>
> It does for me.
> ???
>
> PHP site even suggested that * search itself (to my surprise).
> When I was typing mb_ in the searchbox, it suggested mb_* which I selected.
>
> Now I am curious. If you click: http://nl3.php.net/manual-lookup.php?pattern=mb_*&lang=en

> what do you get as a result?
We're getting a bit off-topic here, but I can't go off-group because
you don't have a real email addy in the header.
However for mb_, I get:
=================================================
PHP Function List

Sorry, but the function mb_ is not in the online manual. Perhaps you
misspelled it, or it is a relatively new function that hasn't made it
into the online documentation yet. The following are the 20 functions
which seem to be closest in spelling to mb_ (really good matches are
in bold). Perhaps you were looking for one of these:

mb_ereg
mb_decode_numericentity
mb_preferred_mime_name
mb_detect_order
mb_strlen
mb_strcut
mb_regex_set_options
mb_output_handler
mb_ereg_search_getregs
mb_eregi_replace
mb_internal_encoding
mb_eregi
mb_substr_count
mb_ereg_search_init
mb_stristr
mb_strstr
mb_strrchr
mb_http_input
mb_strtoupper
mb_encode_numericentity
If you want to search the entire PHP website for the string "mb_",
then click here.

For a quick overview over all documented PHP functions, click here.
=================================================
And for mb_*, I get:
=================================================
PHP Function List

Sorry, but the function mb_* is not in the online manual. Perhaps you
misspelled it, or it is a relatively new function that hasn't made it
into the online documentation yet. The following are the 20 functions
which seem to be closest in spelling to mb_* (really good matches are
in bold). Perhaps you were looking for one of these:

mb_ereg
mb_split
mb_eregi
mb_substr
mb_strpos
mb_strcut
mb_strlen
mb_strstr
mb_strrchr
mb_stristr
mb_strrpos
maxdb_init
maxdb_ping
mb_stripos
maxdb_kill
maxdb_info
maxdb_stat
maxdb_error
mb_language
mb_strrichr
If you want to search the entire PHP website for the string "mb_*",
then click here.

For a quick overview over all documented PHP functions, click here.
=================================================

Satyakaran

Nov 9, 2009, 8:32:04 AM

Good data.

The only thing you missed is the editor setting. When you save
something ...

//
http://www.satya-weblog.com

Erwin Moller

Nov 9, 2009, 9:09:16 AM

Satyakaran wrote:

> Good data.
>
> The only thing you missed is about editor setting. when you save
> something ...
>

Hi,

Thanks.
But I didn't miss that.
I don't save my source code as UTF-8. ;-)

8) I don't save the PHP source code in UTF-8. It gave me more trouble
than fun when I tried doing that.

Regards,
Erwin Moller

>
>
>
>
> //
> http://www.satya-weblog.com

Erwin Moller

Nov 9, 2009, 9:11:44 AM

Hi,

Yes, you are evidently right.
I stand corrected.

I saw that list of functions and hadn't read the accompanying text:

> Sorry, but the function mb_* is not in the online manual. Perhaps you
> misspelled it, or it is a relatively new function that hasn't made it
> into the online documentation yet.

Makes me wonder why the PHP website suggested that mb_* in the first place.

Captain Paralytic

Nov 9, 2009, 9:35:33 AM

On 9 Nov, 14:11, Erwin Moller
He he, it wasn't the web site that suggested that, it was your
browser! You have obviously typed that in a box called pattern (maybe
even on this page) before.

Michael Fesser

Nov 9, 2009, 5:24:06 PM

.oO(Erwin Moller)

>5) If you must validate the data (before inserting/updating), be sure
>you run the right functions. So instead of strlen() you must use
>mb_strlen().

The MultiByte extension allows function overloading, so you can still
use the normal strlen() etc. This would be preferred, since it makes
moving to a PHP with native Unicode support (6?) much easier.

>8) I don't save the PHP source code in UTF-8. It gave me more trouble
>than fun when I tried doing that.

Seems you used the wrong editor/IDE. ;-)

I have stored my PHP code, HTML, CSS etc. in UTF-8 and with *nix line
endings for years. Never had any real problems.

I would have trouble with _not_ storing my PHP code in UTF-8, for
example when there are some hard-coded values with special chars, which
should appear on a UTF-8 encoded HTML page.

>The above approach worked like a charm for me on Postgres.
>Many people use mysql, in that case make sure you do the right things
>for mysql too, but I cannot help you with that.

Works there the same. Just define the correct charset and collation on
all the columns where you need it (not necessarily on all of them),
define the connection encoding after you established the connection to
the DB (the first query should be "SET NAMES utf8") and you're done.
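
One way the connection part might look, as a sketch only. Using PDO is my assumption (the same idea works with other MySQL APIs), and the helper names mysqlUtf8Dsn() and connectUtf8() as well as the host, database and credentials are hypothetical. Note that MySQL's "utf8" stores at most three bytes per character; "utf8mb4" covers the full Unicode range where the server supports it.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
function mysqlUtf8Dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: connects and sets the connection encoding first.
// Not called here, because it needs a running MySQL server.
function connectUtf8(string $host, string $dbname, string $user, string $pass): PDO
{
    $pdo = new PDO(mysqlUtf8Dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    // The first query on the connection, as described above.
    $pdo->exec('SET NAMES utf8');
    return $pdo;
}

$dsn = mysqlUtf8Dsn('localhost', 'app'); // assumed host and database name
```

The charset in the DSN and the explicit SET NAMES do the same job; the second is kept here only because it is the step named in the post.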

Micha
