
How to make a multilingual website?


madsgor...@gmail.com

Jun 27, 2005, 6:23:31 PM
Hi

I need to make a multilingual website with PHP and MySQL, and I am
looking for tutorials or books that explain how best to do this.
I hope you have some suggestions as to where I can find information
about this.

Mads

Riddic

Jun 27, 2005, 9:18:59 PM
madsgor...@gmail.com wrote:

You could look into existing projects that offer it, phpMyFAQ for example
(www.phpmyfaq.de). Every bit of text that appears on the website is read
from a (big) associative array, defined in files (one for each language) in
the /lang/ directory. Depending on which language you choose, the
corresponding file gets included.
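
A minimal sketch of that approach, not taken from phpMyFAQ itself: the array layout, the German strings and the load_strings() helper are hypothetical, and the per-language files are inlined as arrays so the example runs on its own.

```php
<?php
// Hypothetical stand-ins for /lang/en.php and /lang/de.php. In a real
// project each array would live in its own file and be pulled in with
// require.
$languageFiles = array(
    'en' => array('greeting' => 'Hello', 'submit' => 'Submit'),
    'de' => array('greeting' => 'Hallo', 'submit' => 'Absenden'),
);

// Return the string table for the requested language, falling back to
// the default when the language is unknown.
function load_strings(array $languageFiles, $lang, $default = 'en')
{
    if (!isset($languageFiles[$lang])) {
        $lang = $default;
    }
    return $languageFiles[$lang];
}

$strings = load_strings($languageFiles, 'de');
echo $strings['greeting'], "\n";
```

The page code only ever refers to keys such as 'greeting', so adding a language means adding one more array (or file) and nothing else.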

madsgor...@gmail.com

Jun 28, 2005, 2:54:43 AM
Should one place all the text in one big table, or is it better to place
it in several tables? Should one make 4 columns, one for each language,
or 4 rows, one for each language?

Jeff North

Jun 28, 2005, 3:22:03 AM
On 27 Jun 2005 23:54:43 -0700, in comp.lang.php
"madsgor...@gmail.com" <madsgor...@gmail.com> wrote:

I use:
CREATE TABLE `usr_languages` (
`pageName` varchar(25) default '',
`itemNo` int(11) default '0',
`en` text,
`fr` text,
`it` text
) ENGINE=MyISAM COMMENT='different languages';

only because I can immediately see if I have not included text for a
particular language and a particular entry.

pageName: which page this text appears on
itemNo: order of items to be read and placed on page.
en/fr/it: the different languages.

I haven't optimised this as yet so the 'text' field is probably
overkill and a varchar(255) would suffice.
---------------------------------------------------------------
jnor...@yourpantsyahoo.com.au : Remove your pants to reply
---------------------------------------------------------------

Jerry Stuckle

Jun 28, 2005, 8:57:15 AM

Think - normalize, normalize, normalize.

Think - 4 columns - one for each language. What happens if you need to
add a fifth language? How much code will you have to change?

Several tables - again, if it's one for each language, how much code
will you have to change? Probably not as much. But in either case you
have to change the database.

One big table - I doubt it will be "big". "Big" in database-ese often
means terabytes. A couple of megabytes is nothing.

I would put things in one table with a layout similar to:

id: integer
language: char(2)
Actual info: text

The id is not an auto-increment. Rather, it is some id you generate
for each page. The combination of id and language (two-character ISO 639-1
language codes work great in most cases) is used for the primary key.

In your code for a specific page, the id never changes; the language is
based on what the user selects.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

madsgor...@gmail.com

Jun 29, 2005, 1:57:05 AM
Ok, thanks. Have you seen a tutorial / book or something like that
about the subject?

madsgor...@gmail.com

Jun 29, 2005, 2:03:37 AM
Ok, thanks, I am going to try it. Have any of you seen a tutorial /
book or something like that about the subject? - perhaps from wrox.

madsgor...@gmail.com

Jun 29, 2005, 3:22:41 AM
Is what you are suggesting something along these lines?

or do you agree with Jeff?

CREATE TABLE `usr_languages` (
`pageName` varchar(25) default '',
`id` int(11) default '0',
`language` char(2),
`Actual_info` text
);

Chung Leong

Jun 29, 2005, 4:16:10 PM
Well, it all depends on what your site is and what the languages are.
Localization is a very large topic. You should spend some time looking
at the big picture first before diving into the technical nitty-gritty.


For example, if your site accepts user input, you have to think about
how to deal with the differences in notation and format. Just yesterday
I accidentally wrote out a check for $9800 because the application
doesn't understand that 98,00 means 98.00.
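
A rough sketch of how an application might guard against that kind of mistake. The parse_amount() helper and its parameters are hypothetical, and it assumes the application already knows which decimal and grouping separators the user's locale uses.

```php
<?php
// Convert a user-entered amount to a float, given the separators the
// user's locale is assumed to use. Returns null when the input is not
// a plain number in that notation.
function parse_amount($input, $decimalSep, $groupSep)
{
    $clean = str_replace($groupSep, '', trim($input));
    $clean = str_replace($decimalSep, '.', $clean);
    if (!preg_match('/^\d+(\.\d+)?$/', $clean)) {
        return null;
    }
    return (float) $clean;
}

// Continental European notation: comma is the decimal separator.
var_dump(parse_amount('98,00', ',', '.'));    // float(98)
var_dump(parse_amount('9.800,50', ',', '.')); // float(9800.5)
// Same input read with US separators: the comma is treated as grouping.
var_dump(parse_amount('98,00', '.', ','));    // float(9800)
```

The last call reproduces the misreading described above: the same text yields 9800 when the wrong locale is assumed.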

Page layout can also be a challenge. A layout that works for one
language might not for another. In English, for example, you could have
something like

I would like to buy a [LIST BOX]

A page design that requires the list box label to be on the left side
would be bad, as the object might need to appear at the beginning or
the middle of the sentence in other languages. Sometimes a problem
could be as simple as not being able to fit a word into a given space.
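
One common way to let each translation decide where the object goes is a numbered sprintf() placeholder. This is only a sketch: the render_sentence() helper is hypothetical and the German and Japanese templates are illustrative, not reviewed translations.

```php
<?php
// Illustrative templates only. %1$s marks where the chosen item goes
// in each language: end, middle, or beginning of the sentence.
$templates = array(
    'en' => 'I would like to buy a %1$s',
    'de' => 'Ich möchte %1$s kaufen',
    'ja' => '%1$sを買いたいです',
);

// Fill the item into the sentence template for the given language.
function render_sentence(array $templates, $lang, $item)
{
    return sprintf($templates[$lang], $item);
}

echo render_sentence($templates, 'en', '[LIST BOX]'), "\n";
echo render_sentence($templates, 'de', '[LIST BOX]'), "\n";
echo render_sentence($templates, 'ja', '[LIST BOX]'), "\n";
```

This handles word order, but not the layout issue: the page design still has to cope with the control appearing anywhere in the line.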

The difficulty of handling multiple languages in a single codebase is
why people often build a separate site per language instead.

Jerry Stuckle

Jun 29, 2005, 6:36:21 PM

No, I do not agree with Jeff. His design, while it works, makes the code very
difficult to modify should he ever want to add a new language. To do so would
require changing the database - and an examination of all the code which
accesses the database. No, all the code would not HAVE to be modified - but all
would have to be inspected.

Normalizing the database helps avoid these problems. You don't need to change
column names in your queries, for instance. It means if you have a bad language
value (e.g. no sw(ahili) column), you get no data back instead of a MySQL query
error.

The method you quoted is much clearer - except you really don't need an "id"
column. Rather, the primary key should be "pageName, language".

I don't have any tutorials in mind - but if you do a google search on "database
normalization" you should find several.

Jeff North

Jun 29, 2005, 8:00:24 PM
On Wed, 29 Jun 2005 17:36:21 -0500, in comp.lang.php Jerry Stuckle
<jstu...@attglobal.net> wrote:

>| madsgor...@gmail.com wrote:
>| > Is what you are suggesting something along these lines?
>| >
>| > or do you agree with Jeff?
>| >
>| > CREATE TABLE `usr_languages` (
>| > `pageName` varchar(25) default '',
>| > `id` int(11) default '0',
>| > `language` char(2),
>| > `Actual_info` text,
>| > )
>| >
>|
>| No, I do not agree with Jeff. His design, while it works, makes the code very
>| difficult to modify should he ever want to add a new language. To do so would
>| require changing the database - and an examination of all the code which
>| accesses the database. No, all the code would not HAVE to be modified - but all
>| would have to be inspected.

Each method has its pros and cons. Neither is better than the other.

>| Normalizing the database helps with these effects. You don't need to change
>| column names in your queries, for instance. It means if you have a bad language
>| value (i.e. no sw(ahili) column), you get no data back instead of a MySQL query
>| error.
>|
>| The method you quoted is much clearer - except you really don't need an "id"
>| column. Rather, the primary key should be "pageName, language".

If you have 20 fields on a webpage, how would you work out what text
goes where?
NB: where I use this is on forms where there are lots of prompts,
tooltips and other textual information.



>| I don't have any tutorials in mind - but if you do a google search on "database
>| normalization" you should find several.

---------------------------------------------------------------

R. Rajesh Jeba Anbiah

Jun 30, 2005, 1:08:30 AM
Chung Leong wrote:
<snip reply>

Question flagged for FAQ. Chung, could you please make a FAQ entry?

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com

madsgor...@gmail.com

Jun 30, 2005, 2:55:51 AM
Hi

I do not understand all of it. For example, you write

>Think - normalize, normalize, normalize.

>Think - 4 columns - one for each >language. What happens if you need to
>add a fifth language? How much code >will you have to change?

That is what Jeff has: one column for each language. But if what you
are suggesting is

CREATE TABLE `usr_languages` (
`pageName` varchar(25) default '',
`language` char(2),
`Actual_info` text
);

then you would get rows for each language.

I really find it amazing that Wrox or O'Reilly do not seem to have a book
covering this specific issue.

I have done a lot of googling on the subject but the best I have found
is short forum answers.

such as

<?php // config.inc.php

$default_lang = 'en';
$language_codes = array('en',  // English
                        'es',  // Spanish
                        'fr'); // French

function set_lang()
{
    if (isset($_GET['lang']) &&
        in_array($_GET['lang'], $GLOBALS['language_codes']))
    {
        $GLOBALS['lang'] = $_GET['lang'];
    }
    else
    {
        $GLOBALS['lang'] = $GLOBALS['default_lang'];
    }
}

set_lang();
require_once("lang.{$lang}.inc.php"); // include proper resource file
?>

<?php // lang.en.inc.php
$hello_str = "Hello, world!";
$submit_str = "Submit";
$language_names = array("en" => "English",
                        "es" => "Spanish",
                        "fr" => "French");
?>

<?php // lang.es.inc.php
$hello_str = "¡Hola, el mundo!";
$submit_str = "Sométase";
$language_names = array("en" => "Inglés",
                        "es" => "Español",
                        "fr" => "Francés");
?>

<?php // lang.fr.inc.php
$hello_str = "Bonjour, le monde!";
$submit_str = "Soumettre";
$language_names = array("en" => "Anglais",
                        "es" => "Espagnol",
                        "fr" => "Français");
?>


<?php // main.php

require_once('config.inc.php');
// this require_once() call sets the appropriate language
// and includes the proper resource file.
// The main application never outputs hard-coded text to the user,
// it only outputs string variables defined in the resource files.

echo $hello_str;

echo "<hr>\n"; // HTML can be output by the application,
               // since it's not language-dependent.

// we'll include a short form to let the user change languages.
// PHP_SELF is escaped because it can contain user-supplied characters.
$self = htmlspecialchars($_SERVER['PHP_SELF']);
echo "<form action=\"{$self}\" method=\"get\">\n";
echo "  <select name=\"lang\">\n";
foreach ($language_codes as $lang_key)
{
    // our form will preselect the current language
    $selected = ($lang == $lang_key) ? " selected=\"selected\"" : "";
    echo "    <option value=\"{$lang_key}\"{$selected}>"
       . $language_names[$lang_key]
       . "</option>\n";
}
echo "  </select>\n";
echo "  <input type=\"submit\" value=\"{$submit_str}\" />\n";
echo "</form>\n";

?>

AND

<?php // strings.en.php English strings
$greeting = "Hello";
?>


<?php // strings.es.php Spanish strings
$greeting = "Hola";
?>


<?php // strings.gr.php Greek strings
$greeting = "Yia sou";
?>


<?php // index.php Main script

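$lang = 'en'; // Default to English

// Only accept languages we actually have a strings file for;
// otherwise the request could make require() load an arbitrary file.
if (isset($_GET['lang']) && in_array($_GET['lang'], array('en', 'es', 'gr')))
{
    $lang = $_GET['lang'];
}

require("strings.{$lang}.php");

echo $greeting . "\n";

// Double quotes so that \n is emitted as a newline
echo "View this page in:\n";
echo "<a href=\"?lang=en\">English</a>\n";
echo "<a href=\"?lang=es\">Spanish</a>\n";
echo "<a href=\"?lang=gr\">Greek</a>\n";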

?>

Jerry Stuckle

Jun 30, 2005, 11:12:49 AM
Jeff North wrote:
> On Wed, 29 Jun 2005 17:36:21 -0500, in comp.lang.php Jerry Stuckle
> <jstu...@attglobal.net> wrote:
>
>
>>| madsgor...@gmail.com wrote:
>>| > Is what you are suggesting something along these lines?
>>| >
>>| > or do you agree with Jeff?
>>| >
>>| > CREATE TABLE `usr_languages` (
>>| > `pageName` varchar(25) default '',
>>| > `id` int(11) default '0',
>>| > `language` char(2),
>>| > `Actual_info` text,
>>| > )
>>| >
>>|
>>| No, I do not agree with Jeff. His design, while it works, makes the code very
>>| difficult to modify should he ever want to add a new language. To do so would
>>| require changing the database - and an examination of all the code which
>>| accesses the database. No, all the code would not HAVE to be modified - but all
>>| would have to be inspected.
>
>
> Each method has it's pros and cons. Neither is better than the other.

Then why is normalization so important? A properly normalized database is
almost without exception better than an unnormalized one.


>
>
>>| Normalizing the database helps with these effects. You don't need to change
>>| column names in your queries, for instance. It means if you have a bad language
>>| value (i.e. no sw(ahili) column), you get no data back instead of a MySQL query
>>| error.
>>|
>>| The method you quoted is much clearer - except you really don't need an "id"
>>| column. Rather, the primary key should be "pageName, language".
>
>
> If you have 20 fields on a webpage, how would you work out what text
> goes where?
> NB: where I use this is on forms where there are lots of prompts,
> tooltips and other textual information.

How do you do it with your page?

In his case, he just wants to store the entire page in a table. This works fine
for it.

If you have several fields, normalization is even more important. You could add
a third column to pagename and language - that being "field". Each row would be
identified by a pagename, a language and a field. It now becomes quite easy to
add or delete languages, fields and rows.

>
>
>>| I don't have any tutorials in mind - but if you do a google search on "database
>>| normalization" you should find several.
>
>
> ---------------------------------------------------------------
> jnor...@yourpantsyahoo.com.au : Remove your pants to reply
> ---------------------------------------------------------------

Jerry Stuckle

Jun 30, 2005, 11:48:00 AM
madsgor...@gmail.com wrote:
> Hi
>
> I do not understand all of it, for exsample you write
>
>
>>Think - normalize, normalize, normalize.
>
>
>>Think - 4 columns - one for each >language. What happens if you need to
>>add a fifth language? How much code >will you have to change?
>
>
> Which is what Jeff has one column for each language, but if what you
> are suggesting is

Yes, and adding a new language means you need to modify the database itself.

>
> CREATE TABLE `usr_languages` (
> `pageName` varchar(25) default '',
> `language` char(2),
> `Actual_info` text,
> )
>
> then you would get rows for each language.
>

Yes. And adding a new language means NO database changes.

> I rellay find it amazing that wrox or reiley do not seem to have a book
> covering this specific issue.
>

Not really. Relational database design has been rather limited in its
application. And normalization is actually quite a simple concept to grasp
(although not always as easy to implement in complicated instances).

I used to teach a DB2 course (corporate clients). It was a five day course
which included normalization. The normalization section was less than an hour
long plus a short paper lab. Not really enough material for a book.

> I have done a lot of googling on the subject but the best I have found
> is short forum answers.
>

You won't find normalization answers in this newsgroup. However, a quick google
search turned up among others:

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html
http://en.wikipedia.org/wiki/Database_normalization
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node1.html

The first is the easiest to understand and covers the first three levels of
normalization. There are fourth and fifth normal designs, but most databases
don't go that far - third normal is the general rule.

The second goes through the fourth and fifth normal forms also, but isn't quite
as detailed as the first. The third one covers a lot more theory and is harder
to understand - but I included it for those who might be interested.

<code snipped>

You can do it this way. But can I make another suggestion? Perhaps instead of
trying to dynamically build what is basically static text - just have different
directories for each language, and different pages? Yes, it's some duplication -
but if you're not constantly updating a lot of pages, it might be easier.

For instance -
http://www.mysite.com/en - English
http://www.mysite.com/fr - French
http://www.mysite.com/sp - Spanish

And so on. I've done this for multilingual sites (English/Spanish) and it works
well. Yes, it means if you need to make a change you need to change multiple
pages - but once you have the first page working the rest go quite quickly.

Otherwise, if you do want this to be dynamic, I suggest you place all
language-dependent things in the database. For instance, if you want a page
which says "hello", you could design your database such as:

language char(2)
pagename varchar(255)
fieldnum smallint
data text

And your entries could be something like:
'en', 'hellopage', 1, 'Hello, world!'
'sp', 'hellopage', 1, '¡Hola, el mundo!'
'fr', 'hellopage', 1, 'Bonjour, le monde!'
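
As a rough illustration of how page code might use such rows, here is a sketch that indexes them by (pagename, fieldnum, language). The index_rows() and field_text() helpers and the fall-back-to-English rule are assumptions for the example; the rows are hard-coded where a real page would fetch them from the database.

```php
<?php
// Rows as they might come back from the table sketched above.
$rows = array(
    array('language' => 'en', 'pagename' => 'hellopage', 'fieldnum' => 1, 'data' => 'Hello, world!'),
    array('language' => 'sp', 'pagename' => 'hellopage', 'fieldnum' => 1, 'data' => '¡Hola, el mundo!'),
    array('language' => 'fr', 'pagename' => 'hellopage', 'fieldnum' => 1, 'data' => 'Bonjour, le monde!'),
);

// Index the rows by the composite key (pagename, fieldnum, language).
function index_rows(array $rows)
{
    $index = array();
    foreach ($rows as $r) {
        $index[$r['pagename']][$r['fieldnum']][$r['language']] = $r['data'];
    }
    return $index;
}

// Look up one field; fall back to another language when no row exists.
function field_text(array $index, $page, $field, $lang, $fallback = 'en')
{
    if (isset($index[$page][$field][$lang])) {
        return $index[$page][$field][$lang];
    }
    if (isset($index[$page][$field][$fallback])) {
        return $index[$page][$field][$fallback];
    }
    return null;
}

$index = index_rows($rows);
echo field_text($index, 'hellopage', 1, 'fr'), "\n";
```

Adding a language then only means inserting more rows; neither the helpers nor the table definition change.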

However, if you're going to have multiple items such as for a select box, you
might want to consider more tables:

1st table
language char(2)
pagename varchar(255)
fieldnum smallint
fieldkey int

2nd table
fieldkey int
sequence smallint
data text

Or
1st table
pagename varchar(255)
fieldid smallint
language char(2)
fieldkey int

2nd table
fieldkey int
sequence smallint
data text

Or even

1st table
pagename varchar(255)
language char(2)
pagekey int

2nd table
pagekey int
fieldid smallint
sequence smallint
data text

Any of the three would work (and there are more) - but personally I'd lean
towards the third one based on this limited discussion. However, as I looked
more into the details, I might select one of the other two.
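
To make the third layout concrete, here is a sketch that resolves the items of one select box. The two arrays stand in for the two tables, and the sample data, page name and field_items() helper are invented for illustration; in practice this would be a single SQL join ordered by sequence.

```php
<?php
// 1st table: one row per (pagename, language), pointing at a pagekey.
$pages = array(
    array('pagename' => 'orderpage', 'language' => 'en', 'pagekey' => 1),
    array('pagename' => 'orderpage', 'language' => 'fr', 'pagekey' => 2),
);
// 2nd table: the texts, several rows per field when it is a list.
$fields = array(
    array('pagekey' => 1, 'fieldid' => 1, 'sequence' => 2, 'data' => 'Large'),
    array('pagekey' => 1, 'fieldid' => 1, 'sequence' => 1, 'data' => 'Small'),
    array('pagekey' => 2, 'fieldid' => 1, 'sequence' => 1, 'data' => 'Petit'),
    array('pagekey' => 2, 'fieldid' => 1, 'sequence' => 2, 'data' => 'Grand'),
);

// Return the texts of one field, ordered by sequence, for a page/language.
function field_items(array $pages, array $fields, $pagename, $language, $fieldid)
{
    $pagekey = null;
    foreach ($pages as $p) {
        if ($p['pagename'] === $pagename && $p['language'] === $language) {
            $pagekey = $p['pagekey'];
            break;
        }
    }
    if ($pagekey === null) {
        return array(); // unknown page or language: no data, no error
    }
    $items = array();
    foreach ($fields as $f) {
        if ($f['pagekey'] === $pagekey && $f['fieldid'] === $fieldid) {
            $items[$f['sequence']] = $f['data'];
        }
    }
    ksort($items);
    return array_values($items);
}

print_r(field_items($pages, $fields, 'orderpage', 'en', 1));
```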

Also, in your way of doing things, if you want to carry the language between
pages, I might suggest using a session variable. I like it better.
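
A minimal sketch of the session idea. The remember_language() helper is hypothetical; it takes the query and session arrays as parameters so it can be tried without a web server.

```php
<?php
// Store the chosen language in the session the first time it is seen,
// and keep using it on later pages that carry no ?lang= parameter.
function remember_language(array $query, array &$session, array $allowed, $default)
{
    if (isset($query['lang']) && in_array($query['lang'], $allowed, true)) {
        $session['lang'] = $query['lang'];   // explicit, valid choice
    } elseif (!isset($session['lang'])) {
        $session['lang'] = $default;         // first visit, nothing chosen
    }
    return $session['lang'];
}

// In a real page this would be:
//   session_start();
//   $lang = remember_language($_GET, $_SESSION, array('en', 'es', 'fr'), 'en');
$session = array();
$allowed = array('en', 'es', 'fr');
echo remember_language(array('lang' => 'fr'), $session, $allowed, 'en'), "\n";
echo remember_language(array(), $session, $allowed, 'en'), "\n";
```

Both calls print the same language, because the second request finds it in the session instead of the URL.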

Jeff North

Jun 30, 2005, 1:38:06 PM
On Thu, 30 Jun 2005 10:12:49 -0500, in comp.lang.php Jerry Stuckle
<jstu...@attglobal.net> wrote:

>| Jeff North wrote:
>| > On Wed, 29 Jun 2005 17:36:21 -0500, in comp.lang.php Jerry Stuckle
>| > <jstu...@attglobal.net> wrote:

[snip]

>| >>| No, I do not agree with Jeff. His design, while it works, makes the code very
>| >>| difficult to modify should he ever want to add a new language. To do so would
>| >>| require changing the database - and an examination of all the code which
>| >>| accesses the database. No, all the code would not HAVE to be modified - but all
>| >>| would have to be inspected.
>| >
>| > Each method has it's pros and cons. Neither is better than the other.
>|
>| Then why is normalization so important? A properly normalized database is
>| almost without exception better than an unnormalized one.

Normalisation is good.

But there are databases that are more efficient when not normalised.

http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ssceqpop.mspx
and scroll to "Consider Database De-normalization".

IOW there are times when you should normalise and there are times when
you shouldn't normalise.

Now let's throw the cat amongst the pigeons :-)
Here is a simplified example of my data:
en          fr             it                  sp
---------------------------------------------------------
Hello       bonjour        ciao                hola
Goodbye     au revoir      arrivederci         adiós
password    mot de passe   parola d'accesso    contraseña

Now, is there *really* any repeating data?
Yeah, I know, it can be argued either way.

>| >>| Normalizing the database helps with these effects. You don't need to change
>| >>| column names in your queries, for instance. It means if you have a bad language
>| >>| value (i.e. no sw(ahili) column), you get no data back instead of a MySQL query
>| >>| error.
>| >>|
>| >>| The method you quoted is much clearer - except you really don't need an "id"
>| >>| column. Rather, the primary key should be "pageName, language".
>| >
>| > If you have 20 fields on a webpage, how would you work out what text
>| > goes where?
>| > NB: where I use this is on forms where there are lots of prompts,
>| > tooltips and other textual information.
>|
>| How do you do it with your page?

The id field or looking for a particular phrase in the field.

>| In his case, he just wants to store the entire page in a table. This works fine
>| for it.

True. But (there's always a but isn't there :-) ) what if there are
multiple areas on the page? What if the page is a form with multiple
fields?

A (slightly) different table structure will need to be used.
I've shown one method, you've shown another.
Which is better - depends upon your requirements.

Let's consider another common layout.
You have a page with 3 columns.
Column 1 is the menu.
Column 2 is the content.
Column 3 is adverts specific to that country.

In the above case neither of our table designs would be suitable.

>| If you have several fields, normalization is even more important. You could add
>| a third column to pagename and language - that being "field". Each row would be
>| identified by a pagename, a language and a field. It now becomes quite easy to
>| add or delete languages, fields and rows.

I think you are making the assumption that I'm returning all fields
from the table, I'm not.

(JScript example as I'm not proficient in PHP)
Entire page contents:
sql = "SELECT pageName,id," + Session("UserLanguage") +
      " FROM usr_language WHERE PageName='" + currentPage +
      "' ORDER BY id";

Single item:
sql = "SELECT " + Session("UserLanguage") +
      " FROM usr_language WHERE en like '" + phrase + "%'";
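
A PHP sketch of the same idea, with one precaution. Because the language becomes a column name in this design, it cannot be passed as a bound parameter, so the sketch checks it against a fixed list before building the query. The language_query() helper is hypothetical, and the table and column names (usr_languages, pageName, itemNo) are assumed from the CREATE TABLE earlier in the thread.

```php
<?php
// Build the "entire page" query for the column-per-language layout.
// Returns null when the requested language is not a known column.
function language_query($lang, array $allowed)
{
    if (!in_array($lang, $allowed, true)) {
        return null;
    }
    // The page name stays a placeholder (?) to be bound at execution time.
    return "SELECT pageName, itemNo, `$lang` FROM usr_languages "
         . "WHERE pageName = ? ORDER BY itemNo";
}

$allowed = array('en', 'fr', 'it');
echo language_query('fr', $allowed), "\n";
var_dump(language_query('en; DROP TABLE usr_languages', $allowed)); // NULL
```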

As I've said previously, "Each method has its pros and cons. Neither
is better than the other."

Jerry Stuckle

Jun 30, 2005, 10:05:30 PM
Jeff North wrote:
>
>
> Normalisation is good.
>
> But there are databases that are more efficient when not normalised.
>
> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ssceqpop.mspx
> and scroll to "Consider Database De-normalization".
>
> IOW there are times when you should normalise and there are times when
> you shouldn't normalise.
>

Jeff,

Yes, I am QUITE familiar with database normalization and denormalization. I've
been doing relational databases for over 20 years, starting with DB2 on
mainframes. (I was working for IBM at the time). I've designed hundreds of
databases, from small one or two tables ones to databases with dozens of tables
and multiple foreign keys in each table. The largest I remember doing had
upwards of 80 tables, over 600 columns and, when populated, exceeded five
terabytes in size.

About the only *valid* reason for denormalizing is performance. The more you
normalize, the more you need to join tables, causing potential performance
problems. This is why most relational databases never go beyond 3rd normal form.
In larger databases, some tables might only be at 2nd normal form, creating a
hybrid "2.5 normal" database.

But proper normalization also limits changes to the database structure. No, in
your case you do not have any duplicate data at this time. However, you do have
poor design because adding a new language requires you to change the database
structure. This should also be avoided, because changes to the database
structure potentially affect every program using the database - and all code
using the database must be examined. This can be a long process when there is a
lot of code. My solution prevents this.

And yes, you could have duplicate data. For instance - you might have a message
"I'm sorry, this item is not available in your country" in French and Spanish,
but not English or Italian. No, my design won't cure that either - which is why
it's not a 3rd normal design. But I didn't think that would be necessary in
this instance - the amount of duplicate data, if any, is minimal.


> Now lets throw the cat amongst the pidgeons :-)
>
>

> True. But (there's always a but isn't there :-) ) what if there are
> multiple areas on the page? What if the page is a form with multiple
> fields?
>
> A (slightly) different table structure will need to be used.
> I've shown one method, you've shown another.
> Which is better - depends upon your requirements.
>
> Lets consider another common layout.
> You have a page with 3 columns.
> Column 1 is the menu.
> Column 2 is the content.
> Column 3 is adverts specific to that country.
>
> In the above case neither of our table designs would be suitable.
>

I also gave a slightly more complex design (3 of them, actually) which do solve
that problem. And they are normalized.

>
>
> I think you are making the assumption that I'm returning all fields
> from the table, I'm not.
>

No, but the database still fetches the entire row before selecting which
columns to return.

Jeff North

Jul 1, 2005, 6:33:33 AM
On Thu, 30 Jun 2005 21:05:30 -0500, in comp.lang.php Jerry Stuckle
<jstu...@attglobal.net> wrote:

>| Jeff North wrote:
>| >
>| >
>| > Normalisation is good.
>| >
>| > But there are databases that are more efficient when not normalised.
>| >
>| > http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ssceqpop.mspx
>| > and scroll to "Consider Database De-normalization".
>| >
>| > IOW there are times when you should normalise and there are times when
>| > you shouldn't normalise.
>| >

I bow before the infallible god of databases.

EOT

---------------------------------------------------------------

Chung Leong

Jul 1, 2005, 11:19:06 AM
Yeah, when I have time I'll put together something. It's a massive
topic but I guess I can at least point out some of the pitfalls.

People too often underestimate the difficulty of localization. They
think all they have to do is extract text strings from their
applications and everything will go hunky-dory. Really, the manner by
which you store text is the least of your problems. The solutions
proposed in this thread all assume that you can always find an exact
equivalent of a sentence in every language, when that's not really the
case. In Chinese, for instance, there is no equivalent of the English
"thank you"--you use a different phrase depending on whether someone gave
you something or did something for you. Heck, there's no simple way to
translate "yes" and "no".
