localization interface?

pipfros...@gmail.com

Mar 29, 2019, 4:45:10 PM
to PHP Framework Interoperability Group
Hi,

Are there any plans to publish a localization interface - for things like BCP 47 tags for identifying languages, timezone identifiers, standardized date and time formats based on language, etc.?

pipfros...@gmail.com

Mar 30, 2019, 4:33:44 PM
to PHP Framework Interoperability Group
This is kind of what I am thinking.

$error = "What the hell are you doing? Read the fine manual. Everyone knows that you can not overwrite /etc/shadow from a web application.";
$foobar = new classThatImplements('\Psr\Localization\Translation');
$error = $foobar->translateString($error, $bcp47);

It would allow applications to load their own translations into such a class without implementing the backend themselves, and then retrieve them as needed. An implementation written by programmers who understand localization could even be good at compensating for strings the web application has no translations for.

Many programmers do not really understand internationalization but could write their web (or other) PHP applications to a standardized interface, like how WordPress plugin developers can just use __('string') and do not have to worry about it. And it would allow system administrators to choose the implementation they want, possibly even commercial implementations that will attempt to fetch translated messages from translation software when translations do not exist.
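
To make the idea concrete, here is a minimal sketch of what such an interface and one in-memory implementation could look like. The names TranslationInterface and ArrayTranslation, the catalog layout, and the fallback order are assumptions for illustration only; nothing like this has been published as a PSR.

```php
<?php
declare(strict_types=1);

// Hypothetical interface; the name and signature are assumptions, not a published PSR.
interface TranslationInterface
{
    public function translateString(string $message, string $bcp47): string;
}

// Minimal in-memory implementation that falls back to the source string.
class ArrayTranslation implements TranslationInterface
{
    /** @var array<string, array<string, string>> locale => [source => translation] */
    private array $catalog;

    public function __construct(array $catalog)
    {
        $this->catalog = $catalog;
    }

    public function translateString(string $message, string $bcp47): string
    {
        // Try the full tag first ("pt-BR"), then the bare language subtag ("pt").
        $candidates = [$bcp47, explode('-', $bcp47)[0]];
        foreach ($candidates as $locale) {
            if (isset($this->catalog[$locale][$message])) {
                return $this->catalog[$locale][$message];
            }
        }
        return $message; // no translation available: return the original text
    }
}

$translator = new ArrayTranslation(['pt' => ['Access denied.' => 'Acesso negado.']]);
echo $translator->translateString('Access denied.', 'pt-BR'), PHP_EOL;
echo $translator->translateString('Unknown message.', 'pt-BR'), PHP_EOL;
```

An application would only type-hint against the interface, so a system administrator could swap ArrayTranslation for any other implementation.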

Navarr Barnier

Mar 30, 2019, 9:57:12 PM
to PHP Framework Interoperability Group
I feel like this would be a good idea, for managing the backend of getting the translations.

For formatting there's something native in PHP

Oscar Otero

Mar 31, 2019, 5:42:02 AM
to php...@googlegroups.com
I suggested something like that some time ago, but it did not gain enough interest.
Here’s my initial proposal:

I’d be happy if anyone wants to work on this spec.


--
You received this message because you are subscribed to the Google Groups "PHP Framework Interoperability Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to php-fig+u...@googlegroups.com.
To post to this group, send email to php...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/php-fig/1fdfacc8-d59b-4b7e-9040-ea936e4f3a4c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexander Makarov

Mar 31, 2019, 8:15:12 PM
to PHP Framework Interoperability Group
Hello Oscar,

If you need some expertise, I can be part of the working group.

The document is a good start but there are some controversial and incorrect parts:

1. The message argument description should not say that it is the text to translate. In fact, it is an ID used to find the translation. The ID could be the message itself or an arbitrary key. In Yii 2 we use the message itself, which also serves as a fallback when the translation is missing. In Java it's common to use dot-separated IDs instead of real messages.
2. "The text to translate (in singular)" for plurals should not mention that it's in the singular. Different implementations may use different strings as message IDs.
3. It's stated that Yii doesn't have a way to handle plurals, but in fact it has one of the most advanced message translation layers, including plurals support: https://www.yiiframework.com/doc/guide/2.0/en/tutorial-i18n#plural
4. The current interface limits usage in a bad way: because it has no argument for passing parameters, you are forced to concatenate the final message from many parts. It is extremely inconvenient to be limited to static messages without placeholders.
5. There could be multiple plurals in a single message.

Instead, I'd suggest a single method for the interface:

public function translate($message, $parameters = [], $context = null);

This way it covers regular messages:

translate('Hello!')

Plurals:

translate('{n} cats', ['n' => 10])

Parameter replacing:

translate('Hello, {username}! Thank you for participating in {eventname}.', ['username' => 'Oscar', 'eventname' => 'PHP-FIG meeting'])

etc.
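
A rough sketch of how the parameter-replacing case could behave, assuming a plain array catalog and the {name} placeholder syntax used in the calls above. The function name translateMessage and the "context|message" key layout are made up for illustration, and plural selection is deliberately left out.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the proposed translate($message, $parameters, $context)
// signature; the catalog layout and the {name} placeholder syntax are assumptions.
function translateMessage(array $catalog, string $message, array $parameters = [], ?string $context = null): string
{
    // Look up by "context|message" when a context is given; fall back to the ID itself.
    $key = $context === null ? $message : $context . '|' . $message;
    $template = $catalog[$key] ?? $message;

    // Build the {name} => value map and substitute in one pass.
    $replacements = [];
    foreach ($parameters as $name => $value) {
        $replacements['{' . $name . '}'] = (string) $value;
    }
    return strtr($template, $replacements);
}

$catalog = ['Hello, {username}!' => '¡Hola, {username}!'];
echo translateMessage($catalog, 'Hello, {username}!', ['username' => 'Oscar']), PHP_EOL;
echo translateMessage($catalog, 'Bye, {username}!', ['username' => 'Oscar']), PHP_EOL;
```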


Alexander Makarov

Mar 31, 2019, 8:27:56 PM
to PHP Framework Interoperability Group
I think I should explain why I propose adding parameters. Choosing a message by its ID is implemented differently in, for instance, gettext and ICU. In ICU a single message includes all the variants, so the number of items isn't required to get the message by its ID. In gettext there are multiple messages, so the number of items is required to get the correct one.

Another thing that should be explicit in the interface name is whether it performs the final translation or just returns a raw string for further processing. If it returns raw messages, I'd suggest renaming the interface:

interface MessageSource
{
    public function getMessage($id, $parameters = [], $context = null);
}
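
The gettext/ICU difference described above can be sketched with two toy catalogs. The array layouts and function names below are assumptions for illustration; real gettext and ICU storage formats are more involved.

```php
<?php
declare(strict_types=1);

// gettext-style storage: one entry per plural form, so the count is needed at lookup time.
$gettextStyle = [
    'cats' => ['{n} cat', '{n} cats'],
];

// ICU-style storage: a single pattern holding every variant; the count is only
// needed later, when the pattern is formatted.
$icuStyle = [
    'cats' => '{n, plural, one{# cat} other{# cats}}',
];

// Hypothetical lookup for the gettext-style catalog (English rule: index 0 for n == 1).
function getGettextStyleMessage(array $catalog, string $id, int $n): string
{
    $index = $n === 1 ? 0 : 1;
    return $catalog[$id][$index];
}

// Hypothetical lookup for the ICU-style catalog: no count involved.
function getIcuStyleMessage(array $catalog, string $id): string
{
    return $catalog[$id];
}

echo getGettextStyleMessage($gettextStyle, 'cats', 10), PHP_EOL; // raw form chosen for n = 10
echo getIcuStyleMessage($icuStyle, 'cats'), PHP_EOL;             // raw pattern, still unformatted
```

A getMessage($id, $parameters) signature lets both kinds of source sit behind one interface: the first reads the count from the parameters, the second ignores it.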

Mikko Rantalainen

Apr 1, 2019, 12:52:23 AM
to PHP Framework Interoperability Group
On Saturday, 30 March 2019 22:33:44 UTC+2, pipfros...@gmail.com wrote:
> This is kind of what I am thinking.
>
> $error = "What the hell are you doing? Read the fine manual. Everyone knows that you can not overwrite /etc/shadow from a web application.";
> $foobar = new classThatImplements('\Psr\Localization\Translation');
> $error = $foobar->translateString($error, $bcp47);
>
> It would potentially allow applications to load their own translations into such a class w/o needing to implement the backend themselves and then retrieve them as needed, possibly even a class that is good at compensating for translations the web application does not have translations for, written by programmers who understand localization and create implementations of the interface.

I find this kind of interface problematic in practice. One major issue with translations in particular is that there needs to be a tool that can extract *all* strings that need to be translated. As I see it, you either have something like gettext, where tools extract the strings from the source code, or the programmers need to manually collect all strings into some file and refer to those strings by some kind of ID in the source code.

I've used both variants in the past and I feel that the gettext style is far easier for programmers. Both are about equally hard for translators.

The variant you suggest above needs a tool that can statically extract all strings by evaluating the source code, which would require quite a complex tool.

The only thing currently missing from the gettext support is the ability to override the "_" function, which is currently hardwired to gettext(). If we had the ability to map the function "_" to some custom class, and in addition had "__" mapped to ngettext and "___" mapped to dcngettext by default but still mappable to something else, I guess we would have a winner.
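
PHP does not let user code redefine a built-in function such as "_", so a sketch of the remappable-function idea has to use its own names. Below, tr() and trn() stand in for "_" and "__"; these names and the TranslationBackend registry are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical registry holding the active backend as plain callables.
final class TranslationBackend
{
    /** @var callable(string): string */
    public static $singular;
    /** @var callable(string, string, int): string */
    public static $plural;
}

// Default backend: pass the source text through, using the English plural rule.
TranslationBackend::$singular = static fn (string $msg): string => $msg;
TranslationBackend::$plural = static fn (string $one, string $many, int $n): string => $n === 1 ? $one : $many;

// Short helpers with literal string arguments, which extraction tools could scan for.
function tr(string $message): string
{
    return (TranslationBackend::$singular)($message);
}

function trn(string $singular, string $plural, int $n): string
{
    return (TranslationBackend::$plural)($singular, $plural, $n);
}

echo tr('Hello!'), PHP_EOL;
echo sprintf(trn('%d like', '%d likes', 3), 3), PHP_EOL;

// Swap in another backend without touching the call sites.
TranslationBackend::$singular = static fn (string $msg): string => ['Hello!' => 'Hei!'][$msg] ?? $msg;
echo tr('Hello!'), PHP_EOL;
```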

As for the localization names and float/monetary formatting, those should have more similar interfaces. Currently every part needs a different interface.

Time zone support is already pretty good; the Olson database just needs more information.

-- 
Mikko

Alexander Makarov

Apr 1, 2019, 7:40:47 AM
to PHP Framework Interoperability Group
Yes, the tool is necessary, as are fallback mechanics. We have all that in Yii and I cannot imagine working with translations without these extra tools. Extracting translations is not that complex, by the way. I don't think that sticking to gettext is a good idea. ICU/intl is far superior in flexibility.

Oscar Otero

Apr 1, 2019, 1:12:37 PM
to php...@googlegroups.com
Hi, Alexander.
Thanks for offering to work on this. I don’t have much time currently, but I can help as part of the working group if this PSR finally goes forward.

About your comments:

- 1 & 2. You’re right. The message argument is the ID of the message. Sometimes it can be used as fallback text (like in gettext), but that’s outside the spec. In fact, the spec says it must return NULL if no translation was found.
- 3, 5. Yes, plural support is very important. I’m the maintainer of https://github.com/oscarotero/Gettext, which can handle plurals of any language. Anyway, for implementations not supporting plurals, we can divide the spec into two different interfaces or throw an exception.
- 4. The parameters are not part of the spec for various reasons:
  - You’re mixing placeholder parameters with the counter argument used for plurals. They are different things and the counter argument should be typed.
  - This forces us to define a placeholder format that will be incompatible with most existing i18n implementations ({user}, :user, %user%, etc.).
  - There’s no need to concatenate messages. You can use sprintf, strtr, or any other method to search/replace placeholders with final values, and I think this should be out of the scope of this spec. For example, sprintf($foobar->translate('Hello, %s'), $user) is better than $foobar->translate('Hello %s', ['user']). By dividing translation and placeholders into two different functionalities, we have a more flexible spec.
  - We may need to transform these parameters to fit the current locale settings (decimal separators for numbers, date formats, etc.). In fact, after my initial proposal, there were some discussions about including a FormatterInterface (or something like it) in order to combine both worlds (for example gettext and the intl extension). Sorry, that conversation was lost because it took place long ago in a Slack channel, and old messages are removed under the free plan's limits.
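
The split argued for in point 4 can be sketched as follows: the translator only returns the stored string (or null), and the caller picks the placeholder mechanism. The function translateOrNull and the catalog contents are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical translator following the rule quoted above: return null when nothing is found.
function translateOrNull(array $catalog, string $id): ?string
{
    return $catalog[$id] ?? null;
}

$catalog = [
    'Hello, %s' => 'Hola, %s',
    'Welcome to :sitename' => 'Bienvenido a :sitename',
];

// Placeholder handling stays outside the translator; the caller picks the style
// and decides what to do when no translation exists (here: fall back to the ID).
$greeting = sprintf(translateOrNull($catalog, 'Hello, %s') ?? 'Hello, %s', 'Oscar');
$title = strtr(
    translateOrNull($catalog, 'Welcome to :sitename') ?? 'Welcome to :sitename',
    [':sitename' => 'PHP-FIG']
);

echo $greeting, PHP_EOL;
echo $title, PHP_EOL;
```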

Alexander Makarov

Apr 2, 2019, 12:33:29 PM
to PHP Framework Interoperability Group
That won't work for intl/ICU:

1. It doesn't require the number of items to get the translation string for plurals.
2. Selecting the sub-string for plurals happens at the same stage as formatting the message. These cannot be separated.
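
For readers unfamiliar with ICU, a small sketch of both points using MessageFormatter from the intl extension: the pattern is fetched as one string, and the plural variant is chosen only when the values are formatted. The wrapper function formatCatCount is hypothetical and returns null if the extension is not loaded.

```php
<?php
declare(strict_types=1);

// Formats an ICU plural pattern; returns null when the intl extension is not loaded.
function formatCatCount(int $n): ?string
{
    if (!class_exists(MessageFormatter::class)) {
        return null;
    }
    // One stored message contains every plural variant.
    $pattern = '{n, plural, one{# cat} other{# cats}}';
    // Variant selection and number substitution happen in the same call.
    $result = MessageFormatter::formatMessage('en_US', $pattern, ['n' => $n]);
    return $result === false ? null : $result;
}

var_dump(formatCatCount(1), formatCatCount(10));
```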

Oscar Otero

Apr 2, 2019, 1:23:20 PM
to php...@googlegroups.com
Why not?

- The translator returns the translation for each ID
- The formatter inserts the real data into the message

Let’s say we use gettext:

```
$n = count($likes);

$title = $translator->translate('Welcome to :sitename');
$message = $translator->translatePlural('You have %s likes', $n);

echo $formatter->format($title, [':sitename' => 'PHP Fig']);
echo $formatter->format($message, [$n]);
```


Now using MessageFormatter from the intl extension:

```
$message = $translator->translate('Today is {0, date, full}');

echo $formatter->format($message, [$datetime]);
```

Here, the translator returns the message, which can be in any language (English, Spanish, Italian, etc.), and the formatter is what executes MessageFormatter->format() internally.

Of course, if you only need English, the formatter alone may be enough, but if your project is multilingual, you need a way to get the unformatted messages in each language (the intl extension does not provide this feature, AFAIK).

Alexander Makarov

Apr 2, 2019, 1:49:22 PM
to PHP Framework Interoperability Group
Because plurals aren't done this way in intl, and because you are putting a gettext-specific thing (a separate parameter for plural messages) into the interface.

1. For intl, passing $translator->translatePlural('You have %s likes', $n) makes no sense. It basically does not care about $n when obtaining a message.
2. intl has its own message source, so it makes little sense to store messages in gettext's PO/MO: https://www.php.net/manual/en/class.resourcebundle.php
3. As in gettext, there could be message storages that obtain messages differently for plurals, ordinals, etc. These may depend on parameters other than the number of items.

Oscar Otero

Apr 2, 2019, 2:14:22 PM
to php...@googlegroups.com
It may seem like I’m trying to create a spec based on gettext, but I’m not; I want a spec that fits all use cases.

> 1. For intl, passing $translator->translatePlural('You have %s likes', $n) makes no sense. It basically does not care about $n when obtaining a message.

OK, you don’t have to use it. Currently there are many libraries that do not support plurals, which is why it’s a different function. In fact, in some languages messages change not only for plurals but also for grammatical gender (masculine and feminine), and this is something most libraries do not include.

> 2. intl has its own message source, so it makes little sense to store messages in gettext's PO/MO: https://www.php.net/manual/en/class.resourcebundle.php

OK, I didn’t know that (I’m not very familiar with the intl extension). But I don’t see why $translator->translate() cannot be implemented with the ResourceBundle class internally. Even in intl, formatting and message storage are in different classes.

> 3. As in gettext, there could be message storages that obtain messages differently for plurals, ordinals, etc. These may depend on parameters other than the number of items.

Can you provide an example of this?

If you prefer to create a different proposal to illustrate your vision, that would make it clearer to me.


Alexander Makarov

Apr 3, 2019, 7:12:55 PM
to PHP Framework Interoperability Group
I've done some more research on common and not-so-common formats.
There's a good overview at https://docs.transifex.com/formats/introduction and you're right that among all these formats there are two main variations:

1. The ones that rely on formatting for plurals and many other things.
2. The ones that have special handling for plurals at the message format level.

There are no other parameter-based variations. So now I agree that both methods should be part of the interface.

Another thing that more than one format has is support for arrays of strings, i.e. you may obtain an array, not a string, by message ID (see Android XML).

Alexander Makarov

Apr 3, 2019, 7:17:00 PM
to PHP Framework Interoperability Group
How about moving https://gist.github.com/oscarotero/33e3af8741045c2a1a5a89310571cbdb to a WG GitHub repository? I'm interested in proposing changes and putting together a strong meta document.

Oscar Otero

Apr 4, 2019, 3:33:15 PM
to php...@googlegroups.com
Great that we agree. About Android XML, I’d rather always return strings, so a workaround could be to specify the array index in the ID. For example $translator->translate('minutes_count[0]') or something similar, depending on the implementation. We'll see.
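
One possible reading of the index-in-the-ID workaround, as a sketch only: the "name[index]" syntax, the function translateIndexed, and the catalog contents are assumptions, since the thread leaves the exact form to the implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical resolver: "name[index]" picks one string out of an array-valued entry,
// so the caller always receives a string (or null), never an array.
function translateIndexed(array $catalog, string $id): ?string
{
    if (preg_match('/^(.+)\[(\d+)\]$/', $id, $m) === 1) {
        $entry = $catalog[$m[1]] ?? null;
        return is_array($entry) ? ($entry[(int) $m[2]] ?? null) : null;
    }
    $entry = $catalog[$id] ?? null;
    return is_string($entry) ? $entry : null;
}

$catalog = [
    'greeting' => 'Hello!',
    'minutes_count' => ['one minute', 'a few minutes', 'many minutes'],
];

echo translateIndexed($catalog, 'minutes_count[1]'), PHP_EOL;
echo translateIndexed($catalog, 'greeting'), PHP_EOL;
```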

> How about moving https://gist.github.com/oscarotero/33e3af8741045c2a1a5a89310571cbdb to WG github repository? I'm interested in proposing changes and putting together a strong meta.

That would be great. Feel free to copy the gist to a new GitHub repository and work on changes. But, AFAIK, FIG's policy is to work in repositories in the php-fig organization.



Larry Garfield

Apr 4, 2019, 5:01:18 PM
to 'Alexander Makarov' via PHP Framework Interoperability Group
On Thu, Apr 4, 2019, at 2:33 PM, Oscar Otero wrote:
> Great that we agree. About Android XML, I’d rather always return
> strings, so a workaround could be to specify the array index in the ID.
> For example $translator->translate('minutes_count[0]') or something
> similar, depending on the implementation. We'll see.
>
> > How about moving https://gist.github.com/oscarotero/33e3af8741045c2a1a5a89310571cbdb to WG github repository? I'm interested in proposing changes and putting together a strong meta.
>
> That would be great. Feel free to copy the gist to a new GitHub repository
> and work on changes. But, AFAIK, FIG's policy is to work in
> repositories in the php-fig organization.

The preference, if there is enough interest, is to form a working group sooner rather than later and then work out of a repo in the FIG organization directly.

It definitely sounds like there's potential here for a good PSR. Localization is definitely something that is hard for stand-alone libraries to do in a way that's going to play nice with different frameworks, but at the same time many/most won't have much text to begin with aside from exceptions. Still, potentially worthwhile.

If you wanted to form a working group and make a proposal to the Core Committee, what I would recommend is:

* Find an editor; it could be one of the people in this thread or someone who has a crapton of experience in dealing with translation systems (or preferably both).
* Find 3-5 additional people with experience in this area. Bonus points if they maintain a translation library, or the translation subsystem of some major framework/application. Extra bonus points if they speak some language other than English/French/Spanish natively (since things like plurals get really weird in many non-Romance languages that a native EFS speaker wouldn't even think about).
* Put forth that list of people here with a baseline proposal (NOT a spec, just a goal and scope for a spec) and solicit interest from potential Sponsors. You probably won't have much trouble getting one of the CC to Sponsor this one.
* The Sponsor can coordinate calling a vote to approve the WG.

To the topic itself, I've seen two models of translation: One inlines English in the code and uses that as the key to translate to other languages; the other inlines some arbitrary lookup key, and then that key is used to look up every language, English included. I don't know which is more common/better, just that I've seen both. I would also suggest that the developer experience of a library author writing a library should be paramount, as if it's not easy for them to do, they won't do it, and it will be very hard to retrofit later.

Though it pains me to say, it's probably worth reaching out to someone in the Drupal translation team to participate. They have a *lot* of really deep work in translation nuance that most wouldn't, and would be very useful here in defining the problem scope if nothing else.

--Larry Garfield

Nikolaos Dimopoulos

Apr 4, 2019, 6:59:52 PM
to PHP Framework Interoperability Group
I would be interested in helping out with this if a WG were to be formed. I can also reach out to Crowdin, with whom I have a really good relationship, to help us with localization nuances.

/vr Nikos Dimopoulos

Paulo Vitor Bettini de Albuqerque Lima

Apr 5, 2019, 1:22:34 AM
to php...@googlegroups.com
I can help. I work at a company that is a Pimcore partner, and translations are a core feature of Pimcore. Also, I am a native Portuguese speaker, and I speak English, Spanish, Italian and Dutch (learning).


Alexander Makarov

Apr 5, 2019, 2:23:39 PM
to PHP Framework Interoperability Group
Done: https://github.com/psr-i18n-wg/message-translation/blob/master/message-translation-draft.md

About experience Larry mentioned:

1. I maintain the Yii i18n/L10N layer.
2. I have experience with Android.
3. I'm a native Russian speaker, so I know about complicated plurals.

Mikko Rantalainen

Apr 5, 2019, 2:39:09 PM
to php...@googlegroups.com
On Fri, 5 Apr 2019, 21:23 'Alexander Makarov' via PHP Framework Interoperability Group, <php...@googlegroups.com> wrote:
> Done: https://github.com/psr-i18n-wg/message-translation/blob/master/message-translation-draft.md
>
> About experience Larry mentioned:
>
> 1. I maintain the Yii i18n/L10N layer.
> 2. I have experience with Android.
> 3. I'm a native Russian speaker, so I know about complicated plurals.

I think it's very good to have people with experience with complex plurals.

About the suggested API: how about using the plural form as the identifier for the string to translate? The rationale for this style is that it makes it clearer to the programmer that the exact count is not yet known.

One thing that I have learned is that some languages define plural forms for floats, too. If I remember correctly, they use the plural for non-integer numbers. I guess some other languages have different rules for the same thing. Should this kind of feature be supported by this API? (I know that gettext doesn't support non-integers.)
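
For context on what "complex plurals" means in practice, here is a sketch of integer plural-form selection for English and Russian, using the rule commonly found in gettext catalogs for Russian. The function names are made up, and both take an int, so fractional counts are not covered, which matches the gettext limitation mentioned above.

```php
<?php
declare(strict_types=1);

// Plural-form index for English: 0 = singular, 1 = plural.
function pluralIndexEnglish(int $n): int
{
    return $n === 1 ? 0 : 1;
}

// Plural-form index for Russian (non-negative integers only):
// 0 = "one" (1, 21, 31...), 1 = "few" (2-4, 22-24...), 2 = "many" (0, 5-20, 25-30...).
function pluralIndexRussian(int $n): int
{
    $mod10 = $n % 10;
    $mod100 = $n % 100;
    if ($mod10 === 1 && $mod100 !== 11) {
        return 0;
    }
    if ($mod10 >= 2 && $mod10 <= 4 && ($mod100 < 10 || $mod100 >= 20)) {
        return 1;
    }
    return 2;
}

$forms = ['кошка', 'кошки', 'кошек']; // "cat" in the one / few / many forms
foreach ([1, 3, 5, 11, 21, 22] as $n) {
    echo $n, ' ', $forms[pluralIndexRussian($n)], PHP_EOL;
}
```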

Another important thing is to strongly suggest that translations be defined one sentence at a time (instead of a couple of words or fragments of a single sentence). This is important for languages that have strict word order, because different parts of the sentence may need to be changed compared to some other language.

-- 
Mikko
