Question about string localization

Stephen Cheng

May 28, 2013, 11:30:33 AM
to chromi...@chromium.org
I wonder how the id for each string in the translation files (*.xtb) is generated. For example:
 
<translation id="6676384891291319759">访问互联网</translation>
 
The id looks like some kind of hash to me. I can't find similar ids in the original grd file. So how do we map the English strings to the translated version?
 
The reason I am asking is that I manually modified some strings in chromium_strings.grd and also the translated versions in the corresponding .xtb file. After rebuilding Chromium, I was surprised to find that the modified English strings were no longer translated. If the translation is keyed on a hash of the English string, my edits would certainly break it. Is there a tool I am not aware of that manages the language files more properly?

Jói Sigurðsson

May 28, 2013, 11:32:36 AM
to flas...@gmail.com, Chromium-dev
The ID is a hash of the "presentable" contents of the English string.
To explain, if you have a message like <message
name="ID_SOMETHING">Hello <ph
name="USERNAME">%s<ex>Stephen</ex></ph></message>, then the
presentable contents are "Hello USERNAME".
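
As a rough illustration of that rule (a sketch only, not grit's actual code), the presentable contents can be derived by replacing each <ph> element with its name and dropping everything inside it. The function name presentable_content is made up for this example, and any whitespace normalization grit may apply is ignored here.

```python
import xml.etree.ElementTree as ET


def presentable_content(message_xml: str) -> str:
    """Return the message text with each <ph> replaced by its name.

    Hypothetical helper for illustration; grit's real logic may differ
    (for example in how it treats surrounding whitespace).
    """
    root = ET.fromstring(message_xml)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "ph":
            # The placeholder body (%s, <ex>...</ex>) is dropped;
            # only the placeholder name is kept.
            parts.append(child.get("name", ""))
        # Text that follows the child element belongs to the message.
        parts.append(child.tail or "")
    return "".join(parts)


msg = ('<message name="ID_SOMETHING">Hello <ph name="USERNAME">'
       '%s<ex>Stephen</ex></ph></message>')
print(presentable_content(msg))  # Hello USERNAME
```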

You can use the [ grit xmb ] tool to output a new .xmb file after you
modify chromium_strings.grd. In this file you will find your modified
message and its hash ID.

Cheers,
Jói
> --
> --
> Chromium Developers mailing list: chromi...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-dev
>
>
>

Stephen Cheng

May 28, 2013, 2:21:51 PM
to Jói Sigurðsson, Chromium-dev
Thanks for your reply, Jói. The hash-based mapping system makes modification a lot more difficult. I want to change the Chromium brand name to another name of my choosing. Doing a search and replace in both the .grd files and the .xtb files obviously won't work, since the hash values change. Any good ideas on how to get this done relatively easily?

The grit xmb tool you mentioned only generates .xmb files. How do I regenerate the .xtb files?

Stephen.


Jói Sigurðsson

May 28, 2013, 3:01:25 PM
to Stephen Cheng, Chromium-dev
You can find the hash for your new message in the file generated by [
grit xmb ], then search for that hash value in the existing .xtb
files.
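
For what it's worth, that lookup is easy to script. The sketch below assumes an .xtb file is a flat list of <translation id="..."> elements under one root element. The sample bundle, its lang value, and the helper name find_translation are made up for illustration; only the translation entry is taken from earlier in this thread.

```python
import xml.etree.ElementTree as ET


def find_translation(xtb_text, message_id):
    """Return the translated text for message_id, or None if absent.

    Hypothetical helper; assumes <translation id="..."> elements.
    """
    root = ET.fromstring(xtb_text)
    for node in root.iter("translation"):
        if node.get("id") == message_id:
            # itertext() also collects text around nested elements.
            return "".join(node.itertext())
    return None


# Made-up minimal bundle wrapping the entry quoted in this thread.
sample = ('<translationbundle lang="zh-CN">'
          '<translation id="6676384891291319759">访问互联网</translation>'
          '</translationbundle>')

print(find_translation(sample, "6676384891291319759"))
```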

AFAIK, we don't have public tools available to generate .xtb files,
but the format is pretty self-explanatory. See
tools/grit/grit/xmb_writer.py and tools/grit/grit/xtb_reader.py (IIRC)
for further details of the formats and to find the hash implementation
(but you shouldn't need to if you just want to modify one string).

Cheers,
Jói

Stephen Cheng

May 28, 2013, 3:22:34 PM
to Jói Sigurðsson, Chromium-dev
Jói, thanks for the clarification. It wouldn't be a huge hassle to generate the .xtb files with custom code once I have the hash, but I wanted to check first to avoid reinventing the wheel.
After digging a little into the grit source code, I found that the first half of the MD5 hash is used as the string fingerprint. It is probably easier for me to generate the hash directly instead of relying on the .xmb files.
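
A minimal sketch of that idea in Python, assuming the input is the presentable contents encoded as UTF-8 and the id is the first 8 bytes of the MD5 digest read as a big-endian integer. Clearing the top bit to keep the id positive is my assumption, the meaning attribute is not handled at all, and fingerprint_id is a made-up name, so the result should be checked against the id that grit xmb emits before relying on it.

```python
import hashlib


def fingerprint_id(presentable: str) -> str:
    """Return a decimal id built from the first half of the MD5 digest.

    Hypothetical helper; verify against grit's own output.
    """
    digest = hashlib.md5(presentable.encode("utf-8")).digest()
    # First half of the 128-bit digest, as an unsigned 64-bit integer.
    fp = int.from_bytes(digest[:8], "big")
    # Assumption: clear the top bit so the id is never negative.
    return str(fp & 0x7FFFFFFFFFFFFFFF)


print(fingerprint_id("Hello USERNAME"))
```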

Best Regards
Stephen Cheng


Stephen Cheng

May 28, 2013, 3:37:23 PM
to Jói Sigurðsson, Chromium-dev
However, I find it hard to understand why the name parameter of the message tags, such as IDS_PRODUCT_NAME, isn't used to map the strings. It's unique, human-readable, and excellent for bi-directional lookup. Alternatively, we could use automatically assigned unique numerical ids. Hashing doesn't seem like the best option in this scenario.

Jói Sigurðsson

May 28, 2013, 3:54:47 PM
to Stephen Cheng, Chromium-dev
Hashing avoids duplication of translation work (while still allowing
you to make identical messages that get different translations, using
the meaning= attribute). It is also very useful as an automatic
identifier e.g. when messages are extracted from resources that do not
have the name parameter. An example of this is when an HTML file that
a .grd points to is broken down into translatable messages.
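
To make the deduplication point concrete, here is a toy sketch. The key function below is deliberately simplified and is NOT how GRIT combines a message with its meaning; the message names, the meaning text, and the helper names are all hypothetical. It only shows that identical text collapses to one key, while a distinct meaning yields a separate one.

```python
import hashlib
from collections import defaultdict


def message_key(text, meaning=""):
    """Toy content-based key; not GRIT's real id computation."""
    data = (meaning + "\x00" + text).encode("utf-8")
    return hashlib.md5(data).hexdigest()[:16]


def group_by_key(messages):
    """Group message names that would share a single translation."""
    groups = defaultdict(list)
    for name, text, meaning in messages:
        groups[message_key(text, meaning)].append(name)
    return dict(groups)


# Hypothetical messages: (name, English text, meaning).
messages = [
    ("IDS_OPEN_BUTTON", "Open", ""),
    ("IDS_OPEN_MENU_ITEM", "Open", ""),
    ("IDS_OPEN_STATE", "Open", "adjective, as in 'the door is open'"),
]
groups = group_by_key(messages)
for key, names in groups.items():
    print(key, names)
```

The first two names land in the same group and need only one translation, while the third gets its own entry because its meaning differs.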

If you'd like to dig into the details of GRIT's design, feel free to
peruse https://code.google.com/p/grit-i18n/wiki/DesignOverview and
other documents on the project's site. There is also a GRIT-specific
discussion list there which is probably a more appropriate place if
you wish to continue a discussion of GRIT's design choices.

Cheers,
Jói