Issue 8 in pseudolocalization-tool: Tool mangles more than desired with ICU plural patterns

pseudolocal...@googlecode.com

Jul 30, 2014, 12:28:52 AM
to pseudolocal...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 8 by trejkaz: Tool mangles more than desired with ICU plural
patterns
http://code.google.com/p/pseudolocalization-tool/issues/detail?id=8

Suppose you have a string like this:

duplicatesRemovedFragment={0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}

In version 0.2 it gets mangled to this:

duplicatesRemovedFragment={0,plural,one{{0} \u202Eduplicate\u202C \u202Eremoved\u202C}\u202Eother\u202C{{0} \u202Eduplicates\u202C \u202Eremoved\u202C}}

Oddly, the "one" keyword remains untouched (suggesting that the tool does
somehow understand that it's a special keyword), yet the "other" keyword has
been mangled, so at runtime you get this error:

Missing 'other' keyword in plural pattern in "{0,plural,one{{0} du ..."


pseudolocal...@googlecode.com

Aug 6, 2014, 10:29:42 PM
to pseudolocal...@googlegroups.com
Updates:
Status: NeedInfo
Owner: j...@jaet.org

Comment #1 on issue 8 by j...@jaet.org: Tool mangles more than desired with
ICU plural patterns
So is this using the Pseudolocalizer command-line tool? If so, what
arguments are you passing? I assume this is in a .properties file?

Without the details to reproduce it, my guess would be that the hack for
parsing MessageFormat patterns at
http://code.google.com/p/pseudolocalization-tool/source/browse/trunk/java/com/google/i18n/pseudolocalization/format/JavaProperties.java#142
is insufficient.

pseudolocal...@googlecode.com

Aug 7, 2014, 6:36:40 AM
to pseudolocal...@googlegroups.com

Comment #2 on issue 8 by trejkaz: Tool mangles more than desired with ICU
plural patterns
http://code.google.com/p/pseudolocalization-tool/issues/detail?id=8

Yeah, we're using .properties files, and yeah, that regex seems like it
would stop at the first }, which explains why it modified the word right
after it. It should probably permit matched pairs of {}, but then there is
the other issue of the text within the innermost {} needing to be mangled,
which seems like it could get rather complex.

I ended up writing my own tool anyway, due to this and other issues, and
used ANTLR to parse the formats, because there was a whole host of weird
edge cases that I found hard to handle with regexes.
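To make the "stops at the first }" diagnosis concrete, here is a minimal Java sketch contrasting a first-`}` regex with a matched-brace scan. The pattern `\{[^}]*\}` is an assumption standing in for the tool's own regex (which is not reproduced here), but it splits the reported string exactly where the mangling starts: `{0,plural,one{{0}` and `{{0}` are treated as placeholders, leaving ` duplicate removed}other` as ordinary text. The class and method names are hypothetical, and MessageFormat apostrophe quoting is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderScan {
    // Hypothetical stand-in for a placeholder regex: '{' up to the FIRST '}'.
    private static final Pattern NAIVE = Pattern.compile("\\{[^}]*\\}");

    /** Placeholders as the naive regex sees them. */
    static List<String> naive(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = NAIVE.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** Top-level placeholders found by counting matched '{' / '}' pairs. */
    static List<String> matched(String s) {
        List<String> found = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    start = i; // remember where the outermost placeholder opens
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    found.add(s.substring(start, i + 1)); // outermost placeholder closed
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String value = "{0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}";
        // Prints [{0,plural,one{{0}, {{0}]: "other" ends up outside any placeholder.
        System.out.println("naive:   " + naive(value));
        // Prints the whole value as a single placeholder.
        System.out.println("matched: " + matched(value));
    }
}
```

Matching pairs only fixes the boundary, though: treating the whole value as one opaque placeholder would leave the plural messages untranslated, which is the second problem raised in the comment.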

pseudolocal...@googlecode.com

Aug 7, 2014, 11:28:12 AM
to pseudolocal...@googlegroups.com
Updates:
Status: Accepted
Labels: Component-Logic Usability

Comment #3 on issue 8 by j...@jaet.org: Tool mangles more than desired with
ICU plural patterns
It's sad that it is easier to write your own tool than to patch this
one.

This example is a bit tricky, because even with proper parsing, nothing
inside a placeholder would normally be localizable at all. However, in
ICU4J plural/choice formats, localizable text occurs within the
placeholder itself, so you really have to know that this is an ICU4J
plural/choice format in order to make that text localizable. I suppose we
can allow for generic MessageFormat-type messages and, if one appears to
match plural/choice, treat it specially: instead of a Placeholder, generate
a VariantFragment tree.
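A rough sketch of that special-casing, assuming only `{N,plural,...}` arguments (no `choice`, `select`, or apostrophe quoting). It does not use the tool's real Placeholder or VariantFragment classes; `PluralAwarePseudo` and its methods are hypothetical. Ordinary placeholders such as `{0}` are copied through, plural selectors such as `one` and `other` are kept verbatim, and only the message text inside each variant is recursively pseudolocalized, using the same U+202E/U+202C word wrapping seen in the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluralAwarePseudo {
    private static final char RLO = 0x202E; // RIGHT-TO-LEFT OVERRIDE
    private static final char PDF = 0x202C; // POP DIRECTIONAL FORMATTING
    private static final Pattern WORD = Pattern.compile("\\p{L}+");
    // Assumed argument header "name,plural,"; quoting is not handled.
    private static final Pattern PLURAL_HEAD =
            Pattern.compile("\\s*\\w+\\s*,\\s*plural\\s*,(.*)", Pattern.DOTALL);

    /** Wraps each run of letters in RLO...PDF, leaving spaces and punctuation alone. */
    static String mangleText(String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start()).append(RLO).append(m.group()).append(PDF);
            last = m.end();
        }
        return sb.append(text.substring(last)).toString();
    }

    /** Index of the '}' that closes the '{' at position open. */
    static int matchingBrace(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unbalanced braces in: " + s);
    }

    /** Pseudolocalizes literal text; placeholders are kept unless they are plural arguments. */
    static String pseudolocalize(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int open = s.indexOf('{', i);
            if (open < 0) {
                out.append(mangleText(s.substring(i)));
                break;
            }
            out.append(mangleText(s.substring(i, open)));
            int close = matchingBrace(s, open);
            out.append(processArgument(s.substring(open + 1, close)));
            i = close + 1;
        }
        return out.toString();
    }

    /** Plain arguments like "0" pass through; plural arguments get per-variant treatment. */
    static String processArgument(String inner) {
        Matcher m = PLURAL_HEAD.matcher(inner);
        if (!m.matches()) {
            return "{" + inner + "}";
        }
        String head = inner.substring(0, m.start(1)); // e.g. "0,plural,"
        return "{" + head + processVariants(m.group(1)) + "}";
    }

    /** Keeps selectors ("one", "=0", "other") verbatim and recurses into each message. */
    static String processVariants(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            int open = body.indexOf('{', i);
            if (open < 0) {
                out.append(body.substring(i));
                break;
            }
            out.append(body, i, open); // the selector, untouched
            int close = matchingBrace(body, open);
            out.append('{').append(pseudolocalize(body.substring(open + 1, close))).append('}');
            i = close + 1;
        }
        return out.toString();
    }

    /** Shows the invisible bidi controls as escapes, matching the notation in the report. */
    static String escaped(String s) {
        return s.replace(String.valueOf(RLO), "\\u202E").replace(String.valueOf(PDF), "\\u202C");
    }

    public static void main(String[] args) {
        String value = "{0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}";
        System.out.println(escaped(pseudolocalize(value)));
    }
}
```

For the string in the report this wraps the same words as version 0.2 did but leaves `other` intact, so the pattern keeps its required `other` keyword. Because `processVariants` calls back into `pseudolocalize`, a plural nested inside a variant message is handled the same way.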