We've gotten a few requests for a 1.46.0
release. This will be the last release before porting the
project over to GitHub (barring any pushback).
One feature I would like to add before the 1.46.0
release is automated plural form generation for the new message
format filter. I may need to make some other adjustments to the
filter as I get feedback from our team.
Are there any other high-priority bugs that should be fixed?
Jim
--
You received this message because you are subscribed to the Google Groups "okapi-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/okapi-devel/0288ace8-22b6-6f9f-76ab-06cdfdb54a02%40gmail.com.
--
Hi Mihai! Very timely post. I am designing the
plural forms auto-expansion now. I think I have a third
alternative between producing more TUs and adding
segments.
I plan to make plural expansion a filter option. When enabled, the filter will detect the plurals in the source, replace the source text if needed, then extract the expanded message string. From the perspective of downstream steps the source content came from the original file, so everything proceeds normally (at least with the standard Okapi IPipeline).
I don't like the idea of adding extra segments to a TextUnit. This breaks with the "standard" that filters create TextUnits and the Segmenter creates segments. This has implications for split/merge in the workbench and for other operations like merge.
In summary, in the current design the ICU message filter will produce Group and TextUnit events.
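For illustration only, here is roughly what "expansion" means for a message string. This is a plain-Java sketch, not the filter code: the class name PluralExpansionSketch is hypothetical, the target plural keywords are passed in by the caller (in practice they would come from CLDR), every missing branch is simply filled with a copy of "other", and apostrophe quoting, "offset:", select and selectordinal are all ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: adds missing plural branches, recursing into nested plurals. */
final class PluralExpansionSketch {

    /** Rewrites every plural argument in text so it has a branch for each required keyword. */
    static String expand(String text, List<String> required) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '{') {
                out.append(c); // literal text is copied unchanged
                i++;
                continue;
            }
            int end = matchingBrace(text, i);
            out.append(expandArgument(text.substring(i + 1, end), required));
            i = end + 1;
        }
        return out.toString();
    }

    /** Expands one argument, given the text between its outer braces. */
    private static String expandArgument(String inner, List<String> required) {
        String[] parts = inner.split(",", 3);
        if (parts.length < 3 || !parts[1].trim().equals("plural")) {
            return "{" + inner + "}"; // anything that is not a plural is left untouched
        }
        // Collect "keyword {body}" pairs, expanding nested plurals inside each body.
        Map<String, String> branches = new LinkedHashMap<>();
        String rest = parts[2];
        int pos = 0;
        int open;
        while ((open = rest.indexOf('{', pos)) >= 0) {
            int close = matchingBrace(rest, open);
            String keyword = rest.substring(pos, open).trim();
            branches.put(keyword, expand(rest.substring(open + 1, close), required));
            pos = close + 1;
        }
        String other = branches.get("other");
        if (other == null) {
            throw new IllegalArgumentException("plural without 'other' branch: " + inner);
        }
        // Keep existing branches in order, append missing keywords, put "other" last.
        Map<String, String> result = new LinkedHashMap<>(branches);
        result.remove("other");
        for (String keyword : required) {
            if (!keyword.equals("other")) {
                result.putIfAbsent(keyword, other);
            }
        }
        result.put("other", other);

        StringBuilder sb = new StringBuilder("{").append(parts[0].trim()).append(", plural,");
        for (Map.Entry<String, String> e : result.entrySet()) {
            sb.append(' ').append(e.getKey()).append(" {").append(e.getValue()).append('}');
        }
        return sb.append('}').toString();
    }

    /** Returns the index of the '}' that closes the '{' at index open. */
    private static int matchingBrace(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            if (s.charAt(i) == '{') {
                depth++;
            } else if (s.charAt(i) == '}' && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("unbalanced braces in: " + s);
    }
}
```

With the keywords one/few/many/other, "{0, plural, one {# apple} other {# apples}}" becomes "{0, plural, one {# apple} few {# apples} many {# apples} other {# apples}}". The copied branches are placeholders for the translator, not correct target text.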
Not sure of the full implications for Okapi and XLIFF 2.2 support of plurals - but I'm sure we can convert back and forth with enough metadata.
Please let me know asap of any questions! Any sample code or other info would be appreciated.
BTW: Do you have any thoughts on handling Gender? I know ICU doesn't have built-in support, but maybe the CLDR has info on gender and we could do some type of "expansion"?
Also, if you have any code or ideas for ICU message string validation I would be interested to add it to the filter. Diagnostics are going to be important to detect badly internationalized strings.
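As a starting point for discussion, the kind of diagnostics I have in mind could look something like the sketch below. It is plain Java with a hypothetical class name (MessageDiagnosticsSketch), it is not based on any existing Okapi or ICU API, and it only checks two things: that braces are balanced, and that every plural/select/selectordinal argument has an "other" branch. It assumes no apostrophe quoting in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: a few cheap sanity checks for ICU-style message strings. */
final class MessageDiagnosticsSketch {

    // Matches the header of a complex argument, e.g. "{0, plural,".
    private static final Pattern COMPLEX_ARG =
            Pattern.compile("\\{\\s*\\w+\\s*,\\s*(plural|select|selectordinal)\\s*,");

    /** Returns human-readable problems found in the message (empty list if none). */
    static List<String> check(String message) {
        List<String> problems = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    problems.add("unmatched '}' at index " + i);
                } else {
                    depth--;
                }
            }
        }
        if (depth > 0) {
            problems.add(depth + " unclosed '{'");
        }
        if (!problems.isEmpty()) {
            return problems; // branch checks are unreliable on unbalanced input
        }
        // find() resumes after each header, so nested arguments are checked too.
        Matcher m = COMPLEX_ARG.matcher(message);
        while (m.find()) {
            if (!hasOtherBranch(message, m.end())) {
                problems.add(m.group(1) + " argument at index " + m.start()
                        + " has no 'other' branch");
            }
        }
        return problems;
    }

    /** Scans the branches that start at index from, looking for the keyword "other". */
    private static boolean hasOtherBranch(String s, int from) {
        int depth = 0;
        StringBuilder keyword = new StringBuilder();
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                if (depth == 0 && keyword.toString().trim().equals("other")) {
                    return true;
                }
                depth++;
                keyword.setLength(0);
            } else if (c == '}') {
                if (depth == 0) {
                    return false; // reached the end of this argument
                }
                depth--;
            } else if (depth == 0) {
                keyword.append(c); // keyword text sits between branch bodies
            }
        }
        return false;
    }
}
```

Returning a list of messages rather than throwing lets the filter log every problem in a string at once, which seems more useful for diagnostics than stopping at the first one.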
cheers,
Jim
BTW: I'm looking for the best protobuf file UI viewer. Give it a pb file and navigate it visually. Found a few but most are ugly and old. I thought I remembered seeing some kind of editor/debugger tool a while back.
A workaround would be to use standard JSON for
design and debugging (lots of tools for JSON).
Jim
Mihai - I'm coding up the plural expansion feature now, but I've run into an issue with strings that contain multiple embedded PLURAL/SELECTORDINAL groups. The parser does a good job of pulling these out into an AST, but I'm wondering if you have a good algorithm you could share that would adjust the string with the new target plurals (aka plural expansion)?
branch is:
https://bitbucket.org/okapiframework/okapi/branch/plural_expansion
    @Test
    public void testWithEmbeddedPluralMessage() throws Exception {
        String message = "{0, plural,one {You have {1, plural, one {# apple} other {# apples}}} other {You and # others have {1, plural, one {# apple} other {# apples}}}}";
        try (MessageFormatParser p = new MessageFormatParser()) {
            p.parse(message);
            assertEquals(message, p.toString());
        }
    }
Excellent! Now that I've spent some time working with different
message strings I can see this will be really helpful. I'll try to
integrate this into the current parser and AST if possible.
Jim
Ah, I wish I had seen MessagePatternUtil before. I'll refactor
our Parser to use it instead of my custom Tokens. That will make it
trivial to incorporate your code.
Jim
I'm going to create a PR for my Message Filter changes. I'd really like to incorporate Mihai's code and refactor to use MessagePatternUtil - but I can do that as a separate PR.
The only other feature I would like to get out is the EnumSet for
Property - that will be another PR.
There are a few OpenXML bugs at the top of the issue list. If we can get some of those into 1.46.0 that would be great, but it's not essential.
I think we can release at the end of the month - is that OK for everybody?
Jim