Integration Path for Unicode Inflection in MF2

baha bouali

Mar 25, 2025, 4:51:01 PM
to Message Format Working Group

Hi message-format-wg,

I'm currently working on issue #87 (Integrate Unicode Inflection into MessageFormatter 2.0) and would appreciate some clarification about the integration process. George Rhoten has already provided valuable insights that clarified many aspects of this matter, but it is still unclear to me how integrating inflection into MF2 should be approached.


  • Should the integration primarily involve developing inflection rules directly within MF2 itself, where we'd design and implement grammatical logic from the ground up (with necessary adjustments to existing toolchains)?
  • Or would the effort focus on adapting MF2 to interface with an external API or library (e.g., Morphun), thereby delegating the core inflection logic externally?
  • Alternatively, is it geared towards a hybrid model, where certain rules are implemented natively while others are resolved through external calls?


I recognize this integration will be a learning process for everyone, but hearing your perspectives would help tremendously. 


Thank you for your time and consideration.



Tim Chevalier

Mar 25, 2025, 5:05:33 PM
to message-...@chromium.org

Hi, Baha --

In my mind, the path to implementing inflection in MF2 would be to define a set of custom functions implementing this functionality. The implementations of the custom functions could call out to an external API or library like you suggest; if those libraries are sufficient and available for use with different programming languages, I see no need to rewrite the code for MF2 purposes.

Note that custom functions can't be written in the MF2 syntax itself; they have to be written in the underlying implementation language, such as JavaScript or C++.
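To make the idea concrete, here is a minimal sketch of a custom function written in the host language that simply delegates to an external inflection library. Everything in it is an assumption for illustration: `externalInflector`, the `customFunctions` registry shape, and `formatPlaceholder` are hypothetical names, not the API of any MF2 implementation or of an existing inflection library, and the lexicon is a two-entry toy.

```javascript
// Hypothetical stand-in for an external inflection library (not a real API).
const externalInflector = {
  // Toy lexicon: lemma -> grammatical case -> surface form (Czech).
  lexicon: { Praha: { dative: 'Praze', accusative: 'Prahu' } },
  inflect(lemma, features) {
    const forms = this.lexicon[lemma];
    // Fall back to the uninflected lemma when no form is known.
    return (forms && forms[features.case]) || lemma;
  },
};

// Hypothetical registry: function name -> handler(input, options).
// It only models "a custom function implemented in the host language".
const customFunctions = {
  inflect: (input, options) => externalInflector.inflect(String(input), options),
};

// Models what a formatter might do for {$city :inflect case=dative}.
function formatPlaceholder(value, functionName, options) {
  const fn = customFunctions[functionName];
  if (!fn) throw new Error(`Unknown function: ${functionName}`);
  return fn(value, options);
}

console.log(formatPlaceholder('Praha', 'inflect', { case: 'dative' }));
// → Praze
```

The custom function itself stays a thin adapter, so the grammatical logic lives entirely in the external library and can be shared across implementations.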

The way that MF2 factors out most functionality to separate functions can be a little tricky to understand at first, so if you need pointers, please ask for them!

Cheers,

Tim

--
You received this message because you are subscribed to the Google Groups "Message Format Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to message-format...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/message-format-wg/2fd8bc81-01a5-48c2-8fa7-11d99bf79b25n%40chromium.org.

Addison Phillips

Mar 25, 2025, 5:27:54 PM
to message-...@chromium.org

Hi Baha,

(Thanks, Tim, for your response)

The most likely form, as Tim suggests, for inflection support to take would be as selector or formatter functions (or both). It's unclear to me whether these would be in their own namespace, in the Unicode namespace (`u:`), or as default functions.

The goal would be for these functions to use the existing MF2 syntax to express inflection opportunities in a consistent manner. We would only add features to MF2 if it were absolutely necessary.

The MF Working Group uses design documents to discuss proposed functionality. I would suggest that we begin with a design document. I'd be happy to have a phone/Slack call with you and others about the details of how to proceed.

regards,

Addison

-- 
Addison Phillips
Chair (W3C I18N WG, MessageFormat WG)

Internationalization is not a feature.
It is an architecture.

Addison Phillips

Mar 25, 2025, 7:54:36 PM
to baha bouali, Message Format Working Group

Hi Baha,

Such a call should include whoever from the inflection WG is interested in working on MF integration (I assume that includes you, since you initiated the thread), plus one or more representatives from MFWG.

thanks,

Addison

On 3/25/2025 3:43 PM, baha bouali wrote:

[Sorry for any duplicate messages, my reply-all messages are being deleted]

Thank you Tim for your valuable insights. 



To ensure I align with MF2's existing functionality, could you kindly share any relevant documents or code examples demonstrating how MF2 typically implements and calls standalone functions? I believe the inflection rules would follow similar patterns.



Thank you, Addison, for responding. I truly appreciate your proposing a call to discuss different ideas further. To confirm, is this call intended to include my participation? If so, I would be delighted to join, contribute to the conversation, and learn from you.



I'm genuinely excited to move this forward under your guidance and appreciate the time you're investing in this effort.

Mark Davis Ⓤ

Mar 25, 2025, 8:19:40 PM
to Addison Phillips, message-...@chromium.org, Nebojša Ćirić Ꙉ
I think any approach should be broken up into milestones, each one advancing the ball. The syntax below is illustrative, not a concrete proposal.
  1. Add options to request that a given placeholder be transformed by given static grammatical features.
    1. {$city :inflect case=dative}
    2. {$size :inflect case=dative noun-class=masculine-animate}
  2. Add functions to extract grammatical features (typically from input parameters)
    1. .input $restaurant-nc = {$restaurant :noun-class}
    2. Show use of this in selections.
  3. Extend #1 to work with variables
    1. There is a{$size :inflect noun-class=$restaurant-nc} {$restaurant} in {$city :inflect case=dative}
  4. Work out how to deal with grammatical features that govern whole noun phrases, like definiteness.
  5. ...
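The first three milestones can be sketched in a few lines, under heavy assumptions. The snippet below uses a toy German lexicon (nominative forms after the indefinite article) and hypothetical helper names (`getNounClass`, `inflectAdjective`, `describe`); it is not tied to any MF2 implementation and only shows how a feature extracted from one input could drive the inflection of another.

```javascript
// Toy German data, illustrative only (nominative, indefinite article).
const nounClass = { Restaurant: 'neuter', Bar: 'feminine', Imbiss: 'masculine' };
const adjectiveEndings = { masculine: 'er', feminine: 'e', neuter: 'es' };
const indefiniteArticle = { masculine: 'ein', feminine: 'eine', neuter: 'ein' };

// Milestone 2: extract a grammatical feature from an input value,
// roughly what {$restaurant :noun-class} is meant to express.
function getNounClass(noun) {
  return nounClass[noun];
}

// Milestones 1 and 3: inflect a value according to given features,
// whether those features are static or come from a variable.
function inflectAdjective(stem, features) {
  const ending = adjectiveEndings[features['noun-class']];
  return ending === undefined ? stem : stem + ending;
}

// Combines both: the noun class of the place governs article and adjective.
function describe(size, place) {
  const nc = getNounClass(place);
  if (nc === undefined) return `${size} ${place}`; // unknown noun: no agreement
  return `${indefiniteArticle[nc]} ${inflectAdjective(size, { 'noun-class': nc })} ${place}`;
}

console.log(describe('klein', 'Restaurant')); // → ein kleines Restaurant
console.log(describe('klein', 'Bar'));        // → eine kleine Bar
```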

Ariel Gutman

Mar 26, 2025, 5:28:05 AM
to Mark Davis Ⓤ, Addison Phillips, message-...@chromium.org, Nebojša Ćirić Ꙉ
Hi people,

I'm not replying here often, but this deserves my comment:

To make inflection more streamlined, I would recommend integrating dependency grammar labels into the format specification, instead of relying (solely) on variables.
We have done something similar at Google, and I also made such a proposal for the Abstract Wikipedia project.

So the third example in Mark's email would look something like this:

There is a {nummod:$size} {root:$restaurant} in {$city :inflect case=dative}

where it is understood that "nummod" labels a numeric modifier of the root noun (the details of what this means in practice have to be defined per language; for instance, it could mean agreement in gender, number, case, etc.).

This has several advantages over piping variables in several slots:
1) It is in general less verbose.
2) It is directly related to the linguistic analysis of the sentence, so there is less chance of forgetting to pipe one variable at the needed place.
3) Since the operational definition of each dependency label is given elsewhere (per language) it alleviates the author of the template from remembering all aspects of a specific agreement pattern.
4) In some cases, it allows reusing the same templates across different languages. E.g. just the part {nummod:$size} {root:$restaurant} could be used for any language where the numeral modifier precedes the head noun, even if different agreement rules are used. 
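As a rough illustration of points 3 and 4, the sketch below keeps the meaning of each dependency label in a per-language rule table, so the same labelled template fragment can be realized in French and English. The rule table, the lexicons, and the `realize` function are all hypothetical, and only single-feature agreement is modeled.

```javascript
// Hypothetical per-language rules: features a dependent copies from its root.
const agreementRules = { fr: { nummod: ['gender'] }, en: { nummod: [] } };

// Toy lexicons, illustrative only.
const nounFeatures = {
  fr: { restaurant: { gender: 'masculine' }, brasserie: { gender: 'feminine' } },
  en: {},
};
const modifierForms = {
  fr: { petit: { masculine: 'petit', feminine: 'petite' } },
  en: {},
};

// slots model the fragment {nummod:$size} {root:$restaurant}.
function realize(locale, slots) {
  const root = slots.find((s) => s.dep === 'root');
  if (!root) throw new Error('Template needs a root slot');
  const rootFeatures = (nounFeatures[locale] || {})[root.value] || {};
  return slots
    .map((slot) => {
      if (slot.dep === 'root') return slot.value;
      const features = (agreementRules[locale] || {})[slot.dep] || [];
      const forms = (modifierForms[locale] || {})[slot.value];
      // Only the first agreeing feature is used in this sketch.
      const key = features.length > 0 ? rootFeatures[features[0]] : undefined;
      return (forms && key && forms[key]) || slot.value;
    })
    .join(' ');
}

const template = (size, place) => [
  { dep: 'nummod', value: size },
  { dep: 'root', value: place },
];

console.log(realize('fr', template('petit', 'brasserie'))); // → petite brasserie
console.log(realize('en', template('small', 'restaurant'))); // → small restaurant
```

The template author only writes the labels; which features agree is decided by the per-language table.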

I hope this helps,
Ariel



Eemeli Aro

Mar 26, 2025, 9:33:53 AM
to Ariel Gutman, Mark Davis Ⓤ, Addison Phillips, message-...@chromium.org, Nebojša Ćirić Ꙉ
First of all, it would be useful to get a set of example messages for which inflection is necessary or beneficial, preferably without any syntax representation, only the input data and the output string, along with some classification of how common a message type might be. This would be a significant help not only in evaluating alternatives, but also would provide the foundation for a test suite.

Regarding the space within which we need to find a solution: During MF2 formatting, each placeholder is formatted individually, with only read-only access to the formatting context. This means that an annotation on one placeholder cannot have an effect on the formatting of another placeholder.

However, we are not limited to formatting directly to a string, and this sounds like a space where it becomes interesting to consider formatting the message first to something like a sequence of parts, and then apply changes to that sequence before producing the final formatted string (or other representation).

Here's an example of what's possible currently with the JS implementation:

import { MessageFormat } from 'messageformat';
const src = 'There is a {$size :string u:id=size} {$place :string u:id=place} in {#inflect name=$city case=dative/}.';
const mf = new MessageFormat('en', src, { bidiIsolation: 'none' });
mf.formatToParts({ size: 'small', place: 'restaurant', city: 'Prague' });

resulting in this value:
[
  { type: 'text', value: 'There is a ' },
  { type: 'string', locale: 'en', value: 'small', id: 'size' },
  { type: 'text', value: ' ' },
  { type: 'string', locale: 'en', value: 'restaurant', id: 'place' },
  { type: 'text', value: ' in ' },
  {
    type: 'markup',
    kind: 'standalone',
    name: 'inflect',
    options: { name: 'Prague', case: 'dative' }
  },
  { type: 'text', value: '.' }
]

This is making use of the following MF2 features:
  1. Formatting to a non-string target, so that part-specific metadata can be retained in the results.
  2. "Markup" formatting, to show how something like inflection could be applied on the formatted results, rather than as a part of the formatting.
  3. The special u:id option, the value of which is retained in the resulting formatted part (independently of the function on the placeholder).
One possible improvement to the above: currently only the single "u:id" field is effectively passed through formatting. We could consider either adding another similar field ("u:class" for the CSS connection?), or a whole category of options, e.g. with an i: prefix, that allow multiple values to be passed through.
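A post-processing step over the parts shown above could look roughly like the following sketch. The `parts` array is copied from the result above; `indexById`, `renderParts`, and the inflector callback are hypothetical helpers, not part of the messageformat package.

```javascript
// Parts copied from the formatToParts() result shown above.
const parts = [
  { type: 'text', value: 'There is a ' },
  { type: 'string', locale: 'en', value: 'small', id: 'size' },
  { type: 'text', value: ' ' },
  { type: 'string', locale: 'en', value: 'restaurant', id: 'place' },
  { type: 'text', value: ' in ' },
  { type: 'markup', kind: 'standalone', name: 'inflect',
    options: { name: 'Prague', case: 'dative' } },
  { type: 'text', value: '.' },
];

// Index the parts that carry a u:id so a post-processor can find them.
function indexById(allParts) {
  return Object.fromEntries(allParts.filter((p) => p.id).map((p) => [p.id, p]));
}

// Hypothetical post-processing: resolve "inflect" markup through a
// caller-supplied inflector, then join everything into one string.
function renderParts(allParts, inflect) {
  return allParts
    .map((part) => {
      if (part.type === 'markup' && part.name === 'inflect') {
        return inflect(part.options.name, part.options);
      }
      // Other markup contributes no text in this sketch.
      return part.type === 'markup' ? '' : part.value;
    })
    .join('');
}

// English has no dative form to apply, so this inflector returns the name as is.
const englishInflector = (name) => name;

console.log(indexById(parts).place.value); // → restaurant
console.log(renderParts(parts, englishInflector));
// → There is a small restaurant in Prague.
```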

We also lack in MF2 a way to communicate that the message as a whole ought to be formatted as a sentence. This matters e.g. for the Finnish translation, which reorders the parts as "Prahassa on pieni ravintola." if the dative form is available, or as the clumsier "Kaupungissa Praha on pieni ravintola." if it isn't; "kaupungissa" means "in the city", and it's only capitalized because of the sentence start.


Nebojša Ćirić Ꙉ

Mar 26, 2025, 12:58:34 PM
to Eemeli Aro, Ariel Gutman, Mark Davis Ⓤ, Addison Phillips, message-...@chromium.org, inflection-team

Mark Davis Ⓤ

Mar 26, 2025, 4:01:43 PM
to Ariel Gutman, Addison Phillips, message-...@chromium.org, Nebojša Ćirić Ꙉ
This is an excellent point. 

> There is a {nummod:$size} {root:$restaurant} in {$city :inflect case=dative}

I think this is along the lines of what had popped up in earlier message format discussions, which was something like:

There is a{$size :agree-with source=xx} {$restaurant u:id:xx} in {$city :inflect case=dative}

That is, rather than specify how agreement is to take place, just specify that $size (eg "large", "small", ...) is to be inflected to agree with some other field. (Again, exact syntax may vary.)

Mark Davis Ⓤ

Mar 26, 2025, 4:06:47 PM
to Eemeli Aro, Ariel Gutman, Addison Phillips, message-...@chromium.org, Nebojša Ćirić Ꙉ
> Regarding the space within which we need to find a solution: During MF2 formatting, each placeholder is formatted individually, with only read-only access to the formatting context. This means that an annotation on one placeholder cannot have an effect on the formatting of another placeholder.

I think that is a current restriction, but not a necessary restriction. That is, we had talked before about having mechanisms that would allow agreement between two different placeholders, where one affects another. George also makes a good point in that we have to be careful to not allow circular references (just unidirectional references with no cycles).
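The "no cycles" requirement can be checked mechanically. The sketch below assumes a hypothetical map from each placeholder id to the ids it must agree with, and uses a depth-first search to either return an order in which placeholders can be resolved or report a circular reference. Neither the data shape nor `resolutionOrder` comes from the MF2 spec.

```javascript
// deps: placeholder id -> ids of the placeholders it must agree with.
function resolutionOrder(deps) {
  const state = new Map(); // id -> 'visiting' | 'done'
  const order = [];

  function visit(id, path) {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') {
      // We came back to a placeholder that is still being resolved.
      throw new Error(`Circular agreement: ${[...path, id].join(' -> ')}`);
    }
    state.set(id, 'visiting');
    for (const source of deps[id] || []) visit(source, [...path, id]);
    state.set(id, 'done');
    order.push(id); // sources are pushed before the placeholders that need them
  }

  for (const id of Object.keys(deps)) visit(id, []);
  return order;
}

// $size agrees with $place; $place depends on nothing.
console.log(resolutionOrder({ size: ['place'], place: [] }));
// → [ 'place', 'size' ]

try {
  resolutionOrder({ a: ['b'], b: ['a'] });
} catch (err) {
  console.log(err.message); // → Circular agreement: a -> b -> a
}
```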

And as George says, reordering is outside the scope of message format.


Mihai Nita

Mar 27, 2025, 3:28:14 PM
to Mark Davis Ⓤ, Eemeli Aro, Ariel Gutman, Addison Phillips, message-...@chromium.org, Nebojša Ćirić Ꙉ
Another way to implement such an agreement is by chaining custom functions.

.local $gi = {$restaurant :custom:getGrammarInfo}
There is a{$size :inflect grammatical_info=$gi} {$restaurant} in {$city :inflect case=dative}

I am not saying it is a better option than post-processing the parts.

But it is an option with the existing mechanisms (custom functions).

Post-processing the formatted parts is outside the area that the MF2 spec covers
(as in: the spec says you return parts; what you do with them is up to you).

Mihai
