Followup regarding ICU4C MessageFormat 2.0: Function Composition


Tim Chevalier

Jul 1, 2025, 4:27:54 PM
to icu-d...@unicode.org, icu-team >> ICU team

Hello,

The design doc for MF2 function composition was approved in the 2025-06-26 TC meeting:

https://docs.google.com/document/d/1nIYDyaTqB6nChhvoSVxBkRfBAiPchlN4anvaAma9WRc/edit?tab=t.0#heading=h.bglkyqcugfir

Another design doc, "ICU4C: MF2 bidirectional isolation strategies", was approved in a previous TC meeting (a few months back):

https://docs.google.com/document/d/1hbcSd0dfsQyIYK45CZmgM9ybXzNdHSfIPTVp-J6YL58/edit?tab=t.0

I prepared a pull request that implements both designs: https://github.com/unicode-org/icu/pull/3536

tl;dr:

Since the design was just approved a few days ago and nothing depends on this API yet, I hope that the following minor changes, which are helpful in following the MessageFormat spec for bidi isolation, can be accepted. If there are no objections by July 14, I'll assume that the changes are acceptable.

Details:

I chose to address both the function composition changes and the bidi changes in a single PR because the bidi changes rely on adding metadata to resolved values, which is best done with the new definition of resolved values provided by the function composition design doc.

While implementing the designs, I made the following changes but forgot to reflect them back into the design docs. To motivate the changes, it's best to look at the spec ( https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#handling-bidirectional-text ), but in short: resolved values (FunctionValues) have both an "input directionality" (as annotated in the text of the message) and an "output directionality" (as determined by the particular function handler that computed the value). The message formatter needs access to both in order to perform bidi isolation for the entire message.

The three changes are:

1. Two new enum types are added to the message2 namespace, UMFDirectionality and UMFBidiOption:

     /**
      * Used to represent the directionality of a message, where
      * the AUTO setting has been resolved based on locale.
      *
      * @internal ICU 78 technology preview
      * @deprecated This API is for technology preview only.
      */
     typedef enum UMFDirectionality {
         /**
          * Denotes a left-to-right message.
          *
          * @internal ICU 78 technology preview
          * @deprecated This API is for technology preview only.
          */
         U_MF_DIRECTIONALITY_LTR = 0,
         /**
          * Denotes a right-to-left message.
          *
          * @internal ICU 78 technology preview
          * @deprecated This API is for technology preview only.
          */
         U_MF_DIRECTIONALITY_RTL,
         /**
          * Denotes a message with unknown directionality.
          *
          * @internal ICU 78 technology preview
          * @deprecated This API is for technology preview only.
          */
         U_MF_DIRECTIONALITY_UNKNOWN
     } UMFDirectionality;

This is based on the MessageValue example in the spec, which includes a directionality() method:

directionality(): 'LTR' | 'RTL' | 'unknown'

        /**
         * Used to denote the directionality of the input to a function.
         *
         * See https://github.com/unicode-org/message-format-wg/blob/main/spec/u-namespace.md#udir
         *
         * @internal ICU 78 technology preview
         * @deprecated This API is for technology preview only.
         */
        typedef enum UMFBidiOption {
            /**
             * Left-to-right directionality.
             *
             * @internal ICU 78 technology preview
             * @deprecated This API is for technology preview only.
             */
            U_MF_BIDI_OPTION_LTR = 0,
            /**
             * Right-to-left directionality.
             *
             * @internal ICU 78 technology preview
             * @deprecated This API is for technology preview only.
             */
            U_MF_BIDI_OPTION_RTL,
            /**
             * Directionality determined from expression contents.
             *
             * @internal ICU 78 technology preview
             * @deprecated This API is for technology preview only.
             */
            U_MF_BIDI_OPTION_AUTO,
            /**
             * Directionality inherited from the message without
             * requiring isolation of the expression value.
             * (Default when no u:dir option is present.)
             *
             * @internal ICU 78 technology preview
             * @deprecated This API is for technology preview only.
             */
            U_MF_BIDI_OPTION_INHERIT
        } UMFBidiOption;

This is based on the allowable values for the u:dir option as specified in https://github.com/unicode-org/message-format-wg/blob/main/spec/u-namespace.md#udir

Note that both of these types are distinct from the UMFBidiContext type in the design doc (which is still used).

2. The return type of the getDirection() method on FunctionValue is changed from UBiDiDirection to UMFDirectionality.

3. A new getDirectionAnnotation() method is added to FunctionValue:

            /**
             * Returns the directionality that this value was annotated with.
             *
             * This is distinct from the directionality of the formatted text.
             * See the description of the "Default Bidi Strategy",
             * https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#handling-bidirectional-text
             * for further context.
             *
             * @return A UMFBidiOption indicating the directionality that
             *         this value was annotated with.
             *
             * @internal ICU 78 technology preview
             * @deprecated This API is for technology preview only.
             */
            virtual UMFBidiOption getDirectionAnnotation() const;

The algorithm in the spec takes into account the directionality of the message as a whole, the directionality of each individual resolved value (FunctionValue in the implementation), and the directionality that each resolved value's expression was annotated with using the u:dir option (see https://github.com/unicode-org/message-format-wg/blob/main/spec/u-namespace.md ). Thus, the latter two pieces of information need to be accessible using methods on FunctionValue. The design doc already specifies a getDirection() method that denotes the directionality of the resolved value, but the getDirectionAnnotation() method, which returns the directionality from the u:dir option, was missing.
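To illustrate why both accessors are needed, here is a minimal sketch of the per-placeholder step of the spec's Default Bidi Strategy as I read it. It is not the code in the PR: the enums are local mirrors of the ones quoted above so that the sketch compiles on its own, and `isolateFormatted` is a hypothetical helper name. Literal text and markup parts are out of scope here.

```cpp
#include <string>

// Local mirrors of the proposed enums (assumption: same enumerators as above).
enum UMFDirectionality {
    U_MF_DIRECTIONALITY_LTR = 0,
    U_MF_DIRECTIONALITY_RTL,
    U_MF_DIRECTIONALITY_UNKNOWN
};
enum UMFBidiOption {
    U_MF_BIDI_OPTION_LTR = 0,
    U_MF_BIDI_OPTION_RTL,
    U_MF_BIDI_OPTION_AUTO,
    U_MF_BIDI_OPTION_INHERIT
};

// Hypothetical helper: wraps the formatted text of one resolved value in
// bidi isolate controls.
//   msgDir     - directionality of the whole message
//   dir        - what getDirection() would return for the value
//   annotation - what getDirectionAnnotation() would return for the value
std::u16string isolateFormatted(UMFDirectionality msgDir,
                                UMFDirectionality dir,
                                UMFBidiOption annotation,
                                const std::u16string& fmt) {
    constexpr char16_t LRI = u'\u2066'; // LEFT-TO-RIGHT ISOLATE
    constexpr char16_t RLI = u'\u2067'; // RIGHT-TO-LEFT ISOLATE
    constexpr char16_t FSI = u'\u2068'; // FIRST STRONG ISOLATE
    constexpr char16_t PDI = u'\u2069'; // POP DIRECTIONAL ISOLATE

    // Any u:dir value other than 'inherit' forces isolation.
    const bool isolate = (annotation != U_MF_BIDI_OPTION_INHERIT);

    if (dir == U_MF_DIRECTIONALITY_LTR) {
        // LTR text in an LTR message needs no isolation unless requested.
        if (msgDir == U_MF_DIRECTIONALITY_LTR && !isolate) {
            return fmt;
        }
        return LRI + fmt + PDI;
    }
    if (dir == U_MF_DIRECTIONALITY_RTL) {
        return RLI + fmt + PDI;
    }
    // Unknown directionality: let the bidi algorithm use the first strong character.
    return FSI + fmt + PDI;
}
```

The decision depends on three inputs: the message directionality, the value's own directionality, and its u:dir annotation. The first is known to the formatter, so the other two are what FunctionValue has to expose.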

The two direction-related properties have different allowable values, and thus two different enum types are required to represent their possible values.
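As a small illustration of the differing value sets, the sketch below maps the literal value of a u:dir option (`ltr`, `rtl`, `auto`, or `inherit` per the u-namespace spec) to a UMFBidiOption. `parseBidiOption` is a hypothetical helper, not part of the ICU API, and the enum is a local mirror of the one quoted above. Note that `auto` and `inherit` have no counterpart in UMFDirectionality, while `unknown` has no counterpart here.

```cpp
#include <optional>
#include <string_view>

// Local mirror of the proposed enum (assumption: same enumerators as above).
enum UMFBidiOption {
    U_MF_BIDI_OPTION_LTR = 0,
    U_MF_BIDI_OPTION_RTL,
    U_MF_BIDI_OPTION_AUTO,
    U_MF_BIDI_OPTION_INHERIT
};

// Hypothetical helper: converts the literal value of a u:dir option into a
// UMFBidiOption. Returns std::nullopt for an unrecognized value so that the
// caller can decide how to report the bad option.
std::optional<UMFBidiOption> parseBidiOption(std::u16string_view value) {
    if (value == u"ltr")     return U_MF_BIDI_OPTION_LTR;
    if (value == u"rtl")     return U_MF_BIDI_OPTION_RTL;
    if (value == u"auto")    return U_MF_BIDI_OPTION_AUTO;
    if (value == u"inherit") return U_MF_BIDI_OPTION_INHERIT;
    return std::nullopt;
}
```

When no u:dir option is present at all, a caller would use U_MF_BIDI_OPTION_INHERIT, matching the default documented on that enumerator.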

Note: I'll be on leave from July 2 to July 14 and won't see replies until I get back.

Thanks,

Tim


