Unescaped closing angle bracket in ITS-excluded target content

Manuel Souto Pico

Mar 25, 2024, 7:09:09 AM
to okapi-users
Dear all,

I would like to report a bug I have found while using the XML filter together with ITS filter properties.

My content has some phrases and terms in angle brackets, e.g. <foo>. These expressions are encoded as &lt;foo&gt; in the source XML file. I know this is probably a very bad practice, but that's what the authors do.

Basically, the problem is that the closing angle bracket (&gt;) is written out unescaped (>) in nodes excluded by the ITS locale filter, which breaks the application that consumes the translation.

In other words, source file has:

            <label key="ST801KAZ_ST801Q0104KAZ_54c6c210509bc38e7b8bea748938937e_6" its:localeFilterList="en-PH" its:localeFilterType="exclude">
                <text>&lt;I don’t know&gt;</text>
            </label>
            <label key="ST801KAZ_ST801Q0104KAZ_54c6c210509bc38e7b8bea748938937e_6" its:localeFilterList="en-PH" its:localeFilterType="include">
                <text>&lt;I don’t know&gt;</text>
            </label>  
           
Translation into en-PH produces:

            <label its:localeFilterList="en-PH" its:localeFilterType="exclude" key="ST801KAZ_ST801Q0104KAZ_54c6c210509bc38e7b8bea748938937e_6">
                <text>&lt;I don’t know></text>
            </label>
            <label its:localeFilterList="en-PH" its:localeFilterType="include" key="ST801KAZ_ST801Q0104KAZ_54c6c210509bc38e7b8bea748938937e_6">
                <text>&lt;Je ne sais pas&gt;</text>
            </label>  
           
Expected result was:

            <label its:localeFilterList="en-PH" its:localeFilterType="exclude" key="ST801KAZ_ST801Q0104KAZ_54c6c210509bc38e7b8bea748938937e_6">
                <text>&lt;I don’t know&gt;</text>
            </label>
            <label its:localeFilterList="en-PH" its:localeFilterType="include" key="ST801KAZ_ST801Q0104KAZ_54c6c210509bc38e7b8bea748938937e_6">
                <text>&lt;Je ne sais pas&gt;</text>
            </label>
           
Notice the difference depending on whether the label is excluded or included for the target language of the project (which is en-PH).
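
To spot affected segments in a target file, a check along these lines might help. It is only a rough sketch in Python and has nothing to do with Okapi's own code: the function name find_raw_gt is made up for illustration, and the regular expression is a simplification that skips tags, comments, CDATA sections and processing instructions but does not handle a DOCTYPE with an internal subset.

```python
import re

# Markup to skip: comments, CDATA sections, processing instructions and tags.
# Quoted attribute values are allowed to contain '>' themselves.
MARKUP = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<\?.*?\?>"
    r"""|<(?:[^<>"']|"[^"]*"|'[^']*')*>""",
    re.DOTALL,
)

def _gt_lines(text, start, end):
    # 1-based line numbers of every literal '>' in text[start:end]
    return [text.count("\n", 0, i) + 1 for i in range(start, end) if text[i] == ">"]

def find_raw_gt(xml_text):
    """Return line numbers where a literal '>' occurs in character data."""
    lines = []
    pos = 0
    for m in MARKUP.finditer(xml_text):
        lines += _gt_lines(xml_text, pos, m.start())  # text before this markup token
        pos = m.end()
    lines += _gt_lines(xml_text, pos, len(xml_text))  # trailing text, if any
    return lines

sample = (
    "<root>\n"
    "<text>&lt;I don’t know></text>\n"
    "<text>&lt;Je ne sais pas&gt;</text>\n"
    "</root>"
)
print(find_raw_gt(sample))  # [2]
```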

I ran into this issue while using the XML filter in OmegaT via the filter plugin, but I can also reproduce it by creating an XLIFF file with Rainbow (also attached).

For your convenience, I'm attaching:

- the source file (STQ.xml)
- the target file
- the filter parameters file
- the OmegaT package including all of the above
- the target XLIFF file

I could not create an OmegaT project with Rainbow 1.45 (Utilities > Translation Kit Creation): I selected OmegaT Project, but the output was Generic XLIFF. That is a different bug, though.

I will write a ticket as soon as someone else can confirm they can reproduce the problem.

Thanks.

Cheers, Manuel
okapi_unescaped_gt_test_OMT.omt
rainbow_unescaped_gt_test.zip

Michel Farhi-chevillard

Mar 25, 2024, 12:13:11 PM
to okapi-users
Hi, 

I don't have time to look at this in detail, but there is a flag in the FPRM file for escaping or not escaping the > char.
https://okapiframework.org/wiki/index.php/XML_Filter#escapeGT

Could that help you?

Thanks.

Manuel Souto Pico

Mar 29, 2024, 5:59:25 AM
to Michel Farhi-chevillard, okapi-users
Hi Michel,

Thanks for your quick note. I'm aware of that option; we have it enabled (escapeGT="yes", in the okf...@oat.fprm that I shared).

The problem seems to be that this option only takes effect for included nodes.

Cheers, Manuel
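
Until the filter handles excluded nodes, one possible stopgap is to re-escape the target file after export. The following is a minimal sketch in Python under stated assumptions, not an Okapi feature: escape_gt_in_text is a hypothetical helper, and the pattern is a simplification that leaves tags, comments, CDATA sections and processing instructions untouched but does not handle a DOCTYPE with an internal subset.

```python
import re

# Group 1 matches markup that must be copied through unchanged;
# the last alternative matches a literal '>' left over in character data.
TOKEN = re.compile(
    r"(<!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>"
    r"""|<(?:[^<>"']|"[^"]*"|'[^']*')*>)|>""",
    re.DOTALL,
)

def escape_gt_in_text(xml_text):
    """Replace every literal '>' in character data with '&gt;'."""
    return TOKEN.sub(lambda m: m.group(1) or "&gt;", xml_text)

print(escape_gt_in_text("<text>&lt;I don’t know></text>"))
# <text>&lt;I don’t know&gt;</text>
```

Content that is already escaped is left as it is, so running the function twice gives the same result as running it once.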

Manuel Souto Pico

Oct 3, 2024, 7:56:14 AM
to Michel Farhi-chevillard, okapi-users
Hi there, 


This is a show-stopper issue.

Thanks.
Cheers, Manuel

Manuel Souto Pico

Oct 3, 2024, 12:43:36 PM
to okapi-users, Yves Savourel
Dear all, 

Apparently Denis cannot see the ticket while it has status "submitted". Could the status be changed to something that lets him see it and post to it? 
Thanks.

Cheers, Manuel

yves.s...@gmail.com

Oct 3, 2024, 2:20:31 PM
to Manuel Souto Pico, okapi-users

Set to open now.

Manuel Souto Pico

Oct 3, 2024, 5:45:29 PM
to yves.s...@gmail.com, okapi-users
Yay! Thanks, Yves.
Cheers, Manuel

Mihai Nita

Oct 3, 2024, 7:27:37 PM
to Manuel Souto Pico, Yves Savourel, okapi-users
A `>` written "as is", unescaped, is valid XML. From the XML 1.0 specification (section 2.4):

The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within comments, processing instructions, or CDATA sections. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;". The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be so represented when it appears in the string "]]>", when that string is not marking the end of a CDATA section.


Sure, it would be nice for Okapi to offer it as an option. 
But the bug is really in the consumer of the file.

Mihai
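
As a quick illustration of the point about conforming consumers, here is a small Python check using the standard library parser (xml.etree.ElementTree). It only shows that a conforming parser reads both forms as the same text; it says nothing about how any particular consuming application behaves.

```python
import xml.etree.ElementTree as ET

# Same content, once with '&gt;' and once with a literal '>'
escaped = ET.fromstring("<text>&lt;I don’t know&gt;</text>")
unescaped = ET.fromstring("<text>&lt;I don’t know></text>")

print(escaped.text == unescaped.text)  # True
print(unescaped.text)                  # <I don’t know>
```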

Manuel Souto Pico

Oct 8, 2024, 5:26:10 PM
to Mihai Nita, Yves Savourel, okapi-users
Mihai Nita <mih...@gmail.com> wrote (Friday, 4/10/2024 at 01:27):
The `>` represented "as is", non-escaped, is valid xml:

Nobody said it's not. The problem was that the escapeGT option didn't work for excluded nodes.