Fwd: Extraneous tags which are not escaped in output


Manuel Souto Pico

Jul 3, 2023, 9:42:52 AM
to okapi-users
Dear all,

I ran into a surprising issue today using the Okapi XML filter.

Please consider the following simple source XML file to translate:

<file xmlns="http://www.imsglobal.org/xsd/imsqti_v2p2">
    <item>
        <label key="123">
            <text>foo <em>bar</em> qux</text>
        </label>
        <label key="456">
            <text>foo bar</text>
        </label>
        <label key="789">
            <text>foo baz qux</text>
        </label>      
    </item>
</file>

I translate the file like so:


Seg1: Translation is okay
Seg2: Translation is okay too but I add an "extraneous" tag there (for comparison, see below)
Seg3: I insert the fuzzy match coming from segment 1, but I forget to remove the tag (simulating what the user did)

When I generate the target file, I get:

<?xml version="1.0" encoding="UTF-8"?>
<file xmlns="http://www.imsglobal.org/xsd/imsqti_v2p2">
    <item>
        <label key="123">
            <text>ña <em>ña</em> ña</text>
        </label>
        <label key="456">
            <text>ña &lt;b&gt;ño&lt;/b&gt; ño</text>
        </label>
        <label key="789">
            <text>ña <g1>ña</g1> ña</text>
        </label>      
    </item>
</file>

Labels 123 and 456 are okay. The original markup is there in the first case and the tag that I inserted in the second case appears as text (escaped angle brackets, as expected).

However, in the third label (789) I would have expected "ña &lt;g1&gt;ña&lt;/g1&gt; ña" (as in the previous case with the extraneous tag <b>), but instead I get an unescaped tag pair <g1> ... </g1>, which breaks the XML file.

By "breaking" I mean that it introduces an element that is not expected by the validation schema.
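
A minimal sketch of the difference, using only Python's standard library (this is not Okapi or OmegaT code, and the helper name child_elements is made up for illustration). It parses the two kinds of output and lists the elements found inside <text>:

```python
import xml.etree.ElementTree as ET

# Namespace of the sample file, in ElementTree's "{uri}" notation.
NS = "{http://www.imsglobal.org/xsd/imsqti_v2p2}"

def child_elements(text_xml):
    """Return the local names of the elements nested directly inside <text>."""
    root = ET.fromstring(text_xml)
    return [child.tag.replace(NS, "") for child in root]

# Label 456: the extraneous <b> tag was written as escaped text.
escaped = ('<text xmlns="http://www.imsglobal.org/xsd/imsqti_v2p2">'
           'ña &lt;b&gt;ño&lt;/b&gt; ño</text>')

# Label 789: the extraneous <g1> tag was written as real markup.
unescaped = ('<text xmlns="http://www.imsglobal.org/xsd/imsqti_v2p2">'
             'ña <g1>ña</g1> ña</text>')

print(child_elements(escaped))    # []
print(child_elements(unescaped))  # ['g1']
```

Both strings are well-formed XML, so a plain parser accepts them. The second one contains a g1 element that a schema-validating consumer would reject.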

I have tested this with other standard (non-Okapi) XML-based filters in OmegaT, and I can't reproduce it. My conclusion, and that of the OmegaT PM, is that the issue is in the Okapi filter.

I'm attaching the sample project package.

I'll wait for some feedback before creating a ticket so as to have this fixed asap.

Thanks in advance.
Cheers, Manuel

Version: OmegaT-6.1.0_0_d1f75ad22
Platform: Linux 6.1.35-1-lts
Java: 11.0.18 amd64
Okapi plugin: okapiFiltersForOmegaT-1.13-1.45.0.jar


omegat_bug_extraneous_legacy_tag.omt

Jimbo

Jul 3, 2023, 10:49:24 AM
to Manuel Souto Pico, okapi-users

Hi Manuel,

Can you modify this file to match your translation in OmegaT?

One thing I noticed is that OmegaT is still using simplified XLIFF codes (g/x). You will get better results if you use bpt/ept/ph, etc.

Jim

--
Jim Hargrave
Software Engineer

W: www.strakertranslations.com
E: jim.ha...@strakergroup.com

foobar.xml.xliff

Manuel Souto Pico

Jul 3, 2023, 11:03:33 AM
to Jimbo, okapi-users
Thank you, Jim.

See the target XLIFF file attached (translated using the Okapi XLIFF filter in OmegaT). I hope this helps.

About the tag notation, I'm not sure. I would assume it depends on what the filter or plugin does; I could be wrong, but I don't think OmegaT determines this. In Okapi, I think this is a setting in the options for XLIFF and OmegaT project generation.

Cheers, Manuel

Jimbo

Jul 3, 2023, 7:59:00 PM
to Manuel Souto Pico, okapi-users

Hmm, maybe Google stripped the attachment? Try a zip.

Yes, I bet OmegaT does determine the tag type, and now that I think about it, that may be required. It's just a path we don't test heavily anymore; we prefer the bpt/ept/ph codes.

Jim

Manuel Souto Pico

Jul 4, 2023, 2:29:35 AM
to Jimbo, okapi-users
Hi Jim,

I think I just forgot to attach the file ^^ Apologies for that. You should see it now.

The content is

            <trans-unit id="3" resname="789" xml:space="preserve">
                <source xml:lang="en">foo baz qux</source>
                <seg-source><mrk mid="0" mtype="seg">foo baz qux</mrk></seg-source>
                <target xml:lang="fr"><mrk mid="0" mtype="seg">ña <g1>ña</g1> ña</mrk></target>
            </trans-unit>

In other words, same behaviour as I reported.
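
As a rough way to spot this in a target XLIFF file, here is a small Python sketch that flags any element inside <target> that is not one of the XLIFF 1.2 inline elements. It is an illustration only: the function name unexpected_inline is made up, and it assumes a fragment without the XLIFF namespace, like the one quoted above (a full file would need namespace-aware lookups).

```python
import xml.etree.ElementTree as ET

# Inline elements defined by XLIFF 1.2.
XLIFF12_INLINE = {"g", "x", "bx", "ex", "bpt", "ept", "ph", "it", "mrk", "sub"}

def unexpected_inline(trans_unit_xml):
    """List element names inside <target> that XLIFF 1.2 does not define."""
    unit = ET.fromstring(trans_unit_xml)
    target = unit.find("target")
    return [el.tag for el in target.iter()
            if el is not target and el.tag not in XLIFF12_INLINE]

fragment = """
<trans-unit id="3" resname="789" xml:space="preserve">
    <source xml:lang="en">foo baz qux</source>
    <seg-source><mrk mid="0" mtype="seg">foo baz qux</mrk></seg-source>
    <target xml:lang="fr"><mrk mid="0" mtype="seg">ña <g1>ña</g1> ña</mrk></target>
</trans-unit>
"""

print(unexpected_inline(fragment))  # ['g1']
```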

Incidentally, OmegaT shows bpt/ept codes as <g1>:
image.png

Here you can see how I reuse the fuzzy match containing the tags:


I hope that helps.

Cheers, Manuel

foobar.xml.xliff

Jimbo

Jul 4, 2023, 12:56:28 PM
to Manuel Souto Pico, okapi-users

The XLIFF file you attached is already invalid (<g1></g1>), so it won't work. I think the real problem is the TM leverage. OmegaT should have logic to make sure that, when writing leveraged content to an XLIFF file, it properly escapes the content. It seems to do this for the <b> tags. So in this case, <g1> would become <g id="1">, etc. (or whatever internal XLIFF tag you are using).

Can you test on an old version of OmegaT and see whether the problem occurs there? If not, this may be an OmegaT bug introduced recently.
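
One possible reading of that logic, as a hypothetical Python sketch (it is not how OmegaT or the Okapi plugin is implemented; the function name write_target and the source_tag_ids parameter are invented for illustration). Shorthand tags whose id exists in the source segment are expanded to XLIFF markup, and everything else is escaped as text:

```python
import re
from xml.sax.saxutils import escape

# Matches OmegaT-style shorthand tags such as <g1>, </g1>, or <x2/>.
SHORTHAND_TAG = re.compile(r"<(/?)([a-z]+)(\d+)(/?)>")

def write_target(text, source_tag_ids):
    """Serialize translated text, escaping shorthand tags absent from the source."""
    out = []
    pos = 0
    for m in SHORTHAND_TAG.finditer(text):
        # Plain text between tags is always escaped.
        out.append(escape(text[pos:m.start()]))
        closing, name, tag_id, empty = m.groups()
        if int(tag_id) in source_tag_ids:
            # Known tag: expand the shorthand into real XLIFF markup.
            if closing:
                out.append(f"</{name}>")
            else:
                out.append(f'<{name} id="{tag_id}"{"/" if empty else ""}>')
        else:
            # Extraneous tag (for example, carried over from a fuzzy match).
            out.append(escape(m.group(0)))
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)

# The source segment "foo baz qux" has no tags, so <g1> is escaped.
print(write_target("ña <g1>ña</g1> ña", set()))
# If the source did contain tag 1, the shorthand is expanded instead.
print(write_target("ña <g1>ña</g1> ña", {1}))
```

A tag without a numeric id, such as <b>, never matches the pattern, so it is escaped together with the surrounding text.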

Jim

Manuel Souto Pico

Jul 4, 2023, 2:27:54 PM
to Jimbo, okapi-users
Hi Jim,

Thanks for looking into this.

Yes, you're right, the target XLIFF file is invalid, like the target XML I had originally used. This is the issue that I'm reporting.

However, my impression is that the problem is not in the TM leverage but in the Okapi filter. I might be wrong but I think it's not OmegaT but the filter that writes the target text to the target file.

Before writing to this list I had tested this with other native OmegaT XML-based filters (such as the OpenDocument filter) and I didn't get the issue. I have now also tried translating your XLIFF file with the new "XLIFF 1" filter in OmegaT 6.1 and, again, I don't get the issue. With the ODT filter the tag is escaped, as in the case of <b>; with the native "XLIFF 1" filter, tags that are not in the source text are stripped. The issue only happens with the Okapi XLIFF filter, and the TM leverage is the same in all cases.

I have tested this with previous versions of OmegaT (4.3.3 and 5.7.1), using all the versions of the okapiFiltersForOmegaT plugin that I have: 1.8-1.40.0, 1.9-1.41.0, 1.11-1.43.0 and 1.12-1.44.0-jre8, as well as the latest one. The issue happens in all cases.

Cheers, Manuel
 

Jimbo

Jul 4, 2023, 3:22:58 PM
to Manuel Souto Pico, okapi-users

It's possible this is a problem with the Okapi XliffSkeletonWriter (not the filter). But I don't understand OmegaT well enough to know what is writing the translated XLIFF files. These <g1> codes should be escaped.

Your best option is to find a way to reproduce this with a standard Okapi pipeline (Rainbow or Tikal).

Jim

Manuel Souto Pico

Jul 4, 2023, 3:24:53 PM
to Jimbo, okapi-users
Thanks, Jim.

I have tried in Rainbow but I couldn't reproduce it. I don't know what else I can test.

Cheers, Manuel

Jimbo

Jul 4, 2023, 3:32:01 PM
to Manuel Souto Pico, okapi-users

I would log this issue on the OmegaT project. OmegaT may be using the wrong options when writing out the XLIFF (raw XLIFF vs. Okapi Code objects). This can get confusing with TM matches.

Manuel Souto Pico

Jul 4, 2023, 3:35:50 PM
to Jimbo, okapi-users
Thanks, Jim.

I can do that, although I had checked there first. I can create a ticket and see whether a developer can have a better look at the interaction between the plugin and the core code.

Thank you for your time.
Cheers, Manuel

Manuel Souto Pico

Aug 9, 2023, 11:48:30 AM
to Mailing list for OmegaT user support, okapi-users

l@tlo <li...@traduction-libre.org> wrote on Wednesday, 5/07/2023 at 05:02:
Thank you for checking with them.

JC

> On Jul 5, 2023, at 4:39, Manuel Souto Pico <termin...@gmail.com> wrote:
>
> After checking with the Okapi people, I have been sent back here.
> Since I can't reproduce the issue in Rainbow, Jim concludes that OmegaT may be using the wrong options when writing out the target XLIFF file (raw XLIFF vs. Okapi Code objects). This can get confusing with TM matches.
>
> I will create a ticket in the OmegaT tracker as soon as I can. Any further feedback is welcome in the meantime.
>
> Cheers, Manuel
>
> Manuel Souto Pico <termin...@gmail.com> wrote on Monday, 3/07/2023 at 09:58:
> Thank you so much!
> Cheers, Manuel
>
> l@tlo <li...@traduction-libre.org> wrote on Monday, 3/07/2023 at 09:43:
>
>
> > On Jul 3, 2023, at 16:35, Manuel Souto Pico <termin...@gmail.com> wrote:
> >
> > We can conclude the problem is in the Okapi filter, then?
>
> Et voilà, I'd say. :)
>
> --
> Jean-Christophe Helary @jche...@emacs.ch
> https://traductaire-libre.org
> https://mac4translators.blogspot.com
> https://sr.ht/~brandelune/omegat-as-a-book/
>
>
>
> _______________________________________________
> Omegat-users mailing list
> Omegat...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/omegat-users

--
Jean-Christophe Helary @jche...@emacs.ch
https://traductaire-libre.org
https://mac4translators.blogspot.com
https://sr.ht/~brandelune/omegat-as-a-book/




Manuel Souto Pico

Apr 8, 2024, 9:32:26 AM
to Jimbo, okapi-users, Group: okapi-devel, Mailing list for OmegaT developers.
Dear Jim,

Based on your hunch, which was that OmegaT may be using the wrong options when writing out the target XLIFF file (raw XLIFF vs. Okapi Code objects), I created a ticket in the OmegaT tracker. Hiroshi Miura (one of the OmegaT core developers) has looked into it and concluded that the problem is in the filters plugin. He wrote in the OmegaT ticket:

The bug is in the Okapi Filters Plugin for OmegaT.
Please ask the Okapi project to fix it.

I have pushed a test case that reproduces the bug, with detailed explanations.
https://bitbucket.org/okapiframework/omegat-plugin/pull-requests/22

I also have created a ticket in the omegat-plugin tracker (including Hiroshi's comments): https://bitbucket.org/okapiframework/omegat-plugin/issues/271/extraneous-tags-which-are-not-escaped-in

I hope this helps.
Cheers, Manuel
