Change in mrk processing in XLIFF 1.2 filter?


yves.s...@gmail.com

Sep 10, 2022, 12:30:51 AM
to okapi...@googlegroups.com

Hi,

 

There is a change in 1.44 XLIFFFilter that affects the way we get <mrk> elements.

 

For example in 1.43 for this:

 

Texte avec <mrk mtype="x-sdl-location" mid="someID5"/>location.

 

We were getting an inline code with type="mrk" and no annotation.

Now we get an inline code with type="_annotation_" and an inline annotation.

 

The change comes from those new lines:

https://bitbucket.org/okapiframework/okapi/src/dev/okapi/filters/xliff/src/main/java/net/sf/okapi/filters/xliff/XLIFFITSFilterExtension.java#lines-237

 

Is there a reason for this compatibility-breaking change?

The default is to add a custom annotation that basically stores the mtype of the marker, which does not really add any useful information.

 

It seems there is also a side effect with the change.

Some mrk elements processed like this end up as isolated tags in the segment, rather than the expected opening/closing pairs (as they are with 1.43).

I still have to dig into that part to see why.

But I was wondering if there was a reason for the root cause first.

 

Thanks,

-yves

 

jimbo

Sep 10, 2022, 12:35:21 PM
to okapi...@googlegroups.com, yves.s...@gmail.com

I'll look into this asap. I made these changes a while back to address some bugs we were finding, but I don't remember the details now. I'll try to find the specific files I was testing against, so that if we make changes we can make sure those files still pass.

Jim

jimbo

Sep 10, 2022, 1:16:38 PM
to okapi...@googlegroups.com, yves.s...@gmail.com

Ok, this change was made to add support for Custom mrk elements in xliff 2 (with the ability to extract these to xliff 1.2 and preserve them).

Here's the commit comment: "various fixes to merger code (TextUnitMerger) additional support for custom mrk elements in xliff2 and xliff 1.2"

Basically, when we parse xliff 2, if we see a mrk element with a non-standard type we turn it into a GenericAnnotationType.CUSTOM_TYPE. When we write xliff 1.2 we preserve this information using an ITS extension. On merge this allows us to better match and update the mrk elements between the original xliff 2 and the translated xliff 1.2.

The new code "if (!"seg".equals(mtype))" basically says: if it's not a segment, treat it as an inline annotation. That way we are consistent and can match apples to apples when we merge.
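
As a rough illustration of that check (not the actual XLIFFITSFilterExtension code), the decision could be sketched as follows. The class name MrkTypeSketch, the Handling enum, and the classify method are hypothetical and exist only for this example; only the "seg" comparison comes from the message above.

```java
// Hypothetical sketch of the mtype decision; not the Okapi implementation.
class MrkTypeSketch {

    // How a <mrk> element would be handled in this simplified model.
    enum Handling { SEGMENT, INLINE_ANNOTATION }

    // Mirrors the quoted condition: anything that is not mtype="seg"
    // is treated as an inline annotation.
    static Handling classify(String mtype) {
        // "seg".equals(mtype) is null-safe, so a missing mtype
        // also falls into the inline-annotation branch.
        if (!"seg".equals(mtype)) {
            return Handling.INLINE_ANNOTATION;
        }
        return Handling.SEGMENT;
    }

    public static void main(String[] args) {
        System.out.println(classify("seg"));
        System.out.println(classify("x-sdl-location"));
    }
}
```

With this shape, a custom type such as "x-sdl-location" from the first message lands in the inline-annotation branch, which matches the behavior change reported for 1.44.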


>> It seems there is also a side effect with the change.

Now that would be a bug and could have something to do with the new annotation matching logic in TextUnitMerger.

Note that the new serialized format doesn't need any of these tricks!!

I'm open to making any changes needed as long as we can keep the xliff 2 -> xliff 1.2 custom mrk support.

Jim

jimbo

Sep 10, 2022, 1:29:01 PM
to okapi...@googlegroups.com, yves.s...@gmail.com

One reason I wanted to consistently add the annotation type to mrk codes is that during "TextFragmentUtil.synchronizeCodeIds" we treat them differently from normal inline codes. We don't expect to align these between the source and target, and the target can have any number of added or missing annotation codes. The idea is that these can be added in the translation.

Ugly stuff for sure - maybe the bug you noted is caused by treating these mrk codes specially now.
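
To make the "treated differently" idea concrete, here is a minimal sketch of a source/target comparison that skips annotation-only codes. It is not the logic of TextFragmentUtil.synchronizeCodeIds; the AnnotationTolerantCheck and InlineCode classes and the idsMissingInTarget method are hypothetical, and the "_annotation_" constant is assumed to be the type string mentioned earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the Okapi implementation.
class AnnotationTolerantCheck {

    // Assumed type string for annotation-only codes.
    static final String TYPE_ANNOTATION = "_annotation_";

    // Minimal stand-in for an inline code: just an id and a type.
    static class InlineCode {
        final int id;
        final String type;

        InlineCode(int id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    // Returns the ids of source codes that have no counterpart in the target.
    // Annotation-only codes are ignored on both sides, so the target may add
    // or drop any number of them without being reported.
    static List<Integer> idsMissingInTarget(List<InlineCode> source, List<InlineCode> target) {
        Set<Integer> targetIds = new HashSet<>();
        for (InlineCode c : target) {
            if (!TYPE_ANNOTATION.equals(c.type)) {
                targetIds.add(c.id);
            }
        }
        List<Integer> missing = new ArrayList<>();
        for (InlineCode c : source) {
            if (TYPE_ANNOTATION.equals(c.type)) {
                continue; // annotations are not expected to align
            }
            if (!targetIds.contains(c.id)) {
                missing.add(c.id);
            }
        }
        return missing;
    }
}
```

The point of the sketch is only that a code's type decides whether it takes part in alignment at all, which is why giving mrk codes the annotation type consistently matters.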

Jim

jimbo

Sep 10, 2022, 3:07:32 PM
to okapi...@googlegroups.com, yves.s...@gmail.com

Removing the code in question I get a failure on this file. So at least we have something to debug with:

[ERROR] Errors:
[ERROR]   RoundTripXliffIT.xliffFiles:81->BaseRoundTripIT.realTestFiles:98->EventRoundTripIT.runTest:91 » OkapiTest lqiTest.xlf
[ERROR]   RoundTripXliffIT.xliffSerialized:102->BaseRoundTripIT.realTestFiles:98->EventRoundTripIT.runTest:91 » OkapiTest lqiTest.xlf

yves.s...@gmail.com

Sep 11, 2022, 12:12:38 AM
to jimbo, okapi...@googlegroups.com

Hi Jim,

 

I’m fine with keeping the change and the custom annotation, especially since it helps for 2.x.

The change to handle that on the filter caller side is minor.

 

But it’d be great to have the markers in the extracted coded-text set to OPENING/CLOSING rather than ISOLATED.

The attached file should trigger such behavior in the first target segment.

I’ll try to understand why it does that too.

 

Cheers,

-yves

test-comments.docx.sdlxliff

Yves

Sep 11, 2022, 2:29:54 AM
to okapi-devel
It seems the bug is that the closing tag of the mrk stays as 'mrk' for some of the elements that get changed to '_annotation_', so the two tags cannot match when we balance codes later on.
Still looking...

Yves

Sep 11, 2022, 3:01:57 AM
to okapi-devel
I think I've found it :)
When we fix up the closing tag we do this:

                        if (( n = annIds.indexOf(id)) != -1 ) {
                            annIds.remove(n);
                            Code oc = current.getCode(current.getIndex(id));
                            GenericAnnotations.addAnnotations(code, oc.getGenericAnnotations());
                            code.setType(Code.TYPE_ANNOTATION_ONLY);
                        }
The problem is that current.getCode() calls balanceMarkers(), and we do this before we change the type, so the markers are balanced while the closing code's type has not yet been updated.
If we call setType() first, things seem to be OK.

                        if (( n = annIds.indexOf(id)) != -1 ) {
                            annIds.remove(n);
                            code.setType(Code.TYPE_ANNOTATION_ONLY);
                            Code oc = current.getCode(current.getIndex(id));
                            GenericAnnotations.addAnnotations(code, oc.getGenericAnnotations());
                        }

I'll try to run the full build and the integration tests after this change (not a given on Windows it seems).
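
For readers who want to see the ordering problem in isolation, here is a self-contained sketch. It does not use the Okapi classes: BalanceOrderDemo, Fragment, Code, TagType and closingTagAfterFixup are all hypothetical stand-ins, and the balancing rule (pair an opening and a closing code only when id and type both match, and only rebalance when the fragment is marked dirty) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a getter that balances codes as a side effect.
class BalanceOrderDemo {

    enum TagType { OPENING, CLOSING, ISOLATED }

    static class Code {
        final int id;
        TagType tagType;
        String type;

        Code(int id, TagType tagType, String type) {
            this.id = id;
            this.tagType = tagType;
            this.type = type;
        }
    }

    static class Fragment {
        private final List<Code> codes = new ArrayList<>();
        private boolean balanced = true;

        void append(Code c) {
            codes.add(c);
            balanced = false; // appending marks the fragment as needing a rebalance
        }

        // Getter with a side effect: balances the codes on first access.
        Code getCode(int index) {
            if (!balanced) {
                balance();
            }
            return codes.get(index);
        }

        // Pairs each closing code with an earlier unmatched opening code of the
        // same id and type; anything left unpaired becomes ISOLATED.
        private void balance() {
            boolean[] matched = new boolean[codes.size()];
            for (int i = 0; i < codes.size(); i++) {
                Code close = codes.get(i);
                if (close.tagType != TagType.CLOSING) {
                    continue;
                }
                for (int j = i - 1; j >= 0; j--) {
                    Code open = codes.get(j);
                    if (!matched[j] && open.tagType == TagType.OPENING
                            && open.id == close.id && open.type.equals(close.type)) {
                        matched[j] = true;
                        matched[i] = true;
                        break;
                    }
                }
            }
            for (int i = 0; i < codes.size(); i++) {
                if (!matched[i]) {
                    codes.get(i).tagType = TagType.ISOLATED;
                }
            }
            balanced = true;
        }
    }

    // Reproduces the two orderings from the message above and returns the
    // resulting tag type of the closing code.
    static TagType closingTagAfterFixup(boolean setTypeFirst) {
        Fragment current = new Fragment();
        current.append(new Code(1, TagType.OPENING, "_annotation_"));
        Code closing = new Code(1, TagType.CLOSING, "mrk");
        current.append(closing);
        if (setTypeFirst) {
            closing.type = "_annotation_"; // fix the type, then look up the opening code
            current.getCode(0);
        } else {
            current.getCode(0);            // balances while the types still differ
            closing.type = "_annotation_"; // too late: the code is already ISOLATED
        }
        return closing.tagType;
    }

    public static void main(String[] args) {
        System.out.println("getCode first: " + closingTagAfterFixup(false));
        System.out.println("setType first: " + closingTagAfterFixup(true));
    }
}
```

In this model, closingTagAfterFixup(false) returns ISOLATED and closingTagAfterFixup(true) returns CLOSING, which is the same symptom and the same fix as in the two snippets above.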

jimbo

Sep 12, 2022, 12:00:54 PM
to okapi...@googlegroups.com, Yves

Oh good, I'm glad it was something small. I was worried there for a bit :-)

I hate that getCode has side effects. I've looked over the balanceCodes several times to see if I could clean this up - but have never come up with a better solution.

I approved the PR with one comment.

thanks Yves!

Jim
