Should this be fixed?

Jim Hargrave

Jun 2, 2025, 10:08:51 PM
to Group: okapi-devel

I merged https://gitlab.com/okapiframework/Okapi/-/merge_requests/912

However, when we extract and merge the original test file, we get the error below. I think it is unrelated to Denis's PR. The failure occurs when we compare the code types, but I don't think type is ever used for merging, so this looks like an artifact of our stricter tests. What should we do with this: try to fix it as a bug, or ignore it?


net.sf.okapi.roundtrip.integration.RoundTripOpenXmlIT,debug2

[INFO] n.s.o.r.i.RoundTripOpenXmlIT - charts.pptx
[ERROR] n.s.o.c.filters.FilterTestDriver - Code type difference: x-fonts:+mn-lt,+mn-ea,+mn-cs; vs x-bold;fonts:游ゴシック,+mn-cs;
[ERROR] n.s.o.c.filters.FilterTestDriver - Fragment difference
[ERROR] n.s.o.c.filters.FilterTestDriver - TextContainer difference
[ERROR] n.s.o.c.filters.FilterTestDriver - Text unit difference, tu id=P400AC5C5-tu96
[ERROR] n.s.o.r.i.RoundTripOpenXmlIT - Failing test: charts.pptx
Compare Events: charts.pptx

net.sf.okapi.common.integration.OkapiTestException: charts.pptx

charts.pptx

okf_o...@chart.fprm

Caused by: java.lang.AssertionError: Compare Events: charts.pptx
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.assertTrue(Assert.java:42)
    at net.sf.okapi.common.integration.EventRoundTripIT.runTest(EventRoundTripIT.java:110)
    ... 29 more

Process finished with exit code 255


yves.s...@gmail.com

Jun 3, 2025, 9:42:25 AM
to okapi...@googlegroups.com

It’s strange that the type value changes like this.

I guess there was some auto-change of the font? That seems bizarre during a round-trip test (which would not involve a tool like Trados that may do font mapping).

 

-ys

Jim Hargrave

Jun 3, 2025, 11:20:52 AM
to okapi...@googlegroups.com, yves.s...@gmail.com

It must be the merger changing the font info, maybe based on the target locale? But with Unicode fonts now more common, do we really need to do this?

Interestingly, when code simplification is applied there are no errors. That makes it work for our use case, where we always apply code simplification for OpenXML.

Jim

Chase Tingley

Jun 3, 2025, 12:40:22 PM
to okapi...@googlegroups.com, yves.s...@gmail.com
Word has composite fonts, so you can bundle multiple fonts together in a single "logical" font, with the actual font selected from the bundle based on things like script. My suspicion is that with the font changes, the codes the filter produces now contain the metadata for a different font from within the same composite bundle, maybe based on target locale as you suggested.

If you've got a clean testcase I'm sure Denis would have a look.

Jim Hargrave

Jun 3, 2025, 12:43:27 PM
to okapi...@googlegroups.com, Chase Tingley, yves.s...@gmail.com

This issue has the original file, along with some files that Denis created to test chart text extraction:

https://gitlab.com/okapiframework/Okapi/-/issues/1405

Jim

Jim Hargrave

Jun 4, 2025, 11:11:56 AM
to okapi...@googlegroups.com, Chase Tingley, yves.s...@gmail.com

Fortunately the merge code fails gracefully on the type mismatch and assumes the codes are the same based only on the ID match. It's a less accurate match, but it seems to work. We do so much that can possibly change IDs that it worries me a bit, but there is not much we can do without a huge code review, etc. I did this a few years ago and finally punted with this layered code. For merge, maybe we should *only* match on ID, which would speed things up. But remember that this code is also used for other tasks, like bilingual code alignment.

/*
    strictSearch must always match ID!!!
 */
static int strictSearch(Code tc, List<Code> codes, CodeMatches codeMatches, CodeComparatorOnIsolated cmpTagTypeWithIsolated) {
    // Most accurate match. Use case: source and target codes are exactly the same.
    int fromIndex = findMatch(tc, codes, codeMatches.getFromMatches(), CMP_ID, CMP_TAG_TYPE, CMP_DATA, CMP_TYPE);
    if (fromIndex == -1) {
        // Same as above, but ignore the code type.
        fromIndex = findMatch(tc, codes, codeMatches.getFromMatches(), CMP_ID, CMP_TAG_TYPE, CMP_DATA);
        if (fromIndex == -1) {
            // Mostly cases where simplified codes were used (<x1/>, <g1>, </g1>, etc.). We assume the IDs match.
            // If a file merge fails STRICT search, the code IDs must have gotten misaligned, and this should be fixed.
            fromIndex = findMatch(tc, codes, codeMatches.getFromMatches(), CMP_ID, cmpTagTypeWithIsolated);
        }
    }

    return fromIndex;
}
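
To make the layering easier to follow outside the Okapi code base, here is a minimal, self-contained sketch of the same fall-through idea. It is not the Okapi implementation: `SimpleCode`, `findFirst`, and `layeredSearch` are hypothetical stand-ins, the third tier uses a plain tag-type comparison instead of `cmpTagTypeWithIsolated`, and it does not track already-matched codes the way `codeMatches` does. The tag-type and data values are invented; the two type strings are the ones from the log above, used only as two differing type values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

public class LayeredCodeMatchSketch {

    /** Hypothetical stand-in for an inline code; NOT Okapi's Code class. */
    static final class SimpleCode {
        final int id;
        final String tagType;
        final String data;
        final String type;

        SimpleCode(int id, String tagType, String data, String type) {
            this.id = id;
            this.tagType = tagType;
            this.data = data;
            this.type = type;
        }
    }

    // Assumed comparison criteria, loosely mirroring CMP_ID, CMP_TAG_TYPE, CMP_DATA, CMP_TYPE.
    static final BiPredicate<SimpleCode, SimpleCode> SAME_ID = (a, b) -> a.id == b.id;
    static final BiPredicate<SimpleCode, SimpleCode> SAME_TAG_TYPE = (a, b) -> Objects.equals(a.tagType, b.tagType);
    static final BiPredicate<SimpleCode, SimpleCode> SAME_DATA = (a, b) -> Objects.equals(a.data, b.data);
    static final BiPredicate<SimpleCode, SimpleCode> SAME_TYPE = (a, b) -> Objects.equals(a.type, b.type);

    /** Returns the index of the first candidate that satisfies every test, or -1 if none does. */
    @SafeVarargs
    static int findFirst(SimpleCode wanted, List<SimpleCode> candidates,
                         BiPredicate<SimpleCode, SimpleCode>... tests) {
        for (int i = 0; i < candidates.size(); i++) {
            boolean allPass = true;
            for (BiPredicate<SimpleCode, SimpleCode> test : tests) {
                if (!test.test(wanted, candidates.get(i))) {
                    allPass = false;
                    break;
                }
            }
            if (allPass) {
                return i;
            }
        }
        return -1;
    }

    /** Tries the strictest criteria first and relaxes them step by step; ID is required at every tier. */
    static int layeredSearch(SimpleCode wanted, List<SimpleCode> candidates) {
        int index = findFirst(wanted, candidates, SAME_ID, SAME_TAG_TYPE, SAME_DATA, SAME_TYPE);
        if (index == -1) {
            // Tier 2: ignore the type.
            index = findFirst(wanted, candidates, SAME_ID, SAME_TAG_TYPE, SAME_DATA);
        }
        if (index == -1) {
            // Tier 3: ignore the data as well; only ID and tag type remain.
            index = findFirst(wanted, candidates, SAME_ID, SAME_TAG_TYPE);
        }
        return index;
    }

    public static void main(String[] args) {
        // Invented codes; only the "type" strings come from the log in this thread.
        List<SimpleCode> candidates = Arrays.asList(
                new SimpleCode(1, "OPENING", "<r1>", "x-bold;"),
                new SimpleCode(2, "OPENING", "<r2>", "x-fonts:+mn-lt,+mn-ea,+mn-cs;"));
        SimpleCode wanted = new SimpleCode(2, "OPENING", "<r2>", "x-bold;fonts:游ゴシック,+mn-cs;");

        // Tier 1 fails on the type difference; tier 2 matches the second candidate.
        System.out.println(layeredSearch(wanted, candidates)); // prints 1
    }
}
```

In this sketch a code whose type has changed still resolves to the candidate with the same ID, which is the behavior described above: the type mismatch only costs one extra pass. Matching on ID alone for merge would amount to skipping the first two tiers.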


