Different tag style in the two OpenXML filters


Manuel Souto Pico

Nov 28, 2021, 7:56:51 PM
to Mailing list for OmegaT user support, okapi-users
Dear all,

Two years ago I translated a Word document using the default OmegaT OpenXML filter, which produced a number of problems that are not relevant to this email. Now, however, the Okapi plugin for OmegaT includes the Okapi OpenXML filter, which is supposed to address some of the issues of the default OmegaT filter.

I have now received a new version of the Word document, and I am trying it with the Okapi OpenXML filter. One problem is that translations are lost because the two filters use different tag styles.

For example, with the OmegaT filter I had:

This is <t0/>formatted<t1/> text.

Whereas with the Okapi filter I have:

This is <g1>formatted</g1> text.

Therefore there's no exact match. Is there any way I can make the Okapi filter recognize the OmegaT tags, or make it produce OmegaT-style tags?

The "adapt_tags_to_match_target.groovy" script does not work with this kind of tag difference.

Thanks a lot.
Cheers, Manuel

Manuel Souto Pico

Dec 2, 2021, 10:38:14 AM
to Mailing list for OmegaT user support, okapi-users, Mailing list for OmegaT developers.
Hi there,

Question to OmegaT developers: Would it be technically feasible to use the same tag style that the Okapi filter does (same letter, same opening/closing style, etc.)? Would that be reasonable? Could someone explain?

Cheers, Manuel

yves.s...@gmail.com

Dec 2, 2021, 10:42:19 AM
to Manuel Souto Pico, Mailing list for OmegaT user support, okapi-users, Mailing list for OmegaT developers.

Hi Manuel.

One big difference I note is that the OmegaT filter seems to use only standalone tags (with different IDs), while the Okapi one tries to use opening/closing tags (with a single ID for the element) when it can. That would probably make a conversion back to the Okapi style difficult.
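To illustrate why the way back is hard, here is a minimal sketch of a naive conversion from standalone tags to opening/closing tags. It is not code from OmegaT or Okapi. The function name `pair_standalone_tags`, the assumed tag shape (one letter, a number, then `/>`), and the rule that consecutive standalone tags form a pair are all assumptions made for this example.

```python
import re

# Assumed shape of OmegaT-style standalone tags, based on the example
# in this thread: <t0/>, <t1/>, ...
STANDALONE = re.compile(r"<([a-z])(\d+)/>")


def pair_standalone_tags(segment: str, letter: str = "g", first_id: int = 1) -> str:
    """Naively rewrite consecutive standalone tags as opening/closing pairs.

    Hypothetical helper: it assumes tags 1+2, 3+4, ... belong together,
    which is a guess and not something the tags themselves guarantee.
    """
    matches = list(STANDALONE.finditer(segment))
    if len(matches) % 2 != 0:
        # A lone tag (for example a line break) cannot be paired blindly.
        raise ValueError("odd number of standalone tags; cannot pair them blindly")

    parts = []
    pos = 0
    for i in range(0, len(matches), 2):
        opener, closer = matches[i], matches[i + 1]
        tag_id = first_id + i // 2
        parts.append(segment[pos:opener.start()])      # text before the pair
        parts.append(f"<{letter}{tag_id}>")            # opening tag
        parts.append(segment[opener.end():closer.start()])  # enclosed text
        parts.append(f"</{letter}{tag_id}>")           # closing tag
        pos = closer.end()
    parts.append(segment[pos:])                        # trailing text
    return "".join(parts)


print(pair_standalone_tags("This is <t0/>formatted<t1/> text."))
```

The sketch handles the simple example from this thread, but it has no way of knowing whether two standalone tags really enclose a span or are independent elements, and that missing information is what makes a reliable conversion difficult.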

 

Cheers,

-ys


Manuel Souto Pico

Dec 6, 2021, 8:07:36 AM
to Yves Savourel, Mailing list for OmegaT user support, okapi-users, Mailing list for OmegaT developers.
Thanks, Yves.

Yes, that's what I meant by "opening/closing style". I'd like to be optimistic: I don't read "difficult" as contradicting "technically feasible".

I'm not sure whether by conversion you mean converting, after the fact, the TMX files that OmegaT exports. What I mean is changing the behaviour of the OmegaT filter(s) so that OmegaT natively uses the same tag style as the Okapi filter(s), so that any TMX file would work in both directions. My initial email was about the OpenXML filters, but I think this would apply to any file type (e.g. XLIFF).

It seems that Okapi filters are better engineered, which is why I'm talking about OmegaT being the one that should "evolve" to get closer to the Okapi filters.

Cheers, Manuel

Manuel Souto Pico

Jan 18, 2022, 4:57:18 AM
to Yves Savourel, Mailing list for OmegaT user support, okapi-users, Mailing list for OmegaT developers.
Good morning.

Here comes a bit of a follow-up about this. The problem is not going to go away on its own... ;)

It seems there are a number of differences to overcome. Unless I'm overlooking something, they are:
- different style (opening/closing versus standalone)
- different letter displayed in the tag
- different numbering (starting at 1 or 0)
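The three differences can be separated with a small sketch like the one below. It is not part of OmegaT or Okapi. The helper names `tag_skeleton` and `strip_tags` and the assumed tag shape (lowercase letters followed by digits) are hypothetical and chosen only for this example.

```python
import re

# Assumed tag shape: optional leading "/", lowercase letter(s), digits,
# optional trailing "/". Covers <g1>, </g1> and <t0/> from this thread.
TAG = re.compile(r"<(/?)[a-z]+\d+(/?)>")


def tag_skeleton(segment: str) -> str:
    """Keep only each tag's kind (opening, closing, standalone).

    The letter and the number are dropped, so two segments that differ
    only in those two aspects get the same skeleton.
    """
    def replace(match: re.Match) -> str:
        if match.group(1):
            return "</#>"   # closing tag
        if match.group(2):
            return "<#/>"   # standalone tag
        return "<#>"        # opening tag

    return TAG.sub(replace, segment)


def strip_tags(segment: str) -> str:
    """Remove all tags, leaving only the translatable text."""
    return TAG.sub("", segment)


okapi = "This is <g1>formatted</g1> text."
omegat = "This is <t0/>formatted<t1/> text."

print(tag_skeleton(okapi))
print(tag_skeleton(omegat))
print(strip_tags(okapi) == strip_tags(omegat))
```

With the letter and the numbering normalized away, the two example segments still differ, because one uses opening/closing tags and the other standalone tags. That suggests the letter and numbering could be aligned as smaller first steps, while the style difference remains the harder part.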

Perhaps instead of addressing the whole problem and all the parts at the same time, it makes more sense to decompose the problem and tackle one thing at a time.

For example, if both filters could agree to have, say, the same first digit in the numbering (either 0 or 1 for both filters), perhaps that's a small change that can be easily implemented but it's already a first step that would be helpful.

For example, the following screenshot shows a segment to translate in an OmegaT project using the Okapi OpenXML filter and a match from a TM generated from an OmegaT project created in Okapi Rainbow. The source file is the same:

image.png

Do you think some collaboration and agreement between the OmegaT and the Okapi teams is possible in this regard?
Thanks.

Cheers, Manuel

yves.s...@gmail.com

Jan 18, 2022, 11:03:30 PM
to Manuel Souto Pico, Mailing list for OmegaT user support, okapi-users, Mailing list for OmegaT developers.

Hi Manuel, all,

I haven’t looked at the code that converts the native Okapi inline codes into OmegaT tags, but I’m guessing it’s probably possible to change some aspects of the created tags, like the letters used and maybe the numbering.

The problem is always the same: time to do it.

I’ve opened a ticket for it: https://bitbucket.org/okapiframework/omegat-plugin/issues/35

 

Cheers,

-ys

 


Manuel Souto Pico

Jun 9, 2022, 8:58:32 AM
to yves.s...@gmail.com, Mailing list for OmegaT developers., Mailing list for OmegaT user support, okapi-users
Thank you, Yves.

I left a comment in your ticket to document an inconsistency in the tag numbering between Rainbow and the Okapi plugin.

Cheers, Manuel