Creating one trans-unit for each segment in XLIFF


Manuel Souto Pico

Jun 8, 2021, 6:59:59 AM
to okapi-users
Hi there,

Would it be possible to create XLIFF files in Rainbow where each segment is in its own trans-unit node?

For example, for text "Foo. Bar.", I would like to have 

    <trans-unit id="tu1">
        <source xml:lang="en">Foo.</source>
        <target xml:lang="nl"/>
    </trans-unit>
    <trans-unit id="tu2">
        <source xml:lang="en">Bar.</source>
        <target xml:lang="nl"/>
    </trans-unit>

instead of

    <trans-unit id="tu1" restype="x-paragraph">
        <source xml:lang="en">Foo. Bar.</source>
        <seg-source><mrk mid="0" mtype="seg">Foo.</mrk><mrk mid="1" mtype="seg">Bar.</mrk></seg-source>
        <target xml:lang="nl"/>
    </trans-unit>

Is that possible in Rainbow or Tikal?

Thanks a lot in advance.
Cheers, Manuel

Yves Savourel

Jun 8, 2021, 9:54:02 AM
to okapi-users

I believe the Segments to Text Units Converter step does this.

But I’m not sure how you would merge back such a flow of events if you need to.

One would probably have to create another step to put things back together.

 

-ys


Manuel Souto Pico

Jun 9, 2021, 1:59:47 PM
to Yves Savourel, okapi-users
Hi Yves,

Indeed, the Segments to Text Units Converter does the trick but, as you mention, I need to merge the translation back. The segments corresponding to the two sentences in my paragraph have the IDs tu2:1 and tu2:2.

I have tried adding a Desegmentation step in the middle of the merge pipeline, but that's probably meant for something else.

I get these errors:

ERROR: De-synchronized files: translated TU id='tu2:1', Original TU id='tu2'.
ERROR: No corresponding text unit for id='tu2:2' in the original file.

The merge should be possible, since the necessary information to reconstruct the text blocks is in the IDs. Do you have any other suggestions?
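To illustrate what "the information is in the IDs" could mean in practice, here is a minimal sketch in plain Java. It is not Okapi code and does not use any Okapi API: the class name SplitUnitRejoiner and the method rejoin are hypothetical. It assumes that the part of the ID before the last colon is the original text unit ID, that the segments arrive in document order, and that a single space is an acceptable separator between segments (the original inter-segment whitespace is not recoverable from the IDs alone).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Okapi: regroups split text units by parent id. */
final class SplitUnitRejoiner {

    /**
     * Groups translated segments by the part of the id before the last ':'
     * (e.g. "tu2:1" and "tu2:2" both map to "tu2") and joins them in input order.
     * Assumption: one space between segments; ids without ':' are kept as-is.
     */
    static Map<String, String> rejoin(Map<String, String> translatedById) {
        Map<String, StringBuilder> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : translatedById.entrySet()) {
            String id = entry.getKey();
            int colon = id.lastIndexOf(':');
            String parentId = colon < 0 ? id : id.substring(0, colon);
            StringBuilder text = merged.computeIfAbsent(parentId, k -> new StringBuilder());
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(entry.getValue());
        }
        Map<String, String> result = new LinkedHashMap<>();
        merged.forEach((parentId, text) -> result.put(parentId, text.toString()));
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, standing in for document order.
        Map<String, String> translated = new LinkedHashMap<>();
        translated.put("tu1", "Title");
        translated.put("tu2:1", "Foo.");
        translated.put("tu2:2", "Bar.");
        System.out.println(rejoin(translated)); // prints {tu1=Title, tu2=Foo. Bar.}
    }
}
```

A real merge would still have to feed the regrouped text back into the original text units and deal with inline codes, which this sketch ignores.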

Thanks.

Cheers, Manuel



Yves Savourel

Jun 9, 2021, 2:15:55 PM
to Manuel Souto Pico, okapi-users

Yes, I think there is currently no step that puts the several TUs back into a single one (or maybe one exists but is not documented, though I doubt it).

The “exploding” step was likely made for a purpose other than merging back.

So, currently, I’m afraid there is no out-of-the-box solution for this type of processing.

Manuel Souto Pico

Jun 9, 2021, 2:45:43 PM
to Yves Savourel, okapi-users
Ok, thanks for your feedback, Yves.
Cheers, Manuel

jim

Jun 10, 2021, 10:49:34 AM
to Manuel Souto Pico, Yves Savourel, okapi-users
Sorry for coming in late. There is an option to turn off TU id checking in the merge (if that is the problem). It's kind of hidden in SkeletonMergerWriter, which is used by OriginalDocumentXliffMergerStep.

Id checking is on by default, but here are the options:


setThrowCodeException(false); // compare tu inline codes
setThrowSegmentIdException(true); // compare tu id's
setThrowSegmentSourceException(false); // compare tu source

Jim

Manuel Souto Pico

Mar 21, 2022, 4:40:53 PM
to jim, Yves Savourel, okapi-users
Thank you for your feedback, Jim.

I guess that to turn off that option I would need to compile the code myself. I'm not familiar with Maven artifacts. Is there a page that explains how to proceed?

In any case, shall I create a ticket for this? I had a quick look and I haven't seen any that seems relevant.

Cheers, Manuel

jim

Mar 23, 2022, 9:50:30 AM
to Manuel Souto Pico, okapi-users

https://bitbucket.org/okapiframework/omegat-plugin/src/dev/

Marco - I'm afraid I won't have time to continue to help. But we have discussed this in the Okapi group and we recommend the following:

  1. Get a Bitbucket account and create your user; everyone who accesses the code will need one. Send us your usernames and we will give each of you full rights to this repo.
  2. Place *all* OmegaT issues/tickets here: OmegaT Issues. You should move any OmegaT-specific issues from the Okapi issue list to here as well (we will delete them later). That way everything is self-contained and the chance of a ticket being overlooked is greatly reduced. We will give you full control to edit the issues.
  3. Learn how to pull the code from the repository, make changes, and test (I suggest using the IntelliJ Community IDE). We will give you or anyone else permission to update the code. Everything is self-contained and will build "out of the box". Everything is pre-configured.

The responsibility to push this forward is now on the shoulders of the OmegaT users. I think you will find, after an initial steep learning curve, that this gives you much more flexibility, and you (and others with more development experience) will be able to make any changes or fixes you need in a timely manner. Spartan (https://www.spartansoftwareinc.com/2016/01/18/how-spartan-uses-okapi-to-accelerate-custom-worldserver-development/) is a consulting firm that can help. I would also post to okapi-dev. There are many experienced developers who may be willing to contract with you.

Jim
