Creating one trans-unit for each segment in XLIFF


Manuel Souto Pico

Jun 8, 2021, 6:59:59 AM
to okapi-users
Hi there,

Would it be possible to create XLIFF files in Rainbow where each segment is in its own trans-unit node?

For example, for the text "Foo. Bar.", I would like to have

    <trans-unit id="tu1">
        <source xml:lang="en">Foo.</source>
        <target xml:lang="nl"/>
    </trans-unit>
    <trans-unit id="tu2">
        <source xml:lang="en">Bar.</source>
        <target xml:lang="nl"/>
    </trans-unit>

instead of

    <trans-unit id="tu1" restype="x-paragraph">
        <source xml:lang="en">Foo. Bar.</source>
        <seg-source><mrk mid="0" mtype="seg">Foo.</mrk><mrk mid="1" mtype="seg">Bar.</mrk></seg-source>
        <target xml:lang="nl"/>
    </trans-unit>

Is that possible in Rainbow or Tikal?

Thanks a lot in advance.
Cheers, Manuel

Yves Savourel

Jun 8, 2021, 9:54:02 AM
to okapi-users

I believe the Segments to Text Units Converter step does this.

But I’m not sure how you would merge back such a flow of events if you need to.

One would probably have to create another step to put things back together.

-ys


Manuel Souto Pico

Jun 9, 2021, 1:59:47 PM
to Yves Savourel, okapi-users
Hi Yves,

Indeed, the Segments to Text Units Converter step does the trick but, as you mention, I need to merge the translation back. The segments corresponding to the two sentences in my paragraph have the IDs tu2:1 and tu2:2.

I have tried adding the Desegmentation step to the middle of the merge pipeline, but that's probably meant for something else.

I get these errors:

ERROR: De-synchronized files: translated TU id='tu2:1', Original TU id='tu2'.
ERROR: No corresponding text unit for id='tu2:2' in the original file.

The merge should be possible, since the necessary information to reconstruct the text blocks is in the IDs. Do you have any other suggestions?
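To illustrate what I mean by the information being in the IDs, here is a minimal sketch in plain Java, not an Okapi step and not using any Okapi API. It assumes the split IDs always have the form "<parent id>:<segment number>" (as in tu2:1 and tu2:2), that parent IDs never contain a colon themselves, and that joining the segments with a single space is acceptable, which loses the original inter-segment whitespace. The class and method names (SegmentRegrouper, regroup) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SegmentRegrouper {

    // Assumption: a split id looks like "<parentId>:<n>", e.g. "tu2:1" -> "tu2".
    static String parentId(String id) {
        int i = id.lastIndexOf(':');
        return i < 0 ? id : id.substring(0, i);
    }

    // Segment position taken from the numeric suffix; ids without a suffix get 0.
    static int segmentIndex(String id) {
        int i = id.lastIndexOf(':');
        return i < 0 ? 0 : Integer.parseInt(id.substring(i + 1));
    }

    // Groups translated segments by parent id and joins them in segment order.
    static Map<String, String> regroup(Map<String, String> targetsById) {
        Map<String, TreeMap<Integer, String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : targetsById.entrySet()) {
            grouped.computeIfAbsent(parentId(e.getKey()), k -> new TreeMap<>())
                   .put(segmentIndex(e.getKey()), e.getValue());
        }
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Integer, String>> e : grouped.entrySet()) {
            // Assumption: a single space is a good enough separator between segments.
            merged.put(e.getKey(), String.join(" ", e.getValue().values()));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put("tu2:2", "Bar.");
        targets.put("tu2:1", "Foo.");
        System.out.println(regroup(targets));
    }
}
```

The regrouped map is keyed by the original text unit ID (tu2), so in principle its entries could be matched against the original file again, but wiring that into the merge pipeline would still need a custom step.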

Thanks.

Cheers, Manuel



Yves Savourel

Jun 9, 2021, 2:15:55 PM
to Manuel Souto Pico, okapi-users

Yes, I think there is currently no step that puts the several TUs back into a single one (or maybe one exists but is not documented, though I doubt it).

The “exploding” step was likely made for a purpose other than merging back.

So, currently, I’m afraid there is no out-of-the-box solution for this type of processing.

Manuel Souto Pico

Jun 9, 2021, 2:45:43 PM
to Yves Savourel, okapi-users
Ok, thanks for your feedback, Yves.
Cheers, Manuel

jim

Jun 10, 2021, 10:49:34 AM
to Manuel Souto Pico, Yves Savourel, okapi-users
Sorry for coming in late. There is an option to turn off TU ID checking in the merge (if that is the problem). It's kinda hidden in SkeletonMergerWriter, which is used by OriginalDocumentXliffMergerStep.

ID checking is on by default, but here are the options:

setThrowCodeException(false); // compare tu inline codes
setThrowSegmentIdException(true); // compare tu id's
setThrowSegmentSourceException(false); // compare tu source

Jim
