Help >> Converting Bilingual XML to TMX and Creating a Pipeline for DITA Translation


Paulo Moreno

May 5, 2025, 3:30:30 PM
to okapi-users

Hi there,

I would like to know if there is a way to transform a bilingual XML file (ENG-ESP_BT_Updated.xml) into a TMX file using Okapi Rainbow.

Additionally, I'm interested in creating a pipeline in Rainbow that would automatically translate a DITA file (t0001_test.dita) using the TMX generated from the bilingual XML. Note that this DITA file contains some metadata that does not require translation.

Please find attached the sample files for reference.

I look forward to your guidance on this.

Thanks in advance,
- Paulo Machado

ENG-ESP_BT_Updated.xml
t0001_test.dita

Chinese translator and desktop publisher.

May 6, 2025, 2:31:08 AM
to Paulo Moreno, okapi-users

Hi Paulo Machado,
This is interesting! I have just generated a TMX (see attached) by transforming that bilingual XML with a custom XSLT. If this is what you want, I'm thinking of creating a new public repository on GitHub for the XSLT. Stay tuned! ;-)
Kindest,
Wei


From: okapi...@googlegroups.com on behalf of Paulo Moreno
Sent: Tuesday, May 6, 2025 3:30 AM
To: okapi-users
Subject: [okapi-users] Help >> Converting Bilingual XML to TMX and Creating a Pipeline for DITA Translation
--
You received this message because you are subscribed to the Google Groups "okapi-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/okapi-users/d79e557e-ec82-4082-a702-6242de92dfafn%40googlegroups.com.
ENG-ESP_BT_Updated.tmx

danielhug

May 6, 2025, 2:43:40 PM
to okapi-users
Your sample file looks like you won't need to do any segment alignment - will that always be the case?

Chase Tingley

May 6, 2025, 7:34:50 PM
to danielhug, okapi-users
Something like an XSLT would work well here. Okapi's xml/xmlstream filters don't support bilingual content, but if you wanted an Okapi-only solution, you could build a pipeline that:
- Adds the same document as both Input 1 and Input 2
- Filters Input 1 with a config that only extracts <src>, and filters Input 2 with a config that only extracts <tgt>
- Uses a pipeline of Raw Document to Filter Events > ID-based Alignment, with options to generate a TMX

I can probably provide configs for this later.
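
For readers who want to see the pairing idea outside Okapi, here is a minimal Python sketch. It is neither the Okapi pipeline outlined above nor Wei's XSLT. It assumes each translation pair is stored as a <src> and a <tgt> element under the same parent; only those two element names come from this thread. The wrapper layout, the function name bilingual_to_tmx, and the en/es language codes are assumptions. Inline tags are flattened to plain text here, so formatting is lost.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def bilingual_to_tmx(xml_text, src_lang="en", tgt_lang="es"):
    """Pair each <src> with its sibling <tgt> and return a TMX 1.4 string.

    Hypothetical helper: the input layout and language codes are assumed.
    """
    root = ET.fromstring(xml_text)
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0",
        "segtype": "sentence", "o-tmf": "unknown", "adminlang": "en",
        "srclang": src_lang, "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for parent in root.iter():
        # Only direct children are considered a pair.
        src, tgt = parent.find("src"), parent.find("tgt")
        if src is None or tgt is None:
            continue
        tu = ET.SubElement(body, "tu")
        for lang, node in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            seg = ET.SubElement(tuv, "seg")
            # itertext() drops inline tags but keeps their text content.
            seg.text = "".join(node.itertext()).strip()
    return ET.tostring(tmx, encoding="unicode")
```

Because pairing is done per parent element, no separate alignment step is needed as long as every entry carries both a <src> and a <tgt>.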


Paulo Moreno

May 6, 2025, 7:47:40 PM
to okapi-users

Hi Wei,

Thank you for your prompt response and for taking the time to test the TMX creation — I really appreciate your support.

I've reviewed the file you provided, and I noticed that the inline tags such as <u>, <i>, <b> and <xref> appear to be causing issues.
Specifically, they seem to fragment the sentences into multiple segments, which may impact both readability and alignment between source and target segments.

Would it be possible to adjust the tagging or segmentation logic to preserve sentence integrity while still retaining the formatting?
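
To illustrate what keeping a sentence in one segment while retaining formatting could look like, here is a minimal Python sketch that copies mixed content into a single TMX <seg> and wraps each inline tag in a <bpt>/<ept> pair. This is an assumption about one possible approach, not a description of how Wei's XSLT or any Okapi filter works; the helper names (build_seg, _copy_inline, _append_text) are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count
from xml.sax.saxutils import quoteattr


def _append_text(seg, text):
    """Append text after the last child of seg, or to seg.text if it has none."""
    if not text:
        return
    if len(seg):
        seg[-1].tail = (seg[-1].tail or "") + text
    else:
        seg.text = (seg.text or "") + text


def _copy_inline(node, seg, ids):
    """Copy node's mixed content into seg, turning inline tags into bpt/ept."""
    _append_text(seg, node.text)
    for child in node:
        i = str(next(ids))
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in child.attrib.items())
        bpt = ET.SubElement(seg, "bpt", i=i)
        bpt.text = f"<{child.tag}{attrs}>"   # original opening tag, kept as data
        _copy_inline(child, seg, ids)        # handles nested inline tags
        ept = ET.SubElement(seg, "ept", i=i)
        ept.text = f"</{child.tag}>"         # original closing tag, kept as data
        _append_text(seg, child.tail)


def build_seg(node):
    """Return one <seg> element for the whole content of node."""
    seg = ET.Element("seg")
    _copy_inline(node, seg, count(1))
    return seg
```

With this shape, a sentence containing <b> or <xref> stays in one <seg>, and the matching i attributes tie each opening code to its closing code.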

Looking forward to your thoughts.

Best regards,
-P.

Chase Tingley

May 6, 2025, 7:49:05 PM
to Paulo Moreno, okapi-users
Apologies Paulo, because you're a new member of the group, your reply to Wei was sent to a moderation queue, and I only just noticed.

Chinese translator and desktop publisher.

May 6, 2025, 8:32:24 PM
to Paulo Moreno, okapi-users
Hi Paulo,
I saw/see 4 pairs of segments in 4 TUs altogether:
 
Is that what you see?
Best,
Wei


From: okapi...@googlegroups.com on behalf of Paulo Moreno
Sent: Tuesday, May 6, 2025 3:55 PM
To: okapi-users
Subject: Re: [okapi-users] Help >> Converting Bilingual XML to TMX and Creating a Pipeline for DITA Translation

danielhug

May 7, 2025, 11:24:34 AM
to okapi-users
I've been converting bilingual XML files to XLIFF for years; it works flawlessly as long as there is parity between source and target TUs. The process is similar to what Chase outlined.
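
A quick way to check that parity before converting might look like the following Python sketch. It assumes the <src>/<tgt> element names mentioned earlier in the thread and that each pair shares a parent; the function name parity_report is hypothetical.

```python
import xml.etree.ElementTree as ET


def parity_report(xml_text):
    """Return (number of <src>, number of <tgt>, tags of parents missing one side)."""
    root = ET.fromstring(xml_text)
    n_src = sum(1 for _ in root.iter("src"))
    n_tgt = sum(1 for _ in root.iter("tgt"))
    # A parent is unpaired when exactly one of its <src>/<tgt> children is absent.
    unpaired = [
        p.tag for p in root.iter()
        if (p.find("src") is None) != (p.find("tgt") is None)
    ]
    return n_src, n_tgt, unpaired
```

If the two counts differ or the last list is not empty, the file needs fixing or a real alignment step before a TMX is generated from it.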