Extracting bilingual Excel to translated XLIFF

Manuel Souto Pico

Jul 6, 2021, 7:26:20 PM
to okapi-users
Hi there, 

Is there any way in Rainbow or Tikal to extract two columns (source and target) from a spreadsheet and create a translated/bilingual XLIFF file from them (first column to source, second column to target)?

The created XLIFF (or OmegaT project) would be for a revision task, not translation.

I had a go with okf_table_src-tab-trg but I wasn't very successful.

Thanks in advance. 

Cheers, Manuel

Chase Tingley

Jul 6, 2021, 7:43:07 PM
to Manuel Souto Pico, okapi-users
If you convert to CSV, it can be done with the table/CSV filter, although its configuration is somewhat hard to work with.

Manuel Souto Pico

Jul 6, 2021, 8:11:31 PM
to Chase Tingley, okapi-users
Hi Chase, 

I forgot to mention that I have already saved my spreadsheet as TSV.

You seem to confirm this is feasible. What is the exact name of the filter that should do the trick? 

Cheers, Manuel

Chase Tingley

Jul 6, 2021, 8:21:40 PM
to Manuel Souto Pico, okapi-users
You'll want to create a new config using either okf_table_csv or okf_table_tsv as a base.

I've attached a sample 3-column CSV (an id column, English source, French target), along with a custom configuration that parses it:

 tikal.sh -fc okf_...@bilingual-csv.fprm -sl en -tl fr -x test.csv

A lot of the complexity is in the "Columns" tab of the filter config UI for this filter.  Here's how my configuration looked:

[Attachment: Screenshot from 2021-07-06 17-20-38.png]

Manuel Souto Pico

Jul 7, 2021, 6:16:54 AM
to Chase Tingley, okapi-users
Thank you so much, Chase. 

I can obtain the expected results with your csv and fprm files. However, it does not work when I try to adapt the config to the structure of my spreadsheet.

I ran a simple test to adapt your config. I added a new first column, which shifts all the data (id, source, target) one position to the right, so I should now extract columns 2, 3 and 4 instead of columns 1, 2 and 3 as you were doing. I modified the config accordingly, in particular the rows:


which in your config are


(I understand that 'SourceRefs' refers to 'sourceColumns')

However, with these changes I get the source text in both the source and the target: 

<group id="3" restype="row">
<trans-unit id="1" resname="string1">
<source xml:lang="en">Hello</source>
<target xml:lang="fr">Hello</target>

Am I doing something wrong? Or does it only work if there's nothing to the left of the first column? My modified files are attached.
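
For anyone wanting to check an extracted file for this symptom without reading it by eye, here is a minimal Python sketch that lists the trans-units whose target text is identical to the source text. It assumes the output is XLIFF 1.2 using the standard `urn:oasis:names:tc:xliff:document:1.2` namespace; the function name `units_with_identical_target` is made up for illustration and is not part of Okapi.

```python
import xml.etree.ElementTree as ET

# Assumption: the extracted file is XLIFF 1.2 with the standard namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def units_with_identical_target(xliff_text):
    """Return the ids of trans-units whose target text equals the source text."""
    root = ET.fromstring(xliff_text)
    suspicious = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        target = unit.find(f"{{{XLIFF_NS}}}target")
        if source is None or target is None:
            continue  # nothing to compare for this unit
        # itertext() flattens any inline elements so only the text is compared
        if "".join(source.itertext()) == "".join(target.itertext()):
            suspicious.append(unit.get("id"))
    return suspicious
```

A few identical pairs can be legitimate (proper names, numbers), but if every unit is reported, the target column is probably not being read.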

I could use that approach if I removed the leading columns in my spreadsheet so that the first extracted column is column A. However, that is inconvenient for the post-processing step that merges the text back.
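
As an illustration of that workaround only, here is a minimal Python sketch that copies selected columns of a TSV into a new TSV, so that the id, source and target end up in columns 1 to 3. The helper name `keep_columns` and the 1-based column list are assumptions for the example; it does nothing to solve the merge-back problem mentioned above.

```python
import csv
import io


def keep_columns(tsv_text, columns):
    """Return TSV text containing only the given 1-based columns, in that order."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # A row that is too short yields an empty cell instead of an IndexError
        writer.writerow([row[c - 1] if c - 1 < len(row) else "" for c in columns])
    return out.getvalue()


# Example: drop an extra first column and keep id, source, target
trimmed = keep_columns("extra\tstring1\tHello\tBonjour\n", [2, 3, 4])
```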

Thanks a lot.
Cheers, Manuel

Manuel Souto Pico

Jul 8, 2021, 3:49:39 PM
to Chase Tingley, okapi-users
Hi there, 

Would it help if I created a ticket to report this issue?

Cheers, Manuel

Chase Tingley

Jul 8, 2021, 8:45:33 PM
to Manuel Souto Pico, okapi-users
Hi Manuel,

Attached is a config that works for your file. The reason it wasn't working for you seems to be unrelated to your changes:

$ diff okf_...@bilingual-csv-1.fprm okf_...@manuel-csv.fprm
< detectColumnsMode.i=1
< numColumns.i=3
> detectColumnsMode.i=0
> numColumns.i=1

These values correspond to the "Number of Columns" setting in the UI. In the original file I sent you, I had it set to mode 1, which is "Defined by Column Names"; in my fixed version, it's set to "Defined by Column Values".

I am actually not sure whether "Defined by Column Names" is working correctly. I would expect it to ignore the "numColumns.i=3" value, but it seems that it may have been using it for some reason.
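
When comparing two filter configurations like this, a key-by-key comparison can be easier to read than a line diff. The following is a minimal Python sketch, assuming an .fprm file is plain text with one `key=value` pair per line and `#`-prefixed header or comment lines; `parse_fprm` and `diff_params` are hypothetical helper names, not Okapi tools.

```python
def parse_fprm(text):
    """Parse key=value lines into a dict, skipping blank and '#' lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split only on the first '='
        params[key] = value
    return params


def diff_params(old_text, new_text):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    old, new = parse_fprm(old_text), parse_fprm(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }
```

A key present in only one of the two files shows up with `None` on the other side.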
