Converting Xliff to java properties file

blues8

Feb 11, 2011, 9:34:20 AM
to okapi-devel
Hi all,
I just started to use the Okapi Framework. I wrote some simple test
programs. I am able to use a pipeline to read a Java properties
file, do a kind of pseudo-localization, and write the result as an XLIFF
file.

Now I am trying to read the XLIFF document and write it as a properties
file.
Here is the code that I am using:

// Add the writer step to the pipeline
FilterEventsWriterStep pw = new FilterEventsWriterStep();
IFilter filterw = new PropertiesFilter();
IFilterWriter writer = filterw.createFilterWriter();

pw.setFilterWriter(writer);
driver.addStep(pw);

But the output file is still XLIFF.
I did some research in the SVN repository and I see that
PropertiesFilter.createFilterWriter() simply returns a generic
writer.
Question: What am I doing wrong?
Question: Do I need to write my own PropertiesWriter class?

Any help is welcome

many thanks in advance
Regards
Reinhold aka blues8 (short form for "blues -aid")

Yves Savourel

Feb 11, 2011, 10:56:10 AM
to okapi...@googlegroups.com
Hello Reinhold,

> But the output file is still XLIFF.
> I did some research in the SVN repository and I see that
> PropertiesFilter.createFilterWriter() simply returns a generic
> writer.
> Question: What am I doing wrong?
> Question: Do I need to write my own PropertiesWriter class?

You could write your own PropertiesWriter class, but as far as I understand your goals, I don't think it would help.


An IFilter does two things:

- it allows you to extract text from the input file and put it into common resources (TextUnit, etc.)
- it provides a way to create an IFilterWriter that can be used to rewrite the extracted resources back into the original file format.


=== The basics:

Forget the pipeline for a moment:
Here is how you can extract text and write it back:

// Create the filter and the writer
IFilter filter = new PropertiesFilter();
IFilterWriter writer = filter.createFilterWriter();

// Open the input
RawDocument rd = new RawDocument(new File("myFile.properties").toURI(), "UTF-8", LocaleId.ENGLISH);
filter.open(rd);

// Prepare the output
writer.setOptions(LocaleId.FRENCH, "UTF-8");
writer.setOutput("out.txt");

// Read the input
while ( filter.hasNext() ) {
    Event event = filter.next();

    // And write it back
    writer.handleEvent(event);
}

// Close the files
filter.close();
writer.close();

Then you can access the extracted resources from the event and modify them before writing them back.

For example the following code changes the text to uppercase:

// Read the input
while ( filter.hasNext() ) {
    Event event = filter.next();

    if ( event.isTextUnit() ) {
        // Get the text unit
        TextUnit tu = event.getTextUnit();
        // Create the container for the target text, copying the source
        TextContainer tc = tu.createTarget(LocaleId.FRENCH, true, IResource.COPY_ALL);
        // Get the first (and only) fragment in the target container
        TextFragment tf = tc.getFirstContent();
        // Change the text to uppercase
        tf.setCodedText(tf.getCodedText().toUpperCase());
    }

    // And write it back
    writer.handleEvent(event);
}


=== The pipeline and the steps:

Now, you can extend the principle to a more modular process: the pipeline.
Each step does a specific task.
The following pipeline reads the input and rewrites it:

// Create the pipeline driver
IPipelineDriver pdriver = new PipelineDriver();

// Make sure it can use the properties filter
FilterConfigurationMapper fcMapper = new FilterConfigurationMapper();
fcMapper.addConfigurations(PropertiesFilter.class.getName());
pdriver.setFilterConfigurationMapper(fcMapper);

// Add the step that does the extraction
pdriver.addStep(new RawDocumentToFilterEventsStep());

// Add the step that does the writing back into the original format
pdriver.addStep(new FilterEventsToRawDocumentStep());

// Create the batch item to process
// (source and target locales are English and French, as in the examples above)
String inputPath = "myFile.properties";
String outputPath = inputPath.replace("myFile.", "myOutput.");
pdriver.addBatchItem(
    new BatchItemContext(
        new File(inputPath).toURI(), "UTF-8", "okf_properties",
        new File(outputPath).toURI(), "UTF-8", LocaleId.ENGLISH, LocaleId.FRENCH));

// Run the batch process
pdriver.processBatch();

You can insert any step between those two and make changes to the text there (like with a pseudo-translation step).


=== XLIFF

Now, if you extract to XLIFF and then merge the XLIFF back into the original format, you need the original file in addition to the XLIFF document, because (as you saw above) rewriting to the original format requires reading the original file.

So the pipeline that merges back the translated XLIFF document into the original file format will need to read two files: a) the original file, and b) the XLIFF document. Each time you get a TextUnit from the original file you can seek that same text unit in the XLIFF document, substitute the target of the resource extracted from the original file with the target text read from the XLIFF document, and write the original event.

You don't use a step for reading the XLIFF document. You would likely do this by directly using the XLIFF Filter from within your merging step. The merging task should be a step itself that handles the text units received from the original file.

You can look at the recent merging steps here: http://tinyurl.com/4gtbh74
And the Merger class here: http://tinyurl.com/4amjvcm
to get an idea how that could be done.
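
As a rough illustration of the merge described above, here is a minimal sketch in plain Java. It deliberately does not use the Okapi API: the class and method names (XliffMergeSketch, readTargets, merge) are made up for this example, and it assumes that each trans-unit carries the properties key in its resname attribute. It also assumes simple `key = value` lines only (no `:` separators, no line continuations) and does not escape the translated text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: not part of Okapi, names are illustrative only.
class XliffMergeSketch {

    /** Collects resname -> target text for every trans-unit that has a target. */
    static Map<String, String> readTargets(String xliff) {
        Map<String, String> targets = new LinkedHashMap<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xliff)));
            NodeList units = doc.getElementsByTagName("trans-unit");
            for (int i = 0; i < units.getLength(); i++) {
                Element unit = (Element) units.item(i);
                NodeList target = unit.getElementsByTagName("target");
                if (target.getLength() > 0) {
                    // Assumption: resname holds the properties key
                    targets.put(unit.getAttribute("resname"), target.item(0).getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XLIFF input", e);
        }
        return targets;
    }

    /** Rewrites the original file line by line, swapping in translations where they exist. */
    static String merge(String original, Map<String, String> targets) {
        List<String> out = new ArrayList<>();
        for (String line : original.split("\n", -1)) {
            out.add(mergeLine(line, targets));
        }
        return String.join("\n", out);
    }

    static String mergeLine(String line, Map<String, String> targets) {
        String trimmed = line.trim();
        int eq = line.indexOf('=');
        // Comments, blank lines and anything that is not an entry pass through untouched
        if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("!") || eq < 0) {
            return line;
        }
        String translation = targets.get(line.substring(0, eq).trim());
        if (translation == null) {
            return line; // no translation available: keep the original entry
        }
        // Keep the key, the separator and the spaces after it exactly as they were
        int valueStart = eq + 1;
        while (valueStart < line.length() && line.charAt(valueStart) == ' ') {
            valueStart++;
        }
        return line.substring(0, valueStart) + translation;
    }

    public static void main(String[] args) {
        String original = "# Greeting shown on start-up\n"
            + "greeting = Hello\n"
            + "\n"
            + "farewell = Goodbye\n";
        String xliff = "<xliff version=\"1.2\"><file><body>"
            + "<trans-unit id=\"1\" resname=\"greeting\">"
            + "<source>Hello</source><target>Bonjour</target></trans-unit>"
            + "<trans-unit id=\"2\" resname=\"farewell\">"
            + "<source>Goodbye</source></trans-unit>"
            + "</body></file></xliff>";
        System.out.print(merge(original, readTargets(xliff)));
    }
}
```

The original file plays the role of the skeleton here: everything that is not a translated value (comments, blank lines, spacing, untranslated entries) is copied through unchanged, which is the reason the merge needs both files.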

I hope this helps.
-yves


Yves

Feb 11, 2011, 11:32:02 AM
to okapi-devel
Re-reading your question, I realize that I may not have answered it
properly.

Yes, the filter writer used by the PropertiesFilter is the
GenericFilterWriter. That's because, as with most other filters,
the filter writer relies on the data taken from the original file (the
skeleton) to re-create the output. So, for example, it does not write
out the properties keys by using the resname values of the extracted
text, but by using whatever is in the skeleton for the given text
unit.

As you noted, you would have to develop your own PropertiesWriter that
implements IFilterWriter and use it with FilterEventsWriterStep.

It's pretty simple: since a properties file is a basic format, you
just have to implement the handlers for START_DOCUMENT, TEXT_UNIT and
END_DOCUMENT.
See the POFilterWriter in the PO filter package for an example.

But keep in mind that with this mechanism (a
RawDocumentToFilterEventsStep followed by a FilterEventsWriterStep) you
lose any part of the original document that is not extracted
(white space between entries, possibly comments, etc.).
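
To make the idea concrete, here is a minimal sketch of such a writer in plain Java. It is not the Okapi IFilterWriter interface: the Kind enum and the Ev class are hypothetical stand-ins for Okapi's events, and the name field plays the role of resname. A real implementation would implement IFilterWriter and write to the configured output instead of a string buffer.

```java
// Hypothetical sketch: Kind and Ev are simplified stand-ins, not Okapi classes.
class PropertiesWriterSketch {

    enum Kind { START_DOCUMENT, TEXT_UNIT, END_DOCUMENT, OTHER }

    static final class Ev {
        final Kind kind;
        final String name; // plays the role of resname (the properties key)
        final String text; // target text, only used for TEXT_UNIT

        Ev(Kind kind, String name, String text) {
            this.kind = kind;
            this.name = name;
            this.text = text;
        }
    }

    private final StringBuilder out = new StringBuilder();

    void handleEvent(Ev ev) {
        switch (ev.kind) {
        case START_DOCUMENT:
            out.setLength(0); // start a fresh output
            break;
        case TEXT_UNIT:
            out.append(escape(ev.name, true)).append('=')
               .append(escape(ev.text, false)).append('\n');
            break;
        case END_DOCUMENT:
            break; // a real writer would flush and close its output here
        default:
            break; // everything else (comments, white space) is dropped
        }
    }

    String result() {
        return out.toString();
    }

    /** Escapes backslashes, line breaks, separators in keys, and non-ASCII characters. */
    static String escape(String s, boolean isKey) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                sb.append("\\\\");
            } else if (c == '\n') {
                sb.append("\\n");
            } else if (isKey && (c == '=' || c == ':')) {
                sb.append('\\').append(c);
            } else if (c == ' ' && (isKey || i == 0)) {
                sb.append("\\ "); // spaces in keys and leading spaces in values
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c)); // Unicode escape
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PropertiesWriterSketch writer = new PropertiesWriterSketch();
        writer.handleEvent(new Ev(Kind.START_DOCUMENT, null, null));
        writer.handleEvent(new Ev(Kind.OTHER, null, "# a comment from the original file"));
        writer.handleEvent(new Ev(Kind.TEXT_UNIT, "greeting", "Bonjour"));
        writer.handleEvent(new Ev(Kind.END_DOCUMENT, null, null));
        System.out.print(writer.result());
    }
}
```

Note how the OTHER event in main produces no output: that is the loss of non-extracted content mentioned above.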

-ys

blues8

Feb 14, 2011, 2:47:56 AM
to okapi-devel
Hi Yves,
many thanks for your replies.

Both replies make perfect sense to me.
After reading your latest reply I think you are right. I need a kind
of Merger class like you said:
- read the original properties file
- read the XLIFF input and
- merge both files into a new properties file.
I am going to have a closer look at the Merger class (thanks for both
URLs in your first reply, they look promising).


Quick question(s):
Do you know where I could find some sample code about merging files?

Again
Many thanks for your quick reply and your valuable input.

- Reinhold (aka blues8)

Yves Savourel

Feb 14, 2011, 6:38:13 AM
to okapi...@googlegroups.com
> Quick question(s):
> Do you know where I could find some sample code about merging files?

Here are some possible places to look:

Rainbow's merger: http://tinyurl.com/4svh8kc

The adjustTargetCodes() method in TextUnitUtil.java could be useful as well: http://tinyurl.com/6884wvv

TextUnitMerger in the xliffkit package: http://tinyurl.com/6gnnpgb

Tikal's XLIFF merger: http://tinyurl.com/64zfbld


We're slowly trying to re-use/unify those different parts, but they all do more or less the same thing: get the translations of a set of resources coming from some sort of translation format, and set them as the translations of another set of resources that is being used to re-create files in their original format.

Hope this helps,
-ys


Jim

Feb 14, 2011, 12:55:00 PM
to okapi...@googlegroups.com
Let me dig around on our side - we do something like this for a custom tkit format we use

Cheers,

Jim Hargrave

blues8

Feb 16, 2011, 9:46:45 AM
to okapi-devel
Many thanks for your support.
In the meantime I will do some research regarding the Merger class.

Cheers
Reinhold