New (optional) serialized format for extraction and merge...

jimbo

May 13, 2022, 4:24:19 PM
to Group: okapi-devel
Okapi users primarily use XLIFF 1.2 as a "pivot" format. This file is
updated with translations, and Okapi then uses the pivot file to "merge"
the new translations back into the original format. But formats have
become much more complicated (XLIFF 2.1 is a good example). The problem
with XLIFF 1.2 as a pivot format is that much of the metadata in the
original file is lost. Tools also lose access to important information
that translators need.

As of Okapi 1.44.0, we provide a new (optional) pivot format that
retains all of the original file's metadata (or at least all the
metadata that the filters extract). We use Google Protocol Buffers
(protobuf) to model the Okapi TextUnit.
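To make the difference concrete, here is a minimal Java sketch contrasting a flat unit that keeps every extracted attribute with a fixed-schema pivot that keeps only the attributes it can express. The names `FlatTextUnit`, `PivotComparison` and `projectToPivot` are hypothetical and do not come from TextUnitFlat.proto or the Okapi API; the attribute names used below are just XLIFF 2.x examples.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical flat text unit: field names are illustrative only and are
// NOT copied from TextUnitFlat.proto.
record FlatTextUnit(String id, String source, String target, Map<String, String> properties) {
    FlatTextUnit {
        // Keep every extracted attribute, in its original order, as an immutable copy.
        properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }
}

final class PivotComparison {
    private PivotComparison() {}

    // Simulates a fixed-schema pivot: only attributes the pivot can express survive.
    static Map<String, String> projectToPivot(FlatTextUnit tu, Set<String> pivotAttributes) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tu.properties().entrySet()) {
            if (pivotAttributes.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

For example, a unit carrying `translate="yes"` and `canResegment="no"`, projected onto a pivot that only knows `translate`, loses `canResegment`, while the flat unit still has it.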

To preserve (map) XLIFF 2.x metadata, TextUnitMerger has grown
increasingly complex. The serialized format avoids all of this complexity
because the metadata we want no longer needs to be copied from the
original file. A new TextUnitMergerSerialized will be used instead, and
it is *much* simpler.
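Why the merger gets simpler can be sketched roughly like this: once each serialized unit already carries its metadata, merging reduces to matching translations to units by id and setting the target. This is a hypothetical illustration (`SerializedUnit`, `SerializedMergeSketch`, `merge`), not the actual TextUnitMergerSerialized, and it ignores inline codes and segmentation entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical serialized unit (illustrative field names, not the real schema).
record SerializedUnit(String id, String source, String target, Map<String, String> properties) {
    SerializedUnit {
        properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }

    // Returns a copy with a new target; all metadata is carried over untouched.
    SerializedUnit withTarget(String newTarget) {
        return new SerializedUnit(id, source, newTarget, properties);
    }
}

// Sketch of a merge step over serialized units; NOT the real TextUnitMergerSerialized.
final class SerializedMergeSketch {
    private SerializedMergeSketch() {}

    // Copies each translation into the unit with the same id. Metadata already
    // lives in the serialized unit, so nothing is re-read from the original file.
    static List<SerializedUnit> merge(List<SerializedUnit> units, Map<String, String> translationsById) {
        List<SerializedUnit> merged = new ArrayList<>(units.size());
        for (SerializedUnit unit : units) {
            String translation = translationsById.get(unit.id());
            // Units without a translation pass through unchanged.
            merged.add(translation == null ? unit : unit.withTarget(translation));
        }
        return merged;
    }
}
```

The point of the design is that the merge step never has to decide which original attributes to copy or how to map them; they simply travel with the unit.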

Attached are a simple XLIFF 1.2 file and a serialized file for
comparison. Note that the serialized format preserves the original
XLIFF 2.x attribute values.

TextUnitFlat.proto is the protobuf definition used to map Okapi
TextUnits to the serialized format. The current output is semi-JSON,
but this may change to a more efficient binary format.
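The trade-off between a text (JSON-like) output and a binary one can be illustrated with a small hand-rolled example. This is not protobuf's own encoding and says nothing about what Okapi's semi-JSON output actually looks like; `FlatUnit` and `EncodingSketch` are hypothetical, and the binary form simply uses length-prefixed strings via `DataOutputStream.writeUTF`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical unit; all fields are assumed non-null in this sketch.
record FlatUnit(String id, String source, String target, Map<String, String> properties) {}

final class EncodingSketch {
    private EncodingSketch() {}

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Text form: every field name is repeated in the output.
    static String toJson(FlatUnit u) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"id\":").append(quote(u.id()));
        sb.append(",\"source\":").append(quote(u.source()));
        sb.append(",\"target\":").append(quote(u.target()));
        sb.append(",\"properties\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : u.properties().entrySet()) {
            if (!first) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }

    // Binary form: fixed field order, length-prefixed strings, no field names.
    // writeUTF limits each string to 65535 encoded bytes.
    static byte[] toBinary(FlatUnit u) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(u.id());
            out.writeUTF(u.source());
            out.writeUTF(u.target());
            out.writeInt(u.properties().size());
            for (Map.Entry<String, String> e : u.properties().entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the fields back in the same order they were written.
    static FlatUnit fromBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String id = in.readUTF();
            String source = in.readUTF();
            String target = in.readUTF();
            int count = in.readInt();
            Map<String, String> props = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                props.put(in.readUTF(), in.readUTF());
            }
            return new FlatUnit(id, source, target, props);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a unit with id `1`, source `Hello`, target `Bonjour` and one property `state=translated`, the JSON form is 82 characters and the binary form is 42 bytes, mostly because the binary form does not repeat field names. A real protobuf binary encoding is different (field numbers and varint lengths) but also avoids repeating field names.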

Comments welcome.

Jim

small_many_inline_codes3.xlf.ser
small_many_inline_codes3.xlf.xliff_extracted
TextUnitFlat.proto

jimbo

May 18, 2022, 1:29:34 PM
to Group: okapi-devel
All three PRs (604, 606, and 607) are ready for review and approval.
They should be reviewed in order (see the comments). The last PR
implements a new TextUnitMerger for the serialized format. It is
simpler, but not as simple as it could be, probably because of a
TextFragment.append bug I couldn't track down. I have a workaround,
though.

https://bitbucket.org/okapiframework/okapi/pull-requests/604

https://bitbucket.org/okapiframework/okapi/pull-requests/606

https://bitbucket.org/okapiframework/okapi/pull-requests/607

Jim