filters maintaining set id's is broken with TextFragment.append etc...



Nov 17, 2021, 6:11:57 PM
to Group: okapi-devel
A heads up that several filters (primarily xliff) make heavy use of
TextFragment.append. By default, append rewrites code ids (basically
renumbering them starting at 1). Normally this wasn't an issue, since
TextUnitMerger was matching primarily on data. The problem is that
targets may have reordered codes, which results in different ids when
they are generated out of context with the source. The xliff filter
has also been updated to carefully match codes between source and target
(using the original ids in the xliff or generating them based on data).

I am adding new variants of several methods (append, joinAll,
createJoinedContent etc.) that take a boolean keepCodeIds parameter. The
default behavior is kept (keepCodeIds=false), but if a filter uses these
methods, or any other code that depends on keeping the ids as-is, you
will need to use the new methods (keepCodeIds=true).
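
To illustrate the difference, here is a minimal sketch of the two behaviors. The classes below (CodeIdSketch, SketchCode, SketchFragment) are hypothetical stand-ins written for this example, not the Okapi classes, and the renumbering rule is simplified to "next id = number of codes so far + 1".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Okapi API.
class CodeIdSketch {

    static final class SketchCode {
        final int id;
        final String data;

        SketchCode(int id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    static final class SketchFragment {
        private final List<SketchCode> codes = new ArrayList<>();

        // keepCodeIds=false: renumber sequentially from 1, in append order.
        // keepCodeIds=true: keep whatever id the code already carries.
        void append(SketchCode code, boolean keepCodeIds) {
            int id = keepCodeIds ? code.id : codes.size() + 1;
            codes.add(new SketchCode(id, code.data));
        }

        // Returns the id of the first code with the given data, or -1.
        int idOf(String data) {
            for (SketchCode c : codes) {
                if (c.data.equals(data)) {
                    return c.id;
                }
            }
            return -1;
        }
    }

    static SketchFragment build(boolean keepCodeIds, SketchCode... codes) {
        SketchFragment fragment = new SketchFragment();
        for (SketchCode c : codes) {
            fragment.append(c, keepCodeIds);
        }
        return fragment;
    }

    public static void main(String[] args) {
        SketchCode bold = new SketchCode(1, "<b/>");
        SketchCode br = new SketchCode(2, "<br/>");

        // The target lists the same codes in the opposite order.
        SketchFragment renumbered = build(false, br, bold);
        SketchFragment kept = build(true, br, bold);

        System.out.println("source id of <br/>: " + br.id);
        System.out.println("target id, renumbered: " + renumbered.idOf("<br/>"));
        System.out.println("target id, kept: " + kept.idOf("<br/>"));
    }
}
```

With renumbering, the reordered target gives the <br/> code a different id than it has in the source; with keepCodeIds=true the ids stay aligned.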



Nov 19, 2021, 1:12:11 PM
to Group: okapi-devel
We have all kinds of code that wants to automatically rewrite/renumber code ids, so I would have to modify more code than I originally thought. In reality, TextFragment methods like insert and append are designed for monolingual cases and for cases where code ids are automatically generated. Trying to use that same code in the xliff filter is problematic because we need to keep source and target ids aligned.

We could continue to match on code data/TagType as we have done before. But that is problematic because sometimes the data is empty or duplicated - then we run the risk of matching the wrong code.
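
A small sketch of why data-based matching is risky when data is duplicated. The class and method names (CodeMatchSketch, matchByData, matchById) are hypothetical and written only for this example; this is not how TextUnitMerger is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT the Okapi matching logic.
class CodeMatchSketch {

    static final class SketchCode {
        final int id;
        final String data;

        SketchCode(int id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    // Index of the first source code whose data equals the target's data, or -1.
    static int matchByData(List<SketchCode> source, SketchCode target) {
        for (int i = 0; i < source.size(); i++) {
            if (source.get(i).data.equals(target.data)) {
                return i;
            }
        }
        return -1;
    }

    // Index of the source code with the same id as the target, or -1.
    static int matchById(List<SketchCode> source, SketchCode target) {
        for (int i = 0; i < source.size(); i++) {
            if (source.get(i).id == target.id) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two source codes with identical data but different ids.
        List<SketchCode> source = new ArrayList<>();
        source.add(new SketchCode(1, "<br/>"));
        source.add(new SketchCode(2, "<br/>"));

        // The target code corresponds to the second source code.
        SketchCode target = new SketchCode(2, "<br/>");

        System.out.println("matched by data: index " + matchByData(source, target));
        System.out.println("matched by id:   index " + matchById(source, target));
    }
}
```

Matching on data picks the first <br/> even though the target code corresponds to the second one; matching on id is unambiguous, provided the ids were kept aligned in the first place.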

My personal preference is that we get this right, even if it means significant changes. Filters *must* create consistent code ids that are aligned across source and target. Filters and Writers also *must* maintain original ids (Code.originalId, if observed, makes this work).
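
As a rough idea of what "aligned" could mean in a test or sanity check, here is a sketch that reports ids present on only one side. CodeIdAlignmentSketch and unalignedIds are hypothetical names for this example, and the check assumes the target is expected to carry exactly the same set of ids as the source, which will not hold where a translation legitimately adds or drops codes.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; NOT part of Okapi.
class CodeIdAlignmentSketch {

    // Returns the ids that appear in only one of the two sides.
    // Order within each side does not matter, so reordered codes still align.
    static Set<Integer> unalignedIds(int[] sourceIds, int[] targetIds) {
        Set<Integer> source = new TreeSet<>();
        for (int id : sourceIds) {
            source.add(id);
        }
        Set<Integer> target = new TreeSet<>();
        for (int id : targetIds) {
            target.add(id);
        }

        Set<Integer> common = new TreeSet<>(source);
        common.retainAll(target);

        Set<Integer> unaligned = new TreeSet<>(source);
        unaligned.addAll(target);
        unaligned.removeAll(common);
        return unaligned;
    }

    public static void main(String[] args) {
        // Reordered but aligned: nothing is reported.
        System.out.println(unalignedIds(new int[] {1, 2}, new int[] {2, 1}));
        // Id 2 exists only in the source, id 3 only in the target.
        System.out.println(unalignedIds(new int[] {1, 2}, new int[] {1, 3}));
    }
}
```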

The code changes would mostly be updating dozens of methods to use boolean keepCodeIds. Maybe it's not that bad, but I wanted to get general approval before we change core classes.

