Why XLIFF Is Not the Right Format for Filter Extraction/Merge


Jim Hargrave

Mar 10, 2026, 3:22:05 PM
to Group: okapi-devel

As promised, here are my arguments for encouraging the use of our JSON serialization format (while we continue to support XLIFF).

1. XLIFF is an interchange format, not a processing format. XLIFF exists to exchange translations between systems, not to serve as the internal representation within a filter pipeline. We can easily generate XLIFF from the serialized format whenever it is needed.

2. XLIFF alone cannot always round-trip a file. Without heavy use of custom elements, XLIFF does not support all of our filters. The most important case is XLIFF 2: filters need to preserve inter-segment metadata (the skeleton that sits between segments), and XLIFF 2 has no element for it. Our extraction XLIFF already relies on a lot of customization; at that point, is it really XLIFF?
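To illustrate the inter-segment point, here is a minimal sketch (a hypothetical, simplified JSON event shape, not Okapi's actual classes or serialization) of a stream where skeleton parts interleave with translatable units. Because the skeleton travels with the events in document order, the original file falls out of a simple replay:

```python
import json

# Hypothetical simplified event stream: skeleton (DOCUMENT_PART) sits
# between translatable units (TEXT_UNIT), in original document order.
events_json = json.dumps([
    {"type": "DOCUMENT_PART", "skeleton": "<h1>"},
    {"type": "TEXT_UNIT", "id": "tu1", "source": "Hello"},
    {"type": "DOCUMENT_PART", "skeleton": "</h1><p>"},
    {"type": "TEXT_UNIT", "id": "tu2", "source": "World"},
    {"type": "DOCUMENT_PART", "skeleton": "</p>"},
])

def reconstruct(serialized: str) -> str:
    """Rebuild the original document by replaying the events in order."""
    parts = []
    for ev in json.loads(serialized):
        if ev["type"] == "DOCUMENT_PART":
            parts.append(ev["skeleton"])
        elif ev["type"] == "TEXT_UNIT":
            parts.append(ev["source"])
    return "".join(parts)

print(reconstruct(events_json))  # <h1>Hello</h1><p>World</p>
```

XLIFF 2 gives the inter-segment skeleton no natural home, so a round-trip through XLIFF alone has to either drop it or smuggle it through custom extensions.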

I have noticed that other filters merge successfully with the serialized format but fail with XLIFF. These are probably just bugs, but the serialized merger is also *much* simpler than the XLIFF version.
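To show roughly why the serialized merger can stay so simple (same hypothetical event model as above, a sketch rather than the real implementation): merging is just a replay that substitutes translations by unit id, with no re-parsing or re-alignment step against a second document:

```python
import json

def merge(serialized: str, translations: dict) -> str:
    """Replay the event stream, swapping in translated text by unit id.

    Untranslated units fall back to the source text, so a partial
    translation still produces a complete output document.
    """
    parts = []
    for ev in json.loads(serialized):
        if ev["type"] == "DOCUMENT_PART":
            parts.append(ev["skeleton"])
        elif ev["type"] == "TEXT_UNIT":
            parts.append(translations.get(ev["id"], ev["source"]))
    return "".join(parts)

events_json = json.dumps([
    {"type": "DOCUMENT_PART", "skeleton": "<p>"},
    {"type": "TEXT_UNIT", "id": "tu1", "source": "Hello"},
    {"type": "DOCUMENT_PART", "skeleton": "</p>"},
])

print(merge(events_json, {"tu1": "Bonjour"}))  # <p>Bonjour</p>
```

An XLIFF-based merger, by contrast, must parse the XLIFF, parse or regenerate the skeleton, and align the two, which is where the extra complexity (and the bugs) tend to live.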

3. Many companies already use serialization. Google's internal localization toolchains use Protocol Buffers serialization for their processing pipelines, not XLIFF. XLIFF is used at the boundary, to exchange with external vendors and CAT tools, not as the internal processing format. I have heard of other tools that prefer various kinds of Okapi event serialization.

Filters should extract into a detailed internal event/resource model optimized for fidelity. XLIFF is produced from that model only when you need to exchange data with external systems, and consumed back when translations return. XLIFF belongs at the boundary, not inside the engine.
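In that architecture, producing XLIFF becomes a simple projection of the internal model. A minimal sketch (hypothetical event model and a hand-rolled XLIFF 1.2-style body, not Okapi's actual writer): only the translatable units are exported, which is exactly why the skeleton has to live in the internal model rather than in the XLIFF:

```python
import json
from xml.sax.saxutils import escape

def to_xliff_body(serialized: str) -> str:
    """Project translatable units from the event stream into XLIFF
    trans-units. Skeleton parts have no XLIFF element; they stay
    behind in the internal model for merge time."""
    units = []
    for ev in json.loads(serialized):
        if ev["type"] == "TEXT_UNIT":
            units.append(
                f'<trans-unit id="{ev["id"]}">'
                f'<source>{escape(ev["source"])}</source>'
                f'</trans-unit>'
            )
    return "<body>" + "".join(units) + "</body>"

events_json = json.dumps([
    {"type": "DOCUMENT_PART", "skeleton": "<p>"},
    {"type": "TEXT_UNIT", "id": "tu1", "source": "Hello & welcome"},
    {"type": "DOCUMENT_PART", "skeleton": "</p>"},
])

print(to_xliff_body(events_json))
```

When the translated XLIFF comes back, its targets are just a translations map keyed by unit id, and merging happens against the serialized events, never against the XLIFF structure itself.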

Jim

