XLIFF 2.0 Validation, once again


Chase Tingley

Jun 20, 2025, 2:50:49 PM
to Group: okapi-devel

Hi all,

I've encountered some new forms of invalid XLIFF 2.0 in the wild, and I have a general question about validation.  The XLIFF 2.0 library contains a flexible validation model based on a bitmap of validation options.  In the filter, this is currently exposed as a binary flag -- "maxValidation", either true (all validation bits) or false (no validation bits).  Disabling maxValidation in the filter does things like disabling validation against the XSD.

I have a couple of questions about specific validation scenarios:
  • First, a problem we've run into is unit ids that do not conform to NMToken (they contain spaces, etc).  If maxValidation is enabled, this error is caught by schema validation.  If maxValidation is disabled, schema validation does not catch the error, but the library checks for it independently.  
  • Second, we periodically see empty XLIFF 2.0 files that contain no units.  This is invalid, although in my opinion in a fairly harmless way.
I am thinking about adding additional validation flags to lib-xliff2 to permit these constructions, which means that they would be allowed with maxValidation=false in the filter.  Does this approach sound reasonable?
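To make the proposal concrete, here is a minimal sketch of how per-check flags could sit alongside the existing all-or-nothing switch. Every name in it (ValidationSketch, UNIT_ID_NMTOKEN, REQUIRE_UNITS, fromMaxValidation) is hypothetical and is not taken from lib-xliff2, and the NMTOKEN pattern is an ASCII-only approximation of the XML definition.

```java
import java.util.regex.Pattern;

/** Hypothetical bitmask of validation options; NOT the lib-xliff2 API. */
final class ValidationSketch {
    static final int SCHEMA          = 1 << 0; // validate against the XSD
    static final int UNIT_ID_NMTOKEN = 1 << 1; // unit ids must be NMTOKENs
    static final int REQUIRE_UNITS   = 1 << 2; // a file must contain at least one unit
    static final int ALL  = SCHEMA | UNIT_ID_NMTOKEN | REQUIRE_UNITS;
    static final int NONE = 0;

    // Assumption: ASCII-only approximation of NMTOKEN (the real production
    // also allows many non-ASCII name characters).
    private static final Pattern NMTOKEN_ASCII = Pattern.compile("[A-Za-z0-9._:\\-]+");

    /** Maps the filter's binary flag onto the bitmask: all bits or none. */
    static int fromMaxValidation(boolean maxValidation) {
        return maxValidation ? ALL : NONE;
    }

    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    /** With the flag cleared, any non-empty id is accepted (spaces included). */
    static boolean unitIdAccepted(int options, String id) {
        if (id == null || id.isEmpty()) {
            return false;
        }
        return !isSet(options, UNIT_ID_NMTOKEN) || NMTOKEN_ASCII.matcher(id).matches();
    }

    /** With the flag cleared, a file with zero units is tolerated. */
    static boolean unitCountAccepted(int options, int unitCount) {
        return !isSet(options, REQUIRE_UNITS) || unitCount > 0;
    }
}
```

Under this sketch, maxValidation=false clears every bit, so an id such as "unit 1" and a file with no units would both pass, while maxValidation=true rejects both.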

Jim Hargrave

Jun 20, 2025, 4:42:18 PM
to okapi...@googlegroups.com, Chase Tingley

I don't see a problem with more options as long as they are documented as non-standard. Real world always throws us imperfect files.


yves.s...@gmail.com

Jun 20, 2025, 11:03:31 PM
to okapi...@googlegroups.com

Sounds reasonable to me.

-ys


Chase Tingley

Jun 21, 2025, 3:17:22 PM
to okapi...@googlegroups.com
Thanks, that sounds good.

The other one I forgot about is a funny corner case.  I got a file that someone was trying to make monolingual.  It declared a trgLang attribute in the header with no value (trgLang="").  Normally, this would be allowed, except that the file also contained unit <target> elements, although all of those <target> elements were empty.

Both of these things are illegal in isolation, and it's sort of irritating to fix the combination (you have to scan the whole file before figuring out if you need the trgLang declaration at the top or not!), so I may just leave this one for now.
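To illustrate why the whole file has to be scanned, here is a rough single-pass sketch using the JDK's StAX reader. It is not how lib-xliff2 handles the case; the class and method names are hypothetical, and treating whitespace-only <target> content as empty is an assumption made for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper; illustrates the scan only, not lib-xliff2 behavior. */
final class TrgLangScanSketch {
    /**
     * Returns true only when some <target> has real content while trgLang
     * is missing or empty. Empty targets alongside trgLang="" return false.
     */
    static boolean targetsRequireTrgLang(String xml) {
        boolean trgLangPresent = false;
        boolean nonEmptyTarget = false;
        int targetDepth = 0; // > 0 while the cursor is inside a <target>
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (targetDepth > 0) {
                        nonEmptyTarget = true; // inline element inside a target
                        targetDepth++;
                    } else if ("target".equals(name)) {
                        targetDepth = 1;
                    } else if ("xliff".equals(name)) {
                        String value = r.getAttributeValue(null, "trgLang");
                        trgLangPresent = value != null && !value.trim().isEmpty();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (targetDepth > 0) {
                        targetDepth--;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    // Assumption: whitespace-only text counts as empty.
                    if (targetDepth > 0 && !r.getText().trim().isEmpty()) {
                        nonEmptyTarget = true;
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return nonEmptyTarget && !trgLangPresent;
    }
}
```

The answer is only known once the last <target> has been read, which is why the decision about the trgLang declaration at the top cannot be made while streaming the header.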

Chase Tingley

Jun 21, 2025, 5:40:44 PM
to okapi...@googlegroups.com
Ahh, I think I figured out how to do it without too much trouble.  I'll open a PR.

Chase Tingley

Jun 23, 2025, 1:03:22 PM
to okapi...@googlegroups.com