Using codefinder inside of cdata content with xml stream filter and okf_html subfilter


Marc Mittag

Jun 12, 2024, 2:40:04 PM
to okapi-users
Dear all,

I use the following settings for the XML stream filter with an Okapi 1.47
snapshot (I get the same result with a 1.44 snapshot, so it does not seem
to be version related).

I would like to use the codefinder to protect certain things in the content
already processed by the okf_html subfilter.

Yet the codefinder seems to be ignored.

The file is an XLIFF with HTML inside CDATA in the target (yes, I know,
who would create things like that? Guess what: Typo3 does).

Is that a bug? Or am I missing something?

Please find attached an example XLIFF file that should be converted.

This is the content for the xml stream filter config:

assumeWellformed: true
preserve_whitespace: false
exclude_by_default: true
global_cdata_subfilter: okf_html
useCodeFinder: true
codeFinderRules: |-
  #v1
  count.i=1
  rule0=t3
attributes:
  xml:lang:
    ruleTypes: [ATTRIBUTE_WRITABLE]
  xml:id:
    ruleTypes: [ATTRIBUTE_ID]
  id:
    ruleTypes: [ATTRIBUTE_ID]
  xml:space:
    ruleTypes: [ATTRIBUTE_PRESERVE_WHITESPACE]
    preserve: ['xml:space', EQUALS, preserve]
    default: ['xml:space', EQUALS, default]
elements:
  target:
    ruleTypes: [TEXTUNIT]


best

Marc
test.xliff

Álvaro Mira del Amo

Jun 18, 2024, 5:28:54 PM
to okapi-users
Hi Marc,

If you are using a subfilter (like okf_html), have you tried defining the code finder in the subfilter that processes the text units your filter generates?

Chase Tingley

Jun 18, 2024, 6:59:34 PM
to Álvaro Mira del Amo, okapi-users
Subfilters aren't compatible with the codefinder (more generally: if a subfilter is in use, the parent filter can't also generate codes in the same TU), because Okapi doesn't have a way of labelling whether a given code originated with the parent filter or the subfilter. This creates problems on merge. As a result, if a subfilter is configured, the codefinder settings are ignored.

So, Álvaro's suggestion is the only way that works.  The codefinder needs to be configured in the innermost filter.
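
For the configuration in the original post, a minimal sketch of that change could look like the following. It assumes a custom HTML filter configuration with the hypothetical ID okf_html@t3 that the XML stream filter is able to resolve, and it reuses the code finder keys from the original post unchanged. Whether those keys behave the same way in the HTML filter should be checked against the Okapi version in use.

```yaml
# File 1: parent XML stream filter configuration (excerpt).
# The code finder keys are dropped here, since they are ignored
# as soon as a subfilter is configured.
global_cdata_subfilter: okf_html@t3   # hypothetical custom configuration ID

---
# File 2: custom HTML filter configuration okf_html@t3 (excerpt, hypothetical).
# Start from a copy of the default okf_html configuration and add
# the code finder settings to it.
useCodeFinder: true
codeFinderRules: |-
  #v1
  count.i=1
  rule0=t3
```

The two parts are separate files; they are shown together only to make clear which setting moves where.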
