Dear all,
to add to Axel's question: we are trying to do the following with Okapi Longhorn:
Step 1 and 2 work.
Step 3 does not work. You always get back the txt file in UTF-8.
The reason is that the manifest file that Longhorn creates for the UTF-16LE file contains this:
inputEncoding="UTF-16LE" targetEncoding="UTF-8"
regardless of what we send to Longhorn as params in the pipeline or in the fprm file, both of which are packaged in the bconf.
There seems to be no way to influence the targetEncoding that Longhorn writes to the manifest.
If we manually edit the manifest to set targetEncoding="UTF-16LE" and then convert the XLIFF back to txt with Longhorn, we get a UTF-16LE txt file, as we want.
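A minimal sketch of that manual manifest edit, done in Python. It assumes only what is shown above: that the manifest is XML text and that inputEncoding and targetEncoding appear as attributes of the same tag. The function name is made up for illustration, and reading and writing the actual manifest file is left out because its name and location are not given here.

```python
import re

# A start tag that carries an inputEncoding attribute (captures its value).
_TAG_WITH_INPUT = re.compile(r'<[^<>]*\binputEncoding="([^"]*)"[^<>]*>')
# The targetEncoding attribute inside such a tag.
_TARGET_ATTR = re.compile(r'\btargetEncoding="[^"]*"')


def align_target_encoding(manifest_xml):
    """Copy the inputEncoding value into targetEncoding for every tag that has both.

    Hypothetical helper: it works on the raw text and knows nothing else
    about the manifest format.
    """

    def fix_tag(match):
        tag = match.group(0)
        input_encoding = match.group(1)
        # A function replacement avoids any backslash processing of the value.
        return _TARGET_ATTR.sub(
            lambda _: f'targetEncoding="{input_encoding}"', tag
        )

    return _TAG_WITH_INPUT.sub(fix_tag, manifest_xml)


sample = '<doc inputEncoding="UTF-16LE" targetEncoding="UTF-8"/>'
print(align_target_encoding(sample))
# prints: <doc inputEncoding="UTF-16LE" targetEncoding="UTF-16LE"/>
```

Tags without an inputEncoding attribute are left untouched, so the rest of the manifest passes through unchanged.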
Is there any way to make Longhorn set targetEncoding in the manifest to something other than UTF-8?
Why does Longhorn not use the encoding it detected for the source file as targetEncoding (it correctly detects that the encoding is UTF-16LE)?
Is that a gap in the current Longhorn implementation? With Rainbow you can achieve this if you set the encoding as I did in this screenshot:

Thank you very much in advance for any help!!!
best
Marc
--
You received this message because you are subscribed to the Google Groups "okapi-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/okapi-users/319d72f7-db6e-45c9-ba2f-22a38be7833an%40googlegroups.com.
Hi Chase,
thank you very much for your answer!
Since you write "just oversight", it sounds like a bug, right?
What would be the expected way to send the output encoding to Longhorn, if that worked?
In Rainbow, the output encoding setting seems not to be part of the pipeline or the filter settings, but part of the document properties, as shown in my screenshot below.
So it does not surprise me that putting it in the pipeline or in the fprm has no effect.
I am just asking what the right fix would be here. We would then ask someone to implement it and open a PR.
My impression is that, since it seems not to be part of the filter settings or the pipeline, the right approach would be to pass it as separate REST params, in addition to how the source and target language codes are passed.
Would that be the right way?
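To make the proposal concrete, here is a rough sketch of what such a request URL could look like. The targetEncoding query parameter is purely hypothetical and does not exist in Longhorn today. The execute route with the two language codes in the path, the base URL, and the function name are assumptions for illustration only.

```python
from urllib.parse import quote, urlencode


def build_execute_url(base_url, project_id, source_lang, target_lang,
                      target_encoding=None):
    """Build an execute URL with an optional, hypothetical targetEncoding param."""
    # Assumed route layout: .../projects/{id}/tasks/execute/{source}/{target}
    parts = ("projects", project_id, "tasks", "execute", source_lang, target_lang)
    path = "/".join(quote(str(part), safe="") for part in parts)
    url = f"{base_url.rstrip('/')}/{path}"
    if target_encoding:
        # Hypothetical parameter: this is the proposed addition, not an existing API.
        url += "?" + urlencode({"targetEncoding": target_encoding})
    return url


print(build_execute_url("http://localhost:8080/okapi-longhorn", 1,
                        "en-US", "de-DE", "UTF-16LE"))
# prints: http://localhost:8080/okapi-longhorn/projects/1/tasks/execute/en-US/de-DE?targetEncoding=UTF-16LE
```

Leaving target_encoding unset yields the URL without the extra parameter, so existing callers would not need to change.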
best
Marc
Hi Chase,
thank you very much for your answer!
Would there also be a reasonable way to add it to the pipeline as a param somehow?
That would be easier for us to implement, given the way we integrate Okapi.
But I guess that is not how Okapi "thinks" in this regard?
Or would it make sense to add it to RawDocumentToFiltersEvent as params (one for the input and one for the output encoding)?
best
Marc
Hi Chase,
yes, that is the direction I'm thinking of.
Claude already suggested that this would work ;-)
Which, of course, it does not.
But this is what it suggested as part of the .pln file:
<step class="net.sf.okapi.steps.common.FilterEventsToRawDocumentStep">
<param name="encoding">UTF-16LE</param>
<param name="targetEncoding">UTF-16LE</param>
</step>

best
Marc
Hi Chase,
thank you for your answer!
The more I learn about how it works and should work, the more I have the feeling that the main problem is the following bug:
Could you and the Okapi dev team agree to this?
For us, that would fix the problem at the moment. We would then contribute the fix.
I do not see us needing to set the output encoding to something different from the input encoding in the foreseeable future. Therefore I would go with the basic bugfix, which in my eyes would make sense anyway, even if it were possible to pass the output encoding to Longhorn somehow.
If you agree, I think there is no need for us to take part in the dev meeting this afternoon.
best
Marc