XML-to-EDIFACT with German umlauts

Andreas Auras

8 Jul 2021, 09:55:41, to Smooks Users
Hi,
First, many thanks for this exciting project!

I am using Smooks to convert XML to EDIFACT and have issues with text values containing German umlaut characters.
These characters are converted to question marks in the resulting EDI stream.

This is my Smooks config:

<?xml version="1.0" encoding="UTF-8"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd"
                      xmlns:D01B="http://www.ibm.com/dfdl/edi/un/edifact/D01B">
    <core:filterSettings defaultSerialization="false" readerPoolSize="0"/>
    <edifact:unparser schemaURI="/d01b/EDIFACT-Messages.dfdl.xsd" unparseOnElement="*">
        <edifact:messageTypes>
            <edifact:messageType>ORDERS</edifact:messageType>
        </edifact:messageTypes>
    </edifact:unparser>
</smooks-resource-list>

This is my Java code:

Smooks smooks = smooksProvider.getSmooks("classpath:/META-INF/smooks/orders-export-config.xml");
StringWriter result = new StringWriter();
smooks.filterSource(new StreamSource(new StringReader(xmlOrders)), new StreamResult(result));
String ediOrders = result.toString();
// try-with-resources flushes and closes the writer once the EDI text is written
try (Writer writer = Files.newBufferedWriter(filepath, StandardCharsets.ISO_8859_1)) {
    writer.write(ediOrders);
}

This is a snippet of the input XML:

<E4440>Speicherbestellung. Bitte St&#xfc;ck genau liefern. Finalen Beleg beachten</E4440>

This is a snippet of the resulting EDI:

FTX+ZZZ+++Speicherbestellung. Bitte St?ck genau liefern. Finalen Beleg beachten'

What am I doing wrong?
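For readers unfamiliar with the symptom: in Java, a question mark in place of an umlaut is typically what you get when a character is encoded with a charset that cannot represent it. The sketch below reproduces that effect with the JDK only. The class and method names are made up for illustration, and it does not claim to show which stage inside Smooks performs the replacement.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UmlautReplacementDemo {
    // Encodes the text to bytes with the given charset and decodes it again
    // with the same charset. Characters the charset cannot represent are
    // replaced by String.getBytes with the charset's replacement byte.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "St\u00fcck"; // "Stück", written with an escape to keep the source ASCII

        // US-ASCII has no u-umlaut, so the character is lost
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // prints "St?ck"

        // ISO-8859-1 can represent it, so the round trip is lossless
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text)); // prints "true"
    }
}
```

Once the character has been replaced, writing the result with ISO_8859_1 cannot bring it back, so the loss has to be prevented at the stage where it happens.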

nick.i...@gmail.com

9 Jul 2021, 11:04:01, to Smooks Users
Hi Andreas,

It looks like the encoding for the read part is missing:

smooks.filterSource(new StreamSource(new StringReader(xmlOrders)), new StreamResult(result));

You could try it like this:

smooks.filterSource(new StreamSource(new InputStreamReader(new FileInputStream(xmlOrders), StandardCharsets.ISO_8859_1)), new StreamResult(result));

Regards,
Nick

Andreas Auras

12 Jul 2021, 10:39:48, to Smooks Users
Hi,
@nick Thank you for the quick answer.
My source XML is not in a file; it is the result of a previous template evaluation, which returns an ordinary Java string (UTF-16 internally).
As you can see in the XML snippet, the German umlauts are escaped as hexadecimal ISO-8859-1 codes, so the actual content of the string is pure ASCII.
The XML is prefixed with "<?xml version="1.0" encoding="ISO-8859-1"?>", but it seems that the XML reader Smooks uses does not take it into account.
I have now found the solution myself:
you have to set the encoding via the Smooks execution context:

ExecutionContext execContext = smooks.createExecutionContext();
execContext.setContentEncoding("ISO-8859-1");
smooks.filterSource(execContext, new StreamSource(new StringReader(xmlOrders)), new StreamResult(result));
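As background to the observation about the XML prolog: when XML is parsed from a character stream such as a StringReader, the characters are already decoded, so the declared encoding plays no part in decoding them, and a reference like &#xfc; resolves to the character U+00FC. The sketch below shows this with the JDK's DOM parser. It is not Smooks code, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharRefDemo {
    // Parses the XML from a character stream and returns the text content
    // of the root element.
    static String parseText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Pure ASCII input: the umlaut only exists as a character reference
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<E4440>Bitte St&#xfc;ck genau liefern</E4440>";

        String text = parseText(xml);

        // After parsing, the text holds the real character U+00FC
        System.out.println(text.contains("\u00fc")); // prints "true"
    }
}
```

So the umlaut is a real character in memory once the XML has been parsed, and the encoding configured for the output side then decides whether it survives serialization.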

Alexandru

26 Jul 2021, 10:14:16, to Smooks Users
Hi,
We are also having issues converting German umlaut characters. What we do not understand is why it does not work with UTF-8 encoding (the resulting XML is prefixed with <?xml version="1.0" encoding="UTF-8" standalone="yes"?>).
With UTF-8, the converter outputs ???? for the special characters "äüöß"; with ISO_8859_1, everything works fine.
Here is the block we use for converting the interchange:
val byteArrayOutputStream = ByteArrayOutputStream()
val marshaller = jaxbContext.createMarshaller()
val encoding = StandardCharsets.ISO_8859_1.toString() // if we replace this with UTF-8 it results in ????
marshaller.setProperty(Marshaller.JAXB_ENCODING, encoding)
marshaller.marshal(interchange, byteArrayOutputStream)
val stringResult = StringResult()
val execContext = smooksInstance.createExecutionContext()
execContext.contentEncoding = encoding
smooksInstance.filterSource(
    execContext,
    ByteSource(byteArrayOutputStream.toString(encoding).toByteArray()),
    stringResult
)

What are we missing?
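One way to narrow this kind of problem down is to compare the bytes actually handed to the filter with the encoding declared on the execution context. The sketch below is JDK-only, with hypothetical class and method names. It shows how the same four characters look in both encodings and what happens when bytes written in one charset are read back in another. It is a checking aid, not a diagnosis of the snippet above. Note that Kotlin's String.toByteArray() without an argument encodes as UTF-8, so the byte array in the snippet is UTF-8 whatever value `encoding` has. Whether that explains the behaviour is not established in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Renders a byte array as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when writer and reader disagree.
    static String transcode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00fc\u00f6\u00df"; // the characters from the post, as escapes

        // One byte per character in ISO-8859-1, two bytes per character in UTF-8
        System.out.println(hex(umlauts.getBytes(StandardCharsets.ISO_8859_1))); // prints "e4fcf6df"
        System.out.println(hex(umlauts.getBytes(StandardCharsets.UTF_8)));      // prints "c3a4c3bcc3b6c39f"

        // Matching charsets round-trip cleanly, mismatched ones do not
        System.out.println(transcode(umlauts, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(umlauts));      // prints "true"
        System.out.println(transcode(umlauts, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(umlauts)); // prints "false"
    }
}
```

Logging the hex form of the bytes passed to ByteSource next to the value of contentEncoding would show directly whether the two agree.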