XML-to-EDIFACT with German umlauts

Andreas Auras

Jul 8, 2021, 9:55:41 AM
to Smooks Users
First, many thanks for this exciting project!

I am using Smooks to convert XML to EDIFACT and have a problem with text values containing German umlauts.
These characters are converted to question marks in the resulting EDI stream.

This is my Smooks config:

<?xml version="1.0" encoding="UTF-8"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"    xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd" xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd" xmlns:D01B="http://www.ibm.com/dfdl/edi/un/edifact/D01B">
    <core:filterSettings defaultSerialization="false" readerPoolSize="0"/>
    <edifact:unparser schemaURI="/d01b/EDIFACT-Messages.dfdl.xsd" unparseOnElement="*">

This is my Java code:

Smooks smooks = smooksProvider.getSmooks("classpath:/META-INF/smooks/orders-export-config.xml");
StringWriter result = new StringWriter();
smooks.filterSource(new StreamSource(new StringReader(xmlOrders)), new StreamResult(result));
String ediOrders = result.toString();
Writer writer = Files.newBufferedWriter(filepath, StandardCharsets.ISO_8859_1);

This is a snippet of the input XML:

<E4440>Speicherbestellung. Bitte St&#xfc;ck genau liefern. Finalen Beleg beachten</E4440>

This is a snippet of the resulting EDI:

FTX+ZZZ+++Speicherbestellung. Bitte St?ck genau liefern. Finalen Beleg beachten'

What am I doing wrong?
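
For reference, the symptom described above (an umlaut turning into a single question mark) is what the JDK produces when a character is encoded into a charset that cannot represent it. The sketch below is only an illustration with plain JDK classes; it does not use Smooks, and the class and method names are hypothetical. Which charset Smooks actually applied in this case is not known from the post, so US-ASCII is an assumption chosen to reproduce the effect.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautReplacementDemo {

    // Encode the text into bytes with the given charset and decode it again
    // with the same charset. Characters the charset cannot represent are
    // replaced by the charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "St\u00fcck"; // "Stück", written with an escape to keep the source ASCII-only

        // US-ASCII has no u-umlaut, so the character is lost.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));

        // ISO-8859-1 contains the u-umlaut, so the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```

If the output side of a pipeline behaves like the first call, the input can be perfectly valid and the umlaut is still lost on the way out.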


Jul 9, 2021, 11:04:01 AM
to Smooks Users
Hi Andreas,

It looks like the encoding for the read part is missing:

smooks.filterSource(new StreamSource(new StringReader(xmlOrders)), new StreamResult(result));

You could try it like this:

smooks.filterSource(new StreamSource(new InputStreamReader(new FileInputStream(xmlOrders), StandardCharsets.ISO_8859_1)), new StreamResult(result));


Andreas Auras

Jul 12, 2021, 10:39:48 AM
to Smooks Users
@nick Thank you for the quick answer.
My source XML is not in a file; it is the result of a previous template evaluation, which returns an ordinary Java string (UTF-16 internally).
As you can see in the XML snippet, the German umlauts are escaped as hexadecimal character references (the ISO-8859-1 code values), so the actual content
of the string is pure ASCII.
The XML is prefixed by "<?xml version="1.0" encoding="ISO-8859-1"?>", but it seems that the XML reader Smooks uses does not care about it.
I have now found the solution myself.
You have to set the encoding via the Smooks execution context:

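ExecutionContext execContext = smooks.createExecutionContext();
// ISO-8859-1 matches the XML declaration and the output writer shown earlier
execContext.setContentEncoding("ISO-8859-1");
smooks.filterSource(execContext, new StreamSource(new StringReader(xmlOrders)), new StreamResult(result));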
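
The point about the string being pure ASCII can be checked without Smooks. The sketch below uses only the JDK's built-in XML parser; the class and method names are hypothetical. It parses a document that arrives as a character stream and contains `&#xfc;`. Since the input is already characters rather than bytes, the `encoding` attribute in the declaration has nothing left to decode, and the character reference is resolved to the umlaut by the parser itself. This suggests, though it does not prove, that the loss happens later, when the text is written out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefDemo {

    // Parse an XML document supplied as a Java string and return the text
    // content of its root element.
    static String parseRootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The source string contains only ASCII characters.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<E4440>Bitte St&#xfc;ck genau liefern</E4440>";

        String text = parseRootText(xml);

        // The character reference has been resolved to U+00FC by the parser.
        System.out.println(text.equals("Bitte St\u00fcck genau liefern"));
    }
}
```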


Jul 26, 2021, 10:14:16 AM
to Smooks Users
We are also having issues converting the German umlaut characters. What we do not understand is why it does not work with UTF-8 encoding (the resulting XML is prefixed with <?xml version="1.0" encoding="UTF-8" standalone="yes"?>).
With UTF-8, the converter outputs ???? for the special characters "äüöß". If we use the ISO_8859_1 encoding instead, everything works fine.
Here is the block we use for converting the Interchange:
val byteArrayOutputStream = ByteArrayOutputStream()
val marshaller = jaxbContext.createMarshaller()
val encoding = StandardCharsets.ISO_8859_1.toString() // if we replace this with UTF-8 it results in ????
marshaller.setProperty(Marshaller.JAXB_ENCODING, encoding)
marshaller.marshal(interchange, byteArrayOutputStream)
val stringResult = StringResult()
val execContext = smooksInstance.createExecutionContext()
execContext.contentEncoding = encoding

What are we missing?
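
One way to narrow this down is to look at the shape of the damage. The sketch below uses plain JDK classes only, with hypothetical class and method names, and contrasts two failure modes. UTF-8 bytes that are decoded with the wrong charset produce two garbage characters per umlaut, while encoding into a charset that lacks the character produces exactly one `?` per umlaut. Four question marks for "äüöß" match the second pattern, which hints that the characters were read correctly and lost when written. US-ASCII is again only an assumed stand-in; the charset Smooks uses internally here is not stated in the thread.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDiagnosis {

    // Failure mode 1: bytes written as UTF-8 are read back as ISO-8859-1.
    // Each umlaut occupies two bytes in UTF-8, so it becomes two characters.
    static String decodedWithWrongCharset(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Failure mode 2: the text is encoded into a charset that cannot
    // represent it. Each umlaut becomes a single '?'.
    static String encodedToUnsupportedCharset(String text) {
        return new String(text.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00fc\u00f6\u00df"; // the four characters from the post

        System.out.println(decodedWithWrongCharset(umlauts).length());
        System.out.println(encodedToUnsupportedCharset(umlauts));
    }
}
```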