Hello Claude,
many thanks for your answers.
At the moment we are running fine with the Java-based approach. I also enabled support for multiple EDIFACT versions, including slicing per document type, by subclassing EdifactDataProcessorFactory and adapting materialiseEntrySchema to my needs. We may re-introduce XML config and profile support if we need to support further transformation types.
This issue was fixed a while back and will be rolled out in RC2, which should go out in the next few days.
Already updated, works like a charm now.
I noticed, though, that no Java bindings are available for the four new schema versions (d20a, d20b, d21a, d21b). Is this intended, or were they just overlooked?
Is there any documentation on how to read the debug output of Smooks? It contains bits of information such as bitPosition, childIndex, foundDelimiter, etc., but I have no idea how they can help me.
This is probably produced by Apache Daffodil. Could you post an example?
Here is a snippet from the output:
diff:
bitPosition: 30528 -> 30536
childIndex: 484 -> 485
foundDelimiter: + -> (no value)
foundField: 1 -> (no value)
groupIndex: 1 -> 2
----------------------------------------------------------------- 1268
parser: <Element name='E0020'><DelimiterStackParser>...</DelimiterStackParser></Element>
bitPosition: 30536
data:
│ │
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000ee0: 3127 0a55 4e5a 2b31 2b31 3127 0a 1'␊UNZ+1+11'␊
infoset:
<?xml version="1.0" encoding="UTF-8"?>
<UNZ>
<E0036>1</E0036>
<E0020></E0020>
</UNZ>
diff:
(no differences)
----------------------------------------------------------------- 1269
parser: <StringDelimitedParser/>
bitPosition: 30552
data:
├───┤ ├┤
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000ee0: 0a55 4e5a 2b31 2b31 3127 0a ␊UNZ+1+11'␊
infoset:
<?xml version="1.0" encoding="UTF-8"?>
<UNZ>
<E0036>1</E0036>
<E0020>11</E0020>
</UNZ>
diff:
bitPosition: 30536 -> 30552
foundDelimiter: (no value) -> '␊
foundField: (no value) -> 11
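Judging from the snippet itself, bitPosition appears to count bits from the start of the input, so dividing it by 8 gives the byte offset used in the hex dump: 30536 / 8 = 3817 = 0xEE9, which lies in the row labelled 00000ee0. Likewise, the diff 30536 -> 30552 is 16 bits, i.e. the two bytes of the field "11" reported as foundField. A minimal sketch of that conversion, assuming this interpretation holds (the class and method names are made up for illustration, not Daffodil API):

```java
public class BitPositionDecoder {

    // Assumes bitPosition counts bits from the start of the input (8 bits per byte).
    static long byteOffset(long bitPosition) {
        return bitPosition / 8;
    }

    // Address of the 16-byte hex-dump row containing the byte (e.g. 0xEE0).
    static long dumpRow(long byteOffset) {
        return byteOffset & ~0xFL;
    }

    // Position (0-15) of the byte within its hex-dump row.
    static int dumpColumn(long byteOffset) {
        return (int) (byteOffset & 0xFL);
    }

    public static void main(String[] args) {
        long offset = byteOffset(30536); // bitPosition of step 1268 above
        // prints: byte offset 3817 (0xee9), row 00000ee0, column 9
        System.out.printf("byte offset %d (0x%x), row %08x, column %d%n",
                offset, offset, dumpRow(offset), dumpColumn(offset));
    }
}
```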
- Are there ways to find out programmatically whether, and at which line, a given message has an issue?
Not that I know of. I'm looking to implement better error handling support, since it has been asked for in the past. I know it's not ideal, but for now my recommendation is to route bad documents to another application. I had created an example showing this. It's not specific to EDIFACT, but you can easily adapt it to your use case.
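As a workaround, if a byte offset is known (for example bitPosition / 8 taken from the debug output), it can be mapped back to a segment number and tag by counting segment terminators. This is a minimal sketch, not part of Smooks or Daffodil, assuming the default segment terminator (') and release character (?) with no UNA overrides; SegmentLocator and locate are hypothetical names. Decoding as ISO-8859-1 keeps character indices identical to byte offsets.

```java
import java.nio.charset.StandardCharsets;

public class SegmentLocator {

    // 1-based segment number plus the segment tag, e.g. (4, "UNZ").
    record SegmentInfo(int number, String tag) {}

    // Assumes default EDIFACT delimiters: segment terminator ' and release character ?.
    static SegmentInfo locate(byte[] data, int byteOffset) {
        // ISO-8859-1 maps every byte to exactly one char, so indices equal byte offsets.
        String message = new String(data, StandardCharsets.ISO_8859_1);
        int limit = Math.min(byteOffset, message.length());
        int segment = 1;
        int start = 0;
        for (int i = 0; i < limit; i++) {
            char c = message.charAt(i);
            if (c == '?') {
                i++; // the next character is released (escaped), so skip it
            } else if (c == '\'') {
                segment++;
                start = i + 1;
            }
        }
        // Skip line breaks that some senders insert after each terminator.
        while (start < message.length() && Character.isWhitespace(message.charAt(start))) {
            start++;
        }
        String tag = message.substring(start, Math.min(start + 3, message.length()));
        return new SegmentInfo(segment, tag);
    }

    public static void main(String[] args) {
        String msg = "UNB+UNOC:3+S+R+200101:1200+11'\nUNH+1+INVOIC:D:96A:UN'\nUNT+2+1'\nUNZ+1+11'\n";
        byte[] data = msg.getBytes(StandardCharsets.ISO_8859_1);
        // prints SegmentInfo[number=4, tag=UNZ]
        System.out.println(locate(data, msg.indexOf("UNZ") + 6));
    }
}
```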
So far I have been able to distinguish the following error causes:
1) First, there are documents that use an old EDIFACT syntax version. Daffodil supports syntax versions 3 and 4 but cannot process versions 1 and 2. This looks like a dead end unless there is another solution for such documents (but I don't think there is).
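Even if such documents cannot be parsed, they could at least be detected up front and routed elsewhere before they reach the parser. The syntax version number is the second component of the composite S001 in the UNB segment (e.g. UNB+UNOA:2+...). A minimal sketch, assuming the message is available as a String and that a UNA service string advice, if present, is at the very start; SyntaxVersionSniffer is a hypothetical name:

```java
import java.util.OptionalInt;
import java.util.regex.Pattern;

public class SyntaxVersionSniffer {

    // Returns the syntax version number (data element 0002 in composite S001 of UNB),
    // e.g. 2 for "UNB+UNOA:2+...", or empty if it cannot be determined.
    static OptionalInt syntaxVersion(String message) {
        char componentSep = ':'; // default component data element separator
        char elementSep = '+';   // default data element separator
        int from = 0;
        // UNA service string advice: "UNA" followed by six service characters.
        if (message.startsWith("UNA") && message.length() >= 9) {
            componentSep = message.charAt(3);
            elementSep = message.charAt(4);
            from = 9;
        }
        int unb = message.indexOf("UNB" + elementSep, from);
        if (unb < 0) {
            return OptionalInt.empty();
        }
        int s001Start = unb + 4;
        int s001End = message.indexOf(elementSep, s001Start);
        if (s001End < 0) {
            return OptionalInt.empty();
        }
        String[] components = message.substring(s001Start, s001End)
                .split(Pattern.quote(String.valueOf(componentSep)));
        if (components.length < 2) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(components[1].trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        String msg = "UNB+UNOA:2+SENDER+RECEIVER+940101:0950+1'";
        int version = syntaxVersion(msg).orElse(-1);
        // Per the observation above, versions 1 and 2 cannot be processed, so route them away.
        System.out.println(version < 3 ? "route to fallback handler" : "parse with Smooks");
    }
}
```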
2) Second, there are documents that do not match their schema. These in turn break down into two groups:
2.1) Documents with minor problems, such as invalid enum values, can be processed as long as ValidationMode is set to Off. The invalid values must then be handled downstream.
2.2) Larger problems, such as 20 FTX segments in a row instead of 5, cause processing to jump to the BadMessage branch. In one case, a document declared as version 96A failed, whereas converting it with 99B worked. A possible fallback approach might be to try other versions whenever a BadMessage result occurs.
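The fallback idea could look roughly like the sketch below. The convert function is a hypothetical hook you would wire to your own Smooks pipeline, returning an empty Optional when the message ends up in the BadMessage branch for a given version; nothing here is Smooks API, and version strings such as d96a are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

public class VersionFallback {

    // The version that finally worked, plus its conversion result.
    record Attempt<R>(String version, R result) {}

    // Tries the declared version first, then each fallback once, in order.
    // convert is a hypothetical hook: it should return Optional.empty() when the
    // message lands in the BadMessage branch for the given version.
    static <R> Optional<Attempt<R>> convertWithFallback(
            String declaredVersion,
            List<String> fallbackVersions,
            Function<String, Optional<R>> convert) {
        Set<String> candidates = new LinkedHashSet<>();
        candidates.add(declaredVersion);
        candidates.addAll(fallbackVersions);
        for (String version : candidates) {
            Optional<R> result = convert.apply(version);
            if (result.isPresent()) {
                return Optional.of(new Attempt<>(version, result.get()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a real Smooks run: pretend only d99b converts successfully.
        Function<String, Optional<String>> fakeConvert =
                v -> v.equals("d99b") ? Optional.of("<converted/>") : Optional.empty();
        System.out.println(convertWithFallback("d96a", List.of("d99b", "d01b"), fakeConvert));
    }
}
```

A successful conversion under a different version does not necessarily mean the content was interpreted as the sender intended, so it is probably worth recording which version was actually used.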
If anyone has any further insights on this topic, I would appreciate feedback.
Best regards,
Axel