Proposal: Better EDIFACT validation


Claude

Jul 22, 2023, 5:15:24 AM
to Smooks Development
We've had numerous requests for better validation of EDIFACT documents. When the EDIFACT doc fails validation, the EDIFACT cartridge is Base64 encoding the message and inserting the encoded content within a BadMessage element. The BadMessage element gives no indication of what went wrong, but the developer can troubleshoot the error by setting the debugging attribute on the edifact:parser to true. With debugging enabled, Daffodil prints the parser state to the console; nonetheless, the logs can be hard to decipher.

The guidance so far has been to feed the invalid document to a third-party EDIFACT validator (either manually or by having a Smooks visitor decode and route the BadMessage element) and resolve the issues in the document before reprocessing it in Smooks. For one thing, this guidance assumes that the developer has access to an EDIFACT validator, which may not be the case. More importantly, a document that is well-formed but invalid cannot be processed until it's fixed.

Given the above problems, I'm proposing that we introduce two validation modes to the cartridge: strict and lax. In strict validation, the behaviour is similar to the current one: a BadMessage is produced when the content in the document breaks a rule. However, somewhere within the message or the Smooks execution context, we include the details of the validation error so that it can be processed programmatically.

In lax validation, on the other hand, a document that is well-formed but invalid is read successfully, just like a valid one, and the validation errors are made available through the execution context.
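
To make the difference between the two modes concrete, here is a minimal Java sketch of the decision they imply. Every name in it (ValidationMode, ValidationError, ReadOutcome, ValidationModes.read) is hypothetical and made up for illustration; none of them exists in Smooks or Daffodil today. It also assumes that a document which is not well-formed still produces a BadMessage in both modes, which the proposal does not spell out.

```java
import java.util.List;

// Hypothetical names throughout: none of these types exist in Smooks or Daffodil.
enum ValidationMode { STRICT, LAX }

// One rule violation found while validating a message.
record ValidationError(String segment, String message) {}

// Outcome of reading one message: whether it became a BadMessage, plus the
// errors that would be exposed (for example through the execution context).
record ReadOutcome(boolean badMessage, List<ValidationError> errors) {}

final class ValidationModes {
    private ValidationModes() {}

    static ReadOutcome read(ValidationMode mode, boolean wellFormed, List<ValidationError> errors) {
        // Assumption: a document that is not well-formed cannot be read in either mode.
        if (!wellFormed) {
            return new ReadOutcome(true, List.copyOf(errors));
        }
        // Strict: any rule violation produces a BadMessage, with the details kept.
        // Lax: the document is read like a valid one, and the errors are still reported.
        boolean bad = mode == ValidationMode.STRICT && !errors.isEmpty();
        return new ReadOutcome(bad, List.copyOf(errors));
    }
}
```

In both modes the caller receives the same list of errors; only the fate of the message differs.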

To implement this feature, I suggest we leverage Apache Daffodil's pluggable Schematron validator. We should be able to generate and embed Schematron rules from the EDIFACT directories. We might need to generate two flavours of the DFDL schemas so that both strict and lax validation can be supported. One downside I can think of from using Daffodil's custom validation is that it leads to the whole document being loaded into memory (see DAFFODIL-2386) so we should definitely have an option to disable Schematron validation in order to handle large documents.

Claude

Claude Mamo

Jul 22, 2023, 11:24:07 AM
to smook...@googlegroups.com
Erratum:

> When the EDIFACT doc fails validation, the EDIFACT cartridge is Base64 encoding the message and inserting the encoded content within a BadMessage element.

The cartridge is actually hexadecimal encoding the binary contents of the message, not Base64 encoding them.
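
For anyone decoding BadMessage content by hand or in a visitor, this means a hex decoder rather than a Base64 one. A minimal sketch, assuming Java 17+ (for java.util.HexFormat), that the element text is nothing but hex digits, and that the original bytes can be read as UTF-8; the class and method names are hypothetical, and the real character set depends on the interchange's syntax identifier.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Hypothetical helper, not part of Smooks.
final class BadMessageDecoder {
    private BadMessageDecoder() {}

    // Turns the hex text of a BadMessage element back into the original message text.
    static String decode(String hexContent) {
        // parseHex accepts upper- and lower-case digits and rejects odd-length input.
        byte[] raw = HexFormat.of().parseHex(hexContent.strip());
        // Assumption: UTF-8; swap in the charset that matches the interchange.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

For example, BadMessageDecoder.decode("554E482B31") gives back the text UNH+1.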

Claude
