We've had numerous requests for better validation of EDIFACT documents. When an EDIFACT document fails validation, the EDIFACT cartridge Base64-encodes the message and inserts the encoded content within a BadMessage element. The BadMessage element gives no indication of what went wrong, but the developer can troubleshoot the error by setting the debugging attribute on the edifact:parser to true. With debugging enabled, Daffodil prints the parser state to the console; nonetheless, the logs can be hard to decipher.
The guidance so far has been to feed the invalid document to a third-party EDIFACT validator (either manually or by having a Smooks visitor decode and route the BadMessage element) and resolve the issues in the document before reprocessing it in Smooks. This guidance assumes that the developer has access to an EDIFACT validator, which may not be the case. More importantly, a document which is well-formed but invalid cannot be processed until it's fixed.
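For anyone following the current guidance, the decoding step itself is small. Here is a minimal sketch in plain Java, assuming the Base64 text has already been extracted from the BadMessage element; the class name `BadMessageDecoder` is made up for illustration and is not part of Smooks, and the charset is something the caller has to know from the interchange.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BadMessageDecoder {

    // Turns the Base64 text content of a BadMessage element back into the raw EDIFACT message.
    // The charset is an assumption: pass whichever encoding the interchange actually used.
    public static String decode(String badMessageContent, Charset charset) {
        // The MIME decoder ignores line breaks and whitespace that XML serialisation may add.
        byte[] raw = Base64.getMimeDecoder().decode(badMessageContent);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Sample segments only, used to show a round trip.
        String original = "UNH+1+ORDERS:D:96A:UN'BGM+220+128576'";
        String encoded = Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(encoded, StandardCharsets.UTF_8));
    }
}
```

The decoded text is what would then be handed to the external validator.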
Given the above problems, I'm proposing that we introduce two validation modes to the cartridge: strict and lax. In strict validation, the behaviour is similar to the current one: a BadMessage is produced when the content in the document breaks a rule. The difference is that we would include the details of the validation error, either within the message or in the Smooks execution context, so that they can be processed programmatically.
On the other hand, in lax validation, a document that is well-formed but invalid will be read successfully, like a valid one, while its validation errors are made available through the execution context.
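To make the intended difference between the two modes concrete, here is a rough sketch in plain Java. Everything in it (`ValidationMode`, `ValidationError`, `Outcome`, `resolve`) is a hypothetical name chosen for illustration; none of it exists in the cartridge today, and the real design may well look different.

```java
import java.util.Collections;
import java.util.List;

public class ValidationModeSketch {

    /** The two proposed modes. */
    public enum ValidationMode { STRICT, LAX }

    /** Hypothetical shape of one validation error; the fields are illustrative. */
    public static final class ValidationError {
        public final String segment;
        public final String message;

        public ValidationError(String segment, String message) {
            this.segment = segment;
            this.message = message;
        }
    }

    /** Hypothetical outcome of reading one well-formed message. */
    public static final class Outcome {
        public final boolean badMessage;           // true when a BadMessage would be emitted
        public final List<ValidationError> errors; // exposed to the caller in both modes

        Outcome(boolean badMessage, List<ValidationError> errors) {
            this.badMessage = badMessage;
            this.errors = errors;
        }
    }

    // Strict: any validation error turns the message into a BadMessage.
    // Lax: the message is read as if it were valid.
    // In both modes the errors stay available for programmatic processing.
    public static Outcome resolve(ValidationMode mode, List<ValidationError> errors) {
        boolean badMessage = mode == ValidationMode.STRICT && !errors.isEmpty();
        return new Outcome(badMessage, Collections.unmodifiableList(errors));
    }

    public static void main(String[] args) {
        List<ValidationError> errors = Collections.singletonList(
                new ValidationError("BGM", "mandatory element missing"));
        System.out.println(resolve(ValidationMode.STRICT, errors).badMessage);
        System.out.println(resolve(ValidationMode.LAX, errors).badMessage);
    }
}
```

The point of the sketch is that the error list is populated identically in both modes; only the decision to emit a BadMessage changes.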
To implement this feature, I suggest we leverage Apache Daffodil's pluggable Schematron validator. We should be able to generate and embed Schematron rules from the EDIFACT directories. We might need to generate two flavours of the DFDL schemas so that both strict and lax validation can be supported. One downside I can think of from using Daffodil's custom validation is that it leads to the whole document being loaded into memory (see DAFFODIL-2386), so we should definitely have an option to disable Schematron validation in order to handle large documents.
Claude