Splitting Huge EDI

Jeff Bradley

May 24, 2021, 10:45:18 AM
to Smooks Users
Is anyone using Smooks to process huge EDI files? The freemarker-huge-transform example on GitHub does not run due to an error (and there are no unit tests), and the documentation does not cover my use case.

For example, we receive EDI files that are gigabytes in size. We need to persist each envelope (interchange, group, transaction set) and each document (invoice, PO, etc.). Ideally this would be done without bringing the entire EDI file into memory, but based on the example it seems that the best we can do is split on the group, pull an entire group into memory, route that over JMS, split again on the transaction set, route each transaction set over JMS, and then split on each document and route that over JMS.

Is anyone handling huge EDI in this way?

Claude Mamo

May 24, 2021, 11:39:47 AM
to smook...@googlegroups.com
The freemarker-huge-transform example on Github does not run due to an error (and there are no unit tests), and the documentation is not helpful in my use case.

I wasn't aware there was no test for the freemarker-huge-transform example. I'm not very familiar with that example, but can you open an issue about it? We may need to revamp it to factor in the new pipeline feature.

We need to persist each envelope (interchange, group, transaction set), and each document (invoice, PO, etc).

If I am understanding you correctly Jeff, you'd like to persist each envelope header (i.e., ISA, GS, and ST) as well as each transaction set. I can't see why it can't be done efficiently as long as each item is represented as its own element. For example:

<core:smooks filterSourceOn="ISA" maxNodeDepth="0">
  <core:config>
    <smooks-resource-list>
      <resource-config selector="#document">
        <resource>...</resource>
      </resource-config>
    </smooks-resource-list>
  </core:config>
</core:smooks>

<core:smooks filterSourceOn="GS" maxNodeDepth="0">
  <core:config>
    <smooks-resource-list>
      <resource-config selector="#document">
        <resource>...</resource>
      </resource-config>
    </smooks-resource-list>
  </core:config>
</core:smooks>

<core:smooks filterSourceOn="ST" maxNodeDepth="0">
  <core:config>
    <smooks-resource-list>
      <resource-config selector="#document">
        <resource>...</resource>
      </resource-config>
    </smooks-resource-list>
  </core:config>
</core:smooks>

<core:smooks filterSourceOn="850" maxNodeDepth="0">
  <core:config>
    <smooks-resource-list>
      <resource-config selector="#document">
        <resource>...</resource>
      </resource-config>
    </smooks-resource-list>
  </core:config>
</core:smooks>

With this configuration, Smooks brings the headers and the transaction sets into memory one at a time. Pipelines are used to build the header and transaction set node trees, but you can avoid pipelines and reduce the memory footprint by streaming each node:

<resource-config selector="ISA/*">
  <resource>...</resource>
</resource-config>

Claude

Jeff Bradley

May 24, 2021, 11:48:15 AM
to Smooks Users
Thanks for the response, Claude. I think I am following what you are saying, but I need to be able to associate documents (e.g. an 850) with their envelopes, and I don't see how to do that with this approach, since the envelopes and documents are processed separately.

Claude Mamo

May 24, 2021, 12:02:35 PM
to smook...@googlegroups.com
They are processed separately but on the same thread, at least in my example, so you should be able to leverage ExecutionContext to stash away a correlation ID such as the interchange control number and then retrieve it later on from the document visitor.
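
To show the correlation idea in isolation, here is a minimal JDK-only sketch. It does not use Smooks at all: the class name CorrelationSketch, the context map (a stand-in for ExecutionContext) and the key name interchangeControlNumber are assumptions made purely for illustration. It reads one segment at a time, stashes ISA13 when the ISA segment goes by, and looks it up again when a B10 segment arrives on the same thread:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CorrelationSketch {

    // Hypothetical key name; in Smooks this role would be played by a key stored in the ExecutionContext.
    static final String KEY = "interchangeControlNumber";

    static List<String> correlate(Reader reader, char segmentTerminator, String elementSeparator) {
        Map<String, String> context = new HashMap<>(); // stand-in for ExecutionContext
        List<String> correlated = new ArrayList<>();   // collected only so the demo has something to return
        StringBuilder segment = new StringBuilder();   // holds just the segment currently being read
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (c != segmentTerminator) {
                    segment.append((char) c);
                    continue;
                }
                String[] elements = segment.toString().trim().split(Pattern.quote(elementSeparator), -1);
                segment.setLength(0);
                if (elements[0].equals("ISA") && elements.length > 13) {
                    // envelope "visitor": stash the interchange control number (ISA13)
                    context.put(KEY, elements[13]);
                } else if (elements[0].equals("B10") && elements.length > 1) {
                    // document "visitor": retrieve the value stashed earlier
                    correlated.add(elements[1] + " belongs to interchange " + context.get(KEY));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return correlated;
    }
}
```

Calling correlate(new FileReader("my_file.edi"), '~', "*") would walk the file segment by segment; a real implementation would persist each match instead of collecting it in a list.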

Claude

Jeff Bradley

May 24, 2021, 12:15:03 PM
to Smooks Users
I can't find any reference to that in the docs. How can I stash a correlation id in the ExecutionContext?

Claude Mamo

May 24, 2021, 12:34:21 PM
to smook...@googlegroups.com
Yeah, the docs are lacking in some parts. I will make a note about documenting this scenario. Here's one easy way to stash values:

    @Override
    public void visitAfter(Element element, ExecutionContext executionContext) {
        NodeList childNodes = element.getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            // getLocalName() can be null (e.g. for text nodes), so compare from the constant side
            if ("ISA13".equals(childNodes.item(i).getLocalName())) {
                executionContext.put(new TypedKey<>("myCorrelationId"), childNodes.item(i).getTextContent());
            }
        }
    }

Another visitor can then retrieve the value with:

executionContext.get(new TypedKey<String>("myCorrelationId"));

Claude

Claude

May 24, 2021, 1:14:04 PM
to Smooks Users
I'm realising that the first snippet can probably be simplified to:

@Override
public void visitAfter(Element element, ExecutionContext executionContext) {
    executionContext.put(new TypedKey<>("myCorrelationId"), element.getElementsByTagName("ISA13").item(0).getTextContent());
}

Claude

Jeff Bradley

May 25, 2021, 7:32:52 AM
to Smooks Users
Claude, I'm still trying to wrap my head around what you are suggesting. Thanks for your patience.

If I understand, you are suggesting a Smooks config like below.

<?xml version="1.0" encoding="UTF-8" ?>
<smooks-resource-list
  xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
  xmlns:edi="https://www.smooks.org/xsd/smooks/edi-2.0.xsd">

  <edi:parser schemaURI="/mappings/odfl-shipment-status-message.dfdl.xsd"
              segmentTerminator="~"
              dataElementSeparator="*"
              compositeDataElementSeparator="^"/>

  <import file="/mappings/interchange-map.xml"/>
  <import file="/mappings/shipment-status-message-map.xml"/>

  <resource-config selector="ISA">
    <resource>com.my.visitor.IsaVisitor</resource>
  </resource-config>

  <resource-config selector="B10">
    <resource>com.my.visitor.B10Visitor</resource>
  </resource-config>

</smooks-resource-list>


And for the visitors, something like this for the ISA visitor:

public class IsaVisitor implements AfterVisitor {

  @Override
  public void visitAfter(Element element, ExecutionContext executionContext) {
    var isaNumber = element.getElementsByTagName("sender-id")
      .item(0)
      .getTextContent();
    executionContext.put(new TypedKey<>("isaNumber"), isaNumber);
  }
}

And this for the B10 visitor:

public class B10Visitor implements AfterVisitor {

  @Override
  public void visitAfter(Element element, ExecutionContext executionContext) {
    var bean = (ShipmentStatus) executionContext.getBeanContext().getBean("shipmentStatus");
    var isaNumber = executionContext.get(new TypedKey<>("isaNumber"));
    bean.setTransactionSetIdentifier(isaNumber.toString());
  }
}


And then getting the StreamResult from the filter event stream like so:

StreamSource streamSource = new StreamSource("my_file.edi");
StreamResult streamResult = new StreamResult();
otherSmooks.filterSource(context, streamSource, streamResult);

Is that correct?

Jeff Bradley

May 25, 2021, 7:48:30 AM
to Smooks Users
To be clear, I haven't been able to make the above config and visitors run without error, so I'm still trying to figure out the issue.

I would still like to hear from others in the community as well, if anyone is implementing Smooks for huge EDI. Any insight into real-world implementation is appreciated.

Claude Mamo

May 25, 2021, 9:09:23 AM
to smook...@googlegroups.com
Hi Jeff,

If I understand, you are suggesting a Smooks config like below.

Yes and no. As it stands, you should be getting a NullPointerException because the ISA element in IsaVisitor doesn't have child nodes. Keep in mind that Smooks doesn't accumulate child nodes in memory unless the maxNodeDepth parameter is set. There are different ways of setting this parameter. One way is to use pipelines. Another way is to implement the ParameterizedVisitor interface in your visitor class:

public class IsaVisitor implements ParameterizedVisitor {

    @Override
    public void visitAfter(Element element, ExecutionContext executionContext) {
        var isaNumber = element.getElementsByTagName("sender-id").item(0).getTextContent();
        executionContext.put(new TypedKey<>("isaNumber"), isaNumber);
    }

    @Override
    public int getMaxNodeDepth() {
        // keep every descendant of the ISA element in memory so that visitAfter can read them
        return Integer.MAX_VALUE;
    }

    @Override
    public void visitBefore(Element element, ExecutionContext executionContext) {
        // nothing to do before the element's children have been read
    }
}

With maxNodeDepth set to the maximum value, Smooks will not discard any of the ISA element's child nodes, which means that IsaVisitor can reference the sender-id child element. As a side note, I'm not sure why this Java coding is needed. I would have thought that it's possible to declaratively set the bean's transactionSetIdentifier in your mapping config, but perhaps I'm missing something.
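
The NullPointerException mentioned above comes from plain DOM behaviour: getElementsByTagName(...).item(0) returns null when the element has no matching descendant. A small JDK-only sketch of a defensive lookup follows; ChildTextSketch and its two helper methods are hypothetical names, not part of Smooks:

```java
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ChildTextSketch {

    /** Returns the text of the first descendant with the given tag name, if there is one. */
    static Optional<String> childText(Element parent, String tagName) {
        Node first = parent.getElementsByTagName(tagName).item(0); // null when nothing matches
        return Optional.ofNullable(first).map(Node::getTextContent);
    }

    /** Builds a parent element, optionally with a single child, purely for demonstration. */
    static Element element(String name, String childName, String childText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element parent = doc.createElement(name);
            if (childName != null) {
                Element child = doc.createElement(childName);
                child.setTextContent(childText);
                parent.appendChild(child);
            }
            return parent;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With an ISA element whose children were not accumulated, childText(isa, "sender-id") yields an empty Optional rather than throwing, which makes a missing maxNodeDepth setting easier to spot.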

Claude

Jeff Bradley

May 25, 2021, 9:17:34 AM
to Smooks Users
That is a good point. I have tried that as well, but have not been able to make it work: I have not been able to reference properties of any other beans declaratively in the Java bean mapping.

In the example below, how would the ShipmentStatusMessage bean mapping reference the senderId from the "outer" Group bean? I have tried everything I can find in the docs and examples but nothing works.

<jb:bean beanId="group" class="com.novapath.isd.domain.data.Group" createOnElement="group">
    <jb:value data="group-code" property="groupCode" />
    <jb:value data="sender-id" property="senderId" />
    <jb:value data="receiver-id" property="receiverId" />
    <jb:value data="date" property="date" />
    <jb:value data="time" property="time" />
    <jb:value data="control-number" property="controlNumber" />
    <jb:value data="agency-code" property="agencyCode" />
    <jb:value data="version" property="version" />
    <jb:wiring beanIdRef="transactionSets" property="shipmentStatuses"/>
</jb:bean>

<jb:bean beanId="transactionSets" class="java.util.ArrayList" createOnElement="group" retain="true">
    <jb:wiring beanIdRef="shipmentStatus" />
</jb:bean>

<jb:bean beanId="shipmentStatus" class="com.novapath.isd.domain.data.ShipmentStatus" createOnElement="shipment-status">
    <jb:value data="???" property="senderId" />
</jb:bean>


Claude Mamo

May 25, 2021, 10:00:09 AM
to smook...@googlegroups.com
Are you saying that the following doesn't work?

<jb:bean beanId="shipmentStatus" class="com.novapath.isd.domain.data.ShipmentStatus" createOnElement="shipment-status">
    <jb:value data="sender-id" property="senderId" />
</jb:bean>

The data attribute is meant to hold a selector that references the source stream. I'm not sure whether you can reference a bean's property from within another bean. Perhaps it's possible with jb:expression.

Claude

Jeff Bradley

May 25, 2021, 10:01:44 AM
to Smooks Users
Correct, that does not work.

An expression is an idea.

Claude

May 25, 2021, 10:46:28 AM
to Smooks Users
Can you post a sample of the document you're attempting to map?

Claude

Jeff Bradley

May 25, 2021, 10:52:25 AM
to Smooks Users
Sure. FYI, the expression worked.

ISA*00* *00* *02*ODFL           *ZZ*1234567        *170202*1244*U*00401*000001062*0*P*>~
GS*QM*ODFL*1234567*20170202*124452*906*X*004010~
ST*214*0001~
B10*02625962200*NAMCLF0005633*TEST~
L11*4550465693*PO~
LX*1~
AT7*D1*NS***20170202*1217*ET~
AT8*G*L*8142*15~
SE*7*0001~
ST*214*0001~
B10*02625962201*NAMCLF0005633*FOOB~
L11*4550465693*PO~
LX*1~
AT7*D1*NS***20170202*1217*ET~
AT8*G*L*8142*15~
SE*7*0001~
GE*2*9185~
IEA*1*000015198~

Claude Mamo

May 25, 2021, 10:53:57 AM
to smook...@googlegroups.com
Sorry, I meant after it's read, in its XML form.

Claude

Jeff Bradley

May 25, 2021, 10:56:28 AM
to Smooks Users
<shipment-status-message>
  <interchange>
    <authorization-qualifier>00</authorization-qualifier>
    <authorization-information></authorization-information>
    <security-qualifier>00</security-qualifier>
    <security-information></security-information>
    <sender-qualifier>02</sender-qualifier>
    <sender-id>ODFL </sender-id>
    <receiver-qualifier>ZZ</receiver-qualifier>
    <receiver-id>1234567 </receiver-id>
    <date>170202</date>
    <time>1244</time>
    <repetition-separator>U</repetition-separator>
    <version>00401</version>
    <control-number>000001062</control-number>
    <acknowledgment-requested>0</acknowledgment-requested>
    <usage-indicator>P</usage-indicator>
    <composite-separator>&gt;</composite-separator>
  </interchange>
  <group>
    <group-code>QM</group-code>
    <sender-id>ODFL</sender-id>
    <receiver-id>1234567</receiver-id>
    <date>20170202</date>
    <time>124452</time>
    <control-number>906</control-number>
    <agency-code>X</agency-code>
    <version>004010</version>
  </group>
  <transaction-sets>
    <shipment-status>
      <identifier-code>214</identifier-code>
      <sequence>0001</sequence>
      <reference-identification>02625962200</reference-identification>
      <shipment-identification-number>NAMCLF0005633</shipment-identification-number>
      <scac>ODFL</scac>
      <reference-numbers>
        <reference-number>
          <reference-identification>4550465693</reference-identification>
          <reference-qualifier>PO</reference-qualifier>
        </reference-number>
      </reference-numbers>
      <location-information/>
      <assigned-number>1</assigned-number>
      <indicator-code>D1</indicator-code>
      <reason-code-1>NS</reason-code-1>
      <status-code></status-code>
      <reason-code-2></reason-code-2>
      <date>20170202</date>
      <time>1217</time>
      <time-code>ET</time-code>
      <weight-qualifier>G</weight-qualifier>
      <weight-unit-code>L</weight-unit-code>
      <weight>8142</weight>
      <lading-quantity>15</lading-quantity>
      <number-of-segments>7</number-of-segments>
      <control-number>0001</control-number>
    </shipment-status>
    <shipment-status>
      <identifier-code>214</identifier-code>
      <sequence>0001</sequence>
      <reference-identification>02625962201</reference-identification>
      <shipment-identification-number>NAMCLF0005633</shipment-identification-number>
      <scac>ODFL</scac>
      <reference-numbers>
        <reference-number>
          <reference-identification>4550465693</reference-identification>
          <reference-qualifier>PO</reference-qualifier>
        </reference-number>
      </reference-numbers>
      <location-information/>
      <assigned-number>1</assigned-number>
      <indicator-code>D1</indicator-code>
      <reason-code-1>NS</reason-code-1>
      <status-code></status-code>
      <reason-code-2></reason-code-2>
      <date>20170202</date>
      <time>1217</time>
      <time-code>ET</time-code>
      <weight-qualifier>G</weight-qualifier>
      <weight-unit-code>L</weight-unit-code>
      <weight>8142</weight>
      <lading-quantity>15</lading-quantity>
      <number-of-segments>7</number-of-segments>
      <control-number>0001</control-number>
    </shipment-status>
  </transaction-sets>
  <ge>
    <transaction-count>2</transaction-count>
    <control-number>9185</control-number>
  </ge>
  <iea>
    <group-count>1</group-count>
    <control-number>000015198</control-number>
  </iea>
</shipment-status-message>

Claude

May 25, 2021, 12:06:59 PM
to Smooks Users
OK, I understand what's going on and it's behaving as expected. The problem is that the shipmentStatus bean is created on the shipment-status event. This event happens after the sender-id event. Since the bean is created on the shipment-status event, any wiring must happen on events that occur after the shipment-status event because the bean won't exist in prior events. You could change this:

<jb:bean beanId="shipmentStatus" class="com.novapath.isd.domain.data.ShipmentStatus" createOnElement="shipment-status">
    <jb:value data="sender-id" property="senderId" />
</jb:bean>

to

<jb:bean beanId="shipmentStatus" class="com.novapath.isd.domain.data.ShipmentStatus" createOnElement="interchange">
    <jb:value data="sender-id" property="senderId" />
</jb:bean>

However, given that you may have multiple shipment-status events, you'll have only a single shipmentStatus bean instead of a shipmentStatus bean for every shipment-status event. Luckily, jb:expression does what you need.
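
The event ordering described above can be reproduced with plain StAX, without Smooks. This is only a sketch under stated assumptions: EventOrderSketch and senderPerShipment are hypothetical names, and remembering the last sender-id in a local variable is a stand-in for what the expression achieves, not a description of how Smooks implements it. The sender-id event fires before any shipment-status event, so the value has to be remembered and applied each time a new shipment-status starts:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EventOrderSketch {

    /** Returns, for every shipment-status element, the sender-id seen most recently before it. */
    static List<String> senderPerShipment(String xml) {
        List<String> result = new ArrayList<>();
        String lastSenderId = null; // remembered from an earlier event
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if (name.equals("sender-id")) {
                    // this event happens before any shipment-status element exists
                    lastSenderId = reader.getElementText().trim();
                } else if (name.equals("shipment-status")) {
                    // one entry per shipment-status, each given the remembered value
                    result.add(lastSenderId);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

Run against the XML posted earlier in the thread, this would pair each of the two shipment-status elements with the group's sender-id, since that is the last sender-id element before them.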

Claude

Jeff Bradley

May 25, 2021, 12:12:08 PM
to Smooks Users
Yeah, 10-4, thanks.

Another question about your OG answer: in order to stream the results, is it required to use StreamResult as opposed to JavaResult? I am interested in that, but am having trouble finding examples of its usage. If I can achieve the same thing with JavaResult that would be great, but it seems like that would not be as memory efficient as processing the StreamResult.

Claude Mamo

May 25, 2021, 1:01:06 PM
to smook...@googlegroups.com
You can stream Java beans, but it would be with a listener rather than a JavaResult. I blogged about this a few years ago, but here's a snippet showing how one would go about this:

    ExecutionContext executionContext = smooks.createExecutionContext();

    // set an event listener on Smooks
    executionContext.getBeanContext().addObserver(new BeanContextLifecycleObserver() {
        @Override
        public void onBeanLifecycleEvent(BeanContextLifecycleEvent event) {

            // apply logic only when Smooks has made a 'org.ossandme.Product' and set its properties
            if (event.getLifecycle().equals(BeanLifecycle.END_FRAGMENT) && event.getBeanId().toString().equals("product")) {
                Product product = (Product) event.getBean();

                System.out.println(product.getItemDesc());
                // DO STUFF
                // ...
            }
        }
    });
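
The same push-style idea can be shown without Smooks. In the JDK-only sketch below, a Consumer plays the role of the observer: each transaction set is handed to the callback as soon as its SE segment is read and is then dropped, so only one transaction set is held in memory at a time. TransactionSetStreamer is a hypothetical name, and the hard-coded "ST*" and "SE*" prefixes assume "*" is the element separator, as in the sample posted earlier:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TransactionSetStreamer {

    /** Hands each ST..SE block to the listener as soon as it is complete. */
    static void stream(Reader reader, char segmentTerminator, Consumer<List<String>> listener) {
        List<String> current = null;                 // segments of the transaction set being read
        StringBuilder segment = new StringBuilder(); // characters of the segment being read
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (c != segmentTerminator) {
                    segment.append((char) c);
                    continue;
                }
                String text = segment.toString().trim();
                segment.setLength(0);
                if (text.startsWith("ST*")) {
                    current = new ArrayList<>();     // a new transaction set begins
                }
                if (current != null) {
                    current.add(text);
                    if (text.startsWith("SE*")) {
                        listener.accept(current);    // push the finished set to the caller
                        current = null;              // and let it be garbage collected
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could pass a lambda that persists or forwards each transaction set, for example stream(reader, '~', set -> save(set)), where save is whatever persistence routine applies.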

Claude

Jeff Bradley

May 25, 2021, 1:08:42 PM
to Smooks Users
Awesome, I'll check that out, and the blog too :)