Issues Using Treetop to parse OFX response files

Daniel Doherty

Sep 7, 2023, 4:47:39 PM
to Treetop Development
Listers,

I have successfully used Treetop to parse OFX response files in an accounting app I have been tweaking for about 10 years.  It has worked like a charm.  OFX has two major version variants, a 100 series and a 200 series.

So far, my grammar has only dealt with the 100 series, since that seems to be what the vast majority of financial institutions use.  The two variants have different headers, but the bodies seem to be very much alike except for one major difference: in version 1xx, value-bearing tags can leave off the closing tag; in version 2xx, the closing tag is mandatory:

<AMOUNT>211.56

versus

<AMOUNT>211.56</AMOUNT>
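
To make the difference concrete, here is a minimal sketch in plain Ruby. It is not part of the grammar, and the `FIELD` pattern and `parse_field` helper are hypothetical names used only for illustration. It accepts both forms of a value-bearing tag and rejects a closing tag that does not match the opening one:

```ruby
# Hypothetical illustration only: one value-bearing tag, with the closing tag
# optional (as in OFX 1xx) or required (as in OFX 2xx).
FIELD = %r{
  \A\s*
  <(?<open>[.A-Z0-9]+)>          # opening tag
  (?<value>[^<>]+?)              # the value, up to the next tag or end of input
  \s*
  (?:</(?<close>[.A-Z0-9]+)>)?   # closing tag, if present
  \s*\z
}x

# Returns [tag, value], or nil if the text is not a well-formed field.
# With require_close: true the closing tag is mandatory (2xx behaviour).
def parse_field(text, require_close: false)
  m = FIELD.match(text)
  return nil unless m
  return nil if require_close && m[:close].nil?
  return nil if m[:close] && m[:close] != m[:open]
  [m[:open], m[:value].strip]
end

parse_field("<AMOUNT>211.56")                        # accepted in 1xx mode
parse_field("<AMOUNT>211.56", require_close: true)   # rejected in 2xx mode
parse_field("<AMOUNT>211.56</AMOUNT>", require_close: true)
```

The same two checks (is a closing tag present, and does it match) are what the semantic predicates in the grammars below are trying to express.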

My version 1 grammar has worked by leaving out the requirement for a closing tag, which has been fine because omitting the closing tag is apparently almost universal among version 1xx OFX servers.  However, I am starting to see version 2xx servers and wanted to create a grammar for those.  But when I make the closing tag mandatory, the parse chokes.  The parse also chokes when I make the closing tag optional in the version 1xx grammar.

I have a github repository at this link with a test script, sample OFX files, and a README that explains all this.

Here is the version 1xx grammar:

```
grammar OfxGrammer do
  # The root of a parsed OFX response will be a single node, an Aggregate,
  # with a tag of '<OFX>' and several children.
  rule aggregate do
    space?
    tag
    space?
    children
    space?
    close_tag
    space?
    # This ensures that this rule matches only if the closing tag matches the
    # opening tag
    &{ |s| s[1].text_value.delete('</>') == s[5].text_value.delete('</>') }
    <Aggregate>
  end

  # Each Aggregate will have a Children node that will consist of zero or
  # more Aggregates and zero or more Fields.
  rule children do
    child*
    <Children>
  end

  rule child do
    aggregate / field
  end

  # A Field node will have a Tag node and a Value node.
  rule field do
    tag
    space?
    value
    space?
    # In OFX version 1, the closing tag is optional; in version 2 it is
    # mandatory.  The predicate ensures that this rule matches only if the
    # closing tag is either blank or matches the opening tag
    # close_tag? &{ |s| s[4].blank? ||
    #                s[0].text_value.delete('</>') == s[4].text_value.delete('</>')
    #             }
    <Field>
  end

  # '
  rule tag do
    '<' space? [.A-Z0-9]+ space? '>'
    <Tag>
  end

  rule close_tag do
    '</' space? [.A-Z0-9]+ space? '>'
  end

  rule value do
    [^<>]+
    <Value>
  end

  rule space do
    [\s]+
  end
end
```

And here is the version 2xx grammar:

```
grammar OfxGrammer do
  # The root of a parsed OFX response will be a single node, an Aggregate,
  # with a tag of '<OFX>' and several children.
  rule aggregate do
    space?
    tag
    space?
    children
    space?
    close_tag
    space?
    # This ensures that this rule matches only if the closing tag matches the
    # opening tag
    &{ |s| s[1].text_value.delete('</>') == s[5].text_value.delete('</>') }
    <Aggregate>
  end

  # Each Aggregate will have a Children node that will consist of zero or
  # more Aggregates and zero or more Fields.
  rule children do
    child*
    <Children>
  end

  rule child do
    aggregate / field
    # field / aggregate
  end

  # A Field node will have a Tag node and a Value node.
  rule field do
    tag
    space?
    value # &{ |s| debugger || true }
    space?
    # In OFX version 2, the closing tag is mandatory.  The predicate ensures
    # that this rule matches only if the closing tag matches the opening tag.
    close_tag &{ |s| s[0].text_value.delete('</>') == s[4].text_value.delete('</>') }
    <Field>
  end

  # '
  rule tag do
    '<' space? [.A-Z0-9]+ space? '>' # '
    <Tag>
  end

  rule close_tag do
    '</' space? [.A-Z0-9]+ space? '>'
  end

  rule value do
    [^<]+
    <Value>
  end

  rule space do
    [\s]+
  end
end
```

Any suggestions?  You can clone the GitHub repository linked above to get sample OFX response files and a script to test against.

Cheers,

mar...@reality.com

Sep 8, 2023, 5:03:40 PM
to Treetop Development

I'm not where I can test, and haven't messed with Treetop in years, but at
a guess I'd say you may be running into problems with the predicate
rejections getting cached. As the docs say:

Warning: This is an advanced feature. You need to understand the way a
packrat parser operates to use it correctly. The result of computing a
rule containing a semantic predicate will be memoized, even if the
same rule, applied later at the same location in the input, would
work differently due to a semantic predicate returning a different
value. If you don't understand the previous sentence yet still use
this feature, you're on your own, so test carefully!
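
As a toy illustration of that warning, here is a sketch in plain Ruby rather than Treetop. The `ToyPackrat` class is hypothetical and is not how Treetop is implemented; it only mimics the idea of caching a rule's result by input position, with a predicate that consults outside state:

```ruby
# Hypothetical toy: results are cached per (rule name, input offset), which is
# the essence of packrat memoization.
class ToyPackrat
  attr_accessor :allowed          # outside state that the predicate consults
  attr_reader :predicate_calls

  def initialize(input)
    @input = input
    @memo = {}
    @allowed = []
    @predicate_calls = 0
  end

  # Apply a rule at an offset, reusing any cached result for that pair.
  def apply(rule, offset)
    key = [rule, offset]
    return @memo[key] if @memo.key?(key)
    @memo[key] = send(rule, offset)
  end

  private

  # Matches an upper-case word at the offset, but only if the "predicate"
  # (membership in @allowed) accepts it.
  def word(offset)
    text = @input[offset..-1][/\A[A-Z]+/]
    return nil unless text
    @predicate_calls += 1
    @allowed.include?(text) ? text : nil
  end
end

parser = ToyPackrat.new("AMOUNT")
first = parser.apply(:word, 0)    # predicate runs and rejects
parser.allowed << "AMOUNT"        # the outside state changes...
second = parser.apply(:word, 0)   # ...but the cached rejection is reused
```

Here `second` is still a failure even though the predicate would now accept, because the rule was never re-evaluated at that offset.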

Some things you might try to test this:

1) eliminate the predicate, thus implicitly assuming that either no tags
are closed or that all tags are closed and correctly paired.

2) try the new grammar on "flat" data (no nested tags), which should work
& detect mismatched closing tags

3) if there are a _very_ small number of tags, explicitly handle them in
the grammar

Hope this helps, at least a little.

--
Sent from ALPINE (GUI? Phooey!)

Doherty, Daniel

Sep 9, 2023, 4:33:09 PM
to treet...@googlegroups.com
Marcus,

Thanks for taking the time to respond to my request.  I was able to get the grammars working (with v1xx now making the closing tag optional) on my test files.

While the predicates turned out to work fine, you did prod me into looking more closely at the docs.  I believe the principal culprit was failing to provide for optional spaces at certain critical points, though there were other nits as well.
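
For anyone wondering how missing optional whitespace can bite, here is a hedged sketch in plain Ruby using `StringScanner` from the standard library. It is not the actual fix (the diffs in the repository are the authority on that). The `scan_fields` helper and the sample input are made up for illustration. The idea is that once closing tags are present, the value no longer swallows the newline before the next tag, so that whitespace has to be consumed explicitly somewhere:

```ruby
require 'strscan'

TAG   = /<([.A-Z0-9]+)>/
CLOSE = %r{</([.A-Z0-9]+)>}
VALUE = /[^<]+/

# Hypothetical helper: scans consecutive <TAG>value</TAG> fields and returns
# the [tag, value] pairs found plus whatever input was left unparsed.
# skip_space controls whether whitespace before each field is consumed.
def scan_fields(text, skip_space:)
  s = StringScanner.new(text)
  fields = []
  loop do
    s.skip(/\s+/) if skip_space
    start = s.pos
    break unless s.scan(TAG)
    tag = s[1]
    value = s.scan(VALUE)
    unless value && s.scan(CLOSE) && s[1] == tag
      s.pos = start               # back out of a partial match
      break
    end
    fields << [tag, value.strip]
  end
  [fields, s.rest]
end

# Made-up sample input with two closed fields on separate lines.
sample = "<AMOUNT>211.56</AMOUNT>\n<NAME>ACME</NAME>\n"

strict_fields, strict_rest = scan_fields(sample, skip_space: false)
loose_fields,  loose_rest  = scan_fields(sample, skip_space: true)
```

Without the whitespace skip, scanning stops after the first field and leaves the rest of the input unparsed; with it, both fields are read and nothing is left over.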

If anyone is interested, I will leave the github page out there, which shows the detailed diffs in both grammars.

Many thanks.
====================================================
Daniel E. Doherty
