Listers,
I have successfully used Treetop to parse OFX response files in an accounting app I have been tweaking for about 10 years. It has worked like a charm. OFX has two major version families, a 100 series and a 200 series.
So far, my grammar has dealt only with the 100 series, since that seems to be what the vast majority of financial institutions use. The two series have different headers, but the bodies seem to be very much alike except for one major difference: in version 1xx, value-bearing tags can leave off the closing tag; in version 2xx, the closing tag is mandatory:
<AMOUNT>211.56
versus
<AMOUNT>211.56</AMOUNT>
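To make the difference concrete, here is a minimal plain-Ruby sketch (not Treetop, and not code from my app) of a single value-bearing field in both forms. `FIELD` and `parse_field` are hypothetical names used only for illustration; the backreference `\1` plays the same role as the tag-matching predicate in the grammars further down.

```ruby
# Hypothetical illustration only: one value-bearing field whose closing tag
# is optional but, when present, must repeat the name of the opening tag.
FIELD = /<([.A-Z0-9]+)>([^<>\r\n]+)(?:<\/\1>)?/

# Returns the tag name, the value, and whether a matching closing tag was
# consumed; returns nil when the text does not start a field at all.
def parse_field(text)
  m = FIELD.match(text)
  return nil unless m
  { tag: m[1], value: m[2].strip, closed: m[0].end_with?("</#{m[1]}>") }
end

v1 = parse_field("<AMOUNT>211.56")            # 1xx style, no closing tag
v2 = parse_field("<AMOUNT>211.56</AMOUNT>")   # 2xx style, closing tag present

p v1  # closed is false here
p v2  # closed is true here
```

Both calls yield the same tag and value; only the `closed` flag differs.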
My version 1xx grammar works by not accepting a closing tag on fields at all, which has been fine because, apparently, omitting it is almost universal among version 1xx OFX servers. However, I am starting to see version 2xx servers and want to create a grammar for those. But when I make the closing tag mandatory, the parse fails. The parse also fails when I make the closing tag optional in the version 1xx grammar.
I have a GitHub repository at this link with a test script, sample OFX files, and a README that explains all this.
Here is the version 1xx grammar:
```
grammar OfxGrammer do
  # The root of a parsed OFX response will be a single node, an Aggregate,
  # with a tag of '<OFX>' and several children.
  rule aggregate do
    space?
    tag
    space?
    children
    space?
    close_tag
    space?
    # This ensures that this rule matches only if the closing tag matches the
    # opening tag
    &{ |s| s[1].text_value.delete('</>') == s[5].text_value.delete('</>') }
    <Aggregate>
  end

  # Each Aggregate will have a Children node that will consist of zero or
  # more Aggregates and zero or more Fields.
  rule children do
    child*
    <Children>
  end

  rule child do
    aggregate / field
  end

  # A Field node will have a Tag node and a Value node.
  rule field do
    tag
    space?
    value
    space?
    # In OFX version 1, the closing tag is optional; in version 2 it is
    # mandatory. The predicate ensures that this rule matches only if the
    # closing tag is either blank or matches the opening tag
    # close_tag? &{ |s| s[4].blank? ||
    #   s[0].text_value.delete('</>') == s[4].text_value.delete('</>')
    # }
    <Field>
  end

  # '
  rule tag do
    '<' space? [.A-Z0-9]+ space? '>'
    <Tag>
  end

  rule close_tag do
    '</' space? [.A-Z0-9]+ space? '>'
  end

  rule value do
    [^<>]+
    <Value>
  end

  rule space do
    [\s]+
  end
end
```
And here is the version 2xx grammar:
```
grammar OfxGrammer do
  # The root of a parsed OFX response will be a single node, an Aggregate,
  # with a tag of '<OFX>' and several children.
  rule aggregate do
    space?
    tag
    space?
    children
    space?
    close_tag
    space?
    # This ensures that this rule matches only if the closing tag matches the
    # opening tag
    &{ |s| s[1].text_value.delete('</>') == s[5].text_value.delete('</>') }
    <Aggregate>
  end

  # Each Aggregate will have a Children node that will consist of zero or
  # more Aggregates and zero or more Fields.
  rule children do
    child*
    <Children>
  end

  rule child do
    aggregate / field
    # field / aggregate
  end

  # A Field node will have a Tag node and a Value node.
  rule field do
    tag
    space?
    value # &{ |s| debugger || true }
    space?
    # In OFX version 2, the closing tag is mandatory. The predicate ensures
    # that this rule matches only if the closing tag matches the opening tag
    close_tag &{ |s| s[0].text_value.delete('</>') == s[4].text_value.delete('</>') }
    <Field>
  end

  # '
  rule tag do
    '<' space? [.A-Z0-9]+ space? '>' # '
    <Tag>
  end

  rule close_tag do
    '</' space? [.A-Z0-9]+ space? '>'
  end

  rule value do
    [^<]+
    <Value>
  end

  rule space do
    [\s]+
  end
end
```
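One possible lead, sketched in plain Ruby with `StringScanner` rather than Treetop and not verified against the real grammars: the value pattern also matches whitespace, so in the 1xx case it swallows the newline after the value and the next tag starts right at `<`. Once a closing tag is consumed, the newline after it is left in the input, and `field` begins with `tag`, not `space?`. The constants and the `rest_after_field` helper below are hypothetical stand-ins that I am assuming behave like the grammar's terminals.

```ruby
require 'strscan'

# Plain-Ruby stand-ins for the grammar's terminals (assumed equivalents).
TAG       = /<\s*[.A-Z0-9]+\s*>/
CLOSE_TAG = /<\/\s*[.A-Z0-9]+\s*>/
VALUE     = /[^<]+/
SPACE     = /\s+/

# Mimics the shape of `field`: tag space? value space? close_tag.
# Returns the text left unconsumed, or nil if the field does not match.
def rest_after_field(input, require_close:)
  s = StringScanner.new(input)
  return nil unless s.scan(TAG)
  s.scan(SPACE)                      # space?
  return nil unless s.scan(VALUE)    # also eats any newline after the value
  s.scan(SPACE)                      # space?
  closed = s.scan(CLOSE_TAG)
  return nil if require_close && !closed
  s.rest
end

v1 = rest_after_field("<AMOUNT>211.56\n<NAME>ACME\n", require_close: false)
v2 = rest_after_field("<AMOUNT>211.56</AMOUNT>\n<NAME>ACME</NAME>\n",
                      require_close: true)

p v1  # => "<NAME>ACME\n"
p v2  # => "\n<NAME>ACME</NAME>\n"
```

If the real parser behaves the same way, the second `field` in the 2xx input would have to start at a newline, which `tag` cannot match. A trailing `space?` after `close_tag` in `field` might be worth trying, though I have not confirmed that this is the cause.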
Any suggestions? You can clone the repository linked above to get sample OFX response files and a script to test against.
Cheers,