Announce: new Beancount parser in Rust


Simon Guest

Nov 12, 2023, 3:15:13 PM
to Beancount
Ahoy Beancounters!

Of interest really only to developers, I created a new parser for Beancount in Rust, called beancount-parser-lima (because I am aware it is not the first such).

I'm actively working on this; my current focus is adding Python bindings.

So far it parses everything I have tried.  No support for plugins, because, well, it's purely Rust so far!  (Also no support so far for query and custom directives, unsure how important they are.)

Performance and rich error reporting are the two headlines.

Hopefully this may be useful for some.

cheers,
Simon

Martin Blais

Nov 12, 2023, 3:46:50 PM
to bean...@googlegroups.com
Thank you for sharing. Added to contrib doc.



Chary Chary

Nov 15, 2023, 4:54:06 AM
to Beancount
Martin,

is this somehow an alternative to the C++ parser which you are working on in Beancount v3?

Martin Blais

Nov 15, 2023, 6:29:01 AM
to Beancount
The C++ parser in v3 is pretty much done.
This is other people writing a parser in Rust.

It's worth noting that in Beancount the parser alone isn't that useful. That's because Beancount processes its data in two stages: parsing, which produces a data structure that roughly matches the input, and then running over the stream of transactions to fill in missing numbers and run all the plugins. The result of that is the final stream of transactions that you can do queries on. The logic in the second part is where all the complexity lies. I haven't ported that.
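
To make the two stages concrete, here is a toy sketch in Python. It is not Beancount's actual code: the `Posting` record, the `parse_postings` and `fill_missing` names, and the single-currency rule are all assumptions made for illustration. Stage 1 keeps a missing amount as `None`, exactly as written in the input; stage 2 fills it in so the postings balance.

```python
from dataclasses import dataclass, replace
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Posting:  # hypothetical record, not Beancount's own data structure
    account: str
    number: Optional[Decimal]  # None when the amount was left out in the input
    currency: str = "USD"  # assumption: everything is in one currency


def parse_postings(lines):
    """Stage 1 (toy): turn 'Account [number]' lines into records as written."""
    postings = []
    for line in lines:
        parts = line.split()
        number = Decimal(parts[1]) if len(parts) > 1 else None
        postings.append(Posting(parts[0], number))
    return postings


def fill_missing(postings):
    """Stage 2 (toy): give the one posting without an amount the balancing value."""
    missing = [p for p in postings if p.number is None]
    if len(missing) > 1:
        raise ValueError("at most one posting may omit its amount")
    total = sum((p.number for p in postings if p.number is not None), Decimal(0))
    return [replace(p, number=-total) if p.number is None else p for p in postings]


parsed = parse_postings(["Expenses:Food 12.50", "Assets:Cash"])
filled = fill_missing(parsed)
```

The real second stage also has to deal with multiple currencies, costs, prices and plugins, which is where the complexity mentioned above comes from.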




Chary Chary

Nov 15, 2023, 4:28:34 PM
to Beancount
Martin,

thanks for the clarification

Simon Guest

Nov 15, 2023, 5:17:18 PM
to Beancount
beancount-parser-lima may do a bit more than the existing core parser in Beancount proper.  It processes all the included files and returns a date-ordered list of all the directives, with all pragmas either processed or returned as a single Options structure.

It doesn't do the filling in of missing numbers, because that seems tricky indeed (and an application level thing not a parser level thing).  Nor plugins.

I'm currently making good progress with Python bindings, including some performance optimisations to reduce the number of allocations of Python objects.
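
The include processing and date ordering mentioned above can be pictured roughly as follows. This is only a sketch in Python with made-up in-memory files; the `FILES` table and the `collect` function are hypothetical and say nothing about the actual beancount-parser-lima API or its tie-breaking rules for directives on the same date.

```python
from datetime import date

# Hypothetical in-memory "files". Each entry is either ("include", filename)
# or (date, text), where the text stands in for a parsed directive.
FILES = {
    "main.beancount": [
        (date(2023, 2, 1), "open Assets:Cash"),
        ("include", "food.beancount"),
        (date(2023, 1, 15), "open Expenses:Food"),
    ],
    "food.beancount": [
        (date(2023, 2, 1), "txn lunch"),
    ],
}


def collect(name, seen=None):
    """Gather directives from a file and everything it includes, depth first."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against include cycles
        return []
    seen.add(name)
    out = []
    for head, body in FILES[name]:
        if head == "include":
            out.extend(collect(body, seen))
        else:
            out.append((head, body))
    return out


# sorted() is stable, so directives sharing a date keep their source order.
directives = sorted(collect("main.beancount"), key=lambda d: d[0])
```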

Martin Blais

Nov 18, 2023, 1:16:02 PM
to bean...@googlegroups.com
Have you guys seen this suite of tests?

I'd probably want to integrate any parser that supports a suite like this, and contemplate a Rust version of Beancount.
There are two crucial things at the bottom of it all:
- a good parser
- the ability to call to & from Python without any copies
Anything else is a piece of cake.



Simon Guest

Nov 18, 2023, 2:53:00 PM
to bean...@googlegroups.com
I had seen that, and it looks interesting!

How could we free it from being C++ only? If such a test suite could be language independent, that would really open things up for testing of other language implementations.

The only thought I had so far would be for a parser adapter layer to output, say, JSON, which could be used for comparing with expected output for such a suite.
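
A rough sketch of what such an adapter could look like, in Python for brevity. The `Balance` record and the `to_canonical_json` function are hypothetical, not part of any existing parser. The idea is that each implementation dumps its directives to JSON in a stable form, and the suite compares that with a stored expectation.

```python
import datetime
import json
from dataclasses import asdict, dataclass
from decimal import Decimal


@dataclass
class Balance:  # hypothetical stand-in for one parsed directive
    date: datetime.date
    account: str
    number: Decimal
    currency: str


def to_canonical_json(directive):
    """Render a directive as JSON with a stable key order."""
    payload = {"type": type(directive).__name__, **asdict(directive)}
    # JSON has no date or decimal type, so both are written as strings.
    return json.dumps(payload, default=str, sort_keys=True, indent=2)


# Expected output as it might be stored alongside a test case.
expected = json.loads(
    '{"type": "Balance", "date": "2023-11-12", "account": "Assets:Cash",'
    ' "number": "100.00", "currency": "NZD"}'
)
actual = json.loads(
    to_canonical_json(
        Balance(datetime.date(2023, 11, 12), "Assets:Cash", Decimal("100.00"), "NZD")
    )
)
matches = actual == expected
```

Writing numbers as strings keeps the exact decimal text, at the price of the expected files having to agree on that convention.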

cheers,
Simon

Martin Blais

Nov 18, 2023, 3:02:22 PM
to bean...@googlegroups.com
On Sat, Nov 18, 2023 at 2:53 PM Simon Guest <s...@cantab.net> wrote:
> I had seen that, and it looks interesting!
>
> How could we free it from being C++ only? If such a test suite could be language independent, that would really open things up for testing of other language implementations.

Well it's based on the original Python one:

If you have Python bindings, and rendering would be sufficient to get this right, you could use the Python bindings to run the tests.
I think this parser_test.py could be converted to assert the data structure as the C++ one does.
The elegant thing about the C++ one is that since it emits protobufs, using ascii protobuf syntax naturally provides a way to describe the data structure.
Do the Rust structs have an elegant ser/de to/from a text syntax?


> The only thought I had so far would be for a parser adapter layer to output as, say JSON, which could be used for comparing with expected output for such a suite.

Urg... JSON. 
I know JSON's popular, but it would make a rather crappy syntax for the complexity of this schema.
(I know that's really widespread in the OSS community, and that's a real shame. JSON sucks with its three types.
Protos are much nicer and better defined. See the C++ test strings...)

I think the right thing to do would be to make the Rust parser produce protos and then you can create/assert equality against them in any language (e.g. Python of the Rust produced data). Protos are nice in that way, they're language independent.

But then we'd be back to square one: how do you pass protos back and forth between Rust and Python without making copies?
And besides, I'm pretty sure you'll prefer the Rust data structures over protos.
So maybe implement an elegant ser/de to some ascii format to/from Rust data structures, as you would with protos.
Then create a binding from Python to accept and parse a string that creates the corresponding Rust data structures.
Then implement a comparator function.
That's all a bunch of work -- none of it is rocket science, but you're really just redoing protos at that point.
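
For a feel of the moving parts (text format, parser for that format, comparator), here is a deliberately tiny sketch in Python. The `Price` record, the `field: value` line format, and the `to_text`, `from_text` and `diff` functions are all invented for illustration; a real schema with nested postings and metadata would need considerably more than this.

```python
from dataclasses import dataclass, fields
from decimal import Decimal


@dataclass
class Price:  # hypothetical directive; the field set is illustrative only
    date: str
    currency: str
    number: Decimal


def to_text(p):
    """Serialize as one 'field: value' line per field."""
    return "\n".join(f"{f.name}: {getattr(p, f.name)}" for f in fields(p))


def from_text(text):
    """Parse the format written by to_text back into a Price."""
    raw = dict(line.split(": ", 1) for line in text.strip().splitlines())
    return Price(raw["date"], raw["currency"], Decimal(raw["number"]))


def diff(expected, actual):
    """Comparator: names of the fields whose values differ."""
    return [
        f.name
        for f in fields(expected)
        if getattr(expected, f.name) != getattr(actual, f.name)
    ]


# A test case written in the text format, and a value as a parser might produce it.
expected = from_text("""
date: 2023-11-18
currency: USD
number: 1.08
""")
actual = Price("2023-11-18", "USD", Decimal("1.080"))
```

Here `diff(expected, actual)` is empty because the two decimals compare equal by value even though their text differs; whether that is the right notion of equality is one of the decisions a real comparator would have to make.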



Simon Guest

Nov 18, 2023, 3:37:22 PM
to bean...@googlegroups.com
Interesting ideas here, Martin, thanks!

I'm really not attached to JSON, so don't worry on that account.

I won't have any time to work on my parser for the next couple of weeks, but will consider these things when I start again after that.

cheers,
Simon
