Programmatically Rewriting Ledgers

Aaron Lindsay

Nov 30, 2020, 5:11:58 PM

to Beancount

I've been working on automating some accounting tasks lately and am curious for feedback from others on my approach, or ideas for doing it better in the future.

I really dislike manually organizing my ledger file. I do still want to look over incoming transactions, but ideally would like everything else to be automated for me - I'd like transactions to be uniformly formatted and ordered. `bean-format` can take care of most of the former, but I've been using `bean-query example.beancount print` (wrapped in a shell script to preserve the options and remove trailing spaces) to do the ordering for me. This isn't ideal since it's actually modifying the ledger in the process. It seems like there are more possibilities for programmatically managing your .beancount file, too: I can imagine keeping prices separate from transactions, maybe automatically moving transactions for certain accounts to their own file, etc.

Is there a better way to programmatically rewrite ledgers by hooking into pieces of the beancount internals today? If not, will v3 have any impact on this? I'm looking at the "Intermediate Parsed Data vs. Final List of Directives" section of the v3 document, but am not sure I grok the beancount internals enough to understand the implications there, if any.

-Aaron

Martin Blais

Nov 30, 2020, 11:26:13 PM

to Beancount

You cannot reinsert the output of "bean-query print" back into the ledger; it's just not designed to work that way. There's the input syntax; Beancount parses that and then makes changes to the transactions, some of which depend on the state of the inventories just before them (chronologically, e.g., matching cost specs to available lots). In fact, if anything, in v3 the introduction of currency trading accounts and transfer accounts (for postings from a single transaction at different dates) will make that distinction even more pronounced, as a single transaction will often result in multiple transactions realized in the output stream. It may look like they're the same because the data types coming out of the parser are also reused for the final realization (they're close enough), but they're different. One is the parser's straightforward translation of what it read into a data structure (its AST, if you will): incomplete, partially filled directives that are yet to be booked, interpolated, and processed. The other is the final result after all the work is done, which can be summed up to compute inventory states. With "bean-query print" you're producing the final result, not a 1:1 translation of the input.
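To make the distinction concrete, here is a toy sketch (not Beancount's booking code) of one thing that happens between the parser and the final stream: a posting whose amount was left out in the input gets interpolated. The `Posting` tuple and `interpolate()` function are hypothetical and handle only a single currency, with no costs, prices, or tolerances.

```python
from decimal import Decimal
from typing import List, NamedTuple, Optional


class Posting(NamedTuple):
    """Hypothetical, simplified posting: number is None if it was omitted in the input."""
    account: str
    number: Optional[Decimal]
    currency: str


def interpolate(postings: List[Posting]) -> List[Posting]:
    """Fill in at most one missing amount so a single-currency transaction sums to zero."""
    missing = [i for i, p in enumerate(postings) if p.number is None]
    if len(missing) > 1:
        raise ValueError("cannot interpolate more than one missing amount")
    if not missing:
        return list(postings)
    total = sum((p.number for p in postings if p.number is not None), Decimal(0))
    filled = list(postings)
    i = missing[0]
    filled[i] = postings[i]._replace(number=-total)
    return filled


# Parser-level view: the last amount was left out in the input text.
parsed = [
    Posting("Assets:US:Blah", Decimal("-50"), "USD"),
    Posting("Expenses:Bank-Fees", Decimal("10"), "USD"),
    Posting("Assets:VN:Foo", None, "USD"),
]
booked = interpolate(parsed)
print(booked[-1])  # Posting(account='Assets:VN:Foo', number=Decimal('40'), currency='USD')
```

The booked posting now carries an explicit `40`; printing it back into the file would replace the amount the author deliberately left blank, which is one reason the output of "bean-query print" is not a faithful copy of the input.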

What you may be able to do instead is manipulate the input text. But in v3 I'll provide both beginning and end lines as part of all directive data, so that should allow you to break apart your file cleanly, using the same parser used for processing. Also, I could easily provide some library function to accept a file and produce a mapping of

(parser-directive, text)

whereby the parser directive is the unprocessed, straightforward translation of your input text into the intermediate data structure, which you could more easily inspect and reason about (keeping in mind that some of the fields may be unset, e.g., if you leave out a number to be interpolated) in order to decide where to reinsert the corresponding text in your output file. Please file a ticket if this would be useful. I'm in the process of rewriting the parser in C++ (almost done, actually).
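Until something like that exists, a rough text-only approximation of the `(parser-directive, text)` idea is possible. The sketch below is not Beancount's parser: `split_blocks()` and `sort_by_date()` are hypothetical helpers that assume every top-level entry starts at column 0 and that its postings and metadata are indented, which is how Beancount files are normally written. It records the first and last line of each block and reorders dated blocks without ever re-printing them, so the original text survives untouched.

```python
import re
from typing import List, Tuple

DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s")

# (first line, last line, text); line numbers are 1-based and inclusive.
Block = Tuple[int, int, str]


def split_blocks(source: str) -> List[Block]:
    """Split ledger text into top-level blocks: a column-0 line plus its indented lines."""
    blocks: List[Block] = []
    current: List[str] = []
    start = end = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines only separate blocks
        if not line[0].isspace():
            # A non-blank line at column 0 (date, option, comment, ...) starts a new block.
            if current:
                blocks.append((start, end, "\n".join(current)))
            current, start, end = [line], lineno, lineno
        elif current:
            current.append(line)  # indented continuation: posting, metadata, inline comment
            end = lineno
    if current:
        blocks.append((start, end, "\n".join(current)))
    return blocks


def sort_by_date(source: str) -> str:
    """Undated blocks (options, comments, ...) first, then dated blocks in stable date order."""
    blocks = split_blocks(source)
    undated = [b for b in blocks if not DATE_RE.match(b[2])]
    dated = sorted((b for b in blocks if DATE_RE.match(b[2])),
                   key=lambda b: DATE_RE.match(b[2]).group(1))
    return "\n\n".join(text for _, _, text in undated + dated) + "\n"


ledger = """option "title" "Example"

2020-11-02 * "Grocer"
  Expenses:Food   12.00 USD
  Assets:Cash

2020-11-01 open Assets:Cash
"""
print(sort_by_date(ledger))
```

Python's sort is stable, so same-day entries keep their original relative order, which matters for things like `open` directives and balance assertions. Column-0 comments become undated blocks and get hoisted to the top, which is exactly the comment-association problem.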

One question that will remain is what to do with comments immediately preceding and/or following a transaction. Those are often associated with the transaction and slicing and dicing files to put them back together should probably preserve the comments like that. I'd have to inject comments into the grammar in order to do that (that may not be trivial).

Justus Pendleton

Dec 1, 2020, 8:47:12 AM

to Beancount

On Tuesday, December 1, 2020 at 11:26:13 AM UTC+7 bl...@furius.ca wrote:

One question that will remain is what to do with comments immediately preceding and/or following a transaction. Those are often associated with the transaction and slicing and dicing files to put them back together should probably preserve the comments like that. I'd have to inject comments into the grammar in order to do that (that may not be trivial).

I've made two or three half-hearted attempts at programmatically reformatting ledgers and this is definitely one of the biggest sticking points. If comments were an actual part of the transaction -- the way docstrings are on a Python method, for instance -- it would be great.

But there are some other complications:

- Markers for code folding and just in general how to handle "sections". Do you want all of your "#europe-holiday-2019" tagged transactions grouped together automatically? Do you want that behaviour for *all* tags? (Probably not!)

- How to handle commented-out things that should stay close to the non-commented things they are related to. For instance, I have a custom fava option commented out, but after reformatting it should still be adjacent to all the other fava options!

- Handling de facto multi-line comments (again, on things that aren't transactions; consider a plugin configuration with multiple lines of embedded JSON that has been commented out)

- Handling include files

- Comments on things that aren't transactions -- prices, commodities

- Inline comments on individual legs of a transaction. For instance I have one transaction that looks like:

2017-08-31 * "Transfer to Vietnam"
  Assets:US:Blah       -50 USD
  Expenses:Bank-Fees    10 USD ; charged by correspondent bank on the wire transfer
  Assets:VN:Foo         40 USD

- Handling all of the non-transaction things that you probably (maybe?) don't want mixed in among transactions. But where *do* you want them? I keep all of mine in separate files: prices in one file, balances in another, etc.
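As one illustration of the "where *do* you want them?" question, here is a minimal sketch that routes entries into per-kind files based on the directive keyword after the date. It assumes entries are separated by blank lines, and the file names in `ROUTES` are hypothetical; a real setup would also need `include` lines in the main file pointing at the generated files, and it does nothing about the comment problems listed above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# "YYYY-MM-DD keyword ..." at the start of an entry; the keyword may be a flag like "*".
HEAD_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s+(\S+)")

# Hypothetical routing table: directive keyword -> output file name.
ROUTES = {
    "price": "prices.beancount",
    "balance": "balances.beancount",
    "open": "accounts.beancount",
    "close": "accounts.beancount",
    "commodity": "commodities.beancount",
}
DEFAULT = "transactions.beancount"  # "*", "!", "txn" and every other dated directive
UNDATED = "main.beancount"          # options, plugins, includes, column-0 comments


def route_blocks(source: str) -> Dict[str, str]:
    """Split blank-line-separated entries and group them into per-kind output files."""
    out: Dict[str, List[str]] = defaultdict(list)
    for block in re.split(r"\n\s*\n", source.strip()):
        m = HEAD_RE.match(block)
        name = ROUTES.get(m.group(1), DEFAULT) if m else UNDATED
        out[name].append(block)
    return {name: "\n\n".join(blocks) + "\n" for name, blocks in out.items()}


ledger = """option "operating_currency" "USD"

2020-11-01 open Assets:Cash

2020-11-02 price HOOL 520.00 USD

2020-11-02 * "Grocer"
  Expenses:Food   12.00 USD
  Assets:Cash
"""
files = route_blocks(ledger)
for name in sorted(files):
    print(name)
```

Note that a comment written directly above a transaction, with no blank line in between, makes the whole block start with `;`, so it is treated as undated and lands in `main.beancount`: one more face of the comment problem.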

In the end I felt like there were so many edge cases and I struggled to see much real benefit outside of my own OCD.

Stefano Zacchiroli

Dec 1, 2020, 8:58:43 AM

to bean...@googlegroups.com

On Mon, Nov 30, 2020 at 02:11:58PM -0800, Aaron Lindsay wrote:
> Is there a better way to programmatically rewrite ledgers by hooking into
> pieces of the beancount internals today? If not, will v3 have any impact on
> this?

As a side comment on this topic, I'm more and more convinced that the
ability to programmatically edit the syntax of textual ledgers is
(should be) a key feature for plain text accounting. And that such a
feature would naturally complement the great support that Beancount
already has for plugins (which modify the semantics, rather than the
syntax, of textual ledgers).

The reason is that, thanks to version control, when doing plain text
accounting with a computer we routinely alter the *historical* records,
e.g., by improving how we book past transactions, fixing mistakes,
etc. --- which is something both great and novel in comparison to
traditional accounting practices.

Now the problem lies in programmatically editing *concrete* syntax,
which is a complicated problem in general, due to the amount of
information that parsers tend to throw away when converting to ASTs.
So, Martin, everything the new Beancount parsers manage to keep
(locations, comments, etc.) would be definitely welcome in this respect.

Another ingredient that would help is a very-opinionated,
fully-automated formatter for Beancount syntax, similar to what Black[1]
is to Python. With something like that, a hypothetical "sed" equivalent
for Beancount syntax could worry less about getting details such as
spacing, indentation, etc. right --- it would just have to pipe
its output to bean-format(-ng) and be done with it. (But of course this
is assuming that nothing is lost in the concrete syntax -> AST
translation, and most notably comments.)
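The per-line core of such an opinionated formatter might look like the sketch below. It is not bean-format's algorithm: `format_posting()` is a hypothetical helper that only understands simple `Account number CURRENCY [; comment]` postings, right-aligns the number at a fixed column, and leaves every other line (costs, prices, metadata, flags) untouched. The property worth copying from Black is idempotence: formatting already-formatted output changes nothing.

```python
import re

# Hypothetical pattern for a simple posting: indent, account, optional "number CURRENCY",
# optional inline "; comment". Anything fancier (costs, prices, flags) will not match.
POSTING_RE = re.compile(
    r"^\s+(?P<account>[A-Z][\w:-]*)"
    r"(?:\s+(?P<number>-?[\d,]*\.?\d+)\s+(?P<currency>[A-Z][A-Z0-9._'-]*))?"
    r"\s*(?P<comment>;.*)?$"
)


def format_posting(line: str, width: int = 50) -> str:
    """Re-emit a simple posting with a 2-space indent and the number ending at column `width`."""
    m = POSTING_RE.match(line)
    if not m:
        return line  # leave anything we don't understand exactly as it was
    out = "  " + m.group("account")
    if m.group("number"):
        number = m.group("number")
        out += " " * max(2, width - len(out) - len(number)) + number + " " + m.group("currency")
    if m.group("comment"):
        out += " " + m.group("comment")
    return out


print(format_posting("    Assets:US:Blah -50 USD"))
print(format_posting("  Expenses:Bank-Fees 10 USD ; charged by correspondent bank"))
```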

Cheers

[1]: https://github.com/psf/black
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

Martin Blais

Dec 1, 2020, 9:27:44 AM

to Beancount

On Tue, Dec 1, 2020 at 8:58 AM Stefano Zacchiroli <za...@upsilon.cc> wrote:

Another ingredient that would help is a very-opinionated,
fully-automated formatter for Beancount syntax, similar to what Black[1]
is to Python. With something like that, a hypothetical "sed" equivalent
for Beancount syntax could worry less about getting details such as
spacing, indentation, etc. right --- it would just have to pipe
its output to bean-format(-ng) and be done with it. (But of course this
is assuming that nothing is lost in the concrete syntax -> AST
translation, and most notably comments.)

Yes... bean-format was intended to be a bit like that.

(I went 70% of the way, then dropped the ball on some corner cases.)

It needs a bit more work, but it could be made much better, even without custom parsing (just processing the text in bean-format).

Maybe in v3 I'll expose an interface to parse and produce the intermediate transaction, and implement a proper printer for it that can reverse the process. That would help with that.

In v2 the printer can grok *some* intermediate transactions and all finalized ones. It's designed to print finalized transactions.

99% of the time we print only finalized ones, but in the past I have printed intermediate transactions for debugging and for writing tests.

I think in v3 I should make a distinct printer (and code path) that can fully reverse the intermediate partially filled transaction.

I think that's doable.

That, in itself, would naturally be the best formatter.
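A toy version of that idea: a parser that keeps the intermediate, partially filled transaction (a missing amount stays `None` instead of being interpolated) and a printer that reverses it exactly for canonical input. `parse_txn()` and `print_txn()` are hypothetical and handle only a header line and plain postings; escaped quotes in the narration, costs, prices, metadata, and comments are all out of scope.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Posting:
    account: str
    number: Optional[Decimal] = None  # None when left out for interpolation
    currency: Optional[str] = None


@dataclass
class Transaction:
    date: str
    flag: str
    narration: str
    postings: List[Posting] = field(default_factory=list)


HEADER_RE = re.compile(r'^(\d{4}-\d{2}-\d{2}) ([*!]) "(.*)"$')


def parse_txn(text: str) -> Transaction:
    """Toy parser: header line plus 'Account [number CURRENCY]' postings, nothing else."""
    header, *lines = text.strip("\n").splitlines()
    m = HEADER_RE.match(header)
    if not m:
        raise ValueError(f"unsupported header: {header!r}")
    txn = Transaction(m.group(1), m.group(2), m.group(3))
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if len(parts) == 3:
            txn.postings.append(Posting(parts[0], Decimal(parts[1]), parts[2]))
        elif len(parts) == 1:
            txn.postings.append(Posting(parts[0]))  # amount deliberately left out
        else:
            raise ValueError(f"unsupported posting: {line!r}")
    return txn


def print_txn(txn: Transaction) -> str:
    """Inverse of parse_txn for canonical input: missing amounts stay missing."""
    out = [f'{txn.date} {txn.flag} "{txn.narration}"']
    for p in txn.postings:
        if p.number is None:
            out.append(f"  {p.account}")
        else:
            out.append(f"  {p.account}  {p.number} {p.currency}")
    return "\n".join(out) + "\n"


src = '2020-11-02 * "Grocer"\n  Expenses:Food  12.00 USD\n  Assets:Cash\n'
messy = '2020-11-02 * "Grocer"\n    Expenses:Food    12.00   USD\n  Assets:Cash'
print(print_txn(parse_txn(messy)), end="")
```

Because `Decimal` keeps the written precision (`12.00` stays `12.00`), canonical text survives the round trip unchanged, while messy input comes out canonical, so the same round trip doubles as a formatter.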

Aaron Lindsay

Dec 1, 2020, 9:36:42 AM

to Beancount

On Monday, November 30, 2020 at 11:26:13 PM UTC-5 bl...@furius.ca wrote:

You cannot reinsert the output of "bean-query print" back into the ledger; it's just not designed to work that way. There's the input syntax; Beancount parses that and then makes changes to the transactions, some of which depend on the state of the inventories just before them (chronologically, e.g., matching cost specs to available lots). In fact, if anything, in v3 the introduction of currency trading accounts and transfer accounts (for postings from a single transaction at different dates) will make that distinction even more pronounced, as a single transaction will often result in multiple transactions realized in the output stream. It may look like they're the same because the data types coming out of the parser are also reused for the final realization (they're close enough), but they're different. One is the parser's straightforward translation of what it read into a data structure (its AST, if you will): incomplete, partially filled directives that are yet to be booked, interpolated, and processed. The other is the final result after all the work is done, which can be summed up to compute inventory states. With "bean-query print" you're producing the final result, not a 1:1 translation of the input.

Well, it does sort of work - I've been doing it for a few weeks now, for better or for worse. But as you mention, it *does* make things more explicit than they were in the input text. I got started down this path because I didn't have cost basis from a GnuCash import and thought running it through `bean-query print` was the most expedient way to convert those accounts from FIFO to STRICT. Maybe I should've just stuck with FIFO, I don't know... In either case, I'm definitely not arguing that `bean-query print` is ideal. I'd love a better solution.
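For context on why the printed output made lots explicit: under FIFO booking, a reduction is matched against the oldest lots first, and under STRICT booking an ambiguous reduction has to identify its lots itself. The sketch below is a simplified model of FIFO matching, not Beancount's booking implementation; `reduce_fifo()`, the `Lot` tuple shape, and the `Assets:Brokerage`/`HOOL` names are hypothetical. Each matched piece corresponds to one explicit posting whose cost spec is meant to match exactly one lot.

```python
from decimal import Decimal
from typing import List, Tuple

# Hypothetical lot shape: (units, per-unit cost, acquisition date as "YYYY-MM-DD").
Lot = Tuple[Decimal, Decimal, str]


def reduce_fifo(lots: List[Lot], units: Decimal) -> Tuple[List[Lot], List[Lot]]:
    """Take `units` from the oldest lots first; return (matched pieces, remaining lots)."""
    remaining = sorted(lots, key=lambda lot: lot[2])  # oldest acquisition date first
    matched: List[Lot] = []
    while units > 0:
        if not remaining:
            raise ValueError("not enough units in inventory")
        lot_units, cost, lot_date = remaining[0]
        take = min(units, lot_units)
        matched.append((take, cost, lot_date))
        units -= take
        if take == lot_units:
            remaining.pop(0)  # lot fully consumed
        else:
            remaining[0] = (lot_units - take, cost, lot_date)  # lot partially consumed
    return matched, remaining


lots = [
    (Decimal("10"), Decimal("100"), "2019-01-05"),
    (Decimal("5"), Decimal("120"), "2019-06-01"),
]
matched, remaining = reduce_fifo(lots, Decimal("12"))
for units, cost, lot_date in matched:
    # One explicit reduction per matched lot, with cost and date spelled out.
    print(f"  Assets:Brokerage  -{units} HOOL {{{cost} USD, {lot_date}}}")
# prints:
#   Assets:Brokerage  -10 HOOL {100 USD, 2019-01-05}
#   Assets:Brokerage  -2 HOOL {120 USD, 2019-06-01}
```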

What you may be able to do instead is manipulate the input text. But in v3 I'll provide both beginning and end lines as part of all directive data, so that should allow you to break apart your file cleanly, using the same parser used for processing. Also, I could easily provide some library function to accept a file and produce a mapping of

(parser-directive, text)

whereby the parser directive is the unprocessed, straightforward translation of your input text into the intermediate data structure, which you could more easily inspect and reason about (keeping in mind that some of the fields may be unset, e.g., if you leave out a number to be interpolated) in order to decide where to reinsert the corresponding text in your output file. Please file a ticket if this would be useful. I'm in the process of rewriting the parser in C++ (almost done, actually).

I do think this would be useful - I'll try to write it up. How would user code hook into Beancount to get this listing? Would there just be a parser function that could be imported and called from arbitrary Python code and would return a list of these tuples?

One question that will remain is what to do with comments immediately preceding and/or following a transaction. Those are often associated with the transaction and slicing and dicing files to put them back together should probably preserve the comments like that. I'd have to inject comments into the grammar in order to do that (that may not be trivial).

I can see an argument for pushing the burden of dealing with comments back onto whoever uses such a feature. I think that would mean simply providing them in the same format as non-comments, perhaps keeping everything in the order it appears in the input. User code can decide how to associate them (maybe always to the first subsequent non-comment, maybe fancier heuristics like the 'closest' non-comment when taking newlines into account, etc.). To Justus' point, though, I don't know how that would work for comments which are part of other transactions. Or maybe you're proposing that such comments would be preserved in the `text` portion of `(parser-directive, text)` for that transaction?
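The "first subsequent non-comment" heuristic is simple to prototype at the text level. `attach_comments()` below is a hypothetical helper: it pairs each run of column-0 `;` comment lines with the next top-level line, optionally treating a blank line as a break that leaves the comments free-standing (a crude version of the "take newlines into account" variant). Indented comments inside a transaction are not touched here; they stay with that transaction's text.

```python
from typing import List, Optional, Tuple

Pair = Tuple[List[str], Optional[str]]  # (leading comment lines, top-level line or None)


def attach_comments(source: str, stop_at_blank: bool = False) -> List[Pair]:
    """Pair every top-level line with the column-0 ';' comments that precede it.

    With stop_at_blank=True, a blank line after a comment run detaches those
    comments; they are paired with None, i.e. treated as free-standing.
    """
    pairs: List[Pair] = []
    pending: List[str] = []
    for line in source.splitlines():
        if line.startswith(";"):
            pending.append(line)
        elif not line.strip():
            if stop_at_blank and pending:
                pairs.append((pending, None))
                pending = []
        elif not line[0].isspace():
            pairs.append((pending, line))  # first subsequent non-comment, top-level line
            pending = []
        # indented lines (postings, metadata, inline comments) belong to their entry
    if pending:
        pairs.append((pending, None))  # comments at the end of the file
    return pairs


src = """; Fava options
2020-01-01 custom "fava-option" "language" "en"

; Rent for November

2020-11-01 * "Landlord"
  Expenses:Rent  1000 USD
  Assets:Checking
"""
for comments, entry in attach_comments(src, stop_at_blank=True):
    print(comments, "->", entry)
```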

Thanks!

-Aaron

Martin Blais

Dec 1, 2020, 9:56:25 AM

to Beancount

On Tue, Dec 1, 2020 at 8:47 AM Justus Pendleton <just...@gmail.com> wrote:

On Tuesday, December 1, 2020 at 11:26:13 AM UTC+7 bl...@furius.ca wrote:
One question that will remain is what to do with comments immediately preceding and/or following a transaction. Those are often associated with the transaction and slicing and dicing files to put them back together should probably preserve the comments like that. I'd have to inject comments into the grammar in order to do that (that may not be trivial).

I've made two or three half-hearted attempts at programmatically reformatting ledgers and this is definitely one of the biggest sticking points. If comments were an actual part of the transaction -- the way docstrings are on a Python method, for instance -- it would be great.

But there are some other complications:

- Markers for code folding and just in general how to handle "sections". Do you want all of your "#europe-holiday-2019" tagged transactions grouped together automatically? Do you want that behaviour for *all* tags? (Probably not!)
- How to handle commented-out things that should stay close to the non-commented things they are related to. For instance, I have a custom fava option commented out, but after reformatting it should still be adjacent to all the other fava options!
- Handling de facto multi-line comments (again, on things that aren't transactions; consider a plugin configuration with multiple lines of embedded JSON that has been commented out)
- Handling include files

All great points, Justus; it seems to me that with some conventions specific to your own file, you can probably get away with it.

- Comments on things that aren't transactions -- prices, commodities
- Inline comments on individual legs of a transaction. For instance I have one transaction that looks like:

2017-08-31 * "Transfer to Vietnam"
  Assets:US:Blah       -50 USD
  Expenses:Bank-Fees    10 USD ; charged by correspondent bank on the wire transfer
  Assets:VN:Foo         40 USD

Okay, so let's talk about comments as docstrings and full round-trip (with comments) capability.

Let's play with some ideas in this doc (click the button on the top right to edit; please just add, don't delete):

http://furius.ca/beancount/doc/parsed-comments

https://docs.google.com/document/d/1yestw21g4AEMNrIUsBuOaxucfz3_7eMAR6NYnVnTzV0/

In particular, if we make a big schema change like that, I think the transaction's narration could be merged with the comment.

Basically each transaction and posting would have a single comment field.
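Purely to visualize that proposal, here is one hypothetical shape for such a schema, with a single `comment` field on both the transaction (standing in for the narration) and each posting. None of these class or field names come from Beancount; `render()` just shows that an inline posting comment such as the one in Justus's example survives a round trip.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Posting:
    account: str
    units: Optional[str] = None    # e.g. "10 USD"; None if left out for interpolation
    comment: Optional[str] = None  # inline "; ..." text attached to this posting


@dataclass
class Transaction:
    date: str
    flag: str
    comment: Optional[str] = None  # stands in for the narration in this hypothetical schema
    postings: List[Posting] = field(default_factory=list)


def render(txn: Transaction) -> str:
    """Print the transaction, writing each posting comment back as an inline '; ...' comment."""
    lines = [f'{txn.date} {txn.flag} "{txn.comment or ""}"']
    for p in txn.postings:
        line = f"  {p.account}"
        if p.units:
            line += f"  {p.units}"
        if p.comment:
            line += f" ; {p.comment}"
        lines.append(line)
    return "\n".join(lines) + "\n"


txn = Transaction("2017-08-31", "*", "Transfer to Vietnam", [
    Posting("Assets:US:Blah", "-50 USD"),
    Posting("Expenses:Bank-Fees", "10 USD", "charged by correspondent bank on the wire transfer"),
    Posting("Assets:VN:Foo", "40 USD"),
])
print(render(txn), end="")
```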

Have to run, back after work...


Aaron Lindsay

Dec 1, 2020, 10:33:27 AM

to Beancount

On Tuesday, December 1, 2020 at 9:36:42 AM UTC-5 Aaron Lindsay wrote:

On Monday, November 30, 2020 at 11:26:13 PM UTC-5 bl...@furius.ca wrote:
What you may be able to do instead is manipulate the input text. But in v3 I'll provide both beginning and end lines as part of all directive data, so that should allow you to break apart your file cleanly, using the same parser used for processing. Also, I could easily provide some library function to accept a file and produce a mapping of

(parser-directive, text)

whereby the parser directive is the unprocessed, straightforward translation of your input text into the intermediate data structure, which you could more easily inspect and reason about (keeping in mind that some of the fields may be unset, e.g., if you leave out a number to be interpolated) in order to decide where to reinsert the corresponding text in your output file. Please file a ticket if this would be useful. I'm in the process of rewriting the parser in C++ (almost done, actually).

I do think this would be useful - I'll try to write it up. How would user code hook into Beancount to get this listing? Would there just be a parser function that could be imported and called from arbitrary Python code and would return a list of these tuples?

I filed https://github.com/beancount/beancount/issues/586. One thing I realized when writing it up is that many of the more sophisticated things I can envision wanting to do with a programmatic ability to re-write a ledger involve adding or changing the contents of directives in addition to merely moving them around. Obviously this adds complexity, but it seems like maybe the main pieces are already being discussed (parsing comments inside transactions, and being able to print the intermediate representation)? Feel free to tell me I'm getting a little too crazy, though!

-Aaron

Manuel Amador (Rudd-O)

Dec 1, 2020, 12:26:33 PM

to bean...@googlegroups.com

On 01/12/2020 14.58, Stefano Zacchiroli wrote:

Now the problem lies in programmatically editing *concrete* syntax,
which is a complicated problem in general, due to the amount of
information that parsers tend to throw away when converting to ASTs.
So, Martin, everything the new Beancount parsers manage to keep
(locations, comments, etc.) would be definitely welcome in this respect.

+1


Another ingredient that would help is a very-opinionated,
fully-automated formatter for Beancount syntax, similar to what Black[1]
is to Python. With something like that, a hypothetical "sed" equivalent
for Beancount syntax could worry less about getting details such as
spacing, indentation, etc. right --- it would just have to pipe
its output to bean-format(-ng) and be done with it. (But of course this
is assuming that nothing is lost in the concrete syntax -> AST
translation, and most notably comments.)

+1

-- 
Rudd-O
    http://rudd-o.com/

Martin Blais

Dec 2, 2020, 7:45:35 AM

to Beancount

No, it's great; your ticket description is spot on. Thanks for filing it.

I'll try to prototype comment parsing plus a printer for the intermediate AST in v3 and let you know. The new parser is working at this point (it generates exactly what the Python one did, but as C++ protos); I just have to figure out the Python bindings for the protos.

James Cook

Dec 2, 2020, 4:52:53 PM

to bean...@googlegroups.com

Another example where programmatically modifying could be useful:

Recently I wanted to add a stable unique ID to every posting to a certain account (as a new metadata field). In my case I was able to hack something together through plain text editing, but maybe if programmatic re-writing were easy to do I could have saved some time.

(The hash-based IDs available in bean-query won't work because I don't want the ID to change if I edit the transaction. My goal is to eventually export these to a different beancount ledger I have (joint vs. personal accounting) using beancount-import.)
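A text-level sketch of that kind of rewrite, under some assumptions: postings are indented, the target account is the first token on the posting line (no flag), and metadata goes on the following line with deeper indentation. `add_posting_ids()` and the `Assets:Joint` account are hypothetical; IDs come from `uuid.uuid4()` by default, and a posting whose next line already starts with `id:` is left alone so the script can be re-run safely.

```python
import re
import uuid
from typing import Callable, List


def add_posting_ids(source: str, account: str,
                    new_id: Callable[[], str] = lambda: uuid.uuid4().hex) -> str:
    """Insert an 'id:' metadata line under every posting to `account` that lacks one."""
    lines = source.splitlines()
    out: List[str] = []
    for i, line in enumerate(lines):
        out.append(line)
        is_posting = line[:1].isspace() and line.strip().split(maxsplit=1)[:1] == [account]
        if not is_posting:
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if re.match(r"^\s+id:", nxt):
            continue  # already tagged on a previous run: keep the existing ID
        indent = line[: len(line) - len(line.lstrip())]
        out.append(f'{indent}  id: "{new_id()}"')  # metadata indented below the posting
    return "\n".join(out) + "\n"


src = '2020-11-02 * "Grocer"\n  Expenses:Food  12.00 USD\n  Assets:Joint  -12.00 USD\n'
ids = iter(["a1", "b2"])
tagged = add_posting_ids(src, "Assets:Joint", new_id=lambda: next(ids))
print(tagged, end="")
```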

James

Martin Blais

Dec 2, 2020, 8:13:15 PM

to Beancount

Idea: auto-generate a hash from just the part of the transaction that will not change even if you edit it.

James Cook

Dec 2, 2020, 9:36:57 PM

to bean...@googlegroups.com

Yeah, that would be simpler. There isn't any part of the transaction that I'm confident I won't edit, but maybe date+amount would be close enough.
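A minimal sketch of that suggestion, hashing only the date, account, and amount as James proposes. `posting_key()` is a hypothetical helper, not something bean-query provides; the amount is normalized so that `-12.00` and `-12` hash the same. The obvious limitation is collisions: two identical postings on the same day (say, two equal coffee purchases) get the same key, and breaking the tie with a counter would make the ID depend on file order again.

```python
import hashlib
from datetime import date
from decimal import Decimal


def posting_key(txn_date: date, account: str, number: Decimal, currency: str,
                length: int = 12) -> str:
    """Hash only the fields assumed stable under edits: date, account, and amount."""
    # normalize() makes "-12.00" and "-12" produce the same text, hence the same key.
    canonical = f"{txn_date.isoformat()}|{account}|{number.normalize()}|{currency}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


print(posting_key(date(2020, 11, 2), "Assets:Joint", Decimal("-12.00"), "USD"))
```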
