Tips for speeding up Beancount?


Matthew Harris

unread,
Aug 16, 2017, 5:50:49 PM
to Beancount
How big are your input files, and how long does Beancount take to parse them?

My input file is 2.5M, 55k lines, 19515 directives (38017 postings in 13347 transactions). Running bean-web without a pickle cache takes about 30 seconds to display my data on a MacBook Air. It's gotten to the point that it's rather painful to update and reprocess my file.

I'm awfully tempted to split up my file and use "include"s, for a number of reasons, but I've resisted up to this point because
  • You use and advocate a single file.
  • I'm afraid that I could forget to "include" one of my files and never notice.
  • It looks like the pickle cache is only a single, root-level cache. (Would it be possible to cache each of the included files separately, so that when I split my file into n pieces and edit only one piece, I still get the benefit of the cache for the other n-1 pieces?)
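To illustrate what I mean by per-file caching, here is a minimal Python sketch (entirely hypothetical; `parse_file`, `cached_parse`, and the cache layout are inventions for this example, not Beancount's actual cache): key each included file's pickled parse result by a hash of its contents, so editing one piece invalidates only that piece.

```python
import hashlib
import os
import pickle

CACHE_DIR = ".parse-cache"

def parse_file(path):
    """Stand-in for the real parser: here it just returns the file's lines."""
    with open(path) as infile:
        return infile.readlines()

def cached_parse(path):
    """Parse `path`, reusing a pickled result keyed by the file's content hash."""
    with open(path, "rb") as infile:
        digest = hashlib.sha256(infile.read()).hexdigest()
    os.makedirs(CACHE_DIR, exist_ok=True)
    cache_path = os.path.join(CACHE_DIR, digest + ".pickle")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)  # Cache hit: skip the parse entirely.
    result = parse_file(path)
    with open(cache_path, "wb") as cache_file:
        pickle.dump(result, cache_file)
    return result

def load_with_includes(paths):
    """Load n pieces; only pieces whose contents changed get re-parsed."""
    entries = []
    for path in paths:
        entries.extend(cached_parse(path))
    return entries
```

With something along these lines, re-running after editing one of n included files would only re-parse that one file; the other n-1 would load from their pickles.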
Another option, which I've seen suggested for Ledger in the past, is to "close out" each year. That makes it harder to look at the complete history of a single account though.


Matthew

Martin Blais

unread,
Aug 17, 2017, 1:06:47 AM
to Beancount
On Wed, Aug 16, 2017 at 5:50 PM, Matthew Harris <mharr...@gmail.com> wrote:
How big are your input files,

$ bean-report $L stats-entries
Type         Num Entries
-----------  -----------
Transaction        12259
Price               4511
Balance             3525
Document            1699
Open                 760
Event                308
Close                197
Commodity            128
Note                  78
Pad                   13
Query                  3
~Total~            23481
-----------  -----------

31677 postings.

Very similar scale to yours.

 
and how long does Beancount take to parse them?

bergamot:~$ time bean-check $L

real    0m6.882s
user    0m6.784s
sys     0m0.096s
bergamot:~$ time bean-check $L

real    0m0.508s
user    0m0.468s
sys     0m0.036s

On one of those:
with 32Gb of RAM running Linux.
(It's basically a souped-up laptop in a small box.)

 
My input file is 2.5M, 55k lines, 19515 directives (38017 postings in 13347 transactions). Running bean-web without a pickle cache takes about 30 seconds to display my data on a MacBook Air. It's gotten to the point that it's rather painful to update and reprocess my file.

I suspect most of that 30 secs is rendering time.
Try running this:

$ time bean-check -v $L
INFO    : Operation: 'beancount.parser.parser.parse_file'             Time:  651 ms
INFO    : Operation: 'beancount.parser.parser'                        Time:  651 ms
INFO    : Operation: 'beancount.ops.pad'                              Time:   71 ms
INFO    : Operation: 'beancount.ops.documents'                        Time:   86 ms
INFO    : Operation: 'beancount.plugins.ira_contribs'                 Time:   26 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'              Time:  209 ms
INFO    : Operation: 'beancount.plugins.sellgains'                    Time:   26 ms
INFO    : Operation: 'washsales.commissions'                          Time:   31 ms
INFO    : Operation: 'beancount.plugins.check_commodity'              Time:   35 ms
INFO    : Operation: 'beancount.ops.balance'                          Time: 1819 ms
INFO    : Operation: 'function: validate_open_close'                  Time:    7 ms
INFO    : Operation: 'function: validate_active_accounts'             Time:   46 ms
INFO    : Operation: 'function: validate_currency_constraints'        Time:   25 ms
INFO    : Operation: 'function: validate_duplicate_balances'          Time:   10 ms
INFO    : Operation: 'function: validate_duplicate_commodities'       Time:    5 ms
INFO    : Operation: 'function: validate_documents_paths'             Time:    5 ms
INFO    : Operation: 'function: validate_check_transaction_balances'  Time:  257 ms
INFO    : Operation: 'function: validate_data_types'                  Time:  107 ms
INFO    : Operation: 'beancount.ops.validate'                         Time:  465 ms
INFO    : Operation: 'beancount.loader (total)'                       Time: 6586 ms

real    0m6.781s
user    0m6.708s
sys     0m0.068s

I sort of live with it (I mostly use the SQL commands now), but I'd be lying if I said it doesn't annoy me.
It used to be snappy and fast; I think beyond 2 secs it starts to annoy me.
I'm in a similar situation as you... getting annoyed, but not enough to actually do anything about it yet.



I'm awfully tempted to split up my file and use "include"s, for a number of reasons, but I've resisted up to this point because
  • You use and advocate a single file.
  • I'm afraid that I could forget to "include" one of my files and never notice.

 
  • It looks like the pickle cache is only a single, root-level cache. (Would it be possible to cache each of the included files separately, so that when I split my file into n pieces and edit only one piece, I still get the benefit of the cache for the other n-1 pieces?)
Some of that would be possible, but it's not trivial.
I wanted to do this at some point; here are some notes I took at the time:

We could also do a run or two of profiling and take a couple of stabs at that (not much has been done in that dept so far TBH).
Beyond a few big cuts that way, I think the most oft-called functions could be translated to C and that would probably make it much faster.
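To make that first profiling stab concrete, here's a self-contained sketch using only the standard library (the `hot_function`/`load` names are stand-ins for illustration, not Beancount code): run the load under cProfile and sort by cumulative time to surface the most oft-called functions, i.e. the candidates for a C translation.

```python
import cProfile
import io
import pstats

def hot_function(n):
    """Stand-in for an oft-called helper (e.g. number parsing)."""
    return sum(i * i for i in range(n))

def load(n):
    """Stand-in for a full ledger load: calls the hot helper many times."""
    return [hot_function(n) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
load(2000)
profiler.disable()

# Report the top functions by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The same thing can be done from the command line against a real ledger, e.g. running `python -m cProfile -s cumtime` on the bean-check script (exact paths depend on your install).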

I'll admit I'm more or less in maintenance mode for a little while.

 
Another option, which I've seen suggested for Ledger in the past, is to "close out" each year. That makes it harder to look at the complete history of a single account though.

Yes, and in the past I've made a lot of noise on the Ledger list about the fact that this should be handled by the software... mainly because changing data in past years would force you to regenerate all the close/open transactions in all the files in all future years. I think that's not a workable scenario.
But you're right, it would circumvent the speed issue.
I think I have some code somewhere to generate those split transactions too.

Hmm, I think if there was a super fast way (e.g. in the parser, in C) to drop off transactions before a filtered date and replace them by that equivalent open transaction (computed from a previous run) that could potentially offer an on-the-fly version of this. Basically, can we build this as a feature, without forcing the user to edit the input file, and would it be worth it?
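In plain Python, the core of that on-the-fly clamping could look something like this (a toy sketch: the `Transaction`/`Posting` tuples and integer amounts here stand in for Beancount's real data types, and this is not its actual API): fold every transaction before the cutoff into one synthetic opening transaction, so the file itself never needs editing.

```python
import collections
import datetime

# Toy stand-ins for Beancount's data types.
Transaction = collections.namedtuple("Transaction", "date narration postings")
Posting = collections.namedtuple("Posting", "account amount")

def clamp(transactions, cutoff):
    """Replace everything before `cutoff` with one equivalent opening transaction."""
    balances = collections.defaultdict(int)
    kept = []
    for txn in transactions:
        if txn.date < cutoff:
            # Accumulate the pre-cutoff activity into per-account balances.
            for posting in txn.postings:
                balances[posting.account] += posting.amount
        else:
            kept.append(txn)
    opening = Transaction(
        cutoff, "Opening balances",
        [Posting(acct, amt) for acct, amt in sorted(balances.items()) if amt])
    return [opening] + kept
```

Done in the parser (or in C), this would drop the old transactions before most of the downstream processing ever sees them, which is where the speedup would come from.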

 


Matthew

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/b4b4ce09-0877-4a8a-a609-6aa6d97e89c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthew S. Harris

unread,
Aug 20, 2017, 1:35:46 PM
to bean...@googlegroups.com
On Wed, Aug 16, 2017 at 10:06 PM Martin Blais <bl...@furius.ca> wrote:
and how long does Beancount take to parse them? 

$ time bean-check finances.beancount
real    0m16.947s
user    0m16.393s
sys     0m0.305s

$ time bean-check finances.beancount
real    0m1.494s
user    0m1.304s
sys     0m0.135s
 
On one of those:
with 32Gb of RAM running Linux.
(It's basically a souped-up laptop in a small box.)

Ah, I'm going to consider getting one of those. Looks nice!
 
I suspect most of that 30 secs is rendering time.
Try running this:

$ time bean-check -v $L
 
$ time bean-check -v finances.beancount
INFO    : Operation: 'beancount.parser.parser.parse_file'             Time: 1635 ms
INFO    : Operation: 'beancount.parser.parser'                        Time: 1635 ms
INFO    : Operation: 'beancount.ops.pad'                              Time:  279 ms
INFO    : Operation: 'beancount.ops.documents'                        Time:   33 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'              Time:  585 ms
INFO    : Operation: 'beancount.plugins.unrealized'                   Time:  703 ms
INFO    : Operation: 'beancount.plugins.tag_pending'                  Time:   17 ms
INFO    : Operation: 'beancount.plugins.check_commodity'              Time:   64 ms
INFO    : Operation: 'beancount.ops.balance'                          Time:  800 ms
INFO    : Operation: 'function: validate_open_close'                  Time:   11 ms
INFO    : Operation: 'function: validate_active_accounts'             Time:   68 ms
INFO    : Operation: 'function: validate_currency_constraints'        Time:   48 ms
INFO    : Operation: 'function: validate_duplicate_balances'          Time:    7 ms
INFO    : Operation: 'function: validate_duplicate_commodities'       Time:    7 ms
INFO    : Operation: 'function: validate_documents_paths'             Time:    4 ms
INFO    : Operation: 'function: validate_check_transaction_balances'  Time:  676 ms
INFO    : Operation: 'function: validate_data_types'                  Time:  186 ms
INFO    : Operation: 'beancount.ops.validate'                         Time: 1009 ms
 
Looks like my times are about twice yours.

I sort of live with it (I mostly use the SQL commands now), but I'd be lying if I said it doesn't annoy me.
It used to be snappy and fast; I think beyond 2 secs it starts to annoy me.
I'm in a similar situation as you... getting annoyed, but not enough to actually do anything about it yet.

Looks like you've had the same ideas and many more, and it just hasn't been worth fixing yet.


Matthew 

Jason Chu

unread,
Aug 20, 2017, 2:21:26 PM
to bean...@googlegroups.com
Have there been any experiments with Cython? I've had great results converting methods into C functions with type annotations. Would it be worth the additional dependency? Or we could make it an optional dependency if done properly.

Martin Blais

unread,
Aug 20, 2017, 2:47:03 PM
to Beancount
Never tried.
Not a fan of custom syntax, but if it helps you prototype speed improvements, +1. We could then make a final C translation.



Jason Chu

unread,
Aug 20, 2017, 3:17:20 PM
to Beancount
I'm a fan of the readability/maintainability relative to straight C. It is also possible to commit the generated C output so that Cython is only a development dependency, and building needs just the current deps.

I think of it less as a custom syntax and more as a syntax augmentation. All of the Cython features I've used can be translated back to straight Python very easily.
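To make that concrete, here's what that augmentation looks like on a tiny hypothetical hot helper (not from Beancount): the Cython form, shown in the comment, differs from the Python form only in type declarations.

```python
# Plain Python version of a (hypothetical) hot helper.
def add_amounts(amounts):
    total = 0.0
    for value in amounts:
        total += value
    return total

# The Cython version (in a .pyx file) differs only in the declarations:
#
#     def add_amounts(list amounts):
#         cdef double total = 0.0
#         cdef double value
#         for value in amounts:
#             total += value
#         return total
#
# The typed loop compiles down to C arithmetic, and stripping the `cdef`
# lines gives back plain Python -- which is the "translates back easily" point.
```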


Metin Akat

unread,
Aug 22, 2017, 7:28:57 AM
to bean...@googlegroups.com
Hi,

Here are my stats:

Type         Num Entries
-----------  -----------
Transaction        13071
Price                297
Open                 181
Close                  1
~Total~            13550
-----------  -----------


This is on a MacBook Pro late 2013


$ time bean-check lk.beancount

real    0m4.430s
user    0m4.148s
sys     0m0.134s

This is about the same as I get with Ledger (I'm still maintaining backwards compatibility via some includes):

$ time ledger -f lk.ledger bal
real    0m4.383s
user    0m4.291s
sys     0m0.066s

I don't know if it's because Ledger has a lot more overhead due to what it does, but going by this crude comparison, I don't really see a good reason to rewrite parts of Beancount in C/C++. I think it's much better to have more features and wait a little longer for reports than to spend that time porting to C.


I also have a script that is based on the "net worth over time" beancount script (but does a lot more custom reporting) which is able to process my journal in ~40 seconds. This is much better than the ~2 minute times I saw for the same thing with Ledger.

Regards,
Metin


