Tips for speeding up Beancount?


Matthew Harris

unread,
Aug 16, 2017, 5:50:49 PM
to Beancount
How big are your input files, and how long does Beancount take to parse them?

My input file is 2.5M, 55k lines, 19515 directives (38017 postings in 13347 transactions). Running bean-web without a pickle cache takes about 30 seconds to display my data on a MacBook Air. It's gotten to the point that it's rather painful to update and reprocess my file.

I'm awfully tempted to split up my file and use "include"s, for a number of reasons, but I've resisted up to this point because
  • You use and advocate a single file.
  • I'm afraid that I could forget to "include" one of my files and never notice.
  • It looks like the pickle cache is only a single, root-level cache. (Would it be possible to cache each of the included files separately, so that when I split my file into n pieces and edit only one piece, I still get the benefit of the cache for the other n-1 pieces?)
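To illustrate what I mean by per-file caching, here is a minimal Python sketch (entirely hypothetical; `parse_file`, `cached_parse`, and the cache layout are inventions for this example, not Beancount's actual cache): key each included file's pickled parse result by a hash of its contents, so editing one piece invalidates only that piece.

```python
import hashlib
import os
import pickle

CACHE_DIR = ".parse-cache"

def parse_file(path):
    """Stand-in for the real parser: here it just returns the file's lines."""
    with open(path) as infile:
        return infile.readlines()

def cached_parse(path):
    """Parse `path`, reusing a pickled result keyed by the file's content hash."""
    with open(path, "rb") as infile:
        digest = hashlib.sha256(infile.read()).hexdigest()
    os.makedirs(CACHE_DIR, exist_ok=True)
    cache_path = os.path.join(CACHE_DIR, digest + ".pickle")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)  # Cache hit: skip the parse entirely.
    result = parse_file(path)
    with open(cache_path, "wb") as cache_file:
        pickle.dump(result, cache_file)
    return result

def load_with_includes(paths):
    """Load n pieces; only pieces whose contents changed get re-parsed."""
    entries = []
    for path in paths:
        entries.extend(cached_parse(path))
    return entries
```

With something along these lines, re-running after editing one of n included files would only re-parse that one file; the other n-1 would load from their pickles.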
Another option, which I've seen suggested for Ledger in the past, is to "close out" each year. That makes it harder to look at the complete history of a single account though.


Matthew

Martin Blais

unread,
Aug 17, 2017, 1:06:47 AM
to Beancount
On Wed, Aug 16, 2017 at 5:50 PM, Matthew Harris <mharr...@gmail.com> wrote:
How big are your input files,

$ bean-report $L stats-entries
Type         Num Entries
-----------  -----------
Transaction        12259
Price               4511
Balance             3525
Document            1699
Open                 760
Event                308
Close                197
Commodity            128
Note                  78
Pad                   13
Query                  3
~Total~            23481
-----------  -----------

31677 postings.

Very similar scale to yours.

 
and how long does Beancount take to parse them?

bergamot:~$ time bean-check $L

real    0m6.882s
user    0m6.784s
sys     0m0.096s
bergamot:~$ time bean-check $L

real    0m0.508s
user    0m0.468s
sys     0m0.036s

On one of those:
with 32Gb of RAM running Linux.
(It's basically a souped-up laptop in a small box.)

 
My input file is 2.5M, 55k lines, 19515 directives (38017 postings in 13347 transactions). Running bean-web without a pickle cache takes about 30 seconds to display my data on a MacBook Air. It's gotten to the point that it's rather painful to update and reprocess my file.

I suspect most of that 30 secs is rendering time.
Try running this:

$ time bean-check -v $L
INFO    : Operation: 'beancount.parser.parser.parse_file'             Time:  651 ms
INFO    : Operation: 'beancount.parser.parser'                        Time:  651 ms
INFO    : Operation: 'beancount.ops.pad'                              Time:   71 ms
INFO    : Operation: 'beancount.ops.documents'                        Time:   86 ms
INFO    : Operation: 'beancount.plugins.ira_contribs'                 Time:   26 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'              Time:  209 ms
INFO    : Operation: 'beancount.plugins.sellgains'                    Time:   26 ms
INFO    : Operation: 'washsales.commissions'                          Time:   31 ms
INFO    : Operation: 'beancount.plugins.check_commodity'              Time:   35 ms
INFO    : Operation: 'beancount.ops.balance'                          Time: 1819 ms
INFO    : Operation: 'function: validate_open_close'                  Time:    7 ms
INFO    : Operation: 'function: validate_active_accounts'             Time:   46 ms
INFO    : Operation: 'function: validate_currency_constraints'        Time:   25 ms
INFO    : Operation: 'function: validate_duplicate_balances'          Time:   10 ms
INFO    : Operation: 'function: validate_duplicate_commodities'       Time:    5 ms
INFO    : Operation: 'function: validate_documents_paths'             Time:    5 ms
INFO    : Operation: 'function: validate_check_transaction_balances'  Time:  257 ms
INFO    : Operation: 'function: validate_data_types'                  Time:  107 ms
INFO    : Operation: 'beancount.ops.validate'                         Time:  465 ms
INFO    : Operation: 'beancount.loader (total)'                       Time: 6586 ms

real    0m6.781s
user    0m6.708s
sys     0m0.068s

I sort of live with it (I mostly use the SQL commands now), but I'd be lying if I said it doesn't annoy me.
It used to be snappy and fast; I think beyond 2 secs it starts to annoy me.
I'm in a similar situation as you... getting annoyed, but not enough to actually do anything about it yet.



I'm awfully tempted to split up my file and use "include"s, for a number of reasons, but I've resisted up to this point because
  • You use and advocate a single file.
  • I'm afraid that I could forget to "include" one of my files and never notice.

 
  • It looks like the pickle cache is only a single, root-level cache. (Would it be possible to cache each of the included files separately, so that when I split my file into n pieces and edit only one piece, I still get the benefit of the cache for the other n-1 pieces?)
Some of that would be possible, but it's not trivial.
I wanted to do this at some point; here are some notes I took at the time:

We could also do a run or two of profiling and take a couple of stabs at that (not much has been done in that dept so far TBH).
Beyond a few big cuts that way, I think the most oft-called functions could be translated to C and that would probably make it much faster.
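To make that first profiling stab concrete, here's a self-contained sketch using only the standard library (the `hot_function`/`load` names are stand-ins for illustration, not Beancount code): run the load under cProfile and sort by cumulative time to surface the most oft-called functions, i.e. the candidates for a C translation.

```python
import cProfile
import io
import pstats

def hot_function(n):
    """Stand-in for an oft-called helper (e.g. number parsing)."""
    return sum(i * i for i in range(n))

def load(n):
    """Stand-in for a full ledger load: calls the hot helper many times."""
    return [hot_function(n) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
load(2000)
profiler.disable()

# Report the top functions by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The same thing can be done from the command line against a real ledger, e.g. running `python -m cProfile -s cumtime` on the bean-check script (exact paths depend on your install).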

I'll admit I'm more or less in maintenance mode for a little while.

 
Another option, which I've seen suggested for Ledger in the past, is to "close out" each year. That makes it harder to look at the complete history of a single account though.

Yes, and in the past I've made a lot of noise on the Ledger list about the fact that this should be handled by the software... mainly because changing data in past years would force you to regenerate all the close/open transactions in all the files in all future years. I think that's not a workable scenario.
But you're right, it would circumvent the speed issue.
I think I have some code somewhere to generate those split transactions too.

Hmm, I think if there was a super fast way (e.g. in the parser, in C) to drop off transactions before a filtered date and replace them by that equivalent open transaction (computed from a previous run) that could potentially offer an on-the-fly version of this. Basically, can we build this as a feature, without forcing the user to edit the input file, and would it be worth it?
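In plain Python, the core of that on-the-fly clamping could look something like this (a toy sketch: the `Transaction`/`Posting` tuples and integer amounts here stand in for Beancount's real data types, and this is not its actual API): fold every transaction before the cutoff into one synthetic opening transaction, so the file itself never needs editing.

```python
import collections
import datetime

# Toy stand-ins for Beancount's data types.
Transaction = collections.namedtuple("Transaction", "date narration postings")
Posting = collections.namedtuple("Posting", "account amount")

def clamp(transactions, cutoff):
    """Replace everything before `cutoff` with one equivalent opening transaction."""
    balances = collections.defaultdict(int)
    kept = []
    for txn in transactions:
        if txn.date < cutoff:
            # Accumulate the pre-cutoff activity into per-account balances.
            for posting in txn.postings:
                balances[posting.account] += posting.amount
        else:
            kept.append(txn)
    opening = Transaction(
        cutoff, "Opening balances",
        [Posting(acct, amt) for acct, amt in sorted(balances.items()) if amt])
    return [opening] + kept
```

Done in the parser (or in C), this would drop the old transactions before most of the downstream processing ever sees them, which is where the speedup would come from.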

 


Matthew

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/b4b4ce09-0877-4a8a-a609-6aa6d97e89c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthew S. Harris

unread,
Aug 20, 2017, 1:35:46 PM
to bean...@googlegroups.com
On Wed, Aug 16, 2017 at 10:06 PM Martin Blais <bl...@furius.ca> wrote:
and how long does Beancount take to parse them? 

$ time bean-check finances.beancount
real    0m16.947s
user    0m16.393s
sys     0m0.305s

$ time bean-check finances.beancount
real    0m1.494s
user    0m1.304s
sys     0m0.135s
 
On one of those:
with 32Gb of RAM running Linux.
(It's basically a souped-up laptop in a small box.)

Ah, I'm going to consider getting one of those. Looks nice!
 
I suspect most of that 30 secs is rendering time.
Try running this:

$ time bean-check -v $L
 
$ time bean-check -v finances.beancount
INFO    : Operation: 'beancount.parser.parser.parse_file'             Time: 1635 ms
INFO    : Operation: 'beancount.parser.parser'                        Time: 1635 ms
INFO    : Operation: 'beancount.ops.pad'                              Time:  279 ms
INFO    : Operation: 'beancount.ops.documents'                        Time:   33 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'              Time:  585 ms
INFO    : Operation: 'beancount.plugins.unrealized'                   Time:  703 ms
INFO    : Operation: 'beancount.plugins.tag_pending'                  Time:   17 ms
INFO    : Operation: 'beancount.plugins.check_commodity'              Time:   64 ms
INFO    : Operation: 'beancount.ops.balance'                          Time:  800 ms
INFO    : Operation: 'function: validate_open_close'                  Time:   11 ms
INFO    : Operation: 'function: validate_active_accounts'             Time:   68 ms
INFO    : Operation: 'function: validate_currency_constraints'        Time:   48 ms
INFO    : Operation: 'function: validate_duplicate_balances'          Time:    7 ms
INFO    : Operation: 'function: validate_duplicate_commodities'       Time:    7 ms
INFO    : Operation: 'function: validate_documents_paths'             Time:    4 ms
INFO    : Operation: 'function: validate_check_transaction_balances'  Time:  676 ms
INFO    : Operation: 'function: validate_data_types'                  Time:  186 ms
INFO    : Operation: 'beancount.ops.validate'                         Time: 1009 ms
 
Looks like my times are about twice yours.

I sort of live with it (I mostly use the SQL commands now), but I'd be lying if I said it doesn't annoy me.
It used to be snappy and fast; I think beyond 2 secs it starts to annoy me.
I'm in a similar situation as you... getting annoyed, but not enough to actually do anything about it yet.

Looks like you've had the same ideas and many more, and it just hasn't been worth fixing yet.


Matthew 

Jason Chu

unread,
Aug 20, 2017, 2:21:26 PM
to bean...@googlegroups.com
Have there been any experiments with Cython? I've had great results converting methods into C functions with type annotations. Would it be worth the additional dependency? Or we could make it an optional dependency if done properly.

Martin Blais

unread,
Aug 20, 2017, 2:47:03 PM
to Beancount
Never tried.
Not a fan of custom syntax, but if it helps you prototype speed improvements, +1. We could then make a final C translation.



Jason Chu

unread,
Aug 20, 2017, 3:17:20 PM
to Beancount
I'm a fan of the readability/maintainability relative to straight C. It is also possible to commit the generated C output so that Cython is only a development dependency, and building needs just the current deps.

I think of it less as a custom syntax and more as a syntax augmentation. All of the Cython features I've used can be translated back to straight Python very easily.
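To make that concrete, here's what that augmentation looks like on a tiny hypothetical hot helper (not from Beancount): the Cython form, shown in the comment, differs from the Python form only in type declarations.

```python
# Plain Python version of a (hypothetical) hot helper.
def add_amounts(amounts):
    total = 0.0
    for value in amounts:
        total += value
    return total

# The Cython version (in a .pyx file) differs only in the declarations:
#
#     def add_amounts(list amounts):
#         cdef double total = 0.0
#         cdef double value
#         for value in amounts:
#             total += value
#         return total
#
# The typed loop compiles down to C arithmetic, and stripping the `cdef`
# lines gives back plain Python -- which is the "translates back easily" point.
```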


Metin Akat

unread,
Aug 22, 2017, 7:28:57 AM
to bean...@googlegroups.com
Hi,

Here are my stats:

Type         Num Entries
-----------  -----------
Transaction        13071
Price                297
Open                 181
Close                  1
~Total~            13550
-----------  -----------


This is on a MacBook Pro late 2013


$ time bean-check lk.beancount

real    0m4.430s
user    0m4.148s
sys     0m0.134s

This is about the same as I get with Ledger (I'm still maintaining backwards compatibility via some includes):

$ time ledger -f lk.ledger bal
real    0m4.383s
user    0m4.291s
sys     0m0.066s

I don't know if it's because Ledger has a lot more overhead due to what it does, but going by this crude comparison, I don't really see a good reason to rewrite parts of Beancount in C/C++. I think it's much better to have more features and wait a little longer for reports than to spend that time porting to C.


I also have a script that is based on the "net worth over time" beancount script (but does a lot more custom reporting) which is able to process my journal in ~40 seconds. This is much better than the ~2 minute times I saw for the same thing with Ledger.

Regards,
Metin


