working with multiple journal files


Tom Frei

Dec 7, 2012, 5:37:14 PM
to hle...@googlegroups.com
I have the latest version of hledger installed on a Linux system. I'm relatively new to ledger, and now hledger, but I really like how it works for me. I'm starting to use hledger for a small residential rental business I have, and it is working out great. Now I want to use it with my personal finances. What is the best way to work with multiple journal files? Use the "-f file.journal" option? Or will hledger look for ".hledger.journal" in the current directory only? If that were the case, I could simply divide my accounts into different directories. Anything else I'm missing?

Simon Michael

Dec 8, 2012, 4:12:50 PM
to hle...@googlegroups.com
Hi Tom, welcome. That's something to consider. Currently hledger always looks for the default journal file in your home directory, when it's not specified by the LEDGER_FILE environment variable or the -f option. Normally -f is what to use when working with different files. I've sometimes made shell aliases to save typing, e.g.

alias hours='hledger -f work.timelog'
alias work='hledger -f work.journal'

etc.

Best
-Simon

Tom Frei

Dec 16, 2012, 11:23:47 AM
to hle...@googlegroups.com
Thank you Simon.  Now that I'm used to it, the -f option works very well for me.  By the way, hledger is a very nice piece of software.  Thank you for putting it together.

Simon Michael

Dec 16, 2012, 3:59:42 PM
to hle...@googlegroups.com
Thanks, you are welcome!

Simon

naren...@gmail.com

Mar 10, 2016, 8:44:44 AM
to hledger
Hi Simon,

I have been using hledger for 6 months now. I am curious how exactly people switch files after a year has passed, since I don't think people maintain a single file for all their finances during their whole lifetime.
It would be great if you could share your setup with me so that I can use the same.

Best,
Narendra Joshi

Martin Blais

Mar 10, 2016, 9:40:02 AM
to hle...@googlegroups.com
On Thu, Mar 10, 2016 at 3:39 AM, <naren...@gmail.com> wrote:
Hi Simon,

I have been using hledger for 6 months now. Now, I am curious how exactly people switch files after an year has passed. Because I think people don't maintain a single file for all their finances during their whole lifetime. 

Why not?
That's what I do.


 

Simon Michael

Mar 10, 2016, 10:36:14 AM
to hle...@googlegroups.com
Hi Narendra,

I've written about this before, but I'm not sure where... Here's what I've been doing for a while.

I don't keep all history in my default file with hledger, because it slows down daily reporting. Current hledger parses fairly quickly, but not so quickly that you'd want to process all history unnecessarily on each run.

I split it by year: general-YYYY.journal, with general.journal as a symbolic link to the current year's file.

I usually have the current journal include the previous year for a bit more history. (Actually, I haven't moved 2015 data into its own file yet, time to do that.)

I have all.journal which includes all years (using the include directive), and "alias all='hledger -f all.journal'" in my bashrc, for querying all history. 

Also "alias YYYY='hledger -f general-YYYY.journal'" for quickly querying a specific year.
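For example, an all.journal along these lines (the filenames here are hypothetical) would pull every year into one view via the include directive:

```
; all.journal -- one include per yearly file (hypothetical filenames)
include general-2014.journal
include general-2015.journal
include general-2016.journal
```

Then "hledger -f all.journal balance" (or the "all" alias above) reports across every included year at once.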

There is a trick: at the end of each year's journal, add a "closing balances" transaction transferring the year-end asset/liability account balances to equity, bringing all asset and liability accounts to zero. And at the start of the next file, add an "opening balances" transaction which reverses that. This ensures you will see correct historical asset/liability balances independent of whether you are processing a single year's journal or several of them at once. You may need to filter out these closing/opening transactions from some reports, eg with " not:'(opening|closing)' ".
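As a sketch of that trick (account names and amounts here are made up, and the equity posting is left to be inferred), the boundary between two yearly files might look like:

```
; end of general-2015.journal
2015/12/31 closing balances
    assets:bank:checking      $-1200
    liabilities:credit card     $300
    equity:opening/closing balances

; start of general-2016.journal
2016/01/01 opening balances
    assets:bank:checking       $1200
    liabilities:credit card    $-300
    equity:opening/closing balances
```

The closing transaction zeroes each asset/liability account at year end; the opening transaction restores the same balances at the start of the next file, so reports are consistent whether you read one file or all of them.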

Simon Michael

Mar 10, 2016, 10:55:28 AM
to hle...@googlegroups.com
Correction: not:desc:'(opening|closing)'


Martin Blais

Mar 11, 2016, 10:51:15 AM
to hle...@googlegroups.com
If you remove data by splitting your input into multiple files, you lose the ability to report on arbitrary periods, e.g. any period that straddles the split date.

But given that this might be needed, I just had an idea to fix it: create two features:
1. Given a date, automatically generate a closing and an opening transaction that balance each other out; link them together with a unique tag; insert these into your files to split them up. You might even be able to automate the splitting of the files: imagine a script that produces two files, keeping every non-matching line in order and dispatching each input transaction to its respective file.
2. Add a feature (in my case a plugin) that replaces all the transactions matching a particular tag with a single summarizing transaction; if the balances are all zero, insert nothing. This should cancel the closing and opening transactions against each other.

With support for multiple input files, one should be able to provide the last, say, three years and obtain an uninterrupted stream of their transactions, allowing reporting over any subperiod within that range without seeing the opening/closing transactions.
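The file-splitting script imagined in feature 1 could be sketched roughly like this. Everything here (the function name, the date-matching heuristic) is a hypothetical illustration, not an existing tool; it splits plain journal text on transaction dates, keeping each entry intact:

```python
import re
from datetime import date

# Transactions start with a date at column 0, e.g. "2016-01-05" or "2016/01/05".
DATE_RE = re.compile(r"^(\d{4})[-/](\d{2})[-/](\d{2})\b")

def split_journal(text, cutoff):
    """Split journal text into (before, after) at the cutoff date.

    Each entry (a dated line plus the indented posting lines that
    follow it) is dispatched whole; lines before the first dated
    entry stay in 'before'.
    """
    before, after = [], []
    entry, dest = [], before
    for line in text.splitlines(keepends=True):
        m = DATE_RE.match(line)
        if m:
            dest.extend(entry)  # flush the previous entry to its file
            entry = []
            d = date(*map(int, m.groups()))
            dest = after if d >= cutoff else before
        entry.append(line)
    dest.extend(entry)          # flush the final entry
    return "".join(before), "".join(after)
```

Feature 2, collapsing tagged opening/closing pairs into a single summarizing transaction (or nothing when they cancel), would then operate on parsed entries rather than raw lines.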

One question: What part is slow and why? It's surprising to me that my slow and naive Python code should be sufficiently fast to process the whole thing every time while your swanky single-pass Haskell implementation should be slower and require splitting. Curious to understand why it's slow.

Simon Michael

Mar 11, 2016, 12:33:09 PM
to hle...@googlegroups.com

> On Mar 11, 2016, at 7:50 AM, Martin Blais <bl...@furius.ca> wrote:
> If you remove data by splitting your input to multiple files you remove your ability to report on arbitrary periods, e.g. any period that would straddle that date.

Yes, that's what my all.journal is for. It's a tradeoff: accept some slowdown on every query, or require switching to a separate command/file for the occasional deep-history query. Or invest more time in improving performance.

> One question: What part is slow and why? It's surprising to me that my slow and naive Python code should be sufficiently fast to process the whole thing every time while your swanky single-pass Haskell implementation should be slower and require splitting. Curious to understand why it's slow.

Ok, now you are really making me want to do some more comparative benchmarking! :-) Like http://hledger.org/release-notes.html#hledger-0.24. Maybe later. My goal is for normal queries to feel "instant"; I wonder if you're having that experience with beancount.


Simon Michael

Mar 11, 2016, 12:56:59 PM
to hle...@googlegroups.com
PS: Martin, what would be the nearest beancount equivalents of the commands at http://hledger.org/release-notes.html#hledger-0.24 ?

(I just updated my beancount installation, and should learn how to use it. There are a lot of docs, executables and commands, some of which say "this isn't supported yet" so I'll just ask you.)

Martin Blais

Mar 12, 2016, 5:41:41 PM
to hle...@googlegroups.com
Beancount certainly doesn't feel instant; I think, however, that under 1 second feels instant enough to me, and less than 2-3 seconds is still a reasonable bump for a query. More than that starts to become annoying. I do realize that this is a personal assessment.

You can use the -v option on bean-check to get the breakdown of time taken by each step.
Here's bean-check on my file (about 8-10 years of data):


BEANCOUNT_DISABLE_LOAD_CACHE=1 bean-check -v $L
INFO    : Operation: 'beancount.parser.parser.parse_file'             Time:  739 ms
INFO    : Operation: 'beancount.parser.parser'                        Time:  740 ms
INFO    : Operation: 'beancount.ops.pad'                              Time:   74 ms
INFO    : Operation: 'beancount.ops.documents'                        Time:   74 ms
INFO    : Operation: 'beancount.plugins.ira_contribs'                 Time:   17 ms
INFO    : Operation: 'beancount.plugins.implicit_prices'              Time:  181 ms
INFO    : Operation: 'beancount.plugins.sellgains'                    Time:   20 ms
INFO    : Operation: 'office.payees'                                  Time: 1292 ms
INFO    : Operation: 'beancount.plugins.check_commodity'              Time:   25 ms
INFO    : Operation: 'beancount.ops.balance'                          Time:  556 ms
INFO    : Operation: 'function: validate_open_close'                  Time:    5 ms
INFO    : Operation: 'function: validate_active_accounts'             Time:   38 ms
INFO    : Operation: 'function: validate_currency_constraints'        Time:   24 ms
INFO    : Operation: 'function: validate_duplicate_balances'          Time:    8 ms
INFO    : Operation: 'function: validate_duplicate_commodities'       Time:    4 ms
INFO    : Operation: 'function: validate_documents_paths'             Time:    4 ms
INFO    : Operation: 'function: validate_check_transaction_balances'  Time:  237 ms
INFO    : Operation: 'function: validate_data_types'                  Time:   88 ms
INFO    : Operation: 'beancount.ops.validate'                         Time:  409 ms
INFO    : Operation: 'beancount.loader (total)'                       Time: 4347 ms


Another thing I do is use a cache for the frequent case where the user runs multiple queries on a file that hasn't changed. This dramatically speeds up the load time. Here's the same command with a cache hit:

bean-check -v $L
INFO    : Operation: 'beancount.loader (total)'                       Time:    341 ms


The web app keeps all the data in memory. It only needs to reload and pay the 4+ seconds cost when I save the file to disk.

(Note: I haven't done a single performance optimization yet, other than writing my parser generator in C.)

BTW, you can use bean-example to generate a _lot_ of years of data if you like. This tool could also be adapted to output hledger code directly, or a converter to hledger syntax could be used to produce output you can use with hledger.



Martin Blais

Mar 12, 2016, 5:43:10 PM
to Martin Blais, hle...@googlegroups.com
Ah, and the bean-query tool starts in interactive mode if you don't provide a query on the command line. That also keeps all the directives in memory, and one can run multiple queries from the prompt, most of which feel nearly instant. (The user can type the "reload" command to reparse the input file if it has changed.)