Cumulative sum in bean-query?

816 views
Skip to first unread message

ethan.gla...@gmail.com

unread,
Nov 1, 2018, 2:49:36 PM11/1/18
to Beancount
I was wondering if it's possible to build some kind of bean-query that would produce output like the fava "Balance Sheet" view does. If you haven't seen it, it looks like this: https://fava.pythonanywhere.com/example-beancount-file/balance_sheet/ . It shows the "net worth" of the beancount file over time.

You can get it per-account using `BALANCES AT COST FROM date < 2018-09-05`. And using `EXPLAIN`, I was able to turn that into `SELECT sum(cost(position)) WHERE account ~ '(Liabilities|Assets):.*' and date < 2018-09-05`, which works great for a single point in time. Finally I was calculate to see month-to-month changes using `SELECT year(date), month(date), sum(cost(position)) WHERE account ~ '(Liabilities|Assets):.*'`. But I don't have any way to produce a running total (the sum of those changes up to each month).

Is there a bean-query mechanism for doing accumulation or "running count" operations? I could "roll my own" using a subquery but I see from the documentation that sub-selects aren't supported.

Thanks!

Ethan

Stefano Zacchiroli

unread,
Nov 1, 2018, 3:01:51 PM11/1/18
to bean...@googlegroups.com
On Thu, Nov 01, 2018 at 11:49:36AM -0700, ethan.gla...@gmail.com wrote:
> Is there a bean-query mechanism for doing accumulation or "running count"
> operations? I could "roll my own" using a subquery but I see from the
> documentation that sub-selects aren't supported.

You have the balance "column", as in:

SELECT balance WHERE account ~ '^(Liabilities|Assets)' ORDER BY date;

See: "The “balance” Column" in the BQL documentation here:

https://docs.google.com/document/d/1s0GOZMcrKKCLlP29MD7kHO4L88evrwWdIO0p4EwRBE0/

It's not a fully generic running count, but AFAICT is what is used by
the BALANCES shorthand query which you cited as initial example.

Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

Ethan

unread,
Nov 1, 2018, 3:19:43 PM11/1/18
to bean...@googlegroups.com
That's fantastic, thank you very much. I was able to get what (grouped by month) I wanted using `SELECT year(date), month(date), last(cost(balance))`. I wasn't able to use `SELECT year, month, sum(cost(position)), balance WHERE date > 2018-10-20`. When I try I get this exception:

Traceback (most recent call last):
  File "/nix/store/nrl0l79a48924xb0897ap572xf29ciir-python3-3.6.6/lib/python3.6/cmd.py", line 214, in onecmd
    func = getattr(self, 'do_' + cmd)
AttributeError: 'QueryShell' object has no attribute 'do_SELECT'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nix/store/fp1w3x8kapd6bj0d2ay2q5ghpwjhzl1h-python3.6-beancount-2.1.2/lib/python3.6/site-packages/beancount/query/shell.py", line 270, in run_parser
    self.dispatch(statement)
  File "/nix/store/fp1w3x8kapd6bj0d2ay2q5ghpwjhzl1h-python3.6-beancount-2.1.2/lib/python3.6/site-packages/beancount/query/shell.py", line 250, in dispatch
    return method(statement)
  File "/nix/store/hi4vx0wnnllvlvfbd3hdblpxhdmlcjjr-fava-1.7/lib/python3.6/site-packages/fava/core/query_shell.py", line 89, in on_Select
    self.options_map)
  File "/nix/store/fp1w3x8kapd6bj0d2ay2q5ghpwjhzl1h-python3.6-beancount-2.1.2/lib/python3.6/site-packages/beancount/query/query_execute.py", line 327, in execute_query
    store = agg_store[row_key]
TypeError: unhashable type: 'Inventory'

I'm not sure if that's expected or not -- should I file a bug?

Ethan

--
You received this message because you are subscribed to a topic in the Google Groups "Beancount" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beancount/dtOply6B8xQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/20181101190147.75d3a224xwe7gi2d%40upsilon.cc.
For more options, visit https://groups.google.com/d/optout.

Martin Blais

unread,
Nov 3, 2018, 5:43:37 PM11/3/18
to bean...@googlegroups.com
The problem with your query is a combination of things:
- You're making an aggregate query, as per the presence of sum()
- You're not using an explicit GROUP BY clause, so it selects all the non-aggregate columns for you (year, month, balance).
balance is of type Inventory.
All non-aggregate types must be hashable (though I could relax that to make them required comparable, with a little cost).

If you use last(balance) instead of  balance, that would just work (group by year, month, only, which is likely what you wanted anyway).

Granted: It should not fail with an exception like this (but this needs a full rewrite, see other emails for my thoughts on this).





You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.

To post to this group, send email to bean...@googlegroups.com.

Ethan

unread,
Nov 4, 2018, 9:08:41 AM11/4/18
to bean...@googlegroups.com
OK, that's where I ended up anyhow. Thank you very much for the pointers!

Ethan


shreedha...@gmail.com

unread,
Nov 14, 2018, 12:13:21 AM11/14/18
to Beancount
There is a catch though. When there are multiple transactions on the same day, last(balance) may give seemingly meaningless sums:

2018-02-09 * "Lots of expenses"
 
Assets:Checking
 
Expenses:Gas          40 USD
 
Expenses:Groceries    20 USD

2018-02-01 * "Groceries"
 
Assets:Checking
 
Expenses:Groceries          20 USD

2018-02-01 * "Gas"
 
Assets:Checking
 
Expenses:Gas          40 USD


beancount> select account, sum(position), last(balance) where account ~ 'Expenses'
     account       sum_po last_ba
------------------ ------ -------
Expenses:Groceries 40 USD 120 USD
Expenses:Gas       80 USD 100 USD

The reason seems to be that order matters for last(balance). Now the actual total is still there - its the maximum value in the last_balance column. But it is not useful for a running total of aggregates - especially if you add ORDER BYs. I works fine for non-aggregate queries:

beancount> select account, position, balance where account ~ 'Expenses'
     account       positi balance
------------------ ------ -------
Expenses:Groceries 20 USD  20 USD
Expenses:Gas       40 USD  60 USD
Expenses:Gas       40 USD 100 USD
Expenses:Groceries 20 USD 120 USD

I personally would love a "total" displayed as an additional line at the very end (kind of like ledger does) for balances.

It would get tricky if there are multiple projected columns. Perhaps that can be specified with an extension to the query language: (SELECT .. TOTAL 1, 2) to total the final output of columns 1 & 2. Or it can be done automatically for any non-text column with only one currency. There might be other caveats/issues.

Alternatively, 'balance' can be computed on the final output of the query rather than per posting before aggregation.

- Shreedhar

Martin Blais

unread,
Nov 15, 2018, 1:04:27 AM11/15/18
to bean...@googlegroups.com
On Wed, Nov 14, 2018 at 12:13 AM <shreedha...@gmail.com> wrote:
There is a catch though. When there are multiple transactions on the same day, last(balance) may give seemingly meaningless sums:

2018-02-09 * "Lots of expenses"
 
Assets:Checking
 
Expenses:Gas          40 USD
 
Expenses:Groceries    20 USD

2018-02-01 * "Groceries"
 
Assets:Checking
 
Expenses:Groceries          20 USD

2018-02-01 * "Gas"
 
Assets:Checking
 
Expenses:Gas          40 USD


beancount> select account, sum(position), last(balance) where account ~ 'Expenses'
     account       sum_po last_ba
------------------ ------ -------
Expenses:Groceries 40 USD 120 USD
Expenses:Gas       80 USD 100 USD

The reason seems to be that order matters for last(balance). Now the actual total is still there - its the maximum value in the last_balance column. But it is not useful for a running total of aggregates - especially if you add ORDER BYs.

Interesting. Indeed you're correct. I've never come across this issue.
Maybe what's needed is not an automatically generated column, but a feature (a function?) at the SQL level which refers to another column, computing a running sum, something like this:
> select account, sum(position) as pos, ACCUMULATE(pos) where ...
Something like that.
I wonder if there's precedent for something like this in some variant of SQL.
Another idea would be to provide the value for the previous column, so you could do
> select account, sum(position) as pos, (prev.balance + pos) as balance where ...


 
I works fine for non-aggregate queries:

beancount> select account, position, balance where account ~ 'Expenses'
     account       positi balance
------------------ ------ -------
Expenses:Groceries 20 USD  20 USD
Expenses:Gas       40 USD  60 USD
Expenses:Gas       40 USD 100 USD
Expenses:Groceries 20 USD 120 USD

I personally would love a "total" displayed as an additional line at the very end (kind of like ledger does) for balances.

It would get tricky if there are multiple projected columns. Perhaps that can be specified with an extension to the query language: (SELECT .. TOTAL 1, 2) to total the final output of columns 1 & 2. Or it can be done automatically for any non-text column with only one currency. There might be other caveats/issues.

Alternatively, 'balance' can be computed on the final output of the query rather than per posting before aggregation.

This would definitely be an extension, e.g., your TOTAL idea.


 

- Shreedhar

On Thursday, November 1, 2018 at 1:49:36 PM UTC-5, ethan.gla...@gmail.com wrote:
I was wondering if it's possible to build some kind of bean-query that would produce output like the fava "Balance Sheet" view does. If you haven't seen it, it looks like this: https://fava.pythonanywhere.com/example-beancount-file/balance_sheet/ . It shows the "net worth" of the beancount file over time.

You can get it per-account using `BALANCES AT COST FROM date < 2018-09-05`. And using `EXPLAIN`, I was able to turn that into `SELECT sum(cost(position)) WHERE account ~ '(Liabilities|Assets):.*' and date < 2018-09-05`, which works great for a single point in time. Finally I was calculate to see month-to-month changes using `SELECT year(date), month(date), sum(cost(position)) WHERE account ~ '(Liabilities|Assets):.*'`. But I don't have any way to produce a running total (the sum of those changes up to each month).

Is there a bean-query mechanism for doing accumulation or "running count" operations? I could "roll my own" using a subquery but I see from the documentation that sub-selects aren't supported.

Thanks!

Ethan

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.

shreedha...@gmail.com

unread,
Nov 15, 2018, 9:40:01 AM11/15/18
to Beancount


On Thursday, November 15, 2018 at 12:04:27 AM UTC-6, Martin Blais wrote:
On Wed, Nov 14, 2018 at 12:13 AM <shreedha...@gmail.com> wrote:
There is a catch though. When there are multiple transactions on the same day, last(balance) may give seemingly meaningless sums:

2018-02-09 * "Lots of expenses"
 
Assets:Checking
 
Expenses:Gas          40 USD
 
Expenses:Groceries    20 USD

2018-02-01 * "Groceries"
 
Assets:Checking
 
Expenses:Groceries          20 USD

2018-02-01 * "Gas"
 
Assets:Checking
 
Expenses:Gas          40 USD


beancount> select account, sum(position), last(balance) where account ~ 'Expenses'
     account       sum_po last_ba
------------------ ------ -------
Expenses:Groceries 40 USD 120 USD
Expenses:Gas       80 USD 100 USD

The reason seems to be that order matters for last(balance). Now the actual total is still there - its the maximum value in the last_balance column. But it is not useful for a running total of aggregates - especially if you add ORDER BYs.

Interesting. Indeed you're correct. I've never come across this issue.
Maybe what's needed is not an automatically generated column, but a feature (a function?) at the SQL level which refers to another column, computing a running sum, something like this:
> select account, sum(position) as pos, ACCUMULATE(pos) where ...
Something like that.
I wonder if there's precedent for something like this in some variant of SQL.

Yes there is - they're called WINDOW functions. I'm most familiar with how they're implemented in postgres so I'll link to that : https://www.postgresql.org/docs/10/tutorial-window.html. In short, they are used to calculated aggregates on a different set of "group bys" than are listed in the main SQL group by clause. You can use any aggregate function you like and thus aren't restricted to 'sum'.

For for a running total, the query would look like:

> select account, sum(position) as pos, SUM(pos) OVER () where ...

You could also do partial totals, for example say you want to total by account and then have a separate running total. That would look like : 

> select account, sum(position) as pos, SUM(pos) OVER (PARTITION BY account) AS total_over_account , SUM(pos) OVER ()  AS running_total where ...
 
There's also syntax to specify how many previous and following rows to consider (called the frame), the way you want the rows ordered etc. See the details here https://www.postgresql.org/docs/10/sql-select.html#SQL-WINDOW. It is a very powerful feature of SQL and very very useful to generate reports. But it is definitely complex to understand and implement.


Another idea would be to provide the value for the previous column, so you could do
> select account, sum(position) as pos, (prev.balance + pos) as balance where ...


Are you thinking of prev as the prev posting or the prev output row?
 
 
I works fine for non-aggregate queries:

beancount> select account, position, balance where account ~ 'Expenses'
     account       positi balance
------------------ ------ -------
Expenses:Groceries 20 USD  20 USD
Expenses:Gas       40 USD  60 USD
Expenses:Gas       40 USD 100 USD
Expenses:Groceries 20 USD 120 USD

I personally would love a "total" displayed as an additional line at the very end (kind of like ledger does) for balances.

It would get tricky if there are multiple projected columns. Perhaps that can be specified with an extension to the query language: (SELECT .. TOTAL 1, 2) to total the final output of columns 1 & 2. Or it can be done automatically for any non-text column with only one currency. There might be other caveats/issues.

Alternatively, 'balance' can be computed on the final output of the query rather than per posting before aggregation.

This would definitely be an extension, e.g., your TOTAL idea.


Yeah. I was thinking of it as a short hand for the window function with the 1,2 being the grouping/partition-by columns. But honestly, window functions are the most flexible and powerful.

Shreedhar Hardikar

unread,
Nov 15, 2018, 10:16:06 AM11/15/18
to bean...@googlegroups.com

Yeah. I was thinking of it as a short hand for the window function with the 1,2 being the grouping/partition-by columns. But honestly, window functions are the most flexible and powerful.

Oops, I mis-remembered the context. 1,2 are *not* the grouping/partition-by columns. They'd just be the columns for which a total needs to be calculated and printed. Sorry for the confusion.

I thought I'd also point out another common way to get subtotal aggregates only (and not running totals) : grouping sets. https://www.postgresql.org/docs/10/queries-table-expressions.html#QUERIES-GROUPING-SETS. It's like applying multiple group by clauses at the same time and union the result.

Reply all
Reply to author
Forward
0 new messages