Querying the journal via JSON using jq

263 views
Skip to first unread message

xiaor...@gmail.com

unread,
May 20, 2016, 1:45:55 AM5/20/16
to Beancount
Hi,

I'd like to share a way to query the journal in addition to bean-query. Using a simple python script, the beancount journal could be converted to JSON. Then one can use command line JSON tools, for example jq, to manipulate and query it. 

Here's the script: b2j.py.

For example:

./b2j.py test.beancount | jq '.[] | select (._type == "Event")'
{
  "meta": {
    "lineno": 50
  },
  "date": "2013-01-01",
  "type": "employer",
  "description": "Ace Corp",
  "_type": "Event"
}
{
  "meta": {
    "lineno": 80
  },
  "date": "2016-01-01",
  "type": "employer",
  "description": "Bob Corp",
  "_type": "Event"
}

Another use case is to query for transactions with certain metadata, which currently bean-count doesn't support querying. It's probably very hard for jq to do balance reports, but it's quite good at filtering and outputting parts of the transactions.

Hope this helps.

Martin Blais

unread,
May 22, 2016, 2:09:18 PM5/22/16
to Beancount
Thanks for sharing. The same could be said about using a good old regular SQL database to do the same, using XML with some Xpath querying tool, or even using a sequence of protocol buffers stored in a file. At the end of the day, we're the result of the little bit processing Beancount does on its input is just a simple stream of data, and there are many ways in which you could represent it externally and use available tools on it.

There's already a converter from Beancount format to SQL which you can try out:

I've also dabbled with the idea of extending SQLite3 to provide the contents of a Beancount input file as a data source directly (it turns out it's difficult to add the other things I need to replace bean-query).

The problem is, there are a few legitimate reasons for a custom query tool to exist, let me explain:

Much of what is interesting about the accounting operations we carry out is the ability to accumulate positions in inventories. For example, you can deposit 100 CAD in an account (e.g., "cash") which holds 200 USD. The result is never 300 anything... it's 200 USD & 100 CAD. SImilarly, you could accumulate 10 shares of AAPL acquired at 88.00 USD in an account that already contains a number of other existing lots of AAPL. The result should be the list of those lots.

Those types of operations are difficult to emulate using SQL or any other generic tool. To replicate this, you'd have to _always_ imply an aggregation clause by currency, i.e., a "GROUP BY currency". The cost basis, acquisition dates, and labels complicate this further. It's very inconvenient, I'm not even sure it can be done for accumulating balances. 

Note on the other hand that for aggregating a list of holdings at a particular point in time, these are isomorphic to Posting objects, you can flatten those into a list of columns and apart from some subtleties it can be useful to do arithmetic on that. Moving to tables for holdings make sense because usually all that needs to be done is to deal with total cost basis and market value of positions.

The way bean-query solves the query tool problem is by supporting the data structures using in Beancount itself, namely, Amount, Position, and Inventory. Inventory columns can accumulate Position columns, and you can derive Amounts for the "units" or the "cost" of a Position or an Inventory.

Furthermore, the basic data structure of Beancount, the Transaction, always has a two-level hierarchy: the transaction and its attributes, and its child postings. You could represent this as an SQL JOIN, but it would be the very same every time. Yes, you could create a view, but I think there's a space to make it easier. To be clear, here's what I mean (in a simplified way):

Transactions have date, flag, payee, narration
Postings have account, units (number and currency), cost (number, currency, acquisition date, label), price (number and currency)

These can be represented as tables, with a unique "id" field for Transactions and a non-unique "parent_id" reference in the Postings table. You could join on this id, to obtain this:

  date, flag, payee, narration, account, units_number, units_currency, cost_number, ...

bean-query deals with that for you every time, and provides short-hands to render journals and balances for example. It'll get even easier over time, as I want to add some niceties like implicit group-by clauses (there are very few uses for aggregation which differs from that which can be implied from its targets, e.g., in "SELECT account, month(date), sum(posting)" it's already obvious that 1, 2 is the grouping key).

What I want to do eventually is genericize bean-query to a more powerful SQL processor that could
- Act on a multitude of data sources, one of which would the data from Beancount. Others would include tables from HTML documents, attributes of files in directories, and many others.
- Support complex custom data types defined in pure Python, like Beancount's Amount, Position, and Inventory.
- Support repeated values generically, and use that on the Inventory type, flattening those rows into multiple rows with a single Position in each
- Support hierarchical references.
- Support recursive SQL queries.
The ultimate purpose of this tool will be to reach beyond Beancount and allow anyone to easily inject a command-line SQL processor onto many common data sources, think of it as a swiss knife using SQL. This simple tool would not be designed for scalability or high performance, but rather for convenience. It could occupy some of the same space as Pandas does in the Python world, and work on the command-line (SQL on the command-line is a convenient way to express some expressions that would otherwise be difficult using the battery of common UNIX tools).

Anyhow, I hope this offers some perspective on why there's bean-query in the first place. I'd love to be able to do without, but I don't think it's possible.

Martin Blais

unread,
May 22, 2016, 2:09:30 PM5/22/16
to Beancount
Reply all
Reply to author
Forward
0 new messages