A Query Language for Beancount

821 views
Skip to first unread message

Martin Blais

unread,
Sep 29, 2014, 4:02:29 AM9/29/14
to bean...@googlegroups.com, ledger-cli
Hi,

I've been doing some thinking about creating a query language for Beancount for a while, but until a few days ago I was left profoundly unsatisfied with the design I had in mind, but I think I've finally nailed down an idea that would allow me to remove all report types (i.e., no more "balances" (Ledger: bal), no more "journal" (Ledger: register)): by combining filtering of transactions with filtering of postings, as two separate clauses, in a SQL-like syntax where you get to specify the desired output columns and inventory aggregations. The target of the FROM clause is not a table name, but a set of filters applied to full transactions, and the WHERE clause applies to postings.

The ideas are a little bit fresh, so this proposal might feel a bit "stream of consciousness" and needs a bit more polishing.  See the "Two-Phase Filtering" section of this doc:

  https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing

I'd love some feedback and your thoughts (as always you can comment on the doc directly).

If this works as I imagine it would, I would remove support for all other report types, or perhaps create aliases for them to canned select queries.

Simon Michael

unread,
Oct 2, 2014, 8:55:05 PM10/2/14
to bean...@googlegroups.com, ledge...@googlegroups.com, hle...@googlegroups.com
On 9/28/14 10:02 PM, Martin Blais wrote:
> The ideas are a little bit fresh, so this proposal might feel a bit
> "stream of consciousness" and needs a bit more polishing.
...
> https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing

I haven't much to add today, except this is a very nice discussion - thanks.

Your "Gmail-style" query language describes hledger's current query
language well.

A more powerful SQL-ish query language does seem like a good idea,
especially one based on the nice Google Charts api linked in your comments.


Martin Blais

unread,
Oct 3, 2014, 2:37:12 AM10/3/14
to ledger-cli, bean...@googlegroups.com, hle...@googlegroups.com
On Thu, Oct 2, 2014 at 5:51 PM, Simon Michael <si...@joyful.com> wrote:
On 9/28/14 10:02 PM, Martin Blais wrote:
The ideas are a little bit fresh, so this proposal might feel a bit
"stream of consciousness" and needs a bit more polishing.
...
https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing

I haven't much to add today, except this is a very nice discussion - thanks.

Your "Gmail-style" query language describes hledger's current query language well.

I'm assuming you mean this: http://hledger.org/manual#query-arguments
Thanks for pointing it out, I hadn't noticed, it's indeed quite similar.

I think I lean heavily on the SQL version though, for the following reasons:
- I like the explicit nature of the transaction filter vs. postings filter being in two separate location.
- With aggregate operations defined on inventory objects, I'm able to fold both the journal and balance operations into a single syntax. This gets rid of the concept of "report" altogether, which is nice. There would be no more reports, just a query expression syntax (and perhaps some shortcuts to generate the reports using a prebaked query).
- I like to be able to specify the desired output columns. Along with aggregations, this will allow it to replace all the custom holdings reports I had been creating lately.

It's always important to me to state the reasons why this is not possible as a regular table: filtering has to be operable on complete transactions (with all legs), and operations on inventories of commodities have to be possible. These are the reasons that we cannot just spit out the data to some SQLite table and then use that (I'd like to see someone argue the opposite).




A more powerful SQL-ish query language does seem like a good idea, especially one based on the nice Google Charts api linked in your comments.

I'm not sure about the chart API, that's an idea that has been pointed out to me.

Wm...

unread,
Oct 18, 2014, 1:47:20 PM10/18/14
to bean...@googlegroups.com, ledge...@googlegroups.com
Thu, 2 Oct 2014 23:37:09
<CAK21+hMx7wgt0imEtHp_WTFu...@mail.gmail.com>
Martin Blais <bl...@furius.ca> wrote...

[not sub'd to hledger so dropped from ng list]

>It's always important to me to state the reasons why this is not possible
>as a regular table: filtering has to be operable on complete transactions
>(with all legs), and operations on inventories of commodities have to be
>possible. These are the reasons that we cannot just spit out the data to
>some SQLite table and then use that (I'd like to see someone argue the
>opposite).

Given that you seem to write your thoughts out I've missed the doc where
you say what the reasons are it can't be done in SQLite (or postgres or
mysql) tables. Bear in mind that GnuCash uses any of the 3 mentioned
dbs as storage methods alongside XML (cf plain text file).

Your own recently mentioned implementation (bean-sql) creates multiple
tables but it could actually be done in one table if you simply regard
SQLite3 (the overhead and fuss to new users mitigates against the
others) as a very efficient storage method and just take it from there.

Within a few minutes you will start seeing your data and wanting to make
some impression on it like "I want to put the transaction and
transaction lines seperately" ... which is what you did :)


--
Wm...

Martin Blais

unread,
Oct 18, 2014, 2:30:20 PM10/18/14
to Wm..., bean...@googlegroups.com, ledger-cli
On Sat, Oct 18, 2014 at 12:45 PM, Wm... <tcn...@tarrcity.demon.co.uk> wrote:
Thu, 2 Oct 2014 23:37:09 <CAK21+hMx7wgt0imEtHp_WTFuYaqLTPJ+OwFHM+wjPS_77s...@mail.gmail.com>

Martin Blais <bl...@furius.ca> wrote...

[not sub'd to hledger so dropped from ng list]

It's always important to me to state the reasons why this is not possible
as a regular table: filtering has to be operable on complete transactions
(with all legs), and operations on inventories of commodities have to be
possible. These are the reasons that we cannot just spit out the data to
some SQLite table and then use that (I'd like to see someone argue the
opposite).

Given that you seem to write your thoughts out I've missed the doc where you say what the reasons are it can't be done in SQLite (or postgres or mysql) tables.  Bear in mind that GnuCash uses any of the 3 mentioned dbs as storage methods alongside XML (cf plain text file).

I'm not so much interested in the storage method; a relational database of XML can represent the internals of Beancount alright, that's easy, the issue is making simple queries on tree-structured data that produces reports suitable for accounting, with operations on inventories of positions.

 

Your own recently mentioned implementation (bean-sql) creates multiple tables but it could actually be done in one table if you simply regard SQLite3 (the overhead and fuss to new users mitigates against the others) as a very efficient storage method and just take it from there.

No you can't.
How would you represent the two-level tree structure of "transactions <- 1:N -> postings" with
- a single relational table, and
- no duplication (otherwise it's a derived form and not suitable for representation)
?
If you have multiple tables, you end up having to do joins and deduplication.
With a single table, you have duplication of information.
At the moment I output separate tables for transactions and postings, which can be joined, but it still doesn't allow you to make great queries that involve the kind of filtering that is convenient for generating, e.g. balance sheets.

The Beancount Query Language will specifically address these issues by:
- allowing filtering to occur at two levels in a single query: 
  * filtering at the entries level (with predicates on postings allowed), plus 
  * filtering at the postings level
- support inventory arithmetic and aggregation functions on inventory objects and rendering functions (e.g., COST()).
- support flattening of multiple lots (to produce lists of holdings each with their cost basis)
- explicitly support the "close" operation as syntax, that inserts transactions to clear income statement accounts to equity and removes past history

I claim that with the features above you can explicitly model all the queries being done in Ledger, HLedger and Beancount.

 

Within a few minutes you will start seeing your data and wanting to make some impression on it like "I want to put the transaction and transaction lines seperately" ... which is what you did :)

I don't understand this sentence.



 



--
Wm...

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/tLKufVTCdqQUFwvv%40tarrcity.demon.co.uk.

For more options, visit https://groups.google.com/d/optout.

Wm...

unread,
Oct 19, 2014, 5:37:31 AM10/19/14
to bean...@googlegroups.com, ledge...@googlegroups.com
Sat, 18 Oct 2014 13:30:15
<CAK21+hOon=Aph+bBTSMwxw-5p08-Gy...@mail.gmail.com>
Martin Blais <bl...@furius.ca> wrote...

>On Sat, Oct 18, 2014 at 12:45 PM, Wm... <tcn...@tarrcity.demon.co.uk> wrote:
>
>> Thu, 2 Oct 2014 23:37:09 <CAK21+hMx7wgt0imEtHp_WTFuYaqLTPJ+OwFHM+wjPS_
>> 77s...@mail.gmail.com>
>> Martin Blais <bl...@furius.ca> wrote...
>>
>> [not sub'd to hledger so dropped from ng list]
>>
>> It's always important to me to state the reasons why this is not possible
>>> as a regular table: filtering has to be operable on complete transactions
>>> (with all legs), and operations on inventories of commodities have to be
>>> possible. These are the reasons that we cannot just spit out the data to
>>> some SQLite table and then use that (I'd like to see someone argue the
>>> opposite).
>>>
>>
>> Given that you seem to write your thoughts out I've missed the doc where
>> you say what the reasons are it can't be done in SQLite (or postgres or
>> mysql) tables. Bear in mind that GnuCash uses any of the 3 mentioned dbs
>> as storage methods alongside XML (cf plain text file).
>>
>
>I'm not so much interested in the storage method; a relational database of
>XML can represent the internals of Beancount alright, that's easy,

ok

>the
>issue is making simple queries on tree-structured data that produces
>reports suitable for accounting, with operations on inventories of
>positions.

I'm not sure I'm getting this. Surely most of the objects (point in
time, something, amount of that something, who had it before and who had
it afterwards) are easily defined and represented with a bit of glue
saying that-one-over-there and this-one-here belong together, let's call
it a transaction ?

Did you know that on a gnucash stored db
===
Select
Sum(splits.value_num),
Sum(splits.quantity_num)
From
splits
===
gives me 0, 0 ?

[1] see below

that *is* the simplest balance sheet

what is it you are struggling to represent ?

surely, in spite of what you say, it is complexity you want rather than
simplicity ?

>> Your own recently mentioned implementation (bean-sql) creates multiple
>> tables but it could actually be done in one table if you simply regard
>> SQLite3 (the overhead and fuss to new users mitigates against the others)
>> as a very efficient storage method and just take it from there.
>>
>
>No you can't.
>How would you represent the two-level tree structure of "transactions <-
>1:N -> postings" with
>- a single relational table, and
>- no duplication (otherwise it's a derived form and not suitable for
>representation)
>?

exactly as beancount and ledger-cli do now with line numbers as a kludge
to represent a transaction if you want to be perverse, otherwise you put
them next to each other or blob them, your choice, but not a problem
that says only a plain text file makes this better because I haven't
thought about it enough

contemporary dbs can easily split out stuff from a blob better than you
spending years refining some transactional thing that only you care
about, promise

>If you have multiple tables, you end up having to do joins and
>deduplication.

joins yes, deduplication no

why do you think deduplication would be necessary? genuinely puzzled

>With a single table, you have duplication of information.

nope, a single table could be just as simple as a plain text file

>At the moment I output separate tables for transactions and postings, which
>can be joined, but it still doesn't allow you to make great queries that
>involve the kind of filtering that is convenient for generating, e.g.
>balance sheets.

as above [1] , you are incorrect

>The Beancount Query Language will specifically address these issues by:
>- allowing filtering to occur at two levels in a single query:
> * filtering at the entries level (with predicates on postings allowed),
>plus

free with SQL

> * filtering at the postings level
>- support inventory arithmetic and aggregation functions on inventory
>objects and rendering functions (e.g., COST()).
>- support flattening of multiple lots (to produce lists of holdings each
>with their cost basis)

free with SQL if you know what you are doing

>- explicitly support the "close" operation as syntax, that inserts
>transactions to clear income statement accounts to equity and removes past
>history

Ah, now you see, I disagree with that from an accounting and
mathematical POV. I explicitly don't want that. Transactions are a
continuum over time, reporting on them may be necessary because one or
other government says so but does what actually happened change because
I have to state a position on a date to an authority ? I say that is a
report not a transaction, my accounts close when I die.

>I claim that with the features above you can explicitly model all the
>queries being done in Ledger, HLedger and Beancount.

I've read you say things like that before. Ho hum.

>> Within a few minutes you will start seeing your data and wanting to make
>> some impression on it like "I want to put the transaction and transaction
>> lines seperately" ... which is what you did :)
>
>
>I don't understand this sentence.

What is unclear?

you wrote bean-sql to represent a plain text beancount file as more than
one table, why ? because it represented the data well is my suggestion.

What I wonder about sometimes, Martin, is how much you have to be hit on
the head, in a nice way, of course.

--
Wm...

Martin Blais

unread,
Oct 19, 2014, 4:23:19 PM10/19/14
to Wm..., bean...@googlegroups.com, ledger-cli
On Sun, Oct 19, 2014 at 4:38 AM, Wm... <tcn...@tarrcity.demon.co.uk> wrote:
Sat, 18 Oct 2014 13:30:15 <CAK21+hOon=Aph+bBTSMwxw-5p08-GyrTC8sOXJZdmOpWZL8MSA@mail.gmail.com>

Martin Blais <bl...@furius.ca> wrote...

On Sat, Oct 18, 2014 at 12:45 PM, Wm... <tcn...@tarrcity.demon.co.uk> wrote:

Thu, 2 Oct 2014 23:37:09 <CAK21+hMx7wgt0imEtHp_WTFuYaqLTPJ+OwFHM+wjPS_
77s...@mail.gmail.com>
Martin Blais <bl...@furius.ca> wrote...

[not sub'd to hledger so dropped from ng list]

 It's always important to me to state the reasons why this is not possible
as a regular table: filtering has to be operable on complete transactions
(with all legs), and operations on inventories of commodities have to be
possible. These are the reasons that we cannot just spit out the data to
some SQLite table and then use that (I'd like to see someone argue the
opposite).


Given that you seem to write your thoughts out I've missed the doc where
you say what the reasons are it can't be done in SQLite (or postgres or
mysql) tables.  Bear in mind that GnuCash uses any of the 3 mentioned dbs
as storage methods alongside XML (cf plain text file).


I'm not so much interested in the storage method; a relational database of
XML can represent the internals of Beancount alright, that's easy,

ok

BTW I meant OR, not OF. A relational database, an XML file, a JSON file, and many more are suitable formats for this simple dataset. Representing the data is not the problem. Specifying outputs suitable to produce common accounting reports (in the face of cost basis, multiple currencies) is the minor challenge at hand.

 


the
issue is making simple queries on tree-structured data that produces
reports suitable for accounting, with operations on inventories of
positions.

I'm not sure I'm getting this.  Surely most of the objects (point in time, something, amount of that something, who had it before and who had it afterwards) are easily defined and represented with a bit of glue saying that-one-over-there and this-one-here belong together, let's call it a transaction ?

This is a simple and common database-related problem.

You have at least two tables:
One table contains information attached to the transaction.
The other table contains information attached to the posting.

Let's illustrate: let us simplify things grossly and say that transactions have only 

  (entry-id, date, flag, payee) 

fields and that postings have only 

  (posting-id, entry-id, account, amount, currency) 

fields. 

You can join the two tables on some id (easily doable) to obtain a table of postings plus their corresponding entry data.
If so, you get rows of

  (entry-id, date, flag, payee, posting-id, account, amount, currency)
  ...

The rub here is that you have duplication of the entry information. For example, given the following transaction:

  2014-09-12 * "WALMART"
    Assets:Checking    -45.00 USD
    Expenses:Toys    45.00 USD

You would obtain

  (1000, 2014-09-12, "*", "WALMART", 20000, "Assets:Checking", -45.00, "USD")
  (1000, 2014-09-12, "*", "WALMART", 20001, "Expenses:Toys", 45.00, "USD")

Do you not see that the entry fields are duplicated?
Ok, so you might think, this is not much of a problem.

Well, you _can_ generate many meaningful reports this way, but not all (read on). This is conceptually how Ledger works AFAIK: the filter applies against the list of postings, as they appear in the file (Ledgerfolk: please correct me if I'm wrong, this is how I understand it to be).



Deduplication

The problem I referred to as deduplication is that of combining rows of the same transaction. This is how you would normally want to render a journal's entries. For example:

  2014-09-12 * "WALMART"
    Assets:Checking    -45.00 USD
    Expenses:Toys    15.00 USD
    Expenses:Toys    30.00 USD

This should ideally render as a single entry in a journal for Expenses:Toys. Beancount's journal rendering routines do this automatically. A GROUP BY for all entry columns for a result set on a list of postings would be awkward.



Operations on Inventories

A more difficult problem is that the Ledgerverse systems admit any number and type of units to be stored and accumulated in association with an account. Accounts don't have associated currencies. This means that summing the changes defined by postings is not a simple arithmetic operation. Inventory objects are more like "mappings" that contains associations of (currency, cost, acquisition-date) to a number of units. Adding a new position has one of two effects: it either creates a new lot, or if there is already a lot for that (currency, cost, acquisition-date), it adds units to that lot. Reducing a position is handled differently by Ledger and Beancount: in Ledger, the operation works the same as with the addition of new lots, except with a negative number. It allows positive and negative lots in the same inventory. This naturally supports other methods of booking (you can average all the cost bases, for example) but has the unfortunate effect of admitting impossible situations, such as an inventory with a negative cost basis. Reducing a position in Beancount is more strict: it requires that an existing lot exist to be removed from. Therefore, it never allows for both positive and negative numbers for the same lot. This creates some complications of its own, in particular when attempting to book positions at average cost (the lots of an inventory need to be reaggregated) but has the advantage of preventing impossible situations and resulting only in correct inventories. Both methods have benefits and difficulties.

Such operations on inventories are difficult to model in SQL. I don't think it can be modeled easily. I think it's preferable to just provide a query engine that natively supports such an inventory object upon which positions can be aggregated.

My new syntax will allow the user to join on the resulting inventory positions to flatten them out to detailed lists of holdings (these are useful to compute attributes of portfolios). For example, if the balance of an account (or a group of accounts) is 20 GOOG, 25 AAPL and 30 MSFT, by default a result would output a single row like this:

    ....     20 GOOG, 25 AAPL, 30 MSFT

So the user should be able to provide a special option ("FLATTEN") like this:

  SELECT ... FROM ... WHERE ... FLATTEN;

to produce one result row for each lot, like this:

   .... 20 GOOG
   .... 25 AAPL
   .... 30 MSFT




Filtering on Entries

Another issue is that it's quite common to perform filtering on transactions separately from filtering on postings. This has implicitly been the default in Beancount. For instance, the most common thing I do is to bring up the web interface, then drill down in the particular year I'm interested in (this is a filter on all transactions for that year), and then to bring up a journal/register listing for a particular account (this is a filter on the postings). There are many use cases. Here's another one: filter to the list of transactions for a particular tag, e.g., transactions associated with a travel/trip, and then show all postings from accounts under "Expenses:Transportation:...".

Now, it is true that both the previous examples are selectable on a single joined table of postings & transactions (IF you have set of operations on the list of tags, for the last one). Here's one that is slightly more difficult: 

  "filter all the transactions that have at least one posting to an account which begins with "Assets:US:Vanguard" and report all postings associated with these transactions with an account beginning with 'Income'"

This is also doable, like this:

  sqlite> select account, sum(number), currency from postings where id in (select distinct t.id  from transactions as t join postings where account glob 'Assets:US:Vanguard:*') and account glob "Income:*" group by account, currency;

but it's slightly awkward. Since this is so common, I'd rather do something like this:

  SELECT account, sum(change) FROM has_account('Assets:US:Vanguard:*') WHERE account ~ "Income:*" group by account;

Finally, and more importantly, the ability to perform transformations on the list of transactions, such as the "close" operation I describe later in this email is impossible in SQL. This operation will be available via syntax like this:

  SELECT .... FROM .... CLOSE ON 2014-01-01  WHERE ...

In this example, "CLOSE ON 2014-01-01" means "summarize all the entries before Jan 1st 2014".




Did you know that on a gnucash stored db
===
Select
  Sum(splits.value_num),
  Sum(splits.quantity_num)
From
  splits
===
gives me 0, 0 ?

[1] see below

that *is* the simplest balance sheet

How useful is that balance sheet?
Are you able to deal with multiple currencies?
How about rendering the values at cost? At market value?
How about rendering of each lot independently?

I'm going to try to fiddle with GnuCash to understand more clearly how it tracks cost basis and what the resulting value and quantity numbers are and summarize it.



what is it you are struggling to represent ?

surely, in spite of what you say, it is complexity you want rather than simplicity ?

The minimal amount of complexity that will provide adequate reports.

(I suspect it's possible that your ledger is very simple and you might not have encountered any of the situations we're attempting to deal with.)



Your own recently mentioned implementation (bean-sql) creates multiple
tables but it could actually be done in one table if you simply regard
SQLite3 (the overhead and fuss to new users mitigates against the others)
as a very efficient storage method and just take it from there.


No you can't.
How would you represent the two-level tree structure of "transactions <-
1:N -> postings" with
- a single relational table, and
- no duplication (otherwise it's a derived form and not suitable for
representation)
?

exactly as beancount and ledger-cli do now with line numbers as a kludge to represent a transaction if you want to be perverse, otherwise you put them next to each other or blob them, your choice, but not a problem that says only a plain text file makes this better because I haven't thought about it enough

I don't understand what you're trying to say.


contemporary dbs can easily split out stuff from a blob better than you spending years refining some transactional thing that only you care about, promise

Years? Yesterday alone I wrote a pretty complete SQL parser and today I'm writing an interpreter for it.

Here:

You need to understand that despite its ubiquity and familiarity, SQL isn't a silver bullet for everything. It is by no means perfect. Its does not express hierarchical data formats well, which is partly why object databases like MongoDB are enjoying a fair amount of popularity (not so long ago all the rage was XML-based datastores with better defined schemas). You can also see traces of this in all the ways that people writing an access language for hierarchical formats have had to get creative in order to provide access to nested and repeated fields, e.g. http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf
Also see how the different SQL database implementations each have their own warts and limitations.

It is entirely reasonable to extend SQL and propose specializations for particular use cases--everyone does it. You seem to think that SQL is some sort of superior technology that has solved all data access specification problems. SQL is a way to specify access to your data, but it is still the case that application writers need to occasionally combine the output of multiple queries with some code to perform some operations. I think that with minimal extensions I can define a variant that works well for our little accounting database.



 
If you have multiple tables, you end up having to do joins and
deduplication.

joins yes, deduplication no

why do you think deduplication would be necessary?  genuinely puzzled

(See detailed explanation above with example about what I called "deduplication".)



With a single table, you have duplication of information.

nope, a single table could be just as simple as a plain text file

At the moment I output separate tables for transactions and postings, which
can be joined, but it still doesn't allow you to make great queries that
involve the kind of filtering that is convenient for generating, e.g.
balance sheets.

as above [1] , you are incorrect

(I think I expressed myself clearly enough with the above.)



The Beancount Query Language will specifically address these issues by:
- allowing filtering to occur at two levels in a single query:
 * filtering at the entries level (with predicates on postings allowed),
plus

free with SQL

 * filtering at the postings level
- support inventory arithmetic and aggregation functions on inventory
objects and rendering functions (e.g., COST()).
- support flattening of multiple lots (to produce lists of holdings each
with their cost basis)

free with SQL if you know what you are doing

I explained the problems above.

If you prefer to use straight-up SQL, bean-sql creates the corresponding database for you and you can use that if you wish. Ledger has an equivalent conversion to SQL (I haven't looked at the tables it produces but it's fair to assume there's more than one table). You're welcome to copy (within the constraints of GPL) and change the script to make it output its tables differently if you wish, this should be simple. In fact, we welcome this kind of experimentation. Show us how you can generate all the same reports using SQL only.



- explicitly support the "close" operation as syntax, that inserts
transactions to clear income statement accounts to equity and removes past
history

Ah, now you see, I disagree with that from an accounting and mathematical POV.  I explicitly don't want that.  Transactions are a continuum over time, reporting on them may be necessary because one or other government says so but does what actually happened change because I have to state a position on a date to an authority ?  I say that is a report not a transaction, my accounts close when I die.

I think I might need to explain more clearly what I mean by "close" here. You read this wrong. I'm referring to previous discussions that involved "closing the year." The process of "closing" in Beancount refers to 

- the insertion of transactions at a particular date that transfer the balances of the Income and Expenses accounts to an Equity account (previous earnings) to zero them out

- the insertion of a conversion entry to account for changes due to currency conversion (to address the same problem you've been referring to as "Trading accounts")

- the removal of previous transactions before the given date and simultaneous insertion of "opening balances" transactions that maintains the initial balance of those accounts

The idea is that after closing at a particular date, you have (1) removed history and replaced it by summarizing transactions that initialize the balances correctly and (2) income statement accounts are zero'ed out. This results in a list of entries prepared to report on an exercise period that begins on the given date.

You can find the source code in this file:

I'm going to explain this operation with some simple diagrams in a presentation I'm preparing. This is not experimental work... I've been generating balance sheets and income statements with this process on my entries for years now. This works well.


 
I claim that with the features above you can explicitly model all the
queries being done in Ledger, HLedger and Beancount.

I've read you say things like that before. Ho hum.

So the claim here is an interesting one actually, it's not just me "saying things like that": so far, Ledger, Beancount and HLedger all have produced reports generated by separate custom procedures for producing "balances" and "journals" (or "registers" as Ledger calls them). What I'm claiming is that we can _unify_ the generation of both types of reports from a single syntax, simply by 

(1) supporting aggregation functions that work on the inventory container objects that we use internally to sum up position changes from postings, and
(2) by providing access to a *special* column (I call it "balance"), whose value is dynamically computed to be the running sum of all the postings rendered so far.

A "journal" just iterates over its postings, optionally rendering the running balance by selecting the "balance" column:

  SELECT date, account, change, balance FROM ...

A "balance" report is simply an aggregation on the position of each posting:

  SELECT account, sum(change) FROM ... GROUP BY account

So the end result should be that instead of having multiple "custom" report codes, we should be able to define a single query language that addresses both types of outputs ... and possibly many more. I think there's potentially great power hiding in that generalization!



Within a few minutes you will start seeing your data and wanting to make
some impression on it like "I want to put the transaction and transaction
lines seperately" ... which is what you did :)


I don't understand this sentence.

What is unclear?

you wrote bean-sql to represent a plain text beancount file as more than one table, why ? because it represented the data well is my suggestion.

I still don't understand what you intend to say with this, but see my example above. It should be clear that two tables with a join is the minimum required to represent the data correctly.

So here's the thing: we're interested in solving problems like these:
- Computing returns on portfolios in the face of contributions and distributions, including commissions and other costs;
- Automatically handling cost basis for washed sales;
- Tracking of invoices and their related payments;
- Tracking investments with average cost basis;
and much, much more.

Do you have any solutions for problems like these?
If so, can we see some code?
How about some ideas in a sufficient amount of detail we can discuss them?

You're welcome to complain, but unless you provide sufficient and specific detail so we can have a technical conversation about the problems (and not the people), it's not very productive. Coded solutions are best.


Wm...

unread,
Oct 19, 2014, 10:15:08 PM10/19/14
to bean...@googlegroups.com, ledge...@googlegroups.com
Sun, 19 Oct 2014 15:23:10
<CAK21+hMYzN3BXaCJPV++RxZCm=h7fUxSNXWgQj...@mail.gmail.com>
Martin Blais <bl...@furius.ca> wrote...

Wm:
Only if you throw in unnecessary stuff, a query need not include junk

>Do you not see that the entry fields are duplicated?

Nope, they aren't duplicated

>Ok, so you might think, this is not much of a problem.

It is worse than that, I see no duplication at all. I see legs / splits
/ whatever you want to call them of a transaction.

>Well, you _can_ generate many meaningful reports this way, but not all
>(read on). This is conceptually how Ledger works AFAIK: the filter applies
>against the list of postings, as they appear in the file (Ledgerfolk:
>please correct me if I'm wrong, this is how I understand it to be).

STOP: you are arguing beancount think vs ledger-cli think

I am not part of that argument!

>Deduplication
>
>The problem I referred to as deduplication is that of combining rows of the
>same transaction. This is how you would normally want to render a journal's
>entries. For example:
>
> 2014-09-12 * "WALMART"
> Assets:Checking -45.00 USD
> Expenses:Toys 15.00 USD
> Expenses:Toys 30.00 USD
>
>This should ideally render as a single entry in a journal for
>Expenses:Toys. Beancount's journal rendering routines do this
>automatically. A GROUP BY for all entry columns for a result set on a list
>of postings would be awkward.

Puzzled: why do you think SQL would struggle with this?

I admire your thinking in the sense that you seem to ignore input from
the real world but don't understand why you can't see that a query can
produce a result. Is it possible you are unfamiliar with SQL? <-- a yes
or no will do

>Operations on Inventories
>
>A more difficult problem is that the Ledgerverse systems admit any number
>and type of units to be stored and accumulated in association with an
>account. Accounts don't have associated currencies. This means that summing
>the changes defined by postings is not a simple arithmetic operation.

yup, you can have sums of something if you want, all you have to do is
say there is a something associated with a number

>Inventory objects are more like "mappings" that contains associations of
>(currency, cost, acquisition-date) to a number of units. Adding a new
>position has one of two effects: it either creates a new lot, or if there
>is already a lot for that (currency, cost, acquisition-date), it adds units
>to that lot. Reducing a position is handled differently by Ledger and
>Beancount:

... and don't we know it! Do you think I'd be here if I didn't already
know about the differences? FFS, man!

>in Ledger, the operation works the same as with the addition of
>new lots, except with a negative number. It allows positive and negative
>lots in the same inventory. This naturally supports other methods of
>booking (you can average all the cost bases, for example) but has the
>unfortunate effect of admitting impossible situations, such as an inventory
>with a negative cost basis.

Not impossible if you have bought stuff that is yet to be delivered.
Ordinary business.

> Reducing a position in Beancount is more
>strict:

we know that, I am asking why are you going to all this trouble? I
think it is something personal because it doesn't reflect ordinary
commerce

>it requires that an existing lot exist to be removed from.

nope

>Therefore, it never allows for both positive and negative numbers for the
>same lot. This creates some complications of its own, in particular when
>attempting to book positions at average cost (the lots of an inventory need
>to be reaggregated) but has the advantage of preventing impossible
>situations and resulting only in correct inventories. Both methods have
>benefits and difficulties.

I think you are much to concerned with exactness. There is a name for
this but I like you enough not to say it.

>Such operations on inventories are difficult to model in SQL. I don't think
>it can be modeled easily. I think it's preferable to just provide a query
>engine that natively supports such an inventory object upon which positions
>can be aggregated.

OK, surely the limitation here is your imagination rather than any
practical db implementation?

>My new syntax

I've heard that once to many times and is the reason I won't encourage
people to maintain anything --> beancount. in simple terms you can't
make up your mind what you want

>will allow the user to join on the resulting inventory
>positions to flatten them out to detailed lists of holdings (these are
>useful to compute attributes of portfolios). For example, if the balance of
>an account (or a group of accounts) is 20 GOOG, 25 AAPL and 30 MSFT, by
>default a result would output a single row like this:
>
> .... 20 GOOG, 25 AAPL, 30 MSFT

it shouldn't, actually

>So the user should be able to provide a special option ("FLATTEN") like
>this:
>
> SELECT ... FROM ... WHERE ... FLATTEN;
>
>to produce one result row for each lot, like this:
>
> .... 20 GOOG
> .... 25 AAPL
> .... 30 MSFT

which SQL would do naturally

What are you trying to do?

I stopped reading, sorry

I'll try and find the time tomorrow to read the rest but to be honest I
don't see your point. I raise the issue of, "if no-one is getting my
point should I still be making it after a point in time ?"

Unless I have missed something (and I have tried reading what you have
said) you aren't solving any real world accounting problems.

--
Wm...

Reply all
Reply to author
Forward
0 new messages