An interesting "feature by coincidence"

480 views
Skip to first unread message

John Wiegley

unread,
Apr 13, 2014, 3:14:25 PM4/13/14
to ledge...@googlegroups.com
Lately I've taken to splitting transactions from financial institutions into
their own files, rather than having everything piled into one gigantic file.
This allows me to adopt a "one-file-per-statement" approach which makes heavy
reconciling jobs a lot easier. I take the approach of asserting at the
beginning and end of each file that it matches the opening and ending balances
on the corresponding statement.

However, there was a key limitation to this approach when dealing with balance
transfers. For example, in Checking.dat file I might have a credit card
payment, while in MasterCard.dat I also have the same payment. I need to
represent it on both sides for the opening/ending balances to match.

So, a little while back I added the notion of "UUID" metadata tags, with some
special magic which says:

- If two transactions have the same UUID, they must also have the same
number of postings, to the same accounts, in the same amounts. The date,
payee and metadata are not checked for the notion of "transactional
identity".

- Only keep the first such transaction.

What I didn't know until today is that *balance assertions are performed
separately in each of these equivalent transactions*. So even though Ledger's
reporting behaves as though it only saw the first such transactions, each
separate transaction with the same UUID can have its own balance assertions.

I found this out because I sat down today to add this very feature to Ledger.
My first step was to write a test proving that it didn't work, when in fact it
is doing just what I needed it to do!

Here's what this looks like in code:

2012-01-01 Test
Expenses:Unknown $100.00
Liabilities:MasterCard

2012-01-02 Test
Expenses:Unknown $100.00
Liabilities:MasterCard

2012-01-03 Test
Expenses:Unknown $100.00
Liabilities:MasterCard

2012-01-04 Test
; UUID: foo
Liabilities:MasterCard $300.00 = $0
Assets:Checking

2012-01-01 Test
Assets:Checking $150.00
Income

2012-01-02 Test
Assets:Checking $150.00
Income

2012-01-03 Test
Assets:Checking $150.00
Income

2012-01-04 Test
; UUID: foo
Liabilities:MasterCard $300.00
Assets:Checking $-300.00 = $150.00

Both balance assertions are in effect, which you can see by changing either
one of them. And yet only the first transaction is present in the final
report.

John

John Wiegley

unread,
Apr 13, 2014, 8:54:55 PM4/13/14
to ledge...@googlegroups.com
>>>>> John Wiegley <jo...@newartisans.com> writes:

> I found this out because I sat down today to add this very feature to
> Ledger. My first step was to write a test proving that it didn't work, when
> in fact it is doing just what I needed it to do!

Found a slight wrinkle here: It works fine for one transaction, as in the
test, but when there are multiple, they collide with each other (because the
system doesn't know yet that a UUID will occur multiple times).

I think this will mean postponing balance assertion checking until the very
end of the parse, so that I can be sure to have resolved all identities.

I also want to add the following:

- A flag to ignore balance assertions altogether. This can be useful when
trying to track something down.

- If multiple entries with the same UUID are found, merged their metadata in
some way.

John

John Wiegley

unread,
Apr 14, 2014, 12:19:56 AM4/14/14
to ledge...@googlegroups.com
>>>>> John Wiegley <jo...@newartisans.com> writes:

> - A flag to ignore balance assertions altogether. This can be useful when
> trying to track something down.

This is now an additional meaning applied to --permissive.

> - If multiple entries with the same UUID are found, merged their metadata in
> some way.

Still not sure how to handle this in any correct way, so I'm deferring in
order to think about it.

The solution I had to come up with to properly support cross-file balance
assertions was to add a new concept: deferred postings. For example:

2014/04/13 Foo
; UUID: foouuid
Liabilities:Credit Card $100 = some value
<Assets:Checking>

and then later:

2014/04/13 Foo
; UUID: foouuid
Liabilities:Credit Card $100
Assets:Checking = some value

What happens is that when the first transaction is seen, the postings whose
accounts are marked with angle brackets are not applied at that time. It is
as if they had not been seen, except they will be used for balancing
calculations, as usual.

Later, when another transaction of the same UUID is seen, deferred postings
from that UUID, for the posted account, are applied as if they had been seen
then. Note: it applies the previously seen posting, not the posting from the
later transaction. Doing so would be a bit more complex, although I'm open to
thinking of a safe way to do it.

Lastly, I intend to add an error message if there are any "leftover" deferred
transaction at the end of parsing the journal.

John

Alexandre Rademaker

unread,
Apr 14, 2014, 7:40:44 AM4/14/14
to ledge...@googlegroups.com
Hello John,

What do you mean by one file per statement? Could you elaborate on that?

Alexandre
Sent from my iPhone

Craig Earls

unread,
Apr 14, 2014, 7:48:18 AM4/14/14
to ledge...@googlegroups.com, ledge...@googlegroups.com
He means using a separate text file for each banking institution instead of throwing all transactions into a single large file. 
--
Craig


--

---
You received this message because you are subscribed to the Google Groups "Ledger" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


John Wiegley

unread,
Apr 14, 2014, 9:30:10 AM4/14/14
to ledge...@googlegroups.com
>>>>> Alexandre Rademaker <arade...@gmail.com> writes:

> What do you mean by one file per statement? Could you elaborate on that?

For example, I have these files now:

include MasterCard/2014/140117.dat
include MasterCard/2014/140214.dat
include MasterCard/2014/140318.dat

One file for each banking statement sent to me from my institution, so that
the file concludes with an assertions that the closing balances match.

John

Zack Williams

unread,
Apr 14, 2014, 11:49:51 AM4/14/14
to ledge...@googlegroups.com
On Sun, Apr 13, 2014 at 12:14 PM, John Wiegley <jo...@newartisans.com> wrote:
> However, there was a key limitation to this approach when dealing with balance
> transfers. For example, in Checking.dat file I might have a credit card
> payment, while in MasterCard.dat I also have the same payment. I need to
> represent it on both sides for the opening/ending balances to match.

Given the same problem (multiple files, one per statement, same
transaction in multiple) I came up with a Transfer" account that
balances out to zero when both sides of a transaction exist:

https://groups.google.com/forum/#!topic/ledger-cli/K7EgJQuEQ_M

It isn't as elegant as this solution (I'd love to have a single
transaction per transaction), but avoids having to apply UUID's to
things - is that a manual or automatic step in your process?

- Zack

Johann Klähn

unread,
Apr 14, 2014, 4:32:40 PM4/14/14
to ledge...@googlegroups.com
On Mon, Apr 14, 2014 at 5:49 PM, Zack Williams <zdw...@gmail.com> wrote:
It isn't as elegant as this solution (I'd love to have a single
transaction per transaction), but avoids having to apply UUID's to
things - is that a manual or automatic step in your process?

The ledger 'convert' command will automatically create an UUID by hashing the CSV line that was used to create that transaction.
When provided with account and payee definitions the convert command works really well:

account Aufwand:Einkauf:Lebensmittel
    payee ^(Aldi|Alnatura|Kaufland|REWE)$
payee Aldi
    alias ^ALDI SUED SAGT DANKE

In this example "ALDI SUED SAGT DANKE" appears on my bank statement but I want "Aldi" in my ledger.
When I put the above in my ledger file and call `ledger convert CSV_FILE -f JOURNAL --account "Aktiva:Bank" --rich-data --invert`
(plus some fighting with the csv format my bank uses) the output will be something like:

2013/05/21 * Aldi
    ; CSV: 21.05.2013,21.05.2013,ALDI SUED SAGT DANKE,…etc…
    ; Imported: 2014/04/14
    ; UUID: 5011a74eb…58aa38a71fdc6475
    Aufwand:Einkauf:Lebensmittel              16,37€
    Aktiva:Bank                                     -16,37€ = …account balance…

Stefano Zacchiroli

unread,
Apr 14, 2014, 4:41:11 PM4/14/14
to ledge...@googlegroups.com
On Mon, Apr 14, 2014 at 08:49:51AM -0700, Zack Williams wrote:
> Given the same problem (multiple files, one per statement, same
> transaction in multiple) I came up with a Transfer" account that
> balances out to zero when both sides of a transaction exist:
>
> https://groups.google.com/forum/#!topic/ledger-cli/K7EgJQuEQ_M

Interesting. I stumbled upon this very same problem just yesterday and
ended up, at first, implementing your solution --- even though I used an
"Accounts payable:Credit card" account, because in my case I have a
significant delay between the transaction listed on my credit card
statement and the corresponding one in my bank statement, so that seemed
to make sense.

After reading John's post today, I've switch to his solution, which
looks more elegant. However, I've also noticed a drawback: ledger
implements the UUID trick, but hledger doesn't. In a sense, the transfer
account solution is more portable.

FWIW, my current use case for hledger is to expose (on a private
website) transactions to my family, whereas I use ledger-cli for my own
processing of the very same transactions. If there are better solutions
to expose (even read-only) ledger transactions on the web, I'd like to
hear about them.

Cheers.
--
Stefano Zacchiroli . . . . . . . za...@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

Zack Williams

unread,
Apr 14, 2014, 6:07:55 PM4/14/14
to ledge...@googlegroups.com
On Mon, Apr 14, 2014 at 1:32 PM, Johann Klähn <kljo...@gmail.com> wrote:
> The ledger 'convert' command will automatically create an UUID by hashing
> the CSV line that was used to create that transaction.

Wouldn't the "UUID" be different if the lines were different? I'd
imagine that the line in the CSV from a credit card company wouldn't
be identical to the same transaction as provided by a bank. Or do you
need to manually copy the UUID into both files?

As an aside, if this is how UUID's are used in ledger, it's going to
confuse people that are used to UUID's being unique, randomly
generated, and meaningless (ala those generated by "uuidgen" or
similar), rather than meaningful matchable hashes of content. Maybe
calling them a "hash" or even better using the hash method as the name
("md5", "sha1", etc.) would help disambiguate the situation.

- Zack

Zack Williams

unread,
Apr 14, 2014, 6:43:09 PM4/14/14
to ledge...@googlegroups.com
On Mon, Apr 14, 2014 at 3:07 PM, Zack Williams <zdw...@gmail.com> wrote:
> As an aside, if this is how UUID's are used in ledger, it's going to
> confuse people that are used to UUID's being unique, randomly
> generated, and meaningless (ala those generated by "uuidgen" or
> similar), rather than meaningful matchable hashes of content. Maybe
> calling them a "hash" or even better using the hash method as the name
> ("md5", "sha1", etc.) would help disambiguate the situation.

Or maybe just "id" - either "hash" or "UUID" is are overly specific,
and if all that's being done here is a straight text comparison (per
John's "foo" example), the SQL-esque "id" (or even "uid") would be
much less confusing.

- Zack
Reply all
Reply to author
Forward
0 new messages