Scaling Ledger - Automation

Russell Adams

Feb 24, 2012, 9:59:18 PM

to ledge...@googlegroups.com

Of the many issues I've had "scaling up", automation has been
fairly easy for my specific case. It's worth bringing up because it is
unlikely that a large Ledger would be entirely written by
hand. Whether you are dealing with stock values or bank and credit
card statements, automation ought to be the first priority.

Ledger's goal is to provide reporting on the data files, but creating
those files is left as an exercise to the user. Perhaps this is
another place where a UI could be useful, as an editor that
complements the command line reporting.

I'm surprised to hear from many Ledger users that they create so many
txns by hand. When forced to enter one manually, I use yasnippet, but
99% of my txns are created programmatically.

I utilize a single credit card as often as practical while traveling,
so that I can import that data reliably from my bank. Using this as my
primary data feed ensures I catch any unusual transactions (e.g., fraud,
cancellation fees, etc.).

I wrote CSV2Ledger to automate the import of CSV data into the Ledger
format, and to automate as much account, category, file and metadata
matching as I possibly could. This is such a common task that Ledger
and Hledger have some new automation options, and there are many
competing projects for importing data into Ledger.
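
As a rough illustration of the kind of rule matching described above, here is a minimal Python sketch. It is not how CSV2Ledger (a Perl program) is implemented. The column names `Date`, `Description` and `Amount`, the rule table, and every account name are hypothetical.

```python
import csv
import io
import re
from decimal import Decimal

# Hypothetical rules: the first regex that matches the description wins.
RULES = [
    (re.compile(r"SHELL|EXXON", re.I), "Expenses:Auto:Fuel"),
    (re.compile(r"HILTON|MARRIOTT", re.I), "Expenses:Travel:Lodging"),
]
DEFAULT_ACCOUNT = "Expenses:Unknown"
SOURCE_ACCOUNT = "Liabilities:CreditCard"  # assumed funding account


def match_account(description):
    """Return the account of the first rule matching the description."""
    for pattern, account in RULES:
        if pattern.search(description):
            return account
    return DEFAULT_ACCOUNT


def row_to_txn(row):
    """Format one CSV row (a dict) as a Ledger transaction."""
    account = match_account(row["Description"])
    amount = Decimal(row["Amount"])
    return (
        f"{row['Date']} {row['Description']}\n"
        f"    {account}  ${amount:.2f}\n"
        f"    {SOURCE_ACCOUNT}\n"
    )


def convert(csv_text):
    """Convert CSV text with a header row into Ledger transactions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(row_to_txn(row) for row in reader)
```

Rows that match no rule land in `Expenses:Unknown`, which makes them easy to find and fix by hand in the queue file.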

Automation requires a well-defined storage method. In my case, I
import only into a single transient queue file, then manually relocate
txns to permanent storage spread across multiple files.

I'm heavily dependent on deduplication because the CSV files I download
often have overlapping date ranges. For this purpose I added to CSV2Ledger
the ability to tag each txn with an md5sum of the original CSV, and I use
rough text matching (i.e., grep) with optional caching to prevent
duplication.
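
The deduplication idea can be sketched in Python as follows. This is an illustration under assumptions, not CSV2Ledger's actual code: it assumes the digest is taken over each raw CSV line and stored in a comment of the form `; MD5Sum: <hex>`, and the tag name and all function names are hypothetical.

```python
import hashlib
import re

# Assumed tag format: a Ledger comment line such as "    ; MD5Sum: <hex>".
TAG_RE = re.compile(r";\s*MD5Sum:\s*([0-9a-f]{32})")


def line_digest(csv_line):
    """Hash the raw CSV line, ignoring surrounding whitespace."""
    return hashlib.md5(csv_line.strip().encode("utf-8")).hexdigest()


def known_digests(ledger_text):
    """Collect every digest already recorded in existing ledger text."""
    return set(TAG_RE.findall(ledger_text))


def new_lines(csv_lines, ledger_text):
    """Yield (line, digest) pairs for CSV lines not yet imported."""
    seen = known_digests(ledger_text)
    for line in csv_lines:
        digest = line_digest(line)
        if digest not in seen:
            seen.add(digest)  # also drops repeats within this batch
            yield line, digest
```

One caveat of hashing the raw line: two genuinely separate purchases that produce byte-identical CSV lines are treated as duplicates, so this check is only as reliable as the bank's export is distinctive.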

I hope to look into using OFX files later and the unique ID they
assign each txn at the bank instead of md5, but that's a future item.
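
For reference, the per-transaction ID in OFX is carried in the `<FITID>` element. A minimal sketch of pulling those IDs out of a file with a regular expression might look like this; the function name is hypothetical, and a real importer would more likely use a proper OFX parser.

```python
import re

# <FITID> is the OFX element holding the institution's transaction ID.
# OFX 1.x is SGML and often omits closing tags, so read up to the next
# "<" or end of line rather than expecting </FITID>.
FITID_RE = re.compile(r"<FITID>([^<\r\n]+)")


def fitids(ofx_text):
    """Return the transaction IDs found in OFX text, in file order."""
    return [m.strip() for m in FITID_RE.findall(ofx_text)]
```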

------------------------------------------------------------------
Russell Adams RLA...@AdamsInfoServ.com

PGP Key ID: 0x1160DCB3 http://www.adamsinfoserv.com/

Fingerprint: 1723 D8CA 4280 1EC9 557F 66E8 1154 E018 1160 DCB3

Zack Williams

Feb 25, 2012, 12:21:56 AM

to ledge...@googlegroups.com

On Fri, Feb 24, 2012 at 7:59 PM, Russell Adams
<RLA...@adamsinfoserv.com> wrote:
> Of the many issues I've had "scaling up", automation has been
> fairly easy for my specific case. It's worth bringing up because it is
> unlikely that a large Ledger would be entirely written by
> hand. Whether you are dealing with stock values or bank and credit
> card statements, automation ought to be the first priority.
>
> Ledger's goal is to provide reporting on the data files, but creating
> those files is left as an exercise to the user. Perhaps this is
> another place where a UI could be useful, as an editor that
> complements the command line reporting.

I tend to agree with you here. I've got a similar collection of
scripts of various kinds, ranging from simple automation to a
semi-complete scanned pdf -> tesseract OCR -> fuzzy matching
classification engine -> MacRuby file sorting gui, which I haven't
hooked up to matching ledger transactions yet.

There's definitely a place out there for a "munge ledger files in
interesting ways" tool: for example, one that could split or sort a
ledger file by some criterion, or invert all transactions in a ledger
file. While this violates the idea of ledger files standing on their
own as input only to ledger calculation programs, there's nothing to
say that a text editor is the only suitable tool for modifying that
input.
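
A small sketch of what such a tool could look like in Python, under a big simplifying assumption: every transaction is a blank-line-separated block whose first line starts with a `YYYY/MM/DD` or `YYYY-MM-DD` date. All function names here are hypothetical.

```python
import re

DATE_RE = re.compile(r"^\d{4}[/-]\d{2}[/-]\d{2}")


def split_txns(text):
    """Split ledger text into blank-line-separated blocks."""
    return [b.strip("\n") for b in re.split(r"\n\s*\n", text) if b.strip()]


def split_by(text, predicate):
    """Partition blocks into (matching, rest) using predicate(block)."""
    blocks = split_txns(text)
    return ([b for b in blocks if predicate(b)],
            [b for b in blocks if not predicate(b)])


def sort_by_date(text):
    """Sort dated blocks by date; undated blocks are kept at the top."""
    blocks = split_txns(text)
    dated = [b for b in blocks if DATE_RE.match(b)]
    other = [b for b in blocks if not DATE_RE.match(b)]
    # Normalize the separator so mixed "/" and "-" dates compare correctly.
    dated.sort(key=lambda b: DATE_RE.match(b).group(0).replace("-", "/"))
    return "\n\n".join(other + dated) + "\n"
```

Because Python's sort is stable, transactions sharing a date keep their original relative order. A block that begins with a comment line rather than a date is treated as undated, which is one place this sketch would need more care in real use.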

> I utilize a single credit card as often as practical while traveling,
> so that I can import that data reliably from my bank. Using this as my
> primary data feed ensures I catch any unusual transactions (e.g., fraud,
> cancellation fees, etc.).
>
> I wrote CSV2Ledger to automate the import of CSV data into the Ledger
> format, and to automate as much account, category, file and metadata
> matching as I possibly could. This is such a common task that Ledger
> and Hledger have some new automation options, and there are many
> competing projects for importing data into Ledger.

Have you tried hledger's CSV conversion? I tried both, and while
CSV2ledger has more features, I found I preferred hledger's single
configuration file, and the fact that it doesn't modify that file when
used.

> I'm heavily dependent on deduplication because the CSV files I download
> often have overlapping date ranges. For this purpose I added to CSV2Ledger
> the ability to tag each txn with an md5sum of the original CSV, and I use
> rough text matching (i.e., grep) with optional caching to prevent
> duplication.

A side note: there's a two-line pull request from me on GitHub that
switches the CSV2ledger code to use hashlib instead of md5, which is
deprecated and throws warnings in recent versions of Python.
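
For readers unfamiliar with the change, the switch amounts to something like the following. This is a generic sketch rather than the actual patch, and the helper name is hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest via hashlib.

    Replaces the older spelling `md5.new(data).hexdigest()` from the
    deprecated standalone `md5` module.
    """
    return hashlib.md5(data).hexdigest()
```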

- Zack

Russell Adams

Feb 25, 2012, 1:39:23 AM

to ledge...@googlegroups.com

On Fri, Feb 24, 2012 at 10:21:56PM -0700, Zack Williams wrote:
> On Fri, Feb 24, 2012 at 7:59 PM, Russell Adams
> <RLA...@adamsinfoserv.com> wrote:
> > Of the many issues I've had "scaling up", automation has been
> > fairly easy for my specific case. It's worth bringing up because it is
> > unlikely that a large Ledger would be entirely written by
> > hand. Whether you are dealing with stock values or bank and credit
> > card statements, automation ought to be the first priority.
> >
> > Ledger's goal is to provide reporting on the data files, but creating
> > those files is left as an exercise to the user. Perhaps this is
> > another place where a UI could be useful, as an editor that
> > complements the command line reporting.

I've got code atm that picks txns by paragraph for reorganizing, but
that's a good point that we may just need better bulk manipulation tools.

Nope. CSV2Ledger does almost everything I need. It's a generic
rule-matching and transformation engine, so it's quite versatile.

>
> > I'm heavily dependent on deduplication because the CSV files I download
> > often have overlapping date ranges. For this purpose I added to CSV2Ledger
> > the ability to tag each txn with an md5sum of the original CSV, and I use
> > rough text matching (i.e., grep) with optional caching to prevent
> > duplication.
>
> A side note: there's a two-line pull request from me on GitHub that
> switches the CSV2ledger code to use hashlib instead of md5, which is
> deprecated and throws warnings in recent versions of Python.
>
> - Zack
>

CSV2Ledger's on Launchpad, and in Perl... Sure you're thinking of the
right one?

Thanks.

Zack Williams

Feb 25, 2012, 9:19:01 AM

to ledge...@googlegroups.com

> CSV2Ledger's on Launchpad, and in Perl... Sure you're thinking of the
> right one?

Ah, my bad - I was thinking of icsv2ledger. Sorry, it's been a long week.
Also, the md5 thing threw me off, as both scripts tag transactions
with it.