best way to slurp bank records?

264 views
Skip to first unread message

Eric Abrahamsen

unread,
Sep 23, 2012, 9:34:48 AM9/23/12
to ledge...@googlegroups.com
I've got some text files from my Chinese banks formatted according to
their own system. I'm planning on just using Python to slurp the file
in, but I'll need to write it out to ledger format. What's the best way
of doing that? I didn't see anything in the /python or /contrib folders
that look like ledger writers. Should I just use a string template?

Given that I'm conversant in python and bash, and not likely to be
learning any C/C++ anytime soon (though I can copy-and-paste with the
best of them), what's my best option for writing a ledger file from
python/bash data?

Thanks!
Eric

Zack Williams

unread,
Sep 24, 2012, 12:51:35 AM9/24/12
to ledge...@googlegroups.com
On Sun, Sep 23, 2012 at 6:34 AM, Eric Abrahamsen
<er...@ericabrahamsen.net> wrote:
> I've got some text files from my Chinese banks formatted according to
> their own system. I'm planning on just using Python to slurp the file
> in, but I'll need to write it out to ledger format. What's the best way
> of doing that? I didn't see anything in the /python or /contrib folders
> that look like ledger writers. Should I just use a string template?

I personally separated the tasks - I have one script that goes from
input file (in my case, PDF's from banks) to a CSV file, then another
script that goes from CSV to ledger. There's a plethora of tools
that go from CSV to ledger.

> Given that I'm conversant in python and bash, and not likely to be
> learning any C/C++ anytime soon (though I can copy-and-paste with the
> best of them), what's my best option for writing a ledger file from
> python/bash data?

I'd figure out whatever templating library you like in Python if you
want to handle that portion of it.

If you want something to start with, I wrote my own in ruby (mainly
because it has the ERB templating engine built in):
https://github.com/zdw/rubycsv

I was using hledger for this, but needed to generate complex
multi-line entries, such as ones with purchase/tax/shipping broken out
from a single CSV line. These are common problems when dealing with
payment processors like Square and Paypal.

- Zack

thierry

unread,
Sep 24, 2012, 6:19:46 PM9/24/12
to ledge...@googlegroups.com
Hi Eric,

I personally use a homemade perl script based on pdftotext, which then parse the text output and write ledger syntax.

As suggested by Zack, I could have done PDF to CSV and then using a CSV to Ledger tool. But none satisfied my needs.
Anyway, here is my personal notes about some CSV to Ledger tools. Pros are noted '+', Cons are noted '-'.

Included in John Wiegley's ledger, there is the "convert" command
See ./test/baseline/cmd-convert.test
- can not rename payee

Included in Simon Michael's hledger, csv files can be imported natively
http://hledger.org/MANUAL.html#other-file-formats
+ can rename payee

csv2ledger, from Russell Adams
https://code.launchpad.net/~rladams/csv2ledger/main
- does not support unicode
+ can rename payee

icsv2ledger, from Quentin Stafford-Fraser
https://github.com/quentinsf/icsv2ledger
- code does not look to be unicode compliant
- can not rename payee

Reckon, from  Andrew Cantino
https://github.com/iterationlabs/reckon
- can not rename payee

--
Thierry

Peter Ross

unread,
Sep 24, 2012, 9:10:17 PM9/24/12
to ledge...@googlegroups.com
On 25 September 2012 08:19, thierry wrote:
> icsv2ledger, from Quentin Stafford-Fraser
> https://github.com/quentinsf/icsv2ledger
> - code does not look to be unicode compliant
> - can not rename payee
>
You can rename the payee now.

It also has the advantage that it remembers the renaming of payee (and
account), so that if you have the exact same payee in the future it
automatically suggests your renaming.

It also has the advantage that it does tab completion on account names
and payees, so that you can type just a few letters for the payee and
press tab to get complete payee name.

I personally use this tool (and contributed the rename payee code) and
for me it works how I want.

Eric Abrahamsen

unread,
Sep 26, 2012, 2:42:21 AM9/26/12
to ledge...@googlegroups.com
Thanks for all the responses! Given that I've really only got the one
format to handle, and that I can see what icsv2ledger is doing in terms
of writing ledger entries from Python, I'll probably just skip the
intermediary CSV step and go straight to ledger.

Thanks again,

Eric

thierry

unread,
Sep 29, 2012, 8:37:29 AM9/29/12
to ledge...@googlegroups.com
Thanks Peter for this message. I re-discovered the tool, and finally switched to it! The 'i' of icsv2ledger ('i' meaning interactive) is what convinced me. I forked and pulled some changes in github.

I have others changes but they may break some existing functionalities. I write below what I have in mind, that may start a discussion about them:
- I want to use icsv2ledger through unix pipes. That means ability to read from stdin (I think this is easy implementing this using fileinput python module), and ability to write to stdout. This latter will break existing way of working, as the default is "csv_filename" which csv extension is replaced with .ledger. I procrastinate on how to implement it: 3 different outputs can be used : stdout, csv_filename.ledger or newfilename. But the unix philosophy would teach me to use only stdout and newfilename. But this break the existing. And also, this may suppress the ability to treat several input files. Any advise welcomed.
- I believe that --no-output-tags and --read-file can be moved to config file (instead of executable option)

Thierry

Peter Ross

unread,
Sep 30, 2012, 2:08:48 AM9/30/12
to ledge...@googlegroups.com
On 29 September 2012 22:37, thierry <thierry....@free.fr> wrote:
> Thanks Peter for this message. I re-discovered the tool, and finally
> switched to it! The 'i' of icsv2ledger ('i' meaning interactive) is what
> convinced me. I forked and pulled some changes in github.
>
> I have others changes but they may break some existing functionalities. I
> write below what I have in mind, that may start a discussion about them:
> - I want to use icsv2ledger through unix pipes. That means ability to read
> from stdin (I think this is easy implementing this using fileinput python
> module), and ability to write to stdout. This latter will break existing way
> of working, as the default is "csv_filename" which csv extension is replaced
> with .ledger. I procrastinate on how to implement it: 3 different outputs
> can be used : stdout, csv_filename.ledger or newfilename. But the unix
> philosophy would teach me to use only stdout and newfilename. But this break
> the existing. And also, this may suppress the ability to treat several input
> files. Any advise welcomed.

I think that the best option would be to use "-" as the filename which
represents stdin and when it's "-" then just send the output to stdout
rather than opening a file based on the input filename.

> - I believe that --no-output-tags and --read-file can be moved to config
> file (instead of executable option)
>
Yes that would be fine as well.

thierry

unread,
Sep 30, 2012, 3:31:34 PM9/30/12
to ledge...@googlegroups.com
Hi,

I have another change to propose to icsv2ledger. Why is there an account mapping file AND a payee mapping file? Why not merging into a single file?
The two files have CSV format, first column is regexp, and second is account or payee. What bother me is that I have to change the regexp in both files. I would propose to have a single CSV file with 3 columns regexp,account,payee, (and if needed a variable in config file to use only one column or the other). Thanks for any feedback.

Thierry

Zack Williams

unread,
Oct 1, 2012, 1:29:17 AM10/1/12
to ledge...@googlegroups.com
On Sun, Sep 30, 2012 at 12:31 PM, thierry <thierry....@free.fr> wrote:
> I have another change to propose to icsv2ledger. Why is there an account
> mapping file AND a payee mapping file? Why not merging into a single file?

The rubycsv code I wrote does this. Actually it has a generic
"tablematch" function that can do an arbitrarily large number of regex
mappings. And it can do multi-entry transactions. And it's not
ledger-specific - it can do CSV to anything if that's a plus for
anyone.

The only downside compared to any of the other CSV programs is that
it's not interactive, and you have to write the mapping tables as a
ruby array.

- Zack

Russell Adams

unread,
Oct 1, 2012, 2:24:22 AM10/1/12
to ledge...@googlegroups.com
Not to beat a dead horse further, but CSV2Ledger (my perl variant)
does the same thing. Dynamic renaming based on payee, account
matching, even file matching using a YAML format.

It's a common wheel to reinvent.

Best of luck.


------------------------------------------------------------------
Russell Adams RLA...@AdamsInfoServ.com

PGP Key ID: 0x1160DCB3 http://www.adamsinfoserv.com/

Fingerprint: 1723 D8CA 4280 1EC9 557F 66E8 1154 E018 1160 DCB3

thierry

unread,
Oct 1, 2012, 5:40:57 PM10/1/12
to ledge...@googlegroups.com


On Monday, October 1, 2012 8:24:29 AM UTC+2, Russell Adams wrote:

On Sun, Sep 30, 2012 at 10:29:17PM -0700, Zack Williams wrote:
> On Sun, Sep 30, 2012 at 12:31 PM, thierry <thierry....@free.fr> wrote:
> The only downside compared to any of the other CSV programs is that
> it's not interactive, and you have to write the mapping tables as a
> ruby array.

Not to beat a dead horse further, but CSV2Ledger (my perl variant)
does the same thing. Dynamic renaming based on payee, account
matching, even file matching using a YAML format.
 
I was too using a personal perl script, doing the mapping from a table. But I tried the 'i' of icsv2ledger, and felt in love with this 'i'. On contrary, rubycsv and CSV2ledger does not (yet?) implement this 'interactiveness'. Also CSV2ledger is not unicode compliant, and I am not living in an ASCII world :-).

Peter Ross

unread,
Oct 1, 2012, 5:55:10 PM10/1/12
to ledge...@googlegroups.com
It was the interactiveness that was the key feature for me as well. I
came from using MS Money, and so was used to it automatically
suggesting payees and accounts from the string supplied by the bank,
so I wanted something similar when I processed my download from the
bank.

Russell Adams

unread,
Oct 1, 2012, 8:00:26 PM10/1/12
to ledge...@googlegroups.com
> > I was too using a personal perl script, doing the mapping from a table. But
> > I tried the 'i' of icsv2ledger, and felt in love with this 'i'. On contrary,
> > rubycsv and CSV2ledger does not (yet?) implement this 'interactiveness'.
> > Also CSV2ledger is not unicode compliant, and I am not living in an ASCII
> > world :-).
>
> It was the interactiveness that was the key feature for me as well. I
> came from using MS Money, and so was used to it automatically
> suggesting payees and accounts from the string supplied by the bank,
> so I wanted something similar when I processed my download from the
> bank.

Interactive sounds like a nice feature! I hadn't tried that.

Normally I import hundreds of records... So my focus isn't on one at a
time data entry, I want bulk and standardized records.

I'm keen on John's new emacs mode that's letting me query my files for
items to clear / update (ie: my queue in my workflow). I can now see
one line per transaction, and hit enter on them to jump to the right file.

Peter Ross

unread,
Oct 1, 2012, 8:05:52 PM10/1/12
to ledge...@googlegroups.com
On 2 October 2012 10:00, Russell Adams <RLA...@adamsinfoserv.com> wrote:
>> > I was too using a personal perl script, doing the mapping from a table. But
>> > I tried the 'i' of icsv2ledger, and felt in love with this 'i'. On contrary,
>> > rubycsv and CSV2ledger does not (yet?) implement this 'interactiveness'.
>> > Also CSV2ledger is not unicode compliant, and I am not living in an ASCII
>> > world :-).
>>
>> It was the interactiveness that was the key feature for me as well. I
>> came from using MS Money, and so was used to it automatically
>> suggesting payees and accounts from the string supplied by the bank,
>> so I wanted something similar when I processed my download from the
>> bank.
>
> Interactive sounds like a nice feature! I hadn't tried that.
>
> Normally I import hundreds of records... So my focus isn't on one at a
> time data entry, I want bulk and standardized records.
>
icsv2ledger does have a mode where if it can match the payee
automatically then it doesn't prompt the user but just uses the data
immediately. In that case you only get prompted for payees that
you've never seen before.

I don't use that because I'm not processing so many records that it's
such a big deal.

However I still suggest use the tool which best fits your workflow,
and the joy of using open formats means we can all use the best tool
that fulfils our needs.

Zack Williams

unread,
Oct 1, 2012, 9:35:10 PM10/1/12
to ledge...@googlegroups.com
On Mon, Oct 1, 2012 at 2:40 PM, thierry <thierry....@free.fr> wrote:
>
> I was too using a personal perl script, doing the mapping from a table. But
> I tried the 'i' of icsv2ledger, and felt in love with this 'i'. On contrary,
> rubycsv and CSV2ledger does not (yet?) implement this 'interactiveness'.
> Also CSV2ledger is not unicode compliant, and I am not living in an ASCII
> world :-).

My take on this is that implementing an interactive interface wasn't
worth the added complexity of handling and storing user input (rubycsv
is only 70 lines per sloccount, less than 100 if you include a complex
template).

That's not to say that interactive it's not a nice feature - it was
just a lot more work and didn't match my workflow which has fairly
deep categories that I'd prefer not to mistype.

My solution is this - rubycsv has the concept of a default value to
assign if no better match is found in the table. This is handy in
many ways (unspecified currency being another), but one way I use it
is to assign "Unknown" as the category. The process works as
follows:

1. Run the CSV -> ledger conversion.
2. Run "ledger -f infile.lgr print Unknown" to show all unidentified
transactions.
3. Add appropriate entries to the match table via text editor.
4. Repeat steps 1-3 until no transactions are in the "Unknown" category.

- Zack

Eric Abrahamsen

unread,
Oct 2, 2012, 10:33:37 PM10/2/12
to ledge...@googlegroups.com
On Tue, Oct 02 2012, Russell Adams wrote:

>> > I was too using a personal perl script, doing the mapping from a table. But
>> > I tried the 'i' of icsv2ledger, and felt in love with this 'i'. On contrary,
>> > rubycsv and CSV2ledger does not (yet?) implement this 'interactiveness'.
>> > Also CSV2ledger is not unicode compliant, and I am not living in an ASCII
>> > world :-).
>>
>> It was the interactiveness that was the key feature for me as well. I
>> came from using MS Money, and so was used to it automatically
>> suggesting payees and accounts from the string supplied by the bank,
>> so I wanted something similar when I processed my download from the
>> bank.
>
> Interactive sounds like a nice feature! I hadn't tried that.
>
> Normally I import hundreds of records... So my focus isn't on one at a
> time data entry, I want bulk and standardized records.
>
> I'm keen on John's new emacs mode that's letting me query my files for
> items to clear / update (ie: my queue in my workflow). I can now see
> one line per transaction, and hit enter on them to jump to the right
> file.

I realized, with no small embarrassment, that I don't really need an
external tool, since I use emacs for maintaining my ledger files, and
for everything else. In the course of writing some elisp functions to
consume bank records I noticed the new version of ledger-mode, and am
considering using it -- do you have any pointers or things to watch out
for? I noticed strings in there that still point to directories on
John's computer :)

E

Toby Gee

unread,
Oct 3, 2012, 11:27:09 AM10/3/12
to ledge...@googlegroups.com
Sorry - I wrote this two days ago and it's been held up in moderation, I guess.

Since then I tried icsv2ledger, and managed to do most of the below (I'm sure csv2ledger can also do it, of course), so I now have two slightly different questions, one hopefully almost trivial.

- is there an easy way in all of this (perhaps in Ledger itself?) to change the sign of all of the transactions for some account? One of my credit cards uses the opposite sign convention to all my other accounts.

- how do people deal with avoiding duplicates? Several of these importing options produce MD5 hashes from the original line in the csv file, which should presumably make this straightforward.

With my new icsv2ledger setup, each account now produces a ledger file, and I then manually copy it into my main accounts file. Of course, I could write a script to append it to the main accounts file, but is there a better way to import the transactions, avoiding any duplicates, to deal with csv files with overlapping date ranges?

On 1 Oct 2012, at 11:01, Toby Gee <tob...@gmail.com> wrote:

>
>
> So - is there a way to do this, either by the importing program analysing the accounts.dat file, or by manually setting up some rules for converting from payee to liability?
>
> I'm wondering if CSV2Ledger can do this, based on the above remark:
>
> "Dynamic renaming based on payee, account matching, even file matching using a YAML format."
>
> However I'm not sure how this works in practice - do I just need to read the comments in the perl file, learn what YAML is etc?
>

Russell Adams

unread,
Oct 3, 2012, 1:26:52 PM10/3/12
to ledge...@googlegroups.com
Inline below.

On Wed, Oct 03, 2012 at 04:27:09PM +0100, Toby Gee wrote:
> Sorry - I wrote this two days ago and it's been held up in moderation, I guess.
>
> Since then I tried icsv2ledger, and managed to do most of the below
> (I'm sure csv2ledger can also do it, of course), so I now have two
> slightly different questions, one hopefully almost trivial.
>
> - is there an easy way in all of this (perhaps in Ledger itself?) to
> - change the sign of all of the transactions for some account? One
> - of my credit cards uses the opposite sign convention to all my
> - other accounts.

CSV2Ledger has a negate option you can set based on input filename or
on the command line.

> - how do people deal with avoiding duplicates? Several of these
> - importing options produce MD5 hashes from the original line in the
> - csv file, which should presumably make this straightforward.

CSV2Ledger takes an md5sum of the original csv line and stores both
with the generated txn as metadata. Those md5sums are used to dedup
new data. There's even a caching option for working with many large
files to speed up processing.

I rely heavily on this, because when I download my credit card data it
always overlaps.

> With my new icsv2ledger setup, each account now produces a ledger
> file, and I then manually copy it into my main accounts file. Of
> course, I could write a script to append it to the main accounts
> file, but is there a better way to import the transactions, avoiding
> any duplicates, to deal with csv files with overlapping date ranges?

I use the concept of a queue file (Queue.dat) that all my imports
goto. Once they are assigned a primary expense report, they are moved
by a separate script to an expense report specific file and that is
added to my include list.

> On 1 Oct 2012, at 11:01, Toby Gee <tob...@gmail.com> wrote:
>
> >
> >
> > So - is there a way to do this, either by the importing program
> > analysing the accounts.dat file, or by manually setting up some
> > rules for converting from payee to liability?

CSV2Ledger supports several rules layers:

- Preprocess: Arbitrary regexp replacements to perform on the CSV
data prior to converting. Useful for payee renaming.

- File level: Sets command line arguments by filename (ie: import
columns, negation, fuzzy date matching, receipt file tagging
parameters, default accounts, etc)

- Account matching by payee: Using a regexp by payee, assign source
and destination accounts.

> >
> > I'm wondering if CSV2Ledger can do this, based on the above
> > remark:
> >
> > "Dynamic renaming based on payee, account matching, even file
> > matching using a YAML format."
> >
> > However I'm not sure how this works in practice - do I just need
> > to read the comments in the perl file, learn what YAML is etc?
> >
>

Enjoy!

Peter Ross

unread,
Oct 3, 2012, 5:54:13 PM10/3/12
to ledge...@googlegroups.com
On 4 October 2012 01:27, Toby Gee <tob...@gmail.com> wrote:
> Sorry - I wrote this two days ago and it's been held up in moderation, I guess.
>
> Since then I tried icsv2ledger, and managed to do most of the below (I'm sure csv2ledger can also do it, of course), so I now have two slightly different questions, one hopefully almost trivial.
>
> - is there an easy way in all of this (perhaps in Ledger itself?) to change the sign of all of the transactions for some account? One of my credit cards uses the opposite sign convention to all my other accounts.
>
Not with icsv2ledger currently.


> - how do people deal with avoiding duplicates? Several of these importing options produce MD5 hashes from the original line in the csv file, which should presumably make this straightforward.
>
Yes but icsv2ledger doesn't do anything with these hashes.

I presume the idea was to run ledger looking for transactions tagged
with the hash, and if there is one then icsv2ledger should ignore the
transaction. Patches welcome.

> With my new icsv2ledger setup, each account now produces a ledger file, and I then manually copy it into my main accounts file. Of course, I could write a script to append it to the main accounts file, but is there a better way to import the transactions, avoiding any duplicates, to deal with csv files with overlapping date ranges?

Implement the above code and add it to icsv2ledger (if that's the tool
you wish to continue to use), otherwise it has to be a manual process,
or switch to another tool.

Zack Williams

unread,
Oct 3, 2012, 8:22:24 PM10/3/12
to ledge...@googlegroups.com
> - is there an easy way in all of this (perhaps in Ledger itself?) to change the sign of all of the transactions for some account? One of my credit cards uses the opposite sign convention to all my other accounts.

This feature is in many of the CSV to ledger conversion tools already.

> - how do people deal with avoiding duplicates? Several of these importing options produce MD5 hashes from the original line in the csv file, which should presumably make this straightforward.

One solution that's useful but not pure is the "transfer account
pattern" I wrote about on this list 2012-02-19.

Basically, if you have say a credit card and a checking account, and
pay one from the other, you create a 3rd "transfer" account that, when
both are balanced, has a zero balance.

> With my new icsv2ledger setup, each account now produces a ledger file, and I then manually copy it into my main accounts file. Of course, I could write a script to append it to the main accounts file, but is there a better way to import the transactions, avoiding any duplicates, to deal with csv files with overlapping date ranges?

For the first part of this, Ledger supports including other files via:

!include /path/to/file

I use this heavily. For the latter parts... you'll have to figure
out what to do. Personally, I don't download overlapping date ranges
(as I scrape end of month PDF statements), so it's not a problem.

- Zack
Reply all
Reply to author
Forward
0 new messages