Issue #215: Library of importers, for different filetypes and sources (blais/beancount)

Michael Droogleever

Feb 13, 2018, 3:38:54 PM
to bean...@googlegroups.com
New issue 215: Library of importers, for different filetypes and sources
https://bitbucket.org/blais/beancount/issues/215/library-of-importers-for-different

Michael Droogleever:

People are lazy. New users want to see software work for them before they commit to using it. For most new beancount users, especially those who are not already using a double-entry accounting system, this means importing data exported from their financial institutions. Beancount does not include many importers at the moment, there is no formal way of requesting new ones, and anyone who writes one for themselves has no obvious way of sharing it with other users. The frequent end result is many unknown repos, each containing individual importers.

Suggestion:

+ Provide a simple way for those who do not wish to program an importer to provide sample files that need to be imported (and give guidelines for anonymising them).

+ Use this to make a list of importers which need to be made, ideally sorted using some sort of vote system to determine interest.

+ Create a procedure, naming scheme, save location etc. for adding importers. Invite people to make PRs to add importers. Simplify the process and encourage tests.

+ Finally, advertise the list of importers included with beancount, "batteries included", and provide simple generic guides on how to use them.

The goal is to have a sizeable coverage of the major financial institutions around the world and their file formats. This will hopefully increase the interest in this software and plain text accounting.


Michael Droogleever

Feb 22, 2018, 4:54:35 PM
to Beancount
Hi Martin,

Thanks for the answer, it should have been obvious this was not a new idea, but I had sadly not come across LedgerHub nor its post-mortem in my reading; apologies for getting you to write it all out again. Seems you had a really good attempt but got a strong dose of realism about the amount of interest. Would probably need a critical mass of existing content to really get going.

If the CSV importer refactor I attempted does get pulled in, I might have a go at a revival (but limited to Beancount). I'd probably solve the anonymity issues by encouraging people to send patches in by email, which would then be turned into PRs through a dummy account, and maybe by manually recreating the exact same beancount data in all the various formats instead of trying to anonymise personal data, to avoid the risk of leaking something. Regression testing will be affected, but I don't think stability is that important. It might also be possible to find enough info online about various bank formats to create a good number of importers from the get-go.
I'll post here if/when I do something.
 

Hi Michael,

I suspect you may not be familiar with the history on this; there's a fair amount of background.

The "ingest" code that lives inside of Beancount today was born in a different project I made, called LedgerHub, whose intent was precisely to do this. You can read more about it in the design doc: https://docs.google.com/document/d/11u1sWv7H7Ykbc7ayS4M9V3yKqcuTY7LJ3n1tgnEN2Hk/ Beancount wasn't originally going to contain any importing code; that was to be done in LedgerHub, and the dream was to have people contribute many importers.

It did not really take off; I have found that people do not share their importers, or that there is so little overlap between our institutions that sharing them is not worthwhile. I wrote a detailed post-mortem here: https://docs.google.com/document/d/1Bln8Zo11Cvez2rdEgpnM-oBHC1B6uPC18Qm7ulobolM/

Moreover, anonymizing downloads from institutions is really time-consuming and uncertain, and I found myself unable to really do that, even for all the importers I shared. The best way to test your importers is to run regression tests on real files, and that is best done in the privacy of a personal code repository. Finally, I think that even sharing the list of the particular financial institutions you use can constitute a security liability.
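
For illustration only, here is a minimal sketch of what such a private regression test could look like, using nothing but the Python standard library. It does not use the beancount.ingest API; `extract_rows`, `regression_check`, and the CSV column names (Date, Payee, Amount) are hypothetical stand-ins for a real importer and a real download.

```python
import csv
import io
from pathlib import Path


def extract_rows(csv_text):
    """Hypothetical stand-in for an importer: turn the text of a CSV
    download into one plain-text line per transaction."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"{row['Date']} {row['Payee']} {row['Amount']}" for row in reader]


def regression_check(sample_path, expected_path):
    """Run the importer on a saved download and compare the result with
    the previously saved output.

    If no expected file exists yet, it is created from the current output
    and the check passes; review that file by hand before trusting it.
    """
    sample_path = Path(sample_path)
    expected_path = Path(expected_path)
    actual = "\n".join(extract_rows(sample_path.read_text())) + "\n"
    if not expected_path.exists():
        expected_path.write_text(actual)
        return True
    return actual == expected_path.read_text()
```

Both the real download and its expected output would stay in a personal repository, so nothing has to be anonymized; a check that returns False means the importer's output changed for that file.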

I have since ported over all the "common" code to beancount.ingest, where it lives today. The CSV importer that lives there is intended as an example, though it's growing to become more and more useful (it's the basis for my own CSV importers, for instance).

If you really believe in the importer sharing idea I would encourage you to start another repository and do this. This should not be integrated in the Beancount repo, however.

Thanks, and happy to discuss more on the mailing-list,
