Your ingest workflow?

Marvin Ritter

Oct 19, 2021, 3:42:14 AM
to Ledger
Hi,

I'm once more troubled by changes in the statements provided by one of my banks. It's probably easy to fix, but I thought I'd use the opportunity to learn what others do.

My current workflow:

1. Manually download statements (most financial institutions provide me with CSV files but I also have 1-2 PDFs). When possible I always download all transactions for the current year and replace my current file. This is easy, but logging into ~5 banks, clicking through the menus and downloading the correct file takes a while. This is one of the reasons I update my ledger less often than I would like to.
2. Run importers for all statement files for the current year. I implemented the `beancount.ingest.importer.ImporterProtocol` interface for my banks and mostly this just works. I manually map each statement file to its importer since identifying importers automatically wasn't always reliable. While it mostly works, this is the part where things are most likely to fail because a bank changed the format of the statements or an importer has a bug.
3. Merge new entries with my ledger. I just rely on knowing the latest transaction per account in my ledger and only adding newer entries.
4. Manually categorize. This works well. I also rely on a small plugin to find transactions between my accounts and mark one of them as a duplicate of the other.
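
The duplicate detection mentioned in step 4 could look roughly like the sketch below. It is not the plugin from this post: the `Entry` type, the `find_transfer_duplicates` name and the three-day window are assumptions made for illustration, and it does not use the Beancount plugin API. It pairs entries on different accounts whose amounts cancel out within a few days and reports the later one of each pair.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import NamedTuple


class Entry(NamedTuple):
    """Stand-in for an imported transaction; not the real beancount data type."""
    day: date
    account: str
    amount: Decimal


def find_transfer_duplicates(entries, max_gap=timedelta(days=3)):
    """Return sorted indexes of entries that mirror an earlier entry on another account."""
    used = set()        # indexes already paired, so no entry is matched twice
    duplicates = set()  # the second entry of each matched pair
    for i, first in enumerate(entries):
        if i in used:
            continue
        for j in range(i + 1, len(entries)):
            second = entries[j]
            if j in used:
                continue
            if (
                second.account != first.account
                and second.amount == -first.amount
                and abs(second.day - first.day) <= max_gap
            ):
                used.update({i, j})
                duplicates.add(j)
                break
    return sorted(duplicates)
```

Matching on amount and date alone can pair unrelated transactions that happen to cancel out, so the result is better treated as a list of candidates to review than as something to delete automatically.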

The above workflow works but it could be smoother. So I wonder what others do and what I could learn from it.

In general I think that Beancount is awesome (and from what I read v3 will be even better). And instead of cooking my own solution I would rather contribute to existing solutions to make the import process smoother for everyone. One way I see is to recommend best practices (or a single workflow) and encourage people to collect importers (and maybe also tools for fetching statements). A bit of consistency might save us all some time here. Wouldn't your ingest workflow be covered by the steps below?

1. Fetch statements

The first step is always to fetch the transaction history from the financial institution. Automating this would be nice but seems like a lot of work (websites keep changing and come in many languages) and can easily become a security risk. We shouldn't encourage users to store their passwords in plaintext. I think we should allow multiple paths here:

- User manually downloads CSV or PDF files. When banks provide multiple formats we should document which format is expected in the next step.
- Automate download by scraping the bank website (https://github.com/jbms/finance-dl seems like a good approach). This is nice for users with many accounts and who know what they are doing.
- Use APIs. The only example I know is Wise, which provides a nice API to securely fetch the list of transactions by uploading a public key and keeping a token in an environment variable.
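
For the API path, a minimal sketch of keeping the token out of any file on disk. The variable name `BANK_API_TOKEN`, the URL and the bearer-token scheme are placeholders chosen for this example, not the actual Wise API; only the pattern of reading the secret from the environment is the point.

```python
import os
import urllib.request

# Hypothetical names: neither the variable nor the URL belongs to a real bank API.
TOKEN_ENV_VAR = "BANK_API_TOKEN"
STATEMENT_URL = "https://api.example.com/v1/transactions"


def build_statement_request(url: str = STATEMENT_URL) -> urllib.request.Request:
    """Build an authenticated request, reading the token from the environment."""
    token = os.environ.get(TOKEN_ENV_VAR)
    if not token:
        # Refuse to continue rather than fall back to a stored password.
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before fetching statements")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

The request is only constructed here; sending it with `urllib.request.urlopen` and parsing the response depends on the specific bank.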

2. Parse statements into Beancount entries

I think the current `Importer` interface works. In my current workflow I don't use the ability to identify and sort files but that might change. And the rest is just a function mapping from the file with statements to a list of Beancount entries.

The latter is the biggest trouble right now. We are missing a repository that provides importers for the majority of financial institutions out there. Implementing one importer is easy, but keeping 5-10 importers up to date is a lot of work. I think this maintenance could be shared by collecting importers into a single repository. This is the main issue I would like to solve.

Luckily I think performance is not critical here. Files are usually small and I only care about new files.
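
To make the "function from statement file to entries" in this step concrete, here is a minimal sketch using only the standard library. It deliberately does not implement `ImporterProtocol`; the `Entry` type, the column names `Date`, `Payee` and `Amount`, the ISO date format and the default currency are all assumptions about a hypothetical bank's CSV export.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    """Stand-in for a Beancount transaction; not the real beancount data type."""
    day: date
    account: str
    payee: str
    amount: Decimal
    currency: str


def parse_statement(text: str, account: str, currency: str = "EUR") -> list[Entry]:
    """Map CSV text with Date, Payee and Amount columns to a list of entries."""
    reader = csv.DictReader(io.StringIO(text))
    # Fail loudly when the bank changes its format instead of importing garbage.
    missing = {"Date", "Payee", "Amount"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Unexpected statement format, missing columns: {sorted(missing)}")
    return [
        Entry(
            day=datetime.strptime(row["Date"], "%Y-%m-%d").date(),
            account=account,
            payee=row["Payee"].strip(),
            amount=Decimal(row["Amount"]),
            currency=currency,
        )
        for row in reader
    ]


def to_beancount(entry: Entry) -> str:
    """Render an entry as Beancount text; the balancing posting is left to add later."""
    return (
        f'{entry.day} ! "{entry.payee}" ""\n'
        f"  {entry.account}  {entry.amount} {entry.currency}\n"
    )
```

Checking the header before parsing turns a silent format change into an immediate error, which is where most of the maintenance effort described above tends to surface.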

3. Merge with ledger

This is again tricky. I don't want to overwrite any transactions in the ledger and I don't want to create duplicates. The best solution I found so far:

Find the most recent transaction for each of my accounts in my ledger. Take all entries from the previous step that are newer than the transaction and append them to my ledger.

This works reasonably well.
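
The merge rule described in this step can be sketched as follows. The `Entry` type and the function name are assumptions for illustration; real Beancount entries carry more fields, and the account would come from their postings.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    """Stand-in for a Beancount transaction; not the real beancount data type."""
    day: date
    account: str
    narration: str


def select_new_entries(ledger: list[Entry], imported: list[Entry]) -> list[Entry]:
    """Keep imported entries dated after the latest ledger entry of their account."""
    latest: dict[str, date] = {}
    for entry in ledger:
        if entry.account not in latest or entry.day > latest[entry.account]:
            latest[entry.account] = entry.day
    # Accounts unknown to the ledger have no cutoff, so all their entries are kept.
    new = [e for e in imported if e.account not in latest or e.day > latest[e.account]]
    return sorted(new, key=lambda e: (e.day, e.account))
```

Because the comparison is strict, a transaction dated on the cutoff day that only shows up in a later statement is dropped. Importing only up to the previous day, or comparing on a transaction ID where the bank provides one, would close that gap.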

4. Categorize


I have a script that adds tags to recurring transactions (same grocery store twice a week), but overall I don't mind adding the second account and a comment manually.
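
A rule-based tagger of this kind can be very small. The sketch below is not the script from this post; the payee patterns, account names and tags are made up for illustration.

```python
import re

# Hypothetical rules: payee patterns, accounts and tags are made up for illustration.
RULES = [
    (re.compile(r"grocer|supermarket", re.IGNORECASE), "Expenses:Food:Groceries", "groceries"),
    (re.compile(r"railway|metro", re.IGNORECASE), "Expenses:Transport", "commute"),
]


def categorize(payee: str):
    """Return (account, tag) for the first matching rule, or None to leave it manual."""
    for pattern, account, tag in RULES:
        if pattern.search(payee):
            return account, tag
    return None
```

Returning `None` for unknown payees keeps the manual step in place for everything the rules do not cover.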

5. Commit

This is optional. I keep my whole ledger (raw statements + beancount files) under version control. This makes it easy to revert to the last commit if something above went wrong.


As said, I'm mostly curious to hear what others do and how we can leverage potential overlap.

Regards,
Marvin

Martin Michlmayr

Oct 19, 2021, 4:58:13 AM
to ledge...@googlegroups.com
* Marvin Ritter <marvin...@gmail.com> [2021-10-19 00:42]:
> I'm once more troubled by changes in the statements provided by one of my
> banks. It's probably easy to fix but I thought I use the opportunity to
> learn what others do.

Just curious, because you posted this to the ledger mailing list, talk
about the beancount ingest API, and speak about importing into your
"ledger": are you using the beancount ingest API and then producing
ledger entries, or do you use beancount, and this should have been
posted to the beancount list?

Several aspects you describe (e.g. downloading statements) apply
equally to ledger and beancount but others (creating entries) are
quite different.
--
Martin Michlmayr
https://www.cyrius.com/

Daryl Manning

Feb 4, 2022, 11:31:38 PM
to Ledger
Marvin,

I used ledger-cli (not beancount), but I get almost completely automated categorization via Reckon (a Bayesian predictor of the current category from past categories). Super handy, and it has reduced my importing and categorizing workload by an order of magnitude.
Blog post here. The author of the Ruby gem is very responsive and helpful. YMMV.


ciao !
Daryl.