Branching transaction import logic (i.e. partially automating imports)


Danny

Feb 9, 2024, 8:46:03 PM
to Beancount
Background

I've been using beancount for a few years now. I just have a couple of credit cards and a bank account, nothing especially complex, but I feel secure knowing I have a registry of where all my money has gone. The process of getting transactions into beancount is also my check on spending, letting me notice anything suspicious.

However, it's just way too labor-intensive. I already use beancount-import, but I still get bogged down in hundreds of $2.90 subway payments, grocery store trips, and sandwiches from the same handful of places.

What I'm Looking For

I need a less time-consuming workflow. I discovered Red's Five Minute Ledger and agree completely with the philosophy. However, I think I need a way to split the transactions from any given account into two separate streams.

To better illustrate, this is my ideal pipeline:
  1. Download transactions manually or automatically where possible (csv and ofx)
  2. Run code that applies a set of predefined expense category rules (e.g. Amazon automatically to a zero-sum category, grocery store below a certain dollar value)
  3. Separate the categorized transactions and pass the remaining ones to beancount-import
  4. Write everything to the ledger like normal
I haven't found any examples of branching the transaction pipeline like this, so my question is whether it's even plausible within the framework of beancount importers. My backup plan is to write a more or less hardcoded script that does it all, but I'm hoping for a more flexible approach!
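
For readers skimming the thread, steps 2 and 3 of the pipeline above could be sketched roughly as follows. This is only an illustration in plain Python, not code from beancount or beancount-import: `Txn`, `Rule`, `RULES` and `split_by_rules` are hypothetical names, and the payee strings, dollar limits and account names are made up.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional


@dataclass
class Txn:
    """Hypothetical stand-in for one downloaded transaction row."""
    payee: str
    amount: Decimal
    account: Optional[str] = None  # expense account; None until categorized


@dataclass
class Rule:
    matches: Callable[[Txn], bool]  # predicate deciding whether the rule applies
    account: str                    # account to assign when it does


# Illustrative rules only: payees, thresholds and accounts are assumptions.
RULES = [
    Rule(lambda t: "amazon" in t.payee.lower(), "Assets:ZeroSum:Amazon"),
    Rule(lambda t: "grocery" in t.payee.lower() and t.amount < Decimal("50"),
         "Expenses:Groceries"),
    Rule(lambda t: t.payee.lower().startswith("mta") and t.amount == Decimal("2.90"),
         "Expenses:Transit:Subway"),
]


def split_by_rules(txns, rules=RULES):
    """Return (categorized, remaining); the first matching rule wins."""
    categorized, remaining = [], []
    for txn in txns:
        rule = next((r for r in rules if r.matches(txn)), None)
        if rule is None:
            remaining.append(txn)  # left for manual or ML categorization
        else:
            categorized.append(Txn(txn.payee, txn.amount, rule.account))
    return categorized, remaining
```

The `remaining` list is what would be handed to the interactive tool, while `categorized` could be written to the ledger directly.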

Red S

Feb 10, 2024, 9:35:09 AM
to Beancount

I'm not familiar with beancount-import, but this should work out of the box if you use smart_importer, with nothing for you to really do. As it says:

smart_importer only works by appending onto incomplete single-legged postings

So in the importer you write, simply leave the ones you want uncategorized with no further action, and smart_importer will categorize them, and leave the other ones untouched.

Trivial to do with beancount_reds_importers too: simply override this method to return either your pre-defined postings, or None if you want it to be auto-categorized by smart_importer.
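
A rough sketch of that idea, for illustration only: return a predefined second posting when a rule matches, or None to leave the transaction single-legged for smart_importer. The `Posting` and `Txn` tuples here are simplified stand-ins rather than beancount's own types, and `predefined_posting`, `apply_rules` and `PAYEE_ACCOUNTS` are hypothetical names, not the actual beancount_reds_importers method.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Posting(NamedTuple):
    account: str
    amount: Decimal


class Txn(NamedTuple):
    payee: str
    postings: list  # a freshly imported transaction has a single posting


# Hypothetical payee-prefix rules; the account names are made up.
PAYEE_ACCOUNTS = {
    "MTA": "Expenses:Transit:Subway",
    "AMAZON": "Assets:ZeroSum:Amazon",
}


def predefined_posting(txn: Txn) -> Optional[Posting]:
    """Return the balancing posting for a known payee, or None."""
    if len(txn.postings) != 1:
        return None  # already complete, nothing to add
    for prefix, account in PAYEE_ACCOUNTS.items():
        if txn.payee.upper().startswith(prefix):
            # The second leg negates the first so the transaction balances.
            return Posting(account, -txn.postings[0].amount)
    return None


def apply_rules(txn: Txn) -> Txn:
    """Complete rule-matched transactions; leave the rest single-legged."""
    extra = predefined_posting(txn)
    if extra is None:
        return txn
    return txn._replace(postings=txn.postings + [extra])
```

Transactions returned unchanged keep their single posting, which is the shape smart_importer is described above as acting on.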

Red S

Feb 10, 2024, 9:43:39 AM
to Beancount
Thinking about your question further: I find that when I import transactions, credit cards are the only accounts that deal with expenses and need categorization. I find that smart_importer gets almost all of them right. I do skim through the import as part of my workflow, but it takes less than a minute to rapidly look through a month's data (say, 100 transactions) and find the occasional one that needs to be fixed.

Yet another way to address this problem is to glance at my monthly expenditure categories in a report (I use Fava), where big miscategorizations (a rare occurrence) stand out fairly obviously.
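
If you'd rather script that sanity check than eyeball a report, something along these lines might work. This is a stand-alone sketch, not Fava functionality: `monthly_totals` and `flag_outliers` are hypothetical helpers, the input is assumed to be plain (month, category, amount) tuples exported from the ledger however you like, and the "more than twice the median of the other months" threshold is arbitrary.

```python
from collections import defaultdict
from decimal import Decimal
from statistics import median


def monthly_totals(rows):
    """Sum amounts per category and month.

    rows: iterable of (month as 'YYYY-MM', category, Decimal amount).
    """
    totals = defaultdict(lambda: defaultdict(Decimal))
    for month, category, amount in rows:
        totals[category][month] += amount
    return totals


def flag_outliers(totals, month, factor=Decimal("2")):
    """List categories whose total for `month` exceeds factor x the median
    of that category's other months."""
    flagged = []
    for category, by_month in totals.items():
        history = [v for m, v in by_month.items() if m != month]
        current = by_month.get(month, Decimal(0))
        if history and current > factor * median(history):
            flagged.append(category)
    return sorted(flagged)
```

A category that suddenly doubles is often a miscategorization rather than a real spending change, so the flagged list is a reasonable place to start looking.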

Chary Chary

Feb 10, 2024, 10:45:23 AM
to Beancount
I still maintain most of my records in an Excel file, where transactions also get categorized (mostly automated via a VBA macro, based on regular expressions).
I started with Excel, already had 15 years' worth of data in Excel, and I am still comfortable there.
Another reason is that Excel allows for mass recategorization by simply filtering the relevant rows and dragging the category down.

However, I then have automation that converts the Excel files to bean files.

I then have a master bean file, which links them all together via include statements. I then import all the records into pandas dataframes, check for errors, and reconcile. This is done in a Jupyter notebook.

So, overall, my process looks like this:
[Attached image: Beanpand (2).png]

Martin Blais

Feb 10, 2024, 3:03:05 PM
to bean...@googlegroups.com
Here's how I do it (from my import script):



from beangulp import Ingest  # assumed source of Ingest; not stated in the original post


def process_entry(entry):
    # ... do something ...
    return entry


def process_extracted_entries(extracted_entries_list, ledger_entries):
    """Filter the extracted entries to save on time."""
    return [(filename, [process_entry(entry) for entry in entries], account, importer)
            for filename, entries, account, importer in extracted_entries_list]


if __name__ == '__main__':
    # `importers` is the list of importer instances defined elsewhere in the script.
    main = Ingest(importers, hooks=[process_extracted_entries])
    main()



--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/1d463106-57c7-447c-b374-087ab60943ddn%40googlegroups.com.

Daniel Farnand

Feb 10, 2024, 6:58:02 PM
to bean...@googlegroups.com
Thank you all for the super helpful answers!

I hadn't realized that smart_importer wouldn't touch already categorized transactions (obvious in retrospect, though!). With that in mind, I'm going to give this a try: import -> rules -> smart_importer. From there I'll pass everything to some other code to show aggregate info and highlight anomalies (probably ad hoc after the import for now, though I'll look into adding it as a second hook with code like what Martin showed).

Red - do you know if there's a way to have smart_importer mark (by tag or metadata) the transactions that it updates? If not, I can probably just mark the ones categorized via rules, i.e. those not processed by smart_importer!


Red S

Feb 10, 2024, 10:52:42 PM
to Beancount
> Red - do you know if there's a way to have smart-importer mark (by tag or metadata) the transactions that it updates? If not I can probably just mark the ones categorized via rules, i.e. those not processed by smart import!

I don't know, unfortunately, but doing what you said (adding metadata to the ones you categorized) would be trivial, and that's the way I'd go.
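
As a rough illustration of that approach, a hook in the same shape as the one Martin posted could stamp metadata on every entry that already has two or more postings. This assumes the hook runs before smart_importer fills in the remaining entries; `Entry` is a simplified stand-in for a beancount transaction, and `mark_rule_categorized`, `mark_hook` and the `categorized_by` key are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Simplified stand-in for a beancount transaction (assumption)."""
    meta: dict
    narration: str
    postings: list


def mark_rule_categorized(entry, key="categorized_by", value="rules"):
    """Tag entries that the rules already completed (two or more postings)."""
    if len(entry.postings) >= 2:
        # Build a new meta dict so the original entry is left untouched.
        return entry._replace(meta={**entry.meta, key: value})
    return entry  # single-legged: left for smart_importer, so not tagged


def mark_hook(extracted_entries_list, ledger_entries=None):
    """Hook taking the (filename, entries, account, importer) tuples."""
    return [(filename, [mark_rule_categorized(e) for e in entries], account, importer)
            for filename, entries, account, importer in extracted_entries_list]
```

Anything without the `categorized_by` key after the import would then be a transaction that smart_importer (or you) categorized.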