Auto-Categorizing Transactions

4,088 views
Skip to first unread message

Andy L

unread,
Oct 19, 2018, 10:34:30 PM10/19/18
to Ledger
I'm downloading CSV transaction data and converting to ledger format.

Now I'm wondering what is the best way to auto-categorize transactions.  I don't want to manually assign payee accounts, tags and comments.

I'm thinking of using a rules engine.  (see http://www.ruleby.org)  The categorization rules would be applied during the CSV-to-Ledger generation.  The idea of using a rules engine is inspired by Tiller AutoCat (see https://www.tillerhq.com/autocat/)

I'm curious to know how others are auto-categorizing transactions.  What tools and techniques do you use??

Dmitry Kakurin

unread,
Oct 20, 2018, 5:14:23 PM10/20/18
to ledge...@googlegroups.com
Hi Andy,
It does both things you are looking for: downloads transactions and automatically classifies them.
The hardest (and least automated) task right now is creating config file with info about your bank accounts. You need to rename sample.direct2ledger.yaml to direct2ledger.yaml and fill in the fields as instructed by comments.
But once it’s done it is a breeze to use. Auto classification works so well for me that I rarely even do corrections, so it’s pretty much 100% automated.

I went thru a similar path as you starting on :-):
1. First I was downloading CSVs from all my banks and using into-ledger to do auto-classification.
2. Downloading per-account was a chore, so I’ve configured Mint.com to aggregate all my accounts, downloaded a single CSV from mint and wrote mint2ledger tool to import and classify.
3. Realized that all by banks support OFX export and wrote direct2ledger tool that does everything automatically now.

P.S. Both into-ledger and mint2ledger tools are also available on https://github.com/Dimagog.

Thank you, Dmitry Kakurin.

From: ledge...@googlegroups.com on behalf of Andy L <an...@r210.com>
Sent: Friday, October 19, 2018 19:34
To: Ledger
Subject: Auto-Categorizing Transactions
 
--

---
You received this message because you are subscribed to the Google Groups "Ledger" group.
To unsubscribe from this group and stop receiving emails from it, send an email toledger-cli+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jungle Boogie

unread,
Oct 20, 2018, 5:30:39 PM10/20/18
to ledge...@googlegroups.com
Hi,

How would this work if I buy groceries and snacks at the grocery store and I want to have these in their respective category? 

What kind of auto categorizing do you do? 

Dmitry Kakurin

unread,
Oct 20, 2018, 11:00:45 PM10/20/18
to ledge...@googlegroups.com
If it's a single transaction, then obviously it cannot do it automatically. You'll have to split it in your ledger file manually.
However, if it's 2 payments, then it would try to guess based on the amount (I do some feature engineering for price classes).

By default, it uses Naive Bayesian classification, but you can use a switch to use TF-IDF instead.
TF-IDF tends to work better when you are just starting and ledger is small.

- Dmitry

Andy L

unread,
Oct 21, 2018, 4:44:03 PM10/21/18
to Ledger
Dmitry - Many thanks for your great examples. 

I'll follow your lead and use classifiers instead of rules engines. 

I did some experiments using Bayes and TfIdf classifier and had encouraging results.

Here is the classifier library I used: https://github.com/alexandru/stuff-classifier

I also like your CLI approach & will copy it.  In ruby, there's a nice tty toolkit (https://github.com/piotrmurach/tty) that has components (tty-box, tty-cursor, tty-screen) for building UI features like the ones you used in `into-ledger`.

Thanks again, Andy

John Wiegley

unread,
Oct 24, 2018, 1:18:31 AM10/24/18
to Andy L, Ledger
>>>>> "AL" == Andy L <an...@r210.com> writes:

AL> I'm downloading CSV transaction data and converting to ledger format. Now
AL> I'm wondering what is the best way to auto-categorize transactions.  I
AL> don't want to manually assign payee accounts, tags and comments.

There's also a 'ledger convert <file.csv>' command...

John

Brett Wyrick

unread,
Dec 13, 2018, 5:29:14 PM12/13/18
to Ledger
You can do all of this within Google Sheets, using two different tabs.

One tab will be for your keywords that will match to Categories.
  • The AutoCategories tab will have two columns: Name one range "AutoKeywords" and the other column "AutoCats" (short for Auto-Category).
  • In the "AutoKeywords" column, put "7-11". In the "AutoCats" column, put "Auto Fuel".
Other tab will be for Transactions:
  • The Transactions tab will be a .CSV feed from your bank, etc. You'll likely have some sort of "Description" for the transaction (i.e. 7-11 ANAHEIM #103230).
  • Create an extra column to this feed named "Category", and add this formula:
=arrayformula(INDEX(AutoCats,MATCH(TRUE,ISNUMBER(SEARCH(AutoKeywords,$A1)),0)))
  • The formula above is assuming that the Transaction Description is in cell A1.
  • So, if the Transaction Description contains "7-11", the Category will automatically be "Auto Fuel".
This searches all the keywords and matches them with the "AutoCat".

Let me know if you've got any questions. 

Holly Hudson

unread,
Jun 17, 2019, 8:50:14 AM6/17/19
to Ledger
Hey, I really want to use your tool, Dmitry, but I can't get it to download from my bank.  ledger-autosync works, so it's not on the bank's side (unfortunately ledger-autosync ignores my tediously-entered payee fields for each account and doesn't classify anything, so it's not a solution for me -- not sure if yours does, but I can't get far enough to test it out).

My bank has no bankID listed on ofxhome, only an FID, and your code wants both:  http://www.ofxhome.com/index.php/institution/view/483
I've tried leaving it blank in the yaml, but direct2ledger complains that it's missing.  I've tried various other numbers from my working ofxclient.ini file, but all result in "bad request".  What do you do if there's no bank ID?

This bank doesn't offer csv downloads, which is pretty terrible, so I'm stuck trying to make ofx work.

My other option is rolling up my sleeves and trying to figure out how to make ledger-autosync use the payee fields.  I'm not the most advanced programmer so that would be a long slog for me, so I'm trying to just get *something* to work without having to edit code.
To unsubscribe from this group and stop receiving emails from it, send an email toledge...@googlegroups.com.
Reply all
Reply to author
Forward
Message has been deleted
0 new messages