Automatically flagging expenses of interest


Red S

Jan 19, 2024, 5:45:35 AM
to Beancount
I'm curious, has anyone set up Beancount scripts or reports to flag expenses that might need further attention? The situation that made me think about this is a quarterly bill that doubled multiple times after years of being stable, which is an obvious red flag.

Unlike in the past, virtually all of my Beancount interactions are highly automated, which, combined with the fact that time is at a premium these days, causes me to miss details like this.

In this particular case, a rule to flag expenses that deviate from their norm over a certain time period (monthly, annually) might be simple to write, but I was wondering about a more general, perhaps fancier solution that would learn to distinguish what's normal and call attention to what's not, as rule-based solutions tend to be incomplete and require constant fiddling.
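
For reference, the simple rule mentioned above might look something like the following minimal sketch. It assumes the postings have already been exported from the ledger as (account, date, amount) tuples with float amounts; the function name `flag_deviations` and the `ratio` and `min_history` parameters are hypothetical and not part of Beancount. It compares each amount with the median of the earlier amounts for the same account, so one spike does not immediately shift the baseline.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def flag_deviations(postings, ratio=1.5, min_history=4):
    """Flag postings that deviate from the account's prior median.

    postings: iterable of (account, date, amount) tuples (assumed format).
    ratio: flag when amount > ratio * median or amount < median / ratio.
    min_history: earlier postings required before an account is checked.
    """
    history = defaultdict(list)
    flagged = []
    # Walk the postings in date order so "history" only holds earlier amounts.
    for account, day, amount in sorted(postings, key=lambda p: p[1]):
        past = history[account]
        if len(past) >= min_history:
            typical = median(past)
            if typical > 0 and (amount > ratio * typical or amount < typical / ratio):
                flagged.append((account, day, amount, typical))
        past.append(amount)
    return flagged


# Made-up quarterly bill that is stable for a year and then doubles twice.
sample = [
    ("Expenses:Utilities:Water", date(2022, 1, 15), 80.0),
    ("Expenses:Utilities:Water", date(2022, 4, 15), 82.0),
    ("Expenses:Utilities:Water", date(2022, 7, 15), 79.0),
    ("Expenses:Utilities:Water", date(2022, 10, 15), 81.0),
    ("Expenses:Utilities:Water", date(2023, 1, 15), 80.0),
    ("Expenses:Utilities:Water", date(2023, 4, 15), 160.0),
    ("Expenses:Utilities:Water", date(2023, 7, 15), 320.0),
]

for account, day, amount, typical in flag_deviations(sample):
    print(f"{day} {account}: {amount:.2f} (typical {typical:.2f})")
```

With this made-up data, both the 160.00 and the 320.00 bills are flagged. A real ledger would use Decimal amounts, so the arithmetic would need adjusting.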

Chary Chary

Jan 23, 2024, 11:08:31 AM
to Beancount
Sounds like a good opportunity to treat this as a deep learning classification problem.

Red S

Jan 24, 2024, 12:24:16 AM
to Beancount
Definitely! That's what I had in mind. Would you or others on this list have experience in how to frame this as a deep learning classification problem, what tools/libraries to use, and such? Pointers appreciated.

Yichu Zhou

Jan 24, 2024, 1:05:43 PM
to bean...@googlegroups.com
There is a kind of machine learning problem called outlier detection. I think the scikit-learn library is a good starting point if we want to use ML techniques. But in our case, I feel the definition of “abnormal” varies with each person's situation. It might be tricky to formulate the problem properly.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/58ec4e60-530f-4055-990b-03e374251a29n%40googlegroups.com.

Eric Altendorf

Jan 24, 2024, 1:09:54 PM
to bean...@googlegroups.com
This would probably be more useful if users could provide their own examples of abnormal and normal expenses. In that case, the model itself is probably not very difficult; I imagine a variety of off-the-shelf toolkits would work. To me, the harder part seems like making the workflow smooth and robust -- deciding how users would flag outliers, run the classifier, correct misclassifications, cause retraining to happen, etc.

Red S

Jan 24, 2024, 4:34:07 PM
to Beancount

Very interesting, thank you all. I wonder if a single user's journal would suffice as a learning dataset in this case. For me, the expenses across categories of interest are those that have been stable for years. Plus, I’m willing to deal with false positives (but preferably not false negatives).

> There is a kind of machine learning problem called outlier detection. I think the scikit-learn library is a good starting point

Excellent, thank you for the helpful pointers! A quick search brought up these, which I’ve noted down to look into when I have time(TM):
https://scikit-learn.org/stable/modules/outlier_detection.html
https://scikit-learn.org/stable/auto_examples/neighbors/plot_lof_outlier_detection.html
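
As a rough illustration of what the second link covers, here is a minimal sketch using scikit-learn's `LocalOutlierFactor` on a single series of amounts. The data is made up, `n_neighbors=5` is an arbitrary choice, and using the raw amount as the only feature is an assumption for illustration; it is not a recommendation from this thread.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Made-up amounts for one expense account; the last one is about double the norm.
amounts = np.array([100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 200.0])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = amounts.reshape(-1, 1)

# n_neighbors should be smaller than the number of samples; 5 is arbitrary here.
lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)  # -1 marks outliers, 1 marks inliers

outliers = amounts[labels == -1]
print(outliers)
```

The doubled amount is labelled as an outlier here. With real journals the choice of features and `n_neighbors` would likely need experimentation.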

Timothy Jesionowski

Jan 24, 2024, 7:55:29 PM
to bean...@googlegroups.com
Maybe take a page from fraud detection and try to annotate each transaction with the hour of the day, day of the week, day of the month, and accumulated transaction volume/value when the transaction occurred. Even throw in a biweekly time period if you're feeling fancy? The best thing would be to include the location if you have it, but I don't think you do.
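
A minimal sketch of that kind of annotation, assuming each transaction is available as a (timestamp, amount) pair with a full `datetime`. The `Features` class and `annotate` function are hypothetical names, and the running totals here accumulate over the whole input rather than per period, which is only one possible reading of "accumulated".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Features:
    hour: int             # hour of day, 0-23
    weekday: int          # day of week, Monday=0
    day_of_month: int     # 1-31
    biweekly_slot: int    # position in a 14-day cycle, 0-13
    running_count: int    # transactions so far, including this one
    running_total: float  # accumulated value so far, including this one


def annotate(transactions):
    """Yield (timestamp, amount, Features) for (timestamp, amount) pairs."""
    count, total = 0, 0.0
    for ts, amount in sorted(transactions):
        count += 1
        total += amount
        yield ts, amount, Features(
            hour=ts.hour,
            weekday=ts.weekday(),
            day_of_month=ts.day,
            biweekly_slot=ts.toordinal() % 14,
            running_count=count,
            running_total=total,
        )


# Made-up transactions for illustration.
txns = [
    (datetime(2024, 1, 24, 13, 5), 17.25),
    (datetime(2024, 1, 19, 5, 45), 42.50),
]
rows = list(annotate(txns))
for ts, amount, feats in rows:
    print(ts.isoformat(), amount, feats)
```

Many bank exports only carry a date, in which case the hour feature would have to be dropped.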


Sincerely,
Timothy Jesionowski



Red S

Jan 27, 2024, 9:42:39 PM
to Beancount
Interesting future idea. My current goal is simply to find outliers by value, e.g., to flag my electricity bill if it is twice what it normally is.

But as you observed, fraud detection might be possible by simply grabbing all data that an institution provides, stuffing it into metadata, and exposing it to the learning algorithm.
