beangulp's duplicate detection, thoughts on how to make it more customisable?


Justus Pendleton

Apr 3, 2025, 5:48:17 PM
to Beancount
I've run into two small annoyances with the default duplicate detection in beangulp and wanted to see if anyone had suggestions on changing the API to make it a bit more amenable to easy customisation by importers.

Basically I'm looking for a way to fine-tune the duplicate detection based on the payee.

Scenario 1:

You buy coffee every day. It's always the same payee and always the same amount so, naturally, beangulp will flag them all as duplicates. Maybe turn off duplicate detection entirely for this payee? Or set the window to 0 days (i.e. only detect duplicates for same day transactions)?

Scenario 2:

Aldi, for some reason, takes FOREVER to fully post to my bank account. Like 4-5 days, so they always fall outside the sliding window. So for this I'd want to set the window to 5 days.

In both cases it feels like I want some knob to say "for THIS payee, use a bigger or smaller sliding window". But that's not how extract.mark_duplicate_entries expects to work at all. I could implement a hackish wrapper around similar.heuristic_comparator, but I'm not sure that's really the right place for this to live either.
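
For concreteness, here is a rough sketch of what such a wrapper might look like. It is not beangulp's API: payee_window_comparator and the windows mapping are hypothetical names, the stand-in Txn type and base comparator only exist so the snippet runs on its own, and it assumes the comparator is called as compare(entry, target) on candidates that already fall inside the outer sliding window. That last assumption means the outer window would have to be at least as large as the largest per-payee value, with the wrapper narrowing it for everyone else.

```python
import datetime
from collections import namedtuple


def payee_window_comparator(base, windows, default=None):
    """Wrap a comparator with a per-payee limit on the date difference.

    base:    callable(entry, target) -> bool, e.g. an existing comparator
    windows: dict mapping payee -> datetime.timedelta (hypothetical config)
    default: limit for payees not in `windows`; None adds no extra limit
    """
    def compare(entry, target):
        # Entries without a payee attribute fall through to the default.
        payee = getattr(entry, "payee", None)
        limit = windows.get(payee, default)
        if limit is not None and abs(entry.date - target.date) > limit:
            return False
        return base(entry, target)
    return compare


# Stand-ins so the sketch runs without beancount or beangulp installed.
Txn = namedtuple("Txn", "date payee amount")


def same_payee_and_amount(a, b):
    return (a.payee, a.amount) == (b.payee, b.amount)


windows = {
    "Coffee Shop": datetime.timedelta(days=0),  # scenario 1: same day only
    "Aldi": datetime.timedelta(days=5),         # scenario 2: slow to post
}
compare = payee_window_comparator(same_payee_and_amount, windows)

# The outer window given to the duplicate-marking step would need to
# cover the largest per-payee window, otherwise Aldi is never compared.
outer_window = max(windows.values())
```

The downside is the one already mentioned: the window logic ends up split between the outer call and the comparator, which is why it feels like a hack rather than a proper knob.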

Cheers,
Justus