Deduplication & merging questions

Eric Altendorf

Jul 17, 2023, 6:22:21 PM
to bean...@googlegroups.com
I see hooks for duplicate detection on importers, but the doc comments don't make it clear how those functions are used. Poking around the code, it appears that they are only run to dedup items within a single import.

Is there any functionality to automatically match up legs of a transaction that come from different importers, e.g., a transfer from one account to another?

thanks,
ericc
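To make the question concrete, here is a minimal sketch of matching transfer legs across importers. It assumes each importer's output has been reduced to hypothetical single-posting `Leg` records, which are not Beancount's own `Transaction` type. Legs on different accounts whose amounts cancel within `max_days` (an assumed 3-day window) are merged into one two-posting transaction, and everything else is left alone.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Leg:
    """Hypothetical single-posting record produced by one importer."""
    date: date
    account: str
    amount: Decimal
    narration: str


def merge_transfer_legs(legs, max_days=3):
    """Merge pairs of legs that cancel out into single two-posting transactions.

    Greedy: legs are scanned in date order, and each leg is paired with the
    nearest later unused leg on a different account with the opposite amount.
    Returns (merged, unmatched); merged items are plain dicts for illustration.
    """
    ordered = sorted(legs, key=lambda leg: leg.date)
    used = set()
    merged = []
    for i, leg in enumerate(ordered):
        if i in used:
            continue
        for j in range(i + 1, len(ordered)):
            other = ordered[j]
            if (other.date - leg.date).days > max_days:
                break  # sorted by date, so every later leg is even further away
            if j in used or other.account == leg.account:
                continue
            if leg.amount + other.amount == 0:
                used.update((i, j))
                merged.append({
                    "date": leg.date,  # keep the earlier date of the two legs
                    "narration": leg.narration,
                    "postings": [(leg.account, leg.amount), (other.account, other.amount)],
                })
                break
    unmatched = [leg for k, leg in enumerate(ordered) if k not in used]
    return merged, unmatched


legs = [
    Leg(date(2023, 7, 3), "Assets:Checking", Decimal("-500.00"), "Transfer to savings"),
    Leg(date(2023, 7, 4), "Assets:Savings", Decimal("500.00"), "Deposit from checking"),
    Leg(date(2023, 7, 5), "Assets:Checking", Decimal("-42.10"), "Grocery store"),
]
merged, unmatched = merge_transfer_legs(legs)
print(merged[0]["postings"])
print([leg.narration for leg in unmatched])  # ['Grocery store']
```

Exact amount equality is deliberately strict here; transfers that incur a fee on one side would need a tolerance.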

Einar S. Idsø

Jul 17, 2023, 8:10:12 PM
to bean...@googlegroups.com
Eric,

I’ve successfully used https://github.com/jbms/beancount-import to deduplicate across multiple files.

I’ve also seen the issues you describe in your other posts, and they look very similar to what I worked on a couple of years ago. My solution is not great, but it’s good enough for my purposes, which are to keep track of my crypto and get the numbers needed for tax reports.

I am currently in the middle of a long vacation, no PC, so cannot assist any further for the next couple of weeks, unfortunately. Happy to collaborate with you if you can wait. Or if you find a solution before I am back, then that’s great too.

Some key points:
- crypto
- HIFO
- beancount v2
- multiple sources w/importers
- timestamp w/timezone for sorting
- deduplication
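On the "timestamp w/timezone for sorting" point: Beancount dates carry no time of day, so ordering same-day entries from several sources needs extra metadata. A minimal sketch, assuming each entry carries a hypothetical `timestamp` string in ISO 8601 with an explicit UTC offset: convert it to UTC before sorting, so that local clock times from different exchanges compare correctly.

```python
from datetime import datetime, timezone


def utc_sort_key(entry):
    """Sort key: the hypothetical 'timestamp' field converted to UTC.

    Assumes an ISO 8601 string with an explicit offset, e.g.
    '2023-07-17T18:22:21-07:00'. Naive timestamps are rejected, because
    they cannot be ordered reliably against entries from other time zones.
    """
    ts = datetime.fromisoformat(entry["timestamp"])
    if ts.tzinfo is None:
        raise ValueError(f"timestamp without timezone: {entry['timestamp']}")
    return ts.astimezone(timezone.utc)


entries = [
    {"source": "exchange_a", "timestamp": "2023-07-17T18:22:21-07:00"},  # 01:22:21 UTC, Jul 18
    {"source": "exchange_b", "timestamp": "2023-07-18T02:00:00+02:00"},  # 00:00:00 UTC, Jul 18
]
ordered = sorted(entries, key=utc_sort_key)
print([e["source"] for e in ordered])  # ['exchange_b', 'exchange_a']
```

Sorting by the local date alone would put exchange_a first, even though its trade happened later in absolute time.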

Cheers,
Einar

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/CAFXPr0vp0MtWcDr0ueMuvXFREO5G1oTGC0UvD%2BRXD%2BdbLG9D%3Dg%40mail.gmail.com.

Red S

Jul 17, 2023, 9:01:34 PM
to Beancount

Eric Altendorf

Jul 17, 2023, 10:07:22 PM
to bean...@googlegroups.com
Thanks both, these look promising, I'll investigate.

Martin Blais

Jul 30, 2023, 11:40:57 AM
to bean...@googlegroups.com
One of the things that was never done is to specify deduplicating by import source, which would make a lot of sense.
Some institutions, e.g. Amex, have really nice unique ids on each transaction that can be used to dedup exactly (if preserved).  Some don't.  The heuristics I'm using today are imperfect... this needs improvement.
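A minimal sketch of per-source deduplication along those lines. It assumes a hypothetical `Record` type whose optional `txn_id` holds the institution's unique id (the ids below are made up). When the id is present, it is matched exactly within its source. Otherwise a heuristic key of date, amount, and normalized narration is used. That fallback can drop genuinely repeated purchases, such as two identical coffees on the same day, and an id only helps if it was preserved on both the existing and the incoming record.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical imported record; txn_id is the institution's id, if it has one."""
    source: str
    date: date
    amount: Decimal
    narration: str
    txn_id: Optional[str] = None


def dedup_key(rec):
    """Exact key when the source provides an id, heuristic key otherwise."""
    if rec.txn_id is not None:
        # Ids are only assumed unique within one source, so the source is part of the key.
        return ("id", rec.source, rec.txn_id)
    normalized = " ".join(rec.narration.lower().split())
    return ("heuristic", rec.source, rec.date, rec.amount, normalized)


def new_records(existing, incoming):
    """Return incoming records whose key has not been seen before."""
    seen = {dedup_key(rec) for rec in existing}
    fresh = []
    for rec in incoming:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            fresh.append(rec)
    return fresh


existing = [Record("amex", date(2023, 7, 1), Decimal("-12.50"), "Coffee Shop", "AMEX-0001")]
incoming = [
    Record("amex", date(2023, 7, 1), Decimal("-12.50"), "COFFEE SHOP", "AMEX-0001"),
    Record("amex", date(2023, 7, 2), Decimal("-30.00"), "Gas", "AMEX-0002"),
    Record("bank", date(2023, 7, 2), Decimal("-5.00"), "ATM fee"),
]
print([rec.narration for rec in new_records(existing, incoming)])  # ['Gas', 'ATM fee']
```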

Marvin Ritter

Aug 5, 2023, 9:34:21 AM
to bean...@googlegroups.com
I have a little plugin (attached) that looks for transactions between my accounts and marks similar ones as duplicates (and removes one). It can handle some small differences and has done a surprisingly good job for me.
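The attached dedup.py is not reproduced here, but the idea can be sketched. A minimal version, assuming a hypothetical `Txn` type with `(account, amount)` postings and assumed tolerances (3 days, 0.01 per account): a transaction that only touches your own accounts is dropped if an earlier one touches the same accounts with nearly the same amounts within the date window. Narrations are ignored, because the two banks usually describe the same transfer differently.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

OWN_PREFIXES = ("Assets:", "Liabilities:")  # assumption: what counts as "my accounts"


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction: a date, a narration and (account, amount) postings."""
    date: date
    narration: str
    postings: tuple


def net_by_account(txn):
    """Sum amounts per account, so the posting order does not matter."""
    totals = {}
    for account, amount in txn.postings:
        totals[account] = totals.get(account, Decimal("0")) + amount
    return totals


def is_internal(txn):
    """True if every posting is on one of your own accounts (i.e. a transfer)."""
    return all(account.startswith(OWN_PREFIXES) for account, _ in txn.postings)


def looks_duplicate(a, b, max_days=3, tolerance=Decimal("0.01")):
    """Same accounts, per-account amounts within tolerance, dates within max_days."""
    if abs((a.date - b.date).days) > max_days:
        return False
    net_a, net_b = net_by_account(a), net_by_account(b)
    if net_a.keys() != net_b.keys():
        return False
    return all(abs(net_a[acct] - net_b[acct]) <= tolerance for acct in net_a)


def remove_duplicate_transfers(txns, **kwargs):
    """Keep the first of each look-alike group of internal transfers; keep all else."""
    kept = []
    for txn in txns:
        if is_internal(txn) and any(
            is_internal(k) and looks_duplicate(k, txn, **kwargs) for k in kept
        ):
            continue  # a matching transfer was already kept, drop this copy
        kept.append(txn)
    return kept


from_checking = Txn(date(2023, 7, 3), "Transfer to savings",
                    (("Assets:Checking", Decimal("-500.00")), ("Assets:Savings", Decimal("500.00"))))
from_savings = Txn(date(2023, 7, 4), "Deposit from checking",
                   (("Assets:Savings", Decimal("500.00")), ("Assets:Checking", Decimal("-500.00"))))
print(len(remove_duplicate_transfers([from_checking, from_savings])))  # 1
```

The pairwise comparison is quadratic, which is usually fine at the size of a personal ledger.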

The plugin works better when you keep separate bean files per Asset account (which I do).
To avoid re-importing transactions, my importers look for the last transaction in the corresponding bean file and only add newer ones. This assumes that transactions don't change after their date. Occasionally this leads to errors when a company (e.g. a car rental or hotel) places a hold on my debit card, I run an import, and then they release the hold. But overall it works well. For a while I tried writing heuristics to distinguish my manual changes (which I want to keep) from upstream changes. Even having unique IDs didn't make this easy.
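The "only add newer transactions" rule can be sketched as follows. This assumes the importer can read the account's bean file as text and that its parsed rows are hypothetical dicts with a `date` key. The regex only recognizes transaction header lines (`*`, `!` or `txn` flags), which simplifies the real syntax. The strict `>` means a new transaction dated the same day as the last imported one is skipped, and a card hold that was already imported is never revisited.

```python
import re
from datetime import date

# Simplified: a transaction header line starts with a date followed by a flag or "txn".
TXN_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(?:[*!]|txn)(?:\s|$)", re.MULTILINE)


def last_txn_date(bean_text):
    """Date of the newest transaction header in the file, or None if there is none."""
    dates = [date.fromisoformat(m.group(1)) for m in TXN_LINE.finditer(bean_text)]
    return max(dates, default=None)


def rows_to_import(rows, bean_text):
    """Keep only rows dated strictly after the newest transaction already in the file."""
    cutoff = last_txn_date(bean_text)
    if cutoff is None:
        return list(rows)
    return [row for row in rows if row["date"] > cutoff]


bean_text = """2023-07-01 open Assets:Checking
2023-07-15 * "Hotel" "Card hold"
  Assets:Checking  -200.00 USD
  Expenses:Travel
2023-07-16 txn "Cafe" ""
"""
rows = [{"date": date(2023, 7, 16), "amount": "-4.50"}, {"date": date(2023, 7, 17), "amount": "-10.00"}]
print(rows_to_import(rows, bean_text))  # only the 2023-07-17 row
```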


dedup.py

Red S

Aug 5, 2023, 10:40:37 PM
to Beancount
Basically very similar to what zerosum does, except zerosum generalizes it and is configurable. It also uses an intermediate account which can be beneficial.
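A minimal sketch of the intermediate-account idea follows. It is not zerosum's actual code or configuration. Each importer books its side of a transfer against one hypothetical intermediate account, and a matcher pairs postings in that account that cancel within an assumed `range_days` window. Whatever stays unmatched is the account's residual balance, which points at a missing or late leg.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Posting:
    """Hypothetical posting to the intermediate transfer account."""
    date: date
    amount: Decimal
    source: str  # which importer produced it


def match_zero_sum(postings, range_days=5):
    """Pair postings that cancel within range_days; return (pairs, unmatched)."""
    waiting = defaultdict(deque)  # amount -> earlier postings still looking for -amount
    pairs, unmatched = [], []
    for p in sorted(postings, key=lambda x: x.date):
        candidates = waiting[-p.amount]
        # Candidates are in date order; expire those that fell out of the window.
        while candidates and (p.date - candidates[0].date).days > range_days:
            unmatched.append(candidates.popleft())
        if candidates:
            pairs.append((candidates.popleft(), p))
        else:
            waiting[p.amount].append(p)
    for queue in waiting.values():
        unmatched.extend(queue)
    return pairs, unmatched


postings = [
    Posting(date(2023, 7, 3), Decimal("-500.00"), "checking"),
    Posting(date(2023, 7, 4), Decimal("500.00"), "savings"),
    Posting(date(2023, 7, 10), Decimal("-75.00"), "checking"),  # other leg not imported yet
]
pairs, unmatched = match_zero_sum(postings)
residual = sum((p.amount for p in unmatched), Decimal("0"))
print(len(pairs), residual)  # 1 -75.00
```

Keying the waiting queues by amount avoids a pairwise scan, and each bean file stays self-contained because it only ever mentions the intermediate account.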