Duplicate matching for importers


Shane Koster

Jun 29, 2016, 9:21:20 AM
to Beancount
Hi Martin,

I'm working on creating some importers and I've run into an issue with imported transactions not matching existing transactions. I think I know what the problem is and I'm wondering if perhaps I'm missing something or if it might be a bug/oversight.

The issue is that my current transactions look something like this:

2016-05-15 * "In-n-out Burger"
  Expenses:Dining-Entertainment                              12.56 USD
  Liabilities:Chase:CreditCard


When I run my importer, it doesn't match this transaction because the second posting is "automatic". I found the relevant code in similar.py:

def amounts_map(entry):
    ...
    for posting in entry.postings:
        # Skip interpolated postings.
        if posting.meta and interpolate.AUTOMATIC_META in posting.meta:
            continue
    ...


Should it really be ignoring the automatic postings in this case? I've tested it: if I add the amount to the second posting explicitly, the importer detects the duplicate as expected. The syntax doesn't require amounts on all legs of a transaction, right?
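To make the failure mode concrete, here is a hedged, self-contained sketch (not the actual similar.py code; `amounts_map`, the tuple layout, and the `skip_automatic` flag are all simplifications I've made up for illustration) of why skipping interpolated postings prevents the match:

```python
from collections import Counter

# Hypothetical simplification of similar.py's amounts_map: collect the
# per-account amounts of an entry, optionally skipping postings that
# were interpolated ("automatic") rather than written explicitly.
def amounts_map(postings, skip_automatic=True):
    amounts = Counter()
    for account, number, automatic in postings:
        if skip_automatic and automatic:
            continue  # This is the skip Shane is questioning.
        amounts[account] += number
    return amounts

# Existing ledger entry: the second leg was left blank, so Beancount
# interpolated it and flagged it as automatic.
existing = [
    ('Expenses:Dining-Entertainment', 12.56, False),
    ('Liabilities:Chase:CreditCard', -12.56, True),   # interpolated
]

# Freshly imported entry: both legs carry explicit amounts.
imported = [
    ('Expenses:Dining-Entertainment', 12.56, False),
    ('Liabilities:Chase:CreditCard', -12.56, False),
]

print(amounts_map(existing) == amounts_map(imported))
# False: the interpolated leg is dropped, so the amount sets differ.
print(amounts_map(existing, skip_automatic=False) == amounts_map(imported))
# True: keeping the interpolated leg makes the entries comparable.
```

Under this simplification, including interpolated postings is exactly what makes Shane's explicit-amount workaround unnecessary.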


Shane

Jason Chu

Jun 29, 2016, 10:37:35 AM
to Beancount
This may actually explain something I've noticed recently too. I was pretty sure transfers from one account to another used to be matched when I imported from the second account (exactly the situation you described, except with two Asset accounts). I'm fairly certain this is a regression from how it used to work.

I don't write Beancount myself, but this doesn't seem like an ideal situation.


Martin Blais

Jun 29, 2016, 7:48:08 PM
to Beancount
The similarity heuristic is definitely just a quickly cooked heuristic.
I often see it fail myself.
Needs improvement.
Maybe I should make it overrideable.

(Note: if you keep your imported sources and can establish a 1:1 correspondence between the converted and the edited (final) transactions, that would eventually make a suitable classification problem.)



Martin Blais

Jun 29, 2016, 8:56:18 PM
to Beancount
I think the original rationale for skipping automatic postings was that only user-entered transactions would contain them.
This similarity function is bad, I need to improve it with something more robust eventually. It's annoying me too.

Feel free to send me a patch with a better idea. I need to be diligent about starting to accumulate real test cases for this.

This classification is like spam filtering: commenting out a genuinely new transaction is very bad, but failing to detect a duplicate isn't a big deal, especially if it falls before the last duplicated transaction (when you look at the output in dated order, it will be obvious that it was misclassified).
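The asymmetric cost Martin describes suggests tuning any heuristic conservatively: only flag a duplicate when the evidence is strong. A hypothetical sketch of such a conservative matcher (this is my own illustration, not Beancount's implementation; `looks_duplicate` and `max_days` are made-up names):

```python
import datetime

# Hypothetical conservative duplicate check: flag a pair as duplicate
# only when the dates fall within a small window AND the posting
# amounts agree exactly. Erring toward "not a duplicate" matches the
# asymmetric cost: a missed duplicate is easy to spot in dated output,
# but a real transaction wrongly commented out is silently lost.
def looks_duplicate(entry_a, entry_b, max_days=2):
    date_a, amounts_a = entry_a
    date_b, amounts_b = entry_b
    if abs((date_a - date_b).days) > max_days:
        return False
    return sorted(amounts_a) == sorted(amounts_b)

a = (datetime.date(2016, 5, 15), [12.56, -12.56])
b = (datetime.date(2016, 5, 16), [12.56, -12.56])
c = (datetime.date(2016, 5, 25), [12.56, -12.56])

print(looks_duplicate(a, b))  # True: close dates, same amounts
print(looks_duplicate(a, c))  # False: dates too far apart
```

A real implementation would also weigh payees and accounts, but the principle is the same: require strong agreement before suppressing a transaction.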




Shane Koster

Jun 30, 2016, 10:37:26 AM
to Beancount
Thanks for the input, Martin. I'll play around with it a bit more and see what I can come up with. At least now I know I'm not missing something obvious.

Shane