Auto assign to account / rename payees

Florian Lindner

Apr 27, 2019, 6:28:34 PM

to Beancount

Hello,

I am new to beancount / ledger and am currently thinking about how to do my importing. I have written an importer for the CSV statements from my bank. I have two questions:

+ I would like to automatically rename the payee of some frequently occurring transactions, such as grocery shopping, and assign them to accounts. Is there a canonical way to do that, or should I just hack it into the importer?

+ How can I detect duplicates when I try to import the same transaction twice?

Thanks,

Florian

Martin Blais

Apr 27, 2019, 7:44:21 PM

to Beancount

On Sat, Apr 27, 2019 at 6:28 PM Florian Lindner <mailin...@xgm.de> wrote:

> Hello,
>
> I am new to beancount / ledger and am currently thinking about how to do my importing. I have written an importer for the CSV statements from my bank. I have two questions:
>
> + I would like to automatically rename the payee of some frequently occurring transactions, such as grocery shopping, and assign them to accounts. Is there a canonical way to do that, or should I just hack it into the importer?

You should build it into your importer.

This task is simple enough that there's really no need for the library to provide common code for it.

You can roll your own.
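Since the advice here is to put the renaming into the importer itself, a minimal sketch of one way to do it follows: a table of regex rules checked in order, each mapping a raw bank payee to a cleaned-up payee and an expense account. The `PayeeRule` class, the `categorize` helper, the patterns and the account names are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PayeeRule:
    """Hypothetical rule: raw payees matching `pattern` get a clean name and an account."""
    pattern: re.Pattern
    payee: str
    account: str


# Illustrative rules only; the patterns and account names are assumptions.
RULES = [
    PayeeRule(re.compile(r"REWE|EDEKA|ALDI", re.IGNORECASE),
              "Supermarket", "Expenses:Food:Groceries"),
    PayeeRule(re.compile(r"DB VERTRIEB", re.IGNORECASE),
              "Deutsche Bahn", "Expenses:Transport:Train"),
]


def categorize(raw_payee: str,
               rules: Sequence[PayeeRule] = RULES,
               default_account: str = "Expenses:Uncategorized") -> Tuple[str, str]:
    """Return (payee, account) for a raw CSV payee; the first matching rule wins."""
    for rule in rules:
        if rule.pattern.search(raw_payee):
            return rule.payee, rule.account
    # No rule matched: keep the bank's payee text and use a catch-all account.
    return raw_payee, default_account
```

For example, `categorize("REWE Markt GmbH Berlin")` returns `("Supermarket", "Expenses:Food:Groceries")`. Because the first matching rule wins, more specific patterns should come before broader ones; an importer would call the helper on each CSV row's payee field and use the result for the transaction's payee and its balancing posting.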

> + How can I detect duplicates when I try to import the same transaction twice?

That's a more difficult question.

You should implement your own duplicate-detection code.

It's not entirely obvious how to do this for everybody; the definition of what's a duplicate depends on how much you manually massage your transactions.

I haven't really tried very hard to generalize this well, so it's best you define your own code for that.
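To illustrate why the definition depends on your workflow, here is a minimal sketch of one possible rule, assuming you hand-edit payees and narrations after import but leave the bank-side posting alone. It treats an imported transaction as a probable duplicate of an existing one if their dates are at most a few days apart and they share an identical (account, amount, currency) posting. The `Txn` and `Posting` classes are simplified stand-ins, not beancount's own types, and the 3-day window is an arbitrary assumption.

```python
import datetime
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class Posting:
    account: str
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class Txn:
    """Simplified stand-in for a transaction (not beancount's data type)."""
    date: datetime.date
    narration: str
    postings: Tuple[Posting, ...]


def is_probable_duplicate(new: Txn, old: Txn, window_days: int = 3) -> bool:
    """Hypothetical rule: dates within `window_days` and at least one identical
    (account, amount, currency) posting. Narration is deliberately ignored
    because it is often edited by hand after import."""
    if abs((new.date - old.date).days) > window_days:
        return False
    old_keys = {(p.account, p.amount, p.currency) for p in old.postings}
    return any((p.account, p.amount, p.currency) in old_keys for p in new.postings)
```

The trade-off is visible in the rule itself: two genuine purchases of the same amount on the same account within the window would be flagged too, so a stricter or looser rule may suit a different workflow better.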

Generally speaking, the ingestion framework is minimal and more or less a DIY affair.

It tries to provide a basic set of tools, but it's certainly not a full solution; some coding is needed.

I don't know what a "full" solution might look like; there's too much variation to attempt to cover everyone's needs (IMHO).

> Thanks,
>
> Florian
>
> --
> You received this message because you are subscribed to the Google Groups "Beancount" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
> To post to this group, send email to bean...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/42c62fc8-64b9-48c6-a5b1-ca135532059d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Florian Lindner

Apr 28, 2019, 5:22:49 AM

to Beancount

On Sunday, 28 April 2019, 01:44:07 CEST, Martin Blais wrote:

> On Sat, Apr 27, 2019 at 6:28 PM Florian Lindner <mailin...@xgm.de> wrote:
> > Hello,
> >
> > I am new to beancount / ledger and am currently thinking about how to do my
> > importing. I have written an importer for the CSV statements from my bank.
> > I have two questions:
> >
> > + I would like to automatically rename the payee of some frequently
> > occurring transactions, such as grocery shopping, and assign them to
> > accounts. Is there a canonical way to do that, or should I just hack it into
> > the importer?
>
> You should build it into your importer.
> This task is simple enough that there's really no need for the library to
> provide common code for it.
> You can roll your own.

Ok, you're right, that's easy.

> > + How can I detect duplicates when I try to import the same transaction
> > twice?
>
> That's a more difficult question.
> You should implement your own duplicate-detection code.
> It's not entirely obvious how to do this for everybody; the definition of
> what's a duplicate depends on how much you manually massage your
> transactions.
> I haven't really tried very hard to generalize this well, so it's best you
> define your own code for that.

Some brainstorming:

+ When beancount/fava talk about duplicates, they seem to mostly mean the duplicate transactions that arise from a transfer from a credit card to a checking account when the statements for both accounts are imported.

+ Save the original CSV line as metadata "source-line:". Alternatively, build a unique tuple of (original payee, date, amount) and save that as metadata. For each entry to import, query beancount for an entry with matching metadata. Ledger does it like that when --rich-data is given: it computes a hash (called UUID) from the input line. Is there a specific name you would suggest for that metadata field? Fava mentions a __source__ key, but that seems to be removed before committing (https://github.com/beancount/fava/blob/master/fava/help/import.md).

+ Using the payee from beancount is not a good idea, as it has usually been modified manually.
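The fingerprint idea from the second point can be sketched as follows, assuming the importer stores the hash in a metadata field on each transaction (the key name, e.g. `import-hash`, is hypothetical) and that the set of already-known hashes has been collected from the `meta` of the existing ledger entries. `fingerprint` accepts either the whole CSV line or individual fields such as (original payee, date, amount); both helpers are illustrative, not an existing beancount or fava API.

```python
import hashlib
from typing import Iterable, Iterator, Set, Tuple


def fingerprint(*fields: str) -> str:
    """Stable short hash of the given fields (a whole CSV line, or e.g. payee, date, amount)."""
    # Join with the ASCII unit separator so ("a", "bc") and ("ab", "c") differ.
    joined = "\x1f".join(f.strip() for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def filter_new(raw_lines: Iterable[str], known: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (fingerprint, line) for CSV lines whose hash is not already in the ledger."""
    for line in raw_lines:
        fp = fingerprint(line)
        if fp not in known:
            yield fp, line
```

Keeping 16 hex digits (64 bits) is an arbitrary choice that is ample for a personal ledger. One caveat: two genuinely separate transactions that produce identical CSV lines will share a fingerprint, so the second would be skipped unless the statement includes something distinguishing, such as a reference number or running balance.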

What are your thoughts?

Best,

Florian

Patrick Ruckstuhl

Apr 28, 2019, 11:12:32 AM

to bean...@googlegroups.com

Hi Florian,

On 28.04.2019 11:22, Florian Lindner wrote:

> On Sunday, 28 April 2019, 01:44:07 CEST, Martin Blais wrote:
> > On Sat, Apr 27, 2019 at 6:28 PM Florian Lindner <mailin...@xgm.de> wrote:
> > > Hello,
> > >
> > > I am new to beancount / ledger and am currently thinking about how to do my
> > > importing. I have written an importer for the CSV statements from my bank.
> > > I have two questions:
> > >
> > > + I would like to automatically rename the payee of some frequently
> > > occurring transactions, such as grocery shopping, and assign them to
> > > accounts. Is there a canonical way to do that, or should I just hack it into
> > > the importer?
> >
> > You should build it into your importer.
> > This task is simple enough that there's really no need for the library to
> > provide common code for it.
> > You can roll your own.
>
> Ok, you're right, that's easy.

You might also want to have a look at smart_importer:

https://github.com/beancount/smart_importer

It has some machine-learning-based approaches to automatically set payees and accounts.

> > > + How can I detect duplicates when I try to import the same transaction
> > > twice?
> >
> > That's a more difficult question.
> > You should implement your own duplicate-detection code.
> > It's not entirely obvious how to do this for everybody; the definition of
> > what's a duplicate depends on how much you manually massage your
> > transactions.
> > I haven't really tried very hard to generalize this well, so it's best you
> > define your own code for that.
>
> Some brainstorming:
>
> + When beancount/fava talk about duplicates, they seem to mostly mean the duplicate transactions that arise from a transfer from a credit card to a checking account when the statements for both accounts are imported.
>
> + Save the original CSV line as metadata "source-line:". Alternatively, build a unique tuple of (original payee, date, amount) and save that as metadata. For each entry to import, query beancount for an entry with matching metadata. Ledger does it like that when --rich-data is given: it computes a hash (called UUID) from the input line. Is there a specific name you would suggest for that metadata field? Fava mentions a __source__ key, but that seems to be removed before committing (https://github.com/beancount/fava/blob/master/fava/help/import.md).
>
> + Using the payee from beancount is not a good idea, as it has usually been modified manually.
>
> What are your thoughts?

There's actually some infrastructure for this in core beancount, and some more in smart_importer:

https://github.com/beancount/smart_importer/blob/master/smart_importer/detector.py

DuplicateDetector will set the correct __duplicate__ metadata based on a specified matching algorithm:

    apply_hooks(MyImporter(), [PredictPostings(), DuplicateDetector()]),

The default algorithm compares things like amounts, accounts and dates, but you can also customize it. For example, I have cases where I actually get a reference number and want to use that one, so I store the reference number in the meta as 'ref':

    class ReferenceDuplicatesComparator:
        def __call__(self, entry1, entry2):
            return ('ref' in entry1.meta and 'ref' in entry2.meta
                    and entry1.meta['ref'] == entry2.meta['ref'])

    apply_hooks(MyImporter(), [PredictPostings(), DuplicateDetector(comparator=ReferenceDuplicatesComparator())]),
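To make the comparator's role concrete, here is a simplified, standalone model of what a hook like DuplicateDetector does with it: compare each newly imported entry against the existing ones and set `__duplicate__` in its metadata when the comparator reports a match. This is not smart_importer's code; the `Entry` class and the `mark_duplicates` and `same_ref` helpers are hypothetical, and the real hook works on beancount entries and may restrict comparisons further (for example to a date window).

```python
import datetime
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entry:
    """Minimal stand-in for a beancount entry: just a date and a mutable meta dict."""
    date: datetime.date
    meta: Dict[str, object] = field(default_factory=dict)


def same_ref(entry1: Entry, entry2: Entry) -> bool:
    """Same logic as ReferenceDuplicatesComparator, written as a plain function."""
    return ("ref" in entry1.meta and "ref" in entry2.meta
            and entry1.meta["ref"] == entry2.meta["ref"])


def mark_duplicates(new_entries: List[Entry], existing: List[Entry],
                    comparator: Callable[[Entry, Entry], bool]) -> List[Entry]:
    """Flag every new entry that the comparator matches against an existing one."""
    for entry in new_entries:
        if any(comparator(entry, old) for old in existing):
            entry.meta["__duplicate__"] = True
    return new_entries
```

Entries without a 'ref' are never flagged by this comparator, which is why it suits accounts whose statements reliably carry a reference number.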

Regards,

Patrick

Zhuoyun Wei

Apr 30, 2019, 9:38:28 AM

to Florian Lindner, Beancount

On Sat, Apr 27, 2019, at 18:28, Florian Lindner wrote:
> + How can I detect duplicates when I try to import the same transaction twice?

You should try the zerosum plugin [1], possibly with my patch [2].

I used to de-dup by adding a `__duplicate__` meta key to transactions, matching on narrations when importing. However, the zerosum plugin provides a much simpler way: just post each side of a duplicated transaction to a "ZeroSumAccount", and the plugin will match them for you. This way, you do not need to de-dup at all.

Another good thing about using zerosum is that you do not have to "smudge" the date of one side of a non-instantaneous transaction (e.g. a transfer from your checking account to your credit card).

[1] https://github.com/redstreet/beancount_plugins_redstreet/tree/master/zerosum
[2] https://github.com/redstreet/beancount_plugins_redstreet/pull/2
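To illustrate the zero-sum idea: both statement importers post the transfer against one zero-sum account instead of against each other, so the checking import records +500 to that account on, say, 27 April, and the credit-card import records -500 when the payment arrives two days later; each side keeps its own date, and matched pairs net to zero. The sketch below is a toy model of that matching under stated assumptions (greedy pairing, an arbitrary 5-day window, a hypothetical account name such as Assets:ZeroSum:Transfers). It is not the plugin's code; check the plugin's repository [1] for how it is actually configured.

```python
import datetime
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class ZeroSumPosting:
    """A posting to the (hypothetical) zero-sum account."""
    date: datetime.date
    amount: Decimal


def match_zero_sum(postings: List[ZeroSumPosting], window_days: int = 5
                   ) -> Tuple[List[Tuple[ZeroSumPosting, ZeroSumPosting]], List[ZeroSumPosting]]:
    """Greedily pair postings that cancel each other out within `window_days`.

    Returns (matched_pairs, unmatched). Unmatched postings are the ones worth
    investigating, e.g. a transfer whose other side was never imported."""
    remaining = sorted(postings, key=lambda p: p.date)
    pairs = []
    unmatched = []
    while remaining:
        first = remaining.pop(0)
        # `remaining` is date-sorted, so later postings never precede `first`.
        partner = next((p for p in remaining
                        if p.amount == -first.amount
                        and (p.date - first.date).days <= window_days), None)
        if partner is None:
            unmatched.append(first)
        else:
            remaining.remove(partner)
            pairs.append((first, partner))
    return pairs, unmatched
```

Because each side is simply a posting to the same account, importing the same statement twice would show up as an unmatched leftover rather than as a silently duplicated transfer.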

--
Zhuoyun Wei
