Issue #257: Create a section classifier for new transactions (blais/beancount)


Martin Blais

Mar 26, 2018, 4:39:48 PM
to bean...@googlegroups.com
New issue 257: Create a section classifier for new transactions
https://bitbucket.org/blais/beancount/issues/257/create-a-section-classifier-for-new

Martin Blais:

There's nothing like that in the Beancount codebase. I've thought about building something to automatically insert imported transactions in the right "section" (I personally use org-mode, where each section corresponds to an institution and its related group of accounts) but it's unclear whether that would generalize.

I think you could turn this into a simple classification problem. Given some syntax for splitting up an input file into sections (e.g., some regular expression matching on a title or separator), you now have groups of transactions and inputs. Somehow reduce this to a simple model for classifying which section an incoming transaction matches with highest probability and insert it there. Or more appropriately - since transactions are imported in groups - find the section that best matches all the transactions in the imported files and insert at the end there.


On Sun, Mar 11, 2018 at 12:28 PM, Michael Droogleever <droo...@gmail.com> wrote:
I believe it is against the design of beancount, but is there any existing code which attempts to add transactions to an existing beancount file? Assuming the entries in the file are grouped by asset account, it would need to append the entry to the subsection that holds the entries from the same account.

Responsible: blais

Martin Blais

Mar 27, 2018, 10:23:15 PM
to Beancount
I could promote the custom entry as a native one.
Let's call it "marker".
Would that be useful?


On Tue, Mar 27, 2018 at 2:04 AM, <dominik.aumayr...@gmail.com> wrote:
Fava has such a feature (and code), see the "insert-entry"-option here: https://fava.pythonanywhere.com/example-with-budgets/help/options/

Basically it works like this: You can add custom "insert-entry" entries with a RegEx, and when adding a transaction, the position where it should go in the file is determined by those custom entries.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/4694718a-6afc-4346-80de-c7c78bd87de6%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Dominik Aumayr

Mar 30, 2018, 8:22:27 AM
to keesjochem via Beancount
I think that could be very useful. I think there needs to be some discussion about the syntax before you start implementing.

Would it look like this?

```beancount
2018-03-30 * "Hooli Shop" "Buy hoodie"
  Expenses:Fashion    10.00 USD
  Assets:Cash

marker Assets:US:*

2018-03-31 * "Walgreens" "Groceries"
  Expenses:Groceries    45.00 USD
  Assets:US:BoA:Checking
```



Martin Blais

Mar 31, 2018, 12:00:04 AM
to Beancount

Hmm, the few non-directive keywords I have at the moment don't have proper metadata (e.g. line number).
I was thinking of making it a directive instead.
e.g.

2018-03-30 marker "...any string..."

The date has no meaning (but if you can think of a useful meaning I can add one).

Also, the account name that's there, how is it used? Would it not be more general to make it a general string?


 


Dominik Aumayr

Mar 31, 2018, 2:21:25 AM
to bean...@googlegroups.com
> Hmm, the few non-directive keywords I have at the moment don't have proper metadata (e.g. line number).
> I was thinking of making it a directive instead.
> The date has no meaning (but if you can think of a useful meaning I can add one.)

If at all possible, I would avoid the date, as I cannot come up with a useful meaning. Would it be possible to add metadata (line number) to the non-directive keywords?

> Also, the account name that's there, how is it used? Would it not be more general to make it a general string?

Yes, a string representing a wildcard-style pattern, e.g. "Assets:US:*", or even a string representing a RegEx (as it does in Fava).
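To make the difference between the two options concrete, here is a small Python sketch using only the standard library. `marker_matches` is a hypothetical helper, not part of Beancount or Fava, and it assumes a regular expression is matched from the start of the account name (`re.match`). Note that the same string means different things under the two interpretations.

```python
import fnmatch
import re


def marker_matches(pattern, account, regex=False):
    """Return True if `account` matches `pattern`.

    With regex=False the pattern is a shell-style wildcard ("Assets:US:*");
    with regex=True it is a regular expression anchored at the start.
    """
    if regex:
        return re.match(pattern, account) is not None
    return fnmatch.fnmatchcase(account, pattern)


account = "Assets:US:BoA:Checking"

# Wildcard style: "*" stands for any run of characters.
print(marker_matches("Assets:US:*", account))

# Regex style: the equivalent pattern needs ".*".
print(marker_matches(r"Assets:US:.*", account, regex=True))

# Read as a regex, "Assets:US:*" means "Assets:US" followed by zero or
# more colons, so it also matches the bare parent account.
print(marker_matches("Assets:US:*", "Assets:US", regex=True))
print(marker_matches("Assets:US:*", "Assets:US"))
```

If wildcards were chosen for the syntax, `fnmatch.translate` could convert them to regular expressions internally, so both styles could share one matching path.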

Patrick Ruckstuhl

Mar 31, 2018, 2:24:55 AM
to Beancount
I think the date might actually have a meaning. E.g., if you split files by year, you might have one marker in each file, and the date of the transaction would decide which of them it goes to.

Martin Blais

Mar 31, 2018, 12:41:45 PM
to Beancount
That's probably good enough. So the unique key for a marker would be (year, string).

The other design option would be for the non-dated directive to produce some sort of a map, and fail if the markers aren't unique.

Finally, there's always the option of creating a markerless method (the classification I suggested earlier).
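A minimal Python sketch of the first two options, to make the keying concrete. It assumes markers have already been parsed into `(date, string, filename, lineno)` tuples; that tuple layout and the names `build_marker_index`, `find_marker` and `DuplicateMarkerError` are hypothetical, since no marker directive exists yet.

```python
import datetime


class DuplicateMarkerError(ValueError):
    """Raised when two markers share the same (year, string) key."""


def build_marker_index(markers):
    """Map (year, string) -> (filename, lineno), failing on duplicate keys."""
    index = {}
    for date, string, filename, lineno in markers:
        key = (date.year, string)
        if key in index:
            raise DuplicateMarkerError(
                f"duplicate marker {key!r} at {filename}:{lineno}"
            )
        index[key] = (filename, lineno)
    return index


def find_marker(index, txn_date, string):
    """Return the marker location for a transaction date, or None."""
    return index.get((txn_date.year, string))


markers = [
    (datetime.date(2017, 1, 1), "Assets:US:*", "2017.beancount", 12),
    (datetime.date(2018, 1, 1), "Assets:US:*", "2018.beancount", 8),
]
index = build_marker_index(markers)
print(find_marker(index, datetime.date(2018, 3, 31), "Assets:US:*"))
# ('2018.beancount', 8)
```

For the non-dated variant, the key would be the string alone and the same uniqueness check would apply.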




Patrick Ruckstuhl

Apr 3, 2018, 9:14:22 AM
to bean...@googlegroups.com
I think the date might actually have a meaning. E.g., if you split files by year, you might have one marker in each file, and the date of the transaction would decide which of them it goes to.

Jakob Schnitzer

Apr 3, 2018, 9:14:22 AM
to bean...@googlegroups.com
On Fri, Mar 30, 2018 at 11:59:40PM -0400, Martin Blais wrote:
>Hmm, the few non-directive keywords I have at the moment don't have proper
>metadata (e.g. line number).
>I was thinking of making it a directive instead.
>e.g.
>
>2018-03-30 marker "...any string..."
>
>The date has no meaning (but if you can think of a useful meaning I can add
>one.)
>
>Also, the account name that's there, how is it used? Would it not be more
>general to make it a general string?

Why not just have them as options? Since the parser, which validates the
options, has access to the line number, it shouldn't be hard to store
the line numbers.

The dates are currently used (see below), but I think it would be better
without them: the current mechanism really only works well for the
"insert the latest entries at the end of a section" scenario. Inserting
entries dated earlier than some already in the section would create an
unordered mess.

This is the documentation for the current 'insert-entry' fava-option.

>This option can be used to specify where entries are inserted. The
>argument to this option should be a regular expression matching account
>names. This option can be given multiple times. When adding an entry,
>the account of the entry (for a transaction, the account of the last
>posting is used) is matched against all insert-entry options and the
>entry will be inserted before the datewise latest of the matching
>options. If the entry is a Transaction and no insert-entry option
>matches the account of the last posting the account of the second to
>last posting and so on will be tried. If no insert-entry option matches
>or none is given, the entry will be inserted at the end of the main
>file.
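The lookup described in that documentation can be sketched in a few lines of Python. This only illustrates the documented behaviour and is not Fava's actual code: `InsertOption` and `find_insert_position` are hypothetical names, and the sketch assumes the regular expression is matched from the start of the account name.

```python
import datetime
import re
from typing import NamedTuple, Optional, Sequence


class InsertOption(NamedTuple):
    """One insert-entry option: its date, regex and position in a file."""
    date: datetime.date
    pattern: str
    filename: str
    lineno: int


def find_insert_position(
    posting_accounts: Sequence[str], options: Sequence[InsertOption]
) -> Optional[InsertOption]:
    """Return the option to insert before, or None for "end of main file".

    Accounts are tried from the last posting backwards; for the first
    account with any matching options, the datewise latest one wins.
    """
    for account in reversed(posting_accounts):
        matching = [o for o in options if re.match(o.pattern, account)]
        if matching:
            return max(matching, key=lambda o: o.date)
    return None


options = [
    InsertOption(datetime.date(2017, 1, 1), r"Assets:US:.*", "main.beancount", 10),
    InsertOption(datetime.date(2018, 1, 1), r"Assets:US:.*", "main.beancount", 50),
    InsertOption(datetime.date(2018, 1, 1), r"Expenses:.*", "main.beancount", 80),
]
target = find_insert_position(
    ["Expenses:Groceries", "Assets:US:BoA:Checking"], options
)
print(target.lineno)  # 50
```

Because the date of the entry itself plays no part in the lookup, an older entry lands before the same option as a newer one, which is the ordering problem mentioned above.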
