Are there existing_entries for plugins, like with importers?


John Koala

Sep 12, 2022, 7:11:39 AM
to Beancount
Hi,

Yes, sorry, in the context of V2 still I'm afraid...

Perhaps you already know of a "fuzzy string matcher" for transaction narrations/payees?

I didn't have much luck with "smart_importer" and decided the scipy/numpy/etc. dependencies were a PITA, so I am (or was) thinking of knocking up a plugin to complete my imported transactions.

Is a plugin the correct idea?

I noted that the importer is provided with an `existing_entries` list of transactions, which seems a very useful suite of items to match against.  But can I reach that from the plugin?

Where/how? And is that even a good idea? (It's not going to re-read the entire history for every imported transaction, is it? Hmm, I'd tolerate that nonetheless :-)) I'm assuming a `beancount.loader.load_file` inside the plugin would create some recursive silliness?

Thanks.

Daniele Nicolodi

Sep 13, 2022, 6:43:25 PM
to bean...@googlegroups.com
On 12/09/2022 11:54, John Koala wrote:
> Hi,
>
> Yes, sorry, in the context of V2 still I'm afraid...
>
> Perhaps you already know of a "fuzzy string matcher" for transaction
> narrations/payees?

I am not sure I grok the question: how could a fuzzy string matcher be
specialized for transaction narrations or payees?

> I didn't have much luck with "smart_importer" and decided the
> scipy/numpy/etc dependency was a PITA so am (or was) thinking to knock
> up a plugin to complete my imported transactions.

Fuzzy matching strings is not all there is to writing a machine learning
classifier. I think that 'pip install scikit-learn' is immensely easier
than rolling your own algorithms.

Maybe if you provide more details on how smart_importer does not work
for you, someone can help you in making it work.

> Is a plugin the correct idea?

I don't think so.

A plugin operates on the transactions read from a ledger after booking
(the process by which all the postings in all the transactions are
balanced, padding amounts are calculated, lots are computed, etc.).
The transactions processed in this phase already need to have all
postings completed.

Also, a plugin has no way to serialize the completed transactions back
into a ledger. Unless you hack something together, your plugin would run
every time you load your ledger and would have to do its job again. This
would make fixing any mistakes the automatic categorization algorithm
makes rather cumbersome.

Why do you think a plugin is a better approach?

> I noted that the importer is provided with an `existing_entries` list of
> transactions, which seems a very useful suite of items to match
> against.  But can I reach that from the plugin?

That what? A plugin has access to all the transactions in the ledger on
which Beancount is operating. In this context there is no notion of
another ledger to which a batch of transactions will be added.

> Where/how? and is that even a good idea?  (its not going to re-read the
> entire history for every imported transaction is it? Hmm, I'd tolerate
> that nonetheless :-))  I'm assuming a `beancount.loader.load_file`
> inside the plugin would create some recursive sillyness?

The Beancount parser and loader are capable of loading more than one
file in the same process. However, there is no protection against a
plugin that recursively tries to load the Beancount ledger from which it
has been invoked. If you want to try, the ledger filename is available as
the "filename" entry in the "options_map" passed to the plugin entry point.

The way I would approach this, if you want a solution independent from
the import framework, is to use beancount.parser.parse_file() to parse
the transactions from a ledger, use the technique you like the most to
complete or rewrite the transactions, and write them back with
beancount.parser.printer.print_entries().

Cheers,
Dan

John Koala

Sep 15, 2022, 7:24:26 AM
to Beancount
Thanks, Dan, for answering.

WRT the question in the subject line:
```
from pathlib import Path
from beancount import loader

filename = Path(__file__).parent.parent / "master.beancount"
entries, errors, options = loader.load_file(filename)
```
gives me all the entries, whereas `parser.parse_file` does not.
Dunno why. It was a bit of a rabbit hole looking down parse_file.
Maybe it's the includes. Whatever. Going with load_file for now.

FWIW plugins:

Attempting it did indeed lead to a recursive loop to nowhere. 
Didn't think it a better approach; just not clear what is for what.

So, it seems I am looking for a script. In the past I've built scripts which generate Beancount-compliant text, but I am ready to start using the Beancount library, if only I could find my way in.
The HTML docs certainly reduce "friction" compared with the Google Docs; oddly, it seems so minor, but there it is.
The ecosystem is all so... um... googly :-/ no offense intended.

WRT smart_importer:

There was some trauma due to a bug in scipy:
https://github.com/beancount/smart_importer/issues/116
Got past that, got it all to run and... nothing.  :shrugs:  Life is too short. 
That alone sucked up the entire afternoon I had set aside to "get the accounts done".

WRT ML vs "fuzzy"

For a csv/json import from a bank, the text inputs of "payee" and "narration" are all it has to go on. 
OK, maybe the ML can glean a little extra weight from the value, but when it won't install...
'overkill' comes to mind.
Previously I've done a regex thing on the payee and that worked well enough for my purposes, but I got a new bank (new importer) and ideas!
Fuzzy matching the strings with old entries seemed like a natural/incremental progression over regex.
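The fuzzy-matching idea can be prototyped with the standard library alone. A minimal sketch, assuming the history is available as (description, account) pairs: it uses `difflib.get_close_matches`, a plain string-similarity ratio rather than machine learning. `normalize`, `build_index` and `guess_account` are hypothetical helpers, and the 0.6 cutoff is difflib's default, not a tuned value.

```python
import difflib
import re
from typing import Dict, Iterable, Optional, Tuple


def normalize(text: str) -> str:
    """Lower-case and drop digits/punctuation so card numbers and dates don't dominate."""
    text = re.sub(r"[^a-z ]+", " ", text.lower())
    return " ".join(text.split())


def build_index(history: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map normalized payee/narration text to an account; later pairs win."""
    return {normalize(text): account for text, account in history}


def guess_account(text: str, index: Dict[str, str], cutoff: float = 0.6) -> Optional[str]:
    """Return the account of the closest known description, or None if none is close."""
    matches = difflib.get_close_matches(normalize(text), list(index), n=1, cutoff=cutoff)
    return index[matches[0]] if matches else None
```

In a v2 importer, `history` could be built from `existing_entries` by pairing each `Transaction`'s payee (or narration) with the account of its non-bank posting; that wiring is left out here. Raising `cutoff` trades fewer suggestions for fewer wrong ones.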

