On 12/09/2022 11:54, John Koala wrote:
> Hi,
>
> Yes, sorry, in the context of V2 still I'm afraid...
>
> Perhaps you already know of a "fuzzy string matcher" for transaction
> narrations/payees?
I am not sure I grok the question: how could a fuzzy string matcher be
specialized for transaction narrations or payees?
> I didn't have much luck with "smart_importer" and decided the
> scipy/numpy/etc dependency was a PITA so am (or was) thinking to knock
> up a plugin to complete my imported transactions.
Fuzzy string matching is not all there is to writing a machine learning
classifier. I think that 'pip install scikit-learn' is immensely easier
than rolling your own algorithms.
Maybe if you provide more details on how smart_importer does not work
for you, someone can help you in making it work.
> Is a plugin the correct idea?
I don't think so.
A plugin operates on the transactions read from a ledger after Beancount
has booked them (booking is the process in which all the postings in all
the transactions are balanced, padding amounts are calculated, lots are
computed, etc.). The transactions processed in this phase already need
to have all postings completed.
Also, a plugin does not have a way to serialize the completed
transactions into a ledger. Unless you hack something together, your
plugin would run every time you load your ledger and would have to do
its job again. This would make fixing any mistake made by the automatic
categorization algorithm rather cumbersome.
Why do you think a plugin is a better approach?
> I noted that the importer is provided with an `existing_entries` list of
> transactions, which seems a very useful suite of items to match
> against. But can I reach that from the plugin?
That what? A plugin has access to all the transactions in the ledger on
which Beancount is operating. In this context there is no notion of
another ledger to which a batch of transactions will be added.
> Where/how? and is that even a good idea? (its not going to re-read the
> entire history for every imported transaction is it? Hmm, I'd tolerate
> that nonetheless :-)) I'm assuming a `beancount.loader.load_file`
> inside the plugin would create some recursive sillyness?
The beancount parser and loader are capable of loading more than one
file in the same process. However, there is no protection against a
plugin that recursively tries to load the Beancount ledger from which it
has been invoked. If you want to try, the ledger filename is available
as the "filename" entry of the "options_map" passed to the plugin entry
point.
The way I would approach this, if you want a solution independent from
the import framework, is to use beancount.parser.parser.parse_file() to
parse the transactions from a ledger, use the technique you like the
most to complete or rewrite the transactions, and write them back with
beancount.parser.printer.print_entries().
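A rough sketch of that parse, complete, print loop, assuming Beancount
v2. The helper names guess_account() and complete_ledger(), the "known"
narration-to-account mapping, and the use of difflib from the standard
library as the matching technique are all my assumptions for the
example; swap in whatever technique you prefer.

```python
import difflib
import sys


def guess_account(narration, known, cutoff=0.6):
    """Return the account of the closest known narration, or None."""
    matches = difflib.get_close_matches(narration, list(known), n=1, cutoff=cutoff)
    return known[matches[0]] if matches else None


def complete_ledger(path, known, out=sys.stdout):
    """Parse `path`, add a guessed posting where possible, print the result."""
    # Imported here so guess_account() can be tried without Beancount.
    from beancount.core import data
    from beancount.parser import parser, printer

    # parse_file() only parses: no booking and no plugins are run.
    entries, errors, _options_map = parser.parse_file(path)
    completed = []
    for entry in entries:
        if isinstance(entry, data.Transaction) and len(entry.postings) == 1:
            account = guess_account(entry.narration or "", known)
            if account is not None:
                # Posting without units: the amount is left for booking
                # to interpolate when the ledger is loaded.
                extra = data.Posting(account, None, None, None, None, None)
                entry = entry._replace(postings=entry.postings + [extra])
        completed.append(entry)
    printer.print_entries(completed, file=out)
    return errors
```

Because the output is written to a file once, any wrong guess can be
fixed by hand in the resulting ledger and stays fixed.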
Cheers,
Dan