According to examples/import.py, hooks take two parameters:
    Args:
      extracted_entries_list: A list of (filename, entries) pairs, where
        'entries' are the directives extract from 'filename'.
      ledger_entries: If provided, a list of directives from the existing
        ledger of the user. This is non-None if the user provided their
        ledger file as an option.
    Returns:
      A possibly different version of extracted_entries_list, a list of
      (filename, entries), to be printed.
But "ledger_entries" is never None -- that is, it is non-None even if a user didn't provide their ledger file as an option.
This is because of the deduplication logic in _extract in __init__.py, which extends the existing entries with the newly extracted ones before calling the hooks:
    # Deduplicate.
    for filename, entries, account, importer in extracted:
        importer.deduplicate(entries, existing_entries)
        existing_entries.extend(entries)

    # Invoke hooks.
    for func in ctx.hooks:
        extracted = func(extracted, existing_entries)
This was somewhat surprising to me, especially since it contradicts the quasi-documentation comment I quoted: I wouldn't expect (or want) the newly imported entries to be merged into the existing entries before the hooks have run.
Is this intended? Is there another easy way to get the pristine set of entries from within a hook short of just running beancount.loader myself?
My actual use case: I want to know the most recent Balance statement of an account in the ledger, which I am using as a proxy for "last imported date". But the most recent Balance statement I actually find will be the one auto-generated by beangulp and then merged into existing_entries.
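For what it's worth, the only workaround I can think of is to filter the freshly extracted entries back out of the second argument inside the hook. Below is a rough sketch of that idea, not something taken from beangulp itself. It assumes the objects appended by existing_entries.extend(entries) are the very same objects the hook receives in its first argument, so they can be matched by identity. The helper names pristine_entries and last_balance_date are made up, and Balance/Transaction are namedtuple stand-ins for the beancount.core.data types so that the snippet runs on its own.

```python
import datetime
from collections import namedtuple

# Stand-ins for beancount.core.data.Balance / Transaction (assumption:
# only the .date and .account attributes matter for this sketch).
Balance = namedtuple("Balance", "date account")
Transaction = namedtuple("Transaction", "date narration")


def pristine_entries(extracted, existing_entries):
    """Return the entries of existing_entries that were not just extracted.

    Each item of 'extracted' is a tuple whose second element is the list
    of extracted entries, which fits both the (filename, entries) pairs
    from the docstring and the 4-tuples seen in _extract. Entries are
    matched by identity, not equality, so a ledger entry that merely
    looks like an extracted one is kept.
    """
    new_ids = {id(entry) for item in extracted for entry in item[1]}
    return [entry for entry in existing_entries if id(entry) not in new_ids]


def last_balance_date(extracted, existing_entries, account):
    """Date of the latest pre-existing Balance for 'account', or None."""
    dates = [
        entry.date
        for entry in pristine_entries(extracted, existing_entries)
        if isinstance(entry, Balance) and entry.account == account
    ]
    return max(dates, default=None)
```

Inside a real hook this would be called as last_balance_date(extracted_entries_list, ledger_entries, "Assets:Bank"), with the isinstance check done against beancount.core.data.Balance instead of the stand-in. It still feels like a hack compared to simply receiving the untouched ledger entries.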
Cheers,
Justus