Announcing beancount-import: powerful, semi-automatic transaction import

Jeremy Maitin-Shepard

Oct 4, 2018, 7:52:57 PM
to bean...@googlegroups.com
I'd like to announce beancount-import, a tool for semi-automatically importing transactions from external data sources, with support for merging and reconciling imported transactions with each other and with existing transactions in the beancount journal. The UI is web-based.

You can find detailed information, setup and configuration instructions, and examples to run on test data at:

This tool differs from the existing transaction import functionality in beancount in several important ways:
 - it includes metadata in imported entries that allow it to reliably associate entries in the journal with external data, so that you don't need to manually track what has already been imported.
 - there is automatic prediction of unknown accounts (currently based on a learned decision tree classifier)
 - rather than operating in a purely append mode, there is a powerful matching mechanism that can propose matches between existing transactions and new imported transactions, which handles transfers between accounts, manually entered transactions, and many other cases.  Matches to existing transactions are handled by editing the journal in place to add any additional postings/metadata, which is accomplished safely using the journal_editor module that is part of beancount-import, but which is also useful independently.
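
To make the matching idea concrete, here is a minimal sketch of how two independently imported halves of a transfer could be proposed as one merged transaction. It is not beancount-import's actual algorithm: the Posting and Txn classes, the propose_merge function, and the 5-day window are all hypothetical stand-ins chosen for illustration, and it only handles a single currency.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional

FIXME = "Expenses:FIXME"  # placeholder account for an unknown counterparty


@dataclass(frozen=True)
class Posting:
    """Simplified stand-in for a beancount posting (hypothetical)."""
    account: str
    amount: Decimal


@dataclass(frozen=True)
class Txn:
    """Simplified stand-in for a beancount transaction (hypothetical)."""
    day: date
    postings: List[Posting]


def propose_merge(a: Txn, b: Txn, max_days: int = 5) -> Optional[Txn]:
    """Return a merged transaction if a and b look like two halves of one transfer."""
    if abs((a.day - b.day).days) > max_days:
        return None
    known_a = [p for p in a.postings if p.account != FIXME]
    known_b = [p for p in b.postings if p.account != FIXME]
    unknown_a = sorted(p.amount for p in a.postings if p.account == FIXME)
    unknown_b = sorted(p.amount for p in b.postings if p.account == FIXME)
    # Each unknown posting on one side must be explained by a known posting
    # of the same amount on the other side.
    if unknown_a != sorted(p.amount for p in known_b):
        return None
    if unknown_b != sorted(p.amount for p in known_a):
        return None
    return Txn(min(a.day, b.day), known_a + known_b)


# The same transfer as seen from the checking and the savings export.
checking = Txn(date(2018, 10, 1), [Posting("Assets:Checking", Decimal("-100.00")),
                                  Posting(FIXME, Decimal("100.00"))])
savings = Txn(date(2018, 10, 2), [Posting("Assets:Savings", Decimal("100.00")),
                                 Posting(FIXME, Decimal("-100.00"))])
merged = propose_merge(checking, savings)
```

Here merged contains only the two known postings (Assets:Checking and Assets:Savings), which balance to zero, so neither placeholder posting survives the merge.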

There is existing, well-tested support for a variety of data sources, including OFX files (checking/savings/credit card/investment/retirement accounts), downloaded Mint.com transactions, Venmo, Amazon.com order invoices, and others.  There is also a clean API for defining new data sources.

I'm also releasing the related package finance-dl, available at https://github.com/jbms/finance-dl, which is useful for automatically downloading data that can then be fed into beancount-import.  This tool currently supports a number of services including the OFX protocol, Mint.com, Amazon.com, and Venmo, and can also be extended to support other websites.

This is a completely rewritten successor to the original, much more limited beancount-import tool that I released several years ago.  I've been developing this tool over the past several years, and have been successfully using it to maintain my beancount journal containing many thousands of transactions.  I finally got around to cleaning it up and polishing it for release so that it may be useful to others.

I know that there is some overlap with other transaction import tools being developed.  I'm very open to finding ways that we can collaborate/combine our efforts.

Regarding the name, if there is strong opposition to using beancount in the name, I'm happy to change it to something else.

Martin Blais

Oct 4, 2018, 9:27:01 PM
to Beancount
This looks great! :-)
Thank you for sharing your workflow with everyone.
(Link added to the contrib doc.)



--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/CAKJfoCE_9_kEzfh75p9i76AzC9iThCEkeD%2BnWVR_qjDVtZ7fTQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jeremy Maitin-Shepard

Oct 5, 2018, 1:37:41 AM
to bean...@googlegroups.com
A key question is whether it would be possible to unify the beancount-import concept of a source with the beancount concept of an ingester.

The main differences are:
- beancount-import expects the source to generate full transactions, rather than partial transactions, but may specify some accounts as Expenses:FIXME to indicate an unknown account.  Beancount ingesters generate partial transactions with missing postings if accounts are unknown.
- beancount-import additionally expects data sources to:
 1. Automatically avoid importing already-imported transactions in a reliable, non-heuristic way.
 2. Optionally indicate which postings are considered cleared for a set of accounts for which the source declares itself authoritative.
 3. Optionally specify how to extract features from postings/transactions for automatic prediction of unknown accounts.
All of these extras are implemented by adding source-specific metadata to the generated transactions.

It is easy to convert a partial transaction into a full transaction as expected by beancount-import by adding an extra Expenses:FIXME posting with the residual.  The extra metadata required cannot be generated automatically, though.
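
As a rough illustration of that conversion, the sketch below balances a partial transaction by appending a placeholder posting that carries the negated residual. It is only a sketch under simplifying assumptions: postings are modeled as plain (account, amount) tuples in a single currency rather than real beancount objects, and complete_with_fixme is a hypothetical helper name.

```python
from decimal import Decimal
from typing import List, Tuple

Posting = Tuple[str, Decimal]  # (account, amount); single currency for simplicity


def complete_with_fixme(postings: List[Posting],
                        fixme_account: str = "Expenses:FIXME") -> List[Posting]:
    """Append a placeholder posting so that the postings sum to zero."""
    residual = sum((amount for _, amount in postings), Decimal("0"))
    if residual == 0:
        return list(postings)  # already balanced, nothing to add
    return list(postings) + [(fixme_account, -residual)]


# A partial transaction as an ingester might produce it: one known posting only.
partial = [("Liabilities:CreditCard", Decimal("-42.17"))]
full = complete_with_fixme(partial)
```

After this call, full holds the original credit card posting plus an Expenses:FIXME posting of 42.17, and calling the helper again on full leaves it unchanged.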

Justus Pendleton

Oct 6, 2018, 4:45:08 AM
to Beancount
On Friday, October 5, 2018 at 6:52:57 AM UTC+7, Jeremy Maitin-Shepard wrote:
 - it includes metadata in imported entries that allow it to reliably associate entries in the journal with external data, so that you don't need to manually track what has already been imported.

Can you give a pointer to the code for how this works? Is it something that individual Source implementations can override or control?

I ask because I have a YNAB4 importer[1] which uses metadata on each transaction to achieve a similar result. In YNAB, each transaction has a UUID, so by shoving that in beancount metadata, I can keep track of what has already been imported.

I'm vaguely interested in rewriting the YNAB importer to use your framework -- since yours has lots of other nice things -- but I don't really want to have to deal with re-confirming years' worth of already imported YNAB transactions. So ideally I could have a YNAB Source in your framework that uses that UUID. Is something like that possible?

Jeremy Maitin-Shepard

Oct 6, 2018, 6:58:32 AM
to bean...@googlegroups.com
On Sat, Oct 6, 2018 at 1:45 AM Justus Pendleton <just...@gmail.com> wrote:
On Friday, October 5, 2018 at 6:52:57 AM UTC+7, Jeremy Maitin-Shepard wrote:
 - it includes metadata in imported entries that allow it to reliably associate entries in the journal with external data, so that you don't need to manually track what has already been imported.

Can you give a pointer to the code for how this works? Is it something that individual Source implementations can override or control?

Yes, this is entirely controlled by the individual Source.


I ask because I have a YNAB4 importer[1] which uses metadata on each transaction to achieve a similar result. In YNAB, each transaction has a UUID, so by shoving that in beancount metadata, I can keep track of what has already been imported.

I'm vaguely interested in rewriting the YNAB importer to use your framework -- since yours has lots of other nice things -- but I don't really want to have to deal with re-confirming years' worth of already imported YNAB transactions. So ideally I could have a YNAB Source in your framework that uses that UUID. Is something like that possible?


To define a YNAB source, you could extend the Source class in

In the __init__ method you would load the YNAB database into memory.

You would define a method named prepare that is given a JournalEditor object. That object has an entries property, which holds the list of directives loaded from the journal, and an accounts property, which is a dict that maps each account name (str) to the corresponding account Open object.

This method would basically do everything that your existing code already does, and it would be pretty easy to adapt it:
 - use the accounts dict provided to find which accounts have a ynab_name metadata field, and build your account_mapping dict;
 - build up your previous_imports list (which you might want to change to a set for efficiency) by iterating through the entries in the journal;
 - add any relevant warnings or errors to the SourceResults object provided;
 - add any new transactions to the SourceResults object as well.
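
The overall shape of such a prepare step might look like the sketch below. It deliberately avoids the real beancount and beancount-import classes: Txn and Results are simplified stand-ins, and the ynab_id metadata key and the "id"/"memo" row fields are assumptions made up for the example, not names taken from YNAB or from beancount-import.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Txn:
    """Stand-in for a beancount Transaction; only the fields used here."""
    narration: str
    meta: Dict[str, str] = field(default_factory=dict)


@dataclass
class Results:
    """Stand-in for the results object that the source fills in."""
    pending: List[Txn] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def prepare(journal_entries: List[Txn], ynab_rows: List[Dict[str, str]],
            results: Results, key: str = "ynab_id") -> None:
    """Queue every YNAB row whose UUID is not yet recorded in the journal."""
    # Collect the UUIDs that already appear as metadata in the journal.
    previous_imports: Set[str] = {
        e.meta[key] for e in journal_entries if key in e.meta
    }
    for row in ynab_rows:
        row_id = row.get("id")
        if not row_id:
            results.warnings.append(f"row without id: {row.get('memo', '')!r}")
            continue
        if row_id in previous_imports:
            continue  # already imported, skip without any heuristics
        results.pending.append(Txn(row.get("memo", ""), {key: row_id}))
        previous_imports.add(row_id)  # guard against duplicates within the export


journal = [Txn("Coffee", {"ynab_id": "a1"})]
rows = [{"id": "a1", "memo": "Coffee"},
        {"id": "b2", "memo": "Lunch"},
        {"memo": "no id"}]
results = Results()
prepare(journal, rows, results)
```

With this input, only the "Lunch" row ends up in results.pending, the "Coffee" row is skipped because its UUID is already in the journal, and the row without an id produces one warning.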

If you wanted to avoid having to manually specify the ynab_name to beancount account mapping, you could instead create the postings with Expenses:FIXME as the account, and with a ynab_account_name metadata field on each posting indicating the ynab account name, and set:

self.example_posting_key_extractors['ynab_account_name'] = None

in the __init__ method of your Source.

Then the account names would be predicted automatically, but could be manually overridden.  (It seems like what you have already is more convenient, though.)

The source that behaves closest to what you want to do is the amazon source in beancount_import/source/amazon.py. 

One thing I'm unclear on is what sort of workflow you want to have.

From the documentation of your beancount-ynab project, it sounds like you manually enter transactions into YNAB, and then periodically import them automatically into beancount.

With beancount-import, you would still manually confirm each transaction as it is being imported, as that lets you choose whether to import it as a new transaction or accept a proposed match that merges it with other transactions, and also lets you specify/confirm any unknown accounts, edit the narration, etc.  Even in cases where you are just confirming the transaction and there are no unknown accounts that were predicted, I still find it useful to confirm the transaction since that lets me know about it (as it is coming from an external source, so I haven't seen it before).

In your case, though, it sounds like you've already manually entered the transactions in YNAB, so you don't necessarily want to manually confirm or edit each transaction when importing into beancount.

Do you always import into YNAB manually, or do you somehow import external data into YNAB?

 


Justus Pendleton

Oct 6, 2018, 10:29:38 PM
to Beancount

On Saturday, October 6, 2018 at 5:58:32 PM UTC+7, Jeremy Maitin-Shepard wrote:
One thing I'm unclear on is what sort of workflow you want to have.

From the documentation of your beancount-ynab project, it sounds like you manually enter transactions into YNAB, and then periodically import them automatically into beancount.
[...]
In your case, though, it sounds like you've already manually entered the transactions in YNAB, so you don't necessarily want to manually confirm or edit each transaction when importing into beancount.

Do you always import into YNAB manually, or do you somehow import external data into YNAB?

This is a fair point :)

I do always manually input all YNAB data. It has the ability to import CSV/OFX/QIF but I've never used it because I don't live in the US and I'm not sure any of the banks I use offer those :). I also live in a very cash-based society, so ~90% of my transactions are in cash and have to be entered manually anyway.

I was thinking it would be nicer to not have to manually specify the account mapping and let beancount-import's magic work instead. But maybe that's not a big enough improvement to be worth it -- especially since what I have now is already working. Maybe I'll have a play with it anyway and see how it turns out.....

Martin Blais

Oct 9, 2018, 7:31:24 PM
to Beancount
On Sat, Oct 6, 2018 at 4:45 AM Justus Pendleton <just...@gmail.com> wrote:
On Friday, October 5, 2018 at 6:52:57 AM UTC+7, Jeremy Maitin-Shepard wrote:
 - it includes metadata in imported entries that allow it to reliably associate entries in the journal with external data, so that you don't need to manually track what has already been imported.

Can you give a pointer to the code for how this works? Is it something that individual Source implementations can override or control?

I ask because I have a YNAB4 importer[1] which uses metadata on each transaction to achieve a similar result. In YNAB, each transaction has a UUID, so by shoving that in beancount metadata, I can keep track of what has already been imported.

BTW you should consider using a ^link for that; they were designed for that purpose.
When there's a unique transaction id available from the files I import, I always add them as a link.


I'm vaguely interested in rewriting the YNAB importer to use your framework -- since yours has lots of other nice things -- but I don't really want to have to deal with re-confirming years' worth of already imported YNAB transactions. So ideally I could have a YNAB Source in your framework that uses that UUID. Is something like that possible?


Jeremy Maitin-Shepard

Oct 9, 2018, 11:50:52 PM
to bean...@googlegroups.com
I guess a link has the advantage that it would support multiple UUIDs, if you for some reason wanted to combine multiple YNAB transactions into a single beancount transaction.  However, the limited character set supported by links/tags can make it a bit challenging to encode the relevant information.  In this case, though, it seems like ynab.xxxxx would work fine.

In beancount-import, the existing data sources currently use transaction or posting metadata fields for all of that type of information.  It is normally recorded on a per-posting basis -- only some specialized sources, like the amazon one, use transaction metadata.

If per-posting links were supported, that would solve part of the problem, though it would still be complicated/inconvenient to encode the information in the link character set.  Using metadata fields also conveniently enforces the restriction that there be only a single value.
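
For the simple YNAB case, encoding the UUID as a link could be sketched as follows. The character set in LINK_BODY (letters, digits, and the characters - _ / .) is my understanding of what beancount accepts in links and should be treated as an assumption; ynab_link and the "ynab." prefix are hypothetical names for illustration.

```python
import re
import uuid

# Assumed set of characters allowed after the leading "^" of a link.
LINK_BODY = re.compile(r"[A-Za-z0-9\-_/.]+")


def ynab_link(txn_uuid: str, prefix: str = "ynab.") -> str:
    """Build a link such as ^ynab.<uuid>, rejecting values that do not fit."""
    body = prefix + txn_uuid
    if not LINK_BODY.fullmatch(body):
        raise ValueError(f"cannot be written as a link: {body!r}")
    return "^" + body


example = ynab_link(str(uuid.UUID("12345678-1234-5678-1234-567812345678")))
```

A standard hyphenated UUID passes the check unchanged, while anything containing spaces or other punctuation raises ValueError, which is where a metadata field with a free-form string value is more convenient.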

Justus Pendleton

Oct 10, 2018, 10:16:23 AM
to Beancount
On Wednesday, October 10, 2018 at 6:31:24 AM UTC+7, Martin Blais wrote:
BTW you should consider using a ^link for that; they were designed for that purpose.
When there's a unique transaction id available from the files I import, I always add them as a link.

Could you explain that a bit more? The documentation for links says "You may think of the link as a special kind of tag that can be used to group together a set of financially related transactions over time". Except in this case there is no "group" of transactions; there would only be exactly one transaction with the link. Since there's no group, I guess it didn't naturally occur to me to use them for this. Are there any benefits to using a link instead of metadata? (Ergonomics, easier searching, better UI handling?)

Martin Blais

Oct 14, 2018, 8:29:00 PM
to Beancount
No benefit. Same as tags, but I've been using them differently. I use links to link together multiple related transactions (of which, initially, if imported as suggested here, there would be a unique one for each imported transaction, but you can then link others manually). Tags tend to be used for larger groups of transactions, e.g. a trip, a category.

In the end those could be merged. I've had it in the back of my mind to merge them if I ever write 3.0.

