Best approach to combine multiple data sources

246 views
Skip to first unread message

Nasko Oskov

unread,
Oct 4, 2022, 12:14:12 AM10/4/22
to Beancount
Hey folks,
I'm trying to wrap my mind around the different parts of beancount in order to automate the import of my Amazon purchases. Using their Chase card allows me to have a mapping from a credit card transaction to Amazon order id. From Amazon, I can download a csv file with all items with their order ids.

If I use an importer, then I can get the ofx file to generate transactions, but it doesn't have any other information than the transaction payee and the amount. I decided to write a plugin that takes each transaction and maps the payee to order id, so I can record this as metadata.

The problem I'm hitting is that plugin transforms transactions when they are fed into tools, but does not update my beancount file. I assume this is because the beancount file is source of truth and we want it to be as immutable as possible. Am I correct in my understanding?
Importer seems to be the right place to combine all the bits of data together to generate the transaction correctly from the very get go, but that means I can't use either beancount-import or beancount-reds-importers as easily. 

Any suggestions on how to approach solving this?
Thanks in advance!
Nasko

P.S. Ideally, I'd integrate smart_import so I can train it on the data, since a bunch of transactions are repetitive.

Ben Blount

unread,
Oct 4, 2022, 12:21:36 PM10/4/22
to Beancount
Hey Nasko,
Yes plugins are intended to be ephemeral transformations / validations on the ledger. It reads the ledger file(s) on disk, and applies plugins to the in-memory representation of the ledger.

My recommendation is to think of your ledger as a pipeline. Some stages of it can be ephemeral and regenerated on demand.
One way to achieve this with importing is to use multiple ledger files that are included from the main ledger file. Some of those ledgers could be completely regenerated on demand by an importer, others would be for your manual transactions.
You can also break them down by year, which can be handy for if your ledger gets large and you want to speed up parsing.

The approach that I've seen most often is that the importers are run and permanently merge their new transactions into the ledger. beancount-import has a web-ui for this, while Fava's importer / beangulp use commands to import all transactions from the importer at once.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/5d6ddfbb-5cd5-4994-b17f-fd98a2817c89n%40googlegroups.com.

Red S

unread,
Oct 4, 2022, 3:52:09 PM10/4/22
to Beancount
On Tuesday, October 4, 2022 at 9:21:36 AM UTC-7 b...@bben.us wrote:
The approach that I've seen most often is that the importers are run and permanently merge their new transactions into the ledger. beancount-import has a web-ui for this, while Fava's importer / beangulp use commands to import all transactions from the importer at once.

This article illustrates the above. beancount-reds-imports uses a simple script to append the new lines to the existing ledger.

Red S

unread,
Oct 4, 2022, 4:19:04 PM10/4/22
to Beancount
I'm trying to wrap my mind around the different parts of beancount in order to automate the import of my Amazon purchases. Using their Chase card allows me to have a mapping from a credit card transaction to Amazon order id. From Amazon, I can download a csv file with all items with their order ids.

Here is a slightly different approach that has dependably worked for me for years to import Amazon purchases, and addresses all the issues you brought up (of matching during import):

Solution
1. Credit card transactions are booked to a Zero sum account. Smart importer does this automatically:
2022-09-02 * "AMZN Mktp US*1FASU238B"
  Liabilities:Credit-Cards:MyCard  -18.92 USD
  Assets:Zero-Sum-Accounts:Amazon-Purchases

2. Amazon purchases are booked from the Zero sum account to the expense account. smart_importer will book repeating transactions correctly. Others need to be booked manually:
2022-09-03 * "Cool Stealth Canoe, Black"
  card: "Gift Certificate/Card and Visa - 1234"
  Assets:Zero-Sum-Accounts:Amazon-Purchases  -18.92 USD
  Expenses:Outdoor-Activities:Equipment

3. In most cases, the transactions in the Zero sum account will sum to zero, and you are all done! Optionally, if you use the zero sum plugin,  it will match transactions above in memory, that can help in case you have unmatched, left over transactions.

Pros and Cons
  • With this approach, there is no dependency on the credit card used to include metadata that maps to the Amazon order id. I could even pay with an Amazon gift card (even partially), and everything works out of the box.
  • The approach decouples the payment with the classification of transactions. This way, you are not forced to import and classify purchases each time you download your credit card transactions (which is fully automated for me, and takes like 5 seconds, which means I tend to do it frequently). Your pending amazon purchase details are neatly booked into the Assets account as a negative balance, and thus also reflect your state correctly.
  • Of course, the decoupling means you can't see the expense account on the payment transaction directly. I find that this doesn't matter to me at all. If you find that it does (I'd be curious as to why), it's pretty simple to solve by adding a trivial beancount-link feature to the zerosum plugin to link the two transactions. Fava will then display a link.

Implementation
I use Beancount's built in csv importer, which suffices, along with smart_importer, both seen in the config below:

CONFIG = [
    # GDPR Download
    # limitations: doesn't include coupons and discounts
    apply_hooks(csv.Importer({Col.DATE: 'Order Date',
                  Col.NARRATION1: 'Product Name',
                  Col.AMOUNT: 'Total Owed',
                  Col.LAST4: 'Payment Instrument Type',
                  },
                  'Assets:Zero-Sum-Accounts:Amazon-Purchases',
                  'USD',

                  ("Website", "Order ID", "Order Date", "Purchase Order Number", "Currency", "Unit Price",  "Unit Price Tax",
                   "Shipping Charge", "Total Discounts", "Total Owed", "Shipment Item Subtotal", "Shipment Item Subtotal Tax",
                   "ASIN", "Product Condition", "Quantity", "Payment Instrument Type", "Order Status", "Shipment Status",
                   "Ship Date", "Shipping Option", "Shipping Address", "Billing Address", "Carrier Name & Tracking Number",
                   "Product Name", "Gift Message", "Gift Sender Name", "Gift Recipient Contact Details"),

                  encoding='utf-8-sig',
                  institution='Amazon',
                  invert_sign=True),
       [PredictPostings()]),
]

Amazon used to provide a "Category" column (eg: Kitchen, Sporting, etc.) that worked together with smart_importer to book almost any, never-seen transaction correctly. They don't any more.

Extensions
I screenscrape (copy-paste) my Amazon gift card transactions from the webpage as a tab-separated file, and import this as a payment method. At the end of this, I have a gift card balance that usually agrees with what my Amazon account shows. I'll share this importer later.

All the above takes perhaps 3-4 minutes + the time taken to book the purchases. I personally do it only once in several months.

Red S

unread,
Oct 4, 2022, 4:37:45 PM10/4/22
to Beancount
Now written up as an article here.

Martin Blais

unread,
Oct 5, 2022, 8:32:54 PM10/5/22
to bean...@googlegroups.com
Creative solution, using an intermediate account.

I wonder if just an importer that requires both the credit card account and the Amazon orders file in order to produce output isn't simpler. It's just a join between these two datasets, and and CC importer could just fail when the other file's not given to it.

Now of course, beangulp is designed with the idea that imports come from a single file... you probably want to just write a Python script with custom code. 

Just an idea, (the Amazon thing's been nagging me too, I do it manually and I'm annoyed each time),




--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
Message has been deleted

Red S

unread,
Oct 6, 2022, 4:26:19 AM10/6/22
to Beancount
On Wednesday, October 5, 2022 at 5:32:54 PM UTC-7 bl...@furius.ca wrote:
Creative solution, using an intermediate account.

I wonder if just an importer that requires both the credit card account and the Amazon orders file in order to produce output isn't simpler. It's just a join between these two datasets, and and CC importer could just fail when the other file's not given to it.

I did just that at some point, and the annoyances I ran into include:
- having to have both sets of data together to do the import. Amazon GDPR requests for orders data take 24-48 hrs after submission (is there a better way?). So I'm stuck on importing my credit card data until the GDPR request goes through
- Dates don't match: if I don't download both at exactly the same time, one is ahead or behind the other, meaning I have to point the scripts at historical downloads
- I tend to use a bunch of different credit cards for purchases, compounding this problem
- I tend to make full and partial payments using Amazon gift cards, which means I need to import all my payment sources along with my Amazon order history together, and it gets a bit more complicated than a simple join

Of course, YMMV. Some/all of these may not apply to you, so a join-at-import might work well for you.

Here's yet another idea that may be the best of all worlds: the thread posted by Archimedes just now was very timely: a nice solution would be to do each side of the import (payment / orders) independently, have the importer rewrite an existing matching entry if found; and if not found, book it to the intermediate account (for later rewriting when the other side is imported).
Reply all
Reply to author
Forward
0 new messages