beancount_reds_importers for investment, banking

598 views
Skip to first unread message

redst...@gmail.com

unread,
Jan 29, 2021, 10:03:54 PM1/29/21
to Beancount
With importing is the "ugly" side of personal finance, I've tried hard to automate them away so I spend almost no time importing, and instead spend my time analyzing and understanding. I've found maintainability and extensibility (adding new importers) to be key. Here, I share a couple of significant updates to my importer framework.

Installation: pip install beancount-reds-importers

1) the source parser (aka the "reader") is now separate from the transaction creator. This makes it easy to combine any:
- current readers: ofxreader, csvreader
- current creators: investments, banking (including credit card)

Constructing entries for investment accounts can be complex and ugly due to the special casing required, and the variances between brokers. Separating these two allows us to reuse well tested "investments" code with ofx, csv, and other readers. More on this later.

2) "Commodity leaf" accounts are now supported, optionally. I recently moved my accounts to using these as a part of making returns computations easier with what is now Martin's beangrow. The importers will now optionally output transactions like:

2018-01-01 * "Buy"
  Assets:Investments:HOOLI  10 HOOLI @ {2 USD}
  Assets:Investments:USD

2018-01-01 * "Dividends"
  Assets:Investments:USD  10 USD
  Income:Investments:HOOLI

There are several other features that I'll write about later.

redst...@gmail.com

unread,
Jan 30, 2021, 12:55:51 AM1/30/21
to Beancount
Here are the basics about the importers. I'd previously posted a quick version here.

The investment importer solves a number of issues including:
  • stock and fund transactions including buys and sells (lot matching, however,  is left to the user)
    • handles commissions, fees
  • money market transactions are detected and use price conversions instead of being held at cost, for simplicity
  • balance assertions are extracted from the source when they exist. most ofx files contain these. these are critical to ensuring integrity of the import with no manual effort
    • available cash computation
  • produces price entries (many brokerages include ticker prices of all active tickers in the account on the date of download, in addition to of course, the prices gleaned from transactions)
  • cusip to fund matching from user-supplied database (some brokerage houses identify funds only by cusip or other custom tickers in the ofx)

The code is designed to minimize the importer code necessary to account for variances in ofx format across brokerage houses

The banking importer is pretty basic. Currently, it produces only a single posting per account, with the assumption that the other posting(s) will be automatically filled in by smart_importer.

Readers include ofx and csv. Either importer can mixed and matched with either source. Readers:
- use filename extension and filename matching to quickly identify mismatches to speed up bean-identify and bean-extract

redst...@gmail.com

unread,
Jan 30, 2021, 1:54:54 AM1/30/21
to Beancount
petl is an etl library for tables that Martin recently shared. It makes working with .csv files really easy. Given writing and maintaining importers can be time consuming (not to mention not-much-fun), I thought I's share a walkthrough of how easy petl makes it, below. I hope this inspires others to write and share importers. (My beancount_reds_importers framework makes a few other things easier as well).

This csv importer uses petl to read a .csv into a table for manipulation. The output of this reader is a list of namedtuples corresponding roughly to ofx transactions. The following steps achieve this (based on my comments in the code):
  • read csv into petl table:
  • skip header and footer rows (configurable)
    • rdr = rdr.skip(self.skip_head_rows)
    • rdr = rdr.head(len(rdr) - self.skip_tail_rows - 1)
  • prepare_raw_columns: an overridable method to help get the raw table in shape. As an example, the schwab importer does the following
    • rdr.cutout('') # remove the last column, which is empty
    • for rows with interest, the date column contains text such as: '11/16/2018 as of 11/15/2018'. We convert these into a regular parseable date: '11/16/2018'
              def cleanup_date(d):
                  """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
                  return d.split(' ', 1)[0]
              rdr = rdr.convert('Date', cleanup_date)

    • add a 'tradeDate' column, which is a copy of the 'Date' column, to correspond to the importer API
      • rdr = rdr.addfield('tradeDate', lambda x: x['Date'])
    • add a a 'total' column, which is a copy of the 'Amount' column, to correspond to the importer API
  • rename columns: columns headers are renamed to standardize them to the importer API, using a supplied dictionary. For the included schwab importer, that looks like:
            self.header_map = {
            "Action":      'type',
            "Date":        'date',
            "tradeDate":   'tradeDate',
            "Description": 'memo',
            "Symbol":      'security',
            #etc.}
            rdr = rdr.rename(self.header_map)
  • convert_columns: this fixes up the actual data in each column. The base class does the following:
    • map types to standard types. The standard types that the importer-API uses are loosely based on ofx  standards. For example, the schwab importer needs this mapping:
        self.transaction_type_map = {
            'Bank Interest':      'income',
            'Buy':                'buystock',
            'Cash Dividend':      'dividends',
            'MoneyLink Transfer': 'transfer',
            'Reinvest Dividend':  'dividends',
            'Reinvest Shares':    'buystock',
            'Sell':               'sellstock',
            }
        rdr = rdr.convert('type', self.transaction_type_map)
    • numbers are parsed from string and convered into Decimal type. Non-numeric characters like '$' are removed.
    • dates are parsed and converted into datetime type.
  • The table is now ready for use by the importer. petl makes each row available via petl's namedtuples!

redst...@gmail.com

unread,
Jan 30, 2021, 5:09:54 AM1/30/21
to Beancount
The code from my previous post in this thread, for the schwab csv importer ends up looking like the following. The code below minimally expresses the semantics of the schwab csv format, leaving the heavy lifting to the ofxread and investment modules.

from beancount_reds_importers.libimport import investments, csvreader

class Importer(investments.Importer, csvreader.Importer):
    def custom_init(self):
        self.max_rounding_error = 0.04
        self.account_number_field = 'number'
        self.filename_identifier_substring = '_Transactions_'
        self.header_identifier = '"Transactions  for account ' + self.config.get('custom_header', '')
        self.get_ticker_info = self.get_ticker_info_from_id
        self.date_format = '%m/%d/%Y'
        self.skip_head_rows = 1
        self.skip_tail_rows = 1
        self.funds_db_txt = 'funds_by_ticker'

        self.header_map = {
            "Action":      'type',
            "Date":        'date',
            "tradeDate":   'tradeDate',
            "Description": 'memo',
            "Symbol":      'security',
            "Quantity":    'units',
            "Price":       'unit_price',
            "Amount":      'amount',
            "total":       'total',
            "Fees & Comm": 'fees',

            }
        self.transaction_type_map = {
            'Bank Interest':      'income',
            'Buy':                'buystock',
            'Cash Dividend':      'dividends',
            'MoneyLink Transfer': 'transfer',
            'Reinvest Dividend':  'dividends',
            'Reinvest Shares':    'buystock',
            'Sell':               'sellstock',
            }
        self.skip_transaction_types = ['Journal']

    def prepare_raw_columns(self, rdr):
        rdr = rdr.cutout('')  # clean up last column

        def cleanup_date(d):
            """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
            return d.split(' ', 1)[0]
        rdr = rdr.convert('Date', cleanup_date)
        rdr = rdr.addfield('tradeDate', lambda x: x['Date'])
        rdr = rdr.addfield('total', lambda x: x['Amount'])
        return rdr

redst...@gmail.com

unread,
Feb 1, 2021, 10:20:56 PM2/1/21
to Beancount
Continuing this series, I've added file format readers to read csv and xlsx files with multiple tables. This seems to be a common format I've come across from multiple providers.

Annoyingly, there is no standard specification or even a sane way that all providers follow. This reader assumes that any line with a single field marks the beginning of a new section, with that field as the title of the section. The next line is assumed to be the header for that new section.

This is a file format reader for beancount_reds_importers that converts:
---- examples.csv -----
downloaded on: blah blah
section1
date,transactions,amount
2020-02-02,3,5.00
2020-02-02,3,5.00
section2
account_num,balance,date
123123,1000,2020-12-31
23048,2000,2020-12-31
end_of_file
-----------------------

to this data structure:
self.alltables =  {'section1': <petl table of section 1>
                   'section2': <petl table of section 2>
                  }

where each value is a petl table.


This file format is common enough to warrant this reader.
The xlsx_multitable reader is built on top of this reader

redst...@gmail.com

unread,
Feb 1, 2021, 10:31:17 PM2/1/21
to Beancount
Auto-generated balance assertions are key to ensuring integrity of your imports. Ofx files typically contain balances, while csv files typically do not. Schwab is an exception: it lets you download a separate csv file that lists the current balances of each account you own. However, the resulting format means there are multiple tables (one per account) in the csv

The new "schwab_csv_balances" importer makes use of the new multitable reader. It lets you specify just the section you want to extract, and turns that section's table into balance assertions.

Red S

unread,
Jun 7, 2022, 1:51:08 PM6/7/22
to Beancount
Beancount Red's Importers 0.5.0 has been released.

pip install beancount-reds-importers

Includes new tools:
- `ofx-summarize`: Quick and dirty way to summarize a .ofx file, and peek inside it
- `bean-download`: [Download account statements automatically](https://reds-rants.netlify.app/personal-finance/direct-downloads/)
  (for supporting institutions), from your configuration of accounts. Multi-threaded.

The commands include shell auto-completion (tab-to-complete) via click. `bean-download`, in particular, can complete the account or account groups you want to download, which can be handy.


CHANGES:
## New Tools (see README.md)
- bean-download
- ofx-summarize

## New Importers:
- vanguard screenscraped


## Features and Improvements:
- Add a single-table xlsx reader (#24)
- consider isin to be a substring
- support custom entries (commodities, open/close, etc.). see #30
- Add a build_metadata function that can be overridden by custom importers
  (savingsandloan)
- csv_multitable_reader: add head/tail skipping, comments (#27) (savingsandloan)
- reworked get_target_acct_custom: now, get_target_acct_custom() will be allowed to
  either return a target account (for special cases), or return a None, which will allow
  the default implementation to run
- vanguard: catch capital gains distributions using description
- rename capgains_lt to capgainsd_lt (distribution)
- add settleDate to csv
- allow filename_pattern override via config
- change filename_identifier_substring to regex pattern
- commodity_leaf is now configurable via string replacement
- Make sorting of paycheck postings configurable (#21)
- If the `sort_postings` option in the importer config is set to `False`, then related
  postings are left next to each other. The option defaults to `True` to match the
  existing behavior. (Gary Peck)
- add IMPORTER_NAME for debugging
- schwab_csv_balances: support more general filenames
- schwab csv importer: add more transaction types
- add ticker to get_target_acct_custom()


## Bugs fixed
- investments: generate balance + price entries even if end_date is missing
- fix bug mentioned in #26 (use cash_account for cash transfers)
- fix bug causing ofxreader to fail on multiple calls by bean-extract
- fix bug that read the same file upon multiple calls from bean-extract
- fix target account for buy/sell
- fix csvreader identification bug
- commodity_leaf: filing string
- fix several commodity leaf account bugs
- file_date was not returning value
- Handle sign flipping for all account types in paycheck importer (#20) (Gary Peck)
- Fix missing return value in xlsx_multitable_reader.is_section_title() (#18) (Gary Peck
- schwab file name
- add and expose handler for price and cost both zero. Closes #15
- Fix handling for readers when available_cash is not present. (Matt Brown)

Red S

unread,
Jan 22, 2023, 2:50:10 PM1/22/23
to Beancount
Beancount Red's Importers 0.6.0 has been released. Thanks to several contributors who were a part of this release!
pip install beancount-reds-importers

New importers (five); several small features; many improvements

## 0.6.0 (2023-01-22)

### New Importers

- add schwab_checking_csv. [Red S]
- importer: amazon gift card importer. [Red S]
- add Discover credit card csv importer. [Red S]
- importer: add capitalone 360 ofx. [Red S]
- add becu (Boeing Employees Credit Union) (#34) [Patrick Baker]

### Improvements

- Better missing security reporting (#43) [thehilll]

  * if get_ticker_info_from_id finds an id not present in fund_info.py try to use self.ofx.security_list
  to report more useful information.  At least in Fidelity ofx files both the symbol and security name
  are present, so in most cases the required additions to fund_info.py are already in the file.

  * Slightly better reportings...print a summary line before list of securities

  * remove unused f string

  * 1. Comments to explain what is going on
  2. Key the extracted securities dict by CUSIP (not symbol) which matches what is found
       in the transaction entries (not for bonds symbol generally matches CUSIP)
  3. Values of dict are now a tuple of (symbol, cusip, name) which for stocks should be all
       that is needed for fund_info.py

  * Remove another unnecessary f string

- Convert value to str before applying regex in remove_non_numeric (#32) [Balazs Keresztury]
- add overridable security_narration() method; use it in Fidelity. [Red S]
- add ability to use smart importer with investments (#36) [Patrick Baker]
- bean-download accepts a comma separated institution list. [Red S]
- bean-download display. [Red S]
- add skip_transaction to banking. [Red S]
- minor: workday importer now specifies filing account. [Red S]
- minor: add overridable post process step for csv. [Red S]
- minor: add etrade to direct download template. [Red S]
- minor: Chase filename regex is now [Cc]hase. [Red S]
- minor: allow overriding payee in investments. [Red S]

### Fixes

- fidelity rounding errors are frequently higher than current value. [Red S]
- date format wasn't getting set during bean-file. [Red S]
- #41 add 'check' to ofx transaction types. [Red S]
- #41 add 'payment' to ofx transaction types. [Red S]
  for investment accounts with banking features
- #40 check if ofx file provides balances instead of assuming it does. [Red S]
- #37 document `filename_pattern` in the demo example. [Red S]
- schwab_csv: update csv fields; custom payee. [Red S]
- csvreader returned blank when skip_comments was not present. [Red S]
- handle vanguard annoyance. [Red S]
- workday: skip last row to zero. [Red S]
- banking: fix the payee/narration order issue caused by 9057fcf j 9057fcf: #33. allow importers to override payee and narration fields. [Red S]
- catch ofxparse exceptions. [Red S]
- fix; schwab_checking_csv: withdrawals should be negative. [Red S]
- schwab_csv bug fixes. [Red S]
- #33. allow importers to override payee and narration fields. [Red S]
- when matching ofx security_id to funds_db prefer a complete match, if none found fall back to substring (#45) [thehilll] 
Reply all
Reply to author
Forward
0 new messages