beancount_reds_importers for investment, banking

redst...@gmail.com

unread,

Jan 29, 2021, 10:03:54 PM1/29/21

to Beancount

With importing is the "ugly" side of personal finance, I've tried hard to automate them away so I spend almost no time importing, and instead spend my time analyzing and understanding. I've found maintainability and extensibility (adding new importers) to be key. Here, I share a couple of significant updates to my importer framework.

Installation: pip install beancount-reds-importers

Github link

1) the source parser (aka the "reader") is now separate from the transaction creator. This makes it easy to combine any:

- current readers: ofxreader, csvreader

- current creators: investments, banking (including credit card)

Constructing entries for investment accounts can be complex and ugly due to the special casing required, and the variances between brokers. Separating these two allows us to reuse well tested "investments" code with ofx, csv, and other readers. More on this later.

2) "Commodity leaf" accounts are now supported, optionally. I recently moved my accounts to using these as a part of making returns computations easier with what is now Martin's beangrow. The importers will now optionally output transactions like:

2018-01-01 * "Buy"

Assets:Investments:HOOLI 10 HOOLI @ {2 USD}

Assets:Investments:USD

2018-01-01 * "Dividends"

Assets:Investments:USD 10 USD

Income:Investments:HOOLI

There are several other features that I'll write about later.

redst...@gmail.com

unread,

Jan 30, 2021, 12:55:51 AM1/30/21

to Beancount

Here are the basics about the importers. I'd previously posted a quick version here.

The investment importer solves a number of issues including:

stock and fund transactions including buys and sells (lot matching, however, is left to the user)
- handles commissions, fees
money market transactions are detected and use price conversions instead of being held at cost, for simplicity
balance assertions are extracted from the source when they exist. most ofx files contain these. these are critical to ensuring integrity of the import with no manual effort
- available cash computation
produces price entries (many brokerages include ticker prices of all active tickers in the account on the date of download, in addition to of course, the prices gleaned from transactions)
cusip to fund matching from user-supplied database (some brokerage houses identify funds only by cusip or other custom tickers in the ofx)

The code is designed to minimize the importer code necessary to account for variances in ofx format across brokerage houses

The banking importer is pretty basic. Currently, it produces only a single posting per account, with the assumption that the other posting(s) will be automatically filled in by smart_importer.

Readers include ofx and csv. Either importer can mixed and matched with either source. Readers:
- use filename extension and filename matching to quickly identify mismatches to speed up bean-identify and bean-extract

redst...@gmail.com

unread,

Jan 30, 2021, 1:54:54 AM1/30/21

to Beancount

petl is an etl library for tables that Martin recently shared. It makes working with .csv files really easy. Given writing and maintaining importers can be time consuming (not to mention not-much-fun), I thought I's share a walkthrough of how easy petl makes it, below. I hope this inspires others to write and share importers. (My beancount_reds_importers framework makes a few other things easier as well).

This csv importer uses petl to read a .csv into a table for manipulation. The output of this reader is a list of namedtuples corresponding roughly to ofx transactions. The following steps achieve this (based on my comments in the code):

read csv into petl table:
- rdr = etl.fromcsv(file.name)
skip header and footer rows (configurable)
- rdr = rdr.skip(self.skip_head_rows)
- rdr = rdr.head(len(rdr) - self.skip_tail_rows - 1)
prepare_raw_columns: an overridable method to help get the raw table in shape. As an example, the schwab importer does the following
- rdr.cutout('') # remove the last column, which is empty
- for rows with interest, the date column contains text such as: '11/16/2018 as of 11/15/2018'. We convert these into a regular parseable date: '11/16/2018'
          def cleanup_date(d):
              """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
              return d.split(' ', 1)[0]
          rdr = rdr.convert('Date', cleanup_date)
- add a 'tradeDate' column, which is a copy of the 'Date' column, to correspond to the importer API
  - rdr = rdr.addfield('tradeDate', lambda x: x['Date'])
- add a a 'total' column, which is a copy of the 'Amount' column, to correspond to the importer API
rename columns: columns headers are renamed to standardize them to the importer API, using a supplied dictionary. For the included schwab importer, that looks like:

self.header_map = {

            "Action":      'type',
            "Date":        'date',
            "tradeDate":   'tradeDate',
            "Description": 'memo',
            "Symbol":      'security',
            #etc.}

rdr = rdr.rename(self.header_map)

convert_columns: this fixes up the actual data in each column. The base class does the following:
- map types to standard types. The standard types that the importer-API uses are loosely based on ofx standards. For example, the schwab importer needs this mapping:

        self.transaction_type_map = {
            'Bank Interest':      'income',
            'Buy':                'buystock',
            'Cash Dividend':      'dividends',
            'MoneyLink Transfer': 'transfer',
            'Reinvest Dividend': 'dividends',
            'Reinvest Shares':    'buystock',
            'Sell':               'sellstock',
            }

rdr = rdr.convert('type', self.transaction_type_map)

numbers are parsed from string and convered into Decimal type. Non-numeric characters like '$' are removed.
dates are parsed and converted into datetime type.

The table is now ready for use by the importer. petl makes each row available via petl's namedtuples!

redst...@gmail.com

unread,

Jan 30, 2021, 5:09:54 AM1/30/21

to Beancount

The code from my previous post in this thread, for the schwab csv importer ends up looking like the following. The code below minimally expresses the semantics of the schwab csv format, leaving the heavy lifting to the ofxread and investment modules.

from beancount_reds_importers.libimport import investments, csvreader

class Importer(investments.Importer, csvreader.Importer):
    def custom_init(self):
        self.max_rounding_error = 0.04
        self.account_number_field = 'number'
        self.filename_identifier_substring = '_Transactions_'
        self.header_identifier = '"Transactions for account ' + self.config.get('custom_header', '')
        self.get_ticker_info = self.get_ticker_info_from_id
        self.date_format = '%m/%d/%Y'
        self.skip_head_rows = 1
        self.skip_tail_rows = 1
        self.funds_db_txt = 'funds_by_ticker'

        self.header_map = {
            "Action":      'type',
            "Date":        'date',
            "tradeDate":   'tradeDate',
            "Description": 'memo',
            "Symbol":      'security',

            "Quantity":    'units',
            "Price":       'unit_price',
            "Amount":      'amount',
            "total":       'total',
            "Fees & Comm": 'fees',

            }
        self.transaction_type_map = {
            'Bank Interest':      'income',
            'Buy':                'buystock',
            'Cash Dividend':      'dividends',
            'MoneyLink Transfer': 'transfer',
            'Reinvest Dividend': 'dividends',
            'Reinvest Shares':    'buystock',
            'Sell':               'sellstock',
            }

        self.skip_transaction_types = ['Journal']

    def prepare_raw_columns(self, rdr):
        rdr = rdr.cutout('') # clean up last column

        def cleanup_date(d):
            """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
            return d.split(' ', 1)[0]
        rdr = rdr.convert('Date', cleanup_date)

rdr = rdr.addfield('tradeDate', lambda x: x['Date'])

rdr = rdr.addfield('total', lambda x: x['Amount'])
return rdr

redst...@gmail.com

unread,

Feb 1, 2021, 10:20:56 PM2/1/21

to Beancount

Continuing this series, I've added file format readers to read csv and xlsx files with multiple tables. This seems to be a common format I've come across from multiple providers.

Annoyingly, there is no standard specification or even a sane way that all providers follow. This reader assumes that any line with a single field marks the beginning of a new section, with that field as the title of the section. The next line is assumed to be the header for that new section.

This is a file format reader for beancount_reds_importers that converts:

---- examples.csv -----
downloaded on: blah blah
section1
date,transactions,amount
2020-02-02,3,5.00
2020-02-02,3,5.00
section2
account_num,balance,date
123123,1000,2020-12-31
23048,2000,2020-12-31
end_of_file
-----------------------

to this data structure:
self.alltables = {'section1': <petl table of section 1>
'section2': <petl table of section 2>
}

where each value is a petl table.

This file format is common enough to warrant this reader.
The xlsx_multitable reader is built on top of this reader

redst...@gmail.com

unread,

Feb 1, 2021, 10:31:17 PM2/1/21

to Beancount

Auto-generated balance assertions are key to ensuring integrity of your imports. Ofx files typically contain balances, while csv files typically do not. Schwab is an exception: it lets you download a separate csv file that lists the current balances of each account you own. However, the resulting format means there are multiple tables (one per account) in the csv

The new "schwab_csv_balances" importer makes use of the new multitable reader. It lets you specify just the section you want to extract, and turns that section's table into balance assertions.

Red S

unread,

Jun 7, 2022, 1:51:08 PM6/7/22

to Beancount

Beancount Red's Importers 0.5.0 has been released.

pip install beancount-reds-importers

Includes new tools:

- `ofx-summarize`: Quick and dirty way to summarize a .ofx file, and peek inside it
- `bean-download`: [Download account statements automatically](https://reds-rants.netlify.app/personal-finance/direct-downloads/)
(for supporting institutions), from your configuration of accounts. Multi-threaded.

The commands include shell auto-completion (tab-to-complete) via click. `bean-download`, in particular, can complete the account or account groups you want to download, which can be handy.

CHANGES:

## New Tools (see README.md)

- bean-download
- ofx-summarize

## New Importers:
- vanguard screenscraped

## Features and Improvements:
- Add a single-table xlsx reader (#24)
- consider isin to be a substring
- support custom entries (commodities, open/close, etc.). see #30
- Add a build_metadata function that can be overridden by custom importers
(savingsandloan)
- csv_multitable_reader: add head/tail skipping, comments (#27) (savingsandloan)
- reworked get_target_acct_custom: now, get_target_acct_custom() will be allowed to
either return a target account (for special cases), or return a None, which will allow
the default implementation to run
- vanguard: catch capital gains distributions using description
- rename capgains_lt to capgainsd_lt (distribution)
- add settleDate to csv
- allow filename_pattern override via config
- change filename_identifier_substring to regex pattern
- commodity_leaf is now configurable via string replacement
- Make sorting of paycheck postings configurable (#21)
- If the `sort_postings` option in the importer config is set to `False`, then related
postings are left next to each other. The option defaults to `True` to match the
existing behavior. (Gary Peck)
- add IMPORTER_NAME for debugging
- schwab_csv_balances: support more general filenames
- schwab csv importer: add more transaction types
- add ticker to get_target_acct_custom()

## Bugs fixed
- investments: generate balance + price entries even if end_date is missing
- fix bug mentioned in #26 (use cash_account for cash transfers)
- fix bug causing ofxreader to fail on multiple calls by bean-extract
- fix bug that read the same file upon multiple calls from bean-extract
- fix target account for buy/sell
- fix csvreader identification bug
- commodity_leaf: filing string
- fix several commodity leaf account bugs
- file_date was not returning value
- Handle sign flipping for all account types in paycheck importer (#20) (Gary Peck)
- Fix missing return value in xlsx_multitable_reader.is_section_title() (#18) (Gary Peck
- schwab file name
- add and expose handler for price and cost both zero. Closes #15
- Fix handling for readers when available_cash is not present. (Matt Brown)

Red S

unread,

Jan 22, 2023, 2:50:10 PM1/22/23

to Beancount

Beancount Red's Importers 0.6.0 has been released. Thanks to several contributors who were a part of this release!

pip install beancount-reds-importers

New importers (five); several small features; many improvements

## 0.6.0 (2023-01-22)

### New Importers

- add schwab_checking_csv. [Red S]
- importer: amazon gift card importer. [Red S]
- add Discover credit card csv importer. [Red S]
- importer: add capitalone 360 ofx. [Red S]
- add becu (Boeing Employees Credit Union) (#34) [Patrick Baker]

### Improvements

- Better missing security reporting (#43) [thehilll]

* if get_ticker_info_from_id finds an id not present in fund_info.py try to use self.ofx.security_list
to report more useful information. At least in Fidelity ofx files both the symbol and security name
are present, so in most cases the required additions to fund_info.py are already in the file.

* Slightly better reportings...print a summary line before list of securities

* remove unused f string

* 1. Comments to explain what is going on
2. Key the extracted securities dict by CUSIP (not symbol) which matches what is found
in the transaction entries (not for bonds symbol generally matches CUSIP)
3. Values of dict are now a tuple of (symbol, cusip, name) which for stocks should be all
that is needed for fund_info.py

* Remove another unnecessary f string

- Convert value to str before applying regex in remove_non_numeric (#32) [Balazs Keresztury]
- add overridable security_narration() method; use it in Fidelity. [Red S]
- add ability to use smart importer with investments (#36) [Patrick Baker]
- bean-download accepts a comma separated institution list. [Red S]
- bean-download display. [Red S]
- add skip_transaction to banking. [Red S]
- minor: workday importer now specifies filing account. [Red S]
- minor: add overridable post process step for csv. [Red S]
- minor: add etrade to direct download template. [Red S]
- minor: Chase filename regex is now [Cc]hase. [Red S]
- minor: allow overriding payee in investments. [Red S]

### Fixes

- fidelity rounding errors are frequently higher than current value. [Red S]
- date format wasn't getting set during bean-file. [Red S]
- #41 add 'check' to ofx transaction types. [Red S]
- #41 add 'payment' to ofx transaction types. [Red S]
for investment accounts with banking features
- #40 check if ofx file provides balances instead of assuming it does. [Red S]
- #37 document `filename_pattern` in the demo example. [Red S]
- schwab_csv: update csv fields; custom payee. [Red S]
- csvreader returned blank when skip_comments was not present. [Red S]
- handle vanguard annoyance. [Red S]
- workday: skip last row to zero. [Red S]
- banking: fix the payee/narration order issue caused by 9057fcf j 9057fcf: #33. allow importers to override payee and narration fields. [Red S]
- catch ofxparse exceptions. [Red S]
- fix; schwab_checking_csv: withdrawals should be negative. [Red S]
- schwab_csv bug fixes. [Red S]
- #33. allow importers to override payee and narration fields. [Red S]

- when matching ofx security_id to funds_db prefer a complete match, if none found fall back to substring (#45) [thehilll]

Red S

unread,

May 17, 2024, 8:49:06 PMMay 17

to Beancount

Red’s Importers 0.9.0 has been released. Several great contributions by many folks — thank you!

pip3 install --upgrade beancount-reds-importers

I haven’t kept up with announcing these here :(. Here’s a jumbo changelog.

Changes since 0.6.0 New Features

add genericpdf paycheck importer. [Ammon Sarver]
add bamboohr paycheck importer. [Ammon Sarver]
add pdfreader libreader importer. [Ammon Sarver]
bean-download needs-update to determine accounts needing an update based on the latest balance assertion in the journal. [Red S]

New Importers

add CSV importer for Digital Credit Union (#89) [Harlan Lieberman-Berg]
add importer for Alliant Federal Credit Union (#88) [Harlan Lieberman-Berg]
add schwab csv credit line importer. [Red S]
Fidelity CMA (Cash Managment) csv importer (#47) [kantskernel]
add Schwab CSV Positions importer. [Red S]
united overseas bank (uob) importers: card, srs. [Red S]
standard chartered importers (bank, card) [Red S]
united overseas bank csv importer. [Red S]

Improvements

minor: add ‘show_configured’ in paycheck transaction builder. [Red S]
minor: add_custom_postings to banking.py. [Red S]
minor: add identification based on filename to schwab importers. [Red S]
minor: overridable add_custom_postings() [Red S]
minor: add tdameritrade to template.cfg. [Red S]
Handle the INVEXPENSE field (#79) [Jacob Farkas]
handle ‘fee’ ofx type in investments. [Red S]
configurable balance assertion dates. [Red S]
smarter balance assertion dates. [Red S]
We find the statement’s end date from the OFX file. However, banks and
credit cards typically have pending transactions that are not included
in downloads. When we download the next statement, new transactions may
appear prior to the balance assertion date that we generate for this
statement. To attempt to avoid this, we set the balance assertion date
to either two days before the statement’s end date or the last
transaction’s date, whichever is later.
needs_update: add —all-accounts. [Red S]
Improve Vanguard support. [Matt Brown]
Remove noisy Vanguard memos. Show Vanguard transaction type as payee.
Add 401k support to investments. [Matt Brown]
Support 401(k) source sub-accounts in account config values.
add get_tags to transactionbuilders (#68) [Sarah E Busch]
schwab_csv_brokerage: more descriptive payee/memo. [Red S]
overridable target account in banking importer. [Red S]
get_available_cash now takes as input settlement_fund_balance. [Red S]
different institutions seem to compute this in different ways.
overridable method now includes settlement_fund_balance for use if
needed in importers
use filing_account to file. [Red S]
overridable main_account in banking. [Red S]
overridable currency and date fields in csvreader. [Red S]
remove double quotes from column headers. [Red S]
SCB/UOB: add ‘convert_currencies’ as an option; fix conversions. [Red S]
check for account number in csv files. [Red S]
perform table extraction in csvreader; scbbank now uses it. [Red S]
scb/uob: enable custom header; xls/xlsx: respect header. [Red S]
relax header finding condition in csv. [Red S]
xls reader (previous one was xlsx reader) [Red S]
enable price conversions in banking transaction builder. See #38. [Red S]
default currency to CURRENCY_NOT_CONFIGURED. [Red S]
csvreader now can search for the header. [Red S]
Eliminates the need to skip a set number of head/tail lines.
attempt to get currency from input file instead of config. [Red S]
Improve 401k and Vanguard support [Matt Brown]

Reply all

Reply to author

Forward