New project: baskets

Martin Blais

Aug 24, 2018, 12:15:02 AM
to Beancount
I've taken the week off, so I indulged myself and implemented something that's been on my mind for a while, which some of you may find useful.

I direct my own savings via a discount broker, with a strategy not entirely unlike what Betterment or Wealthfront offer their clients: a broadly diversified portfolio of stocks and bonds, implemented by purchasing ETFs. For instance, see what Betterment uses here: https://help.betterment.com/hc/en-us/articles/115004258066-What-funds-ETFs-are-in-the-Betterment-Portfolio-Strategy-. I roll my own version of this, mainly because if I were to use one of these services I'd be subject to cross-account wash-sale considerations (so essentially, if you use these services, you have to put all your beans there or invest only in a distinct set of assets outside, which I don't want to do). Like most people I have retirement accounts too, which I also invest in ETFs whenever I can, though I cannot always use the very same instruments because these types of accounts tend to have a more limited selection available. My situation is further complicated by a history of having bought nearly identical ETFs from different issuers (e.g. I used iShares before Vanguard came blaring onto the market with the lowest fees). Those positions have accumulated capital gains, so I'd rather not convert them and realize the gains just to normalize my portfolio.

In short, I have a bunch of ETFs and a quantity for each. I've been wanting to ask the question: "What is my exposure to stock X?" Many of the ETFs I hold overlap in their constituents, e.g., many of them end up containing a small fraction of AAPL or AMZN. One particular reason to do this is that my employer is one of those big companies, and my exposure to it is further compounded by the fact that, like most employees, I regularly vest shares in it. So my total exposure to it includes the vested shares plus all the bits and pieces that come from the ETFs. I'd like to know what my risk is (and correspondingly, how much I want to sell to avoid becoming too concentrated).
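To make the question concrete, here is a minimal sketch of the arithmetic in Python. Every symbol, quantity, price and weight below is invented for illustration, and `exposure` is a hypothetical helper, not a function from any existing tool.

```python
from decimal import Decimal as D

# Hypothetical positions: symbol -> (quantity held, price per share).
positions = {
    "ETF_A": (D("100"), D("50.00")),
    "ETF_B": (D("40"), D("125.00")),
    "XYZ": (D("10"), D("200.00")),  # shares held directly, e.g. vested stock
}

# Hypothetical constituent weights: ETF -> {stock: fraction of the fund}.
weights = {
    "ETF_A": {"XYZ": D("0.04"), "ABC": D("0.03")},
    "ETF_B": {"XYZ": D("0.10")},
}


def exposure(stock, positions, weights):
    """Market value attributable to `stock`, held directly or through ETFs."""
    total = D("0")
    for symbol, (quantity, price) in positions.items():
        value = quantity * price
        if symbol == stock:
            total += value  # direct holding counts in full
        else:
            # Indirect holding: the fund's value times the stock's weight in it.
            total += value * weights.get(symbol, {}).get(stock, D("0"))
    return total


portfolio_value = sum(q * p for q, p in positions.values())
xyz = exposure("XYZ", positions, weights)
print(f"XYZ exposure: {xyz:.2f} ({xyz / portfolio_value:.1%} of portfolio)")
```

With these made-up numbers, the 10 directly held shares account for 2000 of the total, and the two ETFs contribute the remaining 700.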

Now you would think that brokers would offer this service; alas, I'm not aware of any that lets you do this easily. Besides, since my single source of truth across all accounts comes from Beancount (everything is unified in my Beancount file), a per-broker view wouldn't be terribly useful to me either. There are a few services online like etfdb.com and etf.com which claim to provide a downloadable holdings list (I'm not sure if it's normalized and easily parseable, e.g. do all the rows have tickers?), but they tend to cater to institutions and charge more than I'd want to pay just for some occasional downloads (several hundred dollars per year). However, ETF issuers always provide some way for you to download the detailed list of holdings, since they're marketing to the investor and trying to convince you to use those funds as investment vehicles. Like those websites, I think they might charge institutions for data access, because they often make it a real pain to scrape those lists of holdings (lots of JavaScript with ugly URLs and such, basically not easy at all).

What's more, those downloadable holdings files don't all look the same. Some have tickers, some don't, and some use other codes like CUSIP, ISIN, SEDOL, or more likely a combination of those. Some just have the long stock name... so some algorithm is needed to associate the constituents between ETFs.
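As an illustration of the first half of that problem, here is a small Python sketch that maps differently labeled columns onto one common record. The header spellings in `COLUMN_ALIASES` and the sample rows are assumptions made up for this example; real issuer files vary and would each need their own mapping.

```python
import csv
import io

# Hypothetical header spellings mapped to common field names.
COLUMN_ALIASES = {
    "ticker": "ticker", "symbol": "ticker",
    "isin": "isin", "cusip": "cusip", "sedol": "sedol",
    "name": "name", "holding": "name", "security name": "name",
    "weight": "weight", "% of fund": "weight", "weight (%)": "weight",
}
FIELDS = ("ticker", "isin", "cusip", "sedol", "name", "weight")


def normalize_rows(text):
    """Parse CSV text into dicts keyed by FIELDS; missing fields become ''."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict.fromkeys(FIELDS, "")
        for header, value in raw.items():
            field = COLUMN_ALIASES.get((header or "").strip().lower())
            if field and value:
                row[field] = value.strip()
        rows.append(row)
    return rows


# Two made-up files describing the same stock with different columns.
file_a = "Symbol,Holding,% of Fund\nXYZ,Xyz Corp,4.0\n"
file_b = "ISIN,Security Name,Weight (%)\nUS0000000001,XYZ CORP,10.0\n"
print(normalize_rows(file_a))
print(normalize_rows(file_b))
```

After this step the two rows share a schema, but nothing yet says they are the same company: one has only a ticker and the other only an ISIN, which is why a separate matching step is still needed.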

So... introducing: "baskets", an ETF disaggregator. You provide "baskets" with an input file listing your ETF symbols, the quantity you hold of each, their prices, and a few other things, and it will do two things for you:

1. Control a browser robot to download the list of holdings for each of your ETFs to a local file database. This uses ChromeDriver (Selenium) because downloading the list of holdings directly from the issuers' websites is annoying and difficult. (When the download goes through a real Chrome instance, the issuer can't really prevent the user from scraping the files.)

2. Parse each of the files and normalize the list of holdings to a common format, and reaggregate the holdings together to provide individual stock exposure, matching similar instruments on ticker, ISIN, SEDOL, CUSIP and similar names.
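The matching in step 2 can be sketched as a grouping problem: two rows refer to the same instrument if they share any identifier. The Python below is one possible approach using union-find, not necessarily how baskets does it. The field names, the sample data and the `group_holdings` helper are all hypothetical, and names are matched only after trivial normalization (uppercasing and collapsing whitespace), which is much cruder than real "similar name" matching.

```python
from collections import defaultdict

ID_FIELDS = ("ticker", "isin", "cusip", "sedol")


def group_holdings(holdings):
    """Merge holdings sharing any identifier; return {label: total value}."""
    parent = list(range(len(holdings)))

    def find(i):
        # Find the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of first holding carrying it
    for index, holding in enumerate(holdings):
        keys = [(f, holding[f]) for f in ID_FIELDS if holding.get(f)]
        if holding.get("name"):
            keys.append(("name", " ".join(holding["name"].upper().split())))
        for key in keys:
            if key in first_seen:
                parent[find(index)] = find(first_seen[key])  # union the groups
            else:
                first_seen[key] = index

    groups = defaultdict(list)
    for index, holding in enumerate(holdings):
        groups[find(index)].append(holding)

    totals = {}
    for members in groups.values():
        # Label the group by its first ticker, falling back to a name.
        label = next((h["ticker"] for h in members if h.get("ticker")),
                     members[0].get("name", "?"))
        totals[label] = sum(h["value"] for h in members)
    return totals


# Made-up rows: the first three are the same company seen three ways.
holdings = [
    {"ticker": "XYZ", "name": "Xyz Corp", "value": 200.0},
    {"isin": "US0000000001", "name": "XYZ  CORP", "value": 500.0},
    {"ticker": "XYZ", "isin": "US0000000001", "value": 2000.0},
    {"ticker": "ABC", "name": "Abc Inc", "value": 150.0},
]
print(group_holdings(holdings))
```

Union-find is used here because matches are transitive: the second row has no ticker and only joins the group through its name and ISIN, which other rows link back to the ticker.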

Here it is:
This code is not well tested yet; I wrote it in just a couple of days.
Let me know if you find this useful.

(I'd like to eventually write a web interface to it so you can just access a web page and upload your input to view the results.)


Metin Akat

Aug 24, 2018, 4:22:45 AM
to Beancount
Wow, this is so cool, thank you!


Justus Pendleton

Aug 29, 2018, 8:49:47 AM
to Beancount
On Friday, August 24, 2018 at 11:15:02 AM UTC+7, Martin Blais wrote:
However, ETF issuers always provide some way for you to download the detailed list of holdings, since they're marketing to the investor and trying to convince you to use those funds as investment vehicles. Like those websites, I think they might charge institutions for data access, because they often make it a real pain to scrape those lists of holdings (lots of JavaScript with ugly URLs and such, basically not easy at all).

They aren't charging for access to the data :) 

All ETFs & mutual funds (and hedge funds, for that matter) are required to file their holdings with the SEC on a regular basis in their quarterly N-Q filing ("Quarterly Schedule of portfolio holdings of management investment companies"). All of these are available for free on Edgar (though Edgar is...not easy to use). The format is an XML thing. Here's one for Vanguard: https://www.sec.gov/Archives/edgar/data/36405/000093247118006124/0000932471-18-006124.txt

It isn't perfect for your needs, since, e.g. there is no "ticker symbol" and the company name looks pretty free-form to me. It also isn't exactly "well structured" XML. It is more like HTML tables (including page breaks) stuffed inside of an XML container. But it might be easier than trying to navigate the different provider websites.
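As a rough starting point for working with such a file, here is a Python sketch that splits a downloaded submission into its component documents. It assumes the SGML-style layout commonly seen in EDGAR full-submission text files, where each document sits between `<DOCUMENT>` and `</DOCUMENT>` with a `<TYPE>` line and a `<TEXT>` body; that layout has not been checked against the filing linked above, and `split_submission` and the sample string are made up for illustration. Pulling individual holdings out of the tables in the body would be a separate, messier step.

```python
import re

DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TYPE_RE = re.compile(r"<TYPE>([^\n<]+)")
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def split_submission(submission):
    """Return a list of (document type, body text) pairs from a submission."""
    documents = []
    for match in DOCUMENT_RE.finditer(submission):
        block = match.group(1)
        type_match = TYPE_RE.search(block)
        text_match = TEXT_RE.search(block)
        documents.append((
            type_match.group(1).strip() if type_match else "",
            text_match.group(1).strip() if text_match else "",
        ))
    return documents


# A tiny made-up stand-in for a saved filing (assumed layout).
sample = (
    "<SEC-DOCUMENT>example\n"
    "<DOCUMENT>\n<TYPE>N-Q\n<SEQUENCE>1\n"
    "<TEXT>\nholdings table here\n</TEXT>\n"
    "</DOCUMENT>\n"
    "</SEC-DOCUMENT>\n"
)
for doc_type, body in split_submission(sample):
    print(doc_type, len(body))
```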

Martin Blais

Aug 29, 2018, 9:37:36 AM
to Beancount
Interesting! Thanks for the pointer Justus. I wasn't aware they were required to file the list of holdings in that way.
 

Alen Šiljak

Apr 23, 2019, 5:54:22 PM
to Beancount
I think asset allocation would do something very similar if it was granular enough. Any significant difference to the target allocation would trigger a warning bell.