I've taken the week off, so I indulged myself and implemented something that's been on my mind for a while, which some of you may find useful.
I direct my own savings via a discount broker, with a strategy not unlike what Betterment or Wealthfront offer their clients: a broadly diversified portfolio of stocks and bonds, implemented by purchasing ETFs. For instance, see what Betterment uses here:
https://help.betterment.com/hc/en-us/articles/115004258066-What-funds-ETFs-are-in-the-Betterment-Portfolio-Strategy-
I roll my own version of this, mainly because if I were to use one of these services I'd be subject to cross-account wash-sale considerations (essentially, if you use these services, you have to either put all your beans there or invest only in a distinct set of assets outside, which I don't want to do). Now, like most people I have retirement accounts too, which I also invest in ETFs whenever I can, though I cannot always use the very same instruments because these types of accounts tend to have a more limited selection available. My situation is further complicated by a history of having bought nearly identical ETFs from different issuers (e.g. I used iShares before Vanguard came blaring onto the market with the lowest fees). Those older positions have accumulated capital gains, so I'd rather not convert them and realize the gains just to normalize my portfolio.
In short, I have a bunch of ETFs and a quantity for each. I've been wanting to ask the question: "What is my exposure to stock X?" Many of the ETFs I hold overlap in their constituents; for example, many of them end up containing a small fraction of AAPL or AMZN. One particular reason to do this is that my employer is one of those big companies, and my exposure to it is further compounded by the fact that, like most employees, I regularly vest shares in it. So my total exposure to it includes the vested shares plus all the bits and pieces that come from the ETFs. I'd like to know what my risk is (and correspondingly, how much I want to sell to avoid becoming too concentrated).
Now, you would think that brokers would offer this service; alas, I'm not aware of any that lets you do this easily. Besides, since my single source of truth across all accounts is my Beancount file (everything is unified there), a broker-side tool wouldn't be terribly useful to me anyway. There are a few online services like etfdb.com and etf.com which claim to provide a downloadable holdings list (I'm not sure if it's normalized and easily parseable, e.g. do all the rows have tickers?), but they tend to cater to institutions and charge more than I'd want to pay just for some occasional downloads (several hundred dollars per year). However, ETF issuers always provide some way to download the detailed list of holdings: they're marketing to investors and trying to convince you to use their funds as investment vehicles. Like those websites, I think they might charge institutions for data access, because they often make it a real pain to scrape those lists of holdings (lots of JavaScript, ugly URLs and such; basically not easy at all).
What's more, those downloadable holdings files don't all look the same. Some have tickers, some don't, some use other codes like CUSIP, ISIN, or SEDOL, or more likely a combination of those. Some just have the long stock name... so some algorithm is needed to associate the constituents between ETFs.
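To give a rough idea of the kind of matching involved, here is a minimal sketch. It is not the code in baskets, just one way to do it: it assumes each holdings row has already been parsed into a dict with optional keys "ticker", "isin", "cusip", "sedol" and "name" (hypothetical field names), and it groups rows that share any identifier or a crudely normalized name, using a union-find. The name normalization is deliberately simplistic and would need more care on real data.

```python
from collections import defaultdict

ID_FIELDS = ("ticker", "isin", "cusip", "sedol")  # hypothetical field names
SUFFIXES = {"INC", "CORP", "CORPORATION", "CO", "LTD", "PLC"}


def normalize_name(name):
    """Crude normalization: uppercase, strip punctuation and legal suffixes."""
    words = "".join(c if c.isalnum() else " " for c in name.upper()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def group_holdings(rows):
    """Return lists of row indices that appear to be the same instrument."""
    parent = list(range(len(rows)))

    def find(i):
        # Find the representative of i's group, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of the first row carrying it
    for i, row in enumerate(rows):
        keys = [(f, row[f].upper()) for f in ID_FIELDS if row.get(f)]
        name = normalize_name(row.get("name") or "")
        if name:
            keys.append(("name", name))
        for key in keys:
            if key in first_seen:
                # Same identifier seen before: merge the two groups.
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i in range(len(rows)):
        groups[find(i)].append(i)
    return sorted(groups.values())


rows = [
    {"ticker": "AAPL", "name": "Apple Inc."},
    {"isin": "US0378331005", "name": "APPLE INC"},
    {"ticker": "AAPL", "isin": "US0378331005"},
    {"ticker": "AMZN", "name": "Amazon.com Inc."},
    {"cusip": "023135106", "name": "AMAZON COM INC"},
]
print(group_holdings(rows))  # [[0, 1, 2], [3, 4]]
```

Because matches are transitive, a row carrying only a ticker and a row carrying only an ISIN end up in the same group as soon as a third row carries both.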
So... introducing: "baskets", an ETF disaggregator. You provide "baskets" with an input file listing your ETF symbols, the quantity you hold of each, their prices, and a few other things, and it will do two things for you:
1. Control a browser robot to download the list of holdings for each of your ETFs to a local file database. This uses ChromeDriver (Selenium), because downloading the list of holdings directly from the issuers' websites is annoying and difficult. (When the download is driven through a real Chrome instance, the issuers can't really prevent you from scraping the files.)
2. Parse each of the files and normalize the list of holdings to a common format, and reaggregate the holdings together to provide individual stock exposure, matching similar instruments on ticker, ISIN, SEDOL, CUSIP and similar names.
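The reaggregation step in (2) boils down to simple arithmetic once the holdings are normalized. Here is a minimal sketch under stated assumptions, not the actual baskets implementation: `positions` and `holdings` are hypothetical in-memory structures, the weights are fractions of each ETF's value, and the symbols and numbers are made up for illustration.

```python
from collections import defaultdict


def disaggregate(positions, holdings):
    """Return {constituent: dollar exposure} across all positions.

    positions: {symbol: (quantity, price)}
    holdings:  {etf_symbol: {constituent: weight}}, weights as fractions.
    A symbol with no entry in holdings is treated as a directly held stock.
    """
    exposure = defaultdict(float)
    for symbol, (quantity, price) in positions.items():
        value = quantity * price
        constituents = holdings.get(symbol, {symbol: 1.0})
        for name, weight in constituents.items():
            exposure[name] += value * weight
    return dict(exposure)


# Made-up example: two ETFs that both hold XYZ, plus directly held XYZ shares.
positions = {
    "ETF_A": (100, 50.0),
    "ETF_B": (20, 250.0),
    "XYZ": (10, 100.0),
}
holdings = {
    "ETF_A": {"XYZ": 0.25, "OTHER": 0.75},
    "ETF_B": {"XYZ": 0.50, "OTHER": 0.50},
}

exposure = disaggregate(positions, holdings)
total = sum(exposure.values())
for name, dollars in sorted(exposure.items()):
    print(f"{name:6s} {dollars:10.2f} {dollars / total:6.1%}")
```

The line for XYZ then gives the combined exposure from the direct shares and from both ETFs, which is the number to look at when deciding how much of a concentrated position to sell.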
Here it is:
This code is not super well tested yet; I wrote it in just a couple of days.
Let me know if you find this useful.
(I'd like to eventually write a web interface to it so you can just access a web page and upload your input to view the results.)