Importing data from vanguard and other institutions

709 views
Skip to first unread message

redst...@gmail.com

unread,
May 22, 2015, 7:57:35 PM5/22/15
to bean...@googlegroups.com
Hi Martin or anyone who has experience with this: I'm trying to figure out how others are getting the data from their brokerage/investment institutions into Beancount. I'll use Vanguard as an example. Specifically:

1. The import process: Do you use .csv or ofx? With ofx, Vanguard only allows the past 18 months of transactions. So I have screenscraped .csv data, which doesn't have the 18-month limit. I see an incompletely ledgerhub importer. I'm trying to not reinvent the wheel here. Also, to get into the gorier details, are there things to watch out for when importing? I have my own importer cobbled together when I used ledger, but that falls far short of what I'm hoping beancount will help me accomplish.

2. I've used Specific ID of shares. So I'm mainly interested in automatic lot matching. Is this supported already? The documentation says to watch out for "upcoming changes." Even if it is not supported, what's the best practice on getting my data in, in such a way that a) when support arrives, everything will work, and b) so I don't leave out any data from the source, and get everything into Beancount so I don't have to redo this?

Thanks in advance!

Martin Blais

unread,
May 23, 2015, 10:44:08 AM5/23/15
to bean...@googlegroups.com
On Fri, May 22, 2015 at 7:57 PM, <redst...@gmail.com> wrote:
Hi Martin or anyone who has experience with this: I'm trying to figure out how others are getting the data from their brokerage/investment institutions into Beancount. I'll use Vanguard as an example.



Specifically:

1. The import process: Do you use .csv or ofx?

ledgerhub.importers.generic.ofx_invest

History, Statements & Forms -> Download -> Select: Quicken: Each fund to a separate account -> 3 months -> all accounts
Click on Download

This importer really should be renamed to vanguard something something because it has so much Vanguard-specific stuff it's not really generic at all anymore.


 
With ofx, Vanguard only allows the past 18 months of transactions. So I have screenscraped .csv data, which doesn't have the 18-month limit. I see an incompletely ledgerhub importer.

I started writing a CSV importer at some point but I stopped it, I think, when I realized the OFX download had a higher precision. I don't quite remember.

 
I'm trying to not reinvent the wheel here. Also, to get into the gorier details, are there things to watch out for when importing?

Importer code is always messy IMO. There's always a little something here and there to watch out for. I always eyeball the result of my import and cut-n-paste the extracted transactions manually from the output of the importer to my input file. Some of my importers require some small manual fixes; at the moment I think the Vanguard ofx_invest one does require renaming accounts for dividends (to subaccounts for match401k/pre-tax 401k/aftertax-401k/rollover, etc.. This depends on your configuration. Best is to try it and see what works.

I know that the importer code is messy; I'm focusing all my efforts on Beancount to make that perfect and ledgerhub is just something I share in case it's useful to others-I need to add tests and make that solid someday. So much to do...


 
I have my own importer cobbled together when I used ledger, but that falls far short of what I'm hoping beancount will help me accomplish.

In which way? Just curious, what do you envision?

I think the LedgerHub common code (i.e., not the importers) could be reused quite well, that part is fairly clean. It's just the importer implementations that are messy. If you want you could have your own (similarly messy) importer implementations that get called by ledgerhub-extract, ledgerhub-file, etc.



2. I've used Specific ID of shares. So I'm mainly interested in automatic lot matching. Is this supported already? The documentation says to watch out for "upcoming changes." Even if it is not supported, what's the best practice on getting my data in, in such a way that a) when support arrives, everything will work, and b) so I don't leave out any data from the source, and get everything into Beancount so I don't have to redo this?

See the link above for the thread on this.
I'm using the "NONE" booking method for these accounts at the moment, so it works just like Ledger (no checks on cost basis).
I have balance checks on the number of shares only.
All the data is in there, only minor syntax changes will be required when the feature becomes ready.

About "automatic lot matching", I'll take it to mean FIFO or LIFO, etc. but that's not implemented yet. 401k accounts typically use average cost basis (because there are no tax implications) and that isn't implemented (proposal: http://furius.ca/beancount/doc/proposal-booking). If you refer to Roth or After-tax accounts, you can use the regular method where you provide the cost basis when you sell lots.



Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/2a0eab02-bd1f-4500-addc-5c4c09b30844%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

redst...@gmail.com

unread,
May 23, 2015, 8:39:49 PM5/23/15
to bean...@googlegroups.com
Hi Martin,
Thanks a bunch for all the pointers! Some follow up responses:


I know that the importer code is messy; I'm focusing all my efforts on Beancount to make that perfect and ledgerhub is just something I share in case it's useful to others-I need to add tests and make that solid someday. So much to do...

It's definitely very useful, so I appreciate your sharing it, in its current state! I have a whole bunch of import scripts that I used (with ledger) that have gotten quite out of hand in the past couple years. I'd think this is the case for most folks. Ledgerhub is a much better idea than everyone maintaining their own homegrown scripts, and I hope it catches on everyone uses it, and more importantly, contributes to it.

I have my own importer cobbled together when I used ledger, but that falls far short of what I'm hoping beancount will help me accomplish.

In which way? Just curious, what do you envision?

To begin with, inventory/lot tracking. Basically, a lot of what you've written in your inventory tracking proposal. Eventually:
- matching tax lots with my filings, for extra verification
- computing annual returns on any period on any subset of investments, so I can analyze my investments better. Investments typically end up spread across several institutions, and to be able to analyze the whole of it would be quite valuable. You might remember a thread of mine on ledger-cli that you responded to - when I'd asked a related question a few months ago, which got me started on taking a serious look at beancount :)
- and several other investment analyses - will list these out at some point later

2. I've used Specific ID of shares. So I'm mainly interested in automatic lot matching. Is this supported already? The documentation says to watch out for "upcoming changes." Even if it is not supported, what's the best practice on getting my data in, in such a way that a) when support arrives, everything will work, and b) so I don't leave out any data from the source, and get everything into Beancount so I don't have to redo this?

See the link above for the thread on this.
I'm using the "NONE" booking method for these accounts at the moment, so it works just like Ledger (no checks on cost basis).
I have balance checks on the number of shares only.
All the data is in there, only minor syntax changes will be required when the feature becomes ready.

About "automatic lot matching", I'll take it to mean FIFO or LIFO, etc. but that's not implemented yet. 401k accounts typically use average cost basis (because there are no tax implications) and that isn't implemented (proposal: http://furius.ca/beancount/doc/proposal-booking). If you refer to Roth or After-tax accounts, you can use the regular method where you provide the cost basis when you sell lots.

Thanks for the references. Regarding automatic lot matching, I was referring to Specific ID. I think you call it STRICT in your documentation. When I say automatic, I mean, not having to provide dates, but only providing cost bases, for cases that are disambiguous (99% of my cases). From what I understand, I currently have to provide the date, redundantly, when booking a purchase, so I can refer to it when selling. If that's not done, I understand that my other option is to use NONE, and devolve to what Ledger would do, which I would rather not do.

So for this type lot matching, what do you recommend I do?

Thanks again for always answering all my questions with excellent detail and pointers! I hope my contribution:question ratio will start getting better soon :).

Martin Blais

unread,
May 23, 2015, 8:49:14 PM5/23/15
to bean...@googlegroups.com
On Sat, May 23, 2015 at 8:39 PM, <redst...@gmail.com> wrote:
About "automatic lot matching", I'll take it to mean FIFO or LIFO, etc. but that's not implemented yet. 401k accounts typically use average cost basis (because there are no tax implications) and that isn't implemented (proposal: http://furius.ca/beancount/doc/proposal-booking). If you refer to Roth or After-tax accounts, you can use the regular method where you provide the cost basis when you sell lots.

Thanks for the references. Regarding automatic lot matching, I was referring to Specific ID. I think you call it STRICT in your documentation. When I say automatic, I mean, not having to provide dates, but only providing cost bases, for cases that are disambiguous (99% of my cases). From what I understand, I currently have to provide the date, redundantly, when booking a purchase, so I can refer to it when selling. If that's not done, I understand that my other option is to use NONE, and devolve to what Ledger would do, which I would rather not do.

So for this type lot matching, what do you recommend I do?

Currently, if you provide the date in, you have to provide the date out. Of course, this is not super sensible... it needs redoing. I did it this way originally because of details of my previous implementation. It's wrong.

In the future that will be improved a lot: upon _augmenting_ a lot, the date of the transaction will always be associated with the lot, automatically. On a lot _reduction_, the data provided (cost basis, date, and/or label) will _only_ be used to disambiguate between the available lots. This means that if you have only a single lot, you could provide no data and it would book correctly. The way FIFO and LIFO booking methods will be integrated is simple: they will only get invoked in cases of ambiguity.

For now, what I would do if I were you, is just use the cost basis everywhere.
Only where there are ambiguous cases--same cost basis on different dates--should you use the date. I doubt you have many of these... it's very unlikely, prices move all the time.


llpa...@gmail.com

unread,
Apr 22, 2018, 11:02:43 AM4/22/18
to Beancount
Martin,

Currently, where is the "ledgerhub.importers.generic.ofx_invest" importer, or anything that substitutes it for Vanguard?

Thanks,
Lluis

Martin Blais

unread,
Apr 22, 2018, 5:21:49 PM4/22/18
to Beancount
Ledgerhub is a retired project.

I have 3 importers that support Vanguard for OFX, CSV and PDF (for filing only).
I don't maintain these externally by default, but if you want I can share a copy of them.
(Warning: it's not great code, and it depends a little bit on the structure of how accounts.)




--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.

To post to this group, send email to bean...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages