for newbies, those hacking on a csv converter


fin

Jan 28, 2023, 12:09:58 PM
to bean...@googlegroups.com
i'm working my way through a tdameritrade_csv converter
(far from pretty yet), but my lack of experience with
beancount, ledgers and the importers meant i had to wade
through some things before i could get output from my
initial stab at a converter.

the biggest issue is that there is no feedback when
you make an error that prevents anything from loading
at all. so you are not sure if you screwed up something
basic or if your importer isn't working at all.

if you specify a pattern that doesn't match the input
file well enough for bean-* to find it, you get nothing
back that says "no file found"; you just get this:


**** /home/me/fin/beancount/t.csv
;; -*- mode: beancount -*-
**** /home/me/fin/beancount/t.csv
;; -*- mode: beancount -*-

to those who know what this means, that's fine, but for
someone just kicking the tires and trying things out it
doesn't mean anything at all.

so one thing i did was to add print statements to
csvreader.py, but in the end the most "keep my changes in
one place" way was to include the following in my __init__.py
(taken from csvreader.py).

=====

def read_file(self, file):
    if not self.file_read_done:
        rdr = self.read_raw(file)
        print("Version 0.0.5 \n After read_raw file : \n", rdr)

        # chop unwanted header rows
        rdr = rdr.skip(getattr(self, 'skip_head_rows', 0))
        # chop unwanted footer rows
        rdr = rdr.head(len(rdr) - getattr(self, 'skip_tail_rows', 0) - 1)

        if hasattr(self, 'skip_comments'):
            rdr = rdr.skipcomments(self.skip_comments)
        rdr = rdr.rowslice(getattr(self, 'skip_data_rows', 0), None)

        rdr = self.prepare_raw_columns(rdr)
        print("After prepare_raw_columns : \n", rdr)
        rdr = rdr.rename(self.header_map)
        print("After rename : \n", rdr)
        rdr = self.convert_columns(rdr)
        print("After convert_columns : \n", rdr)
        rdr = self.prepare_processed_columns(rdr)
        print("After prepare_processed_columns : \n", rdr)

        self.rdr = rdr
        self.ifile = file
        self.file_read_done = True

=====

which told me that i wasn't even getting to the reading
part at all, so i was finally able to go back and find
my mistake (not matching the file header properly).


so then i was able to start getting output that looked like:

**** /home/me/fin/beancount/test-trans.csv
Importer: beancount_reds_importers.tdameritrade_csv.Importer
Account: Assets:SB:TDA:MM

Version 0.0.5
After read_raw file :
+------------+----------------+-------------------------------------------+----------+--------+-------+------------+---------+---------+--------------------+---------------------+------------------------+
| DATE       | TRANSACTION ID | DESCRIPTION                               | QUANTITY | SYMBOL | PRICE | COMMISSION | AMOUNT  | REG FEE | SHORT-TERM RDM FEE | FUND REDEMPTION FEE | DEFERRED SALES CHARGE  |
+============+================+===========================================+==========+========+=======+============+=========+=========+====================+=====================+========================+
| 01/02/2000 | 123456         | CHECK (WRITTEN AGAINST BROKERAGE ACCOUNT) |          |        |       |            | -123.12 |         |                    |                     |                        |
+------------+----------------+-------------------------------------------+----------+--------+-------+------------+---------+---------+--------------------+---------------------+------------------------+
...
==========


one other gotcha that took me a while to figure out was:

# header looks like: DATE,TRANSACTION ID,DESCRIPTION,QUANTITY,SYMBOL,PRICE,COMMISSION,AMOUNT,REG FEE,SHORT-TERM RDM FEE,FUND REDEMPTION FEE, DEFERRED SALES CHARGE
# note the space at the beginning of that last field...

which then needed the following to work:

" DEFERRED SALES CHARGE":'defslschrg',


which is great! :) progress...
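
for anyone else who hits this, printing each header field with
repr() makes the stray whitespace obvious (plain python, run
against my test file):

import csv

# print every header field with quotes around it so hidden
# whitespace stands out
with open('test-trans.csv', newline='') as f:
    header = next(csv.reader(f))
print([repr(h) for h in header])
# the last entry shows up as ' DEFERRED SALES CHARGE' with the
# leading space plainly visible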

i have a long way to go in understanding, but at least now i can
see what my initial version is doing and keep poking around
to figure out more.


fin

fin

Jan 28, 2023, 12:24:57 PM
to bean...@googlegroups.com
fin wrote:
> i'm working my way through a tdaameritrade_csv converter
...

i wasn't clear on the context, but i'm using the reds
importers framework for the moment. also enjoying playing
around with petl in general. :)


fin

Martin Blais

Jan 28, 2023, 12:26:11 PM
to bean...@googlegroups.com
I just wanted to chip in some context on importing from TD. I spent a significant amount of time in the past getting data from TD into a normalized form for importing into systems (including Beancount, but mostly for Johnny, which is better suited to mark P/L on complex trades - https://github.com/beancount/johnny). In short, what I've learned is:

- The most complete way to get data out of the system is via the thinkorswim platform's Account Statement tab.
- Despite this, you still need to join multiple tables in order to resolve all the info needed (prices, fees, etc.), and you need to configure the platform to include certain columns.
- For marking open positions, a separate download of the Activity and Positions tab - also suitably customized - is necessary. 
- Parsing descriptions to extract some of the info was necessary to get all the required bits and pieces.
- In Johnny I normalize inputs from all brokers to two tables: a table of transactions  https://github.com/beancount/johnny/blob/master/johnny/base/transactions.md  and a table of positions https://github.com/beancount/johnny/blob/master/johnny/base/positions.md
- The downloads do not provide sufficient information about contract multipliers and expiration dates (e.g. for futures and futures options), so you end up having to store that elsewhere (I built this project for this purpose: https://github.com/blais/mulmat). In an ideal world there would exist some sort of open-source instrument master database for retail people doing more complex things than buying mutual funds, but AFAIK this doesn't exist.

If all you use is equities and a cash account, you're possibly okay with other sources of downloads. However, if you're using futures or options, you're likely to run into similar issues. You can see the (messy) parsing code here: https://github.com/beancount/johnny/blob/master/johnny/sources/thinkorswim_csv/transactions.py and here: https://github.com/beancount/johnny/blob/master/johnny/sources/thinkorswim_csv/positions.py

Martin Blais

Jan 28, 2023, 12:26:50 PM
to bean...@googlegroups.com
FYI there's also Polars: https://www.pola.rs/
I'd love to see an in-depth comparison at some point.


 




fin

Jan 28, 2023, 12:59:11 PM
to bean...@googlegroups.com
Martin Blais wrote:
...
> FYI there's also Polars
> https://www.pola.rs/
> I'd love to see an in-depth comparison at some point.

it won't be from me as i've barely gotten going with
Python. i don't need to learn yet another language...
as i get older it gets harder. :)

my first language was Fortran IV, 2nd was Pascal, 3rd
was assembler for the Univac 1100, 4th was C, ...
probably 50 to 100 others since then in bits and pieces.
it's hard to determine how to count them all. ;)

oddest one when i first encountered it was SNOBOL.


fin

fin

Jan 28, 2023, 1:00:05 PM
to bean...@googlegroups.com
Martin Blais wrote:
...
> I just wanted to chip in some context to importing from TD. I spent a
> significant amount of time in the past getting data from TD to a normalized
> form for importing into systems (including Beancount, but mostly for
> Johnny, which is better suited to mark P/L on complex trades -
> https://github.com/beancount/johnny). In short, what I've learned is
>
> - The most complete way to get data out of the system is via the
> thinkorswim platform's Account Statement tab.

yes, but i don't have that, nor do i plan on going there right
now. i just want to get the past history files i have imported,
so i'm writing a csv importer for that file format (their regular
account csv format, which is probably not like TOS).

it is not complete or easy to work with, but i should be able
to wrangle it somehow.


...details snipped, thanks... :)
> If all you use is equities and a cash account, you're possibly okay with
> other sources of downloads. However, if you're using futures or options,
> you're likely to run into similar issues. You can see the (messy) parsing
> code here:
> https://github.com/beancount/johnny/blob/master/johnny/sources/thinkorswim_csv/transactions.py
> and here:
> https://github.com/beancount/johnny/blob/master/johnny/sources/thinkorswim_csv/positions.py

i'm not using futures or options so that makes things simpler.
no margin stuff, nothing crazy, just simple trades. i did trade
using lots at first because it made it easier for my own record
keeping but now i'm just going with the account setting (FIFO i
think, but since i hardly ever sell i'd have to check).

now the harder part is the various Money Market accounts that
they move money between, first there is the cash account and
then the sweep accounts.

it looks to me like they put the transactions in the account
in the opposite order from what i'd expect, so they are grouped
together by a day or a few days before you can reconcile between
them all. but that is like any other transaction where you
sometimes have a lag between when you book it and when it hits
the other side.


fin

Red S

Jan 28, 2023, 4:48:57 PM
to Beancount
the biggest issue is that there is no feedback when
you make an error that prevents anything from loading
at all. so you are not sure if you screwed up something
basic or if your importer isn't working at all.

I agree, it would be nice to have a `--debug` or even `--verbose` flag to bean-extract or beangulp (neither of which I maintain) which would be passed down to the importer code, and inform you as to why something is not matching.

# header looks like: DATE,TRANSACTION ID,DESCRIPTION,QUANTITY,SYMBOL,PRICE,COMMISSION,AMOUNT,REG FEE,SHORT-TERM RDM FEE,FUND REDEMPTION FEE, DEFERRED SALES CHARGE
# note that space on the beginning of that last field...

which then needed the following to work:

" DEFERRED SALES CHARGE":'defslschrg',

Yes, if one hasn't worked with csvs, this can be a tripping point. If you open the .csv in a plain text editor which displays the characters as they are (which I recommend), this will be apparent. Most spreadsheet editors and such won't show this.
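
Another option is to make the importer immune to this by normalizing the header in code before any renaming happens. An untested sketch (petl.header() and petl.setheader() are standard petl functions):

import petl as etl

tbl = [['DATE', ' DEFERRED SALES CHARGE'],
       ['01/02/2000', '']]
# strip stray whitespace from every header field before renaming
tbl = etl.setheader(tbl, [h.strip() for h in etl.header(tbl)])
print(etl.header(tbl))  # ('DATE', 'DEFERRED SALES CHARGE')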

Red S

Jan 28, 2023, 4:55:15 PM
to Beancount
FYI there's also Polars
I'd love to see an in-depth comparison at some point.

Looked at this a while ago, though not in depth. What would be a game changer over petl for my purposes (reds-importers for csv import) is a high-level library that can automatically recognize and extract table structures of multiple tables in a single csv. This would be a step in the direction of an automatic, universal, csv importer. Polars doesn't seem to do this.

Its main feature seems to be performance. And it does have a couple of small niceties like automatic date parsing. Neither warrants moving to it over petl---for my purposes.

My two cents.

fin

Jan 28, 2023, 7:16:09 PM
to bean...@googlegroups.com
Red S wrote:

...
> Yes, if one hasn't worked with csvs, this can be a tripping point. If you
> open the .csv in a plain text editor which displays the characters as they
> are (which I recommend), this will be apparent. Most spreadsheet editors
> and such won't show this.

i'm a bit visually challenged at times, but normally i would
notice it, as my font and editor honor monospaced fonts (which
i prefer for coding data-oriented things). for some reason it
didn't click until i was able to dump the headers, and then it
stood out a lot more.


to change topics a little bit... :)

is there a way to get the match for 'type' to work on a
substring at the start of the line, instead of having to match
the whole string? it could exit at the first match found. i do
not care about performance or how fast it works.

if you order your list with the longer strings first, it will
do what you need without having to special-case each and every
variant of description.


for example:


i have currently:

'CASH ALTERNATIVES DIVIDENDS (12345)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS (ABC45)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS (123)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS': 'dividends',


when it would be more concise if i could write it:
'CASH ALTERNATIVES DIVIDENDS (': 'dividends-ticker',
'CASH ALTERNATIVES DIVIDENDS': 'dividends-no-ticker',


this is only one type, but there are many more that work the
same way, and having to write a new line for each variant is
making this take a lot longer.
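
something like this hypothetical helper is what i have in mind
(plain python; try the longest keys first so the most specific
entry wins):

def match_type(description, type_map):
    # try longer prefixes first so specific entries beat generic ones
    for prefix in sorted(type_map, key=len, reverse=True):
        if description.startswith(prefix):
            return type_map[prefix]
    return None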

i've been trying to write this myself, but so far i don't get
what i need to do in my convert function for 'type'; it's not
as simple as the other examples.

however, it's quite possible i'm missing some other approach, as
my python is weaker than i'd like. i learn by doing: i get as
far as i can, then hit something which makes me go back and
try other things or other approaches until it clicks. reading
docs on-line is really bad if you have limited screen room
(i need a pretty big font, so when people put menus on the side
and links that pop up every time you move your mouse around, you
have to keep panning text left and right in windows to see what
they mean and ... ggrr!).

i'll get over it eventually. :)


fin

Red S

Jan 28, 2023, 10:32:01 PM
to Beancount
to change topics a little bit... :)

is there a way to get the match for 'type' to work on a
substring at the start of the line, instead of having to match
the whole string? it could exit at the first match found. i do
not care about performance or how fast it works.

if you order your list with the longer strings first, it will
do what you need without having to special-case each and every
variant of description.


for example:


i have currently:

'CASH ALTERNATIVES DIVIDENDS (12345)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS (ABC45)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS (123)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS': 'dividends',


when it would be more concise if i could write it:
'CASH ALTERNATIVES DIVIDENDS (': 'dividends-ticker',
'CASH ALTERNATIVES DIVIDENDS': 'dividends-no-ticker',


this is only one type, but there are many more that work the
same way, and having to write a new line for each variant is
making this take a lot longer.

Yes! A contributor just ran into the same situation a couple days ago and used a neat solution using petl.capture(). See here: https://github.com/redstreet/beancount_reds_importers/blob/dabcb455b6a4fc776b2fcfea24e64a41ca819acb/beancount_reds_importers/importers/fidelity/fidelity_cma_csv.py#L44. You can split your description field into the type for the first 3 words (if that's valid), and then use the remaining for the memo (or throw it away). It'll look something like this (untested code!):

rdr = rdr.capture('Description', '(\\w+\\s+\\w+\\s+\\w+)\\s*(.*)', ['type', 'memo'])

More info is in the petl reference: https://petl.readthedocs.io/en/stable/transform.html
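
A standalone toy of what capture() does, with made-up rows (also untested):

import petl as etl

tbl = [['Description'],
       ['CASH ALTERNATIVES DIVIDENDS (MMSWEEP1)'],
       ['CASH ALTERNATIVES INTEREST (MMSWEEP1)']]
# the first three words become 'type', the rest becomes 'memo';
# capture() drops the original Description field by default
out = etl.capture(tbl, 'Description', '(\\w+\\s+\\w+\\s+\\w+)\\s*(.*)',
                  ['type', 'memo'])
print(etl.lookall(out))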

fin

Jan 29, 2023, 12:26:09 AM
to bean...@googlegroups.com
Red S wrote:
...
> this is only one type, but there are many more that work the
> same way, and having to write a new line for each variant is
> making this take a lot longer.
>
> Yes! A contributor just ran into the same situation a couple days ago and
> used a neat solution using petl.capture(). See here
><https://github.com/redstreet/beancount_reds_importers/blob/dabcb455b6a4fc776b2fcfea24e64a41ca819acb/beancount_reds_importers/importers/fidelity/fidelity_cma_csv.py#L44>.
> You can split your description field into the type for the first 3 words
> (if that's valid), and then use the remaining for the memo (or throw it
> away). It'll look something like this (untested code!):
>
> rdr = rdr.capture('Description', '(\\w+\\s+\\w+\\s+\\w+)\\s*(.*)', ['type', 'memo'])
>
> More info is in the petl reference
><https://petl.readthedocs.io/en/stable/transform.html>.

awesome! thank you! i'll look at it tomorrow when i have more
brain cells and time. :)

i knew i read something about that two days ago in my first
read-through of their docs, but their documentation is hard to
scan easily. it's good compared to some, but it should be broken
into smaller parts, and they definitely need a way to hide the
left sidebar to expand the text you can see. it's not a very
good design. i also hate popups on mousing over something (which
github does, and it sucks when things get in the way of what
you're trying to read)... ok, my rant for the evening is over... :)

thanks again!


fin

fin

Jan 30, 2023, 6:50:54 PM
to bean...@googlegroups.com
Red S wrote:
...
> You can split your description field into the type for the first 3 words
> (if that's valid), and then use the remaining for the memo (or throw it
> away). It'll look something like this (untested code!):
>
> rdr = rdr.capture('Description', '(\\w+\\s+\\w+\\s+\\w+)\\s*(.*)', ['type', 'memo'])
...

in the end that approach didn't do what i needed it to do,
so i ended up using the following:

rdr = rdr.sub('DESCRIPTION', r'^Bought.*', 'Bought', count=1)
rdr = rdr.sub('DESCRIPTION', r'^CASH ALTERNATIVES DIVIDENDS \(.*',
              'CASH ALTERNATIVES DIVIDENDS (ticker)', count=1)


and so the lines up above in the map become the simpler:

'Bought': 'buystock',
'CASH ALTERNATIVES DIVIDENDS (ticker)': 'dividends',
'CASH ALTERNATIVES DIVIDENDS': 'dividends',


the DESCRIPTION field/column is my type.
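
as a standalone sanity check, the collapse works like this (toy
table with made-up rows):

import petl as etl

tbl = [['DESCRIPTION'],
       ['CASH ALTERNATIVES DIVIDENDS (MMSWEEP1)'],
       ['Bought 100 XYZ @ 12.34']]
tbl = etl.sub(tbl, 'DESCRIPTION', r'^Bought.*', 'Bought', count=1)
tbl = etl.sub(tbl, 'DESCRIPTION', r'^CASH ALTERNATIVES DIVIDENDS \(.*',
              'CASH ALTERNATIVES DIVIDENDS (ticker)', count=1)
print(etl.lookall(tbl))
# each variant now collapses to a single canonical string
# that the type map can match exactly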

and that approach let me get all the lines in my files scanned.
the output is not right and looks horrible, but at least they all
scanned without errors. :) [it's my first batch of about 20 files,
some of which i typed in a long time ago and others that were
downloaded as soon as i figured out how to get csv files. i have
a bunch of other records that i've typed into a different format
that i will eventually want to massage and feed into my ledger
for historical information, but that's not going to happen for
some time yet.]

there are many things i do not do (options, margin, mutual
funds), so i'm sure there are many transaction types that don't
appear in my history; those are going to have to be figured out
by someone else sometime. but i have a working framework now.


on to the next question!

i copy the DESCRIPTION to another column but how do i get
fields put in the transaction description? currently it looks
like:

2001-10-29 * "transfer" "[MMDA1] TDAmeritrade MMF"
Assets:SB:TDA:MMDA1 1.51 MMDA1
Assets:SB:TDA:Pass-Through -1.51 MMDA1

2001-10-29 * "transfer" "transfer"
Assets:SB:TDA:CASH 0 USD
Assets:SB:TDA:Pass-Through -0 USD

2001-10-29 * "income" "[MMDA1] TDAmeritrade Money Market Fund"
Assets:SB:TDA:CASH 4.54 USD
Income:SB:TDA:MM:Interest:MMDA1 -4.54 USD

i know i'm missing some other things which aren't working quite
right yet, but just getting the information put in the ledger
as metadata, and also on the description part of the line, would
be good. it would help me further along before i start digging
into some other things, and it's probably easier on my brain tonight. :)

once i get a cleaner version with more bugs sorted out i'll
post a link to it.


fin

Red S

Jan 30, 2023, 10:17:30 PM
to Beancount
on to the next question!

i copy the DESCRIPTION to another column but how do i get
fields put in the transaction description? currently it looks
like:

2001-10-29 * "transfer" "[MMDA1] TDAmeritrade MMF"
Assets:SB:TDA:MMDA1 1.51 MMDA1
Assets:SB:TDA:Pass-Through -1.51 MMDA1

2001-10-29 * "transfer" "transfer"
Assets:SB:TDA:CASH 0 USD
Assets:SB:TDA:Pass-Through -0 USD

2001-10-29 * "income" "[MMDA1] TDAmeritrade Money Market Fund"
Assets:SB:TDA:CASH 4.54 USD
Income:SB:TDA:MM:Interest:MMDA1 -4.54 USD

i know i'm missing some other things which aren't working quite
right yet, but just getting the information put in the ledger
as metadata, and also on the description part of the line, would
be good. it would help me further along before i start digging
into some other things, and it's probably easier on my brain tonight. :)

I'm not sure I understand. What is the current output, and what would you like it to be?

fin

Jan 31, 2023, 12:28:09 AM
to bean...@googlegroups.com
Red S wrote:
...
> I'm not sure I understand. What is the current output, and what would you
> like it to be?

ok, i get sidetracked sometimes... :)

thanks for getting this far!


the fields are originally fed in as:

DATE,TRANSACTION ID,DESCRIPTION,QUANTITY,SYMBOL,PRICE,COMMISSION,AMOUNT,REG FEE,SHORT-TERM RDM FEE,FUND REDEMPTION FEE, DEFERRED SALES CHARGE
12/29/2000,1,CASH ALTERNATIVES DIVIDENDS (ACCTNUMBER1),,,,,1.51,,,,
12/29/2000,2,CASH ALTERNATIVES PURCHASE (MMSWEEP1),1.51,MMSWEEP1,,,0.00,,,,
12/29/2000,3,CASH ALTERNATIVES PURCHASE,,,,,-1.51,,,,
12/29/2000,4,CASH ALTERNATIVES PURCHASE (MMSWEEP1),4.54,MMSWEEP1,,,0.00,,,,
12/29/2000,5,CASH ALTERNATIVES PURCHASE,,,,,-4.54,,,,
12/29/2000,6,CASH ALTERNATIVES INTEREST (MMSWEEP1),,MMSWEEP1,,,4.54,,,,
***END OF FILE***


here's the current output:

;; -*- mode: beancount -*-
**** /home/me/fin/beancount/testing/test-trans.csv

2000-12-29 * "dividends" "dividends"
Assets:SB:TDA:CASH 1.51 USD
Income:SB:TDA:MM:Dividends:USD -1.51 USD

2000-12-29 * "transfer" "[MMSWEEP1] TDAmeritrade Sweep ACCT 1"
Assets:SB:TDA:MMSWEEP1 1.51 MMSWEEP1
Assets:SB:TDA:Pass-Through -1.51 MMSWEEP1

2000-12-29 * "transfer" "transfer"
Assets:SB:TDA:CASH 0 USD
Assets:SB:TDA:Pass-Through -0 USD

2000-12-29 * "transfer" "[MMSWEEP1] TDAmeritrade Sweep ACCT 1"
Assets:SB:TDA:MMSWEEP1 4.54 MMSWEEP1
Assets:SB:TDA:Pass-Through -4.54 MMSWEEP1

2000-12-29 * "transfer" "transfer"
Assets:SB:TDA:CASH 0 USD
Assets:SB:TDA:Pass-Through -0 USD

2000-12-29 * "income" "[MMSWEEP1] TDAmeritrade Sweep ACCT 1"
Assets:SB:TDA:CASH 4.54 USD
Income:SB:TDA:MM:Interest:MMSWEEP1 -4.54 USD


**** /home/me/fin/beancount/testing/test-trans.csv
;; -*- mode: beancount -*-


i'd like to take the TRANSACTION ID and DESCRIPTION fields
and put them on one line as a comment or something, and then
i'd like the entire contents of the input line for that
transaction put on the next line.


so that this:

2000-12-29 * "transfer" "[MMSWEEP1] TDAmeritrade Sweep ACCT 1"
Assets:SB:TDA:MMSWEEP1 4.54 MMSWEEP1
Assets:SB:TDA:Pass-Through -4.54 MMSWEEP1


ends up looking something like:

2000-12-29 * "transfer" "[MMSWEEP1] TDAmeritrade Sweep ACCT 1"
;
; TrID Desc: "4 CASH ALTERNATIVES PURCHASE (MMSWEEP1)"
;
; Raw Data: "12/29/2000,4,CASH ALTERNATIVES PURCHASE (MMSWEEP1),4.54,MMSWEEP1,,,0.00,,,,"
;
Assets:SB:TDA:MMSWEEP1 4.54 MMSWEEP1
Assets:SB:TDA:Pass-Through -4.54 MMSWEEP1


fin

Red S

Jan 31, 2023, 2:24:12 AM
to Beancount
Ah okay, this is best done via metadata rather than comments.

Simply define a build_metadata() in your importer, and return a dictionary of metadata values you want for each transaction. It should just be a line or two of code. See build_metadata() in investments.py, which you will override.
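
For concreteness, if your override returns something like {'tranid': '4', 'desc': 'CASH ALTERNATIVES PURCHASE (MMSWEEP1)'} (the key names here are made up), the transaction from your example would render roughly as:

2000-12-29 * "transfer" "[MMSWEEP1] TDAmeritrade Sweep ACCT 1"
  tranid: "4"
  desc: "CASH ALTERNATIVES PURCHASE (MMSWEEP1)"
  Assets:SB:TDA:MMSWEEP1 4.54 MMSWEEP1
  Assets:SB:TDA:Pass-Through -4.54 MMSWEEP1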

fin

Jan 31, 2023, 8:47:35 AM
to bean...@googlegroups.com
Red S wrote:
...
> Ah okay, this is best done via metadata rather than comments.
>
> Simply define a build_metadata() in your importer, and return a dictionary
> of metadata values you want for each transaction. It should just be a line
> or two of code. See build_metadata() in investments.py, which you will
> override.


ok, thanks for the pointer! :)


fin

fin

Feb 2, 2023, 6:58:06 PM
to bean...@googlegroups.com
fin wrote:
...
> ok, thanks for the pointer! :)


i didn't make much progress on the meta part or a few
other things i tried out, so i've uploaded my temporary
working version to my github repository at:
https://github.com/flowerbug/beancount_reds_importers/tree/next-flowerbug

i now have to put a lot of energy into getting ready for an
actual in-person event that will tie me up for several weeks,
but i plan on coming back to this after Feb 25th.

it's a start that matches all the lines in my data files,
but i know there are many things i don't do and would never
see (trading options, funds, or on margin), as i've always
kept it simple.

so that is where it's at for now. :)

i don't expect this to be merged into the main archive you're
keeping, because it isn't a good enough final version, but just
in case something happens, at least my initial efforts are there
for someone else to work from.


fin
