beancount OFX; investment inventory

Christopher Singley

Jan 5, 2017, 5:09:22 PM

to Beancount

Hi, I'm writing some accounting applications in Python, and thereby stumbled upon beancount.

It occurs to me that you might be interested in my OFX parser:

https://github.com/csingley/ofxtools

It handles a pretty decent subset of the OFX spec, it handles OFX versions 1 & 2, and it has no dependencies beyond stdlib.

I'm investigating using beancount as an accounting engine...human-readable text files are great.

The main focus of my present work is booking investment transactions from data downloads, and from reading your document "Trading with Beancount" I'm wondering if it's a good fit.

Have you ever solved the problem of preserving lot opening dates through various reorganizations (splits, mergers, spinoffs, basis reductions, etc.)? In general, assets can be booked in with an opening date different than the transaction date, and it doesn't appear that beancount handles that.

Thanks,
Chris

Jason Chu

Jan 5, 2017, 5:20:23 PM

to Beancount

An OFX parser already exists as part of beancount.ingest (https://bitbucket.org/blais/beancount/src/397e821a48c5db2700a8a66868c5d4e68ca19bfb/src/python/beancount/ingest/importers/ofx.py).

The original design for importing into beancount is the LedgerHub doc (https://docs.google.com/document/d/11u1sWv7H7Ykbc7ayS4M9V3yKqcuTY7LJ3n1tgnEN2Hk/edit#), and the subsequent postmortem and explanation of beancount.ingest is here (https://docs.google.com/document/d/1Bln8Zo11Cvez2rdEgpnM-oBHC1B6uPC18Qm7ulobolM/edit).

I get the feeling with the new booking branch that the lot opening date problem has been solved, but I'm sure Martin can explain it better than I can.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/af40974c-cd6f-4155-a4ea-5d533d4f47b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christopher Singley

Jan 5, 2017, 6:27:03 PM

to Beancount

beancount.ingest.importers.ofx doesn't handle INVSTMTRS at all, does it? Just STMTRS and CCSTMTRS.

What I need is an importer that translates OFX INVSTMTRS to beancount transaction entries. If lot inception dates have been decoupled from transaction dates, I should probably be able to write one without too much trouble using my OFX parser.
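As a rough illustration of the importer described above, here is a minimal sketch that reads a BUYSTOCK aggregate from an OFX 2 (XML) investment transaction list and renders it as Beancount transaction text. It uses only the standard library rather than ofxtools or beancount.ingest. The sample fragment is hand-written, and the account names, the USD currency, and the use of the security ID directly as the commodity name are all assumptions; real files usually carry a CUSIP that would need mapping to a ticker.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# Hypothetical, hand-written OFX 2 fragment (not real broker output).
SAMPLE = """
<INVTRANLIST>
  <BUYSTOCK>
    <INVBUY>
      <INVTRAN><FITID>T1</FITID><DTTRADE>20170105</DTTRADE></INVTRAN>
      <SECID><UNIQUEID>ACME</UNIQUEID><UNIQUEIDTYPE>TICKER</UNIQUEIDTYPE></SECID>
      <UNITS>100</UNITS><UNITPRICE>50.00</UNITPRICE><TOTAL>-5000.00</TOTAL>
    </INVBUY>
    <BUYTYPE>BUY</BUYTYPE>
  </BUYSTOCK>
</INVTRANLIST>
"""


def buystock_to_beancount(xml_text, account="Assets:Broker"):
    """Render each BUYSTOCK aggregate as a Beancount transaction string.

    `account` is a hypothetical account root; USD is assumed throughout.
    """
    entries = []
    for buy in ET.fromstring(xml_text).iter("BUYSTOCK"):
        # DTTRADE may carry a time and zone; only the YYYYMMDD part is used.
        trade_date = datetime.strptime(buy.findtext(".//DTTRADE")[:8], "%Y%m%d").date()
        symbol = buy.findtext(".//UNIQUEID")
        units = Decimal(buy.findtext(".//UNITS"))
        price = Decimal(buy.findtext(".//UNITPRICE"))
        total = Decimal(buy.findtext(".//TOTAL"))
        entries.append(
            f'{trade_date} * "Buy {symbol}"\n'
            f"  {account}:{symbol}  {units} {symbol} {{{price} USD}}\n"
            f"  {account}:Cash  {total} USD\n"
        )
    return entries


print(buystock_to_beancount(SAMPLE)[0])
```

Fees, commissions, sells, and the other INVSTMTRS transaction types are left out of this sketch.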

Thanks for the links to the discussion docs.

Martin Blais

Jan 6, 2017, 4:55:17 AM

to Beancount

On Thu, Jan 5, 2017 at 12:09 PM, Christopher Singley <ch...@singleys.com> wrote:

Hi, I'm writing some accounting applications in Python, and thereby stumbled upon beancount.

It occurs to me that you might be interested in my OFX parser:

https://github.com/csingley/ofxtools

It handles a pretty decent subset of the OFX spec, it handles OFX versions 1 & 2, and it has no dependencies beyond stdlib.

Thanks for the link.

Looks nice.

One of the example importers in beancount.ingest does OFX. However, it's not intended to be a full-fledged OFX thing... it's really just a good example of how to build a custom importer with some basic and admittedly sloppy OFX parsing. Ideally people would use something like your library.

I'm investigating using beancount as an accounting engine...human-readable text files are great.

The main focus of my present work is booking investment transactions from data downloads, and from reading your document "Trading with Beancount" I'm wondering if it's a good fit.

It's good for personal finance level work. If you're a day trader and are pumping 20+ trades/day, you'll probably want to make something more specialized, that takes into account time and that has specialized P/L reporting. As an example, I use it to track my own portfolio with perhaps 20 trades/month, across 3-4 accounts, some pre-tax/retirement, some active, including occasional currency trading/hedging, so not "active" but also a fair bit more than most people's retirement-only activity.

Have you ever solved the problem of preserving lot opening dates through various reorganizations (splits, mergers, spinoffs, basis reductions, etc.)? In general, assets can be booked in with an opening date different than the transaction date, and it doesn't appear that beancount handles that.

Yes.

I've done splits and cost basis adjustments, no problem.

Right now the way to do this is to empty the lot (like you're selling it) and in the same transaction replace it on another posting.

The replacement lot can accept a date override, which gets carried on the new lot.

You put the date in the {...} cost specifier.

There's no check that the date matches the other lot... you could put any date you want, so be careful.

Otherwise, by default, an augmenting lot acquires the date of its transaction.
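To make the empty-and-replace technique concrete, here is a small sketch that formats such a transaction for a stock split as Beancount text. The function name, account, symbol, and figures are hypothetical; the only thing taken from the description above is that the replacement posting carries the original acquisition date inside the {...} cost specifier.

```python
from datetime import date
from decimal import Decimal


def split_transaction(txn_date, account, symbol, units, cost, currency,
                      acquired, ratio):
    """Format a transaction that empties a lot and re-adds it after a split.

    Both postings name the original acquisition date in the cost specifier,
    so the replacement lot keeps that date instead of inheriting txn_date.
    """
    new_units = units * ratio
    new_cost = cost / ratio
    return (
        f'{txn_date} * "{ratio}:1 split of {symbol}"\n'
        f"  {account}  {-units} {symbol} {{{cost} {currency}, {acquired}}}\n"
        f"  {account}  {new_units} {symbol} {{{new_cost} {currency}, {acquired}}}\n"
    )


print(split_transaction(date(2017, 1, 6), "Assets:Broker:ACME", "ACME",
                        Decimal("10"), Decimal("100.00"), "USD",
                        date(2015, 3, 1), 2))
```

The total cost is unchanged (10 units at 100.00 becomes 20 units at 50.00), so the two postings balance against each other. As noted above, nothing checks that the date on the new lot matches the old one.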

For splits, the current methodology is to keep the same symbol. An alternative would be to automatically generate appropriate symbols internally and have some sort of syntax to indicate that the symbol's "version number" has moved on, but I haven't done work on that yet, nor do I feel it's a great hindrance so far. The price database does, on the other hand, have wonky kinks when that happens.

I don't know about mergers and spinoffs, but I presume the same technique would work.

Hope this helps,

Thanks,
Chris


Martin Blais

Jan 6, 2017, 4:58:26 AM

to Beancount

As mentioned earlier, my importer has no pretension of being exhaustive or compliant with "the standard." It's a piece of code I've used over the years that has evolved and served me well, however. Importers are generally rather unsavory code, to be honest. As long as they work on the subset of input they tend to see for one's account, everyone's happy, they do their job.

Given your position in having worked with OFX, I'd reuse your library if I were you. If you want, you could even provide a good clean example of importing all possible types of OFX input; I think some people on the list might find a comprehensive importer useful.

Thanks Christopher,


Christopher Singley

Jan 6, 2017, 5:17:35 AM

to Beancount

Thanks Martin. Yeah I need datetime not just date type, which is of course not terribly compatible with the mandate for human-friendly input format.

I'm browsing through beancount.core.inventory.add_amount(), which seems to assume that the units have been pre-matched against the positions in inventory.... if you specify cost=None it'll just operate on the first item in the list without further iteration, yes? Does this matching happen elsewhere in the code, or is it a straight pass-through from JEs listed in a text file?

Thanks,
Chris


Martin Blais

Jan 6, 2017, 5:18:49 AM

to Beancount

On Thu, Jan 5, 2017 at 11:54 PM, Martin Blais <bl...@furius.ca> wrote:

On Thu, Jan 5, 2017 at 12:09 PM, Christopher Singley <ch...@singleys.com> wrote:
Hi, I'm writing some accounting applications in Python, and thereby stumbled upon beancount.

It occurs to me that you might be interested in my OFX parser:

https://github.com/csingley/ofxtools

It handles a pretty decent subset of the OFX spec, it handles OFX versions 1 & 2, and it has no dependencies beyond stdlib.

Thanks for the link.
Looks nice.

One of the example importers in beancount.ingest does OFX. However, it's not intended to be a full-fledged OFX thing... it's really just a good example of how to build a custom importer with some basic and admittedly sloppy OFX parsing. Ideally people would use something like your library.

I'm investigating using beancount as an accounting engine...human-readable text files are great.

The main focus of my present work is booking investment transactions from data downloads, and from reading your document "Trading with Beancount" I'm wondering if it's a good fit.

It's good for personal finance level work. If you're a day trader and are pumping 20+ trades/day, you'll probably want to make something more specialized, that takes into account time and that has specialized P/L reporting. As an example, I use it to track my own portfolio with perhaps 20 trades/month, across 3-4 accounts, some pre-tax/retirement, some active, including occasional currency trading/hedging, so not "active" but also a fair bit more than most people's retirement-only activity.

BTW, while you do your exploration, this comparison of Ledger & Beancount might be useful:

https://docs.google.com/document/d/1dW2vIjaXVJAf9hr7GlZVe3fJOkM-MtlVjvCO1ZpNLmg/

The TL;DR is that Ledger doesn't attempt to book reducing lots against specific positions.

Furthermore, it doesn't distinguish between currency exchange and positions with a cost basis.

Tracking of cost basis is therefore quite limited.

AFAIK HLedger repeats those same design choices (though I haven't looked for a while).

The way Beancount handles this is mostly explained in this recently written document:

https://docs.google.com/document/d/11a9bIoNuxpSOth3fmfuIFzlZtpTJbvw-bPaQCnezQJs/

In short, these are the salient differences in terms of investment tracking.

One feature I want to add soon is the ability to list trades (for sure before tax time). With changes I made late last year it will be very easy to add this, but I'm tidying up docs and incomplete branches in order to bake a numbered release before I do much else.

Also, an example of an investment-related custom application is the listing of information for wash sales, which I do here with a custom script:

https://bitbucket.org/blais/beancount/src/397e821a48c5db2700a8a66868c5d4e68ca19bfb/experiments/washsales/?at=default

It's not generalized so much yet, I should probably document it better so others can use it. The problem I'm solving with that is that I have some probably incorrectly booked cost basis adjustments from previous years (I book it pessimistically, choosing the worst case for me) because the actual calculation from the IRS is defined ambiguously, and I need to track the adjusted basis of various lots I chose in the past years for future sales. The information isn't available anywhere else, I need to track it carefully for accurate reporting.

There's also other fun stuff under experiments/

Martin Blais

Jan 6, 2017, 5:21:14 AM

to Beancount

On Fri, Jan 6, 2017 at 12:17 AM, Christopher Singley <ch...@singleys.com> wrote:

Thanks Martin. Yeah I need datetime not just date type, which is of course not terribly compatible with the mandate for human-friendly input format.

I'm browsing through beancount.core.inventory.add_amount(), which seems to assume that the units have been pre-matched against the positions in inventory.... if you specify cost=None it'll just operate on the first item in the list without further iteration, yes? Does this matching happen elsewhere in the code, or is it a straight pass-through from JEs listed in a text file?

Astute observation.

There are two phases: parsing, and then booking.

The parsing outputs incomplete costs, which are interpreted as aspects to be matched against the list of available positions during the booking phase.

After booking, the postings all have instances of "Cost" resolved to match exactly against existing positions.

Therefore, the Inventory code can afford to be simple and just look for an exact match.
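The matching step in the booking phase can be pictured with the sketch below. It is not Beancount's actual code: the Lot and Spec classes and the matching_lots function are hypothetical stand-ins, assuming only that an incomplete cost leaves some fields unset and that each field that is set must agree with the candidate position.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Lot:
    """A fully resolved position (hypothetical stand-in)."""
    units: Decimal
    cost: Decimal
    acquired: date
    label: Optional[str] = None


@dataclass(frozen=True)
class Spec:
    """An incomplete cost as produced by parsing; None means 'not given'."""
    cost: Optional[Decimal] = None
    acquired: Optional[date] = None
    label: Optional[str] = None


def matching_lots(spec, lots):
    """Return the lots that agree with every field the spec provides."""
    return [
        lot for lot in lots
        if (spec.cost is None or spec.cost == lot.cost)
        and (spec.acquired is None or spec.acquired == lot.acquired)
        and (spec.label is None or spec.label == lot.label)
    ]


lots = [
    Lot(Decimal("10"), Decimal("100.00"), date(2015, 3, 1)),
    Lot(Decimal("5"), Decimal("120.00"), date(2016, 7, 15)),
]
by_date = matching_lots(Spec(acquired=date(2016, 7, 15)), lots)
unconstrained = matching_lots(Spec(), lots)
```

Here by_date contains only the second lot, while unconstrained contains both. An ambiguous match like the latter is where some further rule has to decide which lot is reduced.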

I hope this helps,


Christopher Singley

Jan 6, 2017, 1:38:33 PM

to Beancount

On Thursday, 5 January 2017 23:21:14 UTC-6, Martin Blais wrote:

There are two phases: parsing, and then booking.
The parsing outputs incomplete costs, which are interpreted as aspects to be matched against the list of available positions during the booking phase.
After booking, the postings all have instances of "Cost" resolved to match exactly against existing positions.
Therefore, the Inventory code can afford to be simple and just look for an exact match.

Sorry, could you point me in the direction of the relevant booking code? Your code is very well organized, but I'm just a humble CFA type unburdened by fancy computer science book learnin'.

But if you need to know the IRS rules about wash sales... there is indeed an ambiguity in the revenue code, which is silent on the ordering of replacement shares in wash sales. What the brokers do is use FIFO here, and of course you'll want to tie your schedule D out to what the broker's reporting to Treasury. Prior to 2010 (or the subsequent years' phase-in of the broker cost basis reporting obligation by security type) it was up for grabs; no guarantees that your broker's basis information (if they reported it to you at all) is accurate in any way. I've got large deltas because of pre-2010 return of capital distributions from lots still on the books, that were never coded as such by brokers. As you can see, I'm not much of a day trader.

... ...

Do you think beancount would fall over if I swapped out position.Cost.date and position.CostSpec.date for Python datetime.datetime instead of datetime.date?

This is a fundamental requirement for those of us who use Interactive Brokers, whose trade accounting is *very* granular. Since the other side of every trade is always some algo trying not to show its cards, you wind up with a zillion lots of 100sh at the same or slightly different prices, and the same or slightly different millisecond. We always pick lots algorithmically... besides FIFO/LIFO, maxcost and mincost are popular, for obvious reasons.
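For readers unfamiliar with algorithmic lot selection, the following sketch shows the four orderings mentioned above applied to lots that differ only by milliseconds and pennies. The Lot tuple, the method names, and the sample fills are hypothetical, and this is not how any particular broker or Beancount implements it.

```python
from datetime import datetime
from decimal import Decimal
from typing import NamedTuple


class Lot(NamedTuple):
    opened: datetime
    units: Decimal
    cost_per_unit: Decimal


# Each method is a sort key plus a direction.
ORDERINGS = {
    "FIFO": (lambda lot: lot.opened, False),
    "LIFO": (lambda lot: lot.opened, True),
    "MINCOST": (lambda lot: lot.cost_per_unit, False),
    "MAXCOST": (lambda lot: lot.cost_per_unit, True),
}


def select_lots(lots, units_to_close, method):
    """Return (lot, units_taken) pairs in the order implied by `method`."""
    key, reverse = ORDERINGS[method]
    picked, remaining = [], units_to_close
    for lot in sorted(lots, key=key, reverse=reverse):
        if remaining <= 0:
            break
        take = min(lot.units, remaining)
        picked.append((lot, take))
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough units in inventory")
    return picked


lots = [
    Lot(datetime(2017, 1, 5, 9, 30, 0, 1000), Decimal("100"), Decimal("10.00")),
    Lot(datetime(2017, 1, 5, 9, 30, 0, 2000), Decimal("100"), Decimal("10.02")),
    Lot(datetime(2017, 1, 5, 9, 30, 0, 3000), Decimal("100"), Decimal("9.99")),
]
fifo = select_lots(lots, Decimal("150"), "FIFO")
maxcost = select_lots(lots, Decimal("150"), "MAXCOST")
```

With day-granular dates all three lots would share the same opening date, and FIFO and LIFO could no longer tell them apart.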

I also note the comment in your code about the choice to use indexed lists instead of dicts to track inventory, on the assumption that these lists will always be small. I don't really need high performance, but this sounds like it might be a problem if you really started running some trade volume through it. On pretty low-key trading activity I regularly maintain 10K rows in inventory in 20 securities; figure I need to budget for handling at least 1M rows.

I guess I'm probably going to have to wind up writing my own. Too bad, we're solving very similar problems and you appear to be a much better programmer.

Simon Michael

Jan 6, 2017, 3:50:54 PM

to bean...@googlegroups.com

Interesting discussion.

hledger is the least sophisticated of the three when it comes to
trading, currently. Beancount seems to be leading the way here. One of
these days I hope to understand better what y'all are talking about here.

Christopher Singley

Jan 6, 2017, 5:22:33 PM

to Beancount

Securities trading is basically a special case of inventory/cost of goods sold bookkeeping. It's a bit funkier than plain-vanilla flow-of-funds type bookkeeping.

You can read up on it at the website for which beancount is named:
http://www.dwmbeancounter.com/BCTutorSite/Courses/Inv/lesson02-2.html

In terms of that tutorial, we're essentially talking about "FIFO Perpetual Inventory" accounting... except that for securities inventory, we relax the requirement to maintain a consistent scheme for booking out inventory units.

Essentially, we need to maintain a journal which is a list of (asset, datetime, units, cost [or cost per unit]). Summing cost [or units*cost/unit] ties out to balance sheet accounts. For trading marketable securities, a normal chart of accounts will have current asset subaccounts for every issue owned (identified by ticker, CUSIP, ISIN, or what have you), and current liability subaccounts for each issue sold short. Each of those subaccounts will have a subaccount for cost, which ties out to the journal with the constraint that the sum of cost for a given asset has to equal the costs posted to the relevant balance sheet subaccount on the general ledger.

Entries to the inventory journal are not double-entry accounting, and they're not necessarily money-denominated at all. Weird stuff happens to inventory in the real world (if you can call the stock market the real world).

When you reduce or close a position, somebody needs to look through the list of lots, and choose which lot(s) to get rid of, based on some logic. Based on that choice, whoever's controlling the inventory will return a figure for the cost booked out, and the guy maintaining the general ledger will post a transaction decreasing the balance-sheet cost of the asset (on one side) and increasing the cost of goods sold on the income statement (on the other side).

It seems to me that what you want is a Python dict of {asset: [lot, lot, lot, ...]}, where lot=(datetime, units, cost).

More generally, you keep a journal like that for each brokerage account, so you'd either want to key the dict above by (account, asset), or a better way would be to use nested dicts e.g. portfolio[account][security] = [lot, lot, lot, ...]. That way you can quickly & naturally sum costs for a security in an account (via hash map lookup & list comprehension), and roll those up into total cost for a brokerage account. You can do the same for units. This corresponds to the way this stuff gets reported by brokers and the way we set up these charts of accounts. It's a little bit less ideal for reporting aggregates of securities across accounts, but still way better than an unstructured list.
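A minimal sketch of the nested-dict layout described above, with cost rolled up per security and per account. The account names, securities, and amounts are made up for illustration, and each lot is a plain (datetime, units, total cost) tuple.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# portfolio[account][security] -> list of (opened, units, cost) lots
portfolio = defaultdict(lambda: defaultdict(list))

portfolio["IB"]["AAPL"].append(
    (datetime(2016, 3, 1, 9, 30), Decimal("100"), Decimal("10000.00")))
portfolio["IB"]["AAPL"].append(
    (datetime(2016, 9, 12, 14, 5), Decimal("50"), Decimal("5500.00")))
portfolio["IB"]["MSFT"].append(
    (datetime(2016, 5, 20, 10, 0), Decimal("200"), Decimal("10200.00")))


def security_cost(portfolio, account, security):
    """Total cost of all lots of one security in one account."""
    return sum((cost for _, _, cost in portfolio[account][security]), Decimal("0"))


def account_cost(portfolio, account):
    """Total cost of every security held in one account."""
    return sum((security_cost(portfolio, account, security)
                for security in portfolio[account]), Decimal("0"))


aapl_cost = security_cost(portfolio, "IB", "AAPL")
ib_cost = account_cost(portfolio, "IB")
```

Summing units works the same way with the second tuple element. Aggregating one security across accounts means looping over the outer dict, which is the less convenient case mentioned above.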

Unfortunately the list structure seems to be baked into the application interface of inventory.py... the inventory control logic isn't locally encapsulated, so the dumb objects here return list pointers back to where the smarts actually reside. I think this means I can't change this without completely forking the project and ripping up things nobody wants ripped up.

Oh well.

Martin Blais

Jan 7, 2017, 4:59:11 AM

to Beancount

On Fri, Jan 6, 2017 at 8:38 AM, Christopher Singley <ch...@singleys.com> wrote:

On Thursday, 5 January 2017 23:21:14 UTC-6, Martin Blais wrote:
There are two phases: parsing, and then booking.
The parsing outputs incomplete costs, which are interpreted as aspects to be matched against the list of available positions during the booking phase.
After booking, the postings all have instances of "Cost" resolved to match exactly against existing positions.
Therefore, the Inventory code can afford to be simple and just look for an exact match.

Sorry, could you point me in the direction of the relevant booking code? Your code is very well organized, but I'm just a humble CFA type unburdened by fancy computer science book learnin'.

Here:

https://bitbucket.org/blais/beancount/src/397e821a48c5db2700a8a66868c5d4e68ca19bfb/src/python/beancount/loader.py?at=default&fileviewer=file-view-default#loader.py-436

The loader invokes the parser on all files ("recursively" is for the include files).

It then runs the booking code.

https://bitbucket.org/blais/beancount/src/397e821a48c5db2700a8a66868c5d4e68ca19bfb/src/python/beancount/parser/booking.py?at=default&fileviewer=file-view-default#booking.py-19

This dispatches between old ("simple") and new ("full") booking codes.

You can disregard the old version, it's flawed anyway, and not the default anymore.

It's only there for transition, I'll remove it at some point.

The new ("full") booking code is located here:

https://bitbucket.org/blais/beancount/src/397e821a48c5db2700a8a66868c5d4e68ca19bfb/src/python/beancount/parser/booking_full.py?at=default&fileviewer=file-view-default#booking_full.py-105

It's a bit gnarly, as it handles a lot of missing values ("interpolation") as well.

The unit tests are perhaps more useful to try to understand it:

https://bitbucket.org/blais/beancount/src/397e821a48c5db2700a8a66868c5d4e68ca19bfb/src/python/beancount/parser/booking_full_test.py?at=default&fileviewer=file-view-default

But if you need to know the IRS rules about wash sales... there is indeed an ambiguity in the revenue code, which is silent on the ordering of replacement shares in wash sales. What the brokers do is use FIFO here, and of course you'll want to tie your schedule D out to what the broker's reporting to Treasury. Prior to 2010 (or the subsequent years' phase-in of the broker cost basis reporting obligation by security type) it was up for grabs; no guarantees that your broker's basis information (if they reported it to you at all) is accurate in any way. I've got large deltas because of pre-2010 return of capital distributions from lots still on the books, that were never coded as such by brokers. As you can see, I'm not much of a day trader.

I've read all the IRS documentation on wash sales and wrote code to reproduce its examples in Beancount.

I have a complex case (monthly vesting of RSUs) where tons of little wash sales get generated and cascade each other through time.

Once I realized their rules were ambiguous, it was clear this was an unintended consequence of the law. There's only one piece of software on the market I've found that claims to be able to do this, and other people have tried resolving the madness here:

https://github.com/adlr/wash-sale-calculator

and here:

https://github.com/bbreslauer/wash-sale-tracker

I use a specific method - designed to be as disadvantageous to me as possible - to assign cost basis to replacement lots and track them using Beancount. The numbers are small, so I don't mind forgoing my loss deductions longer than I have to. I'm more concerned about avoiding using losses prematurely and then having to go through and audit and make back payments. I'd rather it be the other way around (being owed money).

... ...

Do you think beancount would fall over if I swapped out position.Cost.date and position.CostSpec.date for Python datetime.datetime instead of datetime.date?

Not sure.

There's very little computation on the dates, mostly just comparisons for sorting and equality.

- There would need to be some sort of new syntax for parsing the times

- There are some invariants in the ordering of directives, e.g. Balance directives need to appear before all transactions on the same date; having a time might mess with that and add complexity that creates odd situations

- Comparisons would have to be reviewed everywhere

- All the unit tests would have to be ported
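One concrete reason the comparisons would need review: in Python, datetime.date and datetime.datetime values do not mix. The sketch below uses made-up values to show that equality between them is always False and that ordering them raises TypeError, so code that sorts or matches on dates would break as soon as both types were in play.

```python
from datetime import date, datetime

lot_date = date(2017, 1, 6)
fill_time = datetime(2017, 1, 6, 9, 30, 0, 250000)

# Equality between a date and a datetime is simply False,
# even when both fall on the same calendar day.
same_day = (lot_date == fill_time)

# Ordering them raises TypeError, so a mixed list cannot be sorted.
try:
    sorted([fill_time, lot_date])
    mixed_sort_ok = True
except TypeError:
    mixed_sort_ok = False

# Quantizing the datetime to a day makes the two comparable again.
quantized_equal = (lot_date == fill_time.date())
```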

In any case, it's not a quick hack and there are so many other more urgent things I doubt I'd have time to work on it for another 6 months or more.

This is a fundamental requirement for those of us who use Interactive Brokers, whose trade accounting is *very* granular. Since the other side of every trade is always some algo trying not to show its cards, you wind up with a zillion lots of 100sh at the same or slightly different prices, and the same or slightly different millisecond. We always pick lots algorithmically... besides FIFO/LIFO, maxcost and mincost are popular, for obvious reasons.

Interesting. I'm considering joining the platform myself, mainly to take advantage of the low margin rates and leverage up.

Do we have another IB user in the house?

I think there's a possibility that you may be able to aggregate cost at the importer level and book an aggregate, simplified trade.

Not sure, I'd have to see what the downloaded data looks like.
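The importer-level aggregation idea might look something like this sketch, which collapses same-day fills of one symbol into a single trade at the unit-weighted average price. The tuple layout and sample fills are hypothetical, and it assumes all fills in a group are on the same side (all buys here); mixed buys and sells would need separate handling.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal


def aggregate_fills(fills):
    """Collapse (symbol, timestamp, units, price) fills into one trade per
    (symbol, calendar day), priced at the unit-weighted average."""
    totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])  # [units, cost]
    for symbol, timestamp, units, price in fills:
        bucket = totals[(symbol, timestamp.date())]
        bucket[0] += units
        bucket[1] += units * price
    return {key: (units, cost / units) for key, (units, cost) in totals.items()}


fills = [
    ("ACME", datetime(2017, 1, 5, 9, 30, 0, 1000), Decimal("100"), Decimal("10.00")),
    ("ACME", datetime(2017, 1, 5, 9, 30, 0, 2000), Decimal("100"), Decimal("10.02")),
    ("ACME", datetime(2017, 1, 5, 9, 30, 0, 3000), Decimal("200"), Decimal("9.99")),
]
trades = aggregate_fills(fills)
```

The trade-off is that the individual lot costs are lost, which matters if lots are later picked by maxcost or mincost.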

My other brokers also split sales on matches, but not that much. It's not dramatic.

BTW I'm not unfamiliar with this problem, I used to work for a few HFT shops (on various desks).

Beancount isn't designed with that usage in mind.

I think it can tolerate some, but like I said, if you have tons of trades/day, you're better off writing something custom.

I also note the comment in your code about the choice to use indexed lists instead of dicts to track inventory, on the assumption that these lists will always be small. I don't really need high performance, but this sounds like it might be a problem if you really started running some trade volume through it. On pretty low-key trading activity I regularly maintain 10K rows in inventory in 20 securities; figure I need to budget for handling at least 1M rows.

Converting the inventory from a list to a dict should be very easy. In fact, it used to be this way. It would take about a day's worth of careful work to do it and port all the unit tests to ensure everything still works as intended.

I guess I'm probably going to have to wind up writing my own. Too bad, we're solving very similar problems and you appear to be a much better programmer.


Martin Blais

Jan 7, 2017, 5:12:51 AM

to Beancount

On Fri, Jan 6, 2017 at 12:22 PM, Christopher Singley <ch...@singleys.com> wrote:

Securities trading is basically a special case of inventory/cost of goods sold bookkeeping. It's a bit funkier than plain-vanilla flow-of-funds type bookkeeping.

You can read up on it at the website for which beancount is named:
http://www.dwmbeancounter.com/BCTutorSite/Courses/Inv/lesson02-2.html

Actually, Beancount isn't named after this, though I did read it a long while ago; it's a great site.

In terms of that tutorial, we're essentially talking about "FIFO Perpetual Inventory" accounting... except that for securities inventory, we relax the requirement to maintain a consistent scheme for booking out inventory units.

Essentially, we need to maintain a journal which is a list of (asset, datetime, units, cost [or cost per unit]).

That's what Beancount does, except it's in terms of

position = (units, currency, cost & cost currency, date, label)

Your datetime is quantized to days.

The label is there to help identify a particular lot by name if you need to.

Summing cost [or units*cost/unit] ties out to balance sheet accounts.

What do you mean by "ties out"?

For trading marketable securities, a normal chart of accounts will have current asset subaccounts for every issue owned (identified by ticker, CUSIP, ISIN, or what have you), and current liability subaccounts for each issue sold short.

That's what I recommend. Each trading account I use has a subaccount for every particular commodity, e.g.

Assets:US:InteractiveBrokers:Trading:AAPL

Each of those subaccounts will have a subaccount for cost, which ties out to the journal with the constraint that sum of cost for a given asset has to equal the costs posted to the relevant balance sheet subaccount on the general ledger.

Instead of using a subaccount, each position has its cost attached to it. This representation allows us to think of aggregating any type of position. This then allows us to formulate queries using something simple like SQL. "Sum up all the contents - whatever it is - under those accounts" is a working idea.

Entries to the inventory journal are not double-entry accounting, and they're not necessarily money-denominated at all. Weird stuff happens to inventory in the real world (if you can call the stock market the real world).

When you reduce or close a position, somebody needs to look through the list of lots, and choose which lot(s) to get rid of, based on some logic. Based on that choice, whoever's controlling the inventory will return a figure for the cost booked out, and the guy maintaining the general ledger will post a transaction decreasing the balance-sheet cost of the asset (on one side) and increasing the cost of goods sold on the income statement (on the other side).

That's exactly what the lot reduction process is like in Beancount. The information you provide is matched against the list of available lots right before the transaction is applied. There's even a debugging tool to print out the contents of the accounts affected before and after (bean-doctor context). Because the cost basis is kept attached to each lot, it is summed and balanced against the other postings of the transaction.
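A worked example of that balancing, as a sketch with made-up numbers and hypothetical account names: a lot of 10 units carried at 100.00 USD is sold for 120.00 USD per unit. The reducing posting is weighed at the lot's cost, not at the sale price, so the difference has to show up as a gain for the transaction to sum to zero.

```python
from decimal import Decimal


def posting_weight(units, cost_per_unit):
    """Weight of a posting held at cost: units times the lot's cost."""
    return units * cost_per_unit


weights = [
    posting_weight(Decimal("-10"), Decimal("100.00")),  # Assets:Broker:ACME, at cost
    Decimal("1200.00"),                                 # Assets:Broker:Cash, proceeds
    Decimal("-200.00"),                                 # Income:Broker:Gains
]
residual = sum(weights, Decimal("0"))
```

The residual is zero only because the 200.00 USD gain is booked; leaving it out would leave the transaction unbalanced by exactly that amount.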

It seems to me that what you want is a Python dict of {asset: [lot, lot, lot, ...]}, where lot=(datetime, units, cost).

The "Inventory" object is essentially that, it's a mapping from

(currency, cost, acquisition-date, label) -> units

Same thing.
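The mapping above can be sketched as a plain dict keyed by tuple. This is an illustration only, not Beancount's Inventory class: the function names are hypothetical, and the cost is assumed to be a (number, currency) pair.

```python
from datetime import date
from decimal import Decimal

# key: (currency, (cost_number, cost_currency), acquisition_date, label) -> units
inventory = {}


def add_position(inventory, key, units):
    """Add units under an exact key; drop the entry when it nets to zero."""
    total = inventory.get(key, Decimal("0")) + units
    if total == 0:
        inventory.pop(key, None)
    else:
        inventory[key] = total


def total_units(inventory, currency):
    """Sum the units of one currency across all lots."""
    return sum((units for key, units in inventory.items() if key[0] == currency),
               Decimal("0"))


old_lot = ("AAPL", (Decimal("100.00"), "USD"), date(2015, 3, 1), None)
new_lot = ("AAPL", (Decimal("120.00"), "USD"), date(2016, 7, 15), "ref-001")

add_position(inventory, old_lot, Decimal("10"))
add_position(inventory, new_lot, Decimal("5"))
add_position(inventory, old_lot, Decimal("-10"))  # exact-match reduction empties the lot
```

The container only does exact-key arithmetic; aggregation such as total_units lives outside it as an ordinary function.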

More generally, you keep a journal like that for each brokerage account, so you'd either want to key the dict above by (account, asset), or a better way would be to use nested dicts e.g. portfolio[account][security] = [lot, lot, lot, ...]. That way you can quickly & naturally sum costs for a security in an account (via hash map lookup & list comprehension), and roll those up into total cost for a brokerage account.

You can do the same for units. This corresponds to the way this stuff gets reported by brokers and the way we set up these charts of accounts. It's a little bit less ideal for reporting aggregates of securities across accounts, but still way better than an unstructured list.

Inventory's representation is equivalent, but also allows a well-defined aggregation of any position.

I see no disadvantage in the way Inventory represents it compared to a hierarchical representation.

Unfortunately the list structure seems to be baked into the application interface of inventory.py... the inventory control logic isn't locally encapsulated, so the dumb objects here return list pointers back to where the smarts actually reside. I think this means I can't change this without completely forking the project and ripping up things nobody wants ripped up.

Oh well.

Actually no. Converting Inventory to a dict is not very difficult; only very minor changes to the interface would be needed.

BTW encapsulation in this case would be a bad thing... the inventory matching logic is much better left factored out of the container. Baking and "encapsulating" the logic in a method "abstraction" is something I'd have done in the 90's when I was much less enlightened and in love with OO and Design Patterns (I'm almost ashamed to admit, there was once a time...). A better way to build this is to keep the containers dumb and the algorithms outside, it's a much more flexible and elegant design. It's on purpose.



Christopher Singley

Jan 7, 2017, 1:48:26 PM1/7/17

to Beancount

Thanks for the guided tour of beancount, I will puzzle through its workings. So far I like what I see very much; I'm just trying to see if I can fit it to my use case.

On Friday, 6 January 2017 22:59:11 UTC-6, Martin Blais wrote:

I have a complex case (monthly vesting of RSUs) where tons of little wash sales get generated and cascade each other through time.
Once I realized their rules were ambiguous, it was clear this was an unintended consequence of the law. There's only one piece of software on the market I've found that claims to be able to do this, and other people have tried resolving the madness here:
https://github.com/adlr/wash-sale-calculator
and here:
https://github.com/bbreslauer/wash-sale-tracker

I use a specific method - designed to be as disadvantageous to me as possible - to assign cost basis to replacement lots and track them using Beancount. The numbers are small, so I don't mind forgoing my loss deductions longer than I have to. I'm more concerned about avoiding using losses prematurely and then having to go through an audit and make back payments. I'd rather it be the other way around (being owed money).

I've got some wash sale logic, but nothing that's ready for prime time. This is a reasonably hairy problem. The thing to keep in mind is that the accounting treatment here is different for GAAP vs. tax basis. The way CPAs handle this is to keep the books on GAAP basis, to produce GAAP financial statements, and then there's a separate workflow involving book-to-tax adjustments (with its own journals) to produce tax reports. They keep all this shit off the main ledger. So the general solution involves a reporting function that can ingest a separate journal of adjusting entries, with its own equality constraints, to accommodate different accounting bases. No fun.

Often the amount of the disallowed loss is judged not worth the effort to track it, and it's just written off.

... ...

Do you think beancount would fall over if I swapped out position.Cost.date and position.CostSpec.date for Python datetime.datetime instead of datetime.date?

Not sure.
There's very little computation on the dates, mostly just comparisons for sorting and equality.
- There would need to be some sort of new syntax for parsing the times
- There are some invariants in the ordering of directives, e.g. Balance directives need to appear before all transactions on the same date; having a time might interfere with that and add complexity that could create odd situations
- Comparisons would have to be reviewed everywhere
- All the unit tests would have to be ported
In any case, it's not a quick hack and there are so many other more urgent things I doubt I'd have time to work on it for another 6 months or more.
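To illustrate the "comparisons would have to be reviewed" point with the standard library alone: Python does not order a plain `date` against a `datetime`, and does not treat them as equal even for the same day. The sketch below assumes one possible convention (promote dates to midnight); `as_datetime` is a hypothetical helper, not something Beancount provides.

```python
from datetime import date, datetime, time
from typing import Union

DateLike = Union[date, datetime]


def as_datetime(value: DateLike) -> datetime:
    """Promote a plain date to midnight so it can be ordered against datetimes."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value
    return datetime.combine(value, time.min)


lot_dates = [datetime(2017, 1, 5, 15, 30), date(2017, 1, 5), datetime(2017, 1, 4, 9, 0)]

try:
    sorted(lot_dates)
    mixed_sort_ok = True
except TypeError:
    # Ordering a date against a datetime is rejected by Python.
    mixed_sort_ok = False

# Equality does not raise, but a date never equals a datetime.
same_day_equal = date(2017, 1, 5) == datetime(2017, 1, 5)

# With an explicit normalization the mixed list sorts without error.
ordered = sorted(lot_dates, key=as_datetime)
```

Any code path that sorts or matches on lot dates would need a decision like this one made explicitly, which is part of why the change is not a quick hack.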

I tend to stick with strict ISO 8601 across the board.

If beancount could reasonably be generalized to be fit for purpose, it is my hope that I would be able to do much of the work myself - who likes a crasher that brings nothing to the party? However, looking at your code, it might take me more than 6 months to hone my amateur programming skills to match your quality of work!

Interesting. I'm considering joining the platform myself, mainly to take advantage of the low margins and leverage up.
Do we have another IB user in the house?

If you're an active trader, Interactive Brokers is the best on many metrics (not all of which are important to me). They are the UNIX of stockbrokers. However, although they've improved a lot over the years, their tax accounting leaves much to be desired; they struggle with these data structures too.

I think there's a possibility that you may be able to aggregate cost at the importer level and book an aggregate, simplified trade.
Not sure, I'd have to see what the downloaded data looks like.

We did this for years (aggregating trades by day); it's very undesirable. It caused a lot of manual spreadsheet reconciliation to tie our books out to brokerage statements that didn't aggregate trades... I could give you many examples. I've been much happier since we started matching broker cost-lot accounting.
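For concreteness, a sketch of what day-level aggregation at the importer might look like. The `Fill` record and `aggregate_by_day` are hypothetical, and the grouping rule (same date, symbol, and side, booked at the average price) is an assumption rather than anything from a real importer.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, List, NamedTuple, Tuple


class Fill(NamedTuple):
    """Hypothetical importer record for one execution."""
    trade_date: date
    symbol: str
    units: Decimal   # signed: positive for buys, negative for sells
    price: Decimal


def aggregate_by_day(fills: List[Fill]) -> List[Fill]:
    """Collapse same-day, same-symbol, same-side fills into one trade at the average price."""
    buckets: Dict[Tuple[date, str, bool], List[Fill]] = defaultdict(list)
    for fill in fills:
        if fill.units == 0:
            continue  # nothing to book
        buckets[(fill.trade_date, fill.symbol, fill.units > 0)].append(fill)
    result = []
    for (trade_date, symbol, _is_buy), group in sorted(buckets.items()):
        units = sum((f.units for f in group), Decimal(0))
        value = sum((f.units * f.price for f in group), Decimal(0))
        result.append(Fill(trade_date, symbol, units, value / units))
    return result


day = date(2017, 1, 5)
fills = [
    Fill(day, "HOOL", Decimal(10), Decimal("100")),
    Fill(day, "HOOL", Decimal(30), Decimal("104")),
    Fill(day, "HOOL", Decimal(-5), Decimal("110")),
]
aggregated = aggregate_by_day(fills)
```

The two buys collapse into one 40-unit trade at 103, so the books no longer carry the per-fill lots that a broker statement reports; that loss of detail is the reconciliation cost described above.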

I think it can tolerate some, but like I said, if you have tons of trades/day, you're better off writing something custom.

I suspect that's the case, and not just because of the problem of matching trades to accounts. In any case it's good to talk; it's clear I have much to learn from beancount.

Christopher Singley

Jan 7, 2017, 2:38:39 PM1/7/17

to Beancount

On Friday, 6 January 2017 23:12:51 UTC-6, Martin Blais wrote:

Summing for cost [or cost*cost/unit] ties out to balance sheet accounts.

What do you mean by "ties out"?

That's accountant-speak for "equality testing".

Each of those subaccounts will have a subaccount for cost, which ties out to the journal with the constraint that sum of cost for a given asset has to equal the costs posted to the relevant balance sheet subaccount on the general ledger.

Instead of using a subaccount, each position has its cost attached to it. This representation allows us to think of aggregating any type of position. This then allows us to formulate queries using something simple like SQL. "Sum up all the contents - whatever it is - under those accounts" is a working idea.
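A small sketch of the "sum up all the contents under those accounts" idea, with cost attached to each position instead of living in a subaccount. The `Position` row and `total_cost` function are hypothetical and use plain Python rather than Beancount's query language.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, NamedTuple


class Position(NamedTuple):
    """Hypothetical flat row: one lot held in one account, cost attached."""
    account: str
    currency: str
    units: Decimal
    unit_cost: Decimal
    cost_currency: str


def total_cost(positions: List[Position], account_prefix: str) -> Dict[str, Decimal]:
    """Roughly: SELECT cost_currency, SUM(units * unit_cost) WHERE account LIKE 'prefix%'."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for pos in positions:
        if pos.account.startswith(account_prefix):
            totals[pos.cost_currency] += pos.units * pos.unit_cost
    return dict(totals)


positions = [
    Position("Assets:Broker:HOOL", "HOOL", Decimal(10), Decimal("100"), "USD"),
    Position("Assets:Broker:HOOL", "HOOL", Decimal(5), Decimal("120"), "USD"),
    Position("Assets:Broker:VEA", "VEA", Decimal(20), Decimal("35"), "USD"),
]
broker_cost = total_cost(positions, "Assets:Broker")
hool_cost = total_cost(positions, "Assets:Broker:HOOL")
```

Because every row carries its own cost, the same function aggregates at any level of the account tree without a dedicated cost subaccount.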

That's fine too; cost is not really a first-class concept on the financial statements; it comes in from special journals.

Inventory's representation is equivalent, but also allows a well-defined aggregation of any position.
I see no disadvantage in the way Inventory represents it compared to a hierarchical representation.

I'm not concerned about the structure, I'm simply concerned with speed when the list gets long. Already I've got problems in that regard... the OFX parser I put up on github uses regular expressions (so it can handle OFXv1 too), and it is dog slow when I churn a year's worth of data through it. For OFXv2 data, using an expat-based parser (dumping much of the processing from Python to C) speeds things up considerably.
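As a rough illustration of the expat route for OFXv2 (which is XML), here is a sketch using the standard library's `xml.parsers.expat`. It is not the ofxtools implementation; the sample document is a hypothetical, heavily trimmed fragment, and `parse_transactions` only collects the direct children of each `STMTTRN` element.

```python
import xml.parsers.expat
from typing import Dict, List

# Hypothetical OFX 2.x fragment; real downloads carry headers and many more tags.
SAMPLE = """<OFX><BANKTRANLIST>
<STMTTRN><DTPOSTED>20170105</DTPOSTED><TRNAMT>-42.50</TRNAMT><NAME>Coffee</NAME></STMTTRN>
<STMTTRN><DTPOSTED>20170106</DTPOSTED><TRNAMT>1000.00</TRNAMT><NAME>Salary</NAME></STMTTRN>
</BANKTRANLIST></OFX>"""


def parse_transactions(text: str) -> List[Dict[str, str]]:
    """Collect each STMTTRN element's child tags into a dict of strings."""
    transactions: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    stack: List[str] = []    # names of the currently open elements
    chunks: List[str] = []   # character data seen since the last tag boundary

    def start(name, attrs):
        stack.append(name)
        chunks.clear()
        if name == "STMTTRN":
            current.clear()

    def chars(data):
        chunks.append(data)

    def end(name):
        stack.pop()
        if name == "STMTTRN":
            transactions.append(dict(current))
        elif stack and stack[-1] == "STMTTRN":
            current[name] = "".join(chunks).strip()
        chunks.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(text, True)
    return transactions


rows = parse_transactions(SAMPLE)
```

The tokenizing happens in C inside expat; Python only runs the three small callbacks. OFXv1 is SGML with optional closing tags, so this approach does not apply to it directly.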

BTW encapsulation in this case would be a bad thing... the inventory matching logic is much better left factored out of the container. Baking and "encapsulating" the logic in a method "abstraction" is something I'd have done in the 90's when I was much less enlightened and in love with OO and Design Patterns (I'm almost ashamed to admit, there was once a time...). A better way to build this is to keep the containers dumb and the algorithms outside, it's a much more flexible and elegant design. It's on purpose.

I meant no criticism - I am here to learn, not to teach.

But let me scrutinize your booking code, and I'll be able to expose my ignorance in greater detail, on my own personal road to enlightenment.

Cheers
Chris

Martin Blais

Jan 7, 2017, 8:33:43 PM1/7/17

to Beancount

On Sat, Jan 7, 2017 at 9:38 AM, Christopher Singley <ch...@singleys.com> wrote:

On Friday, 6 January 2017 23:12:51 UTC-6, Martin Blais wrote:

Summing for cost [or cost*cost/unit] ties out to balance sheet accounts.

What do you mean by "ties out"?

That's accountant-speak for "equality testing".

On a somewhat related note: I wonder if a new type of directive could be useful, similar to Balance assertions, that would assert that the total of one account matches that of another. I haven't needed it myself just yet, but it's a simple and appealing idea.
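A sketch of what such a "ties out" check could compute. No such directive exists in the thread; `check_ties_out`, the flat `(account, amount)` posting form, the account names, and the tolerance are all assumptions made for illustration.

```python
from decimal import Decimal
from typing import List, Tuple

Posting = Tuple[str, Decimal]  # (account, signed amount), a hypothetical flat form


def account_total(postings: List[Posting], account: str) -> Decimal:
    """Sum postings to an account and all of its subaccounts."""
    return sum(
        (amount for name, amount in postings
         if name == account or name.startswith(account + ":")),
        Decimal(0),
    )


def check_ties_out(postings: List[Posting], left: str, right: str,
                   tolerance: Decimal = Decimal("0.005")) -> List[str]:
    """Return error messages (empty when the two totals agree within tolerance)."""
    left_total = account_total(postings, left)
    right_total = account_total(postings, right)
    if abs(left_total - right_total) > tolerance:
        return [f"{left} ({left_total}) does not tie out to {right} ({right_total})"]
    return []


ledger = [
    ("Assets:Broker:Lots:HOOL", Decimal("1000")),
    ("Assets:Broker:Lots:HOOL", Decimal("600")),
    ("Assets:Broker:Control", Decimal("1600")),
]
ok = check_ties_out(ledger, "Assets:Broker:Lots", "Assets:Broker:Control")
bad = check_ties_out(ledger + [("Assets:Broker:Lots:VEA", Decimal("700"))],
                     "Assets:Broker:Lots", "Assets:Broker:Control")
```

Returning a list of messages instead of raising lets a caller collect every failed check in one pass.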

Each of those subaccounts will have a subaccount for cost, which ties out to the journal with the constraint that sum of cost for a given asset has to equal the costs posted to the relevant balance sheet subaccount on the general ledger.

Instead of using a subaccount, each position has its cost attached to it. This representation allows us to think of aggregating any type of position. This then allows us to formulate queries using something simple like SQL. "Sum up all the contents - whatever it is - under those accounts" is a working idea.

That's fine too; cost is not really a first-class concept on the financial statements; it comes in from special journals.

One thing to be aware of is that the goal of this project is not to replicate nor implement traditional methods of accounting.

Rather, the purpose is to come up with the simplest possible representation that allows one to enter and effectively query data from text files in the context of personal finance.

Given this, I feel quite free to deviate from well-established norms, especially when their existence derives from historical limitations.

For example, Beancount adopts Ledger's idea of doing away with credits and debits (preferring signed amounts) and there's no "reconciliation" process, instead you declare balance assertions explicitly.
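A minimal sketch of those two ideas in plain Python: postings carry signed amounts that must sum to zero, and a balance assertion is just a declared expected total checked against the postings dated before it. The types and function names are hypothetical, not Beancount's data model.

```python
from datetime import date
from decimal import Decimal
from typing import List, NamedTuple


class Posting(NamedTuple):
    account: str
    amount: Decimal  # signed; there are no separate debit and credit columns


class Transaction(NamedTuple):
    when: date
    postings: List[Posting]


def is_balanced(txn: Transaction) -> bool:
    """With signed amounts, double entry reduces to 'the postings sum to zero'."""
    return sum((p.amount for p in txn.postings), Decimal(0)) == 0


def balance_before(transactions: List[Transaction], account: str, when: date) -> Decimal:
    """Total posted to an account strictly before the given date."""
    return sum(
        (p.amount for t in transactions if t.when < when
         for p in t.postings if p.account == account),
        Decimal(0),
    )


txns = [
    Transaction(date(2017, 1, 3), [Posting("Assets:Checking", Decimal("-50")),
                                   Posting("Expenses:Food", Decimal("50"))]),
    Transaction(date(2017, 1, 5), [Posting("Assets:Checking", Decimal("1000")),
                                   Posting("Income:Salary", Decimal("-1000"))]),
]
all_balanced = all(is_balanced(t) for t in txns)
# An assertion dated 2017-01-05 sees only the earlier transaction.
checked_on_5th = balance_before(txns, "Assets:Checking", date(2017, 1, 5))
checked_on_6th = balance_before(txns, "Assets:Checking", date(2017, 1, 6))
```

Comparing `checked_on_6th` with the figure on a bank statement plays the role that a reconciliation pass plays in traditional packages.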

Furthermore, I think there may be simpler ways to solve some accounting problems by taking a CS outlook on them.

Inventory's representation is equivalent, but also allows a well-defined aggregation of any position.
I see no disadvantage in the way Inventory represents it compared to a hierarchical representation.

I'm not concerned about the structure, I'm simply concerned with speed when the list gets long. Already I've got problems in that regard... the OFX parser I put up on github uses regular expressions (so it can handle OFXv1 too), and it is dog slow when I churn a year's worth of data through it. For OFXv2 data, using an expat-based parser (dumping much of the processing from Python to C) speeds things up considerably.

Beancount uses a mostly C parser. No real performance optimizations have been done yet, so there's a lot of low-hanging fruit. We'll worry about it when it gets too slow.

BTW encapsulation in this case would be a bad thing... the inventory matching logic is much better left factored out of the container. Baking and "encapsulating" the logic in a method "abstraction" is something I'd have done in the 90's when I was much less enlightened and in love with OO and Design Patterns (I'm almost ashamed to admit, there was once a time...). A better way to build this is to keep the containers dumb and the algorithms outside, it's a much more flexible and elegant design. It's on purpose.

I meant no criticism - I am here to learn, not to teach.

But let me scrutinize your booking code, and I'll be able to expose my ignorance in greater detail, on my own personal road to enlightenment.

Cheers
Chris

Christopher Singley

Jan 8, 2017, 4:09:54 PM1/8/17

to Beancount

On Saturday, 7 January 2017 14:33:43 UTC-6, Martin Blais wrote:

On a somewhat related note: I wonder if a new type of directive could be useful, similar to Balance assertions, that would assert that the total of one account matches that of another. I haven't needed it myself just yet, but it's a simple and appealing idea.

I doubt you'd find that useful. Such assertions are generally needed not between various accounts on the general ledger, but between G/L accounts and other records whose utility you've not yet perceived.

One thing to be aware of is that the goal of this project is not to replicate nor implement traditional methods of accounting.
Rather, the purpose is to come up with the simplest possible representation that allows one to enter and effectively query data from text files in the context of personal finance.
Given this, I feel quite free to deviate from well-established norms, especially when their existence derives from historical limitations.
For example, Beancount adopts Ledger's idea of doing away with credits and debits (preferring signed amounts) and there's no "reconciliation" process, instead you declare balance assertions explicitly.
Furthermore, I think there may be simpler ways to solve some accounting problems by taking a CS outlook on them.

The issue is not that traditional methods of accounting are crusty and benighted... mapping debits & credits to +/- is just normal industry practice; it's been pretty much universal since the release of Visicalc.

The real issue is defining the scope of the problem. Accountancy treats of operations and entities as different from one another as a hummingbird and a jellyfish. In order to do our own work, we make simplifying assumptions. It's important to be clear about what assumptions we're making, the trade-offs involved, where they break down, and how to handle it when they do break down.

The biggest assumptions made by beancount seem to involve its input format - e.g. a single monolithic file with a unified simple format suffices for all accounting needs of personal finance, and manual data entry is entirely sufficient for personal finance. I am entirely sympathetic to these design goals, but it sounds like you're already starting to bump into problems (the wash sale rules) the root cause of which is insufficient separation of concerns. I hate to tell you that there's a lot more where that came from... yes, squarely in the realm of personal finance (unless by "personal finance" you mean "trivial cases").

It's an entirely respectable position to limit the scope of the problem to just the general ledger - to say "beancount doesn't do inventory". Your system looks fine for that use (although I need to test it) - I'm pretty sure I could write what I need externally, and dump computed JEs as text files to hand off to beancount.

But you're heading into treacherous waters with inventory. Shoehorning currency exchanges and securities trades into the same inputs as general ledger entries is a kludge, and if you analyze the workings of systems that actually handle these cases well, you'll see that you will run into predictable problems. If you care about these things and want to do them, it's worth formulating a plan for dealing with the obvious issues that arise, and incorporating that understanding into the system architecture.

I'm probably not the guy you really want to talk to - my knowledge of accounting is as meager as my understanding of computer science. I can only claim to have a reasonable experience of real-world personal finance data that play havoc with all kinds of simplifying assumptions that would be nice to rely on in an elegant accounting system. Hence my appeal to generality.

A tax attorney of my acquaintance says you can separate people into two categories by placing a large bucket of silver dollars on the floor in front of them. One group won't deign to stoop, foregoing the windfall in order to retain an upright and dignified posture. The other group will dive into the bucket and start grubbing for money, with their asshole winking at the world.

It's important to understand that, at its core, accounting is all about sphincter winkery. If your wash sale problems had amounted to real money, I imagine you'd be a lot clearer on that, and have a better understanding of what reconciliation is really all about. As it is, I suppose you have this enlightenment to look forward to.

Carry on as you were!

Chris

Martin Blais

Jan 8, 2017, 6:30:01 PM1/8/17

to Beancount

On Sun, Jan 8, 2017 at 11:09 AM, Christopher Singley <ch...@singleys.com> wrote:

On Saturday, 7 January 2017 14:33:43 UTC-6, Martin Blais wrote:

On a somewhat related note: I wonder if a new type of directive could be useful, similar to Balance assertions, that would assert that the total of one account matches that of another. I haven't needed it myself just yet, but it's a simple and appealing idea.

I doubt you'd find that useful. Such assertions are generally needed not between various accounts on the general ledger, but between G/L accounts and other records whose utility you've not yet perceived.

I don't see a problem defining an assertion that works across files. In fact, I could probably make use of that today when I create separate ledgers for trips or projects. I think it's a cool idea.

One thing to be aware of is that the goal of this project is not to replicate nor implement traditional methods of accounting.
Rather, the purpose is to come up with the simplest possible representation that allows one to enter and effectively query data from text files in the context of personal finance.
Given this, I feel quite free to deviate from well-established norms, especially when their existence derives from historical limitations.
For example, Beancount adopts Ledger's idea of doing away with credits and debits (preferring signed amounts) and there's no "reconciliation" process, instead you declare balance assertions explicitly.
Furthermore, I think there may be simpler ways to solve some accounting problems by taking a CS outlook on them.

The issue is not that traditional methods of accounting are crusty and benighted... mapping debits & credits to +/- is just normal industry practice; it's been pretty much universal since the release of Visicalc.

The real issue is defining the scope of the problem. Accountancy treats of operations and entities as different from one another as a hummingbird and a jellyfish. In order to do our own work, we make simplifying assumptions. It's important to be clear about what assumptions we're making, the trade-offs involved, where they break down, and how to handle it when they do break down.

The biggest assumptions made by beancount seem to involve its input format - e.g. a single monolithic file with a unified simple format suffices for all accounting needs of personal finance, and manual data entry is entirely sufficient for personal finance. I am entirely sympathetic to these design goals, but it sounds like you're already starting to bump into problems (the wash sale rules) the root cause of which is insufficient separation of concerns. I hate to tell you that there's a lot more where that came from... yes, squarely in the realm of personal finance (unless by "personal finance" you mean "trivial cases").

You're very confused. I'm having no problems with tracking my wash sales. Having an open data format has allowed me to build a custom solution for it. Not having any problems with it. Part of the power of an open system like this is that you can easily extract subsets of data from it and write custom code to solve a particular problem.

It's an entirely respectable position to limit the scope of the problem to just the general ledger - to say "beancount doesn't do inventory".

.... but it does.

http://furius.ca/beancount/doc/inventories

Your system looks fine for that use (although I need to test it) - I'm pretty sure I could write what I need externally, and dump computed JEs as text files to hand off to beancount.

But you're heading into treacherous waters with inventory. Shoehorning currency exchanges and securities trades into the same inputs as general ledger entries is a kludge, and if you analyze the workings of systems that actually handle these cases well, you'll see that you will run into predictable problems. If you care about these things and want to do them, it's worth formulating a plan for dealing with the obvious issues that arise, and incorporating that understanding into the system architecture.

Well so far I've got accounts over four countries and 10 years' worth of data and usage that tell me the model I've chosen is working well, including currencies and investments. If you'd like to criticize it, I welcome it - it can only make my software better and maybe I learn something in the process - but you need to be specific, provide specific examples of where it fails. Talk is cheap; start by doing something, then present us with results or better, code. Then we can talk about performance or representational issues and limitations.

I'm probably not the guy you really want to talk to - my knowledge of accounting is as meager as my understanding of computer science. I can only claim to have a reasonable experience of real-world personal finance data that play havoc with all kinds of simplifying assumptions that would be nice to rely on in an elegant accounting system. Hence my appeal to generality.

A tax attorney of my acquaintance says you can separate people into two categories by placing a large bucket of silver dollars on the floor in front of them. One group won't deign to stoop, foregoing the windfall in order to retain an upright and dignified posture. The other group will dive into the bucket and start grubbing for money, with their asshole winking at the world.

It's important to understand that, at its core, accounting is all about sphincter winkery. If your wash sale problems had amounted to real money, I imagine you'd be a lot clearer on that, and have a better understanding of what reconciliation is really all about. As it is, I suppose you have this enlightenment to look forward to.

Carry on as you were!

That's the plan.

It sounds like you're attached to the traditional ways of doing things, so I'd suggest you look into a commercial package for your work.

Good luck,
