Inventory Reductions

Christopher Singley

Jun 12, 2019, 12:26:59 AM

to bean...@googlegroups.com

I've been reading through this:

http://furius.ca/beancount/doc/self-reductions

and puzzling through parser.booking_full.

It looks to me like the root cause of your struggles is that the keys to your Inventory mapping are overspecified: you have to perform the calculations that populate a Cost instance before you can look up a lot. I reckon you need to move the cost data from keys to values, so that inventory is a mapping from
(account, security) -> [(units, cost, date)]
instead of the current mapping from
(account, security, cost, date) -> [(units, )].
The former is a more natural data structure for cost accounting.
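A minimal Python sketch of the proposed layout, for illustration only. `Lot`, `Inventory`, `add_lot` and `lots_for` are hypothetical names, not Beancount's actual `Inventory`/`Position` classes, and `cost` is assumed to be a per-unit amount in a single cost currency.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Lot:
    """Hypothetical open lot: units held, per-unit cost, and open date."""
    units: Decimal
    cost: Decimal
    open_date: date


# Proposed layout: (account, security) -> list of open lots.
# Cost data lives in the values, so a lookup only needs fields that
# every well-formed posting already carries.
Inventory = Dict[Tuple[str, str], List[Lot]]


def add_lot(inventory: Inventory, account: str, security: str, lot: Lot) -> None:
    """Record an augmenting posting by appending a lot under its key."""
    inventory.setdefault((account, security), []).append(lot)


def lots_for(inventory: Inventory, account: str, security: str) -> List[Lot]:
    """Look up the open lots for a posting without building any Cost key."""
    return inventory.get((account, security), [])


inv: Inventory = {}
add_lot(inv, "Assets:Invest", "HOOL", Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1)))
add_lot(inv, "Assets:Invest", "HOOL", Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2)))
```

The lookup key is just `(account, security)`, so nothing about cost has to be computed before the lots are found; cost and date only come into play when the returned list is filtered.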

Any well-formed transaction natively has (account, security) fields. Use those to look up a sequence of lots containing (lot_units, cost, open_date). Filter that sequence using (transaction_date, transaction_units) to find lots that might be closed ("booked") by the incoming transaction: it will definitely have transaction_date, and if it doesn't have transaction_units for some reason, then that is trivially interpolated.
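The filtering step might look like this, assuming the lots for one (account, security) key have already been fetched and that long versus short is signalled only by the sign of `units`. `Lot` and `candidate_lots` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Lot:
    units: Decimal      # positive for a long lot, negative for a short lot
    cost: Decimal       # per-unit cost
    open_date: date


def candidate_lots(lots: List[Lot], txn_date: date, txn_units: Decimal) -> List[Lot]:
    """Return the lots an incoming posting could close ("book" against).

    A lot qualifies if it was opened on or before the transaction date and
    its units have the opposite sign to the posting's units: a sale
    (negative units) can only close long lots, and vice versa.
    """
    return [
        lot for lot in lots
        if lot.open_date <= txn_date and (lot.units > 0) != (txn_units > 0)
    ]


lots = [
    Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1)),
    Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2)),
    Lot(Decimal("10"), Decimal("60"), date(2017, 1, 1)),  # opened after the sale
]
open_for_sale = candidate_lots(lots, date(2016, 12, 4), Decimal("-5"))
```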

The next step depends on your cost accounting method. Normally you'd sort the lots by date/time to do FIFO or average cost. To instead do specific identification, you'd further filter the lots for a particular date/cost/label, and require a unique result.
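A sketch of the ordering step under the same assumptions. The method strings, the optional `label` field and the `spec` dictionary are illustrative choices, not an existing API; average cost is left out because it calls for merging lots rather than reordering them.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal
    open_date: date
    label: Optional[str] = None


def order_lots(lots: List[Lot], method: str,
               spec: Optional[Dict[str, object]] = None) -> List[Lot]:
    """Order (or narrow) candidate lots according to a cost accounting method."""
    if method == "FIFO":
        return sorted(lots, key=lambda lot: lot.open_date)
    if method == "LIFO":
        return sorted(lots, key=lambda lot: lot.open_date, reverse=True)
    if method == "SPECIFIC":
        # Keep only lots matching every given attribute (open_date, cost,
        # label, ...) and insist on a unique result.
        spec = spec or {}
        matches = [lot for lot in lots
                   if all(getattr(lot, field) == value for field, value in spec.items())]
        if len(matches) != 1:
            raise ValueError(f"specific identification matched {len(matches)} lots, expected 1")
        return matches
    raise ValueError(f"unsupported method: {method!r}")


lots = [
    Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2), label="B"),
    Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1), label="A"),
]
```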

NOW you do the heavy lifting.

Work through the surviving lots in order, popping lots and splitting them as necessary until you run out of transaction_units or lot_units. For each popped lot, couple its data to (transaction_date, transaction_units, transaction_price), and you'll have all the data needed to fully populate a journal entry.
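One way to sketch the pop-and-split loop, assuming long positions only, a per-unit sale price in the lot's cost currency, and lots already ordered by the previous step. `Realized` and `book_reduction` are hypothetical names, and the gains ignore commissions.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal
    open_date: date


@dataclass(frozen=True)
class Realized:
    """One closed portion of a lot, coupled to the closing transaction's data."""
    open_date: date
    close_date: date
    units: Decimal
    cost: Decimal
    price: Decimal

    @property
    def gain(self) -> Decimal:
        return self.units * (self.price - self.cost)


def book_reduction(ordered_lots: List[Lot], units: Decimal, txn_date: date,
                   price: Decimal) -> Tuple[List[Realized], List[Lot]]:
    """Close `units` (a positive number) against lots in order.

    Returns the realized portions and the lots that remain open; the last
    lot touched is split if only part of it is needed.
    """
    remaining = list(ordered_lots)
    realized: List[Realized] = []
    while units > 0:
        if not remaining:
            raise ValueError("reduction exceeds the units held")
        lot = remaining.pop(0)
        closed = min(lot.units, units)
        realized.append(Realized(lot.open_date, txn_date, closed, lot.cost, price))
        if closed < lot.units:
            # Put the unclosed remainder back at the front, same cost and date.
            remaining.insert(0, Lot(lot.units - closed, lot.cost, lot.open_date))
        units -= closed
    return realized, remaining


lots = [Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1)),
        Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2))]
realized, left = book_reduction(lots, Decimal("15"), date(2016, 12, 4), Decimal("60"))
```

Each `Realized` record carries the lot's open date and cost next to the transaction's date, units and price, which is the data described above as enough to populate a journal entry.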

There's nothing recursive about this calculation. You can implement it as a straight pipeline of iterators, evaluated lazily.
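The same steps can be chained as lazy generators, so lots are pulled from the inventory only until the reduction is satisfied. This sketch assumes lots are stored in acquisition order, so FIFO needs no sort (a sort would have to materialize the sequence), and handles long positions only; all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal
    open_date: date


def eligible(lots: Iterable[Lot], txn_date: date) -> Iterator[Lot]:
    """Stage 1: lots opened on or before the transaction date."""
    return (lot for lot in lots if lot.open_date <= txn_date)


def take_units(lots: Iterable[Lot], units: Decimal) -> Iterator[Tuple[Lot, Decimal]]:
    """Stage 2: yield (lot, units_closed) until `units` is used up."""
    for lot in lots:
        closed = min(lot.units, units)
        units -= closed
        yield lot, closed
        if units == 0:
            return
    raise ValueError("reduction exceeds the units held")


def realize(pairs: Iterable[Tuple[Lot, Decimal]],
            price: Decimal) -> Iterator[Tuple[Decimal, Decimal]]:
    """Stage 3: (units_closed, realized_gain) for each closed portion."""
    for lot, closed in pairs:
        yield closed, closed * (price - lot.cost)


lots = [Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1)),
        Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2))]
pipeline = realize(take_units(eligible(lots, date(2016, 12, 4)), Decimal("15")), Decimal("60"))
```

Nothing runs until `pipeline` is iterated, and each stage only consumes what the next one asks for.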

An additional advantage is that this procedure is easy to extend to handling other securities transaction types that don't involve realizing gain. E.g. for a split, use (account, security) to look up your position. Filter that sequence for lots with an open_date before the transaction_date, and replace them with copies with the units/cost adjusted for the split. Keep a running total of the change in units, and require that total to match the input transaction_units (which is a hard requirement for a stock split transaction).
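A sketch of the split handling just described, assuming the split is given as a `new`-for-`old` ratio and that the expected change in units comes from the transaction record. `apply_split` is a hypothetical helper. Per-unit cost is divided by the ratio so each lot's total basis stays the same; note that a non-terminating ratio such as 3-for-1 will introduce `Decimal` rounding in the per-unit cost.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal       # per-unit cost
    open_date: date


def apply_split(lots: List[Lot], txn_date: date, new: int, old: int,
                expected_change: Decimal) -> List[Lot]:
    """Adjust lots opened before `txn_date` for a `new`-for-`old` split.

    Open dates are kept, so holding periods are unaffected. Raises if the
    running total of unit changes does not match the transaction's units.
    """
    ratio = Decimal(new) / Decimal(old)
    adjusted: List[Lot] = []
    change = Decimal(0)
    for lot in lots:
        if lot.open_date < txn_date:
            new_units = lot.units * ratio
            adjusted.append(Lot(new_units, lot.cost / ratio, lot.open_date))
            change += new_units - lot.units
        else:
            adjusted.append(lot)  # opened on/after the split date: untouched
    if change != expected_change:
        raise ValueError(f"split changed units by {change}, transaction says {expected_change}")
    return adjusted


lots = [Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1)),
        Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2))]
after = apply_split(lots, date(2016, 6, 1), 2, 1, Decimal("20"))
```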

Any conceptual problems with this setup? I mean, other than being a huge PITA to rip up existing classes and everything that touches them.

Martin Blais

Jun 12, 2019, 11:31:17 PM

to Beancount

Thanks for your interest Christopher.

The circular nature of the problem is that interpolation may depend on booking, and booking may depend on interpolation.

There's no perfect solution to that, that I could find.

The implementation of the Inventory has already evolved since this was written (for performance reasons) and IIRC is treated mostly like a list, matching the portions that have been specified to filter a list of candidate positions. I'm not beyond reviewing core classes - especially if it might help - but I believe changing the mapping would make no difference at all here. I wrote an example some time ago - in a text file IIRC, which I shared on the list and had some comments about - but I can't seem to find it right now.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/e84f305c-9282-1d20-d74d-00d99620f2da%40singleys.com.
For more options, visit https://groups.google.com/d/optout.

Christopher Singley

Jun 14, 2019, 3:36:00 PM

to Beancount

On Wednesday, 12 June 2019 22:31:17 UTC-5, Martin Blais wrote:

> Thanks for your interest Christopher.

Well it is interesting, to me at least. I enjoy looking through your code; I'm well familiar with the problem space, but your methods are quite different.

> The circular nature of the problem is that interpolation may depend on booking, and booking may depend on interpolation.
>
> There's no perfect solution to that, that I could find.

You mean "booking" in the sense of "realizing capital gains", yes? Why does booking depend on interpolation? Is this because of your emphasis of specific identification as a cost accounting method? There are ways out of that.

Specific identification is a very uncommon cost accounting method. It's almost always FIFO or (for mutual fund companies) the degenerate average cost method. It's good to support specific identification (generality is good!) but given its rarity, it's not unreasonable to enforce a requirement that opening/closing transactions (or augmenting/reducing transactions in your usage) must have matching labels if you want to use specific identification. Don't attempt to interpolate the opening transaction from date/price, and the problem is solved, no?

Is guessing the opening transaction from partial user input (i.e. date or price) a high priority? The algorithm cannot reliably find a solution because of underspecified inputs, as you note in your docs, and it requires the user to manually duplicate a significant effort by keeping their own inventory outside of beancount (probably in a spreadsheet). I don't know about you, but never maintaining another lot-matching spreadsheet ever again is very high on my list of priorities.

You already have a good chunk of the inventory system built into beancount. If you let go of the requirements that are introducing recursion into your algorithm, I guess you'd find the benefits a lot more valuable than the bits of interpolation that you'd need to drop in order to achieve it.

I've written an inventory system that does this for me, so I know it can be done. I doubt you'd find it terribly useful, but if you're interested I can show you how I handle cost accounting. I've got a Python package that handles trades, splits, spinoffs, mergers, return of capital distributions, all that fun stuff. It's somewhat battle-tested, too, with a relatively high volume of messy real-world transactions run through it, and the results audited (as in CPAs engaged to discover discrepancies, not just unit tests). You might find it interesting to look at an alternative implementation. The code won't win any prizes for engineering elegance, and still needs some work, but the output is demonstrably correct for the most part.

> The implementation of the Inventory has already evolved since this was written (for performance reasons) and IIRC is treated mostly like a list, matching the portions that have been specified to filter a list of candidate positions. I'm not beyond reviewing core classes - especially if it might help - but I believe changing the mapping would make no difference at all here. I wrote an example some time ago - in a text file IIRC, which I shared on the list and had some comments about - but I can't seem to find it right now.

I'd be interested in seeing the doc if you happen to stumble across it, but it's not a big deal. You're right that the dict keys aren't a deal breaker; they can be worked around without much trouble. I'm still puzzling through how you do this.

Thanks for releasing beancount, it's nice software.

Martin Blais

Jun 14, 2019, 9:22:41 PM

to Beancount

On Fri, Jun 14, 2019 at 3:36 PM Christopher Singley <ch...@singleys.com> wrote:

> On Wednesday, 12 June 2019 22:31:17 UTC-5, Martin Blais wrote:
>> Thanks for your interest Christopher.
>
> Well it is interesting, to me at least. I enjoy looking through your code; I'm well familiar with the problem space, but your methods are quite different.
>
>> The circular nature of the problem is that interpolation may depend on booking, and booking may depend on interpolation.
>> There's no perfect solution to that, that I could find.
>
> You mean "booking" in the sense of "realizing capital gains", yes? Why does booking depend on interpolation? Is this because of your emphasis of specific identification as a cost accounting method? There are ways out of that.

No; it would be very easy to fix if the issue was just implementing different booking methods :-)

The problem occurs because the syntax I created specifically aims to allow users to elide some information and automatically fill in some missing numbers. For instance, you don't have to provide all the details of a reducing lot, as long as the list of lots it matches (when filtered down) yields an unambiguous set (either a single lot, or many lots for which the total number of units matches the size of the reducing lot precisely). This is the process I call "booking", that is, matching a partial specification for reducing lots against the available lots just before the transaction gets applied. It uses the accumulated inventory in order to fill in missing information.

"Interpolation," on the other hand, is a similar process that fills in missing numbers, not by matching against the contents of the inventory just before the transaction gets applied, but rather only against the other postings, by assuming that the set of postings for each currency group must balance. This does not make use of the state of the inventory before the transaction gets applied, just the information provided on that one transaction. It basically attempts to figure out the cost currency of each posting, then groups the postings by cost currency, and then attempts to fill in missing bits and pieces (either numbers or currencies) in each of these currency groups.

These two are similar in goal: fill in missing information automatically to ease the burden of data entry. But depending on which particular bits are left missing in the input for Beancount to figure out, in some cases running booking before interpolation works, and in other cases running interpolation before booking works. I have seen cases that are impossible to resolve. It took me a while to figure out which order was the most useful in practice, and this is what's in there now.
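The two gap-filling processes can be sketched side by side. This illustrates the rules as stated in this message and is not Beancount's `booking_full` implementation: `Lot`, `book`, `interpolate`, the `spec` dictionary and the `Income:Gains` account are hypothetical, each posting's weight is assumed to be already expressed in its cost currency (the "figure out the cost currency" step is skipped), and the check that a single matched lot holds enough units is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal       # per-unit cost
    open_date: date


def book(lots: List[Lot], spec: Dict[str, object], reduce_units: Decimal) -> List[Lot]:
    """Booking: match a partial lot spec against the inventory before the txn.

    `spec` holds only the fields the user wrote, e.g. {} for "{}" or
    {"cost": Decimal("51")} for "{51 USD}".
    """
    matches = [lot for lot in lots
               if all(getattr(lot, field) == value for field, value in spec.items())]
    if len(matches) == 1:
        return matches                      # a single lot: unambiguous
    if matches and sum(lot.units for lot in matches) == reduce_units:
        return matches                      # several lots, closed out exactly
    raise ValueError(f"{len(matches)} lots match; booking is ambiguous or unmatched")


Posting = Tuple[str, Optional[Decimal], str]  # (account, weight or None, currency)


def interpolate(postings: List[Posting]) -> List[Posting]:
    """Interpolation: fill one missing weight per currency so each group sums to 0.

    Uses only this transaction's postings, never the inventory.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, (_, _, currency) in enumerate(postings):
        groups[currency].append(i)
    result = list(postings)
    for currency, indexes in groups.items():
        missing = [i for i in indexes if postings[i][1] is None]
        total = sum(postings[i][1] for i in indexes if postings[i][1] is not None)
        if len(missing) > 1:
            raise ValueError(f"more than one missing amount in {currency}")
        if missing:
            result[missing[0]] = (postings[missing[0]][0], -total, currency)
        elif total != 0:
            raise ValueError(f"{currency} postings do not balance: {total}")
    return result


lots = [Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1)),
        Lot(Decimal("10"), Decimal("51"), date(2016, 1, 2))]
booked = book(lots, {"cost": Decimal("50")}, Decimal("5"))
# Once booking has fixed the cost, the sale's weight is known and the
# remaining gap can be interpolated from the other postings.
weight = -Decimal("5") * booked[0].cost
filled = interpolate([("Assets:Invest", weight, "USD"),
                      ("Assets:Cash", Decimal("255"), "USD"),
                      ("Income:Gains", None, "USD")])
```

Here booking runs first and interpolation second. With `{}` instead of `{"cost": Decimal("50")}`, `book` raises for a 5-unit reduction and interpolation has no weight to work with, which is the kind of mutual dependency described earlier in the thread.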

> Specific identification is a very uncommon cost accounting method. It's almost always FIFO or (for mutual fund companies) the degenerate average cost method. It's good to support specific identification (generality is good!) but given its rarity, it's not unreasonable to enforce a requirement that opening/closing transactions (or augmenting/reducing transactions in your usage) must have matching labels if you want to use specific identification. Don't attempt to interpolate the opening transaction from date/price, and the problem is solved, no?
>
> Is guessing the opening transaction from partial user input (i.e. date or price) a high priority? The algorithm cannot reliably find a solution because of underspecified inputs, as you note in your docs, and it requires the user to manually duplicate a significant effort by keeping their own inventory outside of beancount (probably in a spreadsheet). I don't know about you, but never maintaining another lot-matching spreadsheet ever again is very high on my list of priorities.
>
> You already have a good chunk of the inventory system built into beancount. If you let go of the requirements that are introducing recursion into your algorithm, I guess you'd find the benefits a lot more valuable than the bits of interpolation that you'd need to drop in order to achieve it.
>
> I've written an inventory system that does this for me, so I know it can be done. I doubt you'd find it terribly useful, but if you're interested I can show you how I handle cost accounting. I've got a Python package that handles trades, splits, spinoffs, mergers, return of capital distributions, all that fun stuff. It's somewhat battle-tested, too, with a relatively high volume of messy real-world transactions run through it, and the results audited (as in CPAs engaged to discover discrepancies, not just unit tests). You might find it interesting to look at an alternative implementation. The code won't win any prizes for engineering elegance, and still needs some work, but the output is demonstrably correct for the most part.

I'd be curious to have a look, but unfortunately I'm too busy right now, I have very little time, just keeping my head above water, mostly.

>> The implementation of the Inventory has already evolved since this was written (for performance reasons) and IIRC is treated mostly like a list, matching the portions that have been specified to filter a list of candidate positions. I'm not beyond reviewing core classes - especially if it might help - but I believe changing the mapping would make no difference at all here. I wrote an example some time ago - in a text file IIRC, which I shared on the list and had some comments about - but I can't seem to find it right now.
>
> I'd be interested in seeing the doc if you happen to stumble across it, but it's not a big deal. You're right that the dict keys aren't a deal breaker; they can be worked around without much trouble. I'm still puzzling through how you do this.

I'll bring it up if I can find it. It was a text file in another branch IIRC, in the midst of code.

> Thanks for releasing beancount, it's nice software.

Thank you!

Christopher Singley

Jun 15, 2019, 9:00:41 AM

to bean...@googlegroups.com, Martin Blais

On 6/14/19 8:22 PM, Martin Blais wrote:

> On Fri, Jun 14, 2019 at 3:36 PM Christopher Singley <ch...@singleys.com> wrote:
>
> <snip>
>
>> You mean "booking" in the sense of "realizing capital gains", yes? Why does booking depend on interpolation? Is this because of your emphasis of specific identification as a cost accounting method? There are ways out of that.
>
> No; it would be very easy to fix if the issue was just implementing different booking methods :-)
>
> The problem occurs because the syntax I created specifically aims to allow users to elide some information and automatically fill in some missing numbers. For instance, you don't have to provide all the details of a reducing lot, as long as the list of lots it matches (when filtered down) yields an unambiguous set (either a single lot, or many lots for which the total number of units matches the size of the reducing lot precisely). This is the process I call "booking", that is, matching a partial specification for reducing lots against the available lots just before the transaction gets applied. It uses the accumulated inventory in order to fill in missing information.
>
> "Interpolation," on the other hand, is a similar process that fills in missing numbers, not by matching against the contents of the inventory just before the transaction gets applied, but rather only against the other postings, by assuming that the set of postings for each currency group must balance. This does not make use of the state of the inventory before the transaction gets applied, just the information provided on that one transaction. It basically attempts to figure out the cost currency of each posting, then groups the postings by cost currency, and then attempts to fill in missing bits and pieces (either numbers or currencies) in each of these currency groups.
>
> These two are similar in goal: fill in missing information automatically to ease the burden of data entry. But depending on which particular bits are left missing in the input for Beancount to figure out, in some cases running booking before interpolation works, and in other cases running interpolation before booking works. I have seen cases that are impossible to resolve. It took me a while to figure out which order was the most useful in practice, and this is what's in there now.

I'm just trying to understand why you're having so much trouble with the cost accounting. I adore the Ledger style syntax, despite its obvious limitations for this kind of work. My main question is how much of the problem is inherent in the syntax & data entry format vs. the algorithm applied to it.

Your "Self-reductions" document contains this example:
"""
Assets:Invest     10 HOOL {50 USD, 2016-01-01} ;; A
Assets:Invest     10 HOOL {51 USD, 2016-01-02} ;; B

2016-12-04 *
Assets:Invest    -5 HOOL {}
Assets:Cash     255 USD
"""

This is not any sort of corner case; this is what normal JEs look like. As written, the interpolation is trivial. The trouble arises because this JE could theoretically contain thousands of other postings, so the algorithm needs to solve for the missing cash to figure out the proceeds of the HOOL sale.

If I've got that right, it seems like the algorithm is suffering at the hands of the syntax. Why bother trying to handle such pathological bookkeeping? It is no hardship to the user to enforce a constraint that an asset purchase/sale must only contain a single currency posting.

In general, to process securities transactions, I believe you're going to need to define new directives other than "txn" so the parser can route securities transactions to different handlers. For example, your docs contain an example of HOOL spinning off A-shares and B-shares... you need a way to signal the parser to update inventory but skip realizing gains. As it stands, I don't believe beancount's syntax offers the possibility of distinguishing "reducing" postings that realize gain from those that don't.

I've been able to get it down to 6 different types of securities transactions - trades, return of capital distributions, spinoffs, splits, transfers, and options exercise. I think you can reduce the number of needed directives. Splits are essentially a subtype of transfers. It may also be possible to treat trades and return of capital as subtypes of transfer. Spinoffs probably need their own directive. You might be able to decompose options exercise into a sequence of more fundamental types, but I'm skeptical because of the holding period rules.

I suspect minimal syntax extensions would greatly improve the algorithms at essentially no cost to the user. If that's something you're willing to consider, you might also consider at the same time what kind of ledger syntax is needed to specify cost accounting, which (unfortunately) can change from one transaction to another on the same day. You need to be able to handle input data that does this:

https://investor.vanguard.com/taxes/cost-basis/methods

Anyway, something to keep in mind next time you're working on the inventory system.

Cheers, Chris

P.S. Technical documentation nitpicking - the average cost basis method is only available for mutual funds (I think it was special pleading to allow them to keep this business logic in the database layer - SQL stored procedures). It's got nothing to do with the tax qualification of the holding account - you see average cost used both inside and outside retirement accounts.

Martin Blais

Jun 15, 2019, 3:51:40 PM

to Christopher Singley, Beancount, Martin Blais

On Sat, Jun 15, 2019 at 9:00 AM Christopher Singley <ch...@singleys.com> wrote:

> On 6/14/19 8:22 PM, Martin Blais wrote:
>
>> On Fri, Jun 14, 2019 at 3:36 PM Christopher Singley <ch...@singleys.com> wrote:
>>
>> <snip>
>>
>>> You mean "booking" in the sense of "realizing capital gains", yes? Why does booking depend on interpolation? Is this because of your emphasis of specific identification as a cost accounting method? There are ways out of that.
>>
>> No; it would be very easy to fix if the issue was just implementing different booking methods :-)
>>
>> The problem occurs because the syntax I created specifically aims to allow users to elide some information and automatically fill in some missing numbers. For instance, you don't have to provide all the details of a reducing lot, as long as the list of lots it matches (when filtered down) yields an unambiguous set (either a single lot, or many lots for which the total number of units matches the size of the reducing lot precisely). This is the process I call "booking", that is, matching a partial specification for reducing lots against the available lots just before the transaction gets applied. It uses the accumulated inventory in order to fill in missing information.
>>
>> "Interpolation," on the other hand, is a similar process that fills in missing numbers, not by matching against the contents of the inventory just before the transaction gets applied, but rather only against the other postings, by assuming that the set of postings for each currency group must balance. This does not make use of the state of the inventory before the transaction gets applied, just the information provided on that one transaction. It basically attempts to figure out the cost currency of each posting, then groups the postings by cost currency, and then attempts to fill in missing bits and pieces (either numbers or currencies) in each of these currency groups.
>>
>> These two are similar in goal: fill in missing information automatically to ease the burden of data entry. But depending on which particular bits are left missing in the input for Beancount to figure out, in some cases running booking before interpolation works, and in other cases running interpolation before booking works. I have seen cases that are impossible to resolve. It took me a while to figure out which order was the most useful in practice, and this is what's in there now.

> I'm just trying to understand why you're having so much trouble with the cost accounting. I adore the Ledger style syntax, despite its obvious limitations for this kind of work. My main question is how much of the problem is inherent in the syntax & data entry format vs. the algorithm applied to it.
>
> Your "Self-reductions" document contains this example:
> """
> Assets:Invest     10 HOOL {50 USD, 2016-01-01} ;; A
> Assets:Invest     10 HOOL {51 USD, 2016-01-02} ;; B
>
> 2016-12-04 *
> Assets:Invest    -5 HOOL {}
> Assets:Cash     255 USD
> """
>
> This is not any sort of corner case; this is what normal JEs look like. As written, the interpolation is trivial. The trouble arises because this JE could theoretically contain thousands of other postings, so the algorithm needs to solve for the missing cash to figure out the proceeds of the HOOL sale.

No.

This is ambiguous because it could be a case of the user wanting to match against the lot at 50 and forgot to put in an income posting.

Or that s/he intended to actually match with the lot at 51.

Whether we should just assume the latter without a warning is a matter of design.
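To make the ambiguity concrete, here is the arithmetic for each candidate lot in the quoted example (a small illustrative calculation; the variable names are arbitrary):

```python
from decimal import Decimal

units_sold = Decimal("5")
cash_received = Decimal("255")            # the Assets:Cash posting
lot_costs = {"A": Decimal("50"), "B": Decimal("51")}

# Amount left unbalanced if the sale is booked against each lot.
residual = {name: cash_received - units_sold * cost for name, cost in lot_costs.items()}

for name, amount in residual.items():
    if amount == 0:
        print(f"lot {name}: balances exactly")
    else:
        print(f"lot {name}: {amount} USD left over, needs an income posting")
```

Booking against lot A leaves 5 USD unexplained (250 USD of cost against 255 USD of cash), while lot B balances exactly, so the entry is valid under both readings and only the missing income posting distinguishes them.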

> If I've got that right, it seems like the algorithm is suffering at the hands of the syntax. Why bother trying to handle such pathological bookkeeping? It is no hardship to the user to enforce a constraint that an asset purchase/sale must only contain a single currency posting.

Both interpolation and automatic matching of lots are very useful features and I do want them both.

Now, it is true that I could have tried less hard to allow for automation, but I struck a balance - it's not perfect, but it does handle a fair amount of the cases.

For such definitional problems one often doesn't know how far one can get in solving the problem at hand until one has a go at it (and a few more times after that). I was arrogant enough to think I could automatically infer more than I was actually able to. It is unclear how many users make use of the most advanced forms of interpolation; while you may be fine with taking on the burden of writing down many details, others may not be.

> In general, to process securities transactions, I believe you're going to need to define new directives other than "txn" so the parser can route securities transactions to different handlers. For example, your docs contain an example of HOOL spinning off A-shares and B-shares... you need a way to signal the parser to update inventory but skip realizing gains. As it stands, I don't believe beancount's syntax offers the possibility of distinguishing "reducing" postings that realize gain from those that don't.

No that's fine; works well already. You empty out the existing postings and refill them while keeping the same total cost basis. You can override the lot date too, to keep the original one. The numbers aren't automatic, it does require manual calculation, but it's rare enough I haven't addressed it explicitly other than that.
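The manual calculation can be sketched as a basis reallocation. This assumes the fraction of the original basis assigned to each new holding is known from outside the ledger (the 60/40 split below is made up), and uses hypothetical names (`Lot`, `reallocate`, and the tickers `HOOLA`/`HOOLB`); it only shows that the total basis and the original lot date are preserved.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal       # per-unit cost
    open_date: date


def reallocate(old: Lot, new_holdings: List[Tuple[str, Decimal, Decimal]]) -> List[Tuple[str, Lot]]:
    """Empty out `old` and refill it as new lots with the same total basis.

    `new_holdings` is a list of (symbol, units, basis_fraction); the fractions
    must sum to 1. Each new lot keeps the original open date.
    """
    if sum(fraction for _, _, fraction in new_holdings) != 1:
        raise ValueError("basis fractions must sum to 1")
    total_basis = old.units * old.cost
    return [(symbol, Lot(units, total_basis * fraction / units, old.open_date))
            for symbol, units, fraction in new_holdings]


old = Lot(Decimal("10"), Decimal("50"), date(2016, 1, 1))
new = reallocate(old, [("HOOLA", Decimal("10"), Decimal("0.6")),
                       ("HOOLB", Decimal("10"), Decimal("0.4"))])
```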

> I've been able to get it down to 6 different types of securities transactions - trades, return of capital distributions, spinoffs, splits, transfers, and options exercise. I think you can reduce the number of needed directives. Splits are essentially a subtype of transfers. It may also be possible to treat trades and return of capital as subtypes of transfer. Spinoffs probably need their own directive. You might be able to decompose options exercise into a sequence of more fundamental types, but I'm skeptical because of the holding period rules.

trades: To me, trades are sets of transactions identifying specific postings: augmenting postings and various matching reductions. That's how I define them. Note that there's no explicit code to extract those, but I have done it in the past by running the matching and inserting metadata. See the mailing list for recent posts. I'd like that to be standard.

Spinoffs: don't care. Not sure how to account for them, haven't seen them yet. Would love to learn.

Splits: I'd love to hear how you've dealt with splits. I have thought about this for a while; it's not an obvious problem.

Transfers: What are they?

Options exercise: Already works well; I do them many times a year. Does not require a directive. Perhaps the annoying thing is that the product name goes away on expiry, and that might be automatable. Right now I insert a balance assertion manually.

> I suspect minimal syntax extensions would greatly improve the algorithms at essentially no cost to the user. If that's something you're willing to consider, you might also consider at the same time what kind of ledger syntax is needed to specify cost accounting, which (unfortunately) can change from one transaction to another on the same day. You need to be able to handle input data that does this:
>
> https://investor.vanguard.com/taxes/cost-basis/methods

I have FIFO, LIFO and specific identification already. Average cost is missing, and the way I could put that in is by merging the associated lots into a single lot at their average cost basis right before a reduction is triggered.
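A sketch of that merge, assuming the lots being averaged share one account, security and cost currency. `merge_to_average` is a hypothetical helper; it drops lot dates entirely, since how holding periods should be tracked under average cost is outside what this sketch models.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Lot:
    units: Decimal
    cost: Decimal   # per-unit cost


def merge_to_average(lots: List[Lot]) -> Lot:
    """Collapse lots into one lot at their weighted-average per-unit cost.

    Total units and total cost basis are unchanged by the merge.
    """
    total_units = sum(lot.units for lot in lots)
    if total_units == 0:
        raise ValueError("cannot average an empty position")
    total_basis = sum(lot.units * lot.cost for lot in lots)
    return Lot(total_units, total_basis / total_units)


merged = merge_to_average([Lot(Decimal("10"), Decimal("50")),
                           Lot(Decimal("10"), Decimal("51"))])
```

A reduction would then be booked against the single merged lot, so any sale from this position realizes gain against a 50.5 per-unit cost.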

> Anyway, something to keep in mind next time you're working on the inventory system.
>
> Cheers, Chris
>
> P.S. Technical documentation nitpicking - the average cost basis method is only available for mutual funds (I think it was special pleading to allow them to keep this business logic in the database layer - SQL stored procedures). It's got nothing to do with the tax qualification of the holding account - you see average cost used both inside and outside retirement accounts.

Yes, in Canada it's like that for non-retirement accounts IIRC.

(Often I'm writing from a US-centric POV, we're all living somewhere...)
