Workflow questions

Eric Altendorf

Jul 17, 2023, 6:18:21 PM
to bean...@googlegroups.com
There's a ton of documentation on the importers, but it doesn't say much about how data actually gets from the extractor into beancount.  The only thing I can find is a sentence that mentions that bean-extract "produces some Beancount input text to be moved to your input file".

Is the process of adding data to your Beancount input file intended to be a manual copy/paste one?

If for some reason you need to re-import some data that you'd already imported and added to your beancount input file, is it your responsibility to manually go find the records in the input file that came from that import, remove them, and replace them with the new import data?  Is there some method or tool for tracking or managing how the beancount input file is composed from all the sources?

Let's say you run Beancount and it makes some decisions for you -- for example, matching asset sales with corresponding lots for cap gains computation.  You use these to file your taxes.  How do you make a record of those decisions and enforce that a later run of Beancount doesn't change them (e.g., if next tax year you switch to a different booking method)?

Do my questions suggest I have the wrong mental model for using Beancount?  😬

thanks,
eric

Filip Filmar

Jul 17, 2023, 6:26:18 PM
to bean...@googlegroups.com
On Mon, Jul 17, 2023 at 3:18 PM Eric Altendorf <erical...@gmail.com> wrote:
> Do my questions suggest I have the wrong mental model for using Beancount?  😬

You may be assuming more automation than there is.

IMHO the real value of accounting software is in automation. And beancount today doesn't do that, which is why I gave up on it in favor of Tiller.

It's still a great idea, and what it does, it does well. If anything, I dislike that today your financial data is mostly owned by whatever company happens to provide your books, and beancount does swimmingly as an alternative. But it's far from a product that I can use on a daily basis.

F

Eric Altendorf

Jul 17, 2023, 6:34:47 PM
to bean...@googlegroups.com
Thanks.  My use case may be different but in a different direction.  I'm not looking to daily track income and expenses.

I have a handful of all-time transaction records and I want to be able to
- combine & merge them
- apply some manual exception transformation rules
- apply decisions made by a previous run that I want "locked in"
- run beancount to make booking decisions and compute cap gains, and record those decisions for next year

And I'd like that to be a reproducible process to "build" my tax record and generate reports.  I'll mess with it once a year, running it a bunch of times and tweaking until I've got it right, then generate the reports for taxes and lock it in for the year.

I'm starting to think that maybe Beancount is a good match for the last bullet point, but I need to write my own build system for the other points, perhaps?


--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/CAKaOXijbZW_v6LU32unkULhL1W1f_1XAym%3DdqdJeytOeiPAfJA%40mail.gmail.com.

Filip Filmar

Jul 17, 2023, 6:42:39 PM
to bean...@googlegroups.com
On Mon, Jul 17, 2023 at 3:34 PM Eric Altendorf <erical...@gmail.com> wrote:
> I'm starting to think that maybe Beancount is a good match for the last bullet point, but I need to write my own build system for the other points, perhaps?

IIUC yeah you'd need to wire stuff up yourself.

FWIW I wrote bazel rules for beancount (based on python v2 and some docker hackery) at some point, but I forget the state they are in.
Perhaps I could try to find them and see the state of their (dis)repair if that's useful to you?

From there on, you could write your automation however you'd like I suppose.

F

Eric Altendorf

Jul 17, 2023, 7:03:28 PM
to bean...@googlegroups.com
Probably not :)  The work that needs to be done is just the logic for transforming and composing the data.  The actual build can just run from scratch each time; there's no need to cache or reuse partial artifacts the way a real build system does.

The remaining workflow challenge is how to take the booking decisions Beancount makes on one run and record them, so that I can enforce that the same booking decisions are made in a later run.  Any ideas?

 


Message has been deleted

Eric Altendorf

Jul 17, 2023, 10:24:46 PM
to bean...@googlegroups.com
On Mon, Jul 17, 2023 at 5:58 PM Sath S <grayis...@gmail.com> wrote:

>> There's a ton of documentation on the importers, but it doesn't say much about how data actually gets from the extractor into beancount.  The only thing I can find is a sentence that mentions that bean-extract "produces some Beancount input text to be moved to your input file".

>> Is the process of adding data to your Beancount input file intended to be a manual copy/paste one?


Thanks, got it.

This kind of workflow makes me uncomfortable in general; mixing manual effort with automation (e.g., importers) means that you can only automate the first time (this is why one never edits generated code).  But, coordinating manual effort with automation is a very hard problem to solve cleanly in general and I can see how this works well enough for a lot of use cases.
 
 
>> If for some reason you need to re-import some data that you'd already imported and added to your beancount input file, is it your responsibility to manually go find the records in the input file that came from that import, remove them, and replace them with the new import data?  Is there some method or tool for tracking or managing how the beancount input file is composed from all the sources?

> The ingest code has a function to identify duplicates that you can use/adapt.

OK, I'll take a look, as well as take a look at the libraries mentioned on the other thread. 

 
>> Let's say you run Beancount and it makes some decisions for you -- for example, matching asset sales with corresponding lots for cap gains computation.  You use these to file your taxes.  How do you make a record of those decisions and enforce that a later run of Beancount doesn't change them (e.g., if next tax year you switch to a different booking method)?

> You get to declare lot matching in your file (FIFO, NONE, STRICT, etc.). As long as the Beancount code doesn't change its output (and it shouldn't), it'll produce the same output given the same input.

Even that doesn't let you change your booking method from one year to the next...

A big part of what I'm trying to do is have more control over lot matching.  E.g., HIFO is a good general default, but in the US sometimes it forces you to pay short-term gains when a very similar long-term gains lot could have been paired, for a much lower tax burden.  In my case, these differences are very large and worth trying to handle, so I need to be able to mix and match booking methods, or write my own, which will be more complex and might evolve year to year.  (It will certainly evolve if the US changes capital gains taxation.)
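The HIFO concern above can be illustrated with a minimal sketch. It is not Beancount code: the `Lot` type, the per-unit tax estimate, and the tax rates are all hypothetical, and the "held more than one year" test is a simplification of the US holding-period rule.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Lot:
    acquired: date
    units: Decimal
    cost: Decimal  # cost basis per unit


def is_long_term(lot: Lot, sale_date: date) -> bool:
    """Simplified rule: long-term if sold after the one-year anniversary."""
    try:
        anniversary = lot.acquired.replace(year=lot.acquired.year + 1)
    except ValueError:  # lot acquired on Feb 29
        anniversary = date(lot.acquired.year + 1, 3, 1)
    return sale_date > anniversary


def pick_hifo(lots):
    """Plain HIFO: highest cost basis first, ignoring the holding period."""
    return max(lots, key=lambda lot: lot.cost)


def pick_tax_aware(lots, sale_date, price, short_rate, long_rate):
    """Pick the lot with the lowest estimated tax per unit sold."""
    def tax_per_unit(lot):
        rate = long_rate if is_long_term(lot, sale_date) else short_rate
        return (price - lot.cost) * rate
    return min(lots, key=tax_per_unit)
```

With a long-term lot at cost 100 and a short-term lot at cost 105, sold at 200 under assumed rates of 37% and 20%, HIFO picks the short-term lot while the tax-aware rule picks the long-term one.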

Also ... it's not even clear to me that Beancount would guarantee the same outputs given the same inputs -- e.g., if you use FIFO booking, there could be many buy transactions on the same day at different prices, and I don't know if Beancount would guarantee a stable ordering among those.  Seems like it could be implementation dependent...
 
> The more general problem here is one of snapshotting your accounts. Beancount is best used by software engineers, and thus, version control (git), tagging, and such work well. I generate a tax reference output file for each year, and occasionally re-generate and compare with the reference to ensure nothing's changed.

I can snapshot the beancount inputs, and I can snapshot the beancount outputs, but what I am trying to understand is if there's a way to turn the beancount outputs (its choice of lot matching) around and use them as inputs (specified lot matches) next year.
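The snapshot-and-compare practice mentioned above (keeping a yearly reference report and diffing a regenerated one against it) can be sketched with the standard library alone. How the report text is produced is left open; the function name is hypothetical.

```python
import difflib


def diff_against_reference(reference_text: str, regenerated_text: str) -> list[str]:
    """Return unified-diff lines; an empty list means nothing changed."""
    return list(
        difflib.unified_diff(
            reference_text.splitlines(),
            regenerated_text.splitlines(),
            fromfile="reference",
            tofile="regenerated",
            lineterm="",
        )
    )
```

A yearly check would read the committed reference file and the freshly generated report, and fail loudly if the returned list is non-empty.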
 


Daniele Nicolodi

Jul 18, 2023, 1:35:15 PM
to bean...@googlegroups.com
There are many workflows that are valid in different contexts. Beancount
gives you tools that can be adapted to the different workflows.

This is one reason why the importers "framework" has been split out from
the core project. Its value is mostly in providing a common interface
so that importers that translate different input formats into Beancount
data structures can be shared.

On 18/07/23 04:24, Eric Altendorf wrote:
> This kind of workflow makes me uncomfortable in general; mixing manual
> effort with automation (e.g., importers) means that you can only
> automate the first time (this is why one never edits generated code).
> But, coordinating manual effort with automation is a very hard problem
> to solve cleanly in general and I can see how this works well enough for
> a lot of use cases.

There are two modalities in which you can operate, distinguished by what
you define to be the "source of truth":

1. you take the source of truth to be the input files for your
importers, and the importers are "smart" enough to generate a
Beancount ledger with all the information you need for further
processing. In this case, you don't need to worry about mixing manual
and automated steps: you can regenerate your ledger from the input files
at any time.

2. you take the Beancount ledger to be the source of truth, and the
importers are meant to produce a ledger that you manually edit. The
importers framework supports this use case via the
deduplication system: it uses some fuzzy matching between transactions
to identify transactions in the imported batch that are already in your
ledger.
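To make the idea of fuzzy deduplication concrete, here is a minimal sketch. It is an illustration only and not the heuristic the importers framework actually uses: the `Txn` type, the matching criteria (same account, same amount, dates within a small window), and the default window are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    day: date
    account: str
    amount: Decimal


def looks_like_duplicate(a: Txn, b: Txn, window_days: int = 2) -> bool:
    """Treat two transactions as the same if they agree on account and
    amount and their dates are at most window_days apart."""
    return (
        a.account == b.account
        and a.amount == b.amount
        and abs(a.day - b.day) <= timedelta(days=window_days)
    )


def split_new_and_duplicates(imported, ledger, window_days=2):
    """Partition an imported batch into new entries and likely duplicates."""
    new, dupes = [], []
    for txn in imported:
        if any(looks_like_duplicate(txn, old, window_days) for old in ledger):
            dupes.append(txn)
        else:
            new.append(txn)
    return new, dupes
```

In modality 2, only the `new` list would be appended to the hand-edited ledger; the likely duplicates would be reviewed or discarded.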

What you seem to be after is a way to update the ledger entries with new
information (either by running an improved version of the importers or
by combining information from different sources). This is possible, but
Beancount does not offer any facility for this. The main reason is that
the logic for merging or updating the ledger entries is the core of the
problem and it cannot be generalized. However, the "boring" code around
the problem is readily available: you can easily parse a Beancount
ledger, modify it, and serialize it back into a human-readable ledger;
see the beancount.parser.parser and beancount.parser.printer modules.
The only drawback in doing so is that the Beancount parser does not
preserve whitespace and comments, thus you lose these when you go
through this process. The autobean.refactor project should solve the
issue, but I haven't had occasion to try it yet.

> Also ... it's not even clear to me that Beancount would guarantee the
> same outputs given the same inputs -- e.g., if you use FIFO booking,
> there could be many buy transactions on the same day at different
> prices, and I don't know if Beancount would guarantee a stable ordering
> among those.  Seems like it could be implementation dependent...

It is guaranteed to be stable.
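One way such a guarantee can be achieved is to break same-day ties with the position of each entry in the ledger file. The sketch below illustrates that idea only; the tuple layout and the tie-break key are assumptions, not a description of Beancount's internals.

```python
from datetime import date
from decimal import Decimal

# Each buy: (date, line number in the ledger file, units, cost per unit).
buys = [
    (date(2021, 5, 3), 42, Decimal("1"), Decimal("57000")),
    (date(2021, 5, 3), 17, Decimal("2"), Decimal("56000")),
    (date(2021, 4, 1), 99, Decimal("1"), Decimal("59000")),
]


def fifo_order(lots):
    """Order lots by date, then by file position, so that same-day buys
    are always consumed in the same sequence."""
    return sorted(lots, key=lambda lot: (lot[0], lot[1]))
```

Because the key is total over the inputs, the result does not depend on the order in which the lots were supplied.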

> I can snapshot the beancount inputs, and I can snapshot the beancount
> outputs, but what I am trying to understand is if there's a way to turn
> the beancount outputs (its choice of lot matching) around and use them
> as inputs (specified lot matches) next year.

I don't understand what you are asking. The automated lot matching is
not stored anywhere, thus you cannot "reuse" it when you change the lot
algorithm. However, the automated lot matching algorithm is only a
convenience: you don't need to use it and can always specify lots manually.

Specifying the lots explicitly is the easiest solution if you want some
"non-standard" lot matching or you want the lot matching to change over
time. More complex solutions may involve plugins to augment the
transactions read from your ledger with lot information computed with
some arbitrary algorithm.

Cheers,
Dan

Eric Altendorf

Jul 18, 2023, 2:08:02 PM
to bean...@googlegroups.com
Thanks so much for the detailed reply.  I'm slowly but surely wrapping my head around everything.

On Tue, Jul 18, 2023 at 10:35 AM Daniele Nicolodi <dan...@grinta.net> wrote:
[...]

> There are two modalities in which you can operate, distinguished by what
> you define to be the "source of truth":
>
> 1. you take the source of truth to be the input files for your
> importers, and the importers are "smart" enough to generate a
> Beancount ledger with all the information you need for further
> processing. In this case, you don't need to worry about mixing manual
> and automated steps: you can regenerate your ledger from the input files
> at any time.
>
> 2. you take the Beancount ledger to be the source of truth, and the
> importers are meant to produce a ledger that you manually edit. The
> importers framework supports this use case via the
> deduplication system: it uses some fuzzy matching between transactions
> to identify transactions in the imported batch that are already in your
> ledger.

Yes, this is a good framing.  (2) makes sense for a lot of use cases, but
I was aiming for something more like (1).  But we'll see, I can experiment
with both.

> What you seem to be after is a way to update the ledger entries with new
> information (either by running an improved version of the importers or
> by combining information from different sources). This is possible, but
> Beancount does not offer any facility for this. The main reason is that
> the logic for merging or updating the ledger entries is the core of the
> problem and it cannot be generalized. However, the "boring" code around
> the problem is readily available: you can easily parse a Beancount
> ledger, modify it, and serialize it back into a human-readable ledger;
> see the beancount.parser.parser and beancount.parser.printer modules.
> The only drawback in doing so is that the Beancount parser does not
> preserve whitespace and comments, thus you lose these when you go
> through this process. The autobean.refactor project should solve the
> issue, but I haven't had occasion to try it yet.

Makes sense.  I think for me it may depend on how diverse the manual
tweaks are -- if they're diverse, then as you say it's impractical to
generalize them.  If there are a small number of regular ones in my
limited domain, then I may be able to use a simple declarative language
to specify the tweaks to be applied automatically.
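A small declarative rule set of the kind described above could look like the sketch below. The rule format, field names, and account names are all hypothetical; the point is only that the tweaks live in data rather than in code.

```python
import re

# Each rule: regex patterns that must all match, and fields to overwrite.
RULES = [
    {"match": {"payee": r"^ACME BROKER"}, "set": {"account": "Assets:Broker:Acme"}},
    {"match": {"narration": r"(?i)staking reward"}, "set": {"account": "Income:Staking"}},
]


def apply_rules(record: dict, rules=RULES) -> dict:
    """Return a copy of record with every matching rule applied in order.
    Matching is always done against the original record."""
    result = dict(record)
    for rule in rules:
        matches = all(
            re.search(pattern, str(record.get(field, "")))
            for field, pattern in rule["match"].items()
        )
        if matches:
            result.update(rule["set"])
    return result
```

Because the input record is never mutated, the same rules can be re-applied on every rebuild from the raw import data.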

>> I can snapshot the beancount inputs, and I can snapshot the beancount
>> outputs, but what I am trying to understand is if there's a way to turn
>> the beancount outputs (its choice of lot matching) around and use them
>> as inputs (specified lot matches) next year.
>
> I don't understand what you are asking. The automated lot matching is
> not stored anywhere, thus you cannot "reuse" it when you change the lot
> algorithm.

Well, that was what I was asking -- is there a way to do that, and you have
answered "no" :)
 
> However, the automated lot matching algorithm is only a
> convenience: you don't need to use it and can always specify lots manually.
>
> Specifying the lots explicitly is the easiest solution if you want some
> "non-standard" lot matching or you want the lot matching to change over
> time. More complex solutions may involve plugins to augment the
> transactions read from your ledger with lot information computed with
> some arbitrary algorithm.

I have way too many transactions to manually assign lots.  But simple
HIFO/FIFO across all accounts and all time is not sufficient (if it were,
I'd just use one of the many online services for computing cap gains).
I need a way to experiment with lot selection algorithms, then when I
find one satisfactory enough to use it to file my taxes, "freeze" those
lot selections so that future experimentation doesn't change selections
that have already been reported to the IRS.

Since Beancount already does the computation to match lots, and
since it also provides a language for specifying manually matched lots,
it seems like it could be not that difficult to extend Beancount with the
functionality to emit its lot-matching decisions using the lot-matching
features of the language.  Does this seem reasonable?  If so, I can
look into how this might be done...
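As a rough sketch of what "emitting" a decision could mean: once a lot match is known, it can be written back as a posting with an explicit cost and acquisition date in braces, which is the syntax Beancount accepts for manually specified lots. The `LotMatch` type and `format_reduction` helper are hypothetical, and how the matches are captured from a run is left open.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class LotMatch:
    account: str
    units: Decimal  # positive number of units sold from this lot
    commodity: str
    cost: Decimal  # cost basis per unit of the matched lot
    cost_currency: str
    acquired: date


def format_reduction(match: LotMatch, price: Decimal) -> str:
    """Render one sale posting that pins the lot by cost and date."""
    return (
        f"  {match.account}  -{match.units} {match.commodity} "
        f"{{{match.cost} {match.cost_currency}, {match.acquired.isoformat()}}} "
        f"@ {price} {match.cost_currency}"
    )
```

Writing such postings into the ledger for a filed tax year would freeze those selections, while later years could still be booked by whatever method is being experimented with.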

I'm a little surprised this doesn't seem to be a more common problem;
I would think this would be useful to many traders.

thanks again,

eric

Martin Blais

Jul 30, 2023, 11:51:09 AM
to bean...@googlegroups.com
On Wed, Jul 19, 2023 at 2:08 AM Eric Altendorf <erical...@gmail.com> wrote:
> Thanks so much for the detailed reply.  I'm slowly but surely wrapping my head around everything.
>
> [...]
> I'm a little surprised this doesn't seem to be a more common problem;
> I would think this would be useful to many traders.

Lots to say but others have already filled in with great answers. Overall, based on this thread, it seems reasonably clear to me you'd be better served by building something custom for what you're trying to achieve. Kind of like what I've done with project "Johnny" but specialized to (a) crypto and (b) booking/matching rules that correspond to your particular tax reporting requirements. I don't think you'll get much from Beancount; you're likely to invest enough time that what it'll provide for you won't matter much.

That being said, some of the data structures could be used in implementing your booking algo, but then again, it's probably simpler to write ones that will do precisely what you need.

 