Beancount v3

Martin Blais

Jul 4, 2020, 2:34:50 AM
to Beancount
Hi,
Today I'm starting development on Beancount v3. 

This is going to be a pretty big change and will take a while. 
I've laid down the details in this document:
https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/

This file describes the new set of dependencies for it:

And there is a dedicated installation file for the in-development version:

The short version is that v3's core is going to be ported to C++ using a Bazel build, and the codebase will be sectioned between core and the rest.
I just merged the new build definition in master.

The current head will be branched as "v2" and maintained stable. 
It will build with both setup.py and Bazel.
Backward compatible fixes to it will be done there and merged into v3.
v3 development will occur on branch "master" and breaking changes will occur there.

Comments appreciated (on the docs, or here if you prefer),

Martin Michlmayr

Jul 4, 2020, 3:27:47 AM
to bean...@googlegroups.com
* Martin Blais <bl...@furius.ca> [2020-07-04 02:34]:
> This is going to be a pretty big change and will take a while.
> I've laid down the details in this document:
> https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/

This sounds all very exciting. Thanks for writing down your ideas and
for working on this.

Some comments:

> beancount/ingest/importers: someone could revive a repository of
> importer implementations, like what LedgerHub once aimed to become

I'd really like to see this. There are a number of importers on
GitHub but it would be nice to have one repository with high-quality
importers for beancount.

I'm not stepping up to maintain it, but I'm interested in
contributing.

> The conversion to Ledger and HLedger from Beancount now seems
> largely useless, I'm not sure anyone's using those. I'll probably
> move these to another repo, where they would eventually rot, or if
> someone cares, adopt them and maintain or evolve them.

I'm interested in maintaining the beancount to ledger conversion
scripts.

--
Martin Michlmayr
https://www.cyrius.com/

Kirill Goncharov

Jul 4, 2020, 4:20:11 AM
to Beancount
Hi,

What is the scope of expected breaking changes and how difficult will it be to migrate? I use beancount a lot and am particularly interested in API changes in the core, prices, ingest, loader and query subpackages.

Will beancount v3 be available on PyPI? It's not clear from the docs.

Patrick Ruckstuhl

Jul 4, 2020, 5:45:23 AM
to Martin Blais, Beancount
Hi,

I have to say I really like the vision and I see a lot of good ideas and points in there.

A couple of thoughts from my side.

I think splitting this up into different projects will be helpful and allow easier contributions. I think we should also have some "packaging" of the different parts together as a "distribution" for "pure users" e.g. as a docker container or maybe other formats.

I see a danger of trying to do too much. If possible I would try to have a smaller first step that converts over to the new architecture/project setup and then tries to add new and additional features.

Regards,
Patrick

Andre Engelbrecht

Jul 4, 2020, 6:09:34 AM
to Beancount
I specifically moved from ledger to beancount because:
  1. Strictly defined directives, currencies and accounts
  2. Python, and ability to easily extend with python
  3. Currency restrictions and currency related stuff
  4. Fava
Another bonus was that it's simple to install with just pip.

I would prefer not needing to spin up a docker container and hassle with
making it communicate over my network with other apps/plugins that might use
its API. It's an incredible barrier to entry for the non-developer. They already have to
know how to use pypi and potentially struggle with the python path. Making them
use docker now would be a bad move imo.

Hopefully this will be distributed as a binary, with a simple way to "install" plugins.

Disclosure: I've converted a couple of people from using webapps or spreadsheets to
use ledger. They love ledger despite some of the nuance. I've been trying to move them
over to beancount, but having a bit harder time because of the python dependency.
Fava and being able to run it locally has been a great driver to have people at least consider
installing python and giving beancount a test run.

I do hope with this update that distributing/installing beancount wouldn't get more complicated.
We already had to deal with that in the frontend javascript world :D

Martin Blais

Jul 4, 2020, 10:04:52 AM
to Beancount
On Sat, Jul 4, 2020 at 6:09 AM Andre Engelbrecht <litt.f...@gmail.com> wrote:
> I specifically moved from ledger to beancount because:
>   1. Strictly defined directives, currencies and accounts

Will not change, if anything it'll get stricter.

>   2. Python, and ability to easily extend with python

Bindings will be provided as today. I'm hoping to be able to expose the same or a very similar API.

>   3. Currency restrictions and currency related stuff

Not sure what you mean (conversions at price?), but it should be the same.

>   4. Fava
> Another bonus was that it's simple to install with just pip.

Short of baking binaries for every single platform, that will become a bit more difficult perhaps.
I think it's possible, these are details to be figured out later.

Beancount will still have a lot of Python involved, it's really just the core part that I want to speed up, so ideally the whole core could be wrapped up in an extension module (instead of just the parser).


> I would prefer not needing to spin up a docker container and hassle with
> making it communicate over my network with other apps/plugins that might use
> its API. It's an incredible barrier to entry for the non-developer. They already have to
> know how to use pypi and potentially struggle with the python path. Making them
> use docker now would be a bad move imo.

Docker will not be required; one should be able to build it locally and run it from there, and package the resulting binaries.
That's more of a convenience than anything else; actually others will build container support if they want to.


> Hopefully this will be distributed as a binary, with a simple way to "install" plugins.

As previously, I'll leave most of the packaging details to distribution maintainers.
For plugins, Python ones should work much as before.
Faster plugins implemented in C++ will require implementing some support for dlopen and an environment to build them; those details will be figured out later. It should be possible (I don't see any particular reason it wouldn't, especially since the data interface will be well specified via protocol buffers). I suspect that many users with large ledgers will want to convert some of their plugins to C++ to get the associated performance.

 
> Disclosure: I've converted a couple of people from using webapps or spreadsheets to
> use ledger. They love ledger despite some of the nuance. I've been trying to move them
> over to beancount, but having a bit harder time because of the python dependency.
> Fava and being able to run it locally has been a great driver to have people at least consider
> installing python and giving beancount a test run.

Installing Python is a pretty low bar IMHO...


> I do hope with this update that distributing/installing beancount wouldn't get more complicated.
> We already had to deal with that in the frontend javascript world :D

I hear you. Bazel should provide a pretty stable build experience.

  

James Cook

Jul 4, 2020, 10:54:27 AM
to bean...@googlegroups.com
On Sat, 4 Jul 2020 at 06:34, Martin Blais <bl...@furius.ca> wrote:
> Hi,
> Today I'm starting development on Beancount v3.
>
> This is going to be a pretty big change and will take a while.
> I've laid down the details in this document:
> https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/

I have a question about the currency accounts section.

For this example:

2020-06-02 * "Bought document camera"
  Expenses:Work:Conference      59.98 EUR @ USD
  Liabilities:CreditCard       -87.54 USD
  Equity:CurrencyAccounts:EUR  -59.98 EUR
  Equity:CurrencyAccounts:USD   87.54 USD

is the idea that I just enter the first three lines in my .beancount
file, and the last two postings will be implicitly added? If so, that
matches closely with what I already do, which is great.

I do not separate my trading accounts by currency, so I have been able
to get away with just writing something like this:

2020-06-02 * "Bought document camera"
  Expenses:Work:Conference      59.98 EUR
  Liabilities:CreditCard       -87.54 USD
  Equity:Trade

and let Beancount fill in the Equity:Trade amounts. This is not too
painful to do manually.

James

Martin Blais

Jul 4, 2020, 12:59:58 PM
to Beancount
On Sat, Jul 4, 2020 at 10:54 AM James Cook <jc...@cs.berkeley.edu> wrote:
> On Sat, 4 Jul 2020 at 06:34, Martin Blais <bl...@furius.ca> wrote:
> > Hi,
> > Today I'm starting development on Beancount v3.
> >
> > This is going to be a pretty big change and will take a while.
> > I've laid down the details in this document:
> > https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/
>
> I have a question about the currency accounts section.
>
> For this example:
>
> 2020-06-02 * "Bought document camera"
>   Expenses:Work:Conference      59.98 EUR @ USD
>   Liabilities:CreditCard       -87.54 USD
>   Equity:CurrencyAccounts:EUR  -59.98 EUR
>   Equity:CurrencyAccounts:USD   87.54 USD
>
> is the idea that I just enter the first three lines in my .beancount
> file, and the last two postings will be implicitly added?

Yes.

 
> If so, that
> matches closely with what I already do, which is great.
>
> I do not separate my trading accounts by currency, so I have been able
> to get away with just writing something like this:
>
> 2020-06-02 * "Bought document camera"
>   Expenses:Work:Conference      59.98 EUR
>   Liabilities:CreditCard       -87.54 USD
>   Equity:Trade
>
> and let Beancount fill in the Equity:Trade amounts. This is not too
> painful to do manually.

There's a downside to the method you're using: there's no price and mistakes will go unchecked.
It would be more robust to use @, let Beancount do its thing, and to write a plugin to strip the price and insert the postings.

To be fair, if you used two out of the 3 numbers in a conversion at price, that would also pass because interpolation would fill in whatever number is needed. An idea for validation could be to check that inferred prices aren't too far from the prior price point (in time) in the price database. I've added this idea to the document.
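
A minimal sketch of that validation idea, assuming the price database can be viewed as a date-sorted list of (date, price) pairs for one currency pair. The function names, the 10% tolerance and the sample price history are all hypothetical; this does not use Beancount's own price map API.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal


def prior_price(prices, when):
    """Return the most recent (date, price) entry at or before `when`, or None."""
    dates = [d for d, _ in prices]
    i = bisect_right(dates, when)
    return prices[i - 1] if i else None


def price_is_plausible(prices, when, inferred, tolerance=Decimal("0.10")):
    """True if `inferred` is within a relative `tolerance` of the prior price."""
    prior = prior_price(prices, when)
    if prior is None:
        return True  # no earlier price point to compare against
    _, reference = prior
    return abs(inferred - reference) / reference <= tolerance


# Hypothetical EUR -> USD price history, sorted by date.
eur_usd = [
    (date(2020, 5, 1), Decimal("1.10")),
    (date(2020, 6, 1), Decimal("1.12")),
]

# Price implied by two interpolated amounts (67.80 USD paid for 60.00 EUR).
inferred = Decimal("67.80") / Decimal("60.00")
ok = price_is_plausible(eur_usd, date(2020, 6, 2), inferred)
```

A check like this would only warn about amounts that imply an unusual rate; it cannot tell which of the numbers was mistyped.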




James Cook

Jul 4, 2020, 1:36:09 PM
to bean...@googlegroups.com
Yes, plugin support would be good. I think a simple plugin that sees

Expenses:Work:Conference 60.00 EUR @ 2 USD

and transforms that into

Expenses:Work:Conference 60.00 EUR
Equity:Trade -60.00 EUR
Equity:Trade 120.00 USD

would match my use case pretty well. I'm not sure if that's exactly
what the existing trading accounts plugin you linked to does. If it's
not, it's simple enough I might just write it.
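
The transformation described above could be sketched roughly as follows. The `Posting` dataclass and `insert_trade_postings` function are hypothetical stand-ins for illustration, not Beancount's own data types or plugin interface; a real plugin would operate on the entries Beancount passes to it.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Posting:
    """Simplified stand-in for a posting (not Beancount's own type)."""
    account: str
    number: Decimal
    currency: str
    price: Optional[Tuple[Decimal, str]] = None  # (per-unit price, price currency)


def insert_trade_postings(postings: List[Posting],
                          trade_account: str = "Equity:Trade") -> List[Posting]:
    """Strip '@ price' annotations and add the two balancing trade postings."""
    out = []
    for p in postings:
        if p.price is None:
            out.append(p)
            continue
        rate, price_currency = p.price
        out.append(Posting(p.account, p.number, p.currency))           # price removed
        out.append(Posting(trade_account, -p.number, p.currency))      # offsets the units
        out.append(Posting(trade_account, p.number * rate, price_currency))  # value at price
    return out


result = insert_trade_postings([
    Posting("Expenses:Work:Conference", Decimal("60.00"), "EUR",
            (Decimal("2"), "USD")),
])
```

Postings without a price pass through unchanged, so the other leg of the transaction (for example the credit card posting in USD) still balances against the inserted USD trade posting.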

James

Martin Blais

Jul 4, 2020, 1:55:41 PM
to Beancount
On Sat, Jul 4, 2020 at 1:36 PM James Cook <jc...@cs.berkeley.edu> wrote:
> > There's a downside to the method you're using: there's no price and mistakes will go unchecked.
> > It would be more robust to use @, let Beancount do its thing, and to write a plugin to strip the price and insert the postings.
>
> Yes, plugin support would be good. I think a simple plugin that sees
>
>   Expenses:Work:Conference      60.00 EUR @ 2 USD
>
> and transforms that into
>
>   Expenses:Work:Conference      60.00 EUR
>   Equity:Trade                 -60.00 EUR
>   Equity:Trade                 120.00 USD
>
> would match my use case pretty well. I'm not sure if that's exactly
> what the existing trading accounts plugin you linked to does. If it's
> not, it's simple enough I might just write it.

Yes, but there are a few gotchas, see comments:
It's 95% of the way there. 
In v3 I'd like to make this the main method and to iron out all the remaining issues.





Justus Pendleton

Jul 5, 2020, 2:48:00 AM
to Beancount
On Saturday, July 4, 2020 at 1:34:50 PM UTC+7, Martin Blais wrote:
> Hi,
> Today I'm starting development on Beancount v3.

I'm excited to see (hopefully, fingers crossed!) more active development on beancount restarting. Even though performance hasn't been an issue for me and beancount works for 99% of my use cases, there are still lots of good ideas for improvements out there. My comments here focus on trying to increase the number of contributors, which you mention in the document, though I have a feeling the earliest days of a rewrite aren't really the best time to push for dramatic increases in contributors.

As you identify, the biggest problem facing beancount from a project management perspective is "Martin doesn't have enough hours". So I was thinking...what are things I'd do in my professional life when faced with that situation. Usually it involves some combination of delegation to others with important issues bubbling up for "sign off", high-level summary/reporting, and semi-regular synchronous check-ins. Which of those can we adapt to beancount? The overall general idea is to increase the cadence of activity by having more eyes & voices via devolved & limited kinds of authority while Martin builds up confidence in "lieutenants" before graduating to full-on BDFL. Here are some half-baked ideas:

* Have a "bugmaster". This is a community member who isn't even necessarily that acquainted with the code base (perhaps not even technical at all) but they are engaged with the project and can help reply to new bug reports quickly and triage them. Close duplicates, ask for more information & reproduction steps quickly (i.e. while the reporter is still paying attention), and help bring important issues to the attention of Martin & other people doing development.
* Have "code reviewers" who do not (yet) have commit rights. Looking over new PRs, pointing out better ways to utilize the APIs, or even collaborating on adding tests. The idea is to reduce the amount of effort Martin has to do to accept PRs and provide a stepping stone for contributors to full-write contributions trust levels.
* Make providing tests a "two-stage commit". As Martin rightly notes, providing tests is important in any long-lived project. But it also works against the "first time contributor rush" of having a PR accepted. Are there ways we can convert more of those contributors and get them more engaged? Can we accept the original PR but track that tests need to be added, either by the original contributor in a subsequent PR or by another contributor? Of course, this runs the risk that "core contributors" do nothing but write tests for other people. I said my thoughts were only half-baked!
* Martin mentions "monthly team meetings" which I think is a good idea -- it provides a synchronization point for things like the bugmaster or the code reviewer to agitate for action on something that seems to have been stalled. Though I'm less sure about the exact style & format. Monthly versus quarterly? Zoom video style versus Discord text style? I think we'd have to see some proposed agendas of what such a team meeting might be about to say much more.

Stefano Zacchiroli

Jul 6, 2020, 5:00:24 AM
to bean...@googlegroups.com
On Sat, Jul 04, 2020 at 02:34:35AM -0400, Martin Blais wrote:
> Today I'm starting development on Beancount v3.
>
> This is going to be a pretty big change and will take a while.
> I've laid down the details in this document:
> https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/

This is very exciting. And, as usual, your design documents are very
interesting and insightful to read. I took some time to read through all
of them and I'm sharing some thoughts of mine about them below.

==================================


Directives
----------

Having as output of beancount core two streams of clearly separated
incomplete/syntactic v. complete/semantic directives sounds like a great
approach. In terms of terminology, you might use the "raw v. cooked"
terminology (which I've picked up from proof assistants years ago, but
which I find fitting here; YMMV). It's not yet clear to me if both
streams will be accessible to plugins (I think they should). And, if
they are, how will they be interleaved: a single stream with both raw
and cooked transactions? Two separate streams?


Parser
------

You mention you're gonna keep using flex/bison, which is for sure well
known technology. However, the expressivity of bison grammars makes it
kinda hard to hack on existing parsers, raising the barrier for
contributors. Have you considered switching to PEG parsing?

Unrelated (but still on parsing), I don't understand your point about
getting rid of the cache. Sure, we all hope it will no longer be needed for
interactive use, but it would still be useful for people building small
services on top of relatively static Beancount ledgers; including Fava.
Also, as the output of Beancount core is gonna be streams of protobufs,
those will be trivial to serialize, and also cross language, why not
imagine a cache of protobufs serialized on disk?
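
A rough sketch of such an on-disk cache, assuming the core can hand back its output as serialized bytes. The `cached_parse` helper, the `.pb` file naming and the `parse` callable are hypothetical, and a real cache would also need to account for included files and plugin configuration.

```python
import hashlib
from pathlib import Path


def cached_parse(source: Path, cache_dir: Path, parse) -> bytes:
    """Return the serialized output for `source`, reusing a cached copy if present.

    `parse` is assumed to take a path and return serialized bytes
    (for example a serialized protobuf message).
    """
    # Key the cache on the file contents, so any edit invalidates the entry.
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{digest}.pb"
    if cache_file.exists():
        return cache_file.read_bytes()
    payload = parse(source)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(payload)
    return payload
```

Because the stored payload is plain serialized bytes, any language with a protobuf implementation could read the same cache files.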

The rework of includes sounds great. We have discussed it on the list in
the past, so I guess it's your goal, but as it's not explicitly stated
in the design doc let me repeat it here. I think the goal should be
"include invariance", i.e., one should always be able to take an
existing Beancount ledger in a single file and break it down in an
arbitrary amount of smaller ledger files that include each other,
without any semantic change. (The stated goal in your doc of being able
to declare plugins elsewhere than in the main file will derive from
this, but this principle is more general.)

The main feature I lack to have feature parity with Ledger-CLI is the
ability to add tags to individual transaction legs. I'm assuming this
will go hand-in-hand with relaxing the distinction between metadata/
tags/ links (by making them syntactic sugar for metadata, I'm guessing),
which is great, thanks!


Ulque
-----

This sounds like an exciting project.

In addition to support for balance columns and totals, there are a bunch
of other features that would be very welcome, like the ability to filter
out 0 columns, or to add derived columns (e.g., differences between
columns, to compute P&L in investments). I don't know how much you plan
to build on top of Pandas (which will trivially offer many of these),
but it is absolutely brilliant to see the analogy between the two
worlds.

Something I'm surprised not to have seen mentioned here is your vision
(which we discussed a while ago on list) that the hierarchical nature of
the account hierarchy is kinda arbitrary and gets in the way (e.g., one
often wants to pivot around from "Expenses:Home:Repair +
Expenses:Car:Repair" to "Expenses:Repair:Home + Expenses:Repair:Car" as
there is no right or wrong hierarchy there). Is this idea of being able
to pivot around the account hierarchy, considering each component a
facet of sort, part of your plans for Ulque, or is it out of scope?
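
To make the pivoting idea concrete, here is a small sketch that treats each account component as a positional facet and regroups balances by reordering or dropping components. The `pivot` function and the sample balances are hypothetical, and the sketch assumes every account has the same number of components.

```python
from collections import defaultdict
from decimal import Decimal


def pivot(balances, order):
    """Regroup balances by selecting account components in the given order.

    `order` lists the component indices to keep; components left out are
    aggregated over.
    """
    out = defaultdict(Decimal)
    for account, amount in balances.items():
        parts = account.split(":")
        key = ":".join(parts[i] for i in order)
        out[key] += amount
    return dict(out)


balances = {
    "Expenses:Home:Repair": Decimal("250.00"),
    "Expenses:Car:Repair": Decimal("400.00"),
    "Expenses:Car:Fuel": Decimal("60.00"),
}

by_kind = pivot(balances, [0, 2, 1])   # Expenses:Repair:Home, Expenses:Repair:Car, ...
kind_totals = pivot(balances, [0, 2])  # Expenses:Repair, Expenses:Fuel
```

Real ledgers have accounts of varying depth, so named facets (rather than positions) would probably be needed in practice.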


Code quality
------------

Typing: outside of Google I've the feeling that the state-of-the-art
static type checker is Mypy. I've myself migrated a substantial codebase
to it and it's a vibrant environment (with a lot of involvement from
Guido himself) and active development that goes hand in hand with the
refinement of the type system (via periodic PEPs). I'd be wary of going
with pytype instead of Mypy, even though I realize that the type annotations
are (supposed to be) compatible.

How about automated code formatting via Black?
(https://github.com/psf/black) I've recently switched a
substantial code base to it and I find it pretty life changing. It would also
help contributors I think, which is one of your worthwhile meta-goals
for v3.


Strict payee
------------

YAY, everything that makes it possible to have even more automated sanity
checks is a welcome addition. I wonder if a relaxed policy where any
new payee is OK on first use even if undeclared, unless it's "near" (as
string distance) to a previous one would work well as a default policy.
But that's probably a matter for a plugin anyway...
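
A minimal sketch of that relaxed policy, using the standard library's `difflib.SequenceMatcher` as the string-distance measure. The `check_payee` function and the 0.85 similarity threshold are hypothetical choices for illustration, not anything specified in the design document.

```python
from difflib import SequenceMatcher


def check_payee(payee, known, threshold=0.85):
    """Accept exact or clearly new payees; reject near-duplicates.

    Returns (accepted, suggestion). A new payee is added to `known`
    unless it is suspiciously close to an existing one.
    """
    if payee in known:
        return True, None
    for other in sorted(known):
        ratio = SequenceMatcher(None, payee.lower(), other.lower()).ratio()
        if ratio >= threshold:
            return False, other  # likely a typo of an existing payee
    known.add(payee)
    return True, None


known_payees = {"Whole Foods", "Shell"}
typo_result = check_payee("Whole Fods", known_payees)
new_result = check_payee("Trader Joe's", known_payees)
```

With these inputs the misspelled payee is rejected with the existing name as a suggestion, while the genuinely new payee is accepted on first use.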


Unsigned debit and credit
-------------------------

This is a very concrete need, which I routinely struggle with when
showing accounting reports extracted from Beancount (or Fava) to other
family members. But I'm surprised you mention it as a potential feature
for Beancount itself. Wouldn't it belong to front-ends, like Fava (or
maybe Ulque in the future), instead? In the view of "Beancount as an
accounting calculator", which I've always adhered to, that seems to
belong elsewhere.
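
As an illustration of how this could live purely in a presentation layer, the sketch below converts signed amounts into unsigned debit and credit columns at render time. It assumes the usual convention that positive posting amounts are debits and negative ones are credits; `debit_credit` and `render` are hypothetical helpers, not part of Beancount or Fava.

```python
from decimal import Decimal


def debit_credit(amount):
    """Split a signed amount into unsigned (debit, credit) columns."""
    if amount >= 0:
        return amount, None
    return None, -amount


def render(postings):
    """Render (account, signed amount) pairs as a debit/credit table."""
    lines = [f"{'Account':<28}{'Debit':>10}{'Credit':>10}"]
    for account, amount in postings:
        debit, credit = debit_credit(amount)
        debit_text = "" if debit is None else str(debit)
        credit_text = "" if credit is None else str(credit)
        lines.append(f"{account:<28}{debit_text:>10}{credit_text:>10}")
    return "\n".join(lines)


table = render([
    ("Expenses:Work:Conference", Decimal("59.98")),
    ("Liabilities:CreditCard", Decimal("-87.54")),
])
```

The stored data keeps its signs; only the report changes, which is why a front-end seems a natural home for it.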


bean-sed
--------

This is something which is not in your design documents, but seems
important enough to me to be mentioned in light of a new Beancount
generation. In plain text accounting we maintain two things at once: the
semantic information captured in our books, and the syntax of those
books, which matters more than the syntax of paper-based books (which is
why we use Git to version and often allow ourselves to amend/curate very
old transactions, which is something you never do with paper-based
books, and for sure not reaching further in the past before the most
recent book closure).

But our textual books grow larger and we often need to perform batch
changes. E.g., split an account category, merge some, rename accounts,
etc., spanning all our books. Some of these operations are purely
syntactic, some have impact on the semantics of our accounting data. I
think we need a tool to automate this, more powerful than search and
replace in vim/emacs, and with some knowledge of the data it's
manipulating.

The current style of plugins is not useful for this need. It is OK to
patch transactions/directives post parsing, but cannot reflect those
changes back to the textual books.

Would something like this fit your vision for Beancount 3? In
particular, I'd like to know if the raw/syntactic directives you imagine
coming out of the new Beancount core would be close enough to the book
concrete syntax to allow manipulation such as meddling with spacing.
Provided that, and a good pretty printer for concrete syntax, a
"bean-sed" project with a dedicated manipulation language can probably
be created and maintained separately from core.


==================================


> The short version is that v3's core is going to be ported to C++ using a
> Bazel build, and the codebase will be sectioned between core and the rest.
> I just merged the new build definition in master.

Bazel is indeed a great build system, but you should know that, at least
for now, it is not in Debian/Ubuntu yet. So for the time being it will
be impossible to ship Beancount v3 on those distros (and any other
Debian-based distro) until Bazel itself is part of Debian. Work is
ongoing (see: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782654
), but I'm unable to guess when it will actually happen.


Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

Stefano Zacchiroli

Jul 6, 2020, 7:58:10 AM
to bean...@googlegroups.com
On Sat, Jul 04, 2020 at 03:09:34AM -0700, Andre Engelbrecht wrote:
> *Disclosure*: I've converted a couple of people from using webapps or
> spreadsheets to use ledger. They love ledger despite some of the
> nuance. I've been trying to move them over to beancount, but having a
> bit harder time because of the python dependency. Fava and being able
> to run it locally has been a great driver to have people at least
> consider installing python and giving beancount a test run.

Same for me. I've converted some people to Beancount thanks to the fact
that "pip install beancount" was simple enough (although indeed still a
barrier for non-dev people) for them to install. Bazel will be a much
higher barrier for people to install/use Beancount. I totally understand
switching to it from a dev point of view, but it would be great to
maintain the ability to install via pip.

I've used pypi to ship weird python code depending on a huge java
bundle, and I know it works well. If there is a way to ship (and then
select) static binaries for the non-Python parts for multiple
architectures (the most popular ones) via pip, I think it'd be totally
worth it in terms of user base.

Stefano Zacchiroli

Jul 6, 2020, 10:13:42 AM
to bean...@googlegroups.com
On Sat, Jul 04, 2020 at 11:48:00PM -0700, Justus Pendleton wrote:
> * Have a "bugmaster". This is a community member who isn't even necessarily
> that acquainted with the code base (perhaps not even technical at all) but
> they are engaged with the project and can help reply to new bug reports
> quickly and triage them. Close duplicates, ask for more information &
> reproduction steps quickly (i.e. while the reporter is still paying
> attention), and help bring important issues to the attention of Martin &
> other people doing development.

FWIW, based on my experience in maintaining very popular packages in
Debian (a long time ago...), this idea is appealing but doesn't work
well in practice. Bug triaging doesn't, in the end, take a lot of time in
comparison to development, and a maintainer who knows the code well is
going to be far more efficient than someone less familiar with the code. So
you don't gain much and you also incur the risks of mis-triaging bugs,
that are gonna cost you time to re-triage properly later. YMMV.

> * Martin mentions "monthly team meetings" which I think is a good idea
> -- it provides a synchronization point for things like the bugmaster
> or the code reviewer to agitate for action on something that seems to
> have been stalled. Though I'm less sure about the exact style &
> format. Monthly versus quarterly? Zoom video style versus Discord
> text style? I think we'd have to see some proposed agendas of what
> such a team meeting might be about to say much more.

This one is pretty cool on the other hand. In my experience what works
super well are in-person hackathons, ideally over more than 1 day. I've
never tried the pure online version of them, but I guess by now there
should be some experience in how to manage them effectively (assuming
they could be made to work). Not sure if just short meetings (with no
actual coding time) would be worth it --- don't we all already have way
too many meetings anyway? --- but it might be worth a try.

I'm not sure I'll have much to contribute, given my very sporadic track
record of contributing to Beancount, but I'll be happy to try if any of
these happens.

Daniele Nicolodi

Jul 6, 2020, 4:25:20 PM
to bean...@googlegroups.com
Hello Justus,

open source projects evolve and function in very different ways from
more structured enterprises. They are often described as do-ocracies:
the control is in the hands of those that step up to do things that go
in what is perceived to be the direction of the project as a whole.
There should be no need to assign official roles for people to do some
work as there should not be expectations on what should be the time
commitment of anyone involved, Martin included.

Beancount is not in a stage in which there are too many contributions
and the bottleneck is the organization of the contributions, but it is a
successful project that works well enough for most people (as recent
messages on this mailing list express).

On 05/07/2020 00:48, Justus Pendleton wrote:
> * Have a "bugmaster". This is a community member who isn't even
> necessarily that acquainted with the code base (perhaps not even
> technical at all) but they are engaged with the project and can help
> reply to new bug reports quickly and triage them. Close duplicates, ask
> for more information & reproduction steps quickly (i.e. while the
> reporter is still paying attention), and help bring important issues to
> the attention of Martin & other people doing development.

Stefano already commented on this and I completely share his doubts on
the effectiveness of this. In addition to that I would like to point out
that as far as I can tell there are no bugs affecting existing
functionality in the bug tracker, but only requests for extended
functionality.

It is very easy to quickly describe what is perceived as a desired
feature in a ticket in the bug tracker (or as a few lines patch),
however it is not as easy to understand all implications and evaluate
the feature in the context of the vision for the project.

If someone feels to have a sufficient understanding of the project and
of the direction in which it is aimed, nothing prevents them from commenting
on specific issues or pull requests. I do so when I bump into something I find
interesting (either because it affects my use of Beancount or because I
find the problem stimulating).

> * Have "code reviewers" who do not (yet) have commit rights. Looking
> over new PRs, pointing out better ways to utilize the APIs, or even
> collaborating on adding tests. The idea is to reduce the amount of
> effort Martin has to do to accept PRs and provide a stepping stone for
> contributors to full-write contributions trust levels.

We are not swamped with PRs. At the moment there are only three open PRs
with important changes to the codebase and they have been open for less
than a week (I am the author of all of them). If anyone (especially
someone with a deeper knowledge of Flex/Bison than me) wants to comment
on them they are welcome, but I don't see PR review as being the factor
that hinders Beancount development.

> * Make providing tests a "two-stage commit". As Martin rightly notes,
> providing tests is important in any long-lived project. But it also
> works against the "first time contributor rush" of having a PR accepted.
> Are there ways we can convert more of those contributors and get them
> more engaged? Can we accept the original PR but track that tests need to
> be added, either by the original contributor in a subsequent PR or by
> another contributor. Of course, this runs the risk that "core contributors"
> do nothing but write tests for other people. I said my thoughts were
> only half-baked!

Please, no. How is someone expected to evaluate the quality and
correctness of a change if there are no tests? Who is going to write the
tests for the added functionality? Writing tests requires a precise
understanding of the functionality, so it is much easier for the original
author to write them. Also, writing tests is not the most fun part of
development, I definitely would not spend my free time doing it for
features I have not contributed.

I actually would go the other way around and require that PRs come also
with documentation updates or additions. However this is harder to track
with the source of the documentation being in Google Docs.

Beancount is a mature project and I don't see the reason to encourage
drive-by submission of half-baked PRs. This would only increase the
burden on the maintainer(s).

> * Martin mentions "monthly team meetings" which I think is a good idea
> -- it provides a synchronization point for things like the bugmaster or
> the code reviewer to agitate for action on something that seems to have
> been stalled. Though I'm less sure about the exact style & format.
> Monthly versus quarterly? Zoom video style versus Discord text style? I
> think we'd have to see some proposed agendas of what such a team meeting
> might be about to say much more.

I agree with Stefano here: sprint-like meetings (to use the jargon of
the Zope community, or hackathons if you prefer) are probably effective
in pushing development forward. Other formats may help but it depends on
many factors. Face-to-face (virtual or physical) meetings are important
to put faces behind messages on the mailing list (or Github tickets or
PRs). Because of the human nature, this allows for more effective
communication subsequently, but I don't know if a regular meeting with
many individuals involved would provide much benefit.

Finally, please no proprietary platforms. We are developing a free
software project and the tools required for participating should ideally
also be free. I don't see the reason to use Zoom or Discord when so many
free (as in free speech) alternatives exist.

Cheers,
Dan

Daniele Nicolodi

Jul 6, 2020, 6:18:46 PM7/6/20
to bean...@googlegroups.com
On 06/07/2020 03:00, Stefano Zacchiroli wrote:
> You mention you're gonna keep using flex/bison, which is for sure well
> known technology. However, the expressivity of bison grammars make it
> kinda hard to hack on existing parsers, raising the barrier for
> contributors. Have you considered switching to PEG parsing?

I toyed with the idea of writing a PEG parser for Beancount syntax, but
I haven't found a nice PEG parser generator. The Beancount syntax is
also fairly regular, thus the Bison grammar is actually not that bad to
read. Also, there is the desire to keep the v2 and v3 parser definitions
as close as possible.

> The rework of includes sounds great. We have discussed it on the list in
> the past, so I guess it's your goal, but as it's not explicitly stated
> in the design doc let me repeat it here. I think the goal should be
> "include invariance", i.e., one should always be able to take an
> existing Beancount ledger in a single file and break it down in an
> arbitrary amount of smaller ledger files that include each other,
> without any semantic change. (The stated goal in your doc of being able
> to declare plugins elsewhere than in the main file will derive from
> this, but this principle is more general.)

I have done some work on the parser and I would like to lift the current
limits on includes also for v2. Once the parser rework lands, it should
be fairly straightforward.

> The main feature I lack to have feature parity with Ledger-CLI is the
> ability to add tags to individual transaction legs. I'm assuming this
> will go hand-in-hand with relaxing the distinction between metadata/
> tags/ links (by making them syntactic sugar for metadata, I'm guessing),
> which is great, thanks!

This is on the to do list.

> In
> particular, I'd like to know if the raw/syntactic directives you imagine
> coming out of the new Beancount core would be close enough to the book
> concrete syntax to allow manipulation such as meddling with spacing
> Provided that, and a good pretty printer for concrete syntax, a
> "bean-sed" project with a dedicated manipulation language can probably
> be created and maintained separately of core.

I am far from being a parsing expert, but I think having the parser emit
a syntax tree suitable to reconstruct the input file without
modifications is going to be very complex: the scanner would need to
emit many more tokens for input that is now simply ignored (ie trailing
whitespace) and the grammar would need to handle those, making it more
complex. The representation of the parsing results would also be more
complex. A lot of work to support a single tool.

I think that a tool like the one you describe should use the syntax tree
and the actual file content in combination to rewrite the input file:
the syntax tree allows to identify which elements need to be modified
and from these the position in the input files where text changes need
to happen. Sounds complex, but I believe less complex than augmenting
the parser.
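
A minimal sketch of that approach, assuming the parser reports a 1-based line number for each directive (Beancount v2 records one in each entry's metadata as `lineno`). The `Edit` type and the `apply_edits` helper are hypothetical names introduced here for illustration, not part of Beancount:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Replace one source line, identified by its 1-based line number."""
    lineno: int
    new_text: str


def apply_edits(source: str, edits: list[Edit]) -> str:
    """Patch only the targeted lines; all other text is left as written."""
    lines = source.splitlines(keepends=True)
    for edit in edits:
        old = lines[edit.lineno - 1]
        # Keep the original line ending of the line being replaced.
        ending = old[len(old.rstrip("\r\n")):]
        lines[edit.lineno - 1] = edit.new_text + ending
    return "".join(lines)


ledger = (
    '2020-07-06 * "Grocer"   ; weekly shop\n'
    "  Expenses:Food      12.00 USD\n"
    "  Assets:Cash\n"
)
# Pretend the syntax tree told us that the posting on line 2 must change.
patched = apply_edits(ledger, [Edit(2, "  Expenses:Groceries 12.00 USD")])
print(patched)
```

The comment and the odd spacing on the first line survive because the tool never re-prints lines it was not asked to change.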

> Bazel is indeed a great build system, but you should know that, at least
> for now, it is not in Debian/Ubuntu yet. So for the time being it will
> be impossible to ship Beancount v3 on those distros (and any other
> Debian-based distro) until Bazel itself is part of Debian. Work is
> ongoing (see: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782654
> ), but I'm unable to guess when it will actually happen.

I had a similar reaction to Bazel. My secret plan is to maintain a
parallel build system based on Meson. I did a quick reality check and it
seems that all prerequisites can be built with Meson. I think Meson is
more non-developer and distribution friendly than Bazel.

Cheers,
Dan

Martin Blais

Jul 7, 2020, 12:43:06 AM7/7/20
to Beancount
On Sat, Jul 4, 2020 at 3:27 AM Martin Michlmayr <t...@cyrius.com> wrote:
* Martin Blais <bl...@furius.ca> [2020-07-04 02:34]:
> This is going to be a pretty big change and will take a while.
> I've laid down the details in this document:
> https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/

This sounds all very exciting.  Thanks for writing down your ideas and
for working on this.

Some comments:

> beancount/ingest/importers: someone could revive a repository of
> importer implementations, like what LedgerHub once aimed to become"

I'd really like to see this.  There are a number of importers on
GitHub but it would be nice to have one repository with high-quality
importers for beancount.

This is in some sense a bit of a tenuous idea. I like it in theory, but it might make little sense to share importers between people of different geographical areas, except some very generic importers (e.g. any CSV file with auto-detect column types). It's unclear to me that a single world-wide repo would find a lot of reuse and the costs may outweigh the benefits. Even between West coast and East coast USA the set of banks differs enough.
 
 
I'm not stepping up to maintain it, but I'm interested in
contributing.

> The conversion to Ledger and HLedger from Beancount now seems
> largely useless, I'm not sure anyone's using those. I'll probably
> move these to another repo, where they would eventually rot, or if
> someone cares, adopt them and maintain or evolve them.

I'm interested in maintaining the beancount to ledger conversion
scripts.

Please go forth ... ledger2beancount seems like a fine place for it.

 

--
Martin Michlmayr
https://www.cyrius.com/

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.

Martin Blais

Jul 7, 2020, 12:46:40 AM7/7/20
to Beancount
On Sat, Jul 4, 2020 at 4:20 AM Kirill Goncharov <kdgon...@gmail.com> wrote:
Hi,

What is the scope of expected breaking changes and how difficult will it be to migrate? I use beancount a lot and I'm particularly interested in API changes in core, prices, ingest, loader and query subpackages.

Syntax and semantics will very likely remain the same.
The API is also likely to remain very close, but more of the code will be bound to Python from C++ (no guarantees, pybind11 may bring some challenges). In any case, the schema will have a similar shape and so it should be pretty straightforward to port existing code.
The query tool will change a lot, and by that, I mean improve. It's also going to be rewritten from the ground up and be better tested, and also have types.


Will beancount v3 be available on PyPI? It's not clear from the docs.

Not 100% sure, I'd like to; we'll have to figure out this detail later. 
In theory if we can build binaries, we should be able to package it.
 

 

Stefano Zacchiroli

Jul 7, 2020, 3:08:30 AM7/7/20
to bean...@googlegroups.com
On Mon, Jul 06, 2020 at 04:18:41PM -0600, Daniele Nicolodi wrote:
> > In particular, I'd like to know if the raw/syntactic directives you
> > imagine coming out of the new Beancount core would be close enough
> > to the book concrete syntax to allow manipulation such as meddling
> > with spacing Provided that, and a good pretty printer for concrete
> > syntax, a "bean-sed" project with a dedicated manipulation language
> > can probably be created and maintained separately of core.
>
> I am far from being a parsing expert, but I think having the parser
> emit a syntax tree suitable to reconstruct the input file without
> modifications is going to be very complex: the scanner would need to
> emit many more tokens for input that is now simply ignored (ie
> trailing whitespace) and the grammar would need to handle those,
> making it more complex. The representation of the parsing results
> would also be more complex. A lot of work to support a single tool.
>
> I think that a tool like the one you describe should use the syntax
> tree and the actual file content in combination to rewrite the input
> file: the syntax tree allows to identify which elements need to be
> modified and from these the position in the input files where text
> changes need to happen. Sounds complex, but I believe less complex
> than augmenting the parser.

All good points. It's indeed a bit tricky (I've done it in the past for
an unrelated project) and it boils down to keeping around both a
concrete syntax tree (with all the spacing, for instance) and an
abstract syntax tree. The former is particularly annoying because to
have one that round-trips with textual input you often have to adapt the
lexer too.

I agree it looks like quite a burden for a single tool --- even though I
think it's a very important one to have, due to the intrinsic nature of
plain text accounting. And also, the alternative looks worse to me:
people just sed or search/replace in their ledgers messing up spacing or
worse. I don't think the user experience in doing that is great, and
that affects our user base.

There is an alternative though. Define a single canonical way to indent
Beancount textual ledgers and have a tool like Python's Black that
reformats a Beancount ledger (or even isolated directives) that way.
Right now there are some ambiguities, e.g., do you indent a metadata
attached to a transaction leg or not? Do you put them on the same line
of the transaction leg or on the line beneath it? Etc. And it gets
tricky with comments (which generally you want to keep as-is), both in
general and even more so when they are mixed with tags. If you have such
an "opinionated" pretty printer you can do all your changes on the AST
and just pretty print your result.

Daniele Nicolodi

Jul 7, 2020, 4:00:38 AM7/7/20
to bean...@googlegroups.com
Ciao Stefano,

On 07/07/2020 01:08, Stefano Zacchiroli wrote:

> I agree it looks like quite a burden for a single tool --- even though I
> think it's a very important one to have, due to the intrinsic nature of
> plain text accounting. And also, the alternative looks worse to me:
> people just sed or search/replace in their ledgers messing up spacing or
> worse. I don't think the user experience in doing that is great, and
> that affects our user base.

I do mechanical transformations to my ledger in Emacs and with a tiny
bit of Emacs Lisp all transformations I've done so far were very easy to
implement. I can see how an imperative way to perform those operations
may be interesting for those using a less programmable text
editor, but I am not sure I would design Beancount around that.

> There is an alternative though. Define a single canonical way to indent
> Beancount textual ledgers and have a tool like Python's Black that
> reformats a Beancount ledger (or even isolated directives) that way.
> Right now there are some ambiguities, e.g., do you indent a metadata
> attached to a transaction leg or not? Do you put them on the same line
> of the transaction leg or on the line beneath it? Etc. And it gets
> tricky with comments (which generally you want to keep as-is), both in
> general and even more so when they are mixed with tags. If you have such
> an "opinionated" pretty printer you can do all your changes on the AST
> and just pretty print your result.

In a way Emacs is my way to enforce a consistent formatting. I see the
value in an automatic indentation and re-formatting tool for Beancount,
but I am not sure it should live in Beancount itself.

Cheers,
Dan

geof...@gmail.com

Jul 8, 2020, 2:39:49 PM7/8/20
to Beancount
After reading through the v3 design document, one thing that wasn't clear to me is whether it will be possible to access just the parser without running booking from the exposed API.

Today, I have a workflow which:
1) Reads in my existing beancount files (I have many, ~ one per account) using loader.load_file() and (the unexposed) loader._parse_recursive(), the results of this gets categorized and preserved in memory
2) Parse data from PDF/OFX files which get converted into beancount objects
3) Run booking and validation on the new items to ensure they are complete and error-free (but don't rewrite them)
4) Categorize these new items and append to the categorization found in (1)
5) Write out new beancount files retaining order/file information for the items from (1), replacing the files from (1) (after backing them up of course)

The key here is that I never run booking on the data that gets written out, because booking a transaction will create bean elements for the inferred transactions, and I don't want those saved in the bean-file.  Additionally booking will convert CostSpec objects to Cost objects (filling in the inferred info), and again, I don't want that stored in my resultant beancount files.  The automation will update the beancount files, but the goal is to write unmodified entries exactly as they are read such that they are still easily manageable by human eyes, and preserve any manually-added goodness.

I realize that automating the generation of beancount files is not a design-point of beancount, but I've found that it is very amenable to reversing the parser process, resulting in a very effective way to enter data into beancount.

By implementing the parser and booking both in C++, will it still be possible to run the parser, modify the results, and then (optionally) run the booking and validate functions all from the python layer?
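
To make the separation concrete, here is a minimal sketch of the "book a copy for validation, write out the raw entries" idea. It deliberately does not use Beancount's own classes: `Posting`, `Transaction` and `book` are hypothetical stand-ins, and the sketch assumes a single currency and only models the inference of one omitted amount.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Posting:
    account: str
    number: Optional[Decimal]  # None means "left blank in the file"


@dataclass(frozen=True)
class Transaction:
    narration: str
    postings: tuple[Posting, ...]


def book(txn: Transaction) -> Transaction:
    """Return a completed copy; the parsed transaction is not modified."""
    missing = [p for p in txn.postings if p.number is None]
    if len(missing) > 1:
        raise ValueError("at most one posting may omit its amount")
    residual = sum(
        (p.number for p in txn.postings if p.number is not None), Decimal(0)
    )
    if not missing and residual != 0:
        raise ValueError(f"transaction does not balance: {residual}")
    filled = tuple(
        replace(p, number=-residual) if p.number is None else p
        for p in txn.postings
    )
    return replace(txn, postings=filled)


raw = Transaction("Lunch", (
    Posting("Expenses:Food", Decimal("12.50")),
    Posting("Assets:Cash", None),
))
booked = book(raw)  # used only to check that the entry is complete
print(booked.postings[1].number)  # the inferred amount lives on the copy
print(raw.postings[1].number)     # the entry to write back is still blank
```

Because the entries are immutable, the booked result can be thrown away after validation and the raw entries written out exactly as parsed.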

Máté FARKAS

Jul 8, 2020, 4:24:20 PM7/8/20
to bean...@googlegroups.com
Hi, that's a good point. I am planning something similar, so it would be good to be able to modify/ingest transactions between the parsing and booking phases. This could be done with hooks or plugins as well.

Máté.


Chary Chary

Jul 12, 2020, 5:35:38 PM7/12/20
to Beancount
Sounds very exciting!

For me the fact, that considerations are given to welcome a broader community collaboration is a very positive sign, as I think it gives people more confidence in the future sustainability of the product.


Practical question about beancount query:

You write that you envisage that most of the future users of it will be outside of beancount.

I thought the reason beancount query was created is that it was difficult to construct SQL queries for beancount data.

So, my question is: what will the future beancount query tool offer users outside of beancount that they do not already get from Pandas or SQL?

Martin Blais

Jul 14, 2020, 1:33:50 AM7/14/20
to Beancount
On Mon, Jul 6, 2020 at 5:00 AM Stefano Zacchiroli <za...@upsilon.cc> wrote:
On Sat, Jul 04, 2020 at 02:34:35AM -0400, Martin Blais wrote:
> Today I'm starting development on Beancount v3.
>
> This is going to be a pretty big change and will take a while.
> I've laid down the details in this document:
> https://docs.google.com/document/d/1qPdNXaz5zuDQ8M9uoZFyyFis7hA0G55BEfhWhrVBsfc/

This is very exciting. And, as usual, your design documents are very
interesting and insightful to read. I took some time to read through all
of them and I'm sharing some thoughts of mine about them below.

==================================


Directives
----------

Having as output of beancount core two streams of clearly separated
incomplete/syntactic v. complete/semantic directives sounds like a great
approach. In terms of terminology, you might use the "raw v. cooked"
terminology (which I've picked up from proof assistants years ago, but
which I find fitting here; YMMV). It's not yet clear to me if both
streams will be accessible to plugins (I think they should). And, if
they are, how will they be interleaved: a single stream with both raw
and cooked transactions? Two separate streams?

One will really just be an AST and the other the actual desired stream.
One reason for letting plugins access the AST would be to let them make changes to the input *before* other plugins run over their result.
I'm not sure I have a super compelling use case for this though, and won't bother with the extra complexity of exposing this if I don't.



Parser
------

You mention you're gonna keep using flex/bison, which is for sure well
known technology. However, the expressivity of bison grammars make it
kinda hard to hack on existing parsers, raising the barrier for
contributors. Have you considered switching to PEG parsing?

It's fashionable, it came through my feed a couple of times recently. 
I like the idea of removing the distinction between the scanner and grammar.
But at the EOTD, what's there works pretty well and changes to the grammar are also rare (on purpose).
Maybe a toy project for later.


Unrelated (but still on parsing), I don't understand your point about
getting rid of the cache. Sure, we all hope it will no longer be needed for
interactive use, but it would still be useful for people building small
services on top of relatively static Beancount ledgers; including Fava.
Also, as the output of Beancount core is gonna be streams of protobufs,
those will be trivial to serialize, and also cross language, why not
imagine a cache of protobufs serialized on disks?

Yes, that's precisely the idea. That's why I'm linking Riegeli, it'll be the container for that.
 
 
The rework of includes sounds great. We have discussed it on the list in
the past, so I guess it's your goal, but as it's not explicitly stated
in the design doc let me repeat it here. I think the goal should be
"include invariance", i.e., one should always be able to take an
existing Beancount ledger in a single file and break it down in an
arbitrary amount of smaller ledger files that include each other,
without any semantic change. (The stated goal in your doc of being able
to declare plugins elsewhere than in the main file will derive from
this, but this principle is more general.)

Yes, that should be the goal, though I have in mind a perhaps more restricted version where, like today, the options have to be set in the top-level file; the only difference is that it'll barf when you try to set options in included files (which it always should have, this is essentially a bug fix).
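
A rough sketch of that restricted rule, assuming a naive line-based reader rather than the real parser: `load_lines` is a hypothetical helper that flattens `include` directives in place and raises an error when an `option` directive shows up in an included file. It does no cycle detection and resolves paths relative to the including file, both of which are simplifications.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

INCLUDE_RE = re.compile(r'^include\s+"([^"]+)"\s*$')
OPTION_RE = re.compile(r'^option\s+"([^"]+)"\s+"([^"]*)"\s*$')


def load_lines(path: Path, top_level: bool = True) -> list[str]:
    """Flatten includes in place; reject options outside the top-level file."""
    out: list[str] = []
    for line in path.read_text().splitlines():
        include = INCLUDE_RE.match(line)
        if include:
            child = path.parent / include.group(1)
            out.extend(load_lines(child, top_level=False))
        elif OPTION_RE.match(line) and not top_level:
            raise ValueError(
                f"{path.name}: options must be set in the top-level file"
            )
        else:
            out.append(line)
    return out


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.beancount").write_text(
        'option "title" "Demo"\ninclude "accounts.beancount"\n'
    )
    (root / "accounts.beancount").write_text("2020-01-01 open Assets:Cash\n")
    flat = load_lines(root / "main.beancount")

    # An option in an included file is now an error instead of being ignored.
    (root / "accounts.beancount").write_text('option "title" "Oops"\n')
    try:
        load_lines(root / "main.beancount")
        rejected = False
    except ValueError:
        rejected = True
```

Since includes are expanded where they appear, splitting a file along directive boundaries yields the same flattened sequence, which is the "include invariance" property discussed above, minus options.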


The main feature I lack to have feature parity with Ledger-CLI is the
ability to add tags to individual transaction legs. I'm assuming this
will go hand-in-hand with relaxing the distinction between metadata/
tags/ links (by making them syntactic sugar for metadata, I'm guessing),
which is great, thanks!

You mean you'd like to have the ability to add #.... at the end of a posting line?
That should be easy to add, but I'd have to change the schema.
Can you motivate it?
When / how / why do you need to tag individual postings, such that tagging the transaction isn't enough?
That would be added in v2.

 
Ulque
-----

This sounds like an exciting project.

In addition to support for balance columns and totals, there are a bunch
of other features that would be very welcome, like the ability to filter
out 0 columns, or to add derived columns (e.g., differences between
columns, to compute P&L in investments). I don't know how much you plan
to build on top of Pandas (which will trivially offer many of these),
but it is absolutely brilliant to see the analogy between the two
worlds.

Something I'm surprised not to have seen mentioned on this is your vision
(which we discussed a while ago on list) that the hierarchical nature of
the account hierarchy is kinda arbitrary and gets in the way (e.g., one
often wants to pivot around from "Expenses:Home:Repair +
Expenses:Car:Repair" to "Expenses:Repair:Home + Expenses:Repair:Car" as
there is no right or wrong hierarchy there). Is this idea of being able
to pivot around the account hierarchy, considering each component a
facet of sort, part of your plans for Ulque, or is it out of scope?

I haven't thought much about changing anything in Beancount's core for that, it seems to me like it belongs in the query tool, as transformations on the account names. Just functions provided to manipulate the components of account names would be sufficient.
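
As an illustration of what such a function could look like, here is a small sketch. `pivot_account` is a hypothetical name, not an existing Beancount or query-tool function; it reorders the components below the root account type so that balances can be regrouped by a different facet.

```python
from __future__ import annotations

from collections import defaultdict
from decimal import Decimal


def pivot_account(account: str, order: tuple[int, ...]) -> str:
    """Reorder the sub-components of an account name, keeping the root first."""
    root, *rest = account.split(":")
    if sorted(order) != list(range(len(rest))):
        raise ValueError(f"order {order} is not a permutation for {account!r}")
    return ":".join([root, *(rest[i] for i in order)])


balances = {
    "Expenses:Home:Repair": Decimal("120"),
    "Expenses:Car:Repair": Decimal("80"),
    "Expenses:Car:Fuel": Decimal("40"),
}

# Swap the two sub-components, then total by the new parent account.
by_facet: dict[str, Decimal] = defaultdict(Decimal)
for account, amount in balances.items():
    pivoted = pivot_account(account, (1, 0))
    by_facet[pivoted.rsplit(":", 1)[0]] += amount

print(dict(by_facet))
```

Here "Expenses:Home:Repair" becomes "Expenses:Repair:Home", so both repair accounts roll up under "Expenses:Repair" without any change to the ledger itself.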
 
 
Code quality
------------

Typing: outside of Google I've the feeling that the state-of-the-art
static type checker is Mypy. I've myself migrated a substantial codebase
to it and it's a vibrant environment (with a lot of involvement from
Guido himself) and active development that goes hand in hand with the
refinement of the type system (via periodic PEPs). I'd be wary of going
pytype instead of Mypy, even though I realize that the type annotations
are (supposed to be) compatible.

No opinion on that... I find pytype to be slow, wouldn't mind giving mypy a try.
The more static checking the better IMHO.
Basically I need to figure out how to integrate this in Bazel, static type errors should be treated like build errors.


How about automated code formatting via Black?
(https://github.com/psf/black) I've recently switched to it a
substantial code base and I find it pretty life changing. It would also
help contributors I think, which is one of your worthwhile meta-goals
for v3.

Auto-formatting actually drives me a bit crazy sometimes. One of the guys on my team at work hasn't figured out how to set up his editor to disable clang-format, and it'll arbitrarily fill function call arguments that were carefully arranged for readability on code he didn't otherwise touch.  I'd use auto-formatting if it was smart enough to figure out that it should only change code near a diff hunk...  Sometimes I just want to write things a certain way.

Maybe it's just one of those things where one day you just give up and learn to love the lack of control.


Strict payee
------------

YAY, everything that makes possible to have even more automated sanity
checks is a welcome addition.  I wonder if a relaxed policy where any
new payee is OK on first use even if undeclared, unless it's "near" (as
string distance) to a previous one would work well as a default policy.
But that's probably a matter for a plugin anyway...

Yeah, I don't know, maybe. 
If the right solution required dedicated syntax it might find its way a little closer to the core than a plugin.
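
For reference, the relaxed "accept new payees unless they look like a typo of a known one" policy can be sketched with the standard library's `difflib`. The `check_payee` function and the 0.8 cutoff are assumptions made for this example, not an existing plugin or a tuned threshold.

```python
from __future__ import annotations

import difflib


def check_payee(payee: str, known: set[str], cutoff: float = 0.8) -> list[str]:
    """Return known payees suspiciously close to a new one (empty means OK)."""
    if payee in known:
        return []
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    return difflib.get_close_matches(payee, sorted(known), n=3, cutoff=cutoff)


known = {"Whole Foods", "Shell"}
for candidate in ("Whole Food", "Shell", "Amazon"):
    suspects = check_payee(candidate, known)
    if suspects:
        print(f"{candidate!r}: did you mean {suspects[0]!r}?")
    else:
        known.add(candidate)  # genuinely new or already declared: accept it
```

A plugin built along these lines would emit an error for the near match and silently accept everything else.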


Unsigned debit and credit
-------------------------

This is a very concrete need, which I routinely struggle with when
showing accounting reports extracted from Beancount (or Fava) to other
family members. But I'm surprised you mention it as a potential feature
for Beancount itself. Wouldn't it belong to front-ends, like Fava (or
maybe Ulque in the future), instead? In the view of "Beancount as an
accounting calculator", which I've always adhered to, that seems to
belong elsewhere.

I agree, but the parser (the input) is located in the core.
For the output part, I think you're right.
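
On the output side, the conversion is small enough to sketch here. This assumes Beancount's usual sign convention (debits stored as positive numbers, credits as negative); `to_debit_credit` is a hypothetical helper name, not an existing API.

```python
from __future__ import annotations

from decimal import Decimal


def to_debit_credit(number: Decimal) -> tuple[Decimal, Decimal]:
    """Split a signed amount into unsigned (debit, credit) columns."""
    zero = Decimal(0)
    return (number, zero) if number >= 0 else (zero, -number)


postings = [
    ("Expenses:Food", Decimal("12.50")),
    ("Assets:Cash", Decimal("-12.50")),
]
rows = [(account, *to_debit_credit(number)) for account, number in postings]
for account, debit, credit in rows:
    print(f"{account:<16} {debit:>8} {credit:>8}")
```

The stored data keeps its signs; only the report splits them into two unsigned columns.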


be created and maintained separately of core.

I find operating on the source to have been pretty sufficient for those things.
It would require tracking whitespace and comments in the AST in order to ensure a full round-trip, and it's not obvious.
It doesn't seem worth the effort to me.


==================================


> The short version is that v3's core is going to be ported to C++ using a
> Bazel build, and the codebase will be sectioned between core and the rest.
> I just merged the new build definition in master.

Bazel is indeed a great build system, but you should know that, at least
for now, it is not in Debian/Ubuntu yet. So for the time being it will
be impossible to ship Beancount v3 on those distros (and any other
Debian-based distro) until Bazel itself is part of Debian. Work is
ongoing (see: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782654
), but I'm unable to guess when it will actually happen.

As always, I'm not too concerned about packaging... more concerned about writing good code that will compile and install easily for a long time.
I'm sure they'll figure it out.


 
Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »


Martin Blais

Jul 14, 2020, 1:40:32 AM7/14/20
to Beancount
On Mon, Jul 6, 2020 at 7:58 AM Stefano Zacchiroli <za...@upsilon.cc> wrote:
On Sat, Jul 04, 2020 at 03:09:34AM -0700, Andre Engelbrecht wrote:
> *Disclosure*: I've converted a couple of people from using webapps or
> spreadsheets to use ledger. They love ledger despite some of the
> nuance. I've been trying to move them over to beancount, but having a
> bit harder time because of the python dependency.  Fava and being able
> to run it locally has been a great driver to have people at least
> consider installing python and giving beancount a test run.

Same for me. I've converted some people to Beancount thanks to the fact
that "pip install beancount" was simple enough (although indeed still a
barrier for non-dev people) for them to install. Bazel will be a much
higher barrier for people to install/use Beancount. I totally understand
switching to it from a dev point of view, but it would be great to
maintain the ability to install via pip.

I've used pypi to ship weird python code depending on a huge java
bundle, and I know it works well. If there is a way to ship (and then
select) static binaries for the non-Python parts for multiple
architectures (the most popular ones) via pip, I think it'd be totally
worth it in terms of user base.

I'm sure we can figure out how to bake binaries that install with pip.
But at the EOTD, if it works well on Linux by building and running from source, I'm happy enough.

I stopped looking at this project as a way to convert others to it a long time ago... If it sells itself, it's because of the characteristics of the language itself: the problems it solves, how it does that elegantly and clearly, and how it allows you to integrate custom scripts with it. If people want to e.g. port it to Windows, make nice packaging, etc. I'm happy to apply patches and do some very moderate work to facilitate that, but I find open source works best when it's a mildly selfish endeavour.  I enjoy solving problems, so that's what I want to focus on for the most part. We'll see later about packaging. I'm doing this for fun...


Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »


Martin Blais

Jul 14, 2020, 1:44:42 AM7/14/20
to Beancount
I have a different experience; hackathons never did anything much for me. For some number of years I would attend the hacking days before/after PyCon, and I didn't get a lot out of the coding sessions. The fun part of these to me was always the hallway conversations, where I'd learn some mildly esoteric stuff. I like to write code alone.

What I think could be appealing to a VC chat is to bring a human element to these discussions, see people's faces, hear them rant and rave about what they really care about. Mailing-lists are dry, and the dynamics change a lot when you know the people face-to-face.

 
I'm not sure I'll have much to contribute, given my very sporadic track
record of contributing to Beancount, but I'll be happy to try if any of
these happens.

Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »


Martin Blais

Jul 14, 2020, 1:49:23 AM7/14/20
to Beancount
On Mon, Jul 6, 2020 at 6:18 PM Daniele Nicolodi <dan...@grinta.net> wrote:
On 06/07/2020 03:00, Stefano Zacchiroli wrote:
> You mention you're gonna keep using flex/bison, which is for sure well
> known technology. However, the expressivity of bison grammars make it
> kinda hard to hack on existing parsers, raising the barrier for
> contributors. Have you considered switching to PEG parsing?

I toyed with the idea of writing a PEG parser for Beancount syntax, but
I haven't found a nice PEG parser generator. The Beancount syntax is
also fairly regular, thus the Bison grammar is actually not that bad to
read. Also, there is the desire to keep the v2 and v3 parser definitions
as close as possible.

> The rework of includes sounds great. We have discussed it on the list in
> the past, so I guess it's your goal, but as it's not explicitly stated
> in the design doc let me repeat it here. I think the goal should be
> "include invariance", i.e., one should always be able to take an
> existing Beancount ledger in a single file and break it down in an
> arbitrary amount of smaller ledger files that include each other,
> without any semantic change. (The stated goal in your doc of being able
> to declare plugins elsewhere than in the main file will derive from
> this, but this principle is more general.)

I have done some work on the parser and I would like to lift the current
limits on includes also for v2. Once the parser rework lands, it should
be fairly straightforward.

Seems doable.
I mean, frankly just getting errors for unsupported syntax in included files goes a very long way here.
 

 
> The main feature I lack to have feature parity with Ledger-CLI is the
> ability to add tags to individual transaction legs. I'm assuming this
> will go hand-in-hand with relaxing the distinction between metadata/
> tags/ links (by making them syntactic sugar for metadata, I'm guessing),
> which is great, thanks!

This is on the to do list.

+1
We'll just have to figure out how to do this the same way for postings and transactions.
It wouldn't be super nice if they are stored one way on transactions and a different way on postings.
That's why I was thinking we could rework the way transactions store tags and links first, and then do that.


> In
> particular, I'd like to know if the raw/syntactic directives you imagine
> coming out of the new Beancount core would be close enough to the book
> concrete syntax to allow manipulation such as meddling with spacing
> Provided that, and a good pretty printer for concrete syntax, a
> "bean-sed" project with a dedicated manipulation language can probably
> be created and maintained separately of core.

I am far from being a parsing expert, but I think having the parser emit
a syntax tree suitable to reconstruct the input file without
modifications is going to be very complex: the scanner would need to
emit many more tokens for input that is now simply ignored (ie trailing
whitespace) and the grammar would need to handle those, making it more
complex. The representation of the parsing results would also be more
complex. A lot of work to support a single tool.

I think that a tool like the one you describe should use the syntax tree
and the actual file content in combination to rewrite the input file:
the syntax tree allows to identify which elements need to be modified
and from these the position in the input files where text changes need
to happen. Sounds complex, but I believe less complex than augmenting
the parser.
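
The approach described above can be sketched roughly as follows. This is only an illustration, assuming the parser reports a line number for each element; `Edit` and `apply_edits` are hypothetical names and are not part of Beancount.

```python
from typing import List, NamedTuple


class Edit(NamedTuple):
    """A replacement for one line of the input file (hypothetical type)."""
    lineno: int     # 1-based line number, as a parser would report it
    new_text: str   # replacement text, without the trailing newline


def apply_edits(source: str, edits: List[Edit]) -> str:
    """Return `source` with each edited line replaced; other lines are untouched."""
    lines = source.split("\n")
    for edit in edits:
        lines[edit.lineno - 1] = edit.new_text
    return "\n".join(lines)


source = (
    '2020-07-08 * "foobar bookshop"\n'
    "  Expenses:Books   32.90 EUR   \n"
    "  Assets:Checking\n"
)
# Pretend the syntax tree told us that the posting on line 2 needs a new account.
result = apply_edits(source, [Edit(2, "  Expenses:Reading   32.90 EUR   ")])
print(result)
```

Everything the edit does not touch, including odd spacing and trailing whitespace, survives byte for byte, which is the property a lossless syntax tree would otherwise have to provide.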

+1
Not worth it IMO, too much work.
 

> Bazel is indeed a great build system, but you should know that, at least
> for now, it is not in Debian/Ubuntu yet. So for the time being it will
> be impossible to ship Beancount v3 on those distros (and any other
> Debian-based distro) until Bazel itself is part of Debian. Work is
> ongoing (see: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782654
> ), but I'm unable to guess when it will actually happen.

I had a similar reaction to Bazel. My secret plan is to maintain a
parallel build system based on Meson. I did a quick reality check and it
seems that all prerequisites can be built with Meson. I think Meson is
more non-developer and distribution friendly than Bazel.

Do you drink Pepsi? ;-)


Cheers,
Dan



Martin Blais

Jul 14, 2020, 1:56:38 AM7/14/20
to Beancount
On Sun, Jul 12, 2020 at 5:35 PM Chary Chary <char...@gmail.com> wrote:
Sounds very exciting!

For me the fact, that considerations are given to welcome a broader community collaboration is a very positive sign, as I think it gives people more confidence in the future sustainability of the product.

I'm more concerned about being able to actually respond to most tickets, so I need less stuff to maintain.


Practical question about beancount query:

You write that you envisage that most of the future users of it will be outside of beancount.

I thought the reason beancount query was created is because it was difficult to construct SQL queries for beancount data.

The main reason the tool exists, is to fill in the gaps for things you can't easily do with just a table.
In particular, you need a custom type for positions with an optional cost basis, and an aggregation rule for those, and a way to split up the units and currency to separate columns.
Then there's the bells and whistles: the ability to clip a time period (open and close), a balance column, and a bottom line.
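
To make the "custom type plus aggregation rule" point concrete, here is a minimal sketch in Python. The `Cost` and `Position` classes below are simplified stand-ins written for this illustration, not Beancount's actual classes, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Cost(NamedTuple):
    number: Decimal
    currency: str


class Position(NamedTuple):
    units: Decimal
    currency: str
    cost: Optional[Cost] = None   # None for positions held without a cost basis


def aggregate(
    positions: Iterable[Position],
) -> Dict[Tuple[str, Optional[Cost]], Decimal]:
    """Sum units, keeping lots with different cost bases separate."""
    totals = defaultdict(Decimal)
    for pos in positions:
        totals[(pos.currency, pos.cost)] += pos.units
    return dict(totals)


positions = [
    Position(Decimal("10"), "HOOL", Cost(Decimal("500.00"), "USD")),
    Position(Decimal("5"), "HOOL", Cost(Decimal("500.00"), "USD")),
    Position(Decimal("2"), "HOOL", Cost(Decimal("520.00"), "USD")),
    Position(Decimal("100.00"), "USD"),
]
totals = aggregate(positions)
# Split units and currency into separate columns, as a flat table would want.
rows = [(units, currency, cost) for (currency, cost), units in totals.items()]
```

A plain SQL `SUM` over a single numeric column cannot express this directly, because the grouping key depends on an optional nested value.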
 

So, my question is: what will the future beancount query tool offer users outside of beancount that they do not already get from Pandas or SQL?

That tool will become a tool that can process any table data, from various sources. It'll automatically infer schemas (by sniffing the data), and support the kinds of interfaces that are useful for Beancount (e.g. Google Sheets). It'll be less powerful and less efficient than something like Pandas (it'll process row-by-row, not columnar) but it should be more automatic. I think that the "one liner" nature of it will find broad usage. You should be able to process and join CSV files as if they were databases, for instance.
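
As a rough illustration of what "sniffing" a schema and joining CSV files row-by-row could look like, here is a small standard-library sketch. It is not the planned tool; `sniff_type`, `read_table` and `join` are hypothetical helpers, and the sample data is made up.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def sniff_type(values):
    """Guess a column type from its values: int, then Decimal, else str."""
    for caster in (int, Decimal):
        try:
            for value in values:
                caster(value)
            return caster
        except (ValueError, InvalidOperation):
            continue
    return str


def read_table(text):
    """Read CSV text and convert each column to its sniffed type."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return []
    schema = {name: sniff_type([row[name] for row in rows]) for name in rows[0]}
    return [{name: schema[name](row[name]) for name in row} for row in rows]


def join(left, right, key):
    """Inner-join two lists of row dicts on `key`, one row at a time."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in index.get(lrow[key], [])
    ]


accounts = read_table("id,name\n1,Checking\n2,Savings\n")
balances = read_table("id,balance\n1,1250.75\n2,9000\n")
joined = join(accounts, balances, "id")
```

The `balance` column mixes `1250.75` and `9000`, so the sniffer falls back from `int` to `Decimal` for the whole column.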



Martin Blais

Jul 14, 2020, 2:02:08 AM7/14/20
to Beancount
On Wed, Jul 8, 2020 at 2:39 PM <geof...@gmail.com> wrote:
After reading through the v3 design document, one thing that wasn't clear to me is whether it will be possible to access just the parser without running booking from the exposed API.

Today, I have a workflow which:
1) Reads in my existing beancount files (I have many, ~ one per account) using loader.load_file() and (the unexposed) loader._parse_recursive(), the results of this gets categorized and preserved in memory
2) Parse data from PDF/OFX files which get converted into beancount objects
3) Run booking and validation on the new items to ensure they are complete and error-free (but don't rewrite them)
4) Categorize these new items and append to the categorization found in (1)
5) Write out new beancount files retaining order/file information for the items from (1), replacing the files from (1) (after backing them up of course)

The key here is that I never run booking on the data that gets written out, because booking a transaction will create bean elements for the inferred transactions, and I don't want those saved in the bean-file.  Additionally booking will convert CostSpec objects to Cost objects (filling in the inferred info), and again, I don't want that stored in my resultant beancount files.  The automation will update the beancount files, but the goal is to write unmodified entries exactly as they are read such that they are still easily manageable by human eyes, and preserve any manually-added goodness.
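
The "validate but don't rewrite" idea can be sketched as follows. This is a toy model using plain dicts, not Beancount's API: `book` is a hypothetical stand-in that only fills in a single elided amount, which is far simpler than real booking.

```python
import copy
from decimal import Decimal


def book(entry):
    """Toy stand-in for booking: fill in the single elided posting amount.

    Returns a new, completed entry; raises ValueError if it cannot balance.
    """
    completed = copy.deepcopy(entry)
    missing = [p for p in completed["postings"] if p["amount"] is None]
    if len(missing) > 1:
        raise ValueError("more than one posting without an amount")
    total = sum(
        (p["amount"] for p in completed["postings"] if p["amount"] is not None),
        Decimal("0"),
    )
    if missing:
        missing[0]["amount"] = -total
    elif total != 0:
        raise ValueError("transaction does not balance")
    return completed


raw_entry = {
    "date": "2020-07-08",
    "narration": "books",
    "postings": [
        {"account": "Expenses:Books", "amount": Decimal("32.90")},
        {"account": "Assets:Checking", "amount": None},   # elided by the author
    ],
}

booked = book(raw_entry)   # validate on a copy
to_write = raw_entry       # but write out the entry exactly as it was parsed
```

Because booking runs on a copy, the inferred amount never leaks into the entry that gets written back to the file.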

I realize that automating the generation of beancount files is not a design-point of beancount, but I've found that it is very amenable to reversing the parser process, resulting in a very effective way to enter data into beancount.

That's pretty cool, I'd never heard of someone doing that in so structured a way.

  
By implementing the parser and booking both in C++, will it still be possible to run the parser, modify the results, and then (optionally) run the booking and validate functions all from the python layer?

I think so. I'll keep that in mind in defining the new schema.

 

 

Stefano Zacchiroli

Jul 14, 2020, 8:50:36 AM7/14/20
to bean...@googlegroups.com
Thanks for your feedback Martin.

On Tue, Jul 14, 2020 at 01:33:34AM -0400, Martin Blais wrote:
> Yes, that should be the goal, though I have in mind a perhaps more
> restricted version where, like today, the options have to be set in the
> top-level file; the only difference is that it'll barf when you try to set
> options in included files (which it always should have, this is essentially
> a bug fix).

But why? What's the added value of restricting options to only be in the
master file? I'm having other family members read the textual version of
our books and they can make sense of the double-accounting part, but the
beancount-specific options don't make sense for them, so I'd really like
to hide them away with a simple oneliner include at the beginning of the
books.

FWIW I do something similar with other textual document systems (e.g.,
LaTeX), and I find that hiding low-level details in a single "you
shouldn't care about this stuff" file has a lot of value.

> > The main feature I lack to have feature parity with Ledger-CLI is
> > the ability to add tags to individual transaction legs. I'm assuming
> > this will go hand-in-hand with relaxing the distinction between
> > metadata/ tags/ links (by making them syntactic sugar for metadata,
> > I'm guessing), which is great, thanks!
>
> You mean you'd like to have the ability to add #.... at the end of a
> posting line? That should be easy to add, but I'd have to change the
> schema. Can you motivate it? When / how / why do you need to tag
> individual postings whereby tagging the transaction isn't enough?
> That would be added in v2.

(This is https://github.com/beancount/beancount/issues/144 and we should
probably have the discussion there, but just in case: )

A classic example for me is:

  2020-07-08 * "foobar bookshop" "books + card game"
      Expenses:Books                             32.90 EUR ; book A, B, and C
      Expenses:Games                             15.00 EUR ; card game for kid  #hulk
      Assets:Checking                           -47.90 EUR

where I want to tag one of the legs as pertaining to my kid (no, he is
not actually called Hulk), but not the rest of the transaction.

(Yes, I can refactor this using two transactions, but it's annoying.)
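
Until per-posting tags exist, a script can recover them from the comment text. The following is only a workaround sketch under that assumption; `posting_tags` and the tag pattern are hypothetical and not part of Beancount.

```python
import re

# Assumed tag syntax for this sketch: '#' followed by letters, digits, '-', '_', '/' or '.'.
TAG_RE = re.compile(r"#([A-Za-z0-9\-_/.]+)")


def posting_tags(line):
    """Return the tags found in the trailing `;` comment of a posting line."""
    _, sep, comment = line.partition(";")
    return TAG_RE.findall(comment) if sep else []


tagged = posting_tags("  Expenses:Games  15.00 EUR ; card game for kid  #hulk")
untagged = posting_tags("  Assets:Checking  -47.90 EUR")
```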

Martin Blais

Jul 14, 2020, 9:45:19 AM7/14/20
to Beancount
On Tue, Jul 14, 2020 at 8:50 AM Stefano Zacchiroli <za...@upsilon.cc> wrote:
Thanks for your feedback Martin.

On Tue, Jul 14, 2020 at 01:33:34AM -0400, Martin Blais wrote:
> Yes, that should be the goal, though I have in mind a perhaps more
> restricted version where, like today, the options have to be set in the
> top-level file; the only difference is that it'll barf when you try to set
> options in included files (which it always should have, this is essentially
> a bug fix).

But why? What's the added value of restricting options to only be in the
master file? I'm having other family members read the textual version of
our books and they can make sense of the double-accounting part, but the
beancount-specific options don't make sense for them, so I'd really like
to hide them away with a simple oneliner include at the beginning of the
books.

I get it, it's not for value, it's done only for implementation simplicity.
I can review that old code and see if this can be reviewed and implemented better.


FWIW I do something similar with other textual document systems (e.g.,
LaTeX), and I find that hiding low-level details in a single "you
shouldn't care about this stuff" file has a lot of value.

> > The main feature I lack to have feature parity with Ledger-CLI is
> > the ability to add tags to individual transaction legs. I'm assuming
> > this will go hand-in-hand with relaxing the distinction between
> > metadata/ tags/ links (by making them syntactic sugar for metadata,
> > I'm guessing), which is great, thanks!
>
> You mean you'd like to have the ability to add #.... at the end of a
> posting line?  That should be easy to add, but I'd have to change the
> schema.  Can you motivate it?  When / how / why do you need to tag
> individual postings whereby tagging the transaction isn't enough?
> That would be added in v2.

(This is https://github.com/beancount/beancount/issues/144 and we should
probably have the discussion there, but just in case: )

A classic example for me is:

  2020-07-08 * "foobar bookshop" "books + card game"
      Expenses:Books                             32.90 EUR ; book A, B, and C
      Expenses:Games                             15.00 EUR ; card game for kid  #hulk
      Assets:Checking                           -47.90 EUR

where I want to tag one of the legs as pertaining to my kid (no, he is
not actually called Hulk), but not the rest of the transaction.

(Yes, I can refactor this using two transactions, but it's annoying.)

In the meantime you can use posting metadata, but I think it's sensible syntax (aside from being inside a comment).
We'll have to figure out a reasonable schema modification.


 

Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »


Stefano Zacchiroli

Jul 14, 2020, 10:05:17 AM7/14/20
to bean...@googlegroups.com
On Tue, Jul 14, 2020 at 09:45:05AM -0400, Martin Blais wrote:
> In the meantime you can use posting metadata, but I think it's
> sensible syntax (aside from being inside a [comment]). We'll have to
> figure out a reasonable schema modification.

*nod*

To be clear: I don't think the fact it's inside a comment is a feature.
It's Ledger-inherited design (and personal bad practice...) which I
don't think Beancount should replicate. But that's a minor point.

Daniele Nicolodi

Jul 14, 2020, 3:53:43 PM7/14/20
to bean...@googlegroups.com
On 13/07/2020 23:49, Martin Blais wrote:
> On Mon, Jul 6, 2020 at 6:18 PM Daniele Nicolodi <dan...@grinta.net
> I had a similar reaction to Bazel. My secret plan is to maintain a
> parallel build system based on Meson. I did a quick reality check and it
> seems that all prerequisites can be built with Meson. I think Meson is
> more non-developer and distribution friendly than Bazel.
>
> Do you drink Pepsi? ;-)

I don't drink sodas or cool-aids :-)

I have wanted to dive a bit deeper into Meson for a while, and this seems a
good opportunity, also to contrast / compare it with Bazel.

Cheers,
Dan

TRS-80

Jul 15, 2020, 10:42:02 AM7/15/20
to bean...@googlegroups.com
On 2020-07-14 01:40, Martin Blais wrote:

>> On Sat, Jul 04, 2020 at 03:09:34AM -0700, Andre Engelbrecht wrote:
>>> having a bit harder time because of the python dependency.

Isn't Python pretty much everywhere (that matters anyway) already by
default by now?

> I stopped looking at this project as a way to convert others to it a
> long time ago... If it sells itself, it's because of the
> characteristics of the language itself: the problems it solves, how it
> does that elegantly and clearly, and how it allows you to integrate
> custom scripts with it. If people want to e.g. port it to Windows,
> make nice packaging, etc. I'm happy to apply patches and do some very
> moderate work to facilitate that, but I find open source works best
> when it's a mildly selfish endeavour. I enjoy solving problems, so
> that's what I want to focus on for the most part. We'll see later
> about packaging. I'm doing this for fun...

This is actually the key to sustainability, IMO. Seen far too much
burnout otherwise, once it stops being "fun." Not only in F/LOSS but
life in general (also my personal experience). Good for you realizing
this Martin, especially nowadays in this crazy rat race most people have
allowed their lives to become. Less is often more, IMHO.

There is no helping people anyway who think that layers upon layers of
graphical obfuscations are somehow "easier" than simply doing things at
command line (I have tried!). :)

TRS-80

TRS-80

Jul 15, 2020, 11:22:46 AM7/15/20
to bean...@googlegroups.com
>>> On 06/07/2020 03:00, Stefano Zacchiroli wrote:
>>>
>>> Bazel is indeed a great build system, but you should know that, at
>>> least for now, it is not in Debian/Ubuntu yet. So for the time being
>>> it will be impossible to ship Beancount v3 on those distros (and any
>>> other Debian-based distro) until Bazel itself is part of Debian. Work
>>> is ongoing (see:
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782654
>>> ), but I'm unable to guess when it will actually happen.

> On 2020-07-14 01:49, Martin Blais wrote:
>
> Do you drink Pepsi? ;-)

Not that I imagine I will sway Martin on this, but count me among those
concerned that Bazel is not (yet) available in Debian.

Personally, my concerns are less "non-developer" as Daniele mentioned
(you can only help people so much anyway, and this is text based
accounting after all) and more along the lines of "distribution
friendly" as Debian based distributions are such a wide swath. Also,
FWIW, "is it in Debian?" is one of my personal (heavy, in my case)
criteria for weighing software, and I know a lot of others feel
similarly.

Along those lines, I would argue that we are getting close to Martin's
own criteria (I seem to recall reading somewhere) of "compiling on
GNU/Linux being good enough" here. Of course I realize that Debian !=
GNU/Linux, however it's pretty close, and certainly for me they are ~=.

Cheers!

TRS-80

Chary Chary

Aug 6, 2020, 4:24:49 PM8/6/20
to Beancount
Are you also planning to define beancount "API" more clearly and safely?

We touched upon the subject in this discussion:  https://groups.google.com/g/beancount/c/UatQey1X0OY/m/0FGCrY5fAwAJ 

Would it not be logical, that by using defined API (I think you called it "a contract") user is not able to "break" something, the API would simply not allow this to happen?

I know this may be a naive question (as I am not professional), but why doesn't beancount use object-oriented approach?  So you have an object, representing all transactions, and user can manipulate them (add / delete) through methods, but implementation of the class does not allow unsafe changes / operations? 

Martin Blais

Aug 9, 2020, 1:19:22 AM8/9/20
to Beancount
On Thu, Aug 6, 2020 at 4:24 PM Chary Chary <char...@gmail.com> wrote:
Are you also planning to define beancount "API" more clearly and safely?

Roughly speaking the same data model and similar access to data structures, special containers, and functions, via Python.

We touched upon the subject in this discussion:  https://groups.google.com/g/beancount/c/UatQey1X0OY/m/0FGCrY5fAwAJ 

Would it not be logical, that by using defined API (I think you called it "a contract") user is not able to "break" something, the API would simply not allow this to happen?

This is a new major revision, I do want the freedom to evolve things and make them better, at least a bit.
It won't be perfectly the same as before, but it'll be very close, at least conceptually.


I know this may be a naive question (as I am not professional), but why doesn't beancount use object-oriented approach?  So you have an object, representing all transactions, and user can manipulate them (add / delete) through methods, but implementation of the class does not allow unsafe changes / operations? 

That's a much longer discussion to have and out of scope for this forum, but I can offer a view. Essentially, OO in general was a rather unfortunate and pointless 20 year detour in the history of application programming, a gigantic waste of time, whose promises never yielded tangible benefits, and I don't like it. It never became an accurate realization of Alan Kay's original concept for OO either. What people call OO today encourages side-effects on the object's private attributes, which is barely better than mutating global variables (in terms of making programs difficult to understand), so in a sense, by moving to OO we only trade one form of evil for another.

Every mature software practitioner eventually comes to appreciate that mutation is the central aspect of what we do that makes programs difficult to reason about and the one that matters most above all; the whole game of programming eventually reduces to an act of balance, whereby you have to mutate *something* to get anything done, but every mutation of data inserts a little bit of extra complexity in your programs--and makes it that much harder to maintain, so the focus is shifted mostly to structuring where and when you do choose to break the rules (and mutate) in order to keep your designs simple and isolated so that they're easily testable and maintainable. Academics are lagging behind - many of them don't practice large-scale development - so students are still trickling out of schools with aspirations to do OO Java or OO Python; this is a mild tragedy, but redemption is usually quick if they choose to go work somewhere that has adopted a good amount of discipline and rigor.

Moreover, most of the large "big tech" companies have all drifted in that direction over time, i.e., much of the code in these companies consists of distributed systems passing around "write-once thereafter read-only" data structures (e.g., protocol buffers or equivalent) with only a little bit of transient mutable state contained here and there. Distributed batch computation pipelines (i.e., cascades of MapReduce) are similarly usually functional in nature. This is in sharp contrast to the style of "OO" we used to do in 1995: lots of object references and threads and mutexes and shared access coordination all over the place--it was a debugging disaster regularly, and a kind of perverse fun for those of us who got good at it. Nowadays, I seem to hardly ever use tracing debuggers anymore--something really has changed in the culture, even if we use the same languages (e.g., C++).

A subset of people "get it" early on - those inclined toward principles, e.g., usually people who enjoy rigor and math - and commit themselves to absolutes, choosing to program in languages which provide lots of support and constraints for controlling mutation tightly (e.g. Haskell), or programs whose data structures are all immutable (e.g., Clojure). Others, like me, follow a more pragmatic approach (at least in this particular project), sticking with deeply popular and well-established languages but choosing to use them in somewhat limited and unorthodox ways (I call this the "functional-ish" style), avoiding mutation most of the time (thus getting most of its benefit) but also harvesting the much broader set of existing library code and tools those languages offer, due to their much higher level of adoption. So Beancount has that style.

This isn't a religion for me, I just really like to build nice things that work... but I'm not alone going slightly askew in that way - e.g., Google's internal style guide has been conservative w.r.t. the broader C++ community for a long time, for instance (https://google.github.io/styleguide/cppguide.html), and a subset of the Python community mostly stays away from classes and OO as well. As a programmer one does have the freedom to avoid entire subsets of idiomatic language features... Inheritance is on the menu? Thank you sir, but I won't have any.
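
For readers unfamiliar with the "functional-ish" style described above, here is a minimal Python sketch of it: immutable records plus plain functions that return new data instead of mutating their inputs. The `Posting` and `Transaction` types are simplified stand-ins written for this example (they are not Beancount's actual definitions), and `rename_account` is a hypothetical helper.

```python
import datetime
from decimal import Decimal
from typing import List, NamedTuple


class Posting(NamedTuple):
    account: str
    units: Decimal
    currency: str


class Transaction(NamedTuple):
    date: datetime.date
    narration: str
    postings: List[Posting]


def rename_account(entries, old, new):
    """Return new entries with `old` renamed to `new`; inputs are left untouched."""
    return [
        entry._replace(postings=[
            p._replace(account=new) if p.account == old else p
            for p in entry.postings
        ])
        for entry in entries
    ]


entries = [
    Transaction(datetime.date(2020, 7, 8), "books", [
        Posting("Expenses:Books", Decimal("32.90"), "EUR"),
        Posting("Assets:Checking", Decimal("-32.90"), "EUR"),
    ]),
]
renamed = rename_account(entries, "Expenses:Books", "Expenses:Reading")
```

There is no class hierarchy and no method that changes state in place; a transformation is just a function from a list of entries to a new list of entries, which makes it easy to test in isolation.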


 

Martin Blais

Aug 9, 2020, 2:03:15 AM8/9/20
to Beancount
On Sun, Aug 9, 2020 at 1:19 AM Martin Blais <bl...@furius.ca> wrote:
On Thu, Aug 6, 2020 at 4:24 PM Chary Chary <char...@gmail.com> wrote:
Are you also planning to define beancount "API" more clearly and safely?

Roughly speaking the same data model and similar access to data structures, special containers, and functions, via Python.

We touched upon the subject in this discussion:  https://groups.google.com/g/beancount/c/UatQey1X0OY/m/0FGCrY5fAwAJ 

Would it not be logical, that by using defined API (I think you called it "a contract") user is not able to "break" something, the API would simply not allow this to happen?

This is a new major revision, I do want the freedom to evolve things and make them better, at least a bit.
It won't be perfectly the same as before, but it'll be very close, at least conceptually.


I know this may be a naive question (as I am not professional), but why doesn't beancount use object-oriented approach?  So you have an object, representing all transactions, and user can manipulate them (add / delete) through methods, but implementation of the class does not allow unsafe changes / operations? 

That's a much longer discussion to have and out of scope for this forum,
[...]

Actually, never mind this long-winded answer. The short answer is that the "API" is mostly just the data Beancount produces + a few simple library functions and a container object (the Inventory). There's no point in hiding the full set of directives behind an API, it would be like replicating all the data model.


Chary Chary

Aug 9, 2020, 5:01:18 AM8/9/20
to Beancount
Martin,
thanks for the detailed answer!

Chary Chary

Aug 13, 2020, 4:47:32 PM8/13/20
to Beancount
Martin,

how do you practically work with 2 branches on one PC at the same time, while still using beancount for personal use every day?

Which branch do you use for personal finances?

I understand when you switch between branches with "git checkout" you also have to rebuild every time, or am I missing something?

What is the trick?

Martin Blais

Aug 13, 2020, 6:46:54 PM8/13/20
to Beancount
I just git co v2 + rebuild, it's quick
Or use two separate clones



Chary Chary

Aug 14, 2020, 7:54:49 AM8/14/20
to Beancount
On Friday, August 14, 2020 at 12:46:54 AM UTC+2 bl...@furius.ca wrote:
Or use two separate clones

2 separate clones I assume will be in 2 separate directories. Where in this case do you point PATH and PYTHONPATH?
 

Chary Chary

Sep 3, 2020, 3:53:26 AM9/3/20
to Beancount
Martin,

I know this is a bit of a deviation from the topic, but did you consider porting the core to Golang, instead of C++?

I am actually sure you did, but just interesting to hear why you didn't go this way

Martin Blais

Sep 4, 2020, 2:42:08 AM9/4/20
to Beancount
On Thu, Sep 3, 2020 at 3:53 AM Chary Chary <char...@gmail.com> wrote:
Martin,

I know this is a bit of a deviation from the topic, but did you consider porting the core to Golang, instead of C++?

I am actually sure you did, but just interesting to hear why you didn't go this way

Some years ago I had the opportunity to define a brand new project and build a new team for it, and it turned out the implementation was well suited for something like Go: relatively simple long-running servers mediating between various distributed services, with some basic data analysis code. I was excited to discover Go at the time and chose to implement everything in it, and I was hoping that it might become my favorite new powertool and dove in with great enthusiasm. I walked out the other end of that tunnel after a year or two, disheartened at the many weird quirks of the language, the various missed opportunities for helping keep things functional (not even close), some of the hype (no, co-routines will not solve all your concurrency problems and they're tricky), and its community prescribing methods I disagree with ("don't worry about abstracting, just cut-n-paste these same patterns of code all over"). And the absence of generics hurts. Even the aesthetics compelled by its clever use of capitalization started to weigh down on the team's appreciation of their source (it worked fine and ran fast, but I don't think anyone really "liked" their code). There's a lot to say; my point is that I've invested the time to learn it well enough that I know it's not my favorite thing to play with. Go gets some things so right - in particular, interfaces - but it gets many other things wrong, in my view. I'd use Go again in the future, but for pragmatic reasons, just to get things done, and if I wasn't going to have an emotional attachment to the project; but not for my fun projects. Plus Python for scripting things gets you so much more in terms of libraries.

The good thing is that the new core will have a protobuf boundary, so if you like you could process it in Go.

 



Daniele Nicolodi

Sep 4, 2020, 3:40:43 AM9/4/20
to bean...@googlegroups.com
I am glad to see my impressions about Go, developed looking at it from
far, are confirmed by someone that spent the time to dig deeper.

Cheers,
Dan

Alex Kowalenko

Oct 30, 2020, 2:34:43 PM10/30/20
to Beancount
Hi, I've been looking for a replacement for Banktivity and stumbled upon this project. I have started using it on a trial basis, and have been reading the plans for V3 and the use of C++. I agree with your decision not to use Go: I found it somewhat similar, and in the end the pedantic error handling in Go drove me away.

I read some of your comments on C++, and I support the decision to use it. You said that you want to use a basic level of the language for portability reasons. I am sure that most of the popular compilers for C++  (gcc, clang, MSC) on most platforms fully support C++11, and even now C++17 is mostly implemented and the STL to C++17. I am not sure whether portability is a reason to hold you back to a basic level of the language. Strong typing and speed is a positive over Python. 

Keep up the good work,
Alex

Martin Blais

Oct 30, 2020, 4:19:09 PM10/30/20
to Beancount
On Fri, Oct 30, 2020 at 2:34 PM Alex Kowalenko <alex.ko...@gmail.com> wrote:
Hi, I've been looking for a replacement for Banktivity and stumbled upon this project. I've started using it on a trial basis and have been reading the plans for v3 and the use of C++. I agree with your decision not to use Go: my experience with it was somewhat similar, and in the end its pedantic error handling drove me away.

I read some of your comments on C++, and I support the decision to use it. You said that you want to stick to a basic level of the language for portability reasons. I am sure that the most popular C++ compilers (GCC, Clang, MSVC) fully support C++11 on most platforms, and by now C++17 and its standard library are mostly implemented as well. I am not sure that portability is a reason to hold yourself back to a basic level of the language. Strong typing and speed are positives over Python.

Yes, I could expound on that a bit, but essentially, there is a section of the C++ universe that's trying way too hard to do everything at compile-time (e.g. with meta-programming). The Boost library suffers from this disease, in particular. I think it's a bit of a "macho" thing for some of my peers from finance. I don't mind the occasional template here and there, but I think this project's performance requirements are quite modest (given the right implementation), and I'd rather trade some performance for the simplicity and ease of long-term maintainability you get by keeping to a basic set of primitives & dependencies. Using all the latest whizzbang may be cool to look at, but the ultimate goal is to keep things straightforward enough that long-term maintenance is possible (like, this shouldn't be a place where I spend many hours every week if I'm going to maintain it for another decade, which I intend to).

 
 
Keep up the good work,

Thanks Alex!

 
Alex




Aaron Lindsay

Oct 30, 2020, 9:29:56 PM10/30/20
to Beancount
On Wednesday, July 8, 2020 at 2:39:49 PM UTC-4 geof...@gmail.com wrote:
After reading through the v3 design document, one thing that wasn't clear to me is whether it will be possible to access just the parser, without running booking, from the exposed API.

Today, I have a workflow which:
1) Reads in my existing beancount files (I have many, ~ one per account) using loader.load_file() and (the unexposed) loader._parse_recursive(); the results get categorized and kept in memory
2) Parses data from PDF/OFX files, which gets converted into beancount objects
3) Runs booking and validation on the new items to ensure they are complete and error-free (but doesn't rewrite them)
4) Categorizes these new items and appends them to the categorization found in (1)
5) Writes out new beancount files retaining order/file information for the items from (1), replacing the files from (1) (after backing them up, of course)

The key here is that I never run booking on the data that gets written out, because booking a transaction will create bean elements for the inferred transactions, and I don't want those saved in the bean-file.  Additionally booking will convert CostSpec objects to Cost objects (filling in the inferred info), and again, I don't want that stored in my resultant beancount files.  The automation will update the beancount files, but the goal is to write unmodified entries exactly as they are read such that they are still easily manageable by human eyes, and preserve any manually-added goodness.
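The "validate a booked copy, write out the unbooked original" idea can be sketched roughly as follows. This is only an illustration under loud assumptions: Posting, Transaction, book(), render() and validate_then_render() are hypothetical toy stand-ins, not Beancount's real classes or API, and the single hard-coded USD currency is a simplification.

```python
from dataclasses import dataclass, replace
from decimal import Decimal
from typing import List, Optional, Tuple


# Hypothetical toy types; Beancount's real directives are richer than this.
@dataclass(frozen=True)
class Posting:
    account: str
    units: Optional[Decimal]  # None means "left for booking to infer"


@dataclass(frozen=True)
class Transaction:
    date: str
    narration: str
    postings: Tuple[Posting, ...]


def book(txn: Transaction) -> Transaction:
    """Return a booked copy with at most one missing amount filled in."""
    missing = [p for p in txn.postings if p.units is None]
    if len(missing) > 1:
        raise ValueError(f"cannot infer more than one amount: {txn.narration}")
    known = sum((p.units for p in txn.postings if p.units is not None), Decimal(0))
    filled = tuple(
        replace(p, units=-known) if p.units is None else p for p in txn.postings
    )
    if sum((p.units for p in filled), Decimal(0)) != 0:
        raise ValueError(f"does not balance: {txn.narration}")
    return Transaction(txn.date, txn.narration, filled)


def render(txn: Transaction) -> str:
    """Print an entry as parsed; an inferred amount stays blank."""
    lines = [f'{txn.date} * "{txn.narration}"']
    for p in txn.postings:
        amount = "" if p.units is None else f"  {p.units} USD"
        lines.append(f"  {p.account}{amount}")
    return "\n".join(lines)


def validate_then_render(parsed: List[Transaction]) -> str:
    """Book each entry only to check it, then write the unbooked originals."""
    for txn in parsed:
        book(txn)  # raises on error; the booked copy is discarded
    return "\n\n".join(render(txn) for txn in parsed)


parsed = [
    Transaction("2020-07-08", "Coffee", (
        Posting("Expenses:Food", Decimal("4.50")),
        Posting("Assets:Checking", None),
    )),
]
output = validate_then_render(parsed)
print(output)
```

Because the entries are immutable and book() returns a new object, the inferred amount never leaks into what gets written out.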

I realize that automating the generation of beancount files is not a design-point of beancount, but I've found that it is very amenable to reversing the parser process, resulting in a very effective way to enter data into beancount.

By implementing the parser and booking both in C++, will it still be possible to run the parser, modify the results, and then (optionally) run the booking and validate functions all from the python layer?

Have you posted your code anywhere, by chance (and, if not, would you be willing to)? I am very interested in your approach - I've been toying with a similar idea myself, but I'm not sure I've thought it through as well as you have.

-Aaron