Loving beangulp

Felipe Flores

May 20, 2024, 7:17:41 AM
to Beancount
Hey all, came here to express some gratitude for all the work you guys are doing. I've been using beancount with fava to manage all my finances down to the dollar for the last year or so, and I absolutely love it.

Just wanted to add that I just migrated a bunch of importers to beangulp, and it was way easier than I expected. Wonderful job there. The whole reason I tried it out is that I spent a couple hours pulling my hair trying to connect a debugger to my regression tests to no avail. With beangulp, however, I just had to tell the debugger config to launch the file with args (one of identify, generate, test, etc) and it worked just like that! I'd even suggest you guys advertise this as a feature!
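A minimal sketch of why this is debugger-friendly, assuming nothing about beangulp's real internals: the import script is an ordinary Python entry point that dispatches on a subcommand, so any debugger that can launch a file with arguments can step straight into it. The `CsvImporter` class, the `IMPORTERS` list, and the `identify` helper below are hypothetical stand-ins, not beangulp's API.

```python
import argparse


class CsvImporter:
    """Hypothetical importer: claims any file whose name ends in .csv."""

    def identify(self, path):
        return path.endswith(".csv")


# The importers this script knows about (illustrative only).
IMPORTERS = [CsvImporter()]


def identify(paths):
    """Map each path to the name of the first importer that claims it."""
    result = {}
    for path in paths:
        match = next((imp for imp in IMPORTERS if imp.identify(path)), None)
        result[path] = type(match).__name__ if match else None
    return result


def main(argv=None):
    """Parse a subcommand and dispatch; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(prog="import.py")
    sub = parser.add_subparsers(dest="command", required=True)
    ident = sub.add_parser("identify")
    ident.add_argument("paths", nargs="+")
    args = parser.parse_args(argv)
    if args.command == "identify":
        # A breakpoint here is hit like in any other Python program.
        return identify(args.paths)
```

A real script would call `main()` under an `if __name__ == "__main__":` guard; the debugger's launch configuration then only needs the script path and arguments such as `identify ~/Downloads`.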

If anyone stumbles upon this thread wondering whether they should move over to beangulp: do it. It's an easy migration, and they've even added an ImporterProtocol class that will make the transition even smoother.

Thanks again!

James Cook

May 20, 2024, 11:48:31 AM
to bean...@googlegroups.com
In case people haven't heard of it, I've been quite happy with
beancount-import:

https://github.com/jbms/beancount-import

I have not tried beangulp, so I don't know how they compare. But I
will mention beancount-import:

- Has a web interface letting me quickly categorize transactions
(i.e. choose which account my money went to).

- Chooses the right account automatically most of the time (maybe 95%),
so mostly in this web interface I'm just pressing enter to confirm
each transaction. To do this, it trains a model based on transactions
already in my beancount ledger.

- Adds metadata to imported transactions to keep track of where they
came from (e.g. a transaction ID from an ofx file). This lets me
(a) not worry about whether or not I've imported something (e.g. when
I export from my bank, I just replace my existing ofx download with
a new one covering a longer time range) and (b) see a list of
postings I've entered manually that don't correspond to anything
imported (a sign something went wrong). This is my favourite
feature and I wonder if beangulp has something similar.

- Also, I have been able to write my own custom importer including
unit tests without much difficulty.
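The metadata idea above can be sketched roughly as follows. This is a simplified illustration, not beancount-import's actual implementation: entries are plain dicts, and the `source_id` key is a hypothetical name for whatever identifier the importer records (for example an OFX transaction ID).

```python
def merge_imported(ledger, downloaded):
    """Return the ledger plus the downloaded entries whose source_id is new."""
    seen = {entry["source_id"] for entry in ledger if "source_id" in entry}
    merged = list(ledger)
    for entry in downloaded:
        if entry["source_id"] not in seen:
            seen.add(entry["source_id"])
            merged.append(entry)
    return merged


def manual_entries(ledger):
    """Entries without a source_id were typed by hand, not imported."""
    return [entry for entry in ledger if "source_id" not in entry]
```

Because already-seen IDs are skipped, re-importing a download that covers a longer, overlapping time range adds only the new transactions, and `manual_entries` gives the list of hand-entered items to review.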

--
James

Timothy Jesionowski

May 20, 2024, 2:59:55 PM
to bean...@googlegroups.com
So now I'm wondering, since I've got a handful of purely custom importers and a whole bunch of new importers to write next quarter: what does the bulk of the community use, and why? I've heard of Red's importers, beancount-import, and now beangulp. I know there are others. I haven't looked at any of them personally yet because I'll need to package them for my niche Linux distro first (NixOS has some downsides).

Just looking at the github repos:
  1. beangulp was written by the same guy that wrote beancount itself (Hi Martin!) so I would expect it to integrate very well.
  2. beancount-import has a web UI, which seems like a very useful tool for verifying all this automation (especially for expense categorization, which I'm skeptical can ever be particularly reliable)
  3. red's importers has the most active community by far, and seems to focus heavily on a "run the script every time you look at the reports" workflow

I don't have unit tests on my importers, and I'm importing from CSV's because I just got the simplest thing working. It's a KISS setup that's exactly as messy as it sounds. So given that I need to do an overhaul anyways, I'm curious why, for example, James doesn't use red's scripts.

Is it just that a fully automated setup is harder to build? The peace of mind from looking at the web UI to verify stuff?

Sincerely,
Timothy Jesionowski


--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/3ku5paorsc7iygd3zg7i433b7j46ecycgwgd3hvun757niycfy%40pnnt32hdaald.

Daniele Nicolodi

May 20, 2024, 5:04:13 PM
to bean...@googlegroups.com
beangulp is a redesign of the old importers code in Beancount v2. The
main goal of the project has always been to define an API for importers
and to build a minimal framework around it that implements most of the
tedious parts.

The common importers API should have enabled importers to be easily
shared and used in import applications implementing additional features
on top or simply implementing a specific workflow.

Judging from the overwhelming lack of feedback we received, I think that
the goal has not been met.

On the other hand, implementing an importer is so simple that I am not
surprised that there are multiple projects dealing with importing
transactions into a Beancount ledger.

Personally I use beangulp with some extra functionality bolted on top to
do automatic categorization and some cleanup of the imported
transactions. I use Emacs and some personal tools built on top of
beancount-mode for reviewing the imported transactions. I would never
use a browser based solution because it would always be too slow
compared to working directly in Emacs.
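As a rough illustration of what an automatic categorization step bolted on top of an importer might look like, here is a toy word-vote classifier trained on existing ledger entries. It is purely hypothetical: the thread does not describe the actual tooling, and real tools are more sophisticated. The function names and the `Expenses:Uncategorized` default are assumptions.

```python
from collections import Counter, defaultdict


def train(history):
    """Count how often each word appears with each account.

    history: iterable of (description, account) pairs from an existing ledger.
    """
    counts = defaultdict(Counter)
    for text, account in history:
        for token in text.lower().split():
            counts[token][account] += 1
    return counts


def suggest(counts, text, default="Expenses:Uncategorized"):
    """Suggest the account whose training words best match the description."""
    votes = Counter()
    for token in text.lower().split():
        votes.update(counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default
```

A suggestion produced this way is only a starting point; reviewing it afterwards (in an editor or a web UI) is the part the different workflows in this thread disagree on.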

Cheers,
Dan


Martin Blais

May 20, 2024, 9:42:05 PM
to Beancount
+1 on all points 


Red S

May 21, 2024, 10:59:14 PM
to Beancount

Nice question.

> red’s importers has the most active community by far, and seems to focus heavily on a “run the script every time you look at the reports” workflow

The purpose of reds_importers is to make writing importers easy by factoring out the most laborious parts. It actually uses beangulp*, and is simply a higher level layer over it.

reds_importers doesn’t impose any particular workflow outside beangulp’s, but if you are referring to this script, that’s only a reference workflow presented to inspire and jumpstart people new to Beancount, importers, or scripting. The sample script is a simple layer that calls beangulp’s bean-identify, bean-extract, and bean-file, which you can do yourself too. The idea is to have you build a workflow that works for you (including, for example, verifying smart_importer's categorization).

I wrote reds_importers originally because I wanted to eliminate a lot of repeated code in importers. The fundamental idea is that all any importer should do is handle quirks in the input file and map data columns to standard fields. Everything else should be handled by common transaction builders, which exist for investment accounts, banks/credit cards, and paychecks.

A few other problems were also solved along the way:

  • readers + common functions to manipulate (single and multi-table) csvs, pdfs, xml, json, etc. (libreader)
  • a set of utilities: bean-download (for direct downloads, and determining what to download), ofx-summarize (for peeking into ofxs)
  • Balance assertions and several options for dating these

So for each importer, you simply specify a reader and a transaction builder, and then express only the quirks and field mapping.
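The reader / transaction builder / field mapping split described above could be sketched like this. It is a simplified illustration using only the standard library, not the reds_importers API: `read_csv`, `build_transactions`, and the `EXAMPLE_BANK` configuration are all hypothetical names.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def read_csv(text, skip_lines=0):
    """Reader: handle a file quirk (leading junk lines) and return row dicts."""
    lines = text.splitlines()[skip_lines:]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


def build_transactions(rows, field_map, account, date_format):
    """Common builder: turn mapped rows into simple transaction dicts."""
    return [
        {
            "date": datetime.strptime(row[field_map["date"]], date_format).date(),
            "payee": row[field_map["payee"]].strip(),
            "amount": Decimal(row[field_map["amount"]]),
            "account": account,
        }
        for row in rows
    ]


# A hypothetical bank: one banner line, then bank-specific column names.
EXAMPLE_BANK = {
    "skip_lines": 1,
    "date_format": "%m/%d/%Y",
    "field_map": {"date": "Posted", "payee": "Description", "amount": "Amt"},
}


def import_example_bank(text, account="Assets:Bank:Checking"):
    """The per-bank importer only states its quirks and its field mapping."""
    rows = read_csv(text, EXAMPLE_BANK["skip_lines"])
    return build_transactions(
        rows, EXAMPLE_BANK["field_map"], account, EXAMPLE_BANK["date_format"]
    )
```

Adding another bank in this sketch means writing another small configuration like `EXAMPLE_BANK`; the reader and builder are shared.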

reds_importers follows the standard approach of relying on other tools to do what they do well (eg: ofxparse, petl, beangulp, smart_importer for categorization, etc.), which lends itself to building your own automation workflow.

> I don’t have unit tests on my importers, and I’m importing from CSV’s because I just got the simplest thing working. It’s a KISS setup that’s exactly as messy as it sounds.

You probably already noticed the advantage of beangulp, which has a framework for writing unit tests, and which reds_importers also uses.

> So given that I need to do an overhaul anyways, I’m curious why, for example, James doesn’t use red’s scripts.
>
> Is it just that a fully automated setup is harder to build? The peace of mind from looking at the web UI to verify stuff?

I’d be curious too.

Mini rant: I’d love for anyone who writes an importer to come to the same realization Felipe did, and write it using the beangulp API. This makes workflow and testing integration far easier for other users, not to mention pulling it into higher-level import systems like reds_importers, beancount-import, and such. By all means consider writing importers for reds_importers, beancount-import, and so on, but these are third-party projects, so I understand if you don’t.

I’m guessing this is not as common because most either aren’t aware of beangulp’s existence or perhaps because it seems like a barrier to understand the API and work with it. As Felipe discovered, it’s actually very simple.

*Technically, it uses the importers code in Beancount v2: bean-{identify, extract, file}, because beangulp has no release right now. But for the purposes of this discussion, these two are practically the same, and I refer to the v2 importers code as beangulp.

Justus Pendleton

May 27, 2024, 10:32:52 PM
to Beancount
On Wednesday, May 22, 2024 at 9:59:14 AM UTC+7 Red S wrote:
> Mini rant: I’d love for anyone who writes an importer to come to the same realization Felipe did, and write it using the beangulp API.

I have written several importers. I've never looked at moving them to beangulp, because when I look at beangulp's GitHub repository there are no releases, no commits in 8 months, and the "Status" documentation says "As of February 2021, the project has just been forked out of Beancount. Expect some changes to be made here."

I assume, perhaps incorrectly(?), that if beangulp isn't in a state to be released yet then it isn't in a state for me to migrate my importers to it.

Felipe Flores

May 27, 2024, 10:49:30 PM
to Beancount
Yes, please consider my post here my little grain of sand toward making beangulp more visible. It could definitely benefit from being available on PyPI (presumably not everyone knows the GitHub repo is fully installable with pip install git+https://...) and from some better documentation. I'd be happy to contribute some docs if you guys would like that!

Red S

May 28, 2024, 1:29:46 AM
to Beancount

I’ve done the same thing for exactly the same reasons, and thus, beancount_reds_importers <https://github.com/redstreet/beancount_reds_importers> is written against bean-{identify, extract, file} in Beancount v2 instead of beangulp. Good catch, thank you, I should’ve been more explicit and instead recommended that people write importers against released code, which in this case is Beancount v2’s importers.



Martin Blais

May 28, 2024, 10:58:38 PM
to Beancount
I mean it does the job, should we make releases just so people feel like it's changing? 
I can commit a diff every month on the readme file if it helps  /shrug


Red S

May 28, 2024, 11:19:30 PM
to Beancount

Not releases, but just a single release would suffice.

The first release implies the author/maintainer deemed it release-worthy, which is meaningful when viewed in the context of the reputation of the author/maintainer. This message <https://groups.google.com/g/beancount/c/c_NwZGbgOXo/m/qYX-qEtsBAAJ> explicitly says it’s not ready for a release, and that is a strong warning. Fair enough if that’s the state of it, but that may also reasonably be interpreted to mean one shouldn’t be coding and releasing other software against it. Plus, a release would presumably result in a PyPI package, which is also required for other packages to depend on it.

Regular releases/commits help signify activity, and are not an ask, at least from me here.

Does that help clarify?

Martin Blais

May 28, 2024, 11:48:09 PM
to bean...@googlegroups.com
It would indeed be nice if I created a PyPI package for it.
I think when the projects got forked from the beancount repo I never did that.




Daniele Nicolodi

May 29, 2024, 3:41:41 PM
to bean...@googlegroups.com
Here you go: https://pypi.org/project/beangulp/

Cheers,
Dan

Paul Walker

May 30, 2024, 11:55:30 PM
to Beancount
I only just started with automatic imports and am trying out beangulp after this thread. Coming from a literal pipe-chain workflow:
    cat citi.csv | citi2beans.py | dedup.py >> ledger.beans
I like the scalability and deduplication potential of handling all statements at once. It seems like much of the importer interface is opinionated about finding+moving files, but I can't think of many alternatives (download -> delete below) - just that the core bit I'd imagine sharing is ofx|csv|etc-to-beans. Having the unit tests baked into the framework and examples is very helpful to encourage quality in community code. What about...

Cli flags in import.py? So users don't have to type "--existing ledger.beans ~/Downloads" every time. import.py already has specific importers/accounts/parameters, it could at least supply the ledger file - this might extend to a generic import.py with yaml input config.
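One way the defaults idea could look, sketched with plain argparse rather than beangulp's own command-line handling (which this does not attempt to reproduce). The default values come from the command quoted above; `parse_args` and the `sources` argument name are hypothetical.

```python
import argparse

# Defaults taken from the command line quoted above; adjust to taste.
DEFAULT_LEDGER = "ledger.beans"
DEFAULT_SOURCES = ["~/Downloads"]


def parse_args(argv=None):
    """Parse import.py arguments, filling in the ledger and source defaults."""
    parser = argparse.ArgumentParser(prog="import.py")
    parser.add_argument("--existing", default=DEFAULT_LEDGER,
                        help="ledger used to detect already-imported entries")
    parser.add_argument("command", choices=["identify", "extract", "archive"])
    parser.add_argument("sources", nargs="*", default=DEFAULT_SOURCES,
                        help="files or directories to scan")
    return parser.parse_args(argv)
```

With this in place, `import.py extract` would behave like `import.py --existing ledger.beans extract ~/Downloads`, while explicit values still override the defaults. Reading the same defaults from a YAML file would be a natural next step, but is left out here.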

Can archive just remove files? ofxget makes it easy to retrieve, I may only keep novel/representative statements. Or an alternative "delete" command.

Downloading statements? I could see "import.py fetch" calling per-importer fetch methods to fill the Downloads folder before extract. It would be nice to extend the importer classes for a simple coupling between fetched files and identify.
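A possible shape for that coupling, as a sketch only: each importer gains a `fetch` method that drops a file into the downloads folder under a name its own `identify` recognizes. None of this exists in beangulp as far as this thread shows; `FetchingImporter`, `ExampleBank`, `fetch_all`, and `demo` are hypothetical, and the "download" is a stand-in that writes a header line.

```python
import tempfile
from pathlib import Path


class FetchingImporter:
    """Hypothetical importer that also knows how to fetch its own statements."""

    filename = "statement.csv"

    def fetch(self, downloads):
        raise NotImplementedError

    def identify(self, path):
        # The coupling: identify recognizes exactly what fetch produces.
        return Path(path).name == self.filename


class ExampleBank(FetchingImporter):
    filename = "examplebank.csv"

    def fetch(self, downloads):
        # Stand-in for a real download (for example via ofxget).
        target = Path(downloads) / self.filename
        target.write_text("date,payee,amount\n")
        return target


def fetch_all(importers, downloads):
    """Ask every importer to fill the downloads folder; return the new paths."""
    Path(downloads).mkdir(parents=True, exist_ok=True)
    return [importer.fetch(downloads) for importer in importers]


def demo():
    """Fetch into a temporary folder and check each file is identified."""
    importers = [ExampleBank()]
    with tempfile.TemporaryDirectory() as tmp:
        fetched = fetch_all(importers, tmp)
        return [(p.name, imp.identify(p)) for imp, p in zip(importers, fetched)]
```

An "import.py fetch" command would then just call `fetch_all` on the configured importers before running extract.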

If any of these are "issue worthy" I could add to the repo. Might be able to hack on statement deletion or import flags myself.

Paul