beangulp: custom output (eg: one file per account)

Red S

Oct 19, 2022, 3:38:17 AM
to Beancount
I use one file per real-world account, which looks like:
beancount/
  main.bc # includes the other files
  accounts/
     Assets.Banks.ABCDBank.Checking.bc
     Assets.Investments.Taxable.BTrade.bc
     Liabilities.Credit-Cards.Blue-Mastercard-0123.bc

I use this patch to extract.py in beancount to have my importers output to multiple files.

From this mailing list, I gather several users use organizations that involve multiple files, split in different ways. So I'm wondering whether custom importer output is something beangulp would consider supporting. Happy to think about a patch for this if so.

One could always write a separate tool that takes the output stream from beangulp and does this, though that would still require changes to beangulp: for example, not commenting out the deduplicated transactions and not removing the duplicate metadata marker, so that a loader can subsequently load them.
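
To make the "separate tool" idea concrete, here is a minimal sketch of splitting extracted entries into one file per account, following the layout shown above. It does not use any beangulp API: `Entry`, `filename_for`, and `split_by_account` are hypothetical names, and entries are assumed to be already rendered as Beancount text.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Entry:
    """Hypothetical stand-in for an extracted entry (not a beangulp type)."""
    account: str  # e.g. "Assets:Banks:ABCDBank:Checking"
    text: str     # the entry, already rendered as Beancount syntax


def filename_for(account: str) -> str:
    # Mirror the layout above: colons become dots, plus a ".bc" suffix.
    return account.replace(":", ".") + ".bc"


def split_by_account(entries, outdir):
    """Append each entry to accounts/<account>.bc under outdir.

    Returns a dict mapping file name to the number of entries written.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[filename_for(entry.account)].append(entry.text)

    accounts_dir = Path(outdir) / "accounts"
    accounts_dir.mkdir(parents=True, exist_ok=True)

    written = {}
    for name, texts in groups.items():
        # Append, so existing per-account files are extended, not replaced.
        with (accounts_dir / name).open("a", encoding="utf-8") as f:
            f.write("\n\n".join(texts) + "\n")
        written[name] = len(texts)
    return written
```

Deduplication is deliberately left out here; as noted above, it would need the duplicate markers to survive in beangulp's output.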

Martin Blais

Oct 19, 2022, 7:15:52 AM
to Beancount
On Wed, Oct 19, 2022, 03:38 Red S <redst...@gmail.com> wrote:
> I use one file per real-world account, which looks like:
> beancount/
>   main.bc # includes the other files
>   accounts/
>      Assets.Banks.ABCDBank.Checking.bc
>      Assets.Investments.Taxable.BTrade.bc
>      Liabilities.Credit-Cards.Blue-Mastercard-0123.bc
>
> I use this patch to extract.py in beancount to have my importers output to multiple files.
>
> From this mailing list, I gather several users use organizations that involve multiple files, split in different ways. So I'm wondering whether custom importer output is something beangulp would consider supporting. Happy to think about a patch for this if so.

I like the idea (see my other email), especially if we think of this as a new stage that can insert transactions into existing files. These ideas are intertwined. Ideally, the current behavior would remain the default. Sure.


Red S

Oct 19, 2022, 9:59:25 PM
to Beancount
On Wednesday, October 19, 2022 at 4:15:52 AM UTC-7 bl...@furius.ca wrote:
> I like the idea (see my other email), especially if we think of this as a new stage that can insert transactions into existing files. These ideas are intertwined. Ideally, the current behavior would remain the default. Sure.
> This would be useful indeed. Maybe we should support an importer returning a Dict[str, List[Transaction]] or adding metadata with the intended "file group", and a separate stage that writes out files (or better: inserts the transaction group before unique tags from a set of Beancount files).
> Note that there are three independent ideas in the previous paragraph:
> - file groups so that an importer can return more than one list of files
> - routing these (or just the output of each importer) to multiple locations
> - automatic insertion of imported transactions in a specific place in a file
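
As a rough illustration of the file-group and insertion ideas above (a sketch under stated assumptions, not beangulp code): file groups are modelled as a plain dict from file name to already-rendered entry text, and the "unique tag" is assumed to be a marker string that appears on exactly one line of each target file. `insert_before_marker` and `route` are hypothetical names.

```python
from typing import Dict, List


def insert_before_marker(text: str, marker: str, new_entries: List[str]) -> str:
    """Return text with new_entries inserted just before the marker line.

    The marker must appear on exactly one line, so that the insertion
    point is unambiguous.
    """
    lines = text.splitlines()
    hits = [i for i, line in enumerate(lines) if marker in line]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one {marker!r} line, found {len(hits)}")
    i = hits[0]

    block: List[str] = []
    for entry in new_entries:
        block.extend(entry.splitlines())
        block.append("")  # blank line after each inserted entry
    return "\n".join(lines[:i] + block + lines[i:]) + "\n"


def route(groups: Dict[str, List[str]], files: Dict[str, str], marker: str) -> Dict[str, str]:
    """Apply each file group to the contents of the file with the same name.

    files maps file name to current contents; files without a group are
    returned unchanged.
    """
    updated = dict(files)
    for name, entries in groups.items():
        updated[name] = insert_before_marker(files[name], marker, entries)
    return updated
```

Working on file contents rather than paths keeps the routing step separate from reading and writing, so the default single-stream output would not need to change.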
 
Hello Martin,
Great, thanks for the ideas. I'll put it on my list and get around to it when time opens up.

Meanwhile, beangulp has several major blockers for me, all of which seem like regressions from the former beancount.ingest. See my other threads from this week:
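- dedupe is broken (https://github.com/beancount/beangulp/issues/93) when specifying multiple input files. A PR exists (https://github.com/beancount/beangulp/pull/64) but is not yet merged
- multiple importers on the same file (https://groups.google.com/g/beancount/c/G-GhFeBLnO4/m/AKuLgYDPAQAJ) not supported
- PyPI releases not available, meaning I can't move beancount_reds_importers (https://github.com/redstreet/beancount_reds_importers) to beangulp, which I'd like to do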

Any idea if there are plans to address the above?

Thanks again!

Red S

Oct 21, 2022, 12:32:33 PM
to Beancount
Hello Martin,
Is beangulp being maintained? And if so, how can I find out answers to the questions below?

Thanks,
Red S.

Daniele Nicolodi

Oct 21, 2022, 3:31:54 PM
to bean...@googlegroups.com
On 21/10/2022 18:32, Red S wrote:

> Is beangulp being maintained? And if so, how can I find out answers to
> the questions below?

I have been taking care of beangulp lately. However, you have very high
expectations regarding the response time to requests about an
unpublished open source package, and they cannot be met in the free
time I can allocate to working on beangulp.

> Meanwhile, beangulp has several major blockers for me, all of which
> seem like regressions from the former beancount.ingest. See my other
> threads from this week:

> - dedupe is broken <https://github.com/beancount/beangulp/issues/93>
> when specifying multiple input files. PR is here but not yet merged
> <https://github.com/beancount/beangulp/pull/64>

beancount.ingest had the same exact issue. I haven't merged the patches
that I wrote because I haven't yet written the tests to demonstrate that
they work.

> - multiple importers on the same file
> <https://groups.google.com/g/beancount/c/G-GhFeBLnO4/m/AKuLgYDPAQAJ>
> not supported

To make document archival easily predictable and idempotent, beangulp
assumes that there is one and only one importer per file. I don't see
what advantage there is in splitting the code that handles one file into
multiple importers.

If you already have multiple importers, it is trivial to implement a
"multiplexer" importer that implements the identify(), account(),
date(), and filename() methods and that delegates the extract() method
to other importer instances.
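
A minimal sketch of such a multiplexer, for illustration only. It is duck-typed rather than subclassing the beangulp importer base class so that it runs standalone, and it assumes each delegate exposes the five methods named above taking a file path (with extract() also taking the existing entries). `MultiplexImporter` is a hypothetical name, and letting the first importer answer identify(), account(), date(), and filename() is one possible design choice, not something beangulp prescribes.

```python
class MultiplexImporter:
    """Present several importers for the same file as a single importer.

    The first importer decides identification and archival details;
    extract() concatenates the entries produced by all of them.
    """

    def __init__(self, primary, *others):
        self.primary = primary
        self.delegates = (primary, *others)

    def identify(self, filepath):
        return self.primary.identify(filepath)

    def account(self, filepath):
        return self.primary.account(filepath)

    def date(self, filepath):
        return self.primary.date(filepath)

    def filename(self, filepath):
        return self.primary.filename(filepath)

    def extract(self, filepath, existing=None):
        # Assumption: each delegate accepts (filepath, existing) and
        # returns a list of entries.
        entries = []
        for importer in self.delegates:
            entries.extend(importer.extract(filepath, existing))
        return entries
```

Because only one object answers identify() and filename(), archival stays predictable: each file still maps to a single importer from the framework's point of view.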

> - PyPI releases
> <https://groups.google.com/g/beancount/c/G-GhFeBLnO4/m/AKuLgYDPAQAJ>
> not available, meaning I can't move beancount_reds_importers
> <https://github.com/redstreet/beancount_reds_importers> to beangulp,
> which I'd like to do

beangulp is not yet ready for a release. I thought that the lack of a
release would have been a strong enough warning to prospective users to
expect rough edges. If beancount.ingest works for you, there is no
reason to switch to beangulp.

Cheers,
Dan
