load_file omits some entries (balances)


Florian Lindner

May 9, 2019, 11:12:44 AM
to Beancount
Hello,

I extended my importer to also work as a transformer, applying the rules used at import time to existing beancount files:

    entries, errors, option_map = bc.loader.load_file(args.inputfile)

    # Transform only transactions; pass every other directive through unchanged.
    transformed_entries = [
        transform_txn(e) if isinstance(e, data.Transaction) else e
        for e in entries
    ]

    with open(args.outputfile, "w") as f:
        bc.parser.printer.print_entries(transformed_entries, file=f)


My beancount files are organized as one main.beancount file, which contains the open and pad directives, plus one account.beancount file for each bank account.

To keep the transactions separate, I apply the transformation only to account.beancount (= args.inputfile). Naturally, this produces several validation errors because of undeclared accounts and failing balances. At this point, however, I don't care about the failing validations.

The problem is that the entries returned by load_file do not contain all directives. I am not sure exactly which ones are omitted, but all balance directives are definitely left out. When I write out the transformed entries, that information is lost.

How can I read in all entries of a file as entry objects without performing any validation checks? Or otherwise, what is the best way to work around that?

Thanks!
Florian

Martin Blais

May 9, 2019, 8:24:07 PM
to Beancount
I read your message twice. I don't understand what you're trying to do.



--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/fd082e96-693c-47ba-9f32-3425c7ff179c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Florian Lindner

May 10, 2019, 5:00:04 AM
to Beancount
On 10.05.19 at 02:23, Martin Blais wrote:
> I read your message twice. I don't understand what you're trying to do.

Sorry for being unclear. Let me try to demonstrate:

  % cat main.beancount
option "operating_currency" "EUR"

2018-05-01 open Equity:Opening-Balances

2018-05-01 open Assets:Giro
2018-05-01 open Expenses:Unknown

2018-05-01 pad Assets:Giro Equity:Opening-Balances
2018-05-02 balance Assets:Giro 10.00 EUR

include "giro.beancount"

  % cat giro.beancount
2018-05-11 * "Someone" "Somedesc"
  Assets:Giro       100.00 EUR
  Expenses:Unknown           

2018-05-12 balance Assets:Giro 110.00 EUR

  % bean-check main.beancount

Everything is fine.

Now, in Python:

  entries, errors, options = beancount.loader.load_file("main.beancount")

entries contains all entries, errors is empty.

But if I load the giro.beancount file:

  entries, errors, options = beancount.loader.load_file("giro.beancount")

errors contains balance and validation errors. That is expected, because the account declarations and padding from main.beancount are missing.

The problem is that entries does not contain all entries from the file. Here, it contains only the one transaction; the balance directive is omitted.

Now, if I make some changes to the entries and write them out again using

  beancount.parser.printer.print_entries(transformed_entries, file=f)

the balances are not written to the file and information is lost.

Because I want to read all entries of one file (which does not necessarily validate) and write them out to the same file, this is a problem for me.

A workaround I see is to read in main.beancount and write out the entries to different files based on entries[6].meta["filename"], basically rewriting the entire ledger.

I hope I was able to express my problem.

Thanks and best regards,
Florian

Martin Blais

May 10, 2019, 8:32:49 PM
to Beancount
I see.
Well FWIW, entries which have errors are not guaranteed to show up in the output stream at all.
It's unclear to me whether this is always the best outcome, but a long while ago I decided to do this for transactions and for some other directives.

I don't have a solution for you. This is an unusual case.






Florian Lindner

May 13, 2019, 10:36:51 AM
to Beancount
> I see.
> Well FWIW, entries which have errors are not guaranteed to show up in the output stream at all.
> It's unclear to me whether this is always the best outcome, but a long while ago I decided to do this for transactions and for some other directives.
>
> I don't have a solution for you. This is an unusual case.

I tried to apply the workaround I mentioned:

    entries, errors, option_map = bc.loader.load_file(args.inputfile)
    sorted_entries = {}  # filename -> list of entries

    for e in entries:
        entry = transform_txn(e) if isinstance(e, data.Transaction) else e
        sorted_entries.setdefault(entry.meta["filename"], []).append(entry)

    # Write each group back to the file it was read from.
    for filename, file_entries in sorted_entries.items():
        with open(filename, "w") as f:
            bc.parser.printer.print_entries(file_entries, file=f)

One problem that shows up is that main.beancount sets some options (e.g. operating_currency). They don't show up in entries, only in option_map, and I don't know how to write them back to the file.
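One way the options could be written back is to render them as `option` directives by hand before printing the entries. This is only a sketch: `render_options` is a made-up helper, not part of Beancount, and it assumes you pass in just the options that were explicitly set in the file, since option_map (as far as I know) also holds defaults and derived values that should not be written out.

```python
def render_options(options):
    """Render a dict of option values as Beancount `option` directives."""
    lines = []
    for name, value in options.items():
        # Some options (e.g. operating_currency) may hold several values.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            lines.append('option "{}" "{}"'.format(name, item))
    return "\n".join(lines) + "\n"


# Only the options that main.beancount sets explicitly (chosen by hand here).
header = render_options({"operating_currency": ["EUR"]})
print(header, end="")
```

The resulting text can be written at the top of the file before calling print_entries on the same file object.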

Another idea: at first try, it seems that reading the entire file into a string and using beancount.parser.parser.parse_many works and also parses the balances:

    with open(args.inputfile, "r") as f:
        instr = f.read()
   
    entries = bc.parser.parser.parse_many(instr)

Seems to work fine so far. What do you think?
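A sketch of the same parser-only idea, assuming Beancount v2: to my knowledge `beancount.parser.parser.parse_file` parses a single file and returns (entries, errors, options_map) without following includes, booking, running plugins, or validating, which would explain why the balance directives survive. If I remember correctly, parse_many is meant for test snippets (it formats the string with the caller's local variables and asserts that there are no errors), so parse_file or parse_string may be the safer fit. The helper names `load_without_validation` and `count_by_type` are made up for illustration.

```python
from collections import Counter


def load_without_validation(path):
    """Parse a single file: no includes, booking, plugins or validation."""
    # Imported lazily; this line assumes Beancount v2 is installed.
    from beancount.parser import parser
    entries, parse_errors, options_map = parser.parse_file(path)
    return entries, parse_errors, options_map


def count_by_type(entries):
    """Count directives per type name, e.g. to confirm Balance entries survived."""
    return Counter(type(entry).__name__ for entry in entries)


# Usage (requires Beancount and the file from the example above):
# entries, errors, options = load_without_validation("giro.beancount")
# print(count_by_type(entries))
```

Note that postings parsed this way are not interpolated, so amounts left out in the source stay missing in the entries.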

Best,
Florian

Martin Blais

May 14, 2019, 8:57:21 PM
to Beancount
But why are you trying to do this? What's your purpose?

> A workaround I see, is to read in main.beancount and write out the entries to different files based on entries[6].meta["filename"]. Basically rewriting the entire ledger.

I was going to suggest this. 
Still, when you write entries out, they won't look precisely the same as the input. Numbers will have been filled in, cost bases will show up, etc. I don't see the point.




Florian Lindner

May 15, 2019, 4:12:48 AM
to Beancount
Hi,

On 15.05.19 at 02:57, Martin Blais wrote:
> But why are you trying to do this? What's your purpose?

My importer applies a set of rules to convert payee names and to assign certain kinds of transactions to accounts:

import re

# List of tuples (regular expression, replacement)
payee_replacements = [
    ("^AMAZON", "Amazon"),
]

# List of tuples (Python expression to match, second account to set).
# The final "True" rule is the fallback and must stay last.
accounts_assignments = [
    ("desc == 'Miete PSW 1'", "Expenses:Miete"),
    ("payee in ['REWE', 'Kaufland', 'ALDI']", "Expenses:Groceries"),
    ("True", "Expenses:Unknown"),
]


def transform_txn(txn):
    # Normalize the payee; the first matching pattern wins.
    payee = txn.payee
    for pattern, substitute in payee_replacements:
        if payee and re.match(pattern, payee):
            payee = substitute
            break

    txn = txn._replace(payee=payee)

    # Names visible to the rule expressions. eval() runs arbitrary code,
    # so the rules must come from a trusted source.
    local_vars = {"payee": txn.payee, "desc": txn.narration,
                  "buchungsart": txn.meta.get("buchungsart")}
    if txn.postings[1].account == "Expenses:Unknown":
        for expr, acc in accounts_assignments:
            if eval(expr, local_vars):
                # postings is a list, so it can be updated in place.
                txn.postings[1] = txn.postings[1]._replace(account=acc)
                break

    return txn


These two rulesets are applied on import.

I also want to apply them to existing ledgers.

Use case: I identify a recurring transaction pattern, such as "buchungsart == 'GAA,Spk.Netz'". All matching transactions, newly imported as well as existing ones, should have the account "Assets:Bargeld" assigned. For that, I need a way to read in all transactions, transform them, and write them back to a beancount file.
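The rule tables could also be expressed with plain callables instead of eval'd strings, which avoids executing rule text as code. The following is a minimal sketch of that variant, not the importer's actual code: `Posting` and `Transaction` are namedtuple stand-ins for the Beancount types so that it runs without Beancount, and `ACCOUNT_RULES` and `transform` are illustrative names.

```python
import re
from collections import namedtuple

# Stand-ins for the Beancount data types (assumption, for illustration only).
Posting = namedtuple("Posting", "account units")
Transaction = namedtuple("Transaction", "meta payee narration postings")

PAYEE_REPLACEMENTS = [(re.compile(r"^AMAZON"), "Amazon")]

# (predicate, account) pairs; the first matching predicate wins.
ACCOUNT_RULES = [
    (lambda t: t.narration == "Miete PSW 1", "Expenses:Miete"),
    (lambda t: t.payee in ("REWE", "Kaufland", "ALDI"), "Expenses:Groceries"),
    (lambda t: t.meta.get("buchungsart") == "GAA,Spk.Netz", "Assets:Bargeld"),
]


def transform(txn):
    """Return a copy of txn with payee and second account rewritten."""
    for pattern, substitute in PAYEE_REPLACEMENTS:
        if txn.payee and pattern.match(txn.payee):
            txn = txn._replace(payee=substitute)
            break
    postings = list(txn.postings)  # copy, so the input stays untouched
    if len(postings) > 1 and postings[1].account == "Expenses:Unknown":
        for predicate, account in ACCOUNT_RULES:
            if predicate(txn):
                postings[1] = postings[1]._replace(account=account)
                break
    return txn._replace(postings=postings)


cash = Transaction({"buchungsart": "GAA,Spk.Netz"}, "AMAZON EU", "x",
                   [Posting("Assets:Giro", "-50.00 EUR"),
                    Posting("Expenses:Unknown", None)])
result = transform(cash)
print(result.payee, result.postings[1].account)
```

If no rule matches, the posting simply keeps "Expenses:Unknown", so no explicit fallback rule is needed in this variant.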

This is my solution to this question: https://groups.google.com/forum/#!topic/beancount/e93VI4s4YCQ

An alternative approach would be plugins. As far as I understand, plugins only apply live transformations, i.e., they transform data as it is loaded from a file, but do not write the data back to the file.


>> A workaround I see, is to read in main.beancount and write out the entries to different files based on entries[6].meta["filename"]. Basically rewriting the entire ledger.
>
> I was going to suggest this.
> Still, when you write entries out, they won't look precisely the same as the input. Numbers will have been filled in, cost bases will show up, etc. I don't see the point.
Yes, I have noticed that, but that seems ok to me.

I hope I was able to explain my use case. I am open to any thoughts and ideas to achieve that differently.

Best Regards,
Florian


> On Mon, May 13, 2019 at 10:36 AM Florian Lindner <mailin...@xgm.de> wrote:
>
>         I see.
>         Well FWIW, entries which have errors are not guaranteed to show up in the output stream at all.
>         It's unclear to me whether this is always the best outcome, but a long while ago I decided to do this for transactions and for some other directives.
>         https://bitbucket.org/blais/beancount/src/d1b2cbf2841669e988f6692ec1d39db3708730cc/beancount/ops/balance.py#lines-119
>
>         I don't have a solution for you. This is an unusual case.

Martin Blais

May 16, 2019, 7:01:26 PM
to Beancount
Alright now I see what you want to do.
You want to rewrite your payees, but in the source file itself.
That's a nice idea.

However, I don't think you'll be able to put together a nice solution with rewriting after processing.
I would work off the source text itself.
Or even better: as a combination of both.
Here's what you could do: parse the entire thing, filter just the transactions.
For each transaction you have the filename and line number.
Do whatever remapping / processing / cleaning you want to do on the payee names in your script.
Then, process each file, using a regexp to replace the first string that occurs on the lines where you have transactions with renamed payees.

This is better than working purely from the source file because you won't have to write a full alternative parser to make your replacements; all you need to ace is replacement of the first string on those transaction lines and leave all the other lines untouched. Should be pretty easy and robust enough (tip: make sure you safeguard your files in a git/hg repo and diff just in case). The benefit is your source files will keep all the other formatting and comments and spacing and ordering and whatever else.

This is how I'd go about this.
I think it would even be possible to template this and provide helper functions.
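A minimal sketch of such a helper, assuming the renamed payees have already been collected as a mapping from 1-based line number to new payee (in Beancount's metadata I believe these come from `meta["lineno"]` and `meta["filename"]`). The name `rewrite_payees` is made up for illustration. It replaces only the first quoted string on the listed lines and leaves everything else byte-for-byte intact.

```python
import re

# First double-quoted string on a line (tolerates backslash-escaped quotes).
FIRST_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def rewrite_payees(text, new_payees):
    """Replace the first quoted string on the given 1-based line numbers.

    new_payees maps line number -> new payee; other lines are untouched.
    """
    lines = text.splitlines(keepends=True)
    for lineno, payee in new_payees.items():
        quoted = '"{}"'.format(payee)
        # A function replacement keeps re.sub from interpreting backslashes.
        lines[lineno - 1] = FIRST_STRING.sub(
            lambda match: quoted, lines[lineno - 1], count=1)
    return "".join(lines)


source = (
    "; groceries\n"
    '2018-05-11 * "REWE SAGT DANKE" "Somedesc"\n'
    "  Assets:Giro       100.00 EUR\n"
    "  Expenses:Unknown\n"
)
print(rewrite_payees(source, {2: "REWE"}), end="")
```

One caveat: on a transaction line with a single string, that string is the narration rather than the payee, so such lines should be left out of the mapping.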









Stefano Zacchiroli

May 17, 2019, 4:37:59 AM
to bean...@googlegroups.com
On Thu, May 16, 2019 at 07:01:12PM -0400, Martin Blais wrote:
> Alright now I see what you want to do.
> You want to rewrite your payees, but in the source file itself.
> That's a nice idea.

JFTR, a source-to-source equivalent of the current plugin system is
something I'd love to have too. My typical use case is "fixing the
past", e.g., when I refactor the account hierarchy or more generally
find a better way to account for something and I want to automate the
process of fixing past entries.

--
Stefano Zacchiroli . za...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

Florian Lindner

May 19, 2019, 3:02:24 PM
to Beancount
On 17.05.19 at 01:01, Martin Blais wrote:
> Alright now I see what you want to do.

> You want to rewrite your payees, but in the source file itself.
> That's a nice idea.
Thanks!

> However, I don't think you'll be able to put together a nice solution with rewriting after processing.
> I would work off the source text itself.
> Or even better: as a combination of both.
> Here's what you could do: parse the entire thing, filter just the transactions.
> For each transaction you have the filename and line number.
> Do whatever remapping / processing / cleaning you want to do on the payee names in your script.
> Then, process each file, using a regexp to replace the first string that occurs on the lines where you have transactions with renamed payees.
>
> This is better than working purely from the source file because you won't have to write a full alternative parser to make your replacements; all you need to ace is replacement of the first string on those transaction lines and leave all the other lines untouched. Should be pretty easy and robust enough (tip: make sure you safeguard your files in a git/hg repo and diff just in case). The benefit is your source files will keep all the other formatting and comments and spacing and ordering and whatever else.
>
> This is how I'd go about this.
> I think it would even be possible to template this and provide helper functions.
OK, I understand what you're suggesting, but I am not sure that is the way to go. For an easy case such as replacing payees it is fine, but for more complex tasks, like adding new metadata fields, changing accounts, or even splitting transactions between accounts, I think a search-and-replace approach will evolve into rewriting entire transactions in the source file from the Transaction object anyway.

Right now, I think the best way for me is to read the beancount file into a string, parse it using bc.parser.parser.parse_many, and perform the transformations. Then I rewrite the entire file using bc.parser.printer.print_entries.


You wrote:

> Still, when you write entries out, they won't look precisely the same as the input. Numbers will have been filled in, cost bases will show up, etc. I don't see the point.

Given my very simple transactions, e.g.,

2018-05-20 * "KREDITKARTENABRECHNUNG" "18.05.18 1234"
  buchungsart: "Lastschr.Kreditkarten"
  empfaenger: " / "
  hash: "e4a580e7002e606a4314f864f64f30a12fb8673f"
  Assets:Giro       -115.00 EUR
  Expenses:Unknown            

Rewriting the ledger file, as I mentioned above, does not change an entry like that.

As I only use simple beancount syntax for now, but might want to use more later, do you consider that kind of rewriting a problem?

In the long run, I think a rewriting protocol would make a beneficial addition to beancount, as Stefano also suggested.

Maybe something like the importer protocol:

class MyRewriter:

  def rewriteTransaction(self, txn):
    return txn

  def rewriteBalance(self, bal):
    return bal

or the like, perhaps with a single function for all types. Then you invoke bean-rewrite on a file or a set of transactions. Just a first idea...  ;-)
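To make the idea concrete, here is a rough sketch of how a driver could dispatch entries to such a rewriter. Everything in it is hypothetical: bean-rewrite does not exist, `apply_rewriter` is an invented name, and the namedtuples only stand in for Beancount's directive types so that the example runs on its own.

```python
from collections import namedtuple

# Stand-ins for Beancount directive types (illustration only).
Transaction = namedtuple("Transaction", "date payee narration")
Balance = namedtuple("Balance", "date account amount")


def apply_rewriter(rewriter, entries):
    """Call rewriter.rewrite<TypeName>(entry) if defined, else keep the entry."""
    result = []
    for entry in entries:
        method = getattr(rewriter, "rewrite" + type(entry).__name__, None)
        result.append(method(entry) if method else entry)
    return result


class PayeeCleaner:
    # Only transactions are handled; balances pass through untouched.
    def rewriteTransaction(self, txn):
        return txn._replace(payee=txn.payee.title())


ledger = [
    Transaction("2018-05-11", "REWE", "Somedesc"),
    Balance("2018-05-12", "Assets:Giro", "110.00 EUR"),
]
rewritten = apply_rewriter(PayeeCleaner(), ledger)
print(rewritten[0].payee)
```

Dispatching on the type name means a rewriter only needs to define methods for the directive types it cares about.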

Best Regards,
Florian
