[beancount] keeps choking on my input file

Martin Andreas Andersen

Sep 3, 2014, 3:15:41 PM

to ledge...@googlegroups.com

I have the following (beginnings of) a ledger:

; Indstillinger
option "title" "Bogføring for familien andersen-jensen"

option "name_assets"                    "Aktiver"
option "name_liabilities"               "Passiver"
option "name_equity"                    "Kapital"
option "name_income"                    "Indtægter"
option "name_expenses"                  "Udgifter"

option "account_previous_balances"      "Kapital:OpstartsBalance"
option "account_previous_earnings"      "Indtjening:Tidligere"
option "account_previous_conversions"   "Væksling:Nuværende"
option "account_current_earnings"       "Indtjening:Nuværende"
option "account_current_conversions"    "Væksling:Tidligere"
option "conversion_currency"            "INTET"
option "operating_currency"             "DKK"
option "documents"                      "./Dokumenter/"

; Definer konti
1984-01-01 open Kapital:OpstartsBalance

(I have mapped the account names to their Danish equivalents in these settings; the error below still occurs.)

bean-check gives the following error:

/home/martin/Dropbox/Documents/Finances/Budget/test.beancount:19: syntax error, unexpected ERROR, expecting ACCOUNT

/home/martin/Dropbox/Documents/Finances/Budget/test.beancount:19: Lexer error; erroneous token: 'Kapital:OpstartsBalance'

I've tried removing all settings, copying examples over from the docs, and retyping directives, but I still get errors. I even thought the occasional Danish character ('æ', 'ø', 'å') might cause issues, but removing those did not help either.

I can bean-check the examples from the source just fine, with no errors.

Any suggestions? I'm rather stumped at the moment.

- Martin

Martin Blais

Sep 3, 2014, 4:44:35 PM

to ledger-cli

Hi Martin,

I just quickly cat'ed your example to a file on my machine and ran bean-check on it, and I get this:

blais-macbookair:~/tmp$ bean-check andersen.beancount

/Users/blais/tmp/andersen.beancount:0: Document root '/Users/blais/tmp/Dokumenter' does not exist

/Users/blais/tmp/andersen.beancount:20: Unused account 'Kapital:OpstartsBalance'

1984-01-01 open Kapital:OpstartsBalance

Which is exactly what I'd expect.

Which platform are you on?

Which version are you running?

Can you try updating to the tip of the default branch, run "make build" and let me know if it works?

Note: Unicode is as yet untested (I'm using old GNU tools from way back for parsing), but it might work.

P.S. Ledgerians: I'll be creating a mailing list dedicated to Beancount soon, so Beancount-specific questions can be moved there eventually.

Martin Andreas Andersen

Sep 4, 2014, 12:21:56 AM

to ledge...@googlegroups.com

> Which platform are you on?

Linux Mint 17/Ubuntu.

> Which version are you running?

Latest default branch (updated yesterday)

> Can you try updating to the tip of the default branch, run "make build" and let me know if it works?

Sure :)

> Note: Unicode is as yet untested (I'm using old GNU tools from way back for parsing), but it might work.

My first suspicion was actually that Unicode was the problem. Apparently not, though...

> P.S. Ledgerians: I'll be creating a mailing list dedicated to Beancount soon, so Beancount-specific questions can be moved there eventually.

+1 :)

Martin Andreas Andersen

Sep 4, 2014, 3:10:13 AM

to ledge...@googlegroups.com

Updated beancount, everything works as expected. However, if I add the directive

1984-01-01 open Aktiver:NørresundbyBank:Nemkonto

I get this error:

/home/martin/Dropbox/Documents/Finances/Budget/test.beancount:21: syntax error, unexpected ERROR, expecting ACCOUNT

/home/martin/Dropbox/Documents/Finances/Budget/test.beancount:21: Lexer error; erroneous token: 'Aktiver:NørresundbyBank:Nemkonto'

which seems to indicate the parser can't handle the 'ø'. For now, I can replace the non-ASCII characters.

How hard would it be to add Unicode support? And where would I look in the source if I wanted to hack on it?
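
A quick way to see which characters might trip the lexer is to list every non-ASCII character in the file together with its position. The following is only a minimal sketch: `find_non_ascii` is a hypothetical helper, not part of Beancount, and it does not distinguish account names from quoted strings (which, judging by the option lines earlier in this thread, appear to be accepted).

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character.

    Line and column numbers are 1-based. This is a hypothetical helper for
    illustration; it knows nothing about Beancount syntax.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Two directives taken from this thread; only the second contains 'ø'.
sample = (
    "1984-01-01 open Kapital:OpstartsBalance\n"
    "1984-01-01 open Aktiver:NørresundbyBank:Nemkonto\n"
)

for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: {ch!r} (U+{ord(ch):04X})")
```

To run it against a real ledger, read the file with an explicit encoding, e.g. `open(path, encoding="utf-8").read()`, and pass the result to the function.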

Martin Blais

Sep 4, 2014, 9:59:24 AM

to ledger-cli, bean...@googlegroups.com

On Thu, Sep 4, 2014 at 3:10 AM, Martin Andreas Andersen <martin.andr...@gmail.com> wrote:

> Updated beancount, everything works as expected. However, if I add the directive
>
> 1984-01-01 open Aktiver:NørresundbyBank:Nemkonto
>
> I get this error:
>
> /home/martin/Dropbox/Documents/Finances/Budget/test.beancount:21: syntax error, unexpected ERROR, expecting ACCOUNT
>
> /home/martin/Dropbox/Documents/Finances/Budget/test.beancount:21: Lexer error; erroneous token: 'Aktiver:NørresundbyBank:Nemkonto'
>
> which seems to indicate the parser can't handle the 'ø'. For now, I can replace the non-ASCII characters.

That's a good workaround.
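
For anyone wanting to automate that workaround, here is a minimal sketch of romanizing Danish letters in account names. The mapping (æ to ae, ø to oe, å to aa) is just one common transliteration and is an assumption, as is the helper name `romanize`; nothing here is part of Beancount.

```python
import unicodedata

# Assumed transliteration table; adjust to taste.
ROMANIZE = str.maketrans({
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
})


def romanize(name):
    """Replace Danish letters in an account name with ASCII digraphs."""
    # Normalize first so that 'a' + combining ring becomes a single 'å'.
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(ROMANIZE)


print(romanize("Aktiver:NørresundbyBank:Nemkonto"))
```

Other non-ASCII characters are left untouched by this table, so the result is only guaranteed to be ASCII for names that use just these six letters.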

> How hard would it be to add Unicode support? And where would I look in the source if I wanted to hack on it?

I knew this day would come, but I did not expect it would be so soon.

So here's the context: I've been using flex and bison3, mainly because I'm really sensitive about dependencies. I feel that using old tools that are available literally everywhere, together with the C language, makes it much easier to deal with the gigantic cosmic mess that is installation and portability. So I've stuck with these crotchety old tools for a reason. They're not even that easy to use, so that's a bit of a liability, but I really like the ease of installation they provide, and you benefit when all it takes is a 2-second "make build" that just works, so it's worth it IMO.

Now, I've been unhappy with flex's handling of word boundaries and its error reporting, so I have recently considered writing my own lexer by hand. It's not very hard; I just haven't had the time. This could be done with Unicode support in C. Also, flex apparently produces 8-bit clean output, and technically it should be possible to make it grok UTF-8, but I haven't tried it. As for bison, if the lexer spits out wchar tokens or whatever, I don't really see why it shouldn't be able to handle Unicode. Python can convert encoded C strings through its C API, so once the Python builder callbacks get invoked, Bob's your uncle and everything else should work. But it is neither a trivial project nor a small "few lines" patch.
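
To illustrate why an 8-bit clean, byte-oriented lexer could in principle accept UTF-8: 'ø' is encoded as the two bytes 0xC3 0xB8, both of which fall outside the ASCII range, so a pattern that lists only ASCII letters rejects the token, while one that also admits bytes 0x80 to 0xFF accepts it. The sketch below uses Python's `re` on bytes as a stand-in for flex. Both patterns are hypothetical and are NOT copied from Beancount's lexer.l.

```python
import re

# Hypothetical ASCII-only account pattern (an assumption, for illustration).
ASCII_ACCOUNT = re.compile(
    rb"[A-Z][A-Za-z0-9-]*(?::[A-Z][A-Za-z0-9-]*)+"
)

# Same shape, but any byte >= 0x80 (i.e. any UTF-8 lead or continuation
# byte) is also allowed after the first character of a component.
UTF8_ACCOUNT = re.compile(
    rb"[A-Z][A-Za-z0-9\x80-\xff-]*(?::[A-Z][A-Za-z0-9\x80-\xff-]*)+"
)


def matches_whole(pattern, account):
    """Return True if the UTF-8 encoding of `account` matches `pattern` entirely."""
    return pattern.fullmatch(account.encode("utf-8")) is not None


print("ø".encode("utf-8"))  # the two bytes an ASCII-only lexer stumbles on
for name in ("Kapital:OpstartsBalance", "Aktiver:NørresundbyBank:Nemkonto"):
    print(name, matches_whole(ASCII_ACCOUNT, name), matches_whole(UTF8_ACCOUNT, name))
```

This byte-level approach is crude: it would also accept invalid UTF-8 sequences, and it still requires each component to start with an ASCII capital, so a name beginning with 'Æ' would need further work.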

The code is under

beancount/src/python/beancount/parser/...

look at files:

lexer.l

lexer.py

grammar.y

parser.c

parser.py

I'll consider anything if you want to submit a patch for such a big change. Overall, my criteria for automatically including a large change to the parsing tech stack are: (1) C code only, no C++ or exotic languages; (2) absolutely minimal dependencies on third-party packages, all the better if the code it depends on is itself written in plain C; and (3) it runs on Linux and Mac OS X, or at least generates code that does. Also, it should be easy to write a second parser module in _parallel_ with the existing one (e.g. what gets compiled as beancount/parser/_parser.so) while reusing all the other bits of Python, the builder, etc. We could very easily have two or more parser implementations in the interest of transition and experimentation.
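
One way the Python side could pick between parallel parser implementations is to try importing them in order of preference and fall back to the existing one. This is only a sketch of the idea: `load_first_available` is a hypothetical helper, and the module name `beancount.parser._parser_unicode` in the comment is invented for illustration. Only `_parser` is mentioned in this thread.

```python
import importlib


def load_first_available(module_names):
    """Import and return the first module in `module_names` that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This implementation is not built or installed; try the next one.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")


# Hypothetical usage inside the parser package (names are assumptions):
#   _parser = load_first_available([
#       "beancount.parser._parser_unicode",   # experimental, invented name
#       "beancount.parser._parser",           # existing flex/bison extension
#   ])

# Demonstration with standard-library modules only.
chosen = load_first_available(["no_such_module_for_demo", "json"])
print(chosen.__name__)
```

As long as both modules expose the same entry points to the builder, the rest of the Python code would not need to know which one was loaded.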

Finally, my dev priorities at the moment: finish example & documentation, then implement all reports to text, a filtering expression syntax, implement the new inventory booking proposal (to support average cost booking and cost basis adjustments), and then all the other stuff comes after. So Unicode is farther down the line unfortunately. I hope you can be happy with romanization of those characters for a bit.

So in summary: this is not a trivial project. If you can wait, I'll handle it myself eventually (think: maybe within a year). If you can't wait, you're welcome to have a go at it and send some code.

Thank you,

Martin Andreas Andersen

Sep 5, 2014, 4:14:02 PM

to bean...@googlegroups.com, ledge...@googlegroups.com

Hi Martin,

That's... way out of my depth :/ Oh well, I can wait.

> Finally, my dev priorities at the moment: finish example & documentation, then implement all reports to text, a filtering expression syntax, implement the new inventory booking proposal (to support average cost booking and cost basis adjustments), and then all the other stuff comes after. So Unicode is farther down the line unfortunately. I hope you can be happy with romanization of those characters for a bit.

Sounds good :)

> So in summary: this is not a trivial project. If you can wait, I'll handle it myself eventually (think: maybe within a year). If you can't wait, you're welcome to have a go at it and send some code.

I think I'll go with "wait" then. Romanized account names are fine, just not very pretty.

-- Martin
