A quick word about CSV importers


Martin Blais

Dec 5, 2021, 4:20:21 PM
to Beancount
Hi!

It's been a while since I've done much, but a few weekends ago I rewrote all my CSV importers.
I had new changes to adapt my code to, and I was also behind on catching up with updates in beangulp.
Some nice experience came out of it.

For a long time I had been unhappy with the object-oriented mixins and the CSV importer that are in beangulp.
Looking around for which file provided which implementation was always a bit annoying.
It's a lot simpler to have a single protocol (beangulp.Importer) with all abstract methods and just implementations of that (no inheritance of functionality).
In fact, even if I have to duplicate some code in the implementation, I'm still happier with the result that way.
The simplicity is worth the repetition, and having all the code locally visible in a single file is an advantage, especially since this is the kind of work you end up doing reluctantly (when I'm doing accounting imports, the last thing I want is to have to hack on code to adapt to changed file formats; the easier I can make it, the better).
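
A minimal sketch of that shape, using the standard library only. The `Importer` class below is a simplified, hypothetical stand-in for beangulp.Importer (the real interface has its own method set and signatures), and `ExampleBankImporter`, the file naming, and the column names are made up for illustration. The point is only that the base class carries no behavior, so everything the importer does is visible in one class:

```python
import abc
import csv
import datetime
import os
from decimal import Decimal


class Importer(abc.ABC):
    """Hypothetical stand-in for beangulp.Importer: abstract methods only."""

    @abc.abstractmethod
    def identify(self, filepath: str) -> bool:
        """Return True if this importer can handle the file."""

    @abc.abstractmethod
    def account(self, filepath: str) -> str:
        """Return the account the file belongs to."""

    @abc.abstractmethod
    def extract(self, filepath: str) -> list:
        """Return the rows parsed from the file."""


class ExampleBankImporter(Importer):
    """No mixins: every step of the import is spelled out here."""

    def __init__(self, account_name: str):
        self.account_name = account_name

    def identify(self, filepath: str) -> bool:
        # Assumption: this bank's downloads are named "examplebank*.csv".
        name = os.path.basename(filepath).lower()
        return name.startswith("examplebank") and name.endswith(".csv")

    def account(self, filepath: str) -> str:
        return self.account_name

    def extract(self, filepath: str) -> list:
        # Assumption: columns are Date (MM/DD/YYYY), Description, Amount.
        with open(filepath, newline="") as infile:
            return [
                (
                    datetime.datetime.strptime(row["Date"], "%m/%d/%Y").date(),
                    row["Description"],
                    Decimal(row["Amount"]),
                )
                for row in csv.DictReader(infile)
            ]
```

If the bank changes its file format, the fix lands in `extract` and nowhere else.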

As it turns out, a heavily configurable CSV importer is not best served by a class + config abstraction. It's a lot simpler to read and massage the input table with "petl" to convert the types (dates and numbers, mostly), normalize the column names and then call a generic little helper function to construct Transaction instances. For many of my simple CSVs, I've been using this extremely simple helper:
and these parser functions:
The petl code really is as simple as, and much more powerful than, a custom configuration that attempts to support all variations and think ahead about all the possibilities.
This is the key: that code *is* the transformation configuration, and the petl API is quite elegant and minimal in that way.
(If you're interested in more involved usage of petl you can look here: https://github.com/beancount/johnny/tree/master/johnny/sources)
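
To illustrate the "code is the configuration" idea, here is a rough sketch. It is not the helper or the parser functions referred to above: it deliberately avoids petl so that it runs on the standard library alone, and `Txn`, `parse_date`, `parse_amount`, `read_table`, `make_transactions`, and the column names are all hypothetical. With petl, the renaming and type conversion in `read_table` would be expressed as table transformations instead of a loop.

```python
import csv
import datetime
import io
from decimal import Decimal
from typing import NamedTuple


class Txn(NamedTuple):
    """Hypothetical stand-in for a Beancount Transaction."""
    date: datetime.date
    narration: str
    account: str
    amount: Decimal
    currency: str


def parse_date(text):
    # Assumption: the bank writes dates as MM/DD/YYYY.
    return datetime.datetime.strptime(text.strip(), "%m/%d/%Y").date()


def parse_amount(text):
    # Drop "$" and thousands separators; "(4.50)" means a negative amount.
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return Decimal(cleaned)


def read_table(infile):
    """Per-format code: normalize column names and convert types."""
    for row in csv.DictReader(infile):
        yield {
            "date": parse_date(row["Posting Date"]),
            "narration": row["Description"].strip(),
            "amount": parse_amount(row["Amount"]),
        }


def make_transactions(rows, account, currency):
    """Generic helper: expects normalized 'date', 'narration', 'amount' keys."""
    return [
        Txn(row["date"], row["narration"], account, row["amount"], currency)
        for row in rows
    ]


SAMPLE = """Posting Date,Description,Amount
12/01/2021,COFFEE SHOP,($4.50)
12/03/2021,PAYROLL,"$1,200.00"
"""

transactions = make_transactions(
    read_table(io.StringIO(SAMPLE)), "Assets:Bank:Checking", "USD"
)
for txn in transactions:
    print(txn)
```

Supporting another bank means writing another small `read_table`; the generic helper stays untouched, and there is no configuration schema to extend.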

Here's an example of such a CSV importer using petl (but not the helper above, this one creates transactions for groups of rows with the same id):
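
A minimal sketch of the grouping step alone, not the linked importer. It assumes rows that are already parsed into dicts, uses hypothetical column names (`id`, `account`, `amount`, `currency`) and made-up sample data, and relies on `itertools.groupby` from the standard library rather than petl:

```python
import itertools
from decimal import Decimal


def group_into_transactions(rows):
    """Build one transaction (here, a plain dict) per distinct row id."""
    def by_id(row):
        return row["id"]

    transactions = []
    # groupby only merges adjacent rows, so sort by the key first.
    # sorted() is stable, so rows keep their file order within each group.
    for txn_id, group in itertools.groupby(sorted(rows, key=by_id), key=by_id):
        postings = [(row["account"], row["amount"], row["currency"]) for row in group]
        transactions.append({"id": txn_id, "postings": postings})
    return transactions


# Made-up sample rows: two rows share id "T1" and one row has id "T2".
ROWS = [
    {"id": "T1", "account": "Assets:Exchange", "amount": Decimal("-100.00"), "currency": "USD"},
    {"id": "T2", "account": "Assets:Exchange", "amount": Decimal("-0.01"), "currency": "BTC"},
    {"id": "T1", "account": "Expenses:Fees", "amount": Decimal("0.50"), "currency": "USD"},
]

grouped = group_into_transactions(ROWS)
for txn in grouped:
    print(txn["id"], txn["postings"])
```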

What I ended up with is so much easier to work with when debugging is needed that I'm tempted to declare the CSV importer implementation that's in beangulp deprecated.
I have no intention of adding to that functionality going forward.
I think we should probably even delete it and the mixins in the next release. I have a feeling nobody's been using them anyway (nobody ever asked questions about them; I was probably alone in using them), and it's less code to maintain. If you rely on them, say something.
We could add a tag for the last version with them available.

Any thoughts?

Ben Blount

Dec 5, 2021, 4:40:57 PM
to Beancount
Sounds great, and aligned with the ethos of v3 splitting and generalizing.


Alan H

Dec 7, 2021, 4:27:39 PM
to Beancount
I'm in agreement too, having recently suffered exactly this issue: remembering how I implemented the importers and fixing some dates (and wishing I had written unit tests).

Martin, is beanbuff a private repo? I don't see the example CSV importer on GitHub.

Alan

Martin Blais

Dec 7, 2021, 4:43:28 PM
to Beancount
On Tue, Dec 7, 2021 at 4:27 PM Alan H <al...@polyphase.ca> wrote:
> I'm in agreement too, having recently suffered exactly this issue: remembering how I implemented the importers and fixing some dates (and wishing I had written unit tests).
>
> Martin, is beanbuff a private repo? I don't see the example CSV importer on GitHub.

Oops, you're right. I think I took it private a while ago because the work I was doing on Johnny (https://github.com/beancount/johnny) was obsoleting large chunks of beanbuff.
I made it public again, and I'll update the README instead. I have to update the importers in there that overlap with Johnny (which has much more sophisticated imports for those).


 

Reed Law

Dec 24, 2021, 8:11:37 AM
to Beancount
I was not able to get this importer (or v3) to run, so I wrote a Coinbase Pro importer for v2: https://github.com/reedlaw/beancount_coinbase_pro

One thing I found is that the transaction logs don't record the network fees for withdrawing cryptocurrencies. You either have to separately import the wallet transactions or manually copy the fees from the Coinbase Pro "Withdrawal" tab for each token.

Martin Blais

Dec 24, 2021, 1:30:02 PM
to Beancount
I used 
  (User) > Statements > Statements Tab > Generate > Account
but I didn't realize there were separate network fees.
How do you find them in CB pro?
The fees I see appear to match the "Fee" lines in the All Activity tab.

I have to say two things:
- the fees are crazy high, which doesn't bode well for this idea of a future where crypto is used for payments. I'm new to this as of this year, and I was stunned by the size of the fees when I first looked at them.
- the data coming out of Coinbase is not great. Why can't they report the price of transactions? That wouldn't be hard. You basically have to download the data and compute the price yourself. I have no idea why they do that.





Reed Law

Dec 29, 2021, 8:54:47 AM
to Beancount
In Coinbase Pro under the Portfolio > Withdrawals tab you can expand each line item to see the fee, subtotal, etc. The Account statement doesn't include a Fee column so there's no way to export the fees for withdrawals. The Fills statement does show fees for trades.

There are a number of solutions to high network fees. The currently viable option is to bridge tokens to alternative chains such as Avalanche, Polygon, Solana, etc. When Ethereum finally transitions to Proof-of-Stake there are supposed to be improvements to the scalability and high fee issues.