Additions to the CSV importer


kuba....@gmail.com

Feb 12, 2021, 12:00:36 PM
to Beancount
Hi all,

I'd like to unpack Dan's comment regarding additions to the CSV importer:

> Also, if anything, I expect the CSV importer in beangulp to become
> simpler, and not to gain new features. It was intended as an example to
> demonstrate how to write an importer and grew into something with a knob
> for every aspect, and it deviated from its original scope.

So is there a definite plan to remove functionality, or in fact the CSV importer itself, from beangulp moving forward?

The feature I would like to add is support for a currency column: I have some PayPal transactions in multiple currencies, and at the moment they all come out in my default currency.

But I would like to understand whether I should put my effort into the beangulp importer or whether it would make more sense to create my own CSV importer.
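
To make the currency-column request concrete, here is a minimal sketch using only the Python standard library. It is not the beangulp CSV importer's API: the column names (`Date`, `Name`, `Currency`, `Gross`), the sample data, and the `read_amounts` helper are all assumptions made for illustration.

```python
import csv
import io
from decimal import Decimal

# Hypothetical PayPal-style export; the column names are assumptions.
SAMPLE = """\
Date,Name,Currency,Gross
2021-02-01,Coffee Shop,GBP,-3.50
2021-02-03,Book Store,USD,-12.99
2021-02-05,Refund,,4.00
"""

def read_amounts(text, default_currency="GBP"):
    """Yield (date, payee, amount, currency) tuples from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        # Use the per-row currency; fall back to the default only
        # when the currency cell is missing or empty.
        currency = (row.get("Currency") or "").strip() or default_currency
        yield row["Date"], row["Name"], Decimal(row["Gross"]), currency

rows = list(read_amounts(SAMPLE))
```

Each row then carries its own currency, and only rows with an empty currency cell fall back to the default.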

Kind regards,
Jakub.


Martin Blais

Feb 12, 2021, 12:42:05 PM
to Beancount
It's unclear.
I built a CSV importer as an example - that was the original purpose - and then I started using it, and since it works, well, more people started using it, and now it's a thing and it has a bunch of options.
But it's a thing without unit tests. I'm not enthralled about supporting code without tests in my off time. I'm not even sure all the options are compatible with each other.

So in beangulp we should either
a) clearly mark it as an example and make it as lame as possible (so that nearly everyone would not want to use it), or
b) blanket it with unit tests specifically crafted for all the myriad options it has grown to accept over time, and commit to supporting it.

I think (b) would be favored, but it requires a moderate amount of work.

The ultimate CSV importer is something that could totally live within Beangulp, or even acquire its own repo (no strong preference there). I imagine a better CSV importer, one that automatically and intelligently sniffs out the semantics of columns in the majority of cases and produces a Beancount file without any configuration, but that is also explicitly configurable if desired. In fact, the sniffing code should be its own library, and it should also be reused for importing Google Sheets spreadsheets, which are incredibly useful for collaborating with people unfamiliar with Beancount (they can input postings there).
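
As a rough illustration of what "sniffing" column semantics could mean, the sketch below guesses a role for each column from its values. Everything in it is an assumption for illustration: the role names, the ISO date format, the three-letter currency pattern, and the `sniff_columns` function are not part of Beangulp or any existing library.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

def _is_date(value):
    # Assumes ISO dates; a real sniffer would try several formats.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def _is_amount(value):
    try:
        Decimal(value.replace(",", ""))  # tolerate thousands separators
        return True
    except InvalidOperation:
        return False

def _is_currency(value):
    return re.fullmatch(r"[A-Z]{3}", value) is not None

# Checked in order; the first predicate matching every non-empty cell wins.
CHECKS = [("date", _is_date), ("amount", _is_amount), ("currency", _is_currency)]

def sniff_columns(rows):
    """Guess a role for each column; fall back to 'text'."""
    roles = []
    for column in zip(*rows):
        cells = [c.strip() for c in column if c.strip()]
        role = "text"
        for name, check in CHECKS:
            if cells and all(check(c) for c in cells):
                role = name
                break
        roles.append(role)
    return roles
```

An explicit configuration could then simply override the guessed roles wherever the heuristics get a column wrong.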

I think the OFX example should eventually migrate outside or be deleted.

In any case, Beangulp will not grow to include various random importers; I tried that a long time ago (LedgerHub) and there wasn't enough interest. It's also not clear how much reuse there is between users. I think we could create a repo like that, with an incredibly liberal PR acceptance policy, but the problem is that the easiest testing requires real user files, which can't be shared, so the repo would end up implicitly broken by nature. So perhaps many individual importers living in various repos is best. It would be nice if we could create a unique tag so that they're easy to find, e.g., when you're looking for an existing importer for a particular bank, or a registry somewhere (a file in Beangulp? A Google Doc with a list?).





Daniele Nicolodi

Feb 12, 2021, 3:39:27 PM
to bean...@googlegroups.com
On 12/02/2021 18:41, Martin Blais wrote:
> On Fri, Feb 12, 2021 at 12:00 PM kuba....@gmail.com wrote:
>
> Hi all,
>
> I'd like to unpack the comment from Dan regarding additions to the
> CSV importer
>
> > Also, if anything, I expect the CSV importer in beangulp to become
> > simpler, and not to gain new features. It was intended as an example to
> > demonstrate how to write an importer and grew into something with a knob
> > for every aspect, and it deviated from its original scope.
>
> https://groups.google.com/g/beancount/c/YhBQEh7xVdk
>
> So is there a definite plan to remove functionality, or in fact the
> CSV importer from beangulp moving forward?
>
> The feature I would like to add is the support for a currency column
> as I have some transactions from PayPal that are in multiple
> currencies which at the moment all come out as my default currency.

I think it is a worthy addition.
As Martin wrote, there are no concrete plans. I am not very familiar
with the code, but the current scheme of things is hard to maintain and
to use: on the couple of occasions when I had to roll an importer for
CSV-style input files, I found it easier to write one from scratch using
the csv module than to understand the gazillions of options of the CSV
importer. I think it would be much better if the CSV importer became a
base class for people to build their importers upon via subclassing. In
this way the parameter space that would need to be tested could be
drastically reduced.
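
A minimal sketch of the subclassing idea, under stated assumptions: `CSVBase`, `PayPalCSV`, the `currency` hook, and the column names are hypothetical and do not reflect the current beangulp code. The point is only that a format-specific detail becomes an overridden method rather than one more constructor option.

```python
import csv
import io
from decimal import Decimal

class CSVBase:
    """Hypothetical base class: owns the parsing loop, nothing else."""

    date_column = "Date"
    amount_column = "Amount"
    default_currency = "USD"

    def currency(self, row):
        # Hook: subclasses override this instead of passing an option.
        return self.default_currency

    def read(self, text):
        for row in csv.DictReader(io.StringIO(text)):
            yield (
                row[self.date_column],
                Decimal(row[self.amount_column]),
                self.currency(row),
            )

class PayPalCSV(CSVBase):
    """Hypothetical subclass for a multi-currency export."""

    amount_column = "Gross"

    def currency(self, row):
        # Per-row currency, falling back to the default when empty.
        return row["Currency"] or self.default_currency
```

With this shape, the base class only has to be tested against its small set of hooks, and each subclass is tested against its own input format.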

There are many other things I would like to work on before this; thus,
unless someone else picks up the job, don't expect progress on this
any time soon.

Cheers,
Dan

kuba jamro

Feb 13, 2021, 2:12:36 PM
to bean...@googlegroups.com
On Fri, 12 Feb 2021 at 21:39, Daniele Nicolodi <dan...@grinta.net> wrote:

> I think it is a worthy addition.

I'm glad to hear it, so I'll knock something up (tested, of course) some evening soon.
From my perspective, it would be nice if there was at least one fully maintained importer to help people get started, and in my mind that's a coin toss between CSV and OFX.

If it were not for the CSV importer in the source tree, I would not have discovered the mixins, which now feature in most of my importers. It is an opportunity for the project to set out the standard of what an importer should look like, one that others can base their own on. It also helps those less fluent in Beancount, which has a bit of a learning curve, to see something tangible.

As progress seems to be going in the direction of splitting out source into repos specific to their responsibility, I would be pretty happy if the importer(s) had their own repository, and perhaps all the importer helpers were also in their own library repository.

Martin suggested exactly that, and it seems to be the most versatile option, as developers would only need to clone the repo if they actually needed it.
 


Daniele Nicolodi

Feb 13, 2021, 3:15:32 PM
to bean...@googlegroups.com
On 13/02/2021 20:12, kuba jamro wrote:
> From my perspective, it would be nice if there was at least one fully
> maintained importer to help people start and in my mind that's a coin
> toss between CSV and OFX.

I agree that it is nice to have an example importer; however, for this
to be effective in illustrating how to write an importer it should be as
simple as possible while being feature complete. The CSV importer has a
ton of code that has nothing to do with writing an importer but only
with jiggling CSV field values around.

> If it were not for the CSV importer in the source tree, I would not have
> discovered the Mixin's which now feature in most of my importers.

One thing Martin and I discussed briefly is to get rid of the mixins.
Mixins remind me of the old days when I was hacking on the Zope2
codebase, but they can be replaced by simpler paradigms (as witnessed by
what happened to most of the Zope2 codebase...)

Which Mixins do you find useful, and why?

Currently, an importer is a class that implements 6 methods (one of
which is optional). The use of the mixins, in my opinion, makes the code
more complex without much benefit.

> As progress seems to be going in the direction of splitting out source
> into repos specific for their responsibility, I would be pretty happy if
> the importer(s) had its own and perhaps all the importer helpers were
> also in their own library repository.

New repositories come with non-zero overhead in maintenance and
coordination, especially in a phase where we are redesigning the
interfaces. I would wait to see i

> Martin suggested exactly that and that seems to be the most versatile as
> developers would only need to clone the repo if they actually needed it.

As you cannot run an importer without beangulp, I don't think this would
save anyone anything, and the beangulp codebase is really tiny.

Cheers,
Dan

Martin Blais

Feb 13, 2021, 3:57:37 PM
to Beancount
On Sat, Feb 13, 2021 at 3:15 PM Daniele Nicolodi <dan...@grinta.net> wrote:

> One thing Martin and I discussed briefly is to get rid of the mixins.
> Mixins remind me of the old days when I was hacking on the Zope2
> codebase, but they can be replaced by simpler paradigms (as witnessed by
> what happened to most of the Zope2 codebase...)

Indeed. Mixins are opaque. I really don't like using them.
When you look at your class, it's not immediately obvious which implementation of which method is picked up and what it contains.
(Abstract inheritance is the only good kind of inheritance; the opacity that concrete inheritance brings is just never worth it in my experience.)
I'd much rather we provide a library, with one-liner invocations of it that you cut and paste into each importer.
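
To illustrate the contrast, here is a small sketch of an abstract interface combined with explicit library calls instead of mixins. The names `Importer`, `BankImporter`, and `matches_filename` are hypothetical and are not the Beangulp API; the two method names are used only as placeholders for "methods an importer must implement".

```python
from abc import ABC, abstractmethod

def matches_filename(filepath, suffix):
    """Hypothetical library helper: the logic is visible at the call site."""
    return filepath.lower().endswith(suffix)

class Importer(ABC):
    """Abstract interface: declares methods but implements none."""

    @abstractmethod
    def identify(self, filepath):
        ...

    @abstractmethod
    def extract(self, filepath):
        ...

class BankImporter(Importer):
    def identify(self, filepath):
        # One-liner library call; no inherited behavior to hunt for.
        return matches_filename(filepath, ".csv")

    def extract(self, filepath):
        return []  # parsing omitted in this sketch
```

Reading `BankImporter` alone shows which code runs for each method, which is the transparency a mixin-based class lacks.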
 

