A couple questions about importers


Patrick Ruckstuhl

Feb 3, 2018, 12:21:14 PM
to bean...@googlegroups.com
Hi,

I'm looking at some specific cases for importers and wondering what the
best way to tackle this is:


- API-based import

For some transactions there's a REST API I can use to fetch the
transactions.

So there is no file to use with the bean-extract framework.

I have written a custom script, which works, but I would like to
unify the different importers if possible.


- Using prices in imports

For some imports I would like to enhance the transactions with prices
based on the current/daily price. I'm already fetching and storing prices
in beancount, so they are available in the beancount file, but I'm not
sure what the best way to hook this into the importer framework is.


How do you deal with things like that? Do you ignore bean-extract tools
and just create independent scripts?


Regards,

Patrick


Patrick Ruckstuhl

Feb 3, 2018, 12:58:02 PM
to Beancount
Just got an idea for the API import (I always get those ideas right after asking about it on a mailing list): I can just create a dummy file for the importer. This file could even be the config, with the API key and so on, for doing the import.
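
A minimal sketch of that dummy-file idea, under some loud assumptions: the class below is only duck-typed after the importer interface (in real use it would presumably subclass beancount.ingest.importer.ImporterProtocol), the file name "myapi.config.json", the config key "api_key" and the fetch callable are all made up for illustration, and the REST call is replaced by a canned function so nothing touches the network.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class ApiConfigImporter:
    """Importer-shaped sketch driven by a dummy config file (not real beancount code)."""

    def __init__(self, account, fetch):
        self.account = account
        self.fetch = fetch  # hypothetical callable: config dict -> list of rows

    def identify(self, file):
        # Claim only the dummy file, recognised here by a made-up file name.
        return os.path.basename(file.name) == "myapi.config.json"

    def file_account(self, file):
        return self.account

    def extract(self, file, existing_entries=None):
        # The dummy file doubles as configuration (API key and so on).
        with open(file.name) as handle:
            config = json.load(handle)
        # A real importer would build beancount directives here instead of dicts.
        return [
            {
                "date": row["date"],
                "narration": row["text"],
                "account": self.account,
                "amount": row["amount"],
            }
            for row in self.fetch(config)
        ]


def fake_fetch(config):
    """Stand-in for the REST call; returns canned rows."""
    assert config["api_key"]
    return [{"date": "2018-02-03", "text": "Coffee", "amount": "-3.50 CHF"}]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myapi.config.json")
        with open(path, "w") as handle:
            json.dump({"api_key": "secret"}, handle)
        file = SimpleNamespace(name=path)  # mimics the .name of the file object
        importer = ApiConfigImporter("Assets:MyApi", fake_fetch)
        return importer.identify(file), importer.extract(file)
```

Because the config file never changes, it can stay in the download folder and be picked up on every run alongside the real downloads.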

Martin Blais

Feb 3, 2018, 2:47:18 PM
to Beancount
On Sat, Feb 3, 2018 at 12:21 PM, 'Patrick Ruckstuhl' via Beancount <bean...@googlegroups.com> wrote:
> Hi,
>
> I'm looking at some specific cases for importers and wondering what the
> best way to tackle this is:
>
> -API based import
>
> For some transactions there's a REST api I can use to fetch the
> transactions.
>
> So there is no file to use with the bean-extract framework.
>
> I currently created a custom script which works but I would like to
> unify the different importers if possible.

The original design doc for the importers (http://furius.ca/beancount/doc/ingest-design-doc) contains a "fetch" stage that was intended to support automating the download of the files. However, given the difficulty of doing that across several financial institutions (e.g., scraping modern websites is often made really difficult by fancy libraries and web UI toolkits), I gave up on that step. There is also very little common code that I could provide to meaningfully help with that. (And OFX doesn't deliver widely enough IMO.)

I submit that you should write a separate script to automate the download and then run bean-extract on that.
(At least you won't have to log in manually.)


> - Using prices in imports
>
> For some imports I would like to enhance the transactions with prices
> based on current/daily price, I'm currently fetching and storing prices
> in beancount, so prices are available in the beancount file but I'm not
> sure what the best way to hook this into the importer framework is

Fetching prices, OTOH, /is/ intended to be automated.
(Note that we're in a funny situation right now, with both the Yahoo and Google Finance APIs disabled.)

Importing and price fetching are two separate processes at the moment; run one, then the other.
Concatenate the output to a file if you want to.


> How do you deal with things like that? Do you ignore bean-extract tools
> and just create independent scripts?

Yep.
In any case you'll end up with a patchwork of scripts for part of the job, I just don't believe there's any way around it.
You're welcome to try your hand at creating reusable code to do this, but IMO it's all over the place and would be very difficult.

Another idea would be to reuse other people's work in automating fetching and converting to another common data format, e.g. Yodlee (costs money) or convert from a Mint download.


 

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/29e70aca-b985-5291-38fc-93c2d80956d4%40ch.tario.org.
For more options, visit https://groups.google.com/d/optout.

Martin Blais

Feb 3, 2018, 2:48:27 PM
to Beancount
On Sat, Feb 3, 2018 at 12:58 PM, 'Patrick Ruckstuhl' via Beancount <bean...@googlegroups.com> wrote:
> Just got an idea for the api import (always getting those ideas after asking about it on a mailing list).

Writing is thinking.
That's why if you do a PhD you should start writing the thesis early and not at the end...



 
> I just create a dummy file for the importer, this file can actually even be the config with api key and so on for doing the import

It's just Python all the way down indeed

 



Patrick Ruckstuhl

Feb 3, 2018, 3:01:27 PM
to bean...@googlegroups.com
In my case there isn't a file at all but, like I wrote, that will work fine with a dummy/config file that always stays in the download folder.

>> - Using prices in imports
>>
>> For some imports I would like to enhance the transactions with prices
>> based on current/daily price, I'm currently fetching and storing prices
>> in beancount, so prices are available in the beancount file but I'm not
>> sure what the best way to hook this into the importer framework is
>
> Fetching prices automatically /is/ OTOH intended to be automated.
> (Note that we're in a funny situation right now with both Yahoo and Google
> Finance APIs disabled.)
>
> These are two separate processes at the moment; run one, then the other.
> Concatenate to a file if you want to.

That's what I'm doing. What I'm looking for is how to access the prices from the beancount file in the importer. Do I have to parse/load the beancount file on my own?

>> How do you deal with things like that? Do you ignore bean-extract tools
>> and just create independent scripts?
>
> Yep.
> In any case you'll end up with a patchwork of scripts for part of the job,
> I just don't believe there's any way around it.
> You're welcome to try your hand at creating reusable code to do this, but
> IMO it's all over the place and would be very difficult.
>
> Another idea would be to reuse other people's work in automating fetching
> and converting to another common data format, e.g. Yodlee (costs money) or
> convert from a Mint download.

Yeah, I see that. The concept of a common importer API is what makes sense to me, as it is then usable from fava and avoids having to reimplement things like duplicate detection and automatic discovery of counter accounts:
https://github.com/johannesjh/smart_importer

Martin Blais

Feb 3, 2018, 3:08:48 PM
to Beancount
Oh I see, the question is that you'd like to download and trigger the importer from bean-extract.
(Sorry I hadn't understood initially.)
Hmmm, it's not really designed to do that indeed.
For that, you could just write a script that outputs new directives and call some of the functions of the ingest library.
Or even use two scripts: one for download which writes to a file in a trivial format (e.g. a pickle), and an importer that reads that.
The advantage of the latter solution is that you could invoke extract once and it will process that file along with all the other "real" downloads.
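
The two-script variant could be sketched roughly as follows. This is an illustration only, not code from the ingest library: the ".myapi.pickle" suffix, the row layout and the fetch callable are hypothetical, and the importer is duck-typed rather than derived from beancount.ingest.importer.ImporterProtocol so that the sketch runs on its own.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def download(fetch, out_path):
    """Script 1: fetch rows from the API and dump them in a trivial format."""
    with open(out_path, "wb") as handle:
        pickle.dump(fetch(), handle)


class PickleImporter:
    """Script 2: importer-shaped class that reads the dumped rows back."""

    SUFFIX = ".myapi.pickle"  # hypothetical naming convention

    def identify(self, file):
        return file.name.endswith(self.SUFFIX)

    def extract(self, file, existing_entries=None):
        # Only unpickle files written by your own download script.
        with open(file.name, "rb") as handle:
            rows = pickle.load(handle)
        # A real importer would convert each row into a beancount directive.
        return [(row["date"], row["text"], row["amount"]) for row in rows]


def demo():
    rows = [{"date": "2018-02-03", "text": "Coffee", "amount": "-3.50 CHF"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "2018-02-03.myapi.pickle")
        download(lambda: rows, path)  # canned data instead of a REST call
        file = SimpleNamespace(name=path)  # mimics the .name of the file object
        importer = PickleImporter()
        return importer.identify(file), importer.extract(file)
```

The download step stays a standalone script, while the importer only ever sees a local file, like every other download.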



> That's what I'm doing. What I'm looking for is how to access the prices from the beancount file in the importer. Do I have to parse/load the beancount file on my own?

Yes.
You'd call beancount.loader.load_file() on your existing file, and then build a price_map dict.
Grep for "price_map" in the source code, you'll find several examples of doing that.
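
A rough sketch of what that could look like. load_price_map() uses beancount.loader.load_file() together with beancount.core.prices.build_price_map(), assuming the Beancount v2 layout; check it against your installed version. The price_on() helper is a hypothetical, dependency-free lookup over the date-sorted (date, number) pairs stored under each (base, quote) key, and the HOOL/USD sample data is invented. The prices module also offers get_price() for this kind of lookup.

```python
import bisect
import datetime
from decimal import Decimal


def load_price_map(ledger_path):
    """Load the existing ledger and index its Price directives.

    Needs beancount installed; imported lazily so the lookup helper below
    can be tried without it.
    """
    from beancount import loader
    from beancount.core import prices

    entries, errors, options_map = loader.load_file(ledger_path)
    return prices.build_price_map(entries)


def price_on(points, date):
    """Return the latest (date, number) pair at or before `date`, or None.

    `points` is a date-sorted list of (date, number) pairs.
    """
    index = bisect.bisect_right([d for d, _ in points], date)
    return points[index - 1] if index else None


# Stand-in for price_map[("HOOL", "USD")], so the lookup runs without a ledger.
points = [
    (datetime.date(2018, 1, 31), Decimal("100.00")),
    (datetime.date(2018, 2, 2), Decimal("104.50")),
]
found = price_on(points, datetime.date(2018, 2, 3))
missing = price_on(points, datetime.date(2018, 1, 1))
```

An importer could call load_price_map() once in its constructor and then look up a price per transaction date while building entries.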


 

> Yeah, I see that. The concept of a common importer API is what makes sense to me, as it is then usable from fava and avoids having to reimplement things like duplicate detection and automatic discovery of counter accounts:
> https://github.com/johannesjh/smart_importer

SGTM


 


Patrick Ruckstuhl

Feb 4, 2018, 7:23:22 AM
to Beancount
>> That's what I'm doing. What I'm looking for is how to access the prices from the beancount file in the importer. Do I have to parse/load the beancount file on my own?
>
> Yes.
> You'd call beancount.loader.load_file() on your existing file, and then build a price_map dict.
> Grep for "price_map" in the source code, you'll find several examples of doing that.


I'm wondering if it would make sense to slightly enhance the ImporterProtocol. Right now bean-extract already has the ability to parse an existing beancount file and use it for duplicate detection.
Now, if those entries could be forwarded to the importer, it would open up some use cases such as:

* custom duplicate logic (e.g. let's say my import file has a unique identifier which I map to metadata on the transaction; if I now get the existing entries, I can make sure to only import new transactions and either completely ignore duplicates or tag them with the __duplicate__ meta)

* my use case, where I need data (e.g. prices) from the existing beancount file to enhance the new entries

I think all that would be needed is to add existing_entries to the extract method:

importer.extract(file, existing_entries)


That way no additional parsing of the beancount file is needed, and there is a clear way to define which file to parse (e.g. the same way whether it is called from fava or from bean-extract).
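
To make the first use case concrete, here is a hedged sketch of importer-side duplicate detection built on the proposed extract(file, existing_entries) call. The "import_id" metadata key is a hypothetical name, entries are modelled as simple objects with a .meta dict rather than real beancount directives, and the rows stand in for a parsed input file.

```python
from types import SimpleNamespace

DUPLICATE_META = "__duplicate__"


def known_ids(existing_entries, key="import_id"):
    """Collect the unique ids already recorded as metadata in the ledger."""
    return {
        entry.meta[key]
        for entry in existing_entries or []
        if getattr(entry, "meta", None) and key in entry.meta
    }


class CsvLikeImporter:
    """Importer-shaped sketch; `rows` stands in for the parsed input file."""

    def __init__(self, rows):
        self.rows = rows

    def extract(self, file, existing_entries=None):
        seen = known_ids(existing_entries)
        new_entries = []
        for row in self.rows:
            meta = {"import_id": row["id"]}
            if row["id"] in seen:
                meta[DUPLICATE_META] = True  # flag instead of dropping
            new_entries.append(SimpleNamespace(meta=meta, narration=row["text"]))
        return new_entries


existing = [SimpleNamespace(meta={"import_id": "tx-1"}, narration="Coffee")]
importer = CsvLikeImporter([
    {"id": "tx-1", "text": "Coffee"},
    {"id": "tx-2", "text": "Lunch"},
])
result = importer.extract(file=None, existing_entries=existing)
flags = [entry.meta.get(DUPLICATE_META, False) for entry in result]
```

Flagging rather than dropping keeps the already-imported transaction visible for context; skipping it entirely would be a one-line change.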

Martin Blais

Feb 12, 2018, 12:32:53 AM
to Beancount
That's an interesting idea. It's an easy change to make. 

+ Note that if we add the entries to the extractor, it opens up the possibility for the particulars of the extractor to depend on particulars of previously imported transactions. To paraphrase your example, if an extractor knows that its input file contains a unique transaction id column and it consistently attaches that as a "link" on the transaction, it can then use that fact to very reliably flag transactions as duplicates in the future by inspecting the link field of those transactions (assuming the user hasn't removed them in the text). That may be a good thing, because that kind of check may NOT be generalizable across different importers, unless we'd establish some sort of guarantee that some links represent globally unique identifiers. In a sense, the current method for flagging duplicates assumes that a general method for detecting duplicates - after fiddling and manual adjustments by the user - exists.

- On the downside: withholding the previous entries keeps the duplicate detection method decoupled from the importer logic, which forces the duplicate logic to remain generic. Giving the importer access to the prior directives creates a logical dependency between it and the duplicate detection. I'm not sure we have to worry about that.

Given that the duplicate logic has been iffy ever since it existed, I think it's a reasonable thing to try. Let's do it and see what happens, if people start relying on it. To be fair, I think more work could simply be done on the duplicate logic to make it more resilient, but in the interest of flexibility, let's add this.

So here's the change:
- The Importer.extract() method now accepts a new parameter with the prior entries (or None, if not specified). It's free to use that as it pleases.
- The entries returned by Importer.extract() will be checked for __duplicate__ metadata and automatically added to the set of duplicates if it is present. This allows the importer to return some duplicate entries for context - which will be rendered as such in the output, e.g. commented out - without necessarily having to throw them away.
- Current importer parameters are still supported as legacy (I really didn't want to break everyone's importers with this API change, so I inspect the signature).
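
For illustration only (this is not the actual change), the three points above might be approximated like this: call_extract() inspects the signature to decide whether an importer takes the prior entries, and split_duplicates() separates entries the importer flagged itself. Both helper names are made up, and entries are plain dicts here rather than beancount directives.

```python
import inspect


def call_extract(importer, file, existing_entries=None):
    """Call importer.extract(), passing prior entries only if it accepts them."""
    params = inspect.signature(importer.extract).parameters
    if len(params) > 1:
        return importer.extract(file, existing_entries)
    return importer.extract(file)  # legacy one-argument signature


def split_duplicates(entries):
    """Separate entries the importer itself flagged via __duplicate__ metadata."""
    fresh, duplicates = [], []
    for entry in entries:
        target = duplicates if entry["meta"].get("__duplicate__") else fresh
        target.append(entry)
    return fresh, duplicates


class LegacyImporter:
    def extract(self, file):
        return [{"meta": {}, "narration": "legacy"}]


class NewImporter:
    def extract(self, file, existing_entries=None):
        flagged = bool(existing_entries)
        return [{"meta": {"__duplicate__": flagged}, "narration": "new"}]


legacy_out = call_extract(LegacyImporter(), "file.csv", ["prior"])
fresh, duplicates = split_duplicates(
    call_extract(NewImporter(), "file.csv", ["prior"])
)
```

Counting parameters on the bound method is the simplest possible check; matching on the parameter name would be stricter.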

Here:
