Please help me start with a CSV importer


viq

Apr 7, 2018, 10:27:29 AM
to Beancount
First, apologies: I'm very new to beancount and only starting with Python. Right now I'm just trying to get a basic import working, and will then try to enrich it. I've looked through the documentation and mailing lists, and this is as far as I managed to get.

"config" file test.config.py:
#!/usr/bin/env python3


'''
Configuration file for extracting MyBank data
'''

import sys
from os import path

# from beancount.ingest import extract
# from beancount.ingest.importers import csv

# from .importers import mybank
import importers.mybank

sys.path.insert(0, path.join(path.dirname(__file__)))

# from beancount.plugins import auto_accounts


CONFIG = [importers.mybank.Importer('Assets:Currency:MyBank')]


importer importers/mybank/__init__.py:
#!/usr/bin/env python3

'''
Configuration file for extracting MyBank data
'''

from beancount.ingest import regression
from beancount.ingest.importers import csv

from beancount.plugins import auto_accounts


class Importer(csv.Importer):

    config = {csv.Col.DATE: 'Posted Date',
              csv.Col.TXN_DATE: 'Transaction Date',
              csv.Col.NARRATION1: 'Description',
              csv.Col.NARRATION2: 'Payee',
              csv.Col.NARRATION3: 'Payee account',
              csv.Col.AMOUNT: 'Amount',
              csv.Col.BALANCE: 'Balance'}

    def __init__(self, account):
        csv.Importer.__init__(
            self, self.config,
            account, 'Currency',
            ('Transacton Date,Posted Date,Description,Payee,'
             'Payee account,Amount,Balance'),
            1)

    def get_description(self, row):
        payee, narration = super().get_description()
        narration = '{} ({})'.format(narration, row.category)
        return payee, narration


Yes, importers/__init__.py exists and is empty

Running bean-identify with this config (from the directory that contains the importers directory) fails:
$ bean-identify test.config.py bank_history_sample.csv 
Traceback (most recent call last):
  File "/usr/bin/bean-identify", line 4, in <module>
    from beancount.ingest.identify import main; main()
  File "/usr/lib/python3.6/site-packages/beancount/ingest/identify.py", line 93, in main
    _, config, downloads_directories = scripts_utils.parse_arguments(parser)
  File "/usr/lib/python3.6/site-packages/beancount/ingest/scripts_utils.py", line 56, in parse_arguments
    mod = runpy.run_path(args.config)
  File "/usr/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "test.config.py", line 15, in <module>
    import importers.mybank
ModuleNotFoundError: No module named 'importers'


What am I doing wrong and how should I proceed from here?

Michael Droogleever

Apr 7, 2018, 7:55:42 PM
to Beancount
I think it is this, but I have not tried running your code:
 
sys.path.insert(0, path.join(path.dirname(__file__))) 

is what adds the path to your importers package, to the system path. So the

import importers.mybank

Needs to be after it. It is admittedly not correct python style, as imports should always come first, but it is a workaround beancount uses to be able to read your custom code.

Good luck.
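
The ordering issue can be reproduced without beancount. Below is a minimal sketch, assuming a throwaway package named `demo_importers` (a hypothetical stand-in for `importers`) created in a temporary directory. The import fails until that directory has been added to `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: <tmp>/demo_importers/mybank/__init__.py
# (demo_importers is a hypothetical stand-in for the "importers" package).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_importers" / "mybank"
pkg.mkdir(parents=True)
(root / "demo_importers" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("NAME = 'mybank'\n")

# Before the directory is on sys.path, the import fails.
try:
    importlib.import_module("demo_importers.mybank")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# After inserting the directory, the same import succeeds.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
mybank = importlib.import_module("demo_importers.mybank")

print(found_before, mybank.NAME)
```

For test.config.py this would mean moving the `sys.path.insert(...)` line above `import importers.mybank`.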

viq

Apr 8, 2018, 3:17:51 AM
to Beancount


Indeed, with that change it doesn't complain about imports now, thank you! Apparently the same can be achieved by setting PYTHONPATH.

Now to figure out why nothing happens when I run bean-extract 

Martin Blais

Apr 8, 2018, 4:46:30 AM
to Beancount
Yes.

I think we talked about adding the importer file's location automatically to PYTHONPATH, here:
Not done yet

 

> Now to figure out why nothing happens when I run bean-extract

Run bean-identify first. If it doesn't list the file, it hasn't identified it as something the importer can extract from.

It uses the regexps you provide to match against the contents of the CSV file.
I see a typo in your code, which probably explains why it doesn't match it:
 
Transacton Date
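
To see why a one-letter typo matters, here is a minimal sketch using only Python's `re` module. It assumes the header string is treated as a plain regular expression and searched against the file contents. The flags and the sample file text are assumptions made for illustration, not the real bank export.

```python
import re

# Made-up file contents for illustration; not the real bank export.
contents = (
    "Transaction Date,Posted Date,Description,Payee,Payee account,Amount,Balance\n"
    "2018-04-01,2018-04-02,Coffee,Cafe,123,-3.50,100.00\n"
)

# The header as written in the importer (with the typo) and the corrected one.
typo_pattern = ("Transacton Date,Posted Date,Description,Payee,"
                "Payee account,Amount,Balance")
fixed_pattern = typo_pattern.replace("Transacton", "Transaction")

# Assumed flags; case is ignored, but a missing letter still breaks the match.
flags = re.IGNORECASE | re.MULTILINE | re.DOTALL
typo_matches = re.search(typo_pattern, contents, flags) is not None
fixed_matches = re.search(fixed_pattern, contents, flags) is not None

print(typo_matches, fixed_matches)
```

Because the pattern is matched literally, any difference in spelling, spacing or delimiters between the pattern and the file's header is enough to prevent identification.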

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/32737949-89a6-4fb2-80ed-ba8189931e6e%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

viq

Apr 8, 2018, 9:29:41 AM
to Beancount
Thanks, fixed, didn't seem to help though. bean-identify seems to work fine:
$ bean-identify test.config.py bank_history_sample.csv
**** /home/viq/Work/Own/beancount/bank_history_sample.csv
yet bean-extract does nothing:
$ bean-extract test.config.py bank_history_sample.csv
;; -*- mode: beancount -*-
**** /home/viq/Work/Own/beancount/bank_history_sample.csv

viq

Apr 8, 2018, 9:39:02 AM
to Beancount
Although pylint complains here:
pylint: [no-member] Super of 'Importer' has no 'get_description' member [E1101] (E)
but I don't know what to make of it.

Martin Blais

Apr 8, 2018, 1:40:50 PM
to Beancount
- bean-identify saw your file.
- It didn't match any of the configured importers: there's no section printed underneath it. If it had matched, it would print the importer and the deduced account name, date and to-be-renamed location.
It's something to do with your regexp, which probably still doesn't match the header of your file.




viq

Apr 8, 2018, 4:46:12 PM
to Beancount
Sorry, where does the regexp come from (since what I'm putting in the "config file" doesn't look like regexp at all), and what is it to match and where?

Martin Blais

Apr 8, 2018, 5:02:14 PM
to Beancount
The CSV importer that is provided puts together a regexp from the list of fields you provide.
This regexp is run against the contents of the files it is scanning, in order to identify whether a particular importer should be able to process a file.

For this importer, I think the regexp is built in the source code from the fields you provide.
You'll have to open up the source code to debug this, but I suspect your list of fields does not exactly match the one in your files.
(There ought to be better tools for debugging this, but because setting up the import config usually requires coding, people just override methods and put prints and debug it that way.)
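
One way to do that kind of print debugging outside the importer is to compare the expected header against the file's actual first line. This is a minimal sketch, where `diagnose_header` is a hypothetical helper (not part of beancount) and the quoted-field sample line is made up:

```python
import re

EXPECTED = ("Transaction Date,Posted Date,Description,Payee,"
            "Payee account,Amount,Balance")


def diagnose_header(first_line, expected=EXPECTED):
    """Return a list of likely reasons why `expected` does not match."""
    problems = []
    if re.search(expected, first_line, re.IGNORECASE):
        return problems  # header matches, nothing to report
    if '"' in first_line:
        problems.append("fields are quoted")
    if ";" in first_line and "," not in first_line:
        problems.append("delimiter is ';' rather than ','")
    if not problems:
        # No known cause found; show the raw line for manual comparison.
        problems.append("column names differ: %r" % first_line)
    return problems


# Made-up header line with quoted fields, for illustration only.
sample = '"Transaction Date","Posted Date","Description"'
print(diagnose_header(sample))
```

With a real file, the first line could be read with something like `open(path).readline()` and passed to the helper; the checks listed here are only a few plausible causes, not an exhaustive list.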






viq

Apr 22, 2018, 3:46:17 PM
to bean...@googlegroups.com
On 18-04-08 17:01:48, Martin Blais wrote:
> The CSV importer that is provided puts together a regexp from the list of
> fields you provide.
> This regexp is run against the contents of the files it is scanning, in
> order to identify whether a particular importer should be able to process a
> file.
>
> The regexp is built from the fields you provide, I think, for this
> importer, it's in the source code.
> You'll have to open up the source code to debug this, but I suspect your
> list of fields may not exactly match that from your files.
> (There ought to be better tools for debugging this, but because setting up
> the import config usually requires coding, people just override methods and
> put prints and debug it that way.)

Sorry for long delay, I finally tried looking into this again.
It seems like I'm severely misunderstanding something here. My importer
init now looks like this:

    def __init__(self, account):
        csv.Importer.__init__(
            self, self.config,
            account, 'Currency',
            ('Transaction Date,Posted Date,Description,Payee,'
             'Payee account,Amount,Balance'),
            1)
        print("REGEXPS: {}".format(self.regexps))

but what I'm seeing, makes no sense, I don't expect it to match anything
ever, unless I'm misunderstanding how this works:
$ PYTHONPATH=`pwd` bean-identify test.config.py bank_history_sample.csv
REGEXPS: [('Transaction Date,Posted Date,Description,Payee,Payee account,Amount,Balance', re.compile('Transaction Date,Posted Date,Description,Payee,Payee account,Amount,Balance', re.IGNORECASE|re.MULTILINE|re.DOTALL))]
**** /home/viq/Work/Own/beancount/bank_history_sample.csv

Martin Blais

Apr 22, 2018, 10:12:16 PM
to Beancount
I don't understand your question.
If you can post your example files, we can debug it.
If the regexp you provide doesn't match anything in your files, the importer won't match them.
The matching code is here, add some printfs there:



