general CSV importer

1,582 views
Skip to first unread message

Saša Janiška

unread,
Oct 19, 2016, 4:19:47 AM10/19/16
to bean...@googlegroups.com
Hello,

afaik, (h)ledger have some general CSV importer where one can define
some mapping rules for the desired CSV format and then import?

CSV is also the only format I can use since the bank just provides Excel
file which can be converted to CSV.


Sincerely,
Gour

--
While contemplating the objects of the senses, a person
develops attachment for them, and from such attachment lust
develops, and from lust anger arises.

Martin Blais

unread,
Oct 21, 2016, 11:20:55 PM10/21/16
to Beancount
On Wed, Oct 19, 2016 at 4:19 AM, Saša Janiška <go...@atmarama.com> wrote:
Hello,

afaik, (h)ledger have some general CSV importer where one can define
some mapping rules for the desired CSV format and then import?

CSV is also the only format I can use since the bank just provides Excel
file which can be converted to CSV.

Beancount offers a framework to define your own custom importers, documented here:

Unfortunately, you have to write some code to set that up.

On the other hand, it also comes with a good example CSV importer which is functional:

You just need to instantiate it and provide it with a dict that tells it how to interpret each column of your CSV file.
(I should provide some example on how to do that, in the meantime you'd have to read the source code.)

HTH,




Sincerely,
Gour

--
While contemplating the objects of the senses, a person
develops attachment for them, and from such attachment lust
develops, and from lust anger arises.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/874m48rbsv.fsf%40atmarama.com.
For more options, visit https://groups.google.com/d/optout.

Saša Janiška

unread,
Oct 23, 2016, 11:40:23 AM10/23/16
to bean...@googlegroups.com
Martin Blais <bl...@furius.ca> writes:

> Beancount offers a framework to define your own custom importers,
> documented here:
> http://furius.ca/beancount/doc/ingest

Thank you.


> On the other hand, it also comes with a good example CSV importer
> which is functional:

Good.

> You just need to instantiate it and provide it with a dict that tells
> it how to interpret each column of your CSV file.
> (I should provide some example on how to do that, in the meantime
> you'd have to read the source code.)

OK. Will do. ;)


Sincerely,
Gour

--
One must deliver himself with the help of his mind, and not
degrade himself. The mind is the friend of the conditioned soul,
and his enemy as well.

jfs...@gmail.com

unread,
Oct 23, 2016, 9:01:43 PM10/23/16
to Beancount
Hi Martin,

Just started using beancount to get away from Quicken.  Your reasons for writing it and the documentation was my reason to give it a try.  It should also help my Python skills.  You sort of  have an example in csv_test.py.  It helped me determine what to put in my .import file.  Strangely, csv.py and csv_test.py were not in my install, ofx.py and others were.  I just copied from https://bitbucket.org/blais/beancount/src/b73da2b4f8e3182b1fb537a75385406c6d4fee90/src/python/beancount/ingest/importers/csv.py?at=booking&fileviewer=file-view-default and moved to the directory where ofx.py was and it worked, sort of.

I ran bean-identify, then the bean-extract as in the example, but I used a file, not a directory.  However, the only thing in the output file was the header and the path/filename.  No transactions made it into the file.  Maybe the regexp input?  I sort of guessed on that, I couldn't exactly see what to put in there except the headings.  Here is the import file I created.

#!/usr/bin/env python3
"""Example import configuration."""

# Insert our custom importers path here.
# (In practice you might just change your PYTHONPATH environment.)
import sys
from os import path
sys.path.insert(0, path.join(path.dirname(__file__)))

from beancount.ingest import extract
from beancount.ingest.importers import ofx
from beancount.ingest.importers import csv

Col = csv.Col

# Setting this variable provides a list of importer instances.
CONFIG = [
    csv.Importer({Col.DATE: 'Posted Date',
                  Col.TXN_DATE: 'Transaction Date',
                  Col.NARRATION1: 'Card No.',
                  Col.NARRATION2: 'Description',
                  Col.NARRATION3: 'Catagory',
                  Col.AMOUNT_DEBIT: 'Debit',
                  Col.AMOUNT_CREDIT: 'Credit'},
                 'Liabilities:US:CapOne',
                 'USD',
                 ('Transaction Date,Posted Date,Card No.,Description,'
                     'Category,Debit,Credit'),
                 'CapOne')
]


# Override the header on extracted text (if desired).
extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'


Am I correct in thinking I would need a separate import file for each different type of csv?

Thanks

Jonathan


On Friday, October 21, 2016 at 11:20:55 PM UTC-4, Martin Blais wrote:
On Wed, Oct 19, 2016 at 4:19 AM, Saša Janiška <go...@atmarama.com> wrote:
Hello,

afaik, (h)ledger have some general CSV importer where one can define
some mapping rules for the desired CSV format and then import?

CSV is also the only format I can use since the bank just provides Excel
file which can be converted to CSV.

Beancount offers a framework to define your own custom importers, documented here:

Unfortunately, you have to write some code to set that up.

On the other hand, it also comes with a good example CSV importer which is functional:

You just need to instantiate it and provide it with a dict that tells it how to interpret each column of your CSV file.
(I should provide some example on how to do that, in the meantime you'd have to read the source code.)

HTH,




Sincerely,
Gour

--
While contemplating the objects of the senses, a person
develops attachment for them, and from such attachment lust
develops, and from lust anger arises.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.

Martin Blais

unread,
Oct 24, 2016, 10:07:53 PM10/24/16
to Beancount
On Sun, Oct 23, 2016 at 9:01 PM, <jfs...@gmail.com> wrote:
Hi Martin,

Just started using beancount to get away from Quicken.  Your reasons for writing it and the documentation was my reason to give it a try.  It should also help my Python skills.  You sort of  have an example in csv_test.py.  It helped me determine what to put in my .import file.  Strangely, csv.py and csv_test.py were not in my install, ofx.py and others were.  I just copied from https://bitbucket.org/blais/beancount/src/b73da2b4f8e3182b1fb537a75385406c6d4fee90/src/python/beancount/ingest/importers/csv.py?at=booking&fileviewer=file-view-default and moved to the directory where ofx.py was and it worked, sort of.

Strange. I just tested this now and the csv.py module does appear to install.
Can you try running something like this?

  mkdir -p /tmp/b/lib/python3.5/site-packages

  PYTHONPATH=/tmp/b/lib/python3.5/site-packages:$PYTHONPATH python3 setup.py install --prefix=/tmp/b

  ls -l /tmp/b/lib/python3.5/site-packages/beancount-2.0b12-py3.5-macosx-10.11-x86_64.egg/beancount/ingest/importers

and share the output?


I ran bean-identify, then the bean-extract as in the example, but I used a file, not a directory.  However, the only thing in the output file was the header and the path/filename.  No transactions made it into the file.  Maybe the regexp input?  

Does bean-identify detect and associate your input file with the importer you created below?
(If it does, it would clearly print out a few meaningful lines about that.)
I suspect your regexp may not match the file.

 
I sort of guessed on that, I couldn't exactly see what to put in there except the headings.  Here is the import file I created.

#!/usr/bin/env python3
"""Example import configuration."""

# Insert our custom importers path here.
# (In practice you might just change your PYTHONPATH environment.)
import sys
from os import path
sys.path.insert(0, path.join(path.dirname(__file__)))

from beancount.ingest import extract
from beancount.ingest.importers import ofx
from beancount.ingest.importers import csv

Col = csv.Col

# Setting this variable provides a list of importer instances.
CONFIG = [
    csv.Importer({Col.DATE: 'Posted Date',
                  Col.TXN_DATE: 'Transaction Date',
                  Col.NARRATION1: 'Card No.',
                  Col.NARRATION2: 'Description',
                  Col.NARRATION3: 'Catagory',

There's a typo here: Catagory -> Category
(I don't think it's what caused the problem though, just pointing it out.)
 
                  Col.AMOUNT_DEBIT: 'Debit',
                  Col.AMOUNT_CREDIT: 'Credit'},
                 'Liabilities:US:CapOne',
                 'USD',
                 ('Transaction Date,Posted Date,Card No.,Description,'
                     'Category,Debit,Credit'),
                 'CapOne')
]


# Override the header on extracted text (if desired).
extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'


Am I correct in thinking I would need a separate import file for each different type of csv?

I'm not sure what you mean by "a separate import file."

- The source code for all the importers you create like this could very well be in a one Python file, no problem with that.

- You will need to instantiate a separate importer for each different type of download, e.g. one for each source.

BTW, here's an example configuration that ought to work on a US CapitalOne CSV download:

from beancount.ingest import regression
from beancount.ingest.importers import csv

class Importer(csv.Importer):

    config = {csv.Col.DATE: 'Posted Date',
              csv.Col.TXN_DATE: 'Transaction Date',
              csv.Col.AMOUNT_DEBIT: 'Debit',
              csv.Col.AMOUNT_CREDIT: 'Credit',
              csv.Col.PAYEE: 'Description',
              csv.Col.NARRATION: 'Category'}

    def __init__(self, account):
        csv.Importer.__init__(
            self, self.config,
            account, 'USD',
            ('Stage, Transaction Date, Posted Date, Card No., Description, Category, '
             'Debit, Credit'),
            'capitalone')

    def get_description(self, row):
        payee, narration = super().get_description()
        narration = '{} ({})'.format(narration, row.category)
        return payee, narration


You can place this in another .py file and import that from your importer configuration source.


 
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.

To post to this group, send email to bean...@googlegroups.com.

jfs...@gmail.com

unread,
Oct 28, 2016, 8:13:15 PM10/28/16
to Beancount
Hi Martin,

responses to your response below.


On Monday, October 24, 2016 at 10:07:53 PM UTC-4, Martin Blais wrote:
On Sun, Oct 23, 2016 at 9:01 PM, <jfs...@gmail.com> wrote:
Hi Martin,

Just started using beancount to get away from Quicken.  Your reasons for writing it and the documentation was my reason to give it a try.  It should also help my Python skills.  You sort of  have an example in csv_test.py.  It helped me determine what to put in my .import file.  Strangely, csv.py and csv_test.py were not in my install, ofx.py and others were.  I just copied from https://bitbucket.org/blais/beancount/src/b73da2b4f8e3182b1fb537a75385406c6d4fee90/src/python/beancount/ingest/importers/csv.py?at=booking&fileviewer=file-view-default and moved to the directory where ofx.py was and it worked, sort of.

Strange. I just tested this now and the csv.py module does appear to install.
Can you try running something like this?

  mkdir -p /tmp/b/lib/python3.5/site-packages

  PYTHONPATH=/tmp/b/lib/python3.5/site-packages:$PYTHONPATH python3 setup.py install --prefix=/tmp/b

  ls -l /tmp/b/lib/python3.5/site-packages/beancount-2.0b12-py3.5-macosx-10.11-x86_64.egg/beancount/ingest/importers

and share the output?


I tried the above, but it didn't work.  I broke up the second line, but still no go.  Output below:
Enter code here...JFSMini1:Beancount$ ll /tmp

lrwxr-xr-x@ 1 root  wheel    11B Aug 12 22:07 /tmp@ -> private/tmp

JFSMini1:Beancount$ mkdir -p /tmp/b/lib/python3.5/site-packages

JFSMini1:Beancount$ PYTHONPATH=/tmp/b/lib/python3.5/site-packages:$PYTHONPATH python3 setup.py install --prefix=/tmp/b

/Library/Frameworks/Python.framework/Versions/3.5/Resources/Python.app/Contents/MacOS/Python: can't open file 'setup.py': [Errno 2] No such file or directory

JFSMini1:Beancount$ echo $PYTHONPATH


JFSMini1:Beancount$ PYTHONPATH=/tmp/b/lib/python3.5/site-packages

JFSMini1:Beancount$ echo $PYTHONPATH

/tmp/b/lib/python3.5/site-packages

JFSMini1:Beancount$ echo $PYTHONPATH python3 setup.py install --prefix=/tmp/b

/tmp/b/lib/python3.5/site-packages python3 setup.py install --prefix=/tmp/b

JFSMini1:Beancount$ ls /tmp/b/lib/python3.5/site-packages/beancount

ls: /tmp/b/lib/python3.5/site-packages/beancount: No such file or directory

JFSMini1:Beancount$ ls /tmp/b/lib/python3.5/site-packages/

JFSMini1:Beancount$ 


I forgot to state this was installed on my MAC Mini running OS X 10.11.6, El Capitan.  I then installed on my Macbook Pro running OS X 10.10.5, Yosemite.  csv.py and csv_test.py were there.  It must have been something with that install.

 

I ran bean-identify, then the bean-extract as in the example, but I used a file, not a directory.  However, the only thing in the output file was the header and the path/filename.  No transactions made it into the file.  Maybe the regexp input?  

Does bean-identify detect and associate your input file with the importer you created below?
(If it does, it would clearly print out a few meaningful lines about that.)
I suspect your regexp may not match the file.

Since bean-identify does print out  

**** /Users/jonathan/Documents/Beancount/20161018CapOne.csv

I think it does associate it, but I must be missing something.

 
 
I sort of guessed on that, I couldn't exactly see what to put in there except the headings.  Here is the import file I created.

#!/usr/bin/env python3
"""Example import configuration."""

# Insert our custom importers path here.
# (In practice you might just change your PYTHONPATH environment.)
import sys
from os import path
sys.path.insert(0, path.join(path.dirname(__file__)))

from beancount.ingest import extract
from beancount.ingest.importers import ofx
from beancount.ingest.importers import csv

Col = csv.Col

# Setting this variable provides a list of importer instances.
CONFIG = [
    csv.Importer({Col.DATE: 'Posted Date',
                  Col.TXN_DATE: 'Transaction Date',
                  Col.NARRATION1: 'Card No.',
                  Col.NARRATION2: 'Description',
                  Col.NARRATION3: 'Catagory',

There's a typo here: Catagory -> Category
(I don't think it's what caused the problem though, just pointing it out.)

No, that wasn't the problem, but thanks for finding that.
 
 
                  Col.AMOUNT_DEBIT: 'Debit',
                  Col.AMOUNT_CREDIT: 'Credit'},
                 'Liabilities:US:CapOne',
                 'USD',
                 ('Transaction Date,Posted Date,Card No.,Description,'
                     'Category,Debit,Credit'),
                 'CapOne')
]


# Override the header on extracted text (if desired).
extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'


Am I correct in thinking I would need a separate import file for each different type of csv?

I'm not sure what you mean by "a separate import file."

I think we are using different terminology, and I missed the 'dot' in front of the import.  I call the example.import the import file, I think you are calling them the config file. 

- The source code for all the importers you create like this could very well be in a one Python file, no problem with that.

- You will need to instantiate a separate importer for each different type of download, e.g. one for each source.

BTW, here's an example configuration that ought to work on a US CapitalOne CSV download:

from beancount.ingest import regression
from beancount.ingest.importers import csv

class Importer(csv.Importer):

    config = {csv.Col.DATE: 'Posted Date',
              csv.Col.TXN_DATE: 'Transaction Date',
              csv.Col.AMOUNT_DEBIT: 'Debit',
              csv.Col.AMOUNT_CREDIT: 'Credit',
              csv.Col.PAYEE: 'Description',
              csv.Col.NARRATION: 'Category'}

    def __init__(self, account):
        csv.Importer.__init__(
            self, self.config,
            account, 'USD',
            ('Stage, Transaction Date, Posted Date, Card No., Description, Category, '
             'Debit, Credit'),
            'capitalone')

    def get_description(self, row):
        payee, narration = super().get_description()
        narration = '{} ({})'.format(narration, row.category)
        return payee, narration

You can place this in another .py file and import that from your importer configuration source.

I put the above code in a file called capitalone.import, and incurred the following error:

JFSMini1:Beancount$ bean-identify capitalone.import 20161018CapOne.csv


Traceback (most recent call last):


  File "/usr/local/bin/bean-identify", line 2, in <module>


    from beancount.ingest.identify import main; main()


  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/beancount/ingest/identify.py", line 92, in main


    _, config, downloads_directories = scripts_utils.parse_arguments(parser)


  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/beancount/ingest/scripts_utils.py", line 56, in parse_arguments


    config = mod['CONFIG']


KeyError: 'CONFIG'


Should I have put the code in a capitaloneimport.py and then made a separate capone.import?  I have been looking at your code and reading some docs, but your Python coding is a lot more sophisticated than I am used to.  It is a good learning experience though.
 

 

Martin Blais

unread,
Oct 29, 2016, 9:09:17 AM10/29/16
to Beancount
On Fri, Oct 28, 2016 at 8:13 PM, <jfs...@gmail.com> wrote:
Hi Martin,

responses to your response below.

On Monday, October 24, 2016 at 10:07:53 PM UTC-4, Martin Blais wrote:
On Sun, Oct 23, 2016 at 9:01 PM, <jfs...@gmail.com> wrote:
Hi Martin,

Just started using beancount to get away from Quicken.  Your reasons for writing it and the documentation was my reason to give it a try.  It should also help my Python skills.  You sort of  have an example in csv_test.py.  It helped me determine what to put in my .import file.  Strangely, csv.py and csv_test.py were not in my install, ofx.py and others were.  I just copied from https://bitbucket.org/blais/beancount/src/b73da2b4f8e3182b1fb537a75385406c6d4fee90/src/python/beancount/ingest/importers/csv.py?at=booking&fileviewer=file-view-default and moved to the directory where ofx.py was and it worked, sort of.

Strange. I just tested this now and the csv.py module does appear to install.
Can you try running something like this?

  mkdir -p /tmp/b/lib/python3.5/site-packages

  PYTHONPATH=/tmp/b/lib/python3.5/site-packages:$PYTHONPATH python3 setup.py install --prefix=/tmp/b

  ls -l /tmp/b/lib/python3.5/site-packages/beancount-2.0b12-py3.5-macosx-10.11-x86_64.egg/beancount/ingest/importers

and share the output?


I tried the above, but it didn't work.  I broke up the second line, but still no go.  Output below:
Enter code here...JFSMini1:Beancount$ ll /tmp

lrwxr-xr-x@ 1 root  wheel    11B Aug 12 22:07 /tmp@ -> private/tmp

JFSMini1:Beancount$ mkdir -p /tmp/b/lib/python3.5/site-packages

JFSMini1:Beancount$ PYTHONPATH=/tmp/b/lib/python3.5/site-packages:$PYTHONPATH python3 setup.py install --prefix=/tmp/b

/Library/Frameworks/Python.framework/Versions/3.5/Resources/Python.app/Contents/MacOS/Python: can't open file 'setup.py': [Errno 2] No such file or directory


You have to be in the beancount directory in order to run this command, that's where setup.py lives.


 

JFSMini1:Beancount$ echo $PYTHONPATH


JFSMini1:Beancount$ PYTHONPATH=/tmp/b/lib/python3.5/site-packages

JFSMini1:Beancount$ echo $PYTHONPATH

/tmp/b/lib/python3.5/site-packages

JFSMini1:Beancount$ echo $PYTHONPATH python3 setup.py install --prefix=/tmp/b

/tmp/b/lib/python3.5/site-packages python3 setup.py install --prefix=/tmp/b

JFSMini1:Beancount$ ls /tmp/b/lib/python3.5/site-packages/beancount

ls: /tmp/b/lib/python3.5/site-packages/beancount: No such file or directory

JFSMini1:Beancount$ ls /tmp/b/lib/python3.5/site-packages/

JFSMini1:Beancount$ 


I forgot to state this was installed on my MAC Mini running OS X 10.11.6, El Capitan.  I then installed on my Macbook Pro running OS X 10.10.5, Yosemite.  csv.py and csv_test.py were there.  It must have been something with that install.

 

I ran bean-identify, then the bean-extract as in the example, but I used a file, not a directory.  However, the only thing in the output file was the header and the path/filename.  No transactions made it into the file.  Maybe the regexp input?  

Does bean-identify detect and associate your input file with the importer you created below?
(If it does, it would clearly print out a few meaningful lines about that.)
I suspect your regexp may not match the file.

Since bean-identify does print out  

**** /Users/jonathan/Documents/Beancount/20161018CapOne.csv

I think it does associate it, but I must be missing something.

This output just means it saw and file.
If there's nothing below it (e.g., an importer name with a few more lines) it did not associate an importer with this filename, in other words, it did not detect the file contents as being for your importer.
This is the problem.
No, you can put all your importers in the same Python file if you like.
The main reason to put them in their own files is that if you have a lot of them, that would make a large file. But you can. It's really just some Python code, nothing really special other than the tools evaluate it and grab the special CONFIG attribute, that's it.
It doesn't matter how you structure the files.
All that matters is that the .import Python file provide a CONFIG global variable which consists in a list of objects which implement Importer.


 
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.

To post to this group, send email to bean...@googlegroups.com.

jfs...@gmail.com

unread,
Oct 29, 2016, 10:30:40 PM10/29/16
to Beancount
Hi Martin,

I finally got it to work.  I put the example you gave in an importers/capone/__init__.py file (the one that creates a subclass of the csv.Importer) and then created the capone.import which imports the capone with the CONFIG global variable set to capone.Importer('Liabilities:US:CapOne').  When I just put your example in a capone.import it was missing the CONFIG global variable, hence the error.  

Starting to understand a little bit.  Thanks for the help.

BTW, I have maybe 10 years of Quicken data that I exported to qif and qfx files that I would like to pull into Beancount someday.  So I wouldn't mind helping to debug and/or write some of that code.

Jonathan

Martin Blais

unread,
Oct 29, 2016, 10:35:29 PM10/29/16
to Beancount
On Sat, Oct 29, 2016 at 10:30 PM, <jfs...@gmail.com> wrote:
Hi Martin,

I finally got it to work.  I put the example you gave in an importers/capone/__init__.py file (the one that creates a subclass of the csv.Importer) and then created the capone.import which imports the capone with the CONFIG global variable set to capone.Importer('Liabilities:US:CapOne').  When I just put your example in a capone.import it was missing the CONFIG global variable, hence the error.  

Starting to understand a little bit.  Thanks for the help.

Awesome! Glad you got it to work.


BTW, I have maybe 10 years of Quicken data that I exported to qif and qfx files that I would like to pull into Beancount someday.  So I wouldn't mind helping to debug and/or write some of that code.

If you're going to use QIF, I have a fork of qifparse that fixes some of its bugs here:
(I have a pull request that is being ignored, so just use my fork.)

If you're going to use qfx, try the simplistic OFX importer from beancount.ingest.importers.ofx.
It will probably be good enough.

Good luck, let us know how it goes, I think there are other people trying to do the same thing.




To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscribe@googlegroups.com.

To post to this group, send email to bean...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages