Customize get_amounts function in csv importer


walle...@gmail.com

May 9, 2020, 3:11:59 PM
to Beancount
Hello,

I am working on CSV importers by subclassing the provided csv importer class and wrapping __init__ to set up the config, etc. It has worked well, but some of my CSV files use a different number format. The cases I have run into are: debits shown as negative numbers, and files that give 0 for the debit when the transaction is a credit, so that I get an additional line with a 0 amount.

These are all easy fixes by monkey patching the get_amounts function in the csv module (wrapping the old one). The problem is that when I have to patch it twice for the two situations above, the patches spill over and change the outcome depending on how I import my custom importers in my xxx.import file.

I can pass the test for each individual importer, but not when I run all tests.
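The spill-over can be reproduced without Beancount. Below is a minimal sketch, assuming a stand-in object named `fake_csv` and a simplified `get_amounts(row)` signature (both hypothetical, not the real `beancount.ingest.importers.csv` code). Each patch wraps whatever function is installed at that moment, so once both importer modules have been imported, every caller sees both patches stacked.

```python
import types
from decimal import Decimal

# Hypothetical stand-in for the csv importer module; not the real Beancount code.
fake_csv = types.SimpleNamespace()


def get_amounts(row):
    """Return (debit, credit) as Decimal or None, negating the debit."""
    debit, credit = row
    return (-Decimal(debit) if debit else None,
            Decimal(credit) if credit else None)


fake_csv.get_amounts = get_amounts


def patch_negative_debits():
    """Bank 1: debits already arrive negative, so undo the built-in negation."""
    original = fake_csv.get_amounts

    def patched(row):
        debit, credit = original(row)
        return (-debit if debit is not None else None, credit)

    fake_csv.get_amounts = patched


def patch_zero_debits():
    """Bank 2: a debit of 0 means 'no debit', so drop it."""
    original = fake_csv.get_amounts

    def patched(row):
        debit, credit = original(row)
        return (debit if debit else None, credit)

    fake_csv.get_amounts = patched


# Simulate importing both importer modules in one process.
patch_negative_debits()
patch_zero_debits()

# A bank 2 row with a positive debit of 25.00. The unpatched function would
# return a negative debit, but bank 1's patch is still active and flips it.
result = fake_csv.get_amounts(("25.00", ""))
```

In this sketch `result` carries a positive debit, which is wrong for bank 2 even though bank 2's own patch is correct in isolation. That matches the symptom of tests passing per importer but failing when run together.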

I could certainly just copy the original csv module and change it so that get_amounts becomes a method of the class. But I want to check whether I am doing this right.

Thanks for any suggestion.

W.E.

brodie.b...@gmail.com

May 16, 2020, 10:00:44 PM
to Beancount
I recently encountered a very similar situation. I have transaction statements where values in the debit column are negative but values in the credit column are positive. I am also subclassing the built-in csv importer class.

At first I thought about using a decorator to wrap the built-in `get_amounts()` function. But the values in the rows passed to `get_amounts()` are strings. Stripping the `-` characters from these strings worked all right but didn't feel too clean (I guess transaction parsing is anything but clean!). Another option was to negate the debit values returned from the function.

To make matters worse, I found that CSV exports from my bank's website sometimes had negative debit values and other times positive ones, for the same transactions!

In the end I just decided to reimplement `get_amounts()` instead of decorating it. That resulted in a more general solution that handles the shitty situation above: I negate the absolute value of debit values. I then monkey patch the function in the csv module.

If you haven't got a good handle on how Python namespaces work, you may get tripped up by monkey patching. Spend a bit of time getting familiar with it. You'll change the behaviour of your other importers depending on the CSV module if you're not careful.

from beancount.ingest.importers import csv


def get_amounts(iconfig, row):
    """Get the amount columns of a row.

    This is based on the original function in the built-in CSV importer module.
    In the original function, debit amounts are negated before returning.

    If you export transactions from the transaction listing screen, debit values
    are positive. But if you export transactions from the transaction search
    screen, the very same debit values are negative. How bizarre!

    This is handled by negating the absolute value of debit values.

    Credit values are positive regardless of where the CSV export came from.

    Sometimes transactions are pending (authorisation only). In this case,
    the "Transaction Type" column is empty. We use this to ignore these
    transactions: None is returned for both debit and credit.

    The `allow_zero_amounts` argument and corresponding logic has been removed
    from the original function.

    Args:
      iconfig: A dict mapping each Col to its column index in the row.
      row: A row array containing the values of the given row.
    Returns:
      A pair of (debit-amount, credit-amount), each of which is either a
      Decimal instance or None if not available.
    """
    # If transaction type is not populated, the transaction is pending. Return
    # None instead of actual values.
    if not row[iconfig[csv.Col.DRCR]]:
        return (None, None)
    debit, credit = None, None
    if csv.Col.AMOUNT in iconfig:
        credit = row[iconfig[csv.Col.AMOUNT]]
    else:
        debit, credit = [row[iconfig[col]] if col in iconfig else None
                         for col in [csv.Col.AMOUNT_DEBIT, csv.Col.AMOUNT_CREDIT]]

    # Take the absolute debit value, then negate it.
    return (-abs(csv.D(debit)) if debit else None,
            csv.D(credit) if credit else None)


# Monkey patch function in csv module.
csv.get_amounts = get_amounts



brodie.b...@gmail.com

May 17, 2020, 4:27:07 AM
to Beancount
Posting the previous message made me think about this some more.

I think this is more concise and generalised:

from functools import wraps


def normalise(func):
    """Makes all debits negative, and all credits positive."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        debit, credit = func(*args, **kwargs)
        return (
            -abs(debit) if debit else debit,
            abs(credit) if credit else credit
        )
    return wrapper


# Monkey patch function in csv module.
csv.get_amounts = normalise(csv.get_amounts)
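
To see what the decorator does to each export variant, here is a small self-contained sketch. `fake_get_amounts` is a hypothetical stand-in that takes a (debit, credit) pair of strings; it is not the real `csv.get_amounts`, which takes the column config and the row.

```python
from decimal import Decimal
from functools import wraps


def normalise(func):
    """Wrap an amount getter so debits come out negative and credits positive."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        debit, credit = func(*args, **kwargs)
        return (-abs(debit) if debit else debit,
                abs(credit) if credit else credit)
    return wrapper


# Hypothetical stand-in for csv.get_amounts, for illustration only.
@normalise
def fake_get_amounts(row):
    debit, credit = row
    return (Decimal(debit) if debit else None,
            Decimal(credit) if credit else None)


from_listing = fake_get_amounts(("12.50", ""))   # debit exported as positive
from_search = fake_get_amounts(("-12.50", ""))   # same debit exported as negative
deposit = fake_get_amounts(("", "-40.00"))       # credit sign is forced positive too
```

Both debit variants end up with the same negative Decimal, and missing values pass through as None. Note that the credit side is also forced positive, which may or may not be wanted for refunds or reversals.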

Martin Michlmayr

May 17, 2020, 5:38:04 AM
to bean...@googlegroups.com
Note that the csv importer in the Mercurial repo has an invert_sign
option now, which might do what you need:

    invert_sign: Optional[bool] = False,

    invert_sign: If true, invert the amount's sign unconditionally.



--
Martin Michlmayr
https://www.cyrius.com/

walle...@gmail.com

May 19, 2020, 1:12:30 PM
to Beancount
Thanks for the pointer. But that option inverts the sign of both debit and credit amounts, and I only need to invert the debit sign.

W.E.



walle...@gmail.com

May 19, 2020, 1:19:17 PM
to Beancount


Thanks for the code. I was doing essentially the same thing as you. But as you mentioned, I did get tripped up by the patching.

I have two different bank statements that need two different patches, and one importer does indeed change the behavior of the other. Any suggestions on how to do this properly? Thanks!

W.E.


brodie.b...@gmail.com

May 19, 2020, 7:17:23 PM
to Beancount
I would create an importer module for your institution in, say, importers/institution/csv.py. Make sure you mark these directories as Python packages with appropriate __init__.py files.

In that module import the built-in csv import module: 
from beancount.ingest.importers import csv

Subclass the Importer defined in that module:
class Importer(csv.Importer):
Define your column mapping config, decorators and other jazz. Monkey patch functions in the csv module. In my case I chain a couple of decorators: 
csv.get_amounts = normalise(ignore_pending(csv.get_amounts))
Import and set config for your new importer class in your importer config module.

This won't trample the get_amounts function "seen" by other importers.
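
The thread does not show `ignore_pending`, so the following is only a guess at its shape, based on the pending-transaction check in the earlier `get_amounts` reimplementation. The `Col` enum here is a hypothetical stand-in for `csv.Col`, and `fake_get_amounts` is a simplified stand-in for the real function, so that the sketch runs without Beancount installed.

```python
import enum
from decimal import Decimal
from functools import wraps


class Col(enum.Enum):
    """Hypothetical stand-in for the csv importer's column identifiers."""
    AMOUNT = "[AMOUNT]"
    DRCR = "[DRCR]"


def ignore_pending(func):
    """Return (None, None) for rows whose transaction-type column is empty."""
    @wraps(func)
    def wrapper(iconfig, row, *args, **kwargs):
        if not row[iconfig[Col.DRCR]]:
            return (None, None)
        return func(iconfig, row, *args, **kwargs)
    return wrapper


@ignore_pending
def fake_get_amounts(iconfig, row):
    """Stand-in amount getter: a single amount column, reported as a credit."""
    amount = row[iconfig[Col.AMOUNT]]
    return (None, Decimal(amount) if amount else None)


iconfig = {Col.AMOUNT: 0, Col.DRCR: 1}
settled = fake_get_amounts(iconfig, ["19.99", "DEBIT"])
pending = fake_get_amounts(iconfig, ["19.99", ""])
```

Because the wrapper passes any extra positional and keyword arguments through, a decorator written this way can be chained with `normalise` in either order.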

walle...@gmail.com

May 21, 2020, 10:48:24 AM
to Beancount
Thank you very much for the code. It worked. I tested it on two sample CSV files from two banks with different patches, and there was no interference between them.

Just one last nag: pytest won't pass when I let it test all importers at once. Evidently one patch is affecting the other. The tests pass if I specify each importer subfolder separately, like this:
pytest -v importers/bank1
pytest -v importers/bank2


But they fail if I test them all together:
pytest -v importers


Any pointer for this? Thanks!
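
One possible reading of the failure is that under a single pytest run all importer modules are imported into the same process, so a patch installed at import time stays installed for every importer. A minimal sketch of one way around that is to install the patch only while a given importer is extracting and restore the original afterwards. Everything here (`fake_csv`, `BaseImporter`, the simplified `get_amounts(row)` signature, and the two bank variants) is a hypothetical stand-in, and it assumes the real `extract` looks up `get_amounts` in the module at call time.

```python
import contextlib
import types
from decimal import Decimal

# Hypothetical stand-in for the csv importer module.
fake_csv = types.SimpleNamespace()


def get_amounts(row):
    """Return (debit, credit) as Decimal or None, negating the debit."""
    debit, credit = row
    return (-Decimal(debit) if debit else None,
            Decimal(credit) if credit else None)


fake_csv.get_amounts = get_amounts


class BaseImporter:
    """Stand-in for csv.Importer: resolves get_amounts at call time."""
    def extract(self, rows):
        return [fake_csv.get_amounts(row) for row in rows]


@contextlib.contextmanager
def patched_get_amounts(replacement):
    """Install `replacement` on the module, then always restore the original."""
    original = fake_csv.get_amounts
    fake_csv.get_amounts = replacement
    try:
        yield
    finally:
        fake_csv.get_amounts = original


def bank1_get_amounts(row):
    """Bank 1: debits may arrive negative; force them negative."""
    debit, credit = get_amounts(row)
    return (-abs(debit) if debit is not None else None, credit)


def bank2_get_amounts(row):
    """Bank 2: a zero on either side means 'not present'."""
    debit, credit = get_amounts(row)
    return (debit or None, credit or None)


class Bank1Importer(BaseImporter):
    def extract(self, rows):
        with patched_get_amounts(bank1_get_amounts):
            return super().extract(rows)


class Bank2Importer(BaseImporter):
    def extract(self, rows):
        with patched_get_amounts(bank2_get_amounts):
            return super().extract(rows)


bank1_result = Bank1Importer().extract([("-30.00", "")])
bank2_result = Bank2Importer().extract([("0", "15.00")])
```

After both calls the module-level function is the original again, so neither importer sees the other's patch. In the tests themselves, pytest's `monkeypatch` fixture or `unittest.mock.patch.object` gives the same install-and-restore behaviour per test.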