Has anyone got a Python headless-browser script to fetch OFX files from their bank's website?

Jacques Gagnon

Nov 3, 2015, 8:09:03 AM
to Beancount
Hi,

Logging in to all my bank accounts to fetch the OFX files annoys me a lot.
I'm from Canada, so there is no such thing as "Direct OFX". So I have to do some web scraping here.

I want something that does not involve storing my password either locally or with a 3rd party (for obvious reasons).

I started working on a Python script with MechanicalSoup (I might switch to RoboBrowser, however) that just prompts at the terminal for filling in the forms, etc., but at some point JavaScript gets in my way.

Here is a list of all headless browsers:

Has anyone already got something working that could be shared?
I'm mostly interested in examples of how people are dealing with web redirects and JavaScript, to get my own script working. :)

Eventually it would be nice to keep a repo with such scripts for different banks. This would complement LedgerHub nicely.
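A minimal sketch of the prompt-at-the-terminal approach, assuming MechanicalSoup: the credentials are read at run time (the password with `getpass`, so it is not echoed) and never written anywhere. The URL, form selector and field names are hypothetical placeholders that differ per bank. Like any requests-based tool, MechanicalSoup does not execute JavaScript, which is where this approach stops working.

```python
import getpass


def prompt_credentials(bank_name, ask=input, ask_secret=getpass.getpass):
    """Ask for the login at run time; nothing is stored on disk."""
    user = ask(f"{bank_name} user id: ")
    password = ask_secret(f"{bank_name} password: ")  # not echoed to the terminal
    return user, password


def login(url, form_selector, user_field, password_field, bank_name="Bank"):
    """Fill in and submit a bank's login form (all arguments are hypothetical)."""
    import mechanicalsoup  # third-party: pip install MechanicalSoup

    user, password = prompt_credentials(bank_name)
    browser = mechanicalsoup.StatefulBrowser()
    browser.open(url)                   # requests follows HTTP redirects by default
    browser.select_form(form_selector)  # CSS selector of the login <form>
    browser[user_field] = user
    browser[password_field] = password
    browser.submit_selected()
    return browser  # the session cookies are reused by later browser.open() calls
```

Something like `login("https://bank.example/login", "form#login", "username", "password", "MyBank")` would then prompt twice and return a logged-in session; every argument there is made up.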



Daniel Clemente

Nov 3, 2015, 9:27:04 AM
to bean...@googlegroups.com
On Tue, 3 Nov 2015 05:09:03 -0800 (PST), Jacques Gagnon wrote:
>
> I want something that does not involve storing my password either locally or with a 3rd party (for
> obvious reasons).
>
> I started working on a Python script with MechanicalSoup (I might switch to RoboBrowser, however)
> that just prompts at the terminal for filling in the forms, etc., but at some point JavaScript gets in
> my way.
>

If the elegant solution gets too complex… I offer you the hacky one, which you can build in half an hour:
1. I manually access my bank website from time to time (you're probably doing it already).
2. I select all text from the latest transactions (e.g. Ctrl-A) and copy it to the clipboard.
3. I paste it into a Python script which parses the text, discards headers, keeps only the transactions and their amounts, then outputs Beancount entries that can be pasted into a Beancount file. I remove duplicates manually.
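Step 3 can be sketched as follows. The line layout (`DD/MM/YYYY  DESCRIPTION  AMOUNT`), the account names and the currency are assumptions; each bank's page will need its own regular expression.

```python
import re
from datetime import datetime

# Hypothetical line layout: "DD/MM/YYYY  DESCRIPTION  AMOUNT", e.g.
#   02/11/2015  GROCERY STORE  -1,045.20
# Anything else (headers, balances, footers) is skipped.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?\d[\d,]*\.\d{2})\s*$")


def parse_pasted(text):
    """Yield (iso_date, description, amount) for each transaction-looking line."""
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        day = datetime.strptime(m.group(1), "%d/%m/%Y").date()
        yield day.isoformat(), m.group(2).strip(), m.group(3).replace(",", "")


def to_beancount(text, account="Assets:Bank:Checking", currency="EUR",
                 other="Expenses:Uncategorized"):
    """Render each parsed line as a two-posting Beancount transaction."""
    entries = []
    for date, desc, amount in parse_pasted(text):
        desc = desc.replace('"', "'")  # keep the quoted string valid
        entries.append(
            f'{date} * "{desc}"\n'
            f"  {account}  {amount} {currency}\n"
            f"  {other}\n"
        )
    return "\n".join(entries)
```

Feed it the clipboard contents (for example `print(to_beancount(sys.stdin.read()))` in a small wrapper) and paste the output into the ledger; duplicates still have to be removed by hand.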

I use it for my banks, which don't even have OFX.

Jacques Gagnon

Nov 3, 2015, 9:42:51 AM
to Beancount
Yeah, that wouldn't work for me; my main objective is to avoid using a browser.

That way I could connect to my machine at home via SSH and run the script to update my transactions easily.

Dave Stephens

Nov 3, 2015, 10:34:42 AM
to Beancount
I use PhantomJS. I haven't set it up for banking, but I use it for other headless web scraping where JavaScript becomes a problem.

Here's some code that sets up either a headless or a headed (for testing) browser.


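from selenium import webdriver  # Selenium 2/3 only: webdriver.PhantomJS was removed in Selenium 4

def get_data(issuer, months, to_date=None, headless=True):
    # Start the browser; issuer, months and to_date are meant for the scraping step (not shown)
    if headless:
        # Copy the default capabilities instead of mutating the shared class-level dict,
        # and spoof a desktop Chrome user agent
        caps = dict(webdriver.DesiredCapabilities.PHANTOMJS)
        caps['phantomjs.page.settings.userAgent'] = (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
            "(KHTML, like Gecko) Chrome/15.0.87")
        browser = webdriver.PhantomJS(desired_capabilities=caps)
        # browser.get('http://httpbin.org/headers')
        # print(browser.page_source)
    else:
        # Visible Firefox for testing (note: JavaScript is disabled here)
        firefox_profile = webdriver.FirefoxProfile()
        firefox_profile.set_preference("javascript.enabled", False)
        browser = webdriver.Firefox(firefox_profile)
    # ... navigate to the issuer's site and fetch the data here ...
    return browser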
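PhantomJS development has since been suspended and newer Selenium releases dropped `webdriver.PhantomJS`, so a rough modern equivalent is headless Chrome. A minimal sketch, assuming Selenium 4 and a local Chrome install; the user agent is optional and is simply passed through as a Chrome command-line switch.

```python
def chrome_arguments(headless=True, user_agent=None):
    """Build the Chrome command-line switches for the session."""
    args = []
    if headless:
        args.append("--headless=new")  # Chrome's current headless mode
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args


def make_browser(headless=True, user_agent=None):
    """Start Chrome through Selenium 4 (third-party: pip install selenium)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless, user_agent):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Used like the commented-out lines above: `browser = make_browser()`, then `browser.get("http://httpbin.org/headers")`, `print(browser.page_source)` and finally `browser.quit()`.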

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To post to this group, send email to bean...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/c91aed44-e5a2-4972-b9dc-4bfd091da4a3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Martin Blais

Nov 3, 2015, 10:46:32 AM
to Beancount
On Tue, Nov 3, 2015 at 8:09 AM, Jacques Gagnon <darth...@gmail.com> wrote:
> Anyone already got something working that could be shared?
> I'm mostly interested in examples of how people are dealing with web redirects and JavaScript to get my own script working. :)

I'm still logging in manually and clicking buttons to download files, I haven't bothered tackling this problem yet.
I'd also love to automate downloads eventually.

 

> Eventually it would be nice to keep a repo with such scripts for different banks. This would complement LedgerHub nicely.

I agree, though based on the participation rate on LedgerHub, it's more likely that the amount of shared code will be low. Too many institutions and not enough intersection in a very small community of users. There's just not a lot of overlap.


Dave Stephens

Nov 3, 2015, 10:58:04 AM
to Beancount
My bank (TD Canada Trust) leaves the checkboxes for the accounts I want to download checked. So all I have to do is hit download when I log in. Then I use LedgerHub to convert the data to Beancount format and append it to my Beancount file using an alias. I also have an alias to file the downloaded documents using LedgerHub.

So the total process is:
1. log in to the website
2. click "download all accounts"
3. in a terminal, run:
   lef
   lf
4. open the Beancount file and remove duplicates / allocate transactions to specific accounts.

I'd like to set up a script to automatically allocate accounts for payees that have been used before. 

lef: aliased to ledgerhub-extract -l ~/.beancount/[beancount file] ~/.beancount/[import file] ~/Downloads >> ~/.beancount/[beancount file]
lf: aliased to ledgerhub-file ~/.beancount/[import file] ~/Downloads ~/.beancount/Documents
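For allocating accounts to payees that have been used before, a rough sketch that scans an existing Beancount file as plain text and remembers, per payee, the account it was most often booked to. This is not a Beancount API (a real tool would more likely load the ledger with `beancount.loader.load_file`); the regular expressions only cover the simple `DATE * "PAYEE" ...` layout, and the fallback account name is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Transaction header, e.g. 2015-10-01 * "GROCERY STORE" "weekly shopping".
# The first quoted string is taken as the payee (with a single string Beancount
# treats it as the narration, which is good enough for this heuristic).
TXN_RE = re.compile(r'^\d{4}-\d{2}-\d{2}\s+[*!]\s+"([^"]*)"')
# Indented posting line that starts with an account name
POSTING_RE = re.compile(r"^\s+([A-Z][A-Za-z0-9-]*(?::[A-Z0-9][A-Za-z0-9-]*)+)")


def learn_payee_accounts(beancount_text, skip_prefixes=("Assets:", "Liabilities:")):
    """Map each payee to the account (other than the bank side) it was most often booked to."""
    counts = defaultdict(Counter)
    payee = None
    for line in beancount_text.splitlines():
        header = TXN_RE.match(line)
        if header:
            payee = header.group(1)
            continue
        if not line.strip():
            payee = None  # a blank line ends the transaction
            continue
        posting = POSTING_RE.match(line)
        if payee is not None and posting and not posting.group(1).startswith(skip_prefixes):
            counts[payee][posting.group(1)] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def suggest_account(payee, mapping, default="Expenses:Uncategorized"):
    """Reuse the account from earlier transactions, else fall back to default."""
    return mapping.get(payee, default)
```

The mapping could then fill in the second posting of each freshly extracted transaction, leaving only unknown payees on the fallback account for manual allocation.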



Stefano Zacchiroli

Nov 3, 2015, 11:03:17 AM
to bean...@googlegroups.com
On Tue, Nov 03, 2015 at 05:09:03AM -0800, Jacques Gagnon wrote:
> Eventually it would be nice to keep a repo with such script for different
> banks. This would complement LedgerHub nicely.

The banking module of the weboob project http://weboob.org/ does exactly
that. Two caveats:

1) the community is mostly French ATM, so you have a lot of coverage for
French banks, and less so for other countries

2) I *think* it currently fails your requirement of not having to store
the password locally (at least I use it with a stored local password)

Cheers.
--
Stefano Zacchiroli . . . . . . . za...@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader . . . . . @zacchiro . . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

Simon Michael

Nov 3, 2015, 11:03:44 AM
to bean...@googlegroups.com
https://gitlab.com/egh/ledger-autosync is pretty good, and/or the
ofxclient tool it uses.

Jacques Gagnon

Nov 3, 2015, 11:46:26 AM
to Beancount
ledger-autosync is for banks that support the OFX protocol;
we don't have that in Canada, only some random websites where, once logged in, you can download the OFX file.

I'll take a look at weboob or PhantomJS.
I wanted to stay with Python, but I'll use whatever it takes and adapt it to my needs.

We might not end up with scripts for all banks, but at least with a couple of examples someone would be able to hack something together for their own needs.

Simon Michael

Nov 3, 2015, 4:24:34 PM
to bean...@googlegroups.com
ledger-autosync will also process an OFX file you downloaded yourself.
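This is not ledger-autosync's code, just a minimal illustration of the same idea for a manually downloaded file: pull the `<STMTTRN>` records out of the OFX text with regular expressions and use each record's `FITID` (the bank's unique transaction id) to skip transactions imported earlier, which also takes care of most duplicate cleanup. A dedicated parser such as the ofxparse library would be more robust; the field selection here is an assumption.

```python
import re
from datetime import date
from decimal import Decimal

# Works for OFX 1.x (SGML, leaf tags left unclosed) and OFX 2.x (XML)
BLOCK_RE = re.compile(r"<STMTTRN>(.*?)</STMTTRN>", re.S | re.I)
FIELD_RE = re.compile(r"<([A-Z0-9.]+)>([^<\r\n]*)", re.I)


def parse_ofx_transactions(ofx_text):
    """Return one dict per <STMTTRN> with date, amount, FITID and payee."""
    txns = []
    for block in BLOCK_RE.findall(ofx_text):
        fields = {tag.upper(): value.strip() for tag, value in FIELD_RE.findall(block)}
        posted = fields["DTPOSTED"][:8]  # YYYYMMDD; time and time zone are ignored
        txns.append({
            "date": date(int(posted[:4]), int(posted[4:6]), int(posted[6:8])),
            "amount": Decimal(fields["TRNAMT"]),
            "fitid": fields.get("FITID"),
            "payee": fields.get("NAME") or fields.get("MEMO", ""),
        })
    return txns


def new_only(txns, seen_fitids):
    """Drop transactions whose FITID was already imported."""
    return [t for t in txns if t["fitid"] not in seen_fitids]
```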

Dave Stephens

Nov 5, 2015, 10:40:52 PM
to bean...@googlegroups.com
Does ledger-autosync work with Beancount? 


Martin Blais

Nov 6, 2015, 8:41:03 AM
to Beancount
No idea, but the portion that pulls the file down should be reusable


Jacques Gagnon

Nov 6, 2015, 8:48:18 AM
to bean...@googlegroups.com
Btw, I came across this for Bank of America:

Haven't had time yet to adapt it for my needs.

But I don't like the fact that it uses a dead project (CasperJS).

Also, the download part is kind of a hack, since PhantomJS doesn't have native support for downloads (downloads without a direct link, i.e. resulting from a POST).
They have experimental support for downloads in a branch, however, but it never got into mainline, and it dates from 2013, so it probably won't happen.

Nightmare looks like the most future-proof project to use: no download support yet, but at least the headless browser it uses, Electron (based on Chromium), does have download support.
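Headless Chrome, also Chromium-based, can handle POST-triggered downloads as well. A hedged sketch assuming Selenium 4 and Chrome's `--headless=new` mode, where, as far as I know, the usual `download.default_directory` preference is honoured: send downloads to a directory, click the bank's download button, then poll the directory until the file is complete. Chrome names unfinished downloads `*.crdownload`; the `.ofx` suffix and the timeouts are assumptions.

```python
import os
import time


def download_prefs(download_dir):
    """Chrome preferences that save downloads to download_dir without asking."""
    return {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    }


def make_downloading_browser(download_dir):
    """Headless Chrome via Selenium 4 (third-party: pip install selenium)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    options.add_experimental_option("prefs", download_prefs(download_dir))
    return webdriver.Chrome(options=options)


def wait_for_download(download_dir, suffix=".ofx", timeout=60.0, interval=0.5):
    """Poll download_dir until a finished file ending in suffix appears."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        # Chrome keeps in-progress downloads under a .crdownload name
        if not any(n.endswith(".crdownload") for n in names):
            done = [n for n in names if n.lower().endswith(suffix.lower())]
            if done:
                paths = [os.path.join(download_dir, n) for n in done]
                return max(paths, key=os.path.getmtime)  # newest match
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {suffix} file in {download_dir}")
        time.sleep(interval)
```

Starting from an empty download directory avoids picking up an OFX file left over from a previous run.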

Jacques Gagnon
GTalk/E-Mail: darth...@gmail.com
WLM (MSN): clou...@msn.com

nicolas...@gmail.com

Apr 12, 2018, 6:12:11 PM
to Beancount
Hello Jacques,

I read your messages with a lot of interest, and I'm also trying to connect to and scrape my bank statements on TD Canada.
I tried weboob, which has a complete solution for banks, but could not get to the login page; it seems that the website makes your browser run a lot of operations well before you get the login prompt...

Have you been able to get through this first part with Python?

Best regards,
Nicolas
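One common way to cope with a page that runs a lot of JavaScript before showing the login form is an explicit wait: keep polling until the element exists instead of reading the page right after it loads. Selenium ships this as `WebDriverWait`; the sketch below is the same idea as a plain polling loop, so it can be tried without a browser. The element id `"username"` in the usage note is hypothetical.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Call condition() until it returns something truthy, then return that value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

With Selenium this would be `wait_until(lambda: browser.find_elements(By.ID, "username"), timeout=60)`, since `find_elements` returns an empty (falsy) list until the field exists; the built-in equivalent is `WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.ID, "username")))`.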

Jacques Gagnon

Apr 12, 2018, 6:18:29 PM
to bean...@googlegroups.com
I pretty much gave up; it's much easier to download the OFX file manually.


mpl...@gmail.com

Apr 17, 2018, 12:52:28 PM
to Beancount
I saw the following post on Reddit related to downloading data from banks. Maybe it will be helpful to you:

I'm new to the ledger-cli ecosystem, and spent many hours recently trying to get this all sorted out. I now have a more-or-less fully automated setup that works as such:

    Component 1 is a small bash script that invokes plaid2qif to download a CSV of all transactions for each of my bank accounts, investment accounts, and credit card accounts. https://github.com/ebridges/plaid2qif

Plaid is a back-end system that implements a common API for most financial institutions, so that you don't have to write and maintain separate Chase vs. Citibank vs. BofA scrapers. You can link up to 100 accounts with the development account for free.

    Once I have a CSV with each account's transactions, I have another short script that invokes into-ledger on each file and imports all of the data.

Vivek Gani

Apr 17, 2018, 1:42:07 PM
to Beancount
Haven't looked into it much, but ledger reconciler ( https://disjoint.ca/projects/ledger-reconciler/ ) works by downloading via headless Chrome.

nicolas...@gmail.com

Jun 3, 2018, 7:27:47 AM
to Beancount
Hello Jacques,

I managed to get authenticated and grab some bank account information using Selenium and Python.
I used the base code of another GitHub repo, and had to change it a bit to connect and get some information from it.

Here is my repo, just clone it:
git clone https://gitlab.com/n3on3t/TdBank.git

Let me know how it goes.

Nicolas