Reckon wants your csv files


Edwin van Leeuwen

Jan 24, 2014, 4:55:39 AM
to ledge...@googlegroups.com
Hi all,

Reckon needs your help :)

Reckon automagically converts CSV files for use with the command-line
accounting tool Ledger. It also helps you to select the correct
accounts associated with the CSV data using Bayesian machine learning.
For more information see:
http://blog.andrewcantino.com/blog/2010/11/06/command-line-accounting-with-ledger-and-reckon/
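
(For the curious: the account matching is essentially naive Bayes over the tokens of each transaction description. Reckon itself is written in Ruby; the sketch below is only a rough Python illustration of the idea, with invented names and data.)

    import math
    from collections import defaultdict

    # Token counts per account, trained from an existing Ledger file.
    token_counts = defaultdict(lambda: defaultdict(int))
    account_totals = defaultdict(int)

    def train(account, description):
        for token in description.lower().split():
            token_counts[account][token] += 1
            account_totals[account] += 1

    def guess_account(description):
        # Pick the account with the highest add-one-smoothed log-likelihood.
        def score(account):
            return sum(
                math.log((token_counts[account][t] + 1.0) /
                         (account_totals[account] + 1.0))
                for t in description.lower().split())
        return max(token_counts, key=score)

    train("Expenses:Coffee", "STARBUCKS STORE 1234")
    train("Expenses:Groceries", "TESCO STORES LONDON")
    print(guess_account("STARBUCKS OXFORD ST"))  # -> Expenses:Coffee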

We would like to expand reckon's ability to automagically convert csv
files. It already supports quite a few formats, but we are interested
in taking this further. For that we need more csv examples, so that we
can make sure those are correctly detected and especially make sure no
mistakes are made. You could really help us out by sending us
(anonymized) csv files as produced by your bank. We'd add those
examples to our test suite and make sure it all works well. Ideally,
we'd need a csv file containing a minimum of 5 transactions.

The formats currently in the test suite are here:
https://github.com/cantino/reckon/blob/master/spec/reckon/csv_parser_spec.rb#L207

Full disclosure: I am not the original author, but have been
contributing code to make it correctly convert my csv files :)

Cheers, Edwin

Zack Williams

Jan 24, 2014, 9:31:40 AM
to ledge...@googlegroups.com
On Fri, Jan 24, 2014 at 2:55 AM, Edwin van Leeuwen <edwi...@gmail.com> wrote:
> We would like to expand reckon's ability to automagically convert csv
> files. It already supports quite a few formats, but we are interested
> in taking this further.

Can it take multi-column input, for example, from a vendor where the formula:

materials + tax + shipping = total

applies, or from a payment processor where:

total_billed - fees = total_deposited

and generate a multi-line ledger entry that balances?
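
For instance (data invented), a payment processor row like:

    date,description,total_billed,fees,total_deposited
    2014-01-20,Widget sale,100.00,2.30,97.70

would ideally become a balanced entry along these lines:

    2014/01/20 Widget sale
        Assets:Checking           $97.70
        Expenses:Fees              $2.30
        Income:Sales            $-100.00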

- Zack

Martin Blais

Jan 24, 2014, 9:57:41 AM
to ledge...@googlegroups.com
These would be better done in two separate steps IMHO:

1. extract the data from whichever external source format (e.g. OFX) into an internal transaction data structure
2. "complete" incomplete imported transaction objects by adding missing legs using the past Ledger history

About (1): CSV files are pretty rare. The only ones I've come across (in my own little bubble of a world) are PayPal, OANDA, and Ameritrade. Much more common for banks, investment and credit card companies are OFX and Quicken files. I also find it convenient to recognize at least *some* data from PDF files, such as the date of a statement, for automatic classification and filing into a folder. (You could apply machine learning to this problem, i.e. feed it the jumble of words that comes out of a largely imperfect PDF-to-text conversion and classify which statement it is, but crafting a few regexps by hand has proved to work quite well so far.) I'll add anonymized example input files to Beancount for automated testing at some point; they'll be going here:
https://hg.furius.ca/public/beancount/file/tip/src/python/beancount/sources
I'm thinking.... maybe it would make sense for importers (mine and/or yours) to spit out some sort of XML/JSON format that could be converted into either Ledger or Beancount syntax or whatever else? This way all those importers could be farmed out to another project and reused by users of various accounting software. Does this make sense?

About (2): If Ledger supports inputting incomplete transactions, you could do this without relying on CSV conversion, which would be much more reusable. In Beancount, my importers are allowed to create invalid transaction objects, and I plan to put in a simple little perceptron function that should do a good enough job of adding missing legs automatically (one might call this "automatic categorization"), independently of the input data format.
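
(To illustrate (2) with something even simpler than a perceptron: a plain frequency count over past history captures the idea. A hypothetical Python sketch, all names invented:)

    from collections import Counter, defaultdict

    # For each payee seen in past history, count the accounts it posted to.
    history = defaultdict(Counter)

    def learn(payee, account):
        history[payee][account] += 1

    def complete(payee, amount, source_account):
        """Add the missing leg of a one-sided imported transaction."""
        if history[payee]:
            other = history[payee].most_common(1)[0][0]
        else:
            other = "Expenses:Uncategorized"
        # Return a balanced pair of postings: source leg plus inferred leg.
        return [(source_account, amount), (other, -amount)]

    learn("STARBUCKS", "Expenses:Coffee")
    print(complete("STARBUCKS", -4.50, "Assets:Checking"))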

Just some ideas,

Edwin van Leeuwen

Jan 24, 2014, 10:14:42 AM
to ledge...@googlegroups.com
On Fri, 24 Jan 2014 07:31:40 -0700, Zack Williams <zdw...@gmail.com> wrote:
> Can it take multi-column input, for example, from a vendor where the formula:
>
> materials + tax + shipping = total
>
> applies, or from a payment processor where:
>
> total_billed - fees = total_deposited
>
> and generate a multi-line ledger entry that balances?

At the moment, no. It currently only supports multi-column csv as
produced by some banks, i.e. with separate debit and credit columns,
but with a value in only one of the two on any given row.

It could be interesting to add this possibility, though. I am not sure
of Reckon's ability to work with multi-line ledger entries (as said, I
am not the main/original developer), but it shouldn't be beyond the
realms of possibility to add.

Did you have a specific csv example available? Are the fees already a
negative number in the column?


Edwin van Leeuwen

Jan 24, 2014, 10:29:42 AM
to ledge...@googlegroups.com
On Fri, 24 Jan 2014 09:57:41 -0500, Martin Blais <bl...@furius.ca> wrote:
> These would be better done in two separate steps IMHO:
>
> 1. extract the data from whichever external source format (e.g. OFX) into
> an internal transaction data structure
> 2. "complete" incomplete imported transaction objects by adding missing
> legs using the past Ledger history

I do agree that it could make sense to split these up into two projects
in the future. At the moment, though, Reckon's scope is small enough
that this is not needed yet.

>
> About (1): CSV files are pretty rare. The only ones I've come across (in my
> own little bubble of a world) are PayPal, OANDA, and Ameritrade. Much more
> common for banks, investment and credit card companies is OFX and Quicken
> files. I also find it convenient to recognize at least *some* data from PDF
> files, such as the date of a statement, for automatic classification and
> filing into a folder (you could apply machine learning to this problem,
> i.e. give a whole bunch of disorganized words from what is largely
> imperfect PDF to text conversion, classify which statement it is, but
> crafting a few regexps by hand has proved to work quite well so far). I'll
> add anonymized example input files to Beancount for automated testing at
> some point, they'll be going here:
> https://hg.furius.ca/public/beancount/file/tip/src/python/beancount/sources

In my experience banks always seem to support at least csv, and only
QIF or OFX when you are lucky. I only have experience with personal
banking in the UK and the Netherlands, though.

>
> I'm thinking.... maybe it would make sense for importers (mine and/or
> yours) to spit out some sort of XML/JSON format that could be converted
> into either Ledger or Beancount syntax or whatever else? This way all those
> importers could be farmed out to another project and reused by users of
> various accounting software. Does this make sense?

I would love such a project, but have little use for reading anything
other than csv files myself.

>
> About (2): If Ledger supports inputting incomplete transactions, you could
> do this without relying on CSV conversion, that would be much more
> reusable. In Beancount, my importers are allowed to create invalid
> transaction objects, and I plan to put in a simple little perceptron
> function that should do a good enough job of adding missing legs
> automatically (one might call this "automatic categorization"),
> independently of input data format.

In some ways this is the main part of what Reckon already does, since it fills in the missing
information based on entries in an existing ledger file. The csv
parsing was only added to make it easier to read in the data.

Edwin

johan...@gmail.com

Jan 24, 2014, 3:01:12 PM
to ledge...@googlegroups.com
Thank you for the initiative! 
See my attached CSV example. 

Note about encoding: 
My bank provides Western ISO Latin 1 encoded csv downloads, but I have converted my example to UTF-8 for the purpose of uploading it to this discussion group.
example.csv

johan...@gmail.com

Jan 24, 2014, 3:13:03 PM
to ledge...@googlegroups.com
Note: I squeezed my csv example into a pull request; see https://github.com/cantino/reckon/pull/31
Johannes

Rémi Vanicat

Jan 25, 2014, 5:56:26 AM
to ledge...@googlegroups.com
Edwin van Leeuwen <edwi...@gmail.com>
writes:

> Hi all,
>
> Reckon needs your help :)
>
> Reckon automagically converts CSV files for use with the command-line
> accounting tool Ledger. It also helps you to select the correct
> accounts associated with the CSV data using Bayesian machine learning.
> For more information see:
> http://blog.andrewcantino.com/blog/2010/11/06/command-line-accounting-with-ledger-and-reckon/
>

I've attached mine.

Note that
- my bank does not produce true csv, as the separator is ';'
- this is a French bank, so it uses a comma (',') rather than a dot ('.')
as the decimal separator
- the first line describes in French what each column is. It translates to:
Account;Account Date;Operation Date;Description;Reference;Value Date;Amount
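
(Parsing that shape takes only a few lines; a rough Python sketch, assuming the seven-column layout above and a Latin-1 encoding — adjust to taste:)

    import csv
    from decimal import Decimal

    def parse_french_bank_csv(path):
        with open(path, encoding="latin-1", newline="") as f:
            reader = csv.reader(f, delimiter=";")
            next(reader)  # skip the French header line
            for row in reader:
                account, _, _, description, _, value_date, amount = row
                # Comma is the decimal separator: "12,34" -> Decimal("12.34")
                yield value_date, description, Decimal(amount.replace(",", "."))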

anoynimised.csv

Edwin van Leeuwen

Jan 26, 2014, 9:34:44 AM
to ledge...@googlegroups.com
Thank you very much for your example. I added it to the specs and it
works fine. Currently you do need to specify on the command line that a
comma separates the cents and that ';' is the csv separator.

Cheers, Edwin

Boyd Kelly

Feb 11, 2014, 3:29:13 PM
to ledge...@googlegroups.com, BlackEdder
Please find attached a couple of csv files: one from a broker in Canada, the second an export from the Intuit Mint web app. I much appreciate the work you are doing! If you have any questions, let me know. (Note the date format in the Mint file is US: MM/DD/YYYY.)

Boyd


il_rrsp.csv
mint.csv

Martin Blais

Feb 11, 2014, 3:40:41 PM
to Martin Blais, ledge...@googlegroups.com
Thinking about this more, there's the potential for a nice big project independent of all our Ledger implementations, to deal with external data. Here's the idea, five components of a single project:

- "Fetching": code that can automatically obtain the data by connecting to various data sources. The ledger-autosync attempts to do this using ofxclient for institutions that support OFX. This could include a scraping component for other institutions.

- "Recognition": given a filename and its contents, automatically guess which institution and account it is for. Beancount's import package deals with this by allowing the user to specify a list of regexps that the file must match. I'm not entirely sure this can always be done irrespective of the user, as the account-id is often a required part of a regexp, but it might. This is used to automate "figuring out what to do" given a bunch of downloaded files in a directory, a great convenience.  There is some code in ledger-autosync and the beancount.sources Python package.

- "Extraction": parse the file, CSV or OFX or otherwise, and extract a list of double-entry transactions data structures from it in some sort of generic internal format, independent of Ledger / HLedger / Beancount / other.  The Reckon project aims to do this for CSV files.

- "Export": convert the internal transactions data structure to the syntax of one particular double-entry language implementation, Ledger or other. This spits out text.

- "Filing": given the same files as for the extraction step, figure out which Ledger account they correspond to, automatically sanitize the filenames, extract the date and add it to each filename, and move the files into a directory hierarchy corresponding to each account.
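
A rough Python sketch of what a per-source driver covering recognition, extraction, and filing might look like (all names invented; this is not Beancount's or ledger-autosync's actual API):

    import re

    class SomeBankImporter:
        """One instance per (institution, account) pair; all names invented."""

        account_regexp = r"Account \*\*1234"  # user-supplied, per account

        def identify(self, filename, contents):
            """Recognition: does this downloaded file belong to this importer?"""
            return re.search(self.account_regexp, contents) is not None

        def extract(self, contents):
            """Extraction: return transactions in a generic internal format."""
            raise NotImplementedError

        def file_account(self):
            """Filing: the account whose folder the raw file is archived under."""
            return "Assets:US:SomeBank:Checking"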

Beancount's import code deals with steps 2, 3, 4, 5, but frankly I would much rather that code live in an external project shared with others. I'm thinking about forking it out and starting a new codebase for it.

John Wiegley

Feb 11, 2014, 9:48:59 PM
to ledge...@googlegroups.com
>>>>> Martin Blais <bl...@furius.ca> writes:

> Beancount's import code deals with steps 2, 3, 4, 5, but frankly I would
> much rather that code live in an external project shared with others. I'm
> thinking about forking it out and starting a new codebase for it.

I agree. Ledger has its own "convert" subcommand, which also attempts to
address 2-5, but having that code in C++ makes it harder for others to change.

John

Gabriel Kerneis

Feb 12, 2014, 6:00:31 AM
to ledge...@googlegroups.com
On Tue, Feb 11, 2014 at 03:40:41PM -0500, Martin Blais wrote:
> - "Fetching": code that can automatically obtain the data by connecting to
> various data sources. The ledger-autosync attempts to do this using
> ofxclient for institutions that support OFX. This could include a scraping
> component for other institutions.

http://weboob.org/

The boobank client exports to csv, and it would be easy to add other
export formats. It has many backends, mostly French banks (none of
which support OFX).

Best,
--
Gabriel

Edwin van Leeuwen

Feb 15, 2014, 11:44:22 AM
to Boyd Kelly, ledge...@googlegroups.com
On Tue, 11 Feb 2014 12:29:13 -0800, Boyd Kelly <bke...@coastsystems.net> wrote:
> Please find attached a couple of csv files, one from a Broker in Canada,
> the second is an export of Intuit Mint web app. Much appreciate the work
> you are doing! If any questions, let me know. (Note the date format in
> the mint file is US:MM/DD/YYYY.)
>
> Boyd
>

Hi Boyd,

Thank you for the files. The Intuit Mint example is especially useful,
since it has a format we had not seen before: the column after the
money column indicates whether the amount should be debited or
credited.

Reckon now correctly parses both files.

Cheers,

Edwin


AMaffei

Feb 27, 2014, 11:16:37 AM
to ledge...@googlegroups.com, Martin Blais
Martin,

I really like the idea of a staged system, perhaps with a set of programs and drivers (see below).

I'd be interested in helping with a project along these lines. Unfortunately my programming skills are rusty, but I work with a colleague who might help out. 

My own processing approach is similar to yours. Apologies for the length and level of detail. I have not looked at Reckon in detail yet, so perhaps some of these ideas are already employed in other ways. My comments on each stage (plus one of my own) are below...

--Andy


On Tuesday, February 11, 2014 3:40:41 PM UTC-5, Martin Blais wrote:
Thinking about this more, there's the potential for a nice big project independent of all our Ledger implementations, to deal with external data. Here's the idea, five components of a single project:
- thanks for dissecting things so nicely.
 
- "Fetching": code that can automatically obtain the data by connecting to various data sources. The ledger-autosync attempts to do this using ofxclient for institutions that support OFX. This could include a scraping component for other institutions.
- the output of this stage would be a number of files of different formats -- OFX, a spectrum of CSV file formats, and others.
 
- "Recognition": given a filename and its contents, automatically guess which institution and account it is for. Beancount's import package deals with this by allowing the user to specify a list of regexps that the file must match. I'm not entirely sure this can always be done irrespective of the user, as the account-id is often a required part of a regexp, but it might. This is used to automate "figuring out what to do" given a bunch of downloaded files in a directory, a great convenience.  There is some code in ledger-autosync and the beancount.sources Python package.
- I really like the approach CSV2Ledger takes with its FileMatches.yaml (https://github.com/jwiegley/CSV2Ledger/blob/master/FileMatches.yaml) file. I think defining a spec for FileMatches.yaml that either Perl, Python, or whatever code could employ for the following stages might be worthwhile. FileMatches.yaml (or the equivalent) would provide key information for future processing stages of files from different sources. For CSV files, information about field separators, field names, a regex for "real" records, etc. can be specified here. The result of "Recognition" would be to pass the file off to a customized driver (see my next comment).
 
- "Extraction": parse the file, CSV or OFX or otherwise, and extract a list of double-entry transactions data structures from it in some sort of generic internal format, independent of Ledger / HLedger / Beancount / other.  The Reckon project aims to do this for CSV files.
- I suggest employing small driver programs, written by others, that ingest custom formats. The path to the appropriate driver program would be included in the FileMatches.yaml file (or its equivalent). These drivers would ingest the files output by the "Fetching" stage and generate the "generic internal format" you mention. However, in support of flexibility I suggest that the result of this stage be a CSV file, whose format we strictly specify, that would be processed by the next stage.

- I add an additional stage here I'll call "AccountAssignment". I examine several fields of the imported record (things like employeeID, PONumber, etc. that are associated with the transaction) to determine which DEB account name to assign it to. Account names for all DEB systems should be hierarchical so that could still be done in a DEB-software-agnostic manner. A more sophisticated version CSV2Ledger's PreProcess.yaml (https://github.com/jwiegley/CSV2Ledger/blob/master/PreProcess.yaml) could help drive this stage. The output of this stage is the same CSV as above with a "DEBAccount" field appended to each record.

- "Export": convert the internal transactions data structure to the syntax of one particular double-entry language implementation, Ledger or other. This spits out text.
- I once again like the approach of CSV2Ledger.pl (see the source code at https://github.com/jwiegley/CSV2Ledger/blob/master/CSV2Ledger.pl#L138). It allows the FileMatches.yaml file to include a variable called TxnOutputTemplate that specifies how to set up the ledger-cli transaction in your journal file. A similar templating approach could be used for other double-entry language file formats.
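
To make the templating idea concrete, here is a hypothetical Python equivalent (CSV2Ledger itself does this in Perl, with the template kept in FileMatches.yaml; field names below are invented):

    # A made-up template in the spirit of CSV2Ledger's TxnOutputTemplate.
    TXN_TEMPLATE = ("{date} {payee}\n"
                    "    {target_account}    {amount}\n"
                    "    {source_account}\n")

    record = {
        "date": "2014/02/27",
        "payee": "ACME Office Supplies",
        "target_account": "Expenses:Office",
        "amount": "$42.00",
        "source_account": "Assets:Checking",
    }
    print(TXN_TEMPLATE.format(**record))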

Martin Blais

Feb 27, 2014, 11:45:50 AM
to AMaffei, ledge...@googlegroups.com, Martin Blais
Hi Andy,
This thread has been sitting in my inbox for a while, waiting for me to reply with the following. (I was hoping to get the project ready before sending the link below, but I've been too busy to get it done.)

I'm in the process of forking out my Beancount import code into a new project, which will be called "ledgerhub," and which will be free of Beancount code dependencies, i.e. it should work for Beancount, Ledger, and any other similar implementation.

Here is the design doc for it:

Design Doc for LedgerHub
https://docs.google.com/document/d/11u1sWv7H7Ykbc7ayS4M9V3yKqcuTY7LJ3n1tgnEN2Hk/edit?usp=sharing

Please (anyone) feel free to comment in the margins (right-click -> Comment...).

More comments below.


On Thu, Feb 27, 2014 at 11:16 AM, AMaffei <drum...@mac.com> wrote:
Martin,

I really like the idea of a staged system, perhaps with a set of programs and drivers (see below).

I'd be interested in helping with a project along these lines. Unfortunately my programming skills are rusty, but I work with a colleague who might help out. 

My own processing approach is similar to yours. Apologies for the length and level of detail. I have not looked at Reckon in detail yet, so perhaps some of these ideas are already employed in other ways. My comments on each stage (plus one of my own) are below...

--Andy


On Tuesday, February 11, 2014 3:40:41 PM UTC-5, Martin Blais wrote:
Thinking about this more, there's the potential for a nice big project independent of all our Ledger implementations, to deal with external data. Here's the idea, five components of a single project:
- thanks for dissecting things so nicely.

I've added more detail in the design doc.

 
 
- "Fetching": code that can automatically obtain the data by connecting to various data sources. The ledger-autosync attempts to do this using ofxclient for institutions that support OFX. This could include a scraping component for other institutions.
- the output of this stage would be a number of files of different formats -- OFX, a spectrum of CSV file formats, and others.

Yes.

 
- "Recognition": given a filename and its contents, automatically guess which institution and account it is for. Beancount's import package deals with this by allowing the user to specify a list of regexps that the file must match. I'm not entirely sure this can always be done irrespective of the user, as the account-id is often a required part of a regexp, but it might. This is used to automate "figuring out what to do" given a bunch of downloaded files in a directory, a great convenience.  There is some code in ledger-autosync and the beancount.sources Python package.
- I really like the approach CSV2Ledger takes with its FileMatches.yaml (https://github.com/jwiegley/CSV2Ledger/blob/master/FileMatches.yaml) file. I think defining a spec for FileMatches.yaml that either Perl, Python, or whatever code could employ for the following stages might be worthwhile. FileMatches.yaml (or the equivalent) would provide key information for future processing stages of files from different sources. For CSV files, information about field separators, field names, a regex for "real" records, etc. can be specified here. The result of "Recognition" would be to pass the file off to a customized driver (see my next comment).

My approach is similar to this, with the regexps; see the example code bits in the Identification section of my document, or the example importer file in my source code:

One could imagine creating instances of a more generic "CSV importer" that could take as configuration which field maps to what. In my experience, each source has peculiarities beyond this and requires custom code, so that's the approach I've taken so far, but nothing would prevent the inclusion of such an importer in the system I propose.



- "Extraction": parse the file, CSV or OFX or otherwise, and extract a list of double-entry transactions data structures from it in some sort of generic internal format, independent of Ledger / HLedger / Beancount / other.  The Reckon project aims to do this for CSV files.
- I suggest employing small driver programs, written by others, that ingest custom formats. The path to the appropriate driver program would be included in the FileMatches.yaml file (or its equivalent). These drivers would ingest the files output by the "Fetching" stage and generate the "generic internal format" you mention. However, in support of flexibility I suggest that the result of this stage be a CSV file, whose format we strictly specify, that would be processed by the next stage.

Ha... why do you like CSV for an internal data format? I was thinking that this data structure wouldn't even go to a file; it could just be some Python tuples/namedtuples. We could indeed define an intermediate format, but I can't really see when that would be needed. Ideally the user wouldn't need to make any edits at this stage, so I don't see a need to output it to a file.
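
(Something as small as this, say — field names invented:)

    from collections import namedtuple

    # A minimal in-memory interchange shape; no file format needed.
    Posting = namedtuple("Posting", "account amount currency")
    Transaction = namedtuple("Transaction", "date payee postings")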

About programs: There are really only two separate programs/steps I have so far: 1. "import", which generates text that I append to my ledger file; 2. "file", which moves the files into a directory hierarchy. Both of these programs put together many of the steps described in the document and I haven't found a need to separate them further so far, except for debugging. Do you need them all separated? That could be done.



- I add an additional stage here I'll call "AccountAssignment". I examine several fields of the imported record (things like employeeID, PONumber, etc. that are associated with the transaction) to determine which DEB account name to assign it to. Account names for all DEB systems should be hierarchical so that could still be done in a DEB-software-agnostic manner. A more sophisticated version CSV2Ledger's PreProcess.yaml (https://github.com/jwiegley/CSV2Ledger/blob/master/PreProcess.yaml) could help drive this stage. The output of this stage is the same CSV as above with a "DEBAccount" field appended to each record.

I do the same thing :-)  The way I weave this into my importers is that each importer in the configuration file defines a dictionary of required configuration variables, almost all of which are account names. When the importer creates the normalized transaction objects, it uses the account names from its configuration.


 

- "Export": convert the internal transactions data structure to the syntax of one particular double-entry language implementation, Ledger or other. This spits out text.
- I once again like the approach of CSV2Ledger.pl (see the source code at https://github.com/jwiegley/CSV2Ledger/blob/master/CSV2Ledger.pl#L138). It allows the FileMatches.yaml file to include a variable called TxnOutputTemplate that specifies how to set up the ledger-cli transaction in your journal file. A similar templating approach could be used for other double-entry language file formats.

That's an interesting idea. That will be done; it would be flexible, especially for Ledger output.

For Beancount target output, the codebase has functions to convert transactions back into its current text input format (and those can change); I had planned to use those. The new Beancount input syntax is a lot less flexible, so it's not as necessary to provide options for it. I'll add this to the design doc.

Thanks for your comments.
Please leave more on the doc.

I'll try to fork out my current import code as soon as possible so others can contribute.


Martin Blais

Feb 27, 2014, 11:48:31 AM
to Martin Blais, AMaffei, ledge...@googlegroups.com
On Thu, Feb 27, 2014 at 11:45 AM, Martin Blais <bl...@furius.ca> wrote:

Here is the design doc for it:

Design Doc for LedgerHub

Please (anyone) feel free to comment in the margins (right-click -> Comment...).

One more thing: if you sign in to your Google account your name will show up in your comments. (Otherwise your comments will show up as "Anonymous Hedgehog" or somesuch.)


AMaffei

Feb 27, 2014, 2:00:20 PM
to ledge...@googlegroups.com, AMaffei, Martin Blais
Thanks Martin.

One thing I'll comment on here is my preference for a CSV file (instead of an internal data structure) as the output of the "Extraction" phase. My intent is to make the system more scalable. Edwin is currently collecting lots of different CSV files generated from many different sources and incorporating their translation into Reckon. His efforts can only scale so far. I'm constantly amazed at the formats and content of the CSV exports I run into.

A company that generates a custom CSV, or a third party, might someday provide a service and/or code (in whatever language they prefer) to translate their custom CSV format into a LedgerHub-compatible CSV that would be ingested into the later stages of LedgerHub.

I'll see if I can come up with a draft spec for such a CSV after I read your Google Doc and comment on it. How nice to see that! Thanks.

-- Andy

Edwin van Leeuwen

Mar 3, 2014, 10:35:47 AM
to ledge...@googlegroups.com
Hi all,

Just a quick update. I have since added a number of csv files and
tests to reckon. In the end not that many changes were needed for
reckon to parse them correctly. Thank you all for the help.

As far as I can tell, the money columns in csv files are structured in
one of 4 different ways (sample rows below):
1. One money column containing all the additions to and subtractions
from the account
2. Two columns, one with the subtractions, the other with the additions
3. Like 2, but both columns only contain positive values; the header
indicates whether the column should be used as a debit or credit
column
4. (The most annoying one) One money column with only positive
values, with another column (normally directly in front of or behind
the money column) indicating whether the amount was credited or
debited from the account.
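
In made-up sample rows, the four shapes look something like this:

    1. date,description,amount
       03/02/2014,COFFEE SHOP,-4.50
    2. date,description,out,in
       03/02/2014,COFFEE SHOP,-4.50,
    3. date,description,debit,credit
       03/02/2014,COFFEE SHOP,4.50,
    4. date,description,amount,type
       03/02/2014,COFFEE SHOP,4.50,DR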

Sometimes the csv file also contains a column that has the balance of
the account, which can make detecting the correct money column
slightly more difficult.

The date importer was already able to recognise all date formats, but
thanks to Steve Purcell it has been improved to automatically detect
inconclusive US/non-US date conversions (reckon will now error out in
the unlikely case that none of the dates are unambiguous).
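
(The trick is simply that a whole column of dates disambiguates itself as soon as one value parses only one way; a rough Python sketch of the idea, not reckon's actual Ruby code:)

    from datetime import datetime

    def parses(value, fmt):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            return False

    def detect_date_format(dates):
        """Return the single format all dates fit, or raise if ambiguous."""
        fits = [fmt for fmt in ("%m/%d/%Y", "%d/%m/%Y")
                if all(parses(d, fmt) for d in dates)]
        if len(fits) != 1:
            raise ValueError("could not determine the date order unambiguously")
        return fits[0]

    print(detect_date_format(["13/01/2014", "05/01/2014"]))  # -> %d/%m/%Y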


Just a thought on the internal format of a library: I would be tempted
to use OFX as the internal format and convert from there to
ledger/beancount format. This is because OFX is a well-defined format,
so it should hold any kind of financial data without problems. It would
also make it easier for other tools to adopt, because they might
already have an OFX import function.

Kind regards,

Edwin

Martin Blais

Mar 3, 2014, 11:17:35 AM
to ledge...@googlegroups.com
On Mon, Mar 3, 2014 at 10:35 AM, Edwin van Leeuwen <edwi...@gmail.com> wrote:
Just a thought on the internal format of a library: I would be tempted
to use OFX as the internal format and convert from there to
ledger/beancount format. This is because OFX is a well-defined format,
so it should hold any kind of financial data without problems. It would
also make it easier for other tools to adopt, because they might
already have an OFX import function.

I don't think it's a good idea. OFX is really messy, and appears to allow for a lot more interpretation than one would like. Very soon I'm going to check in examples of OFX statements from various institutions to ledgerhub, and you can have a look for yourself. (I'm trying to extract the code from beancount at the moment, but I want to do this right, so it'll take another week or two for the codebase to come up.)

Simon Michael

Mar 3, 2014, 6:26:59 PM
to ledge...@googlegroups.com
On 2/27/14 8:45 AM, Martin Blais wrote:
> I'm in the process of forking out my Beancount import code into a new
> project, which will be called "ledgerhub," and which will be free of
> Beancount code dependencies, i.e. it should work for it, Ledger and any
> other similar implementations.
>
> Here is the design doc for it:
>
> Design Doc for LedgerHub
> https://docs.google.com/document/d/11u1sWv7H7Ykbc7ayS4M9V3yKqcuTY7LJ3n1tgnEN2Hk/edit?usp=sharing
>
> Please (anyone) feel free to comment in the margins (right-click ->
> Comment...).

Hi Martin, that's a nice initiative, thanks for the well-written doc.

Is "ledgerhub" the right name? No matter. I too think there should be a
converter for financial formats that is as good as Pandoc is for
document markups. hledger has some aspirations and structure in place
to support this. I added a few minor comments to the doc.

You may get there faster with Python, though.

I'm not sure speed is as unimportant as you say. If conversions are done
rarely, perhaps so.

Best - Simon

Harshad RJ

Mar 3, 2014, 9:19:59 PM
to ledger-cli, Martin Blais
Martin et al,

Nice initiative. A suggestion:

On Wed, Feb 12, 2014 at 2:10 AM, Martin Blais <bl...@furius.ca> wrote:
- "Fetching": code that can automatically obtain the data by connecting to various data sources. The ledger-autosync attempts to do this using ofxclient for institutions that support OFX. This could include a scraping component for other institutions.

- "Recognition": given a filename and its contents, automatically guess which institution and account it is for.

The "fetching" module already has information about where the data was downloaded from. Wouldn't it be better to retain this meta-data somewhere, to help with the "recognition" / "identification" step?

Recognition from a filename or contents alone seems flaky to me.


Martin Blais

Mar 3, 2014, 10:36:55 PM
to Harshad RJ, ledger-cli, Martin Blais
On Mon, Mar 3, 2014 at 9:19 PM, Harshad RJ <harsh...@gmail.com> wrote:
Martin et al,

Nice initiative. A suggestion:

On Wed, Feb 12, 2014 at 2:10 AM, Martin Blais <bl...@furius.ca> wrote:
- "Fetching": code that can automatically obtain the data by connecting to various data sources. The ledger-autosync attempts to do this using ofxclient for institutions that support OFX. This could include a scraping component for other institutions.

- "Recognition": given a filename and its contents, automatically guess which institution and account it is for.

The "fetching" module already has information about where the data was downloaded from. Wouldn't it be better to retain this meta-data somewhere, to help with the "recognition" / "identification" step?

Hmmm that's an interesting idea.
But I'm not convinced we can effectively implement fetching reliably yet.
I like to be able to recognize just from files stashed in ~/Downloads


Recognition from a filename or contents alone seems flaky to me.

Actually, I've been using this alone for a few years and it has worked great so far, not flaky at all. Of course you have to come up with the regexps, but within the realm of a single user's files that's quite easy to do. Based on my experience, I see this step as a very reliable one.


johan...@gmail.com

Mar 4, 2014, 3:17:57 AM
to ledge...@googlegroups.com, Harshad RJ, Martin Blais

The "fetching" module already has information about where the data was downloaded from. Wouldn't it be better to retain this meta-data somewhere, to help with the "recognition" / "identification" step?

Hmmm that's an interesting idea.
But I'm not convinced we can effectively implement fetching reliably yet.
I like to be able to recognize just from files stashed in ~/Downloads

Some browsers and operating systems keep this metadata, and (platform-specific versions of) ledgerhub could make use of it. E.g., Safari and Chrome on Mac OS X save the download's URL as metadata; compare http://code.google.com/p/understand/wiki/MacOSMetadata or http://apple.stackexchange.com/questions/110239/where-is-the-where-from-meta-data-stored-when-downloaded-via-chrome
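
(On OS X that metadata lives in the com.apple.metadata:kMDItemWhereFroms extended attribute and can be read via Spotlight's mdls; a small Python sketch, with an invented example URL:)

    import subprocess

    def download_urls(path):
        """Read the OS X 'Where from' metadata of a downloaded file."""
        out = subprocess.check_output(
            ["mdls", "-name", "kMDItemWhereFroms", path]).decode("utf-8")
        # mdls prints e.g.:
        #   kMDItemWhereFroms = (
        #       "https://bank.example.com/statements/jan.csv"
        #   )
        return [line.strip().strip('",') for line in out.splitlines()
                if line.strip().startswith('"')]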

Peter Gallagher

Mar 4, 2014, 3:22:47 AM
to ledge...@googlegroups.com
Here's a CSV from the Bank of Melbourne (Australia). They unhelpfully concatenate two fields (transaction type and transaction comment) in the second field of the CSV.

Thank you for Reckon.

Best,

Peter
BankOfMelbourneSample.csv