I keep my Ledger data under version control using Bazaar. I know many
others prefer Git, but any kind of version control provides important
features, including:
- Change tracking
- Backup/Restore points
- Branching & Merging
- Test vs Prod
- (Potential) Multiuser Access
I keep all my Ledger files, the ledger binary, scanned receipts, and
all configuration under VC. My repo now totals nearly 2GB. Many of my
scripts are also integrated with VC, automatically adding newly
created files or generating a balance statement on each commit.
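As a hedged sketch of that integration (the function name, commit
message, and exact bzr invocation are my illustration here, not lifted
from my actual scripts):

```shell
# Sketch of VC glue: a commit wrapper that registers any newly created
# files before committing. Falls back to a message when run outside a
# Bazaar branch, which also keeps the sketch self-demonstrating.
cd "$(mktemp -d)"                 # demo dir, guaranteed not a branch

vc_commit() {
    msg="$1"
    if command -v bzr >/dev/null 2>&1 && bzr root >/dev/null 2>&1; then
        bzr add --quiet .         # recursive; already-known files are skipped
        bzr commit --quiet -m "$msg"
    else
        echo "not in a bzr branch: would add & commit '$msg'"
    fi
}

vc_commit "import: new receipts and txns"
```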
Unlike a central database with views into the data, I've split out my
data into many files where it seemed to make sense to keep related
txns together. This helps simplify editing.
At a high level, the .ledger file in the repo root imports every other
DAT file from multiple subdirectories when reporting with Ledger --
approximately 100 files and growing. The files are organized around my
workflow, with "glue" scripts helping maintain order.
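Regenerating that include list is simple enough to sketch; this is a
minimal illustration against a throwaway tree (my real script also
writes the precision and mileage settings into the header):

```shell
# Sketch: rebuild the top-level .ledger with an !include line for every
# DAT file in the tree, so "ledger -f .ledger" sees everything.
demo=$(mktemp -d)                 # throwaway tree for illustration
mkdir -p "$demo/ER" "$demo/Data"
touch "$demo/ER/AISER0123.dat" "$demo/Data/2011.dat"

{
  echo "; Auto-generated -- do not edit by hand"
  ( cd "$demo" && find . -name '*.dat' | sort | sed 's|^|!include |' )
} > "$demo/.ledger"

cat "$demo/.ledger"
```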
I import data into a transitory queue file, which I edit to assign
metadata, and then use scripts to relocate txns to their permanent
location. The queue is then cleared of any txns that haven't yet been
permanently filed, and the process repeats.
Because I'm importing with deduplication, removing txns that aren't
relevant to the task at hand does not cause data loss. They will be in
the queue again after the next bulk import.
An example of my typical workflow would be:
- Download latest CSV from bank, file into Archive
- Clear the Queue
- Import all CSVs to Queue
- Relies heavily on deduplication
- Edit queue and assign metadata to txns
- ER & Project & Category
- Verify current ER, editing as needed
- Business logic (ie: spending limits)
- Accurate file links
- Accurate metadata & accounts
- Uses several scripts to verify different items
- File finished txns to permanent storage
- Generate & email PDF
- Commit ER to VC
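The "file finished txns" step above can be sketched with awk's
paragraph mode. The tag syntax and file names here are illustrative,
and the real RefileER.sh also removes the filed txns from Queue.dat;
this shows only the extraction.

```shell
# Sketch: pull every transaction block tagged with a given ER number
# out of a queue file and append it to the ER's permanent DAT file.
demo=$(mktemp -d)
cat > "$demo/Queue.dat" <<'EOF'
2011-07-01 Coffee Shop
    ; ER: AISER0123
    Expenses:Meals        4.50
    Liabilities:Visa

2011-07-02 Hardware Store
    Expenses:Misc        12.00
    Liabilities:Visa
EOF

# Transactions are blank-line separated; keep blocks containing the tag.
awk -v RS='' -v tag='ER: AISER0123' \
    'index($0, tag) { print $0 "\n" }' \
    "$demo/Queue.dat" >> "$demo/AISER0123.dat"

grep -c '^2011' "$demo/AISER0123.dat"   # one txn was filed
```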
My repo includes:
- Repo root
- ledger binary
- Ensures that the compiled version is in the repo with the data
- .ledger
- Uses !include on every DAT file in every directory,
automatically generated from script
- Sets precision & mileage rates
- Queue.dat
- All txns are imported here via script, and once an ER# has been
assigned txns are "filed" to permanent DAT files in ER/
- CSV2Ledger Configuration Files
- Account matching and translation
- Txn transformations
- .MD5Sum.cache
- Used by CSV2Ledger to cache all md5's, updated by a separate script
- .env
- Provides shell functions for business logic reporting
- .startcommit
- Triggers automatic reports on commit
- Generates:
- ERStatus.txt
- Outstanding amounts on each ER
- LedgerBalance.txt
- ledger bal output
- Commits FAIL if ledger files are not valid and reports could
not run
- Projects.txt
- Human readable lookup table of project codes to customers
- Archive/
- Permanent CSV data storage, downloaded from the bank.
- Txns in files may overlap, importing must dedupe.
- Data/
- DAT files for "misc" txns with one file per year
- Opening balances in oldest file
- Often for manual txns
- ER/
- Permanent txn storage
- One DAT file for EACH expense report which includes most txns in
that report (ie: ER/AISER0123.dat)
- Some txns span multiple ER's, and are typically stored with the
first posting's ER.
- PDF/
- Finished PDF expense reports with receipt images
- Receipts/
- Each project code gets a subdirectory, storing jpeg images of
scanned receipts, one page per file
- 2,300 Receipts and growing.
- bin/
- Misc scripts, including:
- CSV2Ledger.pl
- More on the automation later; keeping it in the repo ensures I
always have a working production copy
- ClearQueue.sh
- Clears all txns with an md5sum from Queue.dat, recreates
md5sum cache
- ERStatus.sh
- Generate outstanding balance report by ER
- GenerateLatexExpeneseReport.pl
- Creates Latex expense reports from Ledger and compiles to PDF
- LoadAllCSV.sh
- Loads all archived CSV files; deduplication is built in.
- RefileER.sh
- Move txns for specified ER # from Queue.dat to permanent file
in ER/
- RegenMD5.sh
- Recreate md5sum cache, called by other scripts after changes
- RenameVisaCSV.sh
- Rename credit card CSV's from the bank for archival
- UnmatchedImages.sh
- Find receipt images NOT linked to a txn (ie: compare
filesystem to txns)
- VerifyImages.sh
- Check filenames in metadata against filesystem (ie: missing images)
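To make the commit-time checks concrete, here is a minimal sketch of
what .startcommit amounts to; the Bazaar hook wiring itself is
omitted, and the guard clauses are my additions for illustration.

```shell
# Sketch: regenerate a report at commit time and fail the commit when
# the ledger files don't validate.
precommit_reports() {
    [ -f .ledger ] || { echo "no .ledger here; nothing to validate"; return 0; }
    if ! command -v ledger >/dev/null 2>&1; then
        echo "ledger binary not found; skipping"; return 0
    fi
    # If ledger can't parse the tree, the report fails and so does the commit.
    ledger -f .ledger bal > LedgerBalance.txt || {
        echo "ledger files invalid -- aborting commit" >&2
        return 1
    }
}

cd "$(mktemp -d)"   # fresh dir for the demo: no .ledger, so it just says so
precommit_reports
```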
Eyes crossed yet?
I had considered a SQL database to minimize the number of files
involved and to enable a better workflow based on views into specific
txns, instead of static placement in diverse text files.
------------------------------------------------------------------
Russell Adams RLA...@AdamsInfoServ.com
PGP Key ID: 0x1160DCB3 http://www.adamsinfoserv.com/
Fingerprint: 1723 D8CA 4280 1EC9 557F 66E8 1154 E018 1160 DCB3