If you remove data by splitting your input into multiple files, you lose the ability to report on arbitrary periods, e.g. any period that straddles the split date.
But since splitting might still be needed, here is an idea to fix this, in the form of two features:
1. Given a date, automatically generate a closing and an opening transaction that balance each other out; link them together with a unique tag; insert them into your files to split them up. You might even automate the splitting itself: imagine a script that reads one file and produces two, keeping every non-matching line in order and dispatching each transaction to its respective file.
2. Add a feature (in my case a plugin) that replaces all the transactions matching a particular tag with a single summarizing transaction; if the balances all net to zero, insert nothing. This should make the closing and opening transactions annul each other.
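Feature 1's splitting script could be sketched roughly like this. This is only an illustration under a simplifying assumption: each transaction starts with an unindented `YYYY-MM-DD` line, and its postings are the indented lines that follow (the function name `split_journal` and the sample data are hypothetical, not from any real tool).

```python
import re
from datetime import date

# Matches an unindented date at the start of a transaction line.
DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")

def split_journal(lines, cutoff):
    """Dispatch each transaction block to a 'before' or 'after' file,
    keeping every line in its original order within each output."""
    before, after = [], []
    current = before  # any header lines before the first transaction stay in 'before'
    for line in lines:
        m = DATE_RE.match(line)
        if m:
            txn_date = date(*map(int, m.groups()))
            current = before if txn_date < cutoff else after
        current.append(line)  # indented posting lines follow their transaction
    return before, after

lines = [
    "2022-12-30 groceries\n", "  expenses:food  10\n", "  assets:cash\n",
    "2023-01-02 rent\n", "  expenses:rent  500\n", "  assets:bank\n",
]
before, after = split_journal(lines, date(2023, 1, 1))
```

Writing `before` and `after` out to two files then gives the split the feature describes, with the closing/opening pair appended to the respective ends.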
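Feature 2 could look something like the following sketch. It uses a toy in-memory transaction representation (dicts with `date`, `tags`, and `postings` keys, all hypothetical) rather than any real plugin API: transactions carrying the tag are collapsed into their net postings, and if everything nets to zero, as a matched closing/opening pair should, nothing is emitted.

```python
from collections import defaultdict

def summarize_tagged(transactions, tag):
    """Replace all transactions carrying `tag` with one summarizing
    transaction, or with nothing if their postings net to zero."""
    kept, totals, last_date = [], defaultdict(float), None
    for txn in transactions:
        if tag in txn["tags"]:
            last_date = txn["date"]
            for account, amount in txn["postings"]:
                totals[account] += amount
        else:
            kept.append(txn)
    # Only emit a summary when something is left over; a clean
    # closing/opening pair cancels out entirely.
    if any(abs(v) > 1e-9 for v in totals.values()):
        kept.append({
            "date": last_date,
            "tags": {tag},
            "postings": [(a, v) for a, v in totals.items() if abs(v) > 1e-9],
        })
    return kept

txns = [
    {"date": "2022-12-31", "tags": {"split-2023"},
     "postings": [("assets:bank", -100.0), ("equity:transfer", 100.0)]},
    {"date": "2023-01-01", "tags": {"split-2023"},
     "postings": [("assets:bank", 100.0), ("equity:transfer", -100.0)]},
    {"date": "2023-01-05", "tags": set(),
     "postings": [("expenses:food", 10.0), ("assets:cash", -10.0)]},
]
result = summarize_tagged(txns, "split-2023")
```

Here the paired closing and opening transactions cancel, so `result` contains only the ordinary transaction, which is exactly the "annul each other" behavior the feature asks for.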
If you support providing multiple input files, one should be able to supply, e.g., the last three years and obtain an uninterrupted stream of transactions, allowing reporting over any subperiod within that range without seeing the opening/closing transactions.
One question: what part is slow, and why? It surprises me that my slow, naive Python code is fast enough to process the whole thing every time, while your swanky single-pass Haskell implementation is slower and requires splitting. I'm curious to understand why.