I have a CSV importer class that is instantiated 4 times so I can use it as an importer for 4 different accounts.
I was expecting the cache.FileMemo object to reduce the number of times a CSV file is parsed when running bean-extract on it.
In fact, I was expecting my CSV parser to be run only once since I wrapped it in a method of my importer class like this:
    def parse(self, file):
        return file.convert(parse)
where the inner parse is the actual parser function:
    def parse(filename):
        ...  # do all the work here
and the rest of my importer class only calls self.parse(file) to get to the results of the parser.
Unfortunately, looking at the bean-extract code in beancount/ingest/extract.py, I see that extract_from_file() constructs a new cache.FileMemo object for each combination of importer and filename and passes it to my importer. Since the cache is stored as an attribute of the FileMemo object (the 'file' variable), it cannot be shared across importers.
On the other hand, the code in beancount.ingest.identify.find_imports does share the FileMemo object across all the importers when calling their identify() method.
Since I need the parsed contents of my CSV files in both the identify() and extract() methods of my importers, I end up with each CSV file being loaded and parsed twice.
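
To make the difference concrete, here is a minimal sketch of the two situations. FileMemoSketch is a hypothetical stand-in written for this example, not the real cache.FileMemo; it only assumes that convert() caches results per object, as described above. The account names and the run_shared / run_per_importer helpers are also made up for illustration.

```python
class FileMemoSketch:
    """Hypothetical stand-in for cache.FileMemo (assumption: per-object cache)."""

    def __init__(self, name):
        self.name = name
        self._cache = {}  # converter function -> cached result

    def convert(self, converter_func):
        # Run the converter only the first time it is requested on this object.
        if converter_func not in self._cache:
            self._cache[converter_func] = converter_func(self.name)
        return self._cache[converter_func]


calls = []  # records every real run of the parser


def parse(filename):
    calls.append(filename)
    return ["row from " + filename]  # placeholder for the real parsing work


class Importer:
    def __init__(self, account):
        self.account = account

    def parse(self, file):
        # Same wrapper as in my importer: delegate to the memo object.
        return file.convert(parse)


# Four instances of the same importer class, one per (made-up) account.
importers = [Importer("Assets:Account%d" % i) for i in range(1, 5)]


def run_shared(filename):
    # One memo object shared by all importers (the identify-style situation).
    file = FileMemoSketch(filename)
    for importer in importers:
        importer.parse(file)


def run_per_importer(filename):
    # A fresh memo object per importer (the extract-style situation).
    for importer in importers:
        file = FileMemoSketch(filename)
        importer.parse(file)
```

With the shared object the parser runs once for the file; with one object per importer it runs once per importer, because each new object starts with an empty cache.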
Martin, are you aware of this?
If this is the way it is going to be, please let me know so I can use a lighter-weight parsing step for the identify() method and a beefier one for the extract() method.
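
Another option would be to memoize at module level, keyed on the filename, so the result no longer depends on which memo object is passed in. The following is only a sketch under assumptions: parse_once and CachedImporter are hypothetical names, the file object is assumed to expose the filename as file.name, and SimpleNamespace stands in for the real object. Note that functools.lru_cache will not notice if the file changes on disk during the run, and the cached result is shared, so it should be treated as read-only.

```python
import functools
from types import SimpleNamespace

parse_calls = []  # records every real run of the parser


@functools.lru_cache(maxsize=None)
def parse_once(filename):
    """Parse a file at most once per process, keyed on its name."""
    parse_calls.append(filename)
    # Placeholder for the real parsing work; a tuple keeps the shared
    # cached result immutable.
    return (("parsed", filename),)


class CachedImporter:
    def __init__(self, account):
        self.account = account

    def parse(self, file):
        # Assumption: the object passed to the importer has a .name attribute.
        return parse_once(file.name)

    def identify(self, file):
        return len(self.parse(file)) > 0

    def extract(self, file):
        return list(self.parse(file))


importer = CachedImporter("Assets:Example")  # made-up account name

# Two distinct file objects for the same path, as happens between the
# identify and extract steps.
identify_file = SimpleNamespace(name="/tmp/example.csv")
extract_file = SimpleNamespace(name="/tmp/example.csv")

importer.identify(identify_file)
importer.extract(extract_file)
```

Here the parser runs once even though identify() and extract() receive different file objects.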
Thank you