I am seeking advice on how to go about exporting liam2 entities to csv format at the end of each iteration. Exporting all variables for all entities would be fine. In the following paragraphs I have set out some background and explained how far I have got. I would be very grateful for some guidance on this!
We are looking at liam2 performance on models of the scale and complexity we ordinarily use. The simulation code runs very fast, but we have hit issues with outputting the datasets produced to a format SAS can read - SAS being our product of choice for detailed and ad hoc analyses of microsimulation model outputs. Howard Redway has posted on this in liam2-users.
SAS appears unable to read hdf5 files directly, so we are using csv as an intermediate format.
The current liam2 functionality for outputting individual-level records to csv files is accessed by using csv(dump()). The time taken for a 99-year run of our test model to output the data this way is around 1 hour. SAS only takes a few minutes to create comparably-sized csv files, and to read such files in. We would like to explore whether there are faster ways to output to csv.
I have tested the csv routine on its own, on a simple array of comparable size to that we are using. Python can export this to csv very quickly using the same csv writer as liam2. I find this puzzling - the only explanation I can come up with is that it is due to the parsing that goes on within liam2 - but in that case I imagine the simulation itself ought to be similarly affected, yet it runs very fast. So I am rather stumped.
I have also looked at using pandas HDFStore routines to translate the hdf5 files to csv format at the end of the simulation. However for a 99 year run this appears to hit the memory limit on our system [20gb ram] so that it takes about 30 minutes to translate the data, compared to around 6 minutes for a 50-year run.
Whilst we can look at expanding the memory further, I would also like to understand whether there is any direct way to access the data structures within liam2 during a run, to save them to csv format at each iteration. I do not mean to re-invent the dump() and csv() routines, but I am instead wondering whether there is something simple we can if we simply want to export all variables for all cases [and whether it might be faster to do so]?
Where I have got to with tracking down what to do is this:
- in the simulation.py module, entity.store_period_data is what calls the process for saving to the hdf5 file
- store_period_data is in the entities.py module and it's the term self.array.append_to_table(self.table) that is [presumably] moving the current period data to the hdf5 table
- append_to_table is in data.py but just seems to be an alternative name for append_carray_to_table
- append_carray_to_table seems to chop the data into chunks [based on the buffers] and append those to the table for storage
But I cannot see an appropriate point where I could add some code to output the array to csv - nor indeed how to refer to the array.