Re: [liam2-dev] Writing outputs to csv - quickly

Skip to first unread message

Gaëtan de Menten

Nov 11, 2014, 12:15:16 PM11/11/14
I am not at work this week, so I will answer your questions more precisely next week.

If it helps, the data is stored in "numpy" arrays (which are contiguous in-memory arrays). So if you can get fast .csv out of that, integrating that functionality into LIAM2 should be easy.

Hope it helps,

-----Original message-----
From: Joe Bosak
Date: Fri, 7 Nov 2014 18:39:09 +0100
Subject: [liam2-dev] Writing outputs to csv - quickly

> Hi there
> I am seeking advice on how to go about exporting liam2 entities to csv
> format at the end of each iteration. Exporting all variables for all
> entities would be fine. In the following paragraphs I have set out some
> background and explained how far I have got. I would be very grateful for
> some guidance on this!
> Background
> =========
> We are looking at liam2 performance on models of the scale and complexity
> we ordinarily use. The simulation code runs very fast, but we have hit
> issues with outputting the datasets produced to a format SAS can read - SAS
> being our product of choice for detailed and ad hoc analyses of
> microsimulation model outputs. Howard Redway has posted on this in
> liam2-users.
> SAS appears unable to read hdf5 files directly, so we are using csv as an
> intermediate format.
> The current liam2 functionality for outputting individual-level records to
> csv files is accessed by using csv(dump()). The time taken for a 99-year
> run of our test model to output the data this way is around 1 hour. SAS
> only takes a few minutes to create comparably-sized csv files, and to read
> such files in. We would like to explore whether there are faster ways to
> output to csv.
> Attempts made
> ============
> I have tested the csv routine on its own, on a simple array of comparable
> size to that we are using. Python can export this to csv very quickly
> using the same csv writer as liam2. I find this puzzling - the only
> explanation I can come up with is that it is due to the parsing that goes
> on within liam2 - but in that case I imagine the simulation itself ought to
> be similarly affected, yet it runs very fast. So I am rather stumped.
> I have also looked at using pandas HDFStore routines to translate the hdf5
> files to csv format at the end of the simulation. However for a 99 year
> run this appears to hit the memory limit on our system [20gb ram] so that
> it takes about 30 minutes to translate the data, compared to around 6
> minutes for a 50-year run.
> Whilst we can look at expanding the memory further, I would also like to
> understand whether there is any direct way to access the data structures
> within liam2 during a run, to save them to csv format at each iteration. I
> do not mean to re-invent the dump() and csv() routines, but I am instead
> wondering whether there is something simple we can if we simply want to
> export all variables for all cases [and whether it might be faster to do
> so]?
> Where I have got to with tracking down what to do is this:
> - in the module, entity.store_period_data is what calls the
> process for saving to the hdf5 file
> - store_period_data is in the module and it's the term
> self.array.append_to_table(self.table) that is [presumably] moving the
> current period data to the hdf5 table
> - append_to_table is in but just seems to be an alternative name
> for append_carray_to_table
> - append_carray_to_table seems to chop the data into chunks [based on the
> buffers] and append those to the table for storage
> But I cannot see an appropriate point where I could add some code to output
> the array to csv - nor indeed how to refer to the array.
> Regards
> Joe
> --
> You received this message because you are subscribed to the Google Groups "liam2-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> To post to this group, send an email to
> Visit this group at

Reply all
Reply to author
0 new messages