oq export


era

Jul 4, 2017, 1:08:37 AM
to OpenQuake Users, Elizabeth Abbott
Hi team,

Just a question. I need to export all outputs of a full path enumeration of a logic tree using classical PSHA in OQ v2.5. I've been looking at OQ's help information around this (in the engine and in the manual), and it looks like the way to do it is the command "oq export uhs/all", or whichever hazard output type I want. This is all fine, since I figure I can just use a pipe to run the export automatically after the hazard run, but it seems to provide the output only in csv, even if I add "-e xml" or something like that to try to force it to export in xml.
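
To show roughly what I mean by chaining the two steps, here is a sketch; the job file name is a placeholder and the "-e xml" at the end is the part that does not seem to take effect:

import subprocess

# run the hazard calculation, then try to export all UHS realizations as xml
# ("job.ini" is a placeholder for our job file)
subprocess.run(["oq", "engine", "--run", "job.ini"], check=True)
subprocess.run(["oq", "export", "uhs/all", "-e", "xml"], check=True)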

Is it at all possible to have the program export all outputs of a full path enumeration as xml?

Thanks!

Best,
Elizabeth

Michele Simionato

Jul 4, 2017, 3:35:10 AM
to OpenQuake Users, e.ab...@gns.cri.nz
You discovered a bug which will be fixed shortly. Here is the fix: https://github.com/gem/oq-engine/pull/2946.
Anyway, exporting all realizations in XML is not really efficient; if you have large models you have other options, like exporting in HDF5 format.
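
To give an idea, once you have an exported HDF5 file, inspecting its contents takes only a few lines; here is a sketch with h5py, where the file name is just an example:

import h5py

# list the datasets inside an exported HDF5 file
with h5py.File("hazard_curves.hdf5", "r") as f:
    f.visititems(lambda name, obj: print(name, obj))
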
As usual, thanks for the feedback,

         Michele

era

Jul 6, 2017, 6:35:27 PM
to OpenQuake Users, e.ab...@gns.cri.nz
Hi Michele, 

Thanks! I'll look into using the HDF5 output as our primary one for bigger output jobs like this. I'm sorry I've had no time to experiment myself, but in theory, to do that, would you just do an 'oq export uhs/all -e HDF5' or does HDF5 become the default output?

Cheers,
Elizabeth

Michele Simionato

Jul 7, 2017, 1:54:08 AM
to OpenQuake Users, e.ab...@gns.cri.nz



No, HDF5 will never become the default output, because we cannot use it from QGIS. Right now it is only implemented for the hazard curves, more as an experiment than as an official output. Of course, if tomorrow QGIS started supporting the same version of HDF5 that we use and we were sure that the support would stay there, we would use HDF5 a lot more, but I do not see this happening any time soon. This is why we are using the `.npz` format instead. It has more limitations, but it does not cause segfaults in QGIS ;)
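
As a rough sketch of what working with an .npz export looks like (the file name is an example and the array names depend on the specific export, so here we just list them):

import numpy

npz = numpy.load("output.npz")
for name in npz.files:
    print(name, npz[name].shape, npz[name].dtype)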

There is a page documenting some of the `oq` commands and in particular `oq export`: https://github.com/gem/oq-engine/blob/master/doc/oq-commands.md
I promise to document other commands before the next release ;-)

era

Jul 26, 2017, 6:48:50 PM
to OpenQuake Users, e.ab...@gns.cri.nz
My apologies for the delay in responding!  

I think I understand what you are saying: HDF5 being incompatible with QGIS means it is impractical to use across all outputs at this stage. You mentioned .npz as an alternative... I have read into it a little bit now, but not much. It sounds, as you said, like it has its limitations. That being said, the concept of the output seems straightforward enough - I assume it would still be usable for plotting data, putting it in tables, etc. if we put the appropriate code around it? Though I suppose this could get complicated as the combination of things (i.e. sites, PoEs, intensity measures and intensity types) you are using grows.
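
Something like the sketch below is what I have in mind; the array name "mean" and the assumption that it holds a single 1-D curve of PoEs are purely hypothetical, since I have not inspected a real .npz export yet.

import numpy
import matplotlib.pyplot as plt

npz = numpy.load("hcurves.npz")
curve = npz["mean"]                     # hypothetical array name, assumed 1-D
plt.plot(range(len(curve)), curve, marker="o")
plt.xlabel("intensity level index")
plt.ylabel("probability of exceedance")
plt.show()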

Looking at the commands from the link you sent, it looks like the uhs can be output in npz, but not the realizations? I say this realizing that you have promised to document the other commands before the next release :-) Hazard curves are helpful, but more often we need to consider uniform hazard spectra in our analyses (research and commercial), which can include presenting the full spread of results from the logic tree.

Again, I am sorry I have not had a chance to experiment with any of this yet. Please don't feel pressure to respond to this while you are on holiday, Michele!

Michele Simionato

Jul 27, 2017, 2:41:36 AM
to OpenQuake Users, e.ab...@gns.cri.nz
The problem of the outputs has always been one of the biggest complications of the engine. At the moment we have over 50 exporters defined, mostly for the risk outputs. It is not clear how the management of the outputs will change in the future, but it will certainly change. The risk scientists would be in favor of removing most or all outputs from the engine and just exposing an API accessible to the QGIS plugin, so that the problem of the outputs would be moved to the QGIS side. The hazard scientists would prefer to export HDF5 files. We still have outputs that are somewhat obsolete (.geojson, .xml) and it is not clear how long they will be supported.

There is a lot of work to do in order to rationalize the outputs, so if you have wishes, now is the time to speak! I mean not only you, Elizabeth, but all users of the engine.

era

Sep 14, 2017, 5:44:11 PM
to OpenQuake Users

Sorry for the big delay on this - I have spoken to some of the research software developers we've got on staff, and they reckon json (for smaller outputs) and hdf5 (for larger ones) would be best for us overall. They would also fit in well with our post-processing programs, etc. At the moment we're working to adapt everything to csv as well, but I don't think that is necessarily the preferred long-term solution, as there is information contained in the xml files that is not included in the csv files.
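
For what it's worth, the kind of adaptation we are writing around the csv outputs looks roughly like the sketch below; the file names are placeholders and I am assuming a plain header row, which may not match the real exports exactly.

import csv
import json

with open("hazard_curve-mean.csv", newline="") as f:
    rows = list(csv.DictReader(f))      # one dict per row, keyed by the header

with open("hazard_curve-mean.json", "w") as f:
    json.dump(rows, f, indent=2)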

Michele Simionato

Sep 15, 2017, 5:25:33 AM
to OpenQuake Users



In these last few months we thought about the future of the outputs of the engine.
Our plan now is to provide a web API so that the QGIS plugin can read the data it needs directly from it, without the need to download files.
This means that we can dispense with exporting .npz files. The web API is already there and will be part of engine 2.7, to be released fairly soon.
The .npz exports will not be removed for the moment, but they might be in the future. The number of exported .hdf5 files may grow.
I cannot be more specific at the moment, but this looks like the most promising direction for the future.
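
Just to give an idea of the direction, reading from a local engine server could look something like the sketch below; the port is the default one of the WebUI and the endpoint shown here is indicative only, so check the server documentation for the exact paths.

import requests

base = "http://localhost:8800"                    # default address of a local engine WebUI
calcs = requests.get(base + "/v1/calc/list").json()
print(calcs)                                      # metadata about the available calculations
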
HTH,

                 Michele

era

Apr 5, 2019, 12:02:12 AM
to OpenQuake Users

Hi team,


Related to this conversation from a year or so ago, I have been trying to get hdf5 output for a number of hazard results (hcurves, uhs, hmaps, gmf_data, etc.). According to the information here and from oq export --info, that still doesn't appear to be possible.

Are there any plans to expand the data types that can be output as hdf5? Or is npz the large export format we should be looking to use more consistently?

 

Thanks,

Elizabeth

michele....@globalquakemodel.org

Apr 7, 2019, 12:10:07 AM
to OpenQuake Users

Things have changed since one year ago. Now the hazard curves (and hmaps) are stored in the datastore in a nice format, and essentially there is no need to export them. For example, suppose you have a calculation with N=100,000 sites, R=1,000 realizations and 10 IMTs with 40 levels each, i.e. L = 400. Then the amount of data required to store the hazard curves using 32 bit floats is N * R * L * 4 bytes = 149 GB. The engine is perfectly capable of storing such data in the `hcurves` dataset of the datastore as a 3D array of shape (N, R, L), and it does so fast and efficiently. Then you can post-process the data by looking directly at this dataset. There is no need to export in npz, csv or any other format, with the risk of wasting a lot of time and space on exporting something which is already stored in the right way.
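
To spell out the arithmetic:

N, R, L = 100000, 1000, 400             # sites, realizations, levels (10 IMTs x 40 levels)
n_bytes = N * R * L * 4                 # 4 bytes per 32 bit float
print(n_bytes / 1024 ** 3)              # ~149 GB
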
Before engine 3.3 an exporter was needed to generate the individual curves output after the calculation; now it is generated during the calculation if you have set individual_curves=true in the job.ini.
Hope this answers your question,

                      Michele

era

Apr 8, 2019, 1:32:33 AM
to OpenQuake Users
Thanks, Michele, that sounds awesome! I'm so glad the engine can do that now without requiring explicit outputs. Where does the output data "live" now after it is calculated? Can data from previous calculations be accessed after other calculations have been done (e.g. can we still access the outputs of calculation 1 if we have done 10 calculations afterward)? Is there any kind of issue if we were to need to export that data in some format later on?

Hopefully those questions make sense.

Thanks,
Elizabeth

michele....@globalquakemodel.org

Apr 8, 2019, 2:27:58 AM
to OpenQuake Users



Take for instance the demo LogicTreeCase1ClassicalPSHA, which has individual_curves=true, 2 sites, 8 realizations and 2 IMTs with 19 levels each. If you run it, it will generate an HDF5 file (the datastore) that contains everything, and then you can do things like this:

In [1]: from openquake.baselib import datastore

In [2]: ds = datastore.read('/home/michele/oqdata/calc_33885.hdf5')  # you can use the full path or the calculation ID here

In [3]: ds['hcurves-rlzs'].shape
Out[3]: (2, 8, 38)

The two IMTs are PGA and SA(0.1) respectively, so for the third axis the indices [0:19] will correspond to PGA and the indices [19:38] to SA(0.1).
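
For instance, continuing the session above, you can slice out one IMT at a time; the indices follow the layout just described:

In [4]: pga = ds['hcurves-rlzs'][0, 0, 0:19]    # PGA curve, first site, first realization

In [5]: sa01 = ds['hcurves-rlzs'][0, 0, 19:38]  # SA(0.1) curve, same site and realization
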
HTH,

       Michele

era

Apr 10, 2019, 7:14:18 PM
to OpenQuake Users
Ah that's great! Thank you for the demo. Much appreciated.

I think that solves our questions for now, but we will ask if any more arise!

Thank you!

Elizabeth