safe extraction api

Nicolas Schmid

Jun 8, 2021, 10:44:36 AM
to OpenQuake Users
Hi everyone

I am working on a scenario_risk calculation and am trying to get the calculation results in a consistent and "future-proof" manner.

Several different ways of accessing the data are documented, but the documentation for each of them is rather minimal.

You have already stated that the contents of the *datastore* are not consistent and change frequently.

The *Extractor* has very little documentation, which led me to fall back on looking at the keys inside the datastore and using those with the extractor API.

Just a couple of problems I ran into using the extractor:
extractor.get('agg_losses-rlzs').to_dframe() -> doesn't work
extractor.get('avg_losses-rlzs').to_dframe() -> works
=> How am I supposed to know what I can call where?

extractor.get('agg_values').shape/.array -> shape=(28328,1), corresponding to the array
extractor.get('agg_values').to_dframe() -> gives me a (3232x3) dataframe back

As already said above,
extractor.get('agg_losses-rlzs').to_dframe() -> doesn't work
and extractor.get('agg_losses-rlzs').array only returns one column. Using extractor.get('agg_keys') gives me the corresponding keys. Once I look at agg_losses-rlzs, however, I get different keys.
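
To make the "what can I call where" question concrete, here is a minimal sketch of a probe that records, per key, whether get() succeeds and whether .array and .to_dframe() are usable. It only assumes an object with a get(key) method, as used above. The names probe, FakeExtractor and FakeResult are made up for this illustration, and the stub merely imitates the behaviour described in this post so the snippet runs without the engine installed; it says nothing about what the real extractor returns.

```python
def probe(extractor, keys):
    """Report, per key, whether get() succeeds and which accessors work."""
    report = {}
    for key in keys:
        entry = {"get": False, "array": False, "to_dframe": False, "error": None}
        report[key] = entry
        try:
            result = extractor.get(key)
        except Exception as exc:  # broad on purpose: we only want a report
            entry["error"] = repr(exc)
            continue
        entry["get"] = True
        entry["array"] = hasattr(result, "array")
        try:
            result.to_dframe()
            entry["to_dframe"] = True
        except Exception as exc:
            entry["error"] = repr(exc)
    return report


class FakeResult:
    """Hypothetical stand-in for whatever extractor.get() returns."""

    def __init__(self, frame_ok):
        self.array = [[0.0]]
        self.frame_ok = frame_ok

    def to_dframe(self):
        if not self.frame_ok:
            raise ValueError("no dataframe for this output")
        return "dataframe placeholder"


class FakeExtractor:
    """Hypothetical stand-in for the extractor; unknown keys raise KeyError."""

    def __init__(self, results):
        self.results = results

    def get(self, key):
        return self.results[key]


fake = FakeExtractor({
    "avg_losses-rlzs": FakeResult(frame_ok=True),
    "agg_losses-rlzs": FakeResult(frame_ok=False),
})
report = probe(fake, ["avg_losses-rlzs", "agg_losses-rlzs", "no_such_key"])
for key, entry in report.items():
    print(key, entry)
```

Run against a real extractor with the keys found in the datastore, a report like this would at least show in one place which outputs can be turned into a dataframe and which cannot.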

Also, the head of my *csv export* avg_losses-rlz has some weird missing site_id values:
asset_id,site_id,taxonomy,lon,lat,structural
211,?,M1_L,6.01821,46.14401,9.87386E+03
212,?,M3_L,6.01821,46.14401,2.10054E+04
213,?,M4_L,6.01821,46.14401,1.79089E+03
214,?,M6_L,6.01821,46.14401,1.74702E+04
215,?,RCW_L,6.01821,46.14401,7.78517E+02
216,?,T,6.01821,46.14401,6.63474E+02
641,0,M3_L,6.08760,46.47775,4.10986E+03
645,1,M6_L,6.08872,46.43278,2.96731E+03
1345,2,M1_L,6.12753,46.44223,6.36461E+03
1346,2,M3_L,6.12753,46.44223,1.51787E+04

Using the *csv export* also seems rather like a detour, since I immediately read the results back into Python code. And if the results are on a different server, this is not very efficient.
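
For reference, the detour through the CSV export looks roughly like the sketch below, which parses a few of the rows shown above with the standard csv module and turns the "?" placeholder into None. The function name read_avg_losses is made up for this illustration, and treating "?" as "site unknown" is an assumption about what the export means, not something the documentation confirms.

```python
import csv
import io

# A few rows copied from the export head shown above.
SAMPLE = """asset_id,site_id,taxonomy,lon,lat,structural
211,?,M1_L,6.01821,46.14401,9.87386E+03
212,?,M3_L,6.01821,46.14401,2.10054E+04
641,0,M3_L,6.08760,46.47775,4.10986E+03
1345,2,M1_L,6.12753,46.44223,6.36461E+03
1346,2,M3_L,6.12753,46.44223,1.51787E+04
"""


def read_avg_losses(text):
    """Parse the exported CSV text into a list of typed dictionaries."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        site = rec["site_id"]
        rows.append({
            "asset_id": int(rec["asset_id"]),
            # Assumption: "?" means the site_id is unknown for this asset.
            "site_id": None if site == "?" else int(site),
            "taxonomy": rec["taxonomy"],
            "lon": float(rec["lon"]),
            "lat": float(rec["lat"]),
            "structural": float(rec["structural"]),
        })
    return rows


rows = read_avg_losses(SAMPLE)
missing = [row["asset_id"] for row in rows if row["site_id"] is None]
print("assets without a site_id:", missing)
```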

Using the WebAPI:
Leaving aside the usual problems like timeouts and connection speed,
I'm not very flexible. For example, when aggregating by site_id I have agg_losses-rlzs, which mentions a site_id, but I don't have a sitemesh that tells me which site has which coordinates. I have to go into avg_losses and construct the relationship from there.
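
The workaround of rebuilding the site-to-coordinates relationship from avg_losses can be sketched as follows. This is only an illustration under assumptions: the tuples are (asset_id, site_id, lon, lat) values taken from the export head earlier in this post, None stands for a "?" site_id, and the name site_coordinates is made up.

```python
# (asset_id, site_id, lon, lat), copied from the avg_losses export head;
# None stands for a "?" in the site_id column.
ROWS = [
    (211, None, 6.01821, 46.14401),
    (641, 0, 6.08760, 46.47775),
    (645, 1, 6.08872, 46.43278),
    (1345, 2, 6.12753, 46.44223),
    (1346, 2, 6.12753, 46.44223),
]


def site_coordinates(rows):
    """Build a site_id -> (lon, lat) lookup from per-asset rows."""
    coords = {}
    for asset_id, site_id, lon, lat in rows:
        if site_id is None:
            continue  # cannot place assets whose site is unknown
        known = coords.setdefault(site_id, (lon, lat))
        if known != (lon, lat):
            raise ValueError(
                f"site {site_id} has conflicting coordinates "
                f"(asset {asset_id}): {known} vs {(lon, lat)}"
            )
    return coords


coords = site_coordinates(ROWS)
for site_id, (lon, lat) in sorted(coords.items()):
    print(site_id, lon, lat)
```

The consistency check is there because the lookup is inferred from asset rows rather than read from a site mesh, so a conflict would indicate that the inference itself is wrong.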

And I have not yet tried aggregation by multiple keys.

Is there a reasonably documented way that gives me access to all the information, and that is unlikely to break with the next version?

Kind regards
Nicolas

Michele Simionato

Jun 9, 2021, 9:35:11 PM
to OpenQuake Users
You are making some good points, Nicolas. We use the extract API internally for plotting purposes. Since we are not directly plotting the agg_losses-rlzs output,
it is no wonder that it does not work properly. As far as I know, you are the first user to notice that. You should open issues on GitHub for the obvious bugs.
More generally, if you need a specific extractor that the engine does not provide, we gladly accept pull requests. While the internal format of the datastore may
change, we are committed to keeping the extractors working across versions as much as possible.

    Michele