NRML schema definition


Craig Arthur

Jan 22, 2021, 1:17:19 AM
to OpenQuake Users
Hi,

I'm trying to track down an XML Schema Definition (XSD) for NRML, so I can validate NRML files for an application I'm building. 

The only references I can find anywhere to NRML is through the OQ code base, but I'm yet to locate an XSD that describes the overall data model. Plenty of examples, but not the authoritative definition. 

Any ideas where I might be able to source this from?

Thanks,
Craig

Michele Simionato

Jan 22, 2021, 3:11:13 AM
to OpenQuake Users
There is no XSD schema. We had one in the past, but it was removed many years ago. The reason was that the XSD constraints were too weak: it was entirely possible to have valid NRML files that still caused errors during the calculation.
We provide a validation API though, both a Python one (preferred) and a Web API (if you are fine with posting large XML files to a web service). Let me know what you have in mind and I can provide details.
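Since there is no XSD, a quick first-pass check that an NRML file is at least well-formed XML can be done with the standard library before invoking the engine's own validators. This is only a sketch, not the engine's validation API (which performs far deeper semantic checks); the namespace URI below assumes NRML 0.5:

```python
# First-pass syntax check for an NRML file using only the standard
# library; this is NOT the engine's validation API, which performs
# much deeper semantic checks. The namespace URI assumes NRML 0.5.
import xml.etree.ElementTree as ET

NRML_NS = "{http://openquake.org/xmlns/nrml/0.5}"

def nrml_root_tag(text):
    """Return the root tag if `text` is well-formed XML with an <nrml> root, else None."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    return root.tag if root.tag == NRML_NS + "nrml" else None

sample = """<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns="http://openquake.org/xmlns/nrml/0.5">
  <vulnerabilityModel id="vm" assetCategory="buildings" lossCategory="structural"/>
</nrml>"""

print(nrml_root_tag(sample) is not None)  # True for well-formed input
```

Passing this check says nothing about whether the engine will accept the file; it only rules out outright syntax errors.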

            Michele

Craig Arthur

Jan 26, 2021, 9:34:46 PM
to OpenQuake Users
Thanks Michele. Is there any governance around the format? Is it something that's managed within the OQ community only? 

Also, the application I'm working on is multi-hazard, so the components I am most interested in are the vulnerability & fragility curve definitions. Potentially also the exposure definition. Has the NRML definition evolved to include elements specific to OQ? 

Craig

Michele Simionato

Jan 27, 2021, 10:34:37 AM
to OpenQuake Users
On Wednesday, January 27, 2021 at 3:34:46 AM UTC+1 Craig Arthur wrote:
Thanks Michele. Is there any governance around the format? Is it something that's managed within the OQ community only? 

Also, the application I'm working on is multi-hazard, so the components I am most interested in are the vulnerability & fragility curve definitions. Potentially also the exposure definition. Has the NRML definition evolved to include elements specific to OQ? 

At the moment there is no governance around the NRML format. I am not sure if anybody else is using it for non-earthquake risks, but we in GEM are certainly planning to use it for secondary perils like liquefaction, landslides, volcanic risks, etc.
We do not want it to be specific to earthquakes only. I hope this clarifies our position,

           Michele

anirudh.rao

Jan 27, 2021, 11:10:49 AM
to OpenQuake Users
Hi Craig,

Thanks to git, the original XSD files for NRML can still be found at https://github.com/gem/oq-nrmllib/tree/master/openquake/nrmllib/schema/risk. These have been deprecated for several years. However, fragility and vulnerability input files defined using these specifications are still accepted by the OpenQuake engine.

Updated example fragility and vulnerability input files for the current version of the OpenQuake engine are listed below.

As Michele mentioned, there are no longer corresponding XSD files. Most of the validation logic applied to these files should be directly applicable to non-earthquake perils too, other than a few specific aspects, for instance where the engine checks for valid intensity measure type definitions. The intensity measures would likely change depending on the peril.

For the exposure, while the older NRML format still works, the engine now also accepts exposure models where the assets are defined in csv files with a short accompanying metadata section in NRML. For instance:

This format is much more convenient for users with large exposure models comprising millions of assets, in which case trying to store all of the asset information in a single xml file becomes unwieldy rather quickly. 
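As a rough sketch of handling the csv side, the per-asset table can be read with ordinary csv tooling. The column names below are hypothetical illustrations, not the engine's required header:

```python
# Sketch: reading a csv asset table of the kind the NRML metadata
# section points to. The column names here are hypothetical examples,
# not the OpenQuake engine's required header.
import csv
import io

ASSETS_CSV = """id,lon,lat,taxonomy,number,structural
a1,13.0,42.0,RC_low,2,1000.0
a2,13.1,42.1,W_high,1,500.0
"""

def load_assets(fileobj):
    """Parse asset rows, converting the numeric fields."""
    assets = []
    for row in csv.DictReader(fileobj):
        row["lon"] = float(row["lon"])
        row["lat"] = float(row["lat"])
        row["number"] = int(row["number"])
        row["structural"] = float(row["structural"])
        assets.append(row)
    return assets

assets = load_assets(io.StringIO(ASSETS_CSV))
print(len(assets))  # 2
```

With millions of assets, streaming rows like this is far more practical than holding one giant XML document in memory.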

The engine user's manual describes these input files in more detail.

Best regards,
Anirudh

Craig Arthur

Jan 27, 2021, 9:11:34 PM
to OpenQuake Users
Thanks Anirudh, those XSD files are what I was looking for. 

More generally, is there any appetite to formalise these definitions? After some brief discussions within our office, there seems to be value in building some governance around the data model, as we're looking at it both from the perspective of OQ users and for more general, multi-hazard applications (which I understand GEM may be moving towards?). I'd start with simply having these XSDs in a separate git repo, with some examples as well.

Regards,
Craig

Michele Simionato

Jan 28, 2021, 12:22:46 AM
to OpenQuake Users
On Thursday, January 28, 2021 at 3:11:34 AM UTC+1 Craig Arthur wrote:
Thanks Anirudh, those XSD files are what I was looking for. 

More generally, is there any appetite to formalise these definitions? After some brief discussions within our office, there seems to be value in building some governance around the data model, as we're looking at it both from the perspective of OQ users and for more general, multi-hazard applications (which I understand GEM may be moving towards?). I'd start with simply having these XSDs in a separate git repo, with some examples as well.


About the general question ("is there any appetite to formalise these definitions?") I cannot answer now, since we have to discuss the issue internally. As for using the old XSDs, I strongly recommend NOT doing that. For instance, the XSD schema for the vulnerability functions that you see at https://github.com/gem/oq-nrmllib/blob/master/openquake/nrmllib/schema/risk/vulnerability.xsd is 8 years old, and while the engine will accept files in that format, it will annoy you with deprecation warnings. The XSD schema technology is really a bad fit for our use case and will never come back: the kind of validations we need are too advanced for XSD. Just to give you an idea, this is (part of) the code required to check the coefficients of variation in the vulnerability functions:

        anycovs = self.covs.any()
        for lr, cov in zip(self.mean_loss_ratios, self.covs):
            if lr == 0 and cov > 0:
                msg = ("It is not valid to define a mean loss ratio = 0 "
                       "with a corresponding coefficient of variation > 0")
                raise ValueError(msg)
            if cov < 0:
                raise ValueError(
                    'Found a negative coefficient of variation in %s' %
                    self.covs)
            if distribution == 'BT':
                if lr == 0:  # possible with cov == 0
                    pass
                elif lr > 1:
                    raise ValueError('The meanLRs must be <= 1, got %s' % lr)
                elif cov == 0 and anycovs:
                    raise ValueError(
                        'Found a zero coefficient of variation in %s' %
                        self.covs)
                elif cov ** 2 > 1 / lr - 1:
                    # see https://github.com/gem/oq-engine/issues/4841
                    raise ValueError(
                        'The coefficient of variation %s > %s does not '
                        'satisfy the requirement 0 < σ < sqrt[μ × (1 - μ)] '
                        'in %s' % (cov, numpy.sqrt(1 / lr - 1), self))
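To see these rules in action outside the engine, the excerpt can be restated as a small self-contained function. This is a simplified re-implementation for illustration only, not the engine's actual code:

```python
# Simplified, standalone restatement of the coefficient-of-variation
# checks quoted above; illustrative only, not the engine's actual code.
import math

def check_covs(mean_loss_ratios, covs, distribution):
    anycovs = any(covs)  # True if any cov is nonzero
    for lr, cov in zip(mean_loss_ratios, covs):
        if lr == 0 and cov > 0:
            raise ValueError("mean loss ratio = 0 with cov > 0 is invalid")
        if cov < 0:
            raise ValueError("negative coefficient of variation: %s" % cov)
        if distribution == 'BT':
            if lr == 0:  # possible with cov == 0
                continue
            if lr > 1:
                raise ValueError("the mean loss ratios must be <= 1, got %s" % lr)
            if cov == 0 and anycovs:
                raise ValueError("zero coefficient of variation among nonzero ones")
            if cov ** 2 > 1 / lr - 1:
                # the beta distribution needs 0 < sigma < sqrt[mu * (1 - mu)]
                raise ValueError("cov %s exceeds the limit %s"
                                 % (cov, math.sqrt(1 / lr - 1)))

check_covs([0.1, 0.5], [0.3, 0.2], 'BT')  # valid input: no exception raised
```

The `cov ** 2 > 1 / lr - 1` branch is exactly the kind of cross-field constraint that XSD cannot express.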

There are plenty of special rules depending on the arrays of coefficients inside the XML.
Moreover, the validity of a file can depend on other files (for example, the logic tree file contains references to other files, the exposure contains references to other files,
etc.). The right way to check the validity of the files is to use a command like

$ oq check_input job.ini

that will check all of the files referred to by the job.ini file.

As for examples, I would look at the tests, because then you have the guarantee that those examples work and are tested every day.
Just clone the repository and run

$ find oq-engine/openquake/qa_tests_data/ -name \*.xml 

HTH,
                                        Michele

Craig Arthur

Jan 28, 2021, 9:31:41 PM
to OpenQuake Users
Hi Michele, 

That makes a lot of sense regarding the more specific rules for the XML - those rules still apply to our application as well, so having more complete validation is also relevant to us. 

One question that has arisen is around the use of the "probabilisticDistribution" attribute - there are two valid values ("LN" and "BT"), but no parameters that define said distribution (e.g. a scale or shape parameter). Can you shed any light on this attribute? 

Thanks
Craig

anirudh.rao

Jan 29, 2021, 8:14:58 AM
to OpenQuake Users
Hi Craig,

Both the lognormal and the beta distributions for the vulnerability functions are parametrised by the mean and coefficient of variation of the loss ratio in the NRML files. If you're looking at the xsd files, the means and coefficients of variation are defined by the elements lossRatio and coefficientsVariation respectively. For functions based on the beta distribution, the means and coefficients of variation are converted to the corresponding α and β parameters by the engine after the file has been read. The code block handling the conversions is here: https://github.com/gem/oq-engine/blob/engine-3.10/openquake/risklib/scientific.py#L801-L857
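The conversion described here is the standard moment matching for the beta distribution. As a minimal sketch (an independent re-derivation mirroring the formulas in the linked code, not the engine's implementation itself):

```python
# Moment matching for a beta distribution: given a mean mu in (0, 1)
# and a coefficient of variation cov, recover the alpha and beta
# parameters. An independent sketch, not the engine's implementation.
def beta_parameters(mean, cov):
    stddev = cov * mean  # NRML stores the cov, not the stddev
    alpha = ((1 - mean) / stddev ** 2 - 1 / mean) * mean ** 2
    beta = alpha * (1 - mean) / mean
    return alpha, beta

alpha, beta = beta_parameters(0.3, 0.5)
print(alpha, beta)  # ≈ 2.5 and ≈ 5.833
```

As a sanity check, a Beta(α, β) with these parameters has mean α / (α + β) = 0.3 and standard deviation 0.15, matching the inputs.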


Best,
Anirudh

Peter Pažák

Feb 2, 2021, 3:41:34 PM
to OpenQuake Users
Hi Anirudh,

just to make sure I understand it perfectly well, for LN it looks like covs is on the input and sigmas are calculated from that: sigma = numpy.sqrt(numpy.log(covs ** 2.0 + 1.0))
In the code for BT, however, it looks like stddev is on the input, not the coefficient of variation: e.g. def _alpha(mean, stddev): return ((1 - mean) / stddev ** 2 - 1 / mean) * mean ** 2
so just double checking: in the vulnerability.xml, when dist="BT", the <covLRs> are really coefficients of variation = sigma / mean, right?

Thank you very much
Peter

Date: Friday, 29 January 2021, 14:14:58 UTC+1, sender: anirudh.rao

anirudh.rao

Feb 3, 2021, 2:31:03 AM
to OpenQuake Users
Hi Peter,

The function definitions in NRML must specify coefficients of variation in both cases. They are converted to standard deviations before the calculation of the alpha and beta parameters for the "BT" distribution.
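To make the two parametrisations concrete, here is a small numeric sketch of the standard formulas quoted in this thread (not the engine's code):

```python
# The NRML <covLRs> are coefficients of variation in both cases.
# For "LN", the sigma of the underlying normal comes from the cov
# directly; for "BT", the cov is first turned into a standard
# deviation before computing alpha and beta.
import math

def lognormal_sigma(cov):
    # sigma of log(X) for a lognormal X with the given cov
    return math.sqrt(math.log(cov ** 2 + 1.0))

def beta_stddev(mean, cov):
    # standard deviation fed into the alpha/beta formulas
    return cov * mean

print(round(lognormal_sigma(0.5), 4))  # 0.4724
print(beta_stddev(0.3, 0.5))           # 0.15
```

So a `<covLRs>` entry of 0.5 means sigma = 0.5 × mean for "BT", while for "LN" it maps to a log-space sigma of about 0.47.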

Best,
Anirudh

Craig Arthur

Feb 3, 2021, 3:29:32 AM
to OpenQuake Users
Hi all,

Bringing the conversation back to the ongoing maintenance of the NRML definition. 

One governance model that I've seen used is essentially managing the definition through something like GitHub. This could support managing an XSD, plus scripts for more contextual validation, such as the examples provided above by Michele. 

From my perspective, the XSD serves as a first pass check to make sure there's nothing fundamentally wrong with syntax, etc. The validation scripts then provide additional (but obviously required) tests to ensure the supplied file is usable. 

Certainly, this raises questions about how to maintain such contextual validator scripts in the OQ code base alongside an NRML schema definition kept in a completely separate repository. Perhaps git submodules could be used to share components across multiple repositories. 

I'd welcome your thoughts on this. 

Craig

Michele Simionato

Feb 3, 2021, 4:53:55 AM
to OpenQuake Users
On Wednesday, February 3, 2021 at 9:29:32 AM UTC+1 Craig Arthur wrote:
Hi all,

Bringing the conversation back to the ongoing maintenance of the NRML definition. 

One governance model that I've seen used is essentially managing the definition through something like GitHub. This could support managing an XSD, plus scripts for more contextual validation, such as the examples provided above by Michele. 

From my perspective, the XSD serves as a first pass check to make sure there's nothing fundamentally wrong with syntax, etc. The validation scripts then provide additional (but obviously required) tests to ensure the supplied file is usable. 

Certainly, this raises questions about how to maintain such contextual validator scripts in the OQ code base alongside something for the NRML schema definition in a completely separate repository. Maybe make use of submodules etc. that can be used across multiple repositories. 

Welcome your thoughts on this. 

Craig

To be honest, from my perspective a governance model would just add overhead. The NRML format is documented (yes, we could move the docs to a single place, whereas now they are scattered across different sections of the manual, but that's it), tested and stable.

We changed the vulnerability functions once in 10 years, and similarly for the fragility functions and the hazard models. The exposure was extended a couple of times in 10 years, but in a backward-compatible way. The engine still accepts, and will accept forever, the old versions of NRML, because we want to be able to repeat calculations from 10 years ago. So I think a third party can already adopt NRML without issues, even without formal governance: you can be sure that we are not going to break NRML.

If extensions of NRML are needed (for instance for non-seismic risks), we are very open to suggestions and discussion. As long as backward compatibility is preserved, it is not an issue to extend NRML.

     Michele

Craig Arthur

Feb 3, 2021, 4:45:40 PM
to OpenQuake Users
As the ones wanting to extend NRML, I think it would fall to us to be responsible for any governance we desire to have in place. 

And certainly backwards compatibility is a fundamental requirement. The main areas of change we would consider are the intensity measure types and units, to enable a wider range of hazards (we deal with extreme wind and inundation, for example), so these are unlikely to break that compatibility. 

Anyway, we'll have some more discussions internally to see what the appetite is like, and perhaps come back with a more considered suggestion.

Thanks very much for all your thoughts though!

Craig