[Bug 1096915] [NEW] OQ-Engine: define hazard task only for site close to each source and not for the entire region of interest

Damiano Monelli

Jan 7, 2013, 10:04:48 AM
to oqb...@foldr3.com
Public bug reported:

Currently, hazard calculations are parallelized over sources (i.e. a
task is defined for each source and for the entire set of sites defined
in the configuration file).

However, for regional scale calculations (e.g. Europe) it can happen
that a source (say in North Africa) is assigned to a set of sites that
covers the whole of Europe (from North Africa up to Norway), while a
source clearly affects only the sites close to it (those that are within
the 'maximum_distance' defined in the configuration file).

Currently we are using the NHLIB source filtering mechanism
(https://github.com/gem/nhlib/blob/master/nhlib/calc/filters.py#L55),
which is passed to the classical hazard curve calculator
(https://github.com/gem/nhlib/blob/master/nhlib/calc/hazard_curve.py#L25)
and used in the event-based calculator
(https://github.com/gem/oq-engine/blob/master/openquake/calculators/hazard/event_based/core_next.py#L161)
to remove those sites that are too far from the considered source.
However, this filtering mechanism requires calculating the distance from
the source to the entire set of sites (of the order of 100k(s) for
regional scale calculations), and these calculations need to be done for
each source.
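
To make the cost concrete, here is a minimal sketch of this kind of
brute-force filtering. It is not the NHLIB code: the source is reduced to
a single point, distances are great-circle (haversine) distances on a
spherical Earth, and the names `haversine_km` and
`filter_sites_by_distance` are made up for illustration. The only point
is that every source has to touch every site.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_sites_by_distance(source, sites, maximum_distance):
    """Return indices of sites within `maximum_distance` km of a point source.

    One distance is computed per site, so the work is O(len(sites)) for
    every source, no matter how far away most of the sites are.
    """
    src_lon, src_lat = source
    return [idx for idx, (lon, lat) in enumerate(sites)
            if haversine_km(src_lon, src_lat, lon, lat) <= maximum_distance]
```

With N sources and M sites this amounts to N x M distance evaluations
before any hazard computation starts.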

We can avoid doing distance calculations for each source by saving into
the DB the site locations where hazard results need to be computed,
creating (only once) a geospatial index
(http://postgis.refractions.net/documentation/manual-1.3/ch03.html), and
querying, for each source, only those sites that are within the bounding
box containing the source 'rupture enclosing polygon' (which is already
saved into the DB). No distance calculation is needed at the DB level.
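
A rough sketch of the bounding-box selection, done in memory rather than
through a PostGIS index. The box is padded by maximum_distance, as the
revised description later in this thread specifies. Everything else is
an assumption made for illustration: the function names, the fixed
km-per-degree conversion, and the fact that boxes crossing the
antimeridian are not handled.

```python
import math

# Approximate length of one degree of latitude on a spherical Earth
# with radius 6371 km (assumption, good enough for a coarse pre-filter).
KM_PER_DEGREE = 111.195


def dilated_bbox(polygon, max_dist_km):
    """Return (west, south, east, north) of `polygon` padded by `max_dist_km`.

    `polygon` is a sequence of (lon, lat) vertices in degrees.
    """
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    dlat = max_dist_km / KM_PER_DEGREE
    south = max(min(lats) - dlat, -90.0)
    north = min(max(lats) + dlat, 90.0)
    # A degree of longitude gets shorter away from the equator, so pad
    # using the latitude of the box that is farthest from it.
    widest_lat = max(abs(south), abs(north))
    cos_lat = max(math.cos(math.radians(widest_lat)), 1e-6)
    dlon = max_dist_km / (KM_PER_DEGREE * cos_lat)
    return (min(lons) - dlon, south, max(lons) + dlon, north)


def sites_in_bbox(sites, bbox):
    """Return indices of (lon, lat) sites that fall inside `bbox`."""
    west, south, east, north = bbox
    return [idx for idx, (lon, lat) in enumerate(sites)
            if west <= lon <= east and south <= lat <= north]
```

The comparison in `sites_in_bbox` is what a spatial index would answer
without scanning all sites; here it is a plain loop only to keep the
example self-contained.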

For each source, we can then compute hazard curves/GMFs only for the
sites of interest (for the remaining sites, hazard curve and GMF values
can be set to 0).
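
As an illustration of "set to 0" for the remaining sites, the values
computed for the selected sites can be scattered back into a full-length
array. This is only a sketch with a hypothetical helper name
(`expand_to_full`) and plain lists standing in for whatever array type
the calculators use.

```python
def expand_to_full(values, indices, total_sites, placeholder=0.0):
    """Place `values` at `indices` in a list of length `total_sites`.

    Sites that were not selected for this source keep `placeholder`.
    """
    full = [placeholder] * total_sites
    for idx, value in zip(indices, values):
        full[idx] = value
    return full
```

For example, results for sites 1 and 3 out of 5 become a 5-element list
with zeros at positions 0, 2 and 4.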

In this way we can avoid calling the source-site distance filter in both
the classical and event-based calculators and just use the rupture-site
distance filter
(https://github.com/gem/nhlib/blob/master/nhlib/calc/filters.py#L75)
which will exclude the sites that are too far from each rupture.

** Affects: openquake
Importance: Undecided
Status: New

--
You received this bug notification because you are subscribed to
OpenQuake.
Matching subscriptions: openquake-bugs
https://bugs.launchpad.net/bugs/1096915

Title:
OQ-Engine: define hazard task only for site close to each source and
not for the entire region of interest

Status in OpenQuake:
New

To manage notifications about this bug go to:
https://bugs.launchpad.net/openquake/+bug/1096915/+subscriptions

Damiano Monelli

Jan 7, 2013, 10:51:16 AM
to oqb...@foldr3.com
** Description changed:

We can avoid doing distance calculation for each source, by saving into
- the DB the sites locations where hazard results need to be computed for,
+ the DB the site locations where hazard results need to be computed for,
create (only once) a geospatial index
(http://postgis.refractions.net/documentation/manual-1.3/ch03.html) and
for each source query only those sites that are within the bounding box
containing the source 'rupture enclosing polygon' (which is already
- saved into the DB). No need to do any distance calculation at the DB
- level.
+ saved into the DB) dilated by maximum_distance. No need to do any
+ distance calculation at the DB level.

Lars Butler

Jan 25, 2013, 8:26:49 AM
to oqb...@foldr3.com
** Changed in: openquake
Status: New => Confirmed

** Changed in: openquake
Importance: Undecided => High

** Changed in: openquake
Assignee: (unassigned) => Lars Butler (lars-butler)

** Changed in: openquake
Milestone: None => 0.9.1

Lars Butler

Jan 25, 2013, 9:03:12 AM
to oqb...@foldr3.com
After discussing this with Damiano, we have condensed the goals of this
task into 2 main points. They are as follows.

1. After doing some profiling with various test scenarios, we have
observed that SiteCollection.expand() [1] takes a significant amount of
time--in fact, more so than the calculation/number crunching itself.
Notes about these observations can be found in bug # 1097676. In the
profiling report for one test case [2], the calculator's `execute`
phase, the core calculation, took 946.248 seconds. Of that time, 198.747
seconds was spent doing SiteCollection.expand() calls. That's more than
20% of the core calculation.

One of the goals of this task is to _avoid_ SiteCollection expansion if
at all possible. This could potentially require some refactoring in the
oq-engine calculation code as well as nhlib.
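
One way expansion might be avoided is sketched below. It rests on an
assumption that would need checking against the actual nhlib/oq-engine
code paths: that per-source probabilities of exceedance are combined
across independent sources as 1 - prod(1 - p). Under that assumption a
site with p = 0 leaves the running product unchanged, so updating only
the filtered indices gives the same result as expanding to the full site
set first. The function names are hypothetical.

```python
def combine_expanded(no_exceed, poes_full):
    """Update non-exceedance probabilities using a full-length PoE list."""
    return [q * (1.0 - p) for q, p in zip(no_exceed, poes_full)]


def combine_filtered(no_exceed, indices, poes):
    """Same update, touching only the sites listed in `indices`."""
    result = list(no_exceed)
    for idx, p in zip(indices, poes):
        result[idx] *= (1.0 - p)
    return result
```

The second form does work proportional to the number of affected sites
rather than to the total number of sites.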

2. The second goal of this task is to avoid SiteCollection re-creation
inside of every task. Each task needs the SiteCollection, but in order
to create it, the area of interest geometry needs to be created each
time. If the area of interest is a large polygon, this can take quite a
bit of time. Looking at the same profiling report [2], we find that
307.832 seconds were spent on only 3 calls to Polygon.discretize() [3].

One possible solution is to create the SiteCollection once at the
beginning of the calculation and cache it in pickled form in the
database, in the `uiapi.hazard_calculation` table. We'll have to
experiment and see if pickling/unpickling and DB storage are more
viable than re-computing the SiteCollection and the geometry every time.
My hunch is that caching in the database will be much more
cost-effective.
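
A minimal sketch of the caching idea, using the standard library only.
An in-memory SQLite table stands in for `uiapi.hazard_calculation`, and
the `site_collection` column name, the helper names and the use of a
plain list as the cached object are all assumptions for illustration,
not the engine's schema or API.

```python
import pickle
import sqlite3


def create_table(conn):
    """Create a stand-in for the hazard calculation table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hazard_calculation ("
        "id INTEGER PRIMARY KEY, site_collection BLOB)")


def save_site_collection(conn, calc_id, site_collection):
    """Pickle the object once and store it alongside the calculation."""
    blob = pickle.dumps(site_collection, protocol=pickle.HIGHEST_PROTOCOL)
    conn.execute(
        "INSERT OR REPLACE INTO hazard_calculation (id, site_collection) "
        "VALUES (?, ?)", (calc_id, sqlite3.Binary(blob)))
    conn.commit()


def load_site_collection(conn, calc_id):
    """Return the cached object, or None if nothing was stored."""
    row = conn.execute(
        "SELECT site_collection FROM hazard_calculation WHERE id = ?",
        (calc_id,)).fetchone()
    if row is None or row[0] is None:
        return None
    return pickle.loads(row[0])
```

Each task would then call something like `load_site_collection` instead
of rebuilding the geometry; whether unpickling is actually cheaper than
re-computation is the experiment mentioned above.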


[1] - https://github.com/gem/nhlib/blob/73064bc2cc1807632ab9d2ea346e708886fa743e/nhlib/site.py#L217
[2] - https://launchpadlibrarian.net/128508178/test2.prof, see bug # 1097676
[3] - https://github.com/gem/nhlib/blob/73064bc2cc1807632ab9d2ea346e708886fa743e/nhlib/geo/polygon.py#L177

Lars Butler

Jan 28, 2013, 5:20:08 AM
to oqb...@foldr3.com
Baseline code version for profiling, before any optimizations:
https://github.com/gem/oq-engine/tree/e802c80e7a8d35e45a34cc1455389b4b44bee9b6

Lars Butler

Jan 28, 2013, 6:46:42 AM
to oqb...@foldr3.com
Nope, it was the right one. =)

Lars Butler

Jan 28, 2013, 6:44:45 AM
to oqb...@foldr3.com
Disregard the previous comment; wrong bug.

Lars Butler

Jan 28, 2013, 8:03:17 AM
to oqb...@foldr3.com
Pre-optimization profiling.

** Attachment added: "site-coll.prof"
https://bugs.launchpad.net/openquake/+bug/1096915/+attachment/3504428/+files/site-coll.prof

Lars Butler

Jan 28, 2013, 8:04:51 AM
to oqb...@foldr3.com
Code after minor optimizations:
https://github.com/larsbutler/oq-engine/tree/6c193ab1d394134a6cdf70b97a7ca0899efd19de

These optimizations basically just generate the site collection once and
cache it in the database.

Lars Butler

Jan 28, 2013, 8:05:40 AM
to oqb...@foldr3.com
Post-optimization profiling.

** Attachment added: "site-coll-opt1.prof"
https://bugs.launchpad.net/openquake/+bug/1096915/+attachment/3504433/+files/site-coll-opt1.prof

Lars Butler

Jan 28, 2013, 8:17:24 AM
to oqb...@foldr3.com
Attached test input files.

** Attachment added: "site-coll.tgz"
https://bugs.launchpad.net/openquake/+bug/1096915/+attachment/3504434/+files/site-coll.tgz

** Changed in: openquake
Status: Confirmed => In Progress

Lars Butler

Jan 28, 2013, 8:14:59 AM
to oqb...@foldr3.com
The optimizations are fairly small, but noticeable:

- Polygon.discretize() went from 5 calls with a cumulative time of 107.693 seconds to 2 calls with a cumulative time of 43.638 seconds.
- `get_site_collection`, which creates the SiteCollection, is only called once.
- Overall computation time dropped from 2528.041 seconds to 2426.922 seconds.
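
For anyone wanting to pull call counts and cumulative times like these
out of a profile, here is a small sketch. It assumes the attached .prof
files are cProfile dumps, which this thread does not confirm. The toy
`discretize` and `run_toy_calculation` functions exist only so the
example has something to profile; they are not the engine code.

```python
import cProfile
import pstats


def discretize(n):
    """Toy stand-in for an expensive geometry routine."""
    return [i * 0.5 for i in range(n)]


def run_toy_calculation():
    """Call the toy routine a few times, as separate tasks might."""
    for _ in range(3):
        discretize(1000)


def profile_to_file(func, path):
    """Run `func` under cProfile and dump the statistics to `path`."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(path)


def calls_and_cumtime(path, func_name):
    """Return (number of calls, cumulative seconds) for `func_name`."""
    stats = pstats.Stats(path)
    # Keys are (file, line, function name); values hold
    # (primitive calls, total calls, total time, cumulative time, callers).
    for (_file, _line, name), (_cc, ncalls, _tt, cumtime, _callers) \
            in stats.stats.items():
        if name == func_name:
            return ncalls, cumtime
    return 0, 0.0
```

Running `calls_and_cumtime` on a pre- and a post-optimization dump gives
the kind of before/after comparison listed above.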

--
You received this bug notification because you are subscribed to
OpenQuake.
Matching subscriptions: openquake-bugs
https://bugs.launchpad.net/bugs/1096915

Title:
OQ-Engine: define hazard task only for site close to each source and
not for the entire region of interest

Status in OpenQuake:
In Progress

Bug description:
Currently, hazard calculations are parallelized over sources (i.e. a
task is defined for each source and for the entire set of sites
defined in the configuration file).

However, for regional scale calculations (e.g. Europe) it can happen
that a source (say in North Africa) is assigned to an entire set of
sites that covers entire Europe (from North Africa up to Norway),
while it's clear that a source is only affecting sites close to it
(those that are within the 'maximum_distance' defined in the
configuration file).

Currently we are using the NHLIB source filtering mechanism
(https://github.com/gem/nhlib/blob/master/nhlib/calc/filters.py#L55)
that is passed both to the classical hazard curve calculator
(https://github.com/gem/nhlib/blob/master/nhlib/calc/hazard_curve.py#L25)
and used in the event based calculator (https://github.com/gem/oq-
engine/blob/master/openquake/calculators/hazard/event_based/core_next.py#L161)
in order to remove those sites that are too far from the considered
source. However this filtering mechanism requires distance calculation
from the source to the entire set of sites (which is of the order of
100k(s) for regional scale calculations). This calculations need be
done for each source.

We can avoid doing a distance calculation for each source by saving
into the DB the site locations where hazard results need to be
computed, creating (only once) a geospatial index
(http://postgis.refractions.net/documentation/manual-1.3/ch03.html)
and, for each source, querying only those sites that are within the
bounding box containing the source 'rupture enclosing polygon' (which
is already saved into the DB) dilated by maximum_distance. No need to
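
The bounding-box prefilter proposed in the description can be sketched in plain Python. This is an illustration only, not the engine's code: in the proposal the lookup would be done by the database's spatial index. The function names, the (west, south, east, north) tuple layout and the constant of about 111.195 km per degree of latitude are assumptions made for this example, and boxes that cross the antimeridian are not handled.

```python
import math

# Approximate length of one degree of latitude, assuming a 6371 km Earth radius.
KM_PER_DEGREE = 111.195


def dilate_bbox(bbox, max_dist_km):
    """Grow a (west, south, east, north) box in degrees by max_dist_km per side."""
    west, south, east, north = bbox
    dlat = max_dist_km / KM_PER_DEGREE
    south2 = max(south - dlat, -90.0)
    north2 = min(north + dlat, 90.0)
    # A degree of longitude gets shorter towards the poles, so use the
    # latitude of the dilated box that is closest to a pole; this should
    # make the longitude margin err on the large side.
    max_abs_lat = max(abs(south2), abs(north2))
    cos_lat = math.cos(math.radians(max_abs_lat))
    dlon = 180.0 if cos_lat < 1e-6 else min(dlat / cos_lat, 180.0)
    return (west - dlon, south2, east + dlon, north2)


def sites_in_bbox(sites, bbox):
    """Keep the (lon, lat) sites that fall inside the box; no distances computed."""
    west, south, east, north = bbox
    return [(lon, lat) for lon, lat in sites
            if west <= lon <= east and south <= lat <= north]


# Hypothetical source box in North Africa and a 200 km maximum distance.
source_bbox = (10.0, 30.0, 12.0, 32.0)
sites = [(11.0, 31.0), (11.0, 33.0), (10.0, 60.0)]
nearby = sites_in_bbox(sites, dilate_bbox(source_bbox, 200.0))
print(nearby)
```

Only simple comparisons are made per site here; the exact distance filtering can then run on the much smaller set of nearby sites.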

Lars Butler

Jan 29, 2013, 3:49:16 AM1/29/13
to oqb...@foldr3.com
** Tags added: hazard optimization

Lars Butler

Jan 29, 2013, 4:25:30 AM1/29/13
to oqb...@foldr3.com
Regarding the first point (SiteCollection.expand()), it seems there is a
small change we can make in nhlib to avoid SiteCollection expansion
altogether.

Code baseline: https://github.com/gem/nhlib/tree/73064bc2cc1807632ab9d2ea346e708886fa743e
Patch: https://github.com/larsbutler/nhlib/commit/507ad2be3b3245502351088593bd2f04fb711bd3

In one particular case, this gave roughly a 5x speedup. Details and
files to follow.

Lars Butler

Jan 29, 2013, 4:26:44 AM1/29/13
to oqb...@foldr3.com
This is the script I used to profile the core hazard curve calculation.
It's the same script that was used in bug #1094297.

** Attachment added: "profile.py"
https://bugs.launchpad.net/openquake/+bug/1096915/+attachment/3505509/+files/profile.py

Lars Butler

Jan 29, 2013, 4:27:17 AM1/29/13
to oqb...@foldr3.com
Profiling, before optimization.

** Attachment added: "calc.prof"
https://bugs.launchpad.net/openquake/+bug/1096915/+attachment/3505510/+files/calc.prof

Lars Butler

Jan 29, 2013, 4:27:49 AM1/29/13
to oqb...@foldr3.com
Profiling, after optimization.

** Attachment added: "calc-opt.prof"
https://bugs.launchpad.net/openquake/+bug/1096915/+attachment/3505511/+files/calc-opt.prof

Lars Butler

Jan 29, 2013, 4:31:18 AM1/29/13
to oqb...@foldr3.com
New version of the patch, with a typo fixed:
https://github.com/larsbutler/nhlib/commit/74c9f261a56fff810767c64d72b345de73f6755f

Lars Butler

Jan 29, 2013, 5:47:34 AM1/29/13
to oqb...@foldr3.com
Update: although the patch gave a ~5x speedup in an isolated test, it
actually increased the overall computation time by about 340 seconds
when running the OQ Engine end-to-end.

I need to investigate further.

Lars Butler

Jan 30, 2013, 7:01:23 AM1/30/13
to oqb...@foldr3.com
Update:

After making the test inputs consistent (between the pure nhlib and full
oq-engine tests), it seems that further optimizing SiteCollection
expansion is intractable. I'm going to advise that we leave this alone
for now and revisit it in the future. Further attempts at optimization
may not be worth the effort.

Lars Butler

Jan 30, 2013, 7:19:27 AM1/30/13
to oqb...@foldr3.com
Clarification:

On the second point
(https://bugs.launchpad.net/openquake/+bug/1096915/comments/1) we were
able to make some small optimizations. Attacking the first point seems
to be a dead end.

Lars Butler

Jan 30, 2013, 11:40:32 AM1/30/13
to oqb...@foldr3.com
Pull request to avoid re-computation of the site collection:
https://github.com/gem/oq-engine/pull/1022
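
The pull request itself is not reproduced here. As a rough illustration of the general idea (build an expensive object once and reuse it), a cache-on-first-use sketch might look like the following. The class, its attributes and the stand-in "site collection" are invented for this example; only the method name `get_site_collection` comes from earlier in this thread, and this is not claimed to be how the PR does it.

```python
class HazardCalculation(object):
    """Hypothetical stand-in for a calculation object; not the oq-engine class."""

    def __init__(self, points):
        self.points = points
        self._site_collection = None  # filled in on first request
        self.builds = 0               # counts how often the expensive step runs

    def _build_site_collection(self):
        # Placeholder for the expensive construction step.
        self.builds += 1
        return tuple(sorted(self.points))

    def get_site_collection(self):
        # Build once, then hand back the cached object on every later call.
        if self._site_collection is None:
            self._site_collection = self._build_site_collection()
        return self._site_collection


calc = HazardCalculation([(1.0, 2.0), (0.0, 1.0)])
first = calc.get_site_collection()
second = calc.get_site_collection()
print(first is second, calc.builds)
```

A cache like this is only safe if the inputs do not change after the first call.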
