finding eventid in quakeml from fdsn-event ws


Philip Crotwell

Mar 1, 2017, 10:45:17 PM
to fdsn-wg3...@fdsn.org
Hi all

A common access pattern is to first do an initial wide but shallow
query, and then return to do a deep but narrow query. For example
asking an fdsn-station ws for stations in a box, displaying those on a
map, and then only going back to ask for channels or response for
stations as the user clicks on them. Another example would be to query
an fdsn-event web service for earthquakes, and then return using
things like includeallorigins=true and includearrivals=true to get
more detailed information for a specific earthquake. I feel the
combination of the fdsn-event ws query parameters and the quakeml xml
specification currently makes this harder than it should be: while the
fdsn-event ws has a query based on eventid, there is no standard way
to put the eventid into the original quakeml.
QuakeML has a publicID for each event, but the structure of this is
complicated enough that it is challenging to parse in a way that
reliably extracts the value that should be returned to the service as
eventid.

I have collected example <event> elements from all of the fdsn-event
web services currently listed on
http://www.fdsn.org/webservices/datacenters/
and as you can see there is quite a variety of ways of including the
eventid, in publicID and elsewhere, which makes it harder for clients
as they have to have code that says if (host == USGS) { do this; }
else if (host == ETHZ) { do that; }
which is hard to maintain and fragile.

While this is likely not a big enough issue to warrant a revision
of the web services spec, it would be really nice if, until the next
revision, there could be a consensus on how to provide the eventid.
And when the next revision is created, this could be made mandatory
and standardized.

I think I would prefer something simple like the USGS, NCEDC and SCEDC
style where there is a simple attribute that gives the eventid
exactly without parsing, like catalog:eventid="71377596", but of
course the drawback is that currently this is a separate schema
definition from the quakeml standard. Using the publicID would be
better because it is already part of the quakeml spec, but the
format of the URI is too varied and complicated at present for easy
parsing.

Another solution would be to allow the entire publicID to be returned
via the eventid parameter. This would require the server to be able to
parse its own style of publicID, which seems reasonable. However the
structure of the publicID may also cause problems as it looks like a
URL and so would require escaping/encoding of certain characters. Yet
another solution would be to use text format for the wide but shallow
query and then use quakeml for the deep, but this has the downside of
requiring the client to parse two unrelated data formats.
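
To illustrate the text-format option, here is a minimal Python sketch. It assumes the text output is pipe-separated, with a header row starting with "#" and the event identifier in the first column, and it assumes that first column is the value the service accepts as eventid. The header below is shortened and the data rows are made up for the example.

```python
def eventids_from_text(body):
    """Return the first column of every data row in a pipe-separated listing."""
    ids = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header row
        ids.append(line.split("|", 1)[0].strip())
    return ids


# Made-up rows for illustration only; not real catalog data.
SAMPLE = """#EventID | Time | Latitude | Longitude | Depth/km | Magnitude
3337497|2012-01-01T00:00:00|0.0|0.0|10.0|5.0
600516598|2012-01-02T00:00:00|1.0|1.0|10.0|5.5
"""

event_ids = eventids_from_text(SAMPLE)
```

The client would still need a separate quakeml parser for the deep query, which is the downside mentioned above.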

thanks
Philip


IRIS
<event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">

NCEDC
<event publicID="quakeml:nc.anss.org/Event/NC/71377596"
catalog:datasource="nc" catalog:dataid="nc71377596"
catalog:eventsource="nc" catalog:eventid="71377596">


SCEDC
<event publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
catalog:datasource="ci" catalog:dataid="ci37300872"
catalog:eventsource="ci" catalog:eventid="37300872">


USGS
<event catalog:datasource="us" catalog:eventsource="us"
catalog:eventid="c000lvb5"
publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">

ETHZ
<event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">

INGV
<event publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">

ISC
<event publicID="smi:ISC/evid=600516598">
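
For reference, pulling these values out of a response could look like the Python sketch below. The two namespace URIs are assumptions on my part and should be checked against what a given service actually returns; the wrapper document and the "smi:example/params" identifier are made up, while the two event elements are copied from the examples above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against real service output.
BED_NS = "http://quakeml.org/xmlns/bed/1.2"
CATALOG_NS = "http://anss.org/xmlns/catalog/0.1"


def event_identifiers(quakeml_text):
    """Yield (publicID, catalog:eventsource, catalog:eventid) per <event>.

    The two catalog values are None when the extension attributes are absent.
    """
    root = ET.fromstring(quakeml_text)
    for event in root.iter("{%s}event" % BED_NS):
        yield (
            event.get("publicID"),
            event.get("{%s}eventsource" % CATALOG_NS),
            event.get("{%s}eventid" % CATALOG_NS),
        )


SAMPLE = """<q:quakeml xmlns:q="http://quakeml.org/xmlns/quakeml/1.2"
    xmlns="http://quakeml.org/xmlns/bed/1.2"
    xmlns:catalog="http://anss.org/xmlns/catalog/0.1">
  <eventParameters publicID="smi:example/params">
    <event publicID="quakeml:nc.anss.org/Event/NC/71377596"
        catalog:datasource="nc" catalog:dataid="nc71377596"
        catalog:eventsource="nc" catalog:eventid="71377596"/>
    <event publicID="smi:ch.ethz.sed/sc3a/2017eemfch"/>
  </eventParameters>
</q:quakeml>"""

identifiers = list(event_identifiers(SAMPLE))
```

Extracting the attributes is the easy part; the client still has no uniform rule for which of them to send back as eventid.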

Jeremy Fee

Mar 3, 2017, 11:51:17 PM
to fdsn-wg3...@fdsn.org
Hello,

> in the QuakeML world, entities (events, origins, picks) are identified
> through the publicID, and *only* through the publicID. The publicID has
> been designed in a way that makes it easy to be globally unique (authority
> part, then resource part that is in the hands of the issuing agency which
> ensures uniqueness). The "legacy" event IDs that are used in some
> earthquake catalogs (often just integer numbers) cannot be unique;
> collisions are likely to occur. When designing QuakeML there was a long
> discussion whether legacy IDs should be part of the data model, and there
> was a consensus that they shouldn't, first because their usage should be
> discouraged (non-uniqueness, not being future-proof, etc), and there was
> also no semantically convincing place in the schema to put them.


USGS handled this by defining an eventsource (typically FDSN network code)
as a "namespace" for eventids, which eliminates collisions and allows
contributors to continue using the existing IDs for events without
requiring yet another eventid system. We commented on this early in the
QuakeML 1.2 process (see previous email to the quakeml mailing list below)
and implemented a custom extension to QuakeML to support our requirements (
https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatalog-0.1.xsd
) while remaining compatible with the original specification.

USGS requirements may differ from other organizations, because we aggregate
multiple earthquake catalogs from many contributors into a single
"composite" catalog. We consider an event to have multiple IDs, one unique
id from each contributor (USGS included), and allow events to be referenced
using any of those IDs. Messages from multiple contributors are associated
based on location in space and time, and automatic associations can be
manually overridden when needed. This balances the requirements for
individual organizations to a) assign a unique identifier and maintain a
catalog of events and b) operate independently of any central authority.


> The fdsnws-event standard says when it comes to the eventid query
> parameter: "event identifiers are data center specific". It seems that most
> implementations expect the legacy ID, not the publicID of the event (in
> fact, in your examples, this holds for all data centers except for ETH).
> Thanks for pointing this out! I was totally unaware of the fact.


> In my opinion this is a serious specification and implementation flaw. In
> the next version of the event service spec it should be mandatory that
> eventid is the publicID of the event. All services should be queried in the
> same way, but, e.g., for ETH it is not possible, because there are no
> legacy IDs. In the current situation, the user has to know which service
> requires legacy ID, and which service requires publicID. In addition, the
> legacy ID is not per se contained in the returned QuakeML document, as it
> is not contained in the standard. This makes it hard to find the legacy ID
> if it exists at all (can be hidden in the publicID, or in an extension
> attribute which depends on the data center).
>


> Furthermore, QuakeML publicIDs are designed to be opaque. They may be
> compiled from other pieces of information, like timestamps, legacy IDs,
> etc., but they need not be; they can just be random strings (the resource
> part). Therefore, no user or service should rely on parsing publicIDs.


If you want to add support to query using public IDs, I recommend
defining a new "publicID" parameter for the fdsn service and leaving the
existing eventid parameter unchanged for backward compatibility. An
additional consideration is how a service should handle multiple versions
of the same event element (assuming ordering based on
event/creationInfo/creationTime). It may be simpler for an explicit
"detailURL" or similar attribute to be added to the event element to
support the explicit use case to obtain more information.
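
As a rough illustration of the versioning question, the Python sketch below picks the most recent of several event elements by creationInfo/creationTime. It assumes the QuakeML BED 1.2 namespace URI shown and that all timestamps share one UTC ISO 8601 layout, so that plain string comparison orders them correctly. The sample document and its publicID are made up.

```python
import xml.etree.ElementTree as ET

NS = {"bed": "http://quakeml.org/xmlns/bed/1.2"}  # assumed namespace URI


def creation_time(event):
    """Return the creationTime text of an <event>, or "" if it is missing."""
    return event.findtext(
        "bed:creationInfo/bed:creationTime", default="", namespaces=NS
    )


def latest_version(events):
    """Pick the event element with the newest creationTime."""
    return max(events, key=creation_time)


# Hypothetical document holding two versions of the same event.
SAMPLE = """<eventParameters xmlns="http://quakeml.org/xmlns/bed/1.2">
  <event publicID="smi:example.org/event/1">
    <creationInfo><creationTime>2017-03-01T10:00:00Z</creationTime></creationInfo>
  </event>
  <event publicID="smi:example.org/event/1">
    <creationInfo><creationTime>2017-03-02T08:30:00Z</creationTime></creationInfo>
  </event>
</eventParameters>"""

events = ET.fromstring(SAMPLE).findall("bed:event", NS)
newest = latest_version(events)
```

Elements without a creationTime sort first here; a service would have to decide whether that is the right behaviour.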

We introduced our custom extension to support these requirements of being
able to uniquely identify events, and individual pieces of information
being contributed to those events, because of the suggestion (see below)
that there may be aliases for publicIDs and they would not be guaranteed to
be universally unique. I suggest that, rather than adding to or changing
the meaning of an otherwise opaque and non-unique identifier, explicit
attributes or elements be created for these purposes (or that the
AnssCatalog extension be more widely adopted).


Thanks,

Jeremy


Previous email to quakeml mailing list (couldn't find the list archives
online):

> From: Fabian Euchner <fabian....@sed.ethz.ch>
> Date: March 23, 2011 9:35:11 AM MDT
> To: Jeremy M Fee <jm...@usgs.gov>
> Cc: <Qua...@intensity.usc.edu>, <kfe...@gps.caltech.edu>, Michelle Guy <mg...@usgs.gov>
> Subject: Re: Fwd: [QuakeML] question about authority-id and resource-id in public identifiers
>
> Hi Jeremy et al.,
>
> the section on resource identifiers in the standard doc is based on some,
> up to now rather theoretical, thoughts on how a resource metadata
> framework could look like. This is very much inspired (I could also say
> borrowed) from how this is handled in the Astrophysical Virtual
> Observatory community ;-) To be honest, I don't know how agencies that
> already use QuakeML interpret & implement it and you are of course right
> that it's not specified in detail in the standard doc. Since it is a
> standard doc on the markup language, I think it's not the right place to
> specify it, there should be a second document that is more focused on
> infrastructure. *I think it's pretty clear that an identifier cannot refer
> to two different resources, but I could imagine that a resource can be
> referenced by more than one identifier from the same authority (aliases).*
> Thanks for starting this discussion, it's an important point if QuakeML
> starts to play a more important role in our networked infrastructures.
>
> Cheers,
> Fabian
>
> On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:
> > Hi Fabian,
> >
> > I received a reply from Karen at CalTech, but I'd like to know if
> > these assumptions are also safe across all QuakeML implementations:
> > 1) An authority always refers to the same resource using the same
> > resourceID.
> > 2) When an authority updates a resource, the same resourceID is used
> >
> > And as a result of the previous two assumptions:
> > 3) When one authority submits two different event resourceIDs, they
> > refer to different events.
> >
> > I've read the QuakeML-BED.pdf for version 1.1, and cannot find
> > anything imposing this restriction. At USGS we rely on this to 1)
> > track updates to existing events, and 2) distinguish events that are
> > so close in space and time they would otherwise be considered the
> > same event.
> >
> > Thanks,
> >
> > Jeremy
> >
> > Begin forwarded message:
> >> From: Karen Felzer <kfe...@gps.caltech.edu>
> >> Date: March 17, 2011 4:23:32 PM MDT
> >> To: Jeremy M Fee <jm...@usgs.gov>
> >> Cc: qua...@intensity.usc.edu
> >> Subject: Re: [QuakeML] question about authority-id and resource-id
> >> in public identifiers
> >>
> >> Yes -- information for the same earthquake should always be reported
> >> under the same earthquake ID number.
> >>
> >> regards,
> >> Karen Felzer
> >>
> >> On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:
> >>> Hi,
> >>>
> >>> Is it a safe assumption that an authority will always refer to the
> >>> same resource using the same resource id? Meaning, if an authority
> >>> submits an event under one resource id, they will always reuse that
> >>> same resource id when updating event information (and identify
> >>> version information separately)? This would make it much easier to
> >>> recognize updates, versus new information.
> >>>
> >>> Thanks,
> >>>
> >>> Jeremy



Fabian Euchner

Mar 4, 2017, 2:49:07 AM
to fdsn-wg3...@fdsn.org
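Hi Philip, hi all,

in the QuakeML world, entities (events, origins, picks) are identified
through the publicID, and *only* through the publicID. The publicID has
been designed in a way that makes it easy to be globally unique (authority
part, then resource part that is in the hands of the issuing agency which
ensures uniqueness). The "legacy" event IDs that are used in some
earthquake catalogs (often just integer numbers) cannot be unique;
collisions are likely to occur. When designing QuakeML there was a long
discussion whether legacy IDs should be part of the data model, and there
was a consensus that they shouldn't, first because their usage should be
discouraged (non-uniqueness, not being future-proof, etc), and there was
also no semantically convincing place in the schema to put them.

The fdsnws-event standard says when it comes to the eventid query
parameter: "event identifiers are data center specific". It seems that most
implementations expect the legacy ID, not the publicID of the event (in
fact, in your examples, this holds for all data centers except for ETH).
Thanks for pointing this out! I was totally unaware of the fact.

In my opinion this is a serious specification and implementation flaw. In
the next version of the event service spec it should be mandatory that
eventid is the publicID of the event. All services should be queried in the
same way, but, e.g., for ETH it is not possible, because there are no
legacy IDs. In the current situation, the user has to know which service
requires legacy ID, and which service requires publicID. In addition, the
legacy ID is not per se contained in the returned QuakeML document, as it
is not contained in the standard. This makes it hard to find the legacy ID
if it exists at all (can be hidden in the publicID, or in an extension
attribute which depends on the data center).

Furthermore, QuakeML publicIDs are designed to be opaque. They may be
compiled from other pieces of information, like timestamps, legacy IDs,
etc., but they need not be; they can just be random strings (the resource
part). Therefore, no user or service should rely on parsing publicIDs.

Thanks again, Philip, for bringing up this important issue.

Best regards,
Fabian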

Fabian Euchner

Mar 4, 2017, 8:47:06 AM
to fdsn-wg3...@fdsn.org
Hello Jeremy, hello all,

first, let me apologize if somebody found my comment too harsh or offensive. That was
absolutely not my intention.

Since the fdsnws-event default output format is QuakeML, I assume that the QuakeML
data model is the common minimum standard data model. Since legacy IDs are not
contained therein, I think using them as a query parameter should not be the common
standard way to query individual event information. Therefore, I would suggest that a
next iteration of the event service specification defines a new query parameter, maybe
called eventpublicid, that is implemented by all data centers to query on event publicIDs,
which are mandatory in all result QuakeML documents. If some data centers want to
additionally provide a query parameter for legacy IDs, that's fine for me. Every user
querying based on this has to know what she/he does, and how to deal with results.

All the best,
Fabian



Philip Crotwell

Mar 7, 2017, 2:52:55 AM
to fdsn-wg3...@fdsn.org
Hi

The main question I have as a client writer is how do I get from a
general fdsn event query, with many events, to a detailed query for a
single event without server-specific code. As best I can figure out,
there is no simple answer now.

The best I can come up with is this algorithm. I presume everyone
agrees this is needlessly complicated and, as there is no good
default action, it cannot handle a new fdsn event web service
without rewriting the code.

1) if (IRIS or INGV):
     use publicID as full URL after replacing "smi:" with "http://"
2) if (USGS or SCEDC):
     use publicID as full URL after replacing "quakeml:" with "http://"
3) if (NCEDC):
     use catalog:eventid as eventid parameter
4) if (ETHZ):
     use entire publicID (including smi:) as eventid parameter
5) if (ISC):
     parse publicID as a URL and use the value of the evid parameter as
     eventid parameter
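
Written out as a Python sketch, the five cases above look roughly like this. The host labels are names the client would have to assign itself, and the function names are made up. For ISC the example publicID has no "?", so rather than real URL parsing the sketch simply takes whatever follows "evid=".

```python
def swap_scheme(public_id, prefix):
    """Turn a publicID into a URL by replacing its prefix with "http://"."""
    if not public_id.startswith(prefix):
        raise ValueError("publicID %r does not start with %r" % (public_id, prefix))
    return "http://" + public_id[len(prefix):]


def detail_request(host, public_id, catalog_eventid=None):
    """Return ("url", full_url) or ("eventid", value) for one event."""
    if host in ("IRIS", "INGV"):
        return ("url", swap_scheme(public_id, "smi:"))
    if host in ("USGS", "SCEDC"):
        return ("url", swap_scheme(public_id, "quakeml:"))
    if host == "NCEDC":
        return ("eventid", catalog_eventid)
    if host == "ETHZ":
        return ("eventid", public_id)
    if host == "ISC":
        # Take the text after "evid=" in e.g. "smi:ISC/evid=600516598".
        return ("eventid", public_id.split("evid=", 1)[1])
    # No sensible default: an unknown service needs a new branch.
    raise ValueError("no rule for service %r" % host)
```

The final raise is the problem in a nutshell: every new service means another branch.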


One further note is that although the USGS, SCEDC and NCEDC appear to
use the same anss "catalog" quakeml extension, they interpret the
fdsnevent eventid parameter differently. The USGS requires the
concatenation of catalog:eventsource and catalog:eventid while NCEDC
and SCEDC both accept only catalog:eventid as the eventid parameter.
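
In code, that difference amounts to something like the following Python sketch (the function name and host labels are made up; the values come from the example elements in my first message):

```python
def eventid_parameter(host, eventsource, eventid):
    """Build the eventid query value from the ANSS catalog attributes."""
    if host == "USGS":
        # USGS wants eventsource and eventid joined, e.g. "us" + "c000lvb5".
        return eventsource + eventid
    if host in ("NCEDC", "SCEDC"):
        # NCEDC and SCEDC take catalog:eventid on its own.
        return eventid
    raise ValueError("no rule for service %r" % host)


usgs_id = eventid_parameter("USGS", "us", "c000lvb5")
ncedc_id = eventid_parameter("NCEDC", "nc", "71377596")
```

So even with the same extension attributes present, the client still needs to know which service it is talking to.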

What a client needs is to be able to use a single value from a quakeml
event as the eventid. Theoretically, the publicID appears to be
that value, but as a practical matter it only works for
one out of the seven services. And the publicID as it is currently
specified is not friendly to being used as a URL parameter as it is
possible (and very common) to have it include the '&' character.
Without escaping that char, the resulting URL will be wrong. IMHO, a
friendly eventid value really should not require processing in order
to be added to a URL.
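
To make the '&' problem concrete, here is a small Python sketch using the USGS publicID from my first message. The base URL is made up, and a service that accepts a full publicID as eventid is hypothetical at this point.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://service.example.org/fdsnws/event/1/query"  # hypothetical service
PUBLIC_ID = (
    "quakeml:earthquake.usgs.gov/fdsnws/event/1/query"
    "?eventid=usc000lvb5&format=quakeml"
)


def query_params(url):
    """Parse the query string of a URL the way a server typically would."""
    return parse_qs(urlsplit(url).query)


# Naive concatenation: the '&' inside the publicID starts a new parameter.
naive = query_params(BASE + "?eventid=" + PUBLIC_ID)

# Percent-encoding the value keeps the publicID intact.
escaped = query_params(BASE + "?" + urlencode({"eventid": PUBLIC_ID}))
```

With the naive URL the server sees a truncated eventid plus a stray format parameter; only the escaped form round-trips the whole publicID.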

I feel there are two questions. First, can there be a recommendation
as to what a current fdsn event web service should accept as
the eventid parameter? Second, if there is a revision of the spec,
what should we change to make this easier?

Absent a more specific publicID format, I don't see a good option that
doesn't require almost everyone to make server changes. Perhaps
accepting the full publicID as the eventid, in addition to whatever
the current implementation accepts, is the least bad?

As to the longer term, perhaps adding a "publicid=" parameter to the
fdsn event query is the clearest and most direct solution. But I still
feel that existing publicIDs are too verbose and unfriendly for use in
URLs. Perhaps some of this could be addressed in quakeml 2.0, both
by making the structure of the publicID cleaner or simpler and by an
explicit mapping from quakeml event parameters to a url?

thanks
Philip

