Participants in the discussion on the future of miniSEED
International data exchange in earthquake seismology has been effective over
decades because, among other reasons, SEED format has been mostly static.
There are places in SEED into which everything, albeit awkwardly in some
cases, has to fit. This creates a format that everybody may grouse about
equally, but lives within. As a result, there has been a remarkable level of
data sharing across networks. We were one of the early participants in the
design of miniSEED, and as a manufacturer, we have supplied equipment
embracing the advantages of a documented, common, and efficient format. It
has been gratifying to see seismology benefit so greatly over recent years,
helped along by the ability to share high-quality data. After such a long,
successful run, a few of the format’s capabilities need refreshing, but the
design remains sound.
There appear to be two main independent objectives in the present drive to
update miniSEED:
1. Extend representations of certain format elements, such as network
and location codes, to accommodate growing needs.
2. Drop some information now in the archive as defined entities in
favor of sanitizing the information permanently retained to fit an idealized
rendition of the data recorded by field equipment. As a by-product, the
extensible, documented “blockette” system would be replaced by “opaque”
data.
Point 1, it can be argued, is clearly needed, although whether it is necessary
to do a wholesale rewrite of MSEED handling software worldwide to accomplish
this goal is a worthy topic of discussion. These goals could be accommodated
within the existing format by defining new blockettes to
contain the extended identifiers. For example, reserved values could be used
for the existing network and location codes to indicate the presence of
extended identifiers. Such an approach would be forward and backward
compatible, and impose minimal changes on existing global infrastructure. I
understand some FDSN members have voiced a similar opinion that minimally
invasive changes could be developed that would address the requirements.
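As a rough illustration of the reserved-value idea, the following Python sketch reads the network code from a miniSEED 2.4 record and falls back to an extended identifier only when the fixed-header field holds a reserved value. The fixed-header offsets (network code at bytes 18-19, first-blockette offset at bytes 46-47) follow SEED 2.4. The reserved code "99", the blockette type 1002, its 8-character payload layout, big-endian byte order, and all helper names are assumptions made purely for illustration; none of them is defined by the published format.

```python
import struct

RESERVED_NETWORK = "99"    # hypothetical reserved value, not an FDSN assignment
EXT_BLOCKETTE_TYPE = 1002  # hypothetical blockette type for extended identifiers
EXT_NET_LEN = 8            # hypothetical width of the extended network code


def find_blockette(record, wanted):
    """Walk the blockette chain; return the offset of the wanted type or None."""
    # Bytes 46-47 of the fixed header hold the offset of the first blockette.
    offset = struct.unpack_from(">H", record, 46)[0]
    seen = set()  # guards against a corrupt chain that loops back on itself
    while offset and offset not in seen and offset + 4 <= len(record):
        seen.add(offset)
        # Every blockette starts with its type and the offset of the next one.
        btype, next_offset = struct.unpack_from(">HH", record, offset)
        if btype == wanted:
            return offset
        offset = next_offset
    return None


def network_code(record):
    """Return the extended network code if flagged, else the legacy code."""
    fixed = record[18:20].decode("ascii").strip()
    if fixed != RESERVED_NETWORK:
        return fixed  # legacy record: behaviour is unchanged
    offset = find_blockette(record, EXT_BLOCKETTE_TYPE)
    if offset is None:
        return fixed  # flagged but no extension present: report the raw code
    raw = record[offset + 4:offset + 4 + EXT_NET_LEN]
    return raw.decode("ascii").strip()


def make_record(net, ext=None):
    """Build a synthetic 64-byte record for demonstration only."""
    rec = bytearray(64)
    rec[18:20] = net.encode("ascii").ljust(2)
    if ext is not None:
        struct.pack_into(">H", rec, 46, 48)  # first blockette at byte 48
        struct.pack_into(">HH", rec, 48, EXT_BLOCKETTE_TYPE, 0)
        rec[52:52 + EXT_NET_LEN] = ext.encode("ascii").ljust(EXT_NET_LEN)
    return bytes(rec)


if __name__ == "__main__":
    print(network_code(make_record("IU")))
    print(network_code(make_record("99", "XNET2016")))
```

In this sketch a reader that does not know the new blockette type would still parse the record and simply report the reserved code, which is the sense in which such an extension could remain backward compatible.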
Point 2 is, as a matter of design philosophy, not a good idea. For an
archival format, as much information as possible about the recording
environment and the equipment should be maintained – and documented, not
filtered out – for potential use decades from now. Some of the proposals, in
the spirit of extensibility, would move some information that is now
fully enumerated in the published SEED format specification into opaque
headers – what might be called the information “gray market”.
The objective of Point 2 is essentially to strip the published format down
to some clean bones, and neither mandate nor even define data structures
that may be pertinent to only one class of equipment in the format’s
definition. This is a nice idea from a data center’s view, since all the
burden of interpreting any information that might have its formal
specification decommissioned would be pushed onto the user. It’s a bad idea
from the point of view of future integrity and maximum usefulness of the
archive, since “opaque” data is likely to be undocumented, poorly
documented, or even omitted altogether as data are passed from archive to
archive over time. A diversity of information should be supported, and
defined in the archival format. The solution to managing information that
may be important to interpretation or future harvesting is not to eliminate
the information, but to document it. For an analog, imagine WWSSN
seismograms that have no writing on the back. Some of the comments in email
threads appear to agree with the point that more information pertinent to
the recording environment, not less, is better in an archival format.
Of course changing the format in a non-backward-compatible way, as proposed
in the changes driven by Point 2, does risk blowing up a lot of things that
work now. Is it worth it? Ultimately all format definitions are arbitrary.
Much of what is being proposed is effectively an arbitrary rearrangement. If
this were 1988, the cost would be minimal. Now, frankly, to arbitrarily
change fundamental aspects of the design of what has been one of the most
successful collaborative undertakings in earthquake seismology seems at
least unnecessary, if not a wholly unproductive use of resources. Everyone’s
infrastructure will not be simplified, but complicated by the major
bifurcation in the format used to exchange data worldwide. Every tool will
have to support not one, but both formats. This will not necessarily make
things better, but it will make work. A measured approach to solve the
actual problems, such as inadequate namespaces for certain format elements,
might address the task in a simple, direct, and efficient way that does not
create an enduring burden.
In a spirit of collaboration, we have responded to a number of the proposed
specific points in the relevant email threads. In general, however, we are
opposed to a redesign that would result in non-backward compatibility. We
would support a working group, and would be happy to serve, to develop an
approach to incorporate necessary changes, while retaining as much backward
compatibility as possible.
Best Regards
Dr. Joseph M. Steim
President
Quanterra, Inc.
st...@quanterra.com
Thank you for your thoughtful comments and for your participation in the straw man based process so far. The feedback from equipment manufacturers and other data producers is very valuable.
Your Point 2 about dropping information, reforming it and relegating it to a so-called "gray market" is important. Swayed by your previous feedback and change proposals, we at the IRIS DMC created our own change proposal (#15) to transform the opaque data headers to optional data headers, where some are defined and some not. The intention is to provide a mechanism that both carries FDSN-defined flags/information and allows undefined flags to be inserted efficiently; in the end it would be more capable than blockettes and would address the opaque/gray market concern. This is the intended process with the straw man model, and demonstrates that it is working. If an interested party would like to retain the blockette structure, that too could be proposed and discussed.
The straw man should not be judged as a completed definition; it has not undergone even the first revision. I agree that arbitrary changes that deviate from past practices should be scrutinized carefully, and minimized, unless there are compelling reasons to change. Perhaps, for some, the initial straw man included too many changes from current usage, but I am optimistic that it will move in the direction of consensus. Trying to answer the overarching question of whether the end result would be worth the change cost is premature. We do not know what it is yet. We understand your position to be that any change that is not backwards compatible is not worth it, which is perfectly valid and will continue to influence the process.
The discussion so far has already gathered more input and thoughts, across a broad audience, on the future of miniSEED than I have ever experienced in FDSN exchanges. This has been mostly constructive and valuable, and will inform any future discussions.
Best regards,
Chad
EIDA Data Centers welcome and support the recent letter by J. Steim,
which is consistent with our message posted on July 8th
(http://www.fdsn.org/message-center/thread/413/#m-659).
As we have mentioned and detailed in various emails to this list, we
are also deeply concerned about the potential disruption and
deterioration of services to users due to non-backward compatibility
and the many changes. Therefore, our position remains that, in order to
optimize the process and the use of resources, before getting into the
individual items of the proposal we should get a better understanding of
what we really need and want from an extension to existing SEED, how
this can be designed, what the expected rollout plans are, and what the
implications will be for all users. Without having this clearly laid down,
it is difficult to understand and evaluate whether the changes we are
proposing are worth the effort they will imply throughout our community.
We appreciate the support by the FDSN Chair for a meeting in late 2016
and this is also clear from the ongoing discussion. The aim should be
not to discuss two alternative proposals but rather to discuss how we
can reach the goal of maintaining a widely accepted format that addresses,
as far as possible, the shortcomings of the current mini-SEED format, is
supported by a wide section of the community, and is actively embraced
by data centers and end users. The meeting we proposed should include an
extensive discussion on what we really need from an extended or new
format and how we get there with a commonly agreed strategy.
As stated in the initial strawman, the main driving motivation behind
this effort is the need to expand the network code to satisfy the
ever-growing number of demands: “Many FDSN members recognize that the current
two-character network code needs to expand. The miniSEED format is a
fixed length format and expanding the network code would render the
format incompatible with the current release. Such a small, but
disruptive change affords the opportunity to consider other changes to
the format, allowing the FDSN to address historical issues and create a
new foundation for current and future use.”
Therefore, we proposed a pragmatic, cost-effective way to solve this issue
immediately. Our proposal can also accommodate a number of other issues
mentioned in the strawman, as listed at the bottom of the present
e-mail [1].
Before moving forward with this process and iterations we would like to
invite everybody to carefully think about the general purpose of the
changes without being biased by the technical comments or change
proposals on the strawman. This can be done by setting up a dedicated
Working Group (as suggested by J. Steim) or in a dedicated meeting as we
proposed earlier. Indeed the dedicated meeting can be the fundamental
planning forum for this Working Group. In both cases the EIDA member
institutions are ready to actively contribute.
ORFEUS is ready to organize the meeting in Europe (possible location and
date will be communicated later), and travel costs for up to 5 or 6
participants from other continents can be covered/sponsored by ORFEUS or
by the hosting Institute in Europe. A tentative agenda can be posted
here and discussed within the next days. The intention is not to have
two competing proposals, but to discuss and jointly agree on the pathway to
the adoption and rollout of an extended or new standard that should not
be driven only by the urgent need for additional network codes.
Regards,
The ORFEUS/EIDA data centres
http://www.orfeus-eu.org/data/eida/nodes/
[1]
1. Expand the network code.
MS 2.5: Include expanded network code in b1002. Replace network code
in fixed header by "99" or another reserved code.
2. Add a miniSEED version field.
MS 2.5: Probably not needed, but can be included in b1002.
3. Add a data version field.
MS 2.5: Include data version field in b1002.
4. Move important Blockette details into fixed section of the header.
MS 2.5: Not applicable, MS 2.4 blockettes will be kept.
5. Simplify & improve the record start time.
MS 2.5: Not applicable, MS 2.4 time structure will be kept (microsecond
resolution is already supported by blockette 1001).
6. Combine and drop bit flags.
MS 2.5: Not applicable, MS 2.4 bit flags will be kept.
7. Eliminate the time correction field.
MS 2.5: Not applicable, MS 2.4 time correction field will be kept.
8. Forward compatibility mapping.
MS 2.5: Trivial -- since MS 2.5 is a superset of MS 2.4, any MS 2.4 file
is also an MS 2.5 file.
9. General compression and opaque data encodings.
MS 2.5: In MS 2.4, encodings 1..5 (general), 10..18 (FDSN networks) and
30..33 (older networks) are defined. Proposed new encodings 50, 51, 52
and 100 can be added, but should be used only in special cases when
compatibility is not an issue.
10. Add CRC field for validating integrity.
MS 2.5: Include CRC field in b1002. CRC should be calculated over the
entire record, with the CRC bytes assumed to be zero for purposes of the
calculation.
11. Expand the channel codes.
MS 2.5: Include expanded channel code in b1002. Replace channel code in
fixed header by a reserved value.
12. Expand the location identifier.
MS 2.5: Include expanded location identifier in b1002. Replace location
identifier in fixed header by a reserved value.
13. Fixed-point data sample encoding.
MS 2.5: See 9.
14. No SEED 2.4 blockettes, include support for opaque headers.
MS 2.5: Not applicable, MS 2.4 blockettes will be kept. Opaque headers,
though already supported by b2000, could be added to b1002 as well.
15. Eliminate sequence numbers.
MS 2.5: Not applicable, sequence numbers will be kept.
16. Eliminate the timing quality field.
MS 2.5: Not applicable, the timing quality field will be kept.
17. Variable record lengths.
MS 2.5: Not applicable. This is the only addition of MS3 that cannot be
implemented in MS 2.5. On the other hand, the proposal of variable
length records is rather controversial anyway and there are voices
against it.
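The CRC rule in item 10 (compute over the entire record with the CRC bytes treated as zero) can be sketched as follows. This is a minimal Python sketch, assuming a 32-bit CRC as computed by `zlib.crc32`, big-endian storage, and a caller-supplied byte offset for the CRC field; the list above fixes neither the polynomial nor the field position inside b1002, so all three are placeholders, and the function names are hypothetical.

```python
import struct
import zlib


def record_crc(record, crc_offset):
    """CRC-32 over the whole record with the 4 CRC bytes treated as zero."""
    scratch = bytearray(record)
    scratch[crc_offset:crc_offset + 4] = bytes(4)  # zero the CRC field
    return zlib.crc32(bytes(scratch)) & 0xFFFFFFFF


def stamp_crc(record, crc_offset):
    """Return a copy of the record with the CRC written into its field."""
    out = bytearray(record)
    struct.pack_into(">I", out, crc_offset, record_crc(record, crc_offset))
    return bytes(out)


def verify_crc(record, crc_offset):
    """Check the stored CRC against one recomputed with the field zeroed."""
    stored = struct.unpack_from(">I", record, crc_offset)[0]
    return stored == record_crc(record, crc_offset)


if __name__ == "__main__":
    record = bytes(range(64))         # synthetic stand-in for a real record
    stamped = stamp_crc(record, 56)   # offset 56 is an arbitrary example
    print(verify_crc(stamped, 56))
```

Because the field is zeroed before every calculation, stamping an already stamped record leaves it unchanged, and a reader can verify a record without knowing what the field held before.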
--
Dr. ANGELO STROLLO
Department 2 Geophysics
Section 2.4 Seismology - GEOFON
Tel.: +49 (0)331/2881285
Mob.: +49 (0)172/8590874
Fax : +49 (0)331/2881277
Email: str...@gfz-potsdam.de
_______________________________________
Helmholtz Centre Potsdam
GFZ German Research Centre For Geosciences
Public Law Foundation State of Brandenburg
Telegrafenberg, 14473 Potsdam
House A3 Room 207
http://geofon.gfz-potsdam.de/
Thanks for including me on these e-mail exchanges. As a long-time producer
and intense user of SEED data, and having participated in the
discussions that led to the birth of SEED/mini-SEED as a global data
exchange format for broadband seismology within the FDSN 30 years ago, I
recognize that SEED may seem very clumsy given the evolution of computer
languages and in particular object-oriented coding, and that it has other
shortcomings such as the two-letter network code limitation. Some
improvements/enhancements should certainly be considered.
When SEED was designed, the focus was to standardize a format that would
contain all the necessary and accurate information to fully understand the
data, for the benefit of high-quality science. It went along with an effort
to develop standards for the quality of the broadband instrumentation,
which are also still relevant.
If the format under discussion were for a completely new type of data
acquired for different purposes than the original purpose of SEED, only
then would it be justified to "start from scratch". Why break it if it
works so well?
I particularly wish to support two of the points made by Joe Steim.
1- Because it is now so widely and effectively used, and serves the purpose
of the users that depend on the data for their research, any changes going
forward MUST be backward compatible. Otherwise, this will create havoc in
the user community, that could halt progress in funded science by several
months if not a year for a large international community, for no compelling
reason. That translates into a huge amount of unnecessary frustration, as
well as substantial financial costs.
2- I am really alarmed at any suggestion of reducing the amount of
information to be included in the metadata. Time and again, someone
discovers a problem with some older data, that can only be understood if
one digs deep into the metadata, and if the information is there, the data
are still useful. Should we then be throwing away such data, that may have
great value because they uniquely correspond to some original
source-station path, or an event that was not previously considered
"interesting"?
Surely, there must be ways to address some of the shortcomings of SEED
without discarding information, and in a backward compatible way!
Regards
Barbara Romanowicz
Thanks to everyone who provided feedback regarding a new version of miniSEED. We think this was very valuable and will help inform any process moving forward. This feedback included both support for and resistance to the concept. Due to the lack of support, we are not going to continue with our current approach. We do believe strongly that the current miniSEED needs to be looked at closely so that we can continue making it a viable format as we move forward. The current version has many weaknesses that need to be addressed directly rather than worked around. We encourage WG II to consider an alternative to what we have been promoting for discussion at the next FDSN meetings in Kobe.
Cheers and thanks
Tim Ahern
Director of Data Services
IRIS
IRIS DMC
1408 NE 45th Street #201
Seattle, WA 98105