Template for making proposed change to the miniSEED DEADLINE FOR COMMENTS MAY 31, 2016

Tim Ahern

May 10, 2016, 8:04:06 PM

to fdsn-w...@fdsn.org

Comments Must Be Submitted to WG II list (fdsn-w...@lists.fdsn.org) by May 31, 2016

I apologize for the delay in getting this information out to everyone.

I am attaching the following items
1) The agenda for our meeting in Vienna
2) The next generation of miniSEED document - this is the straw man and is currently Version 2016-3-30
3) The rationale for the recommended changes
4) The process proposed at the EGU meeting, which is the process we will follow
5) The Excel template in which you must provide your comments and suggestions
(Note we have added a rationale for each of your comments in this version of the spreadsheet)

To remind you of the process
A. All feedback on this version of the straw man (2016-3-30) must be sent to the FDSN WGII list within 3 weeks
B. An editorial board (Angelo Strollo, Chad Trabant, Reinoud Sleeman, Tim Ahern) will review the submissions and produce a new version of the strawman
C. The new straw man version will be posted again to FDSN WG II.
D. Steps A-B-C repeat as the straw man evolves or until no new comments are received. A maximum of 4 iterations is anticipated.

So please provide your comments in the attached Excel spreadsheet and return to FDSN WG II and also to all recipients of this email. I encourage everyone to make sure they are members of WGII for future comments and contact for the FDSN.

Again my apologies for the delay.
Cheers,

Tim Ahern

Director of Data Services
IRIS

IRIS DMC
1408 NE 45th Street #201
Seattle, WA 98105

(206)547-0393 x118
(206) 547-1093 FAX

1. Agenda-miniSeed-SOH.pdf

2. Next-Generation-miniSEED-Strawman-ver.pdf

3. Rationale - 2016-4-1.pdf

4. TheMiniSEEDProcess.pptx

TemplateForComments-V2.xlsx

Philip Crotwell

May 11, 2016, 8:42:33 PM

to fdsn-w...@fdsn.org

Hi Tim

I was interested in making comments, but am having trouble with the
standardization of the input.

First, because the strawman is a PDF, copying from it and then pasting
into anything loses all separations between words. For example pasting
results in things like this:
Locationidentifier. Usedtoidentifyagroupingofchannels,forexamplefromaspecific

Secondly, and perhaps this is just because I do not use excel, but I
am unable to paste (or input) multiline text into your template and
have the text stay in a single field, ie the new line causes following
lines to fill down the column. Unfortunately I am too old to learn new
skills, so excel remains a mystery to me. :(

I agree that having comments show both existing and proposed language
is a really good idea, but the mechanics feel a bit limiting. It might
be helpful to resend the strawman as plain text for ease of copy
paste? And would it be possible to allow submitting comments in plain
text so long as they follow the structure of the template, ie
something like this?

thanks
Philip

Commenting on document version #:
2016-3-30

Topic:
Do something big or small

Type of Action:
M

Current Wording from document:
Location identifier. Used to identify a grouping of channels, for
example from a specific

New Wording:
Location identifier. Used to identify a big or small grouping of
channels, for example from a specific

Rationale:
Big and small is good.

Author:
Philip Crotwell
Univ. of South Carolina
crot...@seis.sc.edu

Date of Comment:
2016-05-11

> ----------------------
> FDSN Working Group II
> (http://www.fdsn.org/message-center/topic/fdsn-wg2-data/)
>
> Sent via IRIS Message Center (http://www.fdsn.org/message-center/)
> Update subscription preferences at http://www.fdsn.org/account/profile/
>

Chad Trabant

May 12, 2016, 7:09:03 PM

to fdsn-w...@fdsn.org

Hi all,

To help with the cut and paste issue that Philip raises, attached is a plain text version of the straw man version 2016-3-30 document.

Regarding the entry of long lines of text, the cells are set to wrap lines so you should be able to continue typing as needed. To insert newlines to break paragraphs in the spreadsheet cells, either use Alt+Enter (Windows), Cmd+Option+Enter (Mac) or whatever your platform needs; alternatively you can insert ^P and the editors will understand it to be a paragraph break.

regards,
Chad

Next generation miniSEED
Version 2016-3-30

Background and context
Adopted by the FDSN in 1987, the SEED format has become and still serves as the canonical format for passive source seismic (and other) data. Data exchange, especially to end users, is commonly formatted as “full” SEED, which contains both the time series and complete metadata. For continuous data collection and archiving it is common to split the time series from the metadata. Extensions to the SEED format were adopted in 1992 to define miniSEED, the time series portion of SEED, which can be decoded independently of the supporting metadata.

Many FDSN members recognize that the current two-character network code needs to expand. Such an expansion requires changes in both the metadata and time series components of the SEED format. With the adoption of StationXML by the FDSN, the metadata component is easily adjusted due to the extensibility of XML. The time series component, miniSEED, is a fixed-length field format and expanding the network code would render the format incompatible with the current release. Such a small, but disruptive change affords the opportunity to consider other changes to the format, allowing the FDSN to address historical issues and create a new foundation for current and future use.

miniSEED 3, important changes
* Expand the network code field, in coordination with equivalent StationXML changes.
* Recommendation: 6 characters.
* Suggested convention for temporary deployments: “xxxxYY”, where ‘x’ are unique identifiers and YY
are the last two digits of the start year of the deployment, e.g. 16 for 2016. Temporary network codes
will still begin with the letters X, Y, Z, or a numeral from 0-9.
* Add a miniseed format version.
* Add a data version.
* Move most blockette 100, 1000 & 1001 field information (actual sample rate, byte order, record length, encoding,
microseconds) into the fixed section of the data header.
* Simplify the record start time encoding and include microsecond resolution.
* Combine the 3 bit-flag fields in fixed section data header to a single byte, dropping rarely used flags.
* Eliminate timing correction field, timing corrections must be applied to the time stamp.
* Document forward compatibility mapping, how to convert miniSEED 2.4 to version 3.
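
The suggested “xxxxYY” convention for temporary deployments can be illustrated with a short sketch. The function names are hypothetical, and the allowed character set (upper-case ASCII letters and digits) is an assumption, since the straw man does not define it.

```python
import re

# Assumption: identifier characters are upper-case ASCII letters or digits.
# The straw man only says temporary codes begin with X, Y, Z or a numeral 0-9
# and end with the last two digits of the deployment start year.
TEMPORARY_CODE = re.compile(r"[XYZ0-9][A-Z0-9]{3}[0-9]{2}")


def make_temporary_code(identifier: str, start_year: int) -> str:
    """Build a 6-character 'xxxxYY' code from a 4-character identifier."""
    code = f"{identifier}{start_year % 100:02d}"
    if TEMPORARY_CODE.fullmatch(code) is None:
        raise ValueError(f"not a valid temporary network code: {code!r}")
    return code


def is_temporary_code(code: str) -> bool:
    """Return True if the code follows the suggested temporary convention."""
    return TEMPORARY_CODE.fullmatch(code) is not None


if __name__ == "__main__":
    print(make_temporary_code("XA01", 2016))
    print(is_temporary_code("IU"))
```

With this layout the start year travels with the code itself, which is the ambiguity the convention is meant to remove.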

miniSEED 3, changes for consideration
* General compression encodings for fundamental sample types and opaque data
Encoding 50: 32-bit integers, general compressor (e.g. Brotli)
Encoding 51: 32-bit IEEE floats, general compressor (e.g. Brotli)
Encoding 52: 64-bit IEEE floats (doubles), general compressor (e.g. Brotli)
Encoding 100: Opaque data
* Add CRC field for validating integrity of data payload.
* Expand channel codes, identify more instrument types and potential combination with location.
* Expand location identifier and disallow empty values (synonymous with all other series identifiers).
* Fixed-point sample encoding; would need to determine a representation due to lack of standard.
* No SEED 2.x blockettes allowed, instead allow opaque headers for arbitrary information.
* Eliminate fixed-length field for sequence numbers. Alternatives: transport protocol or opaque headers.
* Eliminate arbitrary % timing quality field, timing quality related bit flags remain. Further timing
qualifiers can be in an opaque header or separate channel if needed.

Below is a straw man miniSEED Fixed Section Data Header incorporating most of the concepts above.
Considerations and adjustments for byte alignment should be made after the fields have been settled.

Straw man miniSEED 3 Fixed Section Data Header

The data record starts at the first byte. The next two bytes are ‘MS’ to indicate the format, followed by a single binary digit indicating the format version. The fixed section of the header may be followed by optional, opaque header values. The total length of the record is the length of the fixed section, plus the length of any opaque headers, plus the length of the data payload. No padding is allowed before, after or between any of the sections.

Note  Field name                            Type  Length  Offset  Mask/Flags
1     Record indicator (‘MS’)               A     2       -
2     miniSEED version (3)                  B     1       -
3     Network code                          A     ?       -       [UN]
4     Station code                          A     5       -       [UN]
5     Location identifier                   A     ?       -       [UN]
6     Channel codes                         A     ?       -       [UN]
7     Quality indicator                     A     1       -       [UN]
8     Data version                          B     1       -
9     Record length                         B     4       -
10    Record start time                     B     8       -
11    Number of samples                     B     4       -
12    Sample rate                           B     4       -
13    CRC-32 of data                        B     4       -
14    Offset to data                        B     2       -
15    Flags                                 B     1       -
16    Sample encoding format                B     1       -
17    Number of opaque headers that follow  B     1       -
18    Opaque header fields                  V     V       -

Notes for fields, all fields are mandatory:

1 Data record indicator - “MS”.

2 UBYTE: miniSEED header version. Set to 3 for this version.

3 Network code. A code that uniquely identifies the network operator responsible for the data. This identifier is assigned by the FDSN. Left justify and pad with spaces (ASCII 32). Cannot be empty.

4 Station code (see Appendix G). Left justify and pad with spaces (ASCII 32). Cannot be empty.

5 Location identifier. Used to identify a grouping of channels, for example from a specific sensor. Left justify and pad with spaces (ASCII 32). Cannot be empty.

6 Channel codes (see Appendix A). Cannot be empty.

7 Quality indicator. Defined values: D (unknown), R (Raw), Q (Quality controlled), M (merged/modified).

8 UBYTE: Data version. Start with version 1 and increase for later versions.

9 ULONG: The record length in bytes.

10 LONGLONG (64-bit signed integer): Start time of record, time of the first data sample. As a representation of UTC, this value is encoded as the number of microseconds since midnight 1 January 1970 UTC not including leap seconds. This is a microsecond version of Unix/POSIX time as defined by IEEE Std 1003.1, 2013 Edition (POSIX.1-2008). The mapping between separate components of a UTC time (seconds, minutes, hours, etc.) and this representation is documented in Section 4.15 of IEEE Std 1003.1, 2013 Edition, which is then scaled by 1E6 and microseconds are added to result in this representation. This time scale is continuous except for the occurrence of leap seconds, whether this value is a leap second or not is defined by bit 2 of the Flags field. When calculating time within a record, bits 2 and 3 of the Flags field should also be consulted to determine if leap seconds occurred during the record.
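
As an illustration of note 10, here is a minimal sketch of the proposed start time encoding. The function names are made up for this example, and the byte order argument is an assumption tied to bit 0 of the Flags field. Leap seconds are not handled, because the representation itself cannot express them.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def encode_start_time(moment: datetime) -> int:
    """Microseconds since 1970-01-01T00:00:00 UTC, leap seconds not counted."""
    return (moment - EPOCH) // ONE_MICROSECOND


def decode_start_time(value: int) -> datetime:
    """Inverse of encode_start_time; also works for times before 1970."""
    return EPOCH + timedelta(microseconds=value)


def pack_start_time(value: int, big_endian: bool = False) -> bytes:
    """Serialize as a 64-bit signed integer (8 bytes)."""
    return struct.pack(">q" if big_endian else "<q", value)


if __name__ == "__main__":
    start = datetime(2016, 3, 30, 12, 0, 0, 123456, tzinfo=timezone.utc)
    value = encode_start_time(start)
    print(value, decode_start_time(value).isoformat())
```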

11 ULONG: Number of data samples in record.

12 FLOAT: Sample rate encoded in IEEE-754 floating point format. When the value is positive it represents the rate in samples per second, when it is negative it represents the sample period in seconds. Writers should use the negative value sample period notation for rates less than 1 samples per second to retain resolution. Set to 0.0 if no time series data is included or data is opaque.
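
The sign convention in note 12 could be handled as in the sketch below. The helper names are hypothetical, and the values are ordinary Python floats rather than the 32-bit field that would actually be written.

```python
def rate_to_field(samples_per_second: float) -> float:
    """Value to store: the rate itself, or the negated period for rates below 1 Hz."""
    if samples_per_second == 0.0:
        return 0.0  # no time series data, or opaque data
    if samples_per_second < 1.0:
        return -1.0 / samples_per_second  # negative value = sample period in seconds
    return samples_per_second


def field_to_rate(value: float) -> float:
    """Recover samples per second from the stored field value."""
    if value > 0.0:
        return value
    if value < 0.0:
        return -1.0 / value
    return 0.0


if __name__ == "__main__":
    for rate in (100.0, 1.0, 0.5, 0.0):
        print(rate, "->", rate_to_field(rate))
```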

13 ULONG: CRC-32 value of data as defined and used in RFC 1952 (GZIP format). For non-opaque data this is the CRC value of the decoded data payload. For opaque data it is the CRC of the raw payload. If no data payload or a CRC is not possible, set this value to 0.
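
For reference, the CRC-32 used by GZIP (RFC 1952) is the one Python exposes as zlib.crc32, so the value in note 13 could be computed as sketched here. The function name is an assumption, and which bytes are passed in (decoded samples or raw payload) is left to the caller.

```python
import zlib


def payload_crc(payload: bytes) -> int:
    """CRC-32 as used by GZIP; 0 when there is no payload, as in note 13."""
    if not payload:
        return 0
    return zlib.crc32(payload) & 0xFFFFFFFF


if __name__ == "__main__":
    # 0xCBF43926 is the standard check value of this CRC for b"123456789".
    print(hex(payload_crc(b"123456789")))
```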

14 UWORD: Offset in bytes, relative to the beginning of the record, to the beginning of encoded data. If no data payload, set this value to 0.

15 UBYTE: Flags:
[Bit 0] - Byte order. Set this bit to 0 to indicate least significant byte first (little endian) order and 1 to indicate most significant byte first (big endian) order. This indicates the byte order of binary header and data samples values.
[Bit 1] - The start time occurred during a leap second.
[Bit 2] - A positive leap second occurred during this record.
(same as SEED 2.4 FSDH, field 12, bit 4)
[Bit 3] - A negative leap second occurred during this record.
(same as SEED 2.4 FSDH, field 12, bit 5)
[Bit 4] - Time tag is questionable. (same as SEED 2.4 FSDH, field 14, bit 7)
[Bit 5] - Clock locked. (same as SEED 2.4 FSDH, field 13, bit 5)
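
A reader might decode the Flags byte of note 15 along these lines. This is only a sketch: the member names are invented for the example, and it assumes bit 0 is the least significant bit of the byte.

```python
from enum import IntFlag


class Flags(IntFlag):
    """Bit assignments from note 15; bit 0 is assumed to be the least significant bit."""
    BIG_ENDIAN = 1 << 0
    START_IN_LEAP_SECOND = 1 << 1
    POSITIVE_LEAP_SECOND = 1 << 2
    NEGATIVE_LEAP_SECOND = 1 << 3
    TIME_TAG_QUESTIONABLE = 1 << 4
    CLOCK_LOCKED = 1 << 5


def describe(flags_byte: int) -> list:
    """Names of the flags that are set, in bit order."""
    return [flag.name for flag in Flags if flag & flags_byte]


def byte_order_char(flags_byte: int) -> str:
    """struct-module byte order prefix for the binary header and sample values."""
    return ">" if flags_byte & Flags.BIG_ENDIAN else "<"


if __name__ == "__main__":
    print(describe(0b100101), byte_order_char(0b100101))
```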

16 UBYTE: A code indicating the encoding format. (same as SEED 2.4 Blockette 1000 field 3, with addition of encodings 50, 51, 52 and 100 described above)

17 UBYTE: Total number of opaque header fields that follow the fixed section.

18 VAR: Opaque data header fields. Each opaque header field is a variable length string, terminated by the character ‘~’ (ASCII 126). Each header may contain any data except for the terminating character. It is strongly recommended that opaque headers contain printable text. Example header values (with terminators), for illustration only, no implied usage pattern:
“GPS~”, “TYPE=GPS~”, “FORMAT=BINEX~”, “SEQUENCE=12345~”, “FILENAME=data.bin~”,
“FRAGMENT=15/238~”, “TIMEQUALITY=98%~”
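
To make the opaque header layout of notes 17 and 18 concrete, here is a minimal parsing sketch. The function name is hypothetical; it assumes the caller has already read the header count from field 17 and positioned the buffer at the first byte after the fixed section.

```python
def parse_opaque_headers(buf: bytes, count: int):
    """Split `count` '~'-terminated headers off the front of buf.

    Returns the list of header values (terminators removed) and the number
    of bytes consumed, so the caller knows where the data payload begins.
    """
    headers = []
    pos = 0
    for _ in range(count):
        end = buf.index(b"~", pos)  # raises ValueError if a terminator is missing
        headers.append(buf[pos:end])
        pos = end + 1
    return headers, pos


if __name__ == "__main__":
    raw = b"TYPE=GPS~SEQUENCE=12345~" + b"\x00\x01"  # two headers, then payload
    print(parse_opaque_headers(raw, 2))
```

Because the terminator cannot appear inside a header, binary values would need some form of escaping or text encoding, which is one reason the note recommends printable text.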

Philip Crotwell

May 12, 2016, 9:14:29 PM

to fdsn-w...@fdsn.org

Hi

I would recommend caution with this idea. On the surface it appears a
simple solution of limited impact, but I fear it would be a source of
bugs and mistaken attribution for a long time to come. Consider a
"new" miniseed file with network code ABC that is loaded by software
that assumes it is an "old" style file. The data would appear to be
from network BC and because there is no notion of a format version of
miniseed in the header, there is no way for this older software to
notice that something is wrong. At least with a totally new file
format (and I would argue miniseed3 is a new file format, not a simple
revision), there is no expectation that older systems will
successfully read the files, and if they try, bad things will happen
very quickly and very noticeably. With a minor change as you propose,
there would be this expectation of still being able to use older code
without change. And for the most part it is true, older systems would
work, most of the time, except when they don't....and then they would
fail in subtle ways. And therein lies the problem. In the short term
it would be a lower level of pain, but that pain would drag on for
decades. I would much prefer a short term disruption.

Use of extra bytes in the header is fine in the case where older
systems can more or less safely ignore the new information, such as in
the data quality indicator. But I feel that the network code is just
too important for it to be interpreted wrongly.
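
A small sketch of the failure mode, for illustration only. It uses the first 20 bytes of the SEED 2.4 fixed header (sequence number, quality indicator, reserved byte, station, location, channel, network) and assumes, hypothetically, that the first two characters stay in the existing network field and the third goes into the reserved byte. Which two characters an old reader ends up seeing depends on that convention, but either way it silently gets a truncated code.

```python
import struct

# First 20 bytes of the SEED 2.4 fixed section of the data header.
PREFIX = struct.Struct(">6s c c 5s 2s 3s 2s")


def build_prefix(network: str) -> bytes:
    """Hypothetical writer: 3rd network character stored in the reserved byte."""
    third = network[2:3].encode("ascii") or b" "
    return PREFIX.pack(b"000001", b"D", third, b"TEST ", b"00", b"BHZ",
                       network[:2].encode("ascii"))


def old_reader_network(record: bytes) -> str:
    """An existing reader only knows about the 2-character field at bytes 18-19."""
    return record[18:20].decode("ascii")


def new_reader_network(record: bytes) -> str:
    """A reader aware of the (hypothetical) reserved-byte extension."""
    return record[18:20].decode("ascii") + record[7:8].decode("ascii").strip()


if __name__ == "__main__":
    rec = build_prefix("ABC")
    print(old_reader_network(rec), new_reader_network(rec))
```

Nothing in the record tells the old reader that it has dropped a character, which is the silent misattribution described above.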

My $0.02
Philip

On Thu, May 12, 2016 at 10:21 AM, Reinoud Sleeman
<reinoud...@knmi.nl> wrote:
> Hi Tim, all,
>
> thanks for your email and the documents concerning the IRIS proposal for changes in the miniSEED format that was discussed at the EGU 2016 in Vienna.
>
> The main motivation for the proposal for miniSEED3 (mS3) comes from the need to expand the current two-letter network code, simply because we are running out of available (free) combinations. The proposed solution in mS3 is to expand the network code to more (6, or 8) characters (in particular to be prepared for improved identification of temporary networks). Then, since such a small change would be disruptive, why not consider including other changes to the format as identified over the last decades?
>
> In my opinion, however, prior to entering the next step in discussing the contents of this proposal is the question whether the FDSN supports a disruption in the format, with all implications for acquisition, operations, software and services, or whether we prefer a simple solution (if possible) with limited impact. This discussion did not take place at the EGU, or before through the mailing list, but it is extremely important to get feedback from WGII on this issue before the next round in the discussion on the proposal can take place.
>
> The question is whether we really need to change the current miniSEED format to accommodate the required expansion of the network code or whether we can find a solution within the existing SEED format. A possible solution is to use the reserved byte in the fixed section of the data header and define this as the third character in the network code. When this field is empty the network code has 2 characters as it always has been. This would be a very simple and pragmatic solution, at the price of keeping alive all the other issues that we possibly would like to have cleared.
>
> Both solutions will mark the end of dataless SEED anyway, as a 2+ character network code will not fit the Station Blockette (50) and StationXML can be the only format for stations with 2+ character network codes.
>
> The purpose of this mail is to invite the WGII to provide feedback on the above question first, before the proposed process (with feedback on the straw man) can/may start. I think it is important to have a broad agreement within the FDSN on which step to take in the evolution of miniSEED, as it may have a major impact for many of us.
>
> Looking forward to any feedback (before end of May).
>
> Best regards,
>
> Reinoud Sleeman
>
> Chair FDSN WGII
Chad Trabant

May 12, 2016, 9:53:04 PM

to fdsn-w...@fdsn.org

Hi Reinoud and others,

Philip makes a very good point, most current miniSEED readers would not recognize any change and would read the incorrect network code leading to network identification confusion.

While a change to 3 character network codes would be easier to adapt to for miniSEED readers (compared to a much bigger change), even that change would require schema and code modifications in lots of systems. Put another way, it sounds simple but a 3 character code will trigger significant disruption in equipment and data handling systems (for very limited gain).

Furthermore, I think 3 characters is insufficient to identify temporary networks. Currently temporary networks cannot be unambiguously identified by their code alone, a start year (at minimum) is required to remove ambiguity. Now is our chance to address this very common wrinkle in network identification.

Chad

Chad Trabant

May 12, 2016, 10:53:51 PM

to fdsn-w...@fdsn.org

Hi all,

To aid in the submission of proposed changes, attached is a fillable PDF that can be used as an alternative to the Excel spreadsheet. For each change proposal, copy the file, add a sensible tag to the file name and fill in the boxes.

Submissions may be submitted in either the Excel sheet or this PDF form.

regards,
Chad


Next_Generation_miniSEED_Change_Proposal.pdf

Reinoud Sleeman

May 13, 2016, 12:19:56 AM

to fdsn-w...@fdsn.org

Hi Tim, all,

proposal can take place.

network codes.

Comments Must Be Submitted to WG II list (fdsn-w...@lists.fdsn.org<mailto:fdsn-w...@lists.fdsn.org>) by May 31, 2016

Tim Ahern

May 18, 2016, 1:39:24 AM

to fdsn-w...@fdsn.org

I received a request to extend the period to comment on the miniSeed straw man to June 6 and I granted this extension. Please have your comments in no later than June 6, a one week extension of the deadline.

Cheers

Tim Ahern

Director of Data Services
IRIS

IRIS DMC
1408 NE 45th Street #201
Seattle, WA 98105

(206)547-0393 x118
(206) 547-1093 FAX


Edelvays N. Spassov

Jun 1, 2016, 12:34:46 AM

to fdsn-w...@fdsn.org

Dear Tim & Chad

You guys do great work. We want to keep supplying the best and most complete content we can to make that possible.
Our general position is that we make a considerable effort to record what happens, as it happens, and we
think this information should be kept together in a form that is documented and published so that the data can be fully
interpreted long after we're gone. The data format should be as simple as it needs to be, and not simpler.

One of the more frequent support questions we have dealt with over the years is related to what customers see as
unexplained effects when they review data in display tools that do not interpret time quality. When you review data
including the status of timing quality, the answer becomes immediately obvious if the time quality is bad. Few understand
this subtlety, and many insist on discarding the time quality information. It's an unfortunate, and unnecessary, mistake to do so.
Recording less information packaged with the data would not be an advance for miniseed 3.

We’ve tried to capture all of our thoughts in the attachment where they differ from the strawman.
Attached please find our document as advised. There are 13 tabs including the Instructions and Example tabs.

Kind regards,

Dr. Edelvays Spassov
Sales Manager
Kinemetrics, Inc
222 Vista Avenue
Pasadena, CA 91107
Phone: 626-795-2220
Fax: 626-795-0868

From: Tim Ahern [mailto:t...@iris.washington.edu]
Sent: Tuesday, May 10, 2016 10:04 AM
To: Bob Hutt; Bruce Beaudoin; Florian Haslinger; Mathias Franke; Pete Davis; Kent Anderson; Bob Woodward; Katrin Hafner; Duk Kee Lee; Justin Sweet; Ogie Kuraica; Seiji Tsuboi; Angelo Strollo; Neil Spriggs; Reinoud Sleeman; Angel Rodriguez; Leonid Zimakov; Brent...@iris.edu<mailto:Brent...@iris.edu>; Edelvays N. Spassov; Branden Christensen; Dieter Stoll; Sébastien Judenherc; Tony Russell; Robert Leugoud; Jiang Li; Lani C. Oncescu; Murray McGowan; Shawn Goessen; Dennis Pumphrey; Suzan Kowalski; Bruce Townsend; Joe Steim; Ian Billings; Claudio Parma; David Wilson; David Easton; Fabian Euchner; Cristian Neague; Klaus Stammler; Jerome Vergne; Catherine Pequegnat; Vincent Douet; Paolo Mazzucchelli; Emanuele Ercoli; Christos Evangelidis; Tatary Dragos; Tim Ahern; Josh Stachnik
Cc: fdsn Group II
Subject: Template for making proposed change to the miniSEED DEADLINE FOR COMMENTS MAY 31, 2016

Comments Must Be Submitted to WG II list (fdsn-w...@lists.fdsn.org<mailto:fdsn-w...@lists.fdsn.org>) by May 31, 2016

TemplateForComments-V3_KMI.xlsx

Doug Neuhauser

Jun 7, 2016, 8:52:49 AM

to fdsn-w...@fdsn.org

Tim,

I found that I was unable to enter and properly save all of the text
for my comments in the PDF file provided by Chad for comments on the
MiniSEED proposal.

Therefore, I am submitting my comments in the attached text files.

- Doug N

--
Doug Neuhauser                   University of California, Berkeley
do...@seismo.berkeley.edu        Berkeley Seismological Laboratory
Office: 510-642-0931             215 McCone Hall # 4760
Fax:    510-643-5811             Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)

Next Generation miniSEED Change Proposal Document version 2016-5-12

Change Description:

Change timestamp from 8 byte longlong with required leapsecond flag to 12 byte
MSEED3 Time Structure which can represent all timestamps to microsecond
resolution with proper numeric values.

Type of change: Modification

Current wording from document:

Record Start Time B 8

10 LONGLONG (64-bit signed integer): Start time of record, time of the first
data sample. As a representation of UTC, this value is encoded as the number
of microseconds since midnight 1 January 1970 UTC not including leap seconds.
This is a microsecond version of Unix/POSIX time as defined by IEEE Std
1003.1, 2013 Edition (POSIX.1-2008). The mapping between separate components
of a UTC time (seconds, minutes, hours, etc.) and this representation is
documented in Section 4.15 of IEEE Std 1003.1, 2013 Edition, which is then
scaled by 1E6 and microseconds are added to result in this representation.
This time scale is continuous except for the occurrence of leap seconds,
whether this value is a leap second or not is defined by bit 2 of the Flags
field. When calculating time within a record, bits 2 and 3 of the Flags field
should also be consulted to determine if leap seconds occurred during
the record.

[Bit 1] - The start time occurred during a leap second.

Propose new wording:

Record Start Time:
MSEED3 Time Structure (12 bytes)
    Field        Type  Length  Range / Notes
    Year         B     2       -32768 to 32767
    Day-of-Year  B     2       1-366
    Hour         B     1       0-23
    Minute       B     1       0-59
    Second       B     1       0-60 (including leap second)
    Unused       B     1       For alignment purposes
    Microsecond  B     4       0-999999

10 MSEED3 Time Structure (12 bytes): Start time of record, time of the first
data sample, as a representation of UTC with microsecond resolution.
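
The proposed 12-byte structure maps directly onto a fixed binary layout, sketched below. The big-endian byte order and the function names are assumptions for the example, and only the ranges listed above are checked.

```python
import struct

# year (int16), day-of-year (uint16), hour, minute, second (uint8 each),
# one unused pad byte, microsecond (uint32): 12 bytes in total.
MSEED3_TIME = struct.Struct(">hHBBBxI")  # byte order is an assumption here


def pack_time(year, doy, hour, minute, second, microsecond) -> bytes:
    """Pack a UTC time; second may be 60 to represent a leap second."""
    if not (1 <= doy <= 366 and 0 <= hour <= 23 and 0 <= minute <= 59
            and 0 <= second <= 60 and 0 <= microsecond <= 999999):
        raise ValueError("time component out of range")
    return MSEED3_TIME.pack(year, doy, hour, minute, second, microsecond)


def unpack_time(buf: bytes):
    """Return (year, doy, hour, minute, second, microsecond)."""
    return MSEED3_TIME.unpack(buf)


if __name__ == "__main__":
    # 2015-06-30 (day 181) 23:59:60.5 UTC, inside a leap second
    raw = pack_time(2015, 181, 23, 59, 60, 500000)
    print(len(raw), unpack_time(raw))
```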

Rationale:

The current MSEED 2.4 time structure can be easily extended by 2 bytes
to provide microseconds (0-999999). This provides a continuous time
scale to the microsecond resolution, and does not suffer from the
limitation of the POSIX IEEE Std 1003.1, 2013 Edition timestamp, which
does not allow for leap seconds.

The proposed POSIX-style longlong int timestamp cannot represent a
leapsecond, so the proposed standard requires an additional flag to
indicate that the timestamp is actually during a leapsecond. In
addition, presumably the current optional flags for "record contains a
positive leapsecond" or "record contains a negative leapsecond" would
now be required flags rather than advisory flags since the timestamp

Proposing a timestamp that appears to represent time as a continuum
where time computation can be performed with simple integer arithmetic
will encourage users and programs to ignore leapseconds and therefore
have timing errors of 1 second when working around a leap second.
Given that MSEED is an archive format, we should NOT be promoting a
time representation that is "apparently" continuous but is actually
not and does not provide an adequate representation of time without
the use of auxiliary bits. The proposed time representation also has
no way to represent 2 consecutive leapseconds should that ever happen.
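
The ambiguity can be demonstrated with the leap second of 30 June 2015. This sketch uses Python's calendar.timegm as a stand-in for the POSIX seconds-since-epoch arithmetic; the helper name is hypothetical.

```python
import calendar


def posix_seconds(year, month, day, hour, minute, second) -> int:
    """Seconds since the epoch by plain POSIX arithmetic, leap seconds not counted."""
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


if __name__ == "__main__":
    leap = posix_seconds(2015, 6, 30, 23, 59, 60)  # the inserted leap second
    after = posix_seconds(2015, 7, 1, 0, 0, 0)     # the second that follows it
    # Both labels map to the same integer, so the integer alone cannot
    # tell them apart without an auxiliary flag.
    print(leap, after, leap == after)
```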

I cannot support the currently proposed timestamp of longlong int
that requires up to 3 leapsecond-related flags.

Author:
Douglas Neuhauser,
UC Berkeley Seismological Laboratory and
Northern California Earthquake Data Center
do...@seismo.berkeley.edu

Date of comment:
2016/06/06

Next Generation miniSEED Change Proposal Document version 2016-5-12

Change Description:

The CRC should represent the encoded data in the MSEED record rather than
the decoded data.

Type of change: Modification

Current wording from document:

13 ULONG: CRC-32 value of data as defined and used in RFC 1952 (GZIP
format). For non-opaque data this is the CRC value of the decoded
data payload. For opaque data it is the CRC of the raw payload. If
no data payload or a CRC is not possible, set this value to 0.

Propose new wording:

13 ULONG: CRC-32 value of data as defined and used in RFC 1952 (GZIP
format). For non-opaque data this is the CRC value of the encoded
data payload. For opaque data it is the CRC of the raw payload. If
no data payload or a CRC is not possible, set this value to 0.

Rationale:

1. It is also desirable to be able to verify the CRC of
an MSEED record without having to decode the data.

2. The computation of a CRC of the encoded data in the MSEED record is well
defined, but the computation of a CRC on the decoded data is not.
For example, decoded STEIM1 or STEIM2 data would have a different
byte order on little-endian and big-endian systems,
and therefore would have a different CRC.
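
Point 2 can be shown with a short sketch: the same decoded sample, laid out in memory in the two byte orders, gives two different CRC-32 values. The function name is hypothetical, and samples are assumed to be decoded to 32-bit signed integers.

```python
import struct
import zlib


def crc_of_decoded(samples, byte_order: str) -> int:
    """CRC-32 over decoded 32-bit integer samples laid out in the given byte order.

    byte_order is "<" (little-endian) or ">" (big-endian).
    """
    raw = struct.pack(f"{byte_order}{len(samples)}i", *samples)
    return zlib.crc32(raw) & 0xFFFFFFFF


if __name__ == "__main__":
    samples = [1]  # a single decoded sample is enough to show the effect
    little = crc_of_decoded(samples, "<")
    big = crc_of_decoded(samples, ">")
    print(hex(little), hex(big), little == big)
```

A CRC over the encoded payload avoids the question, since those bytes are the same on every host.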

Author:
Douglas Neuhauser,
UC Berkeley Seismological Laboratory and
Northern California Earthquake Data Center
do...@seismo.berkeley.edu

Date of comment:
2016/06/06

Angelo Strollo

Jun 14, 2016, 12:08:57 AM

to fdsn-w...@fdsn.org

Dear Tim (and WGII members),

as already discussed in individual messages we (ORFEUS/EIDA) have been
collecting during the last 4 weeks comments within our EIDA group (11
European federated data centers) on the mseed3 proposal. In particular
we collected a number of comments to some of the specific points in the
straw man (i.e. #1 Expansion of the net code, #10 CRC field, #12
location identifier, #17 variable record lengths, etc); in parallel we
have been discussing, on the managerial side, the implications
that this important change will have on the operations of our data
centers once it is approved.

Although we may all agree that the proposed changes are needed we should
also recognize that this is a major change for the seismological
community. In particular, for data centers that should actively engage
in this endeavor, the RFC process appears too fast. Having said that, we
would kindly ask you to allow comments on this first iteration until the
end of June. This will allow us to harmonize the technical comments we
are collecting internally in EIDA as well as discuss further on the
eventual implementation timeline and resources involved.
We collected the comments, initiated the discussion at the Management
meeting last week and we have fixed a dedicated internal technical
discussion for next week.

As mentioned above, considering that this is an important change for all
FDSN data centers, we think that keeping the possibility to post
comments for this first iteration until the end of June should not be a
problem. This will hopefully allow a lively discussion on the mailing
list as well as within the editorial board to ensure that different
opinions are captured and discussed before moving to the next iteration.

We apologize for this late request for a deadline extension, in particular
to colleagues who submitted their comments on time
according to the initial deadline.

With Kind Regards,
Angelo Strollo (on behalf of the ORFEUS/EIDA data centers)


--
Dr. ANGELO STROLLO
Department 2 Geophysics
Section 2.4 Seismology - GEOFON
Tel.: +49 (0)331/2881285
Mob.: +49 (0)172/8590874
Fax : +49 (0)331/2881277
Email: str...@gfz-potsdam.de
_______________________________________

Helmholtz Centre Potsdam
GFZ German Research Centre For Geosciences
Public Law Foundation State of Brandenburg
Telegrafenberg, 14473 Potsdam
House A3 Room 207
http://geofon.gfz-potsdam.de/

Chad Trabant

Jun 23, 2016, 2:52:45 AM

to fdsn-w...@fdsn.org

Dear WG II,

Attached are 5 proposed changes to the miniSEED 3 straw man from the IRIS DMC.

Chad

Next_Generation_miniSEED-6-char-network-.pdf

Next_Generation_miniSEED-16-bit-record-l.pdf

Next_Generation_miniSEED-change-location.pdf

Next_Generation_miniSEED-length-of-optio.pdf

Next_Generation_miniSEED-optional-header.pdf

Angelo Strollo

Jul 8, 2016, 5:55:04 PM

to fdsn-w...@fdsn.org

Dear Tim, dear WGII Chair and dear WGII Colleagues,

as mentioned in our previous message we have been collecting internally
within the EIDA Management Board a number of comments on the mseed3
proposal, and discussed them carefully within the last weeks through
many meetings and intense internal e-mail exchange.

From our point of view this is a major change in seismology that will
have a notable impact on the data centre operations as well as on the
users for the next decade. Although challenging, probably also
premature, we tried to think about the effects of an implementation of
the proposed changes on the data centre side and related costs as well
as guessing the impact on our user community.

After a long discussion at both technical and managerial level among the
11 federated EIDA data centres in Europe, we have come to the conclusion
that the proposed change is too “expensive” for data centres as well as
for users relative to the real benefit that may be derived.

Aside from the pressing issue of the network code limitation, we
consider that the other proposed changes are ‘nice to have’ from the
side of data managers but on balance they are not substantial additions
that warrant the effort to make a major change to the existing
standard. There are indeed other avenues we would like to explore. As
an example, the European community is currently demanding more
interoperability with communities beyond seismology, to allow
interdisciplinary and integrated research, as well as better and more
modular data models. We would therefore like the next generation of
seismic data format to be more widely adoptable.

We all agree that a new format for seismological data is needed in the
long term, but overall the main problem we have is with the speed
currently proposed for this standardization process, as pointed out also
by the WGII chair in one of the initial comments on this mailing list
already in May. Although we appreciate the initiative and the initial
straw-man, we think that the proposed changes are significant enough to
require technological modifications in user software, data centre
practice, and station-side instrumentation software, but will not
substantially future-proof us from further change over the next decade.
As incremental changes damage credibility and the possibility of
community uptake, we would really like to have enough time to explore
carefully additional proposals, which cannot be well thought out and
tested within the proposed time line.

Therefore, in light of what is written above we would like to propose a
different approach towards a new data format that goes through the
following 4 steps:

1 - Define an interim solution (SEED 2.4+?) allowing additional network
codes using an extra blockette for extended network codes, that is
backward compatible and can be used immediately.
2 - Restart an FDSN-wide process gathering ideas for streaming and
archive data formats, culminating in a dedicated meeting in late 2016.
3 - Distribute proposal(s) to the FDSN members before the Kobe meeting.
4 - Prepare a preliminary implementation plan for approval and adoption
by FDSN after the Kobe meeting possibly in 2018.

In summary, our proposal is to solve the immediate problem of network
code allocation in a pragmatic way without adding compatibility issues,
and in parallel start to work jointly on a well thought-out and
future-proof solution that will bring us towards a new data format for
seismology.

More details on our plan are appended to this email.

We apologize for the long e-mail and for not having used the template
for comments.

Looking forward to hearing from you,

The ORFEUS/EIDA* data centres

*http://www.orfeus-eu.org/eida/eida.html

************************

Appendix

1 - Use blockette 1002 to solve the immediate need for additional
network codes.
We propose adding a new blockette (1002) that extends the network
code as follows.

A typical mini-SEED record looks schematically like this:

GE_WLF__BHZ, 728437, D
start time: 2016,180,00:00:16.599998
number of samples: 428
sample rate factor: 20 (20 samples per second)
sample rate multiplier: 1
activity flags: [00000000] 8 bits
I/O and clock flags: [00000100] 8 bits
[Bit 5] Clock locked
data quality flags: [00000000] 8 bits
number of blockettes: 2
time correction: 0
data offset: 64
first blockette offset: 48
BLOCKETTE 1000: (Data Only SEED)
next blockette: 56
encoding: STEIM 2 Compression (val:11)
byte order: Big endian (val:1)
record length: 512 (val:9)
reserved byte: 0
BLOCKETTE 1001: (Data Extension)
next blockette: 0
timing quality: 100%
micro second: 98
reserved byte: 70
frame count: 7

There is a 48-byte fixed header and a linked list of blockettes. The
header contains pointers to the beginning of data (64) and to the first
blockette. Each blockette has a pointer to the next blockette, which is
0 if no more blockettes follow.
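
The header-plus-chain layout described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the field positions (data offset at bytes 44-45, first blockette offset
at bytes 46-47, each blockette starting with a 2-byte type and a 2-byte
next-blockette offset) follow the SEED 2.4 fixed data header, big-endian
byte order is assumed as in the example record, and the helper names
walk_blockettes and build_example_record are made up for this sketch.

```python
import struct


def walk_blockettes(record: bytes):
    """Follow the blockette chain of one record.

    Returns (data_offset, [(offset, blockette_type), ...]).
    Big-endian byte order is assumed for this sketch.
    """
    # Bytes 44-45: beginning of data; bytes 46-47: first blockette offset.
    data_offset, offset = struct.unpack_from(">HH", record, 44)
    chain = []
    seen = set()
    while offset != 0:
        # Guard against corrupt records: loops or offsets past the end.
        if offset in seen or offset + 4 > len(record):
            raise ValueError(f"bad blockette offset {offset}")
        seen.add(offset)
        # Every blockette starts with its type and the next offset.
        btype, next_offset = struct.unpack_from(">HH", record, offset)
        chain.append((offset, btype))
        offset = next_offset  # 0 means no more blockettes follow
    return data_offset, chain


def build_example_record() -> bytes:
    """Synthetic 512-byte record with the offsets from the example above.

    Only the fields needed for the chain walk are filled in.
    """
    record = bytearray(512)
    struct.pack_into(">HH", record, 44, 64, 48)
    # Blockette 1000: type, next, encoding, byte order, length exponent, reserved
    struct.pack_into(">HHBBBB", record, 48, 1000, 56, 11, 1, 9, 0)
    # Blockette 1001: type, next, timing quality, microsecond, reserved, frame count
    struct.pack_into(">HHBBBB", record, 56, 1001, 0, 100, 98, 70, 7)
    return bytes(record)


if __name__ == "__main__":
    print(walk_blockettes(build_example_record()))
```

A reader written this way visits each blockette by following the
next-blockette offsets and stops at 0, so it can step over a blockette
type it does not recognise without failing.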

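Suppose we add another blockette (1002) and use it to extend the network
code, for example to “GEMMA” instead of “GE”. Unfortunately there is no
free space before the data (the 48-byte header and the two 8-byte
blockettes already fill the first 64 bytes), so we have to steal 64
bytes (1 frame) from data.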

Now a record would look like this (where “99” is a valid reserved
network code that will be used as an indicator of extended network code
if blockette 1002 is not supported by the reader):

99_WLF__BHZ, 728437, D
start time: 2016,180,00:00:16.599998
number of samples: 367 <= 14 % less data
sample rate factor: 20 (20 samples per second)
sample rate multiplier: 1
activity flags: [00000000] 8 bits
I/O and clock flags: [00000100] 8 bits
[Bit 5] Clock locked
data quality flags: [00000000] 8 bits
number of blockettes: 3 <= one blockette added
time correction: 0
data offset: 128 <= data offset moved by 64 bytes
first blockette offset: 48
BLOCKETTE 1000: (Data Only SEED)
next blockette: 56
encoding: STEIM 2 Compression (val:11)
byte order: Big endian (val:1)
record length: 512 (val:9)
reserved byte: 0
BLOCKETTE 1001: (Data Extension)
next blockette: 64 <= now pointing to the new blockette
timing quality: 100%
micro second: 98
reserved byte: 70
frame count: 6 <= one frame less for data
BLOCKETTE 1002: (Data Extension 2)
next blockette: 0
extended network code: GEMMA
...

This approach will ensure 100% backwards compatibility without breaking
any existing miniseed readers that implement SEED 2.4 correctly. Old
readers would simply ignore the blockette. Although this removes 64
bytes from data, we should keep in mind that some additional header
space will be required anyhow with the current IRIS mseed3 proposal.
That proposal is not finalized, and there is no guarantee that the
mseed3 header will fit in 64 bytes.
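
The reader-side behaviour implied here can be sketched as follows. The
byte layout of blockette 1002 is not defined in this proposal, so the
sketch works on already-decoded blockettes; the dictionary
representation, the field name "extended_network_code" and both function
names are hypothetical and serve only as an illustration.

```python
# "99" and blockette type 1002 come from the proposal above; everything
# else in this sketch is a hypothetical representation for illustration.
EXTENDED_NETWORK_PLACEHOLDER = "99"
EXTENDED_NETWORK_BLOCKETTE = 1002


def legacy_network_code(header_network: str, blockettes: dict) -> str:
    """What a SEED 2.4 reader without blockette 1002 support would report.

    It never looks at blockette 1002, so it returns the 2-character
    code stored in the fixed header.
    """
    return header_network


def resolve_network_code(header_network: str, blockettes: dict) -> str:
    """What an updated reader could report.

    `blockettes` maps blockette type -> dict of decoded fields.
    """
    extension = blockettes.get(EXTENDED_NETWORK_BLOCKETTE)
    if extension is not None:
        # Prefer the extended code when the new blockette is present.
        return extension["extended_network_code"]
    # No extension: an ordinary record, use the fixed-header code.
    return header_network


if __name__ == "__main__":
    new_style = {1000: {}, 1001: {}, 1002: {"extended_network_code": "GEMMA"}}
    old_style = {1000: {}, 1001: {}}
    print(legacy_network_code("99", new_style))   # old reader, new record
    print(resolve_network_code("99", new_style))  # new reader, new record
    print(resolve_network_code("GE", old_style))  # new reader, old record
```

In this sketch an old reader still decodes the record but sees the
placeholder code rather than the extended one, which is the role given
to “99” above.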

This is a “light” change that will allow the FDSN to solve the issue of
the network codes for the moment and give us the possibility to start an
extended discussion on the new data format.

Storage growth would be 16 % (instead of 100 TB, we would need 116 TB),
assuming 512-byte record size. On the positive side, real-time latency
would decrease by the same amount.
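
The figures above can be checked with a short calculation. This is a
rough sketch that assumes 512-byte records made of 64-byte frames, every
record filled completely, and an unchanged compression ratio; the
variable and function names are made up for the illustration.

```python
RECORD_LENGTH = 512  # bytes, as in the example records
FRAME_SIZE = 64      # bytes per frame


def data_frames(record_length: int, data_offset: int) -> int:
    """Number of 64-byte frames left for data after the header area."""
    return (record_length - data_offset) // FRAME_SIZE


frames_before = data_frames(RECORD_LENGTH, 64)   # current layout
frames_after = data_frames(RECORD_LENGTH, 128)   # with blockette 1002

# Less data per record, measured in frames and in samples (428 -> 367).
capacity_loss = 1 - frames_after / frames_before
sample_loss = 1 - 367 / 428

# More records are needed for the same data, so storage grows by this factor.
storage_growth = frames_before / frames_after - 1

if __name__ == "__main__":
    print(f"frames per record: {frames_before} -> {frames_after}")
    print(f"data per record:   -{capacity_loss:.1%} (frames), "
          f"-{sample_loss:.1%} (samples)")
    print(f"storage growth:    +{storage_growth:.1%}")
    print(f"100 TB archive:    {100 * (1 + storage_growth):.1f} TB")
```

Counting frames gives 7 versus 6 data frames per record, that is about
14.3 % less data per record and about 16.7 % more storage, close to the
14 % and 16 % quoted above.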

2 - Having removed the time pressure, start a process where FDSN members
are encouraged to propose future proof ideas that will address mseed
format shortcomings, not only focusing on the offline storage but also
on the real-time, low latency streaming (if indeed they should remain
coupled). We also would prefer a tighter coupling between the new
waveform format and the current metadata standard, stationXML. We
propose to kick off this process at a dedicated meeting, either
organized at AGU in December or, preferably, hosted by us in Europe
between September and November this year. In our European discussions
on the new data format we are considering
something similar to video streaming formats and/or OGC standards
(http://www.opengeospatial.org/standards). We are ready to commit some
resources on this project in order to get an initial proposal ready to
be discussed in a dedicated meeting later this year. This approach will
have the following advantages:

- no introduction of a new, incompatible (likely short-lived) format
that does not provide major technical progress;
- follow a long-term strategy capable of covering future needs;
- better coordination and separation of data and metadata, e.g. remove
information from the data header that is already described in the
StationXML, making the data format simpler;
- address shortcomings with real-time data streaming;
- gain time and freedom to think more broadly, including the adoption of
generic data standards with a scope beyond just seismology.

3 - Send the proposal(s) to the FDSN members before the IASPEI/Kobe
meeting for comments and discuss the proposal(s), timeline for
implementation and implications at the meeting and afterwards if
necessary. Of course, in order to demonstrate the feasibility, some work
should be done in designing some prototypes ideally before the Kobe
meeting. Feedback from End Users and Instrumentation manufacturers
should be consistently sought by advertising with FDSN-level
presentations at AGU and EGU in advance of Kobe.

4 - Prepare a preliminary implementation plan and define deadlines
accordingly for approval and adoption at the FDSN, which should fall
after the Kobe meeting, possibly in 2018.

************************
