Change proposal #12 to the 2016-3-30 straw man (iteration 1) is attached: reduce the record length field from 4 bytes to 2 bytes.
Please use this thread to provide your feedback on this proposal by Wednesday, August 24th.
thanks,
Chad
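For reference, the difference in maximum record size between the two field widths, assuming an unsigned length field that counts total record bytes (a quick sketch, not necessarily the exact straw man semantics):

```python
# Maximum record length representable by an unsigned length field of a given
# width, assuming the field counts total record bytes.
for width_bytes in (2, 4):
    max_length = 2 ** (8 * width_bytes) - 1
    print(f"{width_bytes}-byte field: up to {max_length:,} bytes")

# 2-byte field: up to 65,535 bytes
# 4-byte field: up to 4,294,967,295 bytes (~4 GiB)
```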
I think there should be a separation from what a datacenter permits in
its ingestion systems and what is allowed in the file format. I have
no problem with a datacenter saying "we only take records less than X
bytes" and it probably also makes sense for datacenters to give out
only small sized records. However, there is an advantage for client
software to be able to save a single continuous timespan of data as a
single array of floats, and 65k is kind of small for that. I know
there is an argument that miniSEED is not meant for post-processing, but
that seems to me a poor reason, since the format can handle it, and it is
really nice to be able to save results without switching file formats just
because you have done some processing. For the most part, processing means
taking continuous records, turning them into a single big float array, doing
something, and then saving the array back out. Having to undo that combining
step just to be able to save in the file format is not ideal. And keep in
mind that if some of the other changes happen, like the longer network
codes, existing post-processing file formats like SAC will no longer be
capable of holding new data.
And in this case, the save would likely not compress the data, nor
would it need to compute the CRC. I would also observe that the current
miniSEED allows records of up to 2 to the 256th power bytes, and data
centers have not been swamped by huge records.
It is true that big records are bad in certain cases, but that doesn't
mean that they are bad in all cases. I feel the file format should not
be designed to prevent those other uses. The extra 2 bytes of storage
needed to allow up to 4 GB records seem well worth it to me.
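To put rough numbers on the "65k is kind of small" point, here is a back-of-the-envelope sketch assuming uncompressed 4-byte float samples and ignoring header overhead; the 100 Hz rate is only illustrative:

```python
# Rough capacity of a single record for uncompressed 4-byte float samples,
# ignoring header overhead; 100 Hz is only an illustrative sample rate.
BYTES_PER_SAMPLE = 4
SAMPLE_RATE_HZ = 100.0

for label, max_bytes in [("2-byte length (65,535 B)", 2**16 - 1),
                         ("4-byte length (~4 GiB)", 2**32 - 1)]:
    samples = max_bytes // BYTES_PER_SAMPLE
    seconds = samples / SAMPLE_RATE_HZ
    print(f"{label}: ~{samples:,} samples, ~{seconds / 3600:.2f} hours at 100 Hz")

# 2-byte length (65,535 B): ~16,383 samples, ~0.05 hours at 100 Hz
# 4-byte length (~4 GiB): ~1,073,741,823 samples, ~2982.62 hours at 100 Hz
```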
thanks
Philip
My two cents is that the permitted length should be kept fairly small, so 65k should be fine. I do not know how many times I have dealt with formats like SAC, which can store a large time series segment with only a single timestamp for the first sample and have the time of the last sample be inaccurate because the digitizing rate is either not constant or is “slightly off”. Smaller record sizes force more frequent recording of timestamps and improve timing quality.
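As a made-up illustration of the timing point (the rates and duration are not from any real instrument): if only the first-sample timestamp is stored and the true digitizing rate is slightly off the nominal rate, the implied time of the last sample drifts.

```python
# Drift of the implied last-sample time when only the first timestamp is kept
# and the actual digitizing rate differs slightly from the nominal rate.
# The rates and duration here are made up, purely for illustration.
nominal_rate_hz = 100.0
actual_rate_hz = 100.001          # "slightly off" digitizer
duration_s = 24 * 3600            # one day of continuous data

n_samples = int(duration_s * actual_rate_hz)
implied_last_time = (n_samples - 1) / nominal_rate_hz   # what a reader assumes
actual_last_time = (n_samples - 1) / actual_rate_hz     # what really happened
print(f"timing error after one day: {implied_last_time - actual_last_time:.3f} s")
# timing error after one day: 0.864 s
```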
I also think variable-length records are a really bad idea. I prefer fixed-length records on power-of-two boundaries for a variety of reasons. Mostly, they permit more rapid access to the data without having to build extensive indices for each data block.
Dave
One alternative, which would be better suited for real-time use, would be
using fixed-size "frames" instead of records. Think of a record consisting
of a header frame followed by a variable number of data frames. A frame
might include a timecode (sequence number), a channel index (for
multiplexing) and possibly a CRC. Because the size is fixed, finding the
start of a frame would be unambiguous. Compared to a 512-byte mseed 2.x
record (header + 7 data frames), latency would be 7 times smaller, because
each data frame could be sent separately. And by using more data frames one
could reduce overall bandwidth without increasing latency.
Transmitting data in 64-byte chunks was already attempted with mseed
2.4, but unfortunately the total number of samples and the last sample
value must be sent before any data. In the new format I would put such
values, if needed, into a "summary" frame that would be sent after data
frames.
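To make the frame idea a bit more tangible, here is one possible, entirely hypothetical layout for a 64-byte data frame; none of the field names, sizes or the CRC choice are part of any actual proposal.

```python
import struct
import zlib

# Hypothetical 64-byte data frame: 4-byte sequence number, 2-byte channel
# index, 2-byte payload length, 4-byte CRC-32, then up to 52 payload bytes.
FRAME_SIZE = 64
HEADER_FMT = "<IHHI"                      # little-endian, 12 bytes
PAYLOAD_MAX = FRAME_SIZE - struct.calcsize(HEADER_FMT)

def pack_frame(seq, channel, payload):
    assert len(payload) <= PAYLOAD_MAX
    crc = zlib.crc32(payload)
    header = struct.pack(HEADER_FMT, seq, channel, len(payload), crc)
    return header + payload.ljust(PAYLOAD_MAX, b"\x00")   # pad to fixed size

def unpack_frame(frame):
    seq, channel, length, crc = struct.unpack_from(HEADER_FMT, frame)
    payload = frame[struct.calcsize(HEADER_FMT):][:length]
    assert zlib.crc32(payload) == crc
    return seq, channel, payload

frame = pack_frame(seq=42, channel=3, payload=b"compressed samples...")
print(len(frame), unpack_frame(frame)[:2])   # 64 (42, 3)
```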
Regards,
Andres.
I like this idea. I've been considering similar concepts, dubbed microSEED, with frames that are not necessarily fixed length. The idea was left out of the straw man because it's a pretty radical change from current miniSEED where each record is independently usable. Lots of existing software would require significant redesign to read such data. But if this concept could be developed in such a way that multiple frames could be easily reassembled into a next-generation miniSEED record, it might be a nice way to satisfy both archiving and real-time transmission needs.
Chad
A failure that looks like a failure gets fixed quickly; a failure that
looks like a success can easily persist for a long time, causing much
more damage in the long run.
Philip
On Fri, Aug 19, 2016 at 9:46 AM, Joachim Saul <sa...@gfz-potsdam.de> wrote:
> Chad Trabant wrote on 19.08.2016 at 08:58:
>> The idea was left out of the straw man because it's a pretty radical change from current miniSEED where each record is independently usable. Lots of existing software would require significant redesign to read such data.
>
> Thank you, Chad, for addressing an important point: the costs of the new
> format!
>
> Do you have a rough idea about what the costs of the transition to an
> incompatible new data format would be? Reading this discussion one might
> get the impression that the transition would be a piece of cake. A
> version change, a few modified headers, an extended network code plus a
> few other improvements like microsecond time resolution. Hitherto
> stubborn network operators will be forced not to use empty location
> codes. But all these benefits will come with a price tag because of the
> incompatibility of the new format with MiniSEED.
>
> So what will be the cost of the transition? Who will pay the bill? Will
> the costs be spread across the community or will the data centers have
> to cover the costs alone?
>
> There are quite a few tasks ahead of "us". "Us" means a whole community
> of data providers, data management centers, data users, software
> developers, hardware manufacturers. World-wide! I.e., everyone who is
> now working with MiniSEED and has got used to it. Everyone!
>
> Tasks will include:
>
> * Recoding of entire data archives
>
> * Software updates. In some cases redesign will be necessary, while
> legacy software will just cease to work with the new format.
>
> * Migration of data streaming and exchange between institutions world-wide.
> It is easy to foresee that real-time data exchange, which was pretty
> hard to establish in the first place with many partners world-wide, will
> be heavily affected by migrating to the new format.
>
> * Request tools: will there be a deadline like "by August 1st, 2017,
> 00:00:00 UTC" by which all fdsnws's have to support the new format? Or will
> there be a transition? If so, how will this be organized? Either access
> to two archives (for each format) will be required or the fdsnws's will
> have to be enabled to deliver both formats by conversion on the fly?
>
> * Hardware manufacturers will have to support the new format.
>
> * Station network operators will have to bear the costs of adopting the
> new format even though it may not yield any benefit to them.
>
> I could probably add more items to this list but thinking of the above
> tasks causes me enough headaches already. That's the reason why I am
> publicly raising the cost question now because the proponents of the new
> format must have been thinking about this and probably have some idea
> about how costly the transition would be.
>
> Speaking of costs I would like to remind you of the alternative proposal
> presented on July 8th by Angelo Strollo on behalf of the major European
> data centers. They propose to simply introduce a new blockette 1002 to
> accommodate longer network codes but with enough space for additional
> attributes such as extended location id's etc. This light-weight
> solution is backward compatible with the existing MiniSEED. It is
> therefore the least disruptive solution and minimizes the costs of the
> transition.
>
> Regards
> Joachim
All existing software would require significant modifications even with
the current straw man (especially if variable length records are
allowed). SeedLink, Web Services, all user software. The overall cost of
the transition would be huge.
If we want to design a format for the next 30 years, we should not
restrict ourselves with limitations imposed by the current miniSEED
format. On the other hand, if compatibility with the current miniSEED
format is desired, just add another blockette to miniSEED 2.x (as
suggested by Angelo Strollo earlier) and that's it.
Back to the idea of "frames" -- indeed, some info that is needed for
real-time transfer could be stripped in the offline format. If records could
be easily converted to frames and vice versa, that would be great.
Currently the main problem is forward references (the number of samples,
detection flags, anything that refers to data that is not yet known when
sending the header), so we need a "footer" in addition to the header.
Regards,
Andres.
> I also think variable-length records are a really bad idea. I prefer fixed-length records on power-of-two boundaries for a variety of reasons. Mostly, they permit more rapid access to the data without having to build extensive indices for each data block.
Can you share some of the other reasons?
I get the rapid access reasoning, I think. As I've heard it described, one makes some educated guesses about where the data are in a file and skips around until zeroing in on the correct record(s).
The notion of a variable record length has been raised a number of times in the past; we finally added it to the straw man for these reasons:
a) In many ways it is a better fit for real-time streams. No more waiting to "fill a record" or transmitting unfilled records; latency is much more controllable without waste. Also, data are usually generated at a regular rate, but if one would like to package and transmit them at a regular rate with compression, the output size is not readily predictable.
b) Adjustments to records, such as adding optional headers, become much easier. In 2.x miniSEED, if you want to add a blockette, for example, but there is not enough room, you are stuck re-encoding the data into unfilled records or reprocessing a lot of data to pack it efficiently.
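A minimal sketch of point b), using a completely made-up record layout (a 2-byte total-length field followed by the record body): with a variable-length record, appending an optional header just means growing the buffer and updating the length field, while a fixed-length record forces a check against the unused padding and, failing that, re-encoding.

```python
import struct

# Made-up record layout: 2-byte big-endian total length, then the record body.
def append_optional_header(record: bytes, extra: bytes) -> bytes:
    """Variable-length record: grow the body and update the length field."""
    (total,) = struct.unpack_from(">H", record)
    assert total == len(record)
    new_total = total + len(extra)
    if new_total > 0xFFFF:
        raise ValueError("record would exceed the 2-byte length limit")
    return struct.pack(">H", new_total) + record[2:] + extra

def fits_in_fixed_record(used: int, extra: int, record_size: int = 512) -> bool:
    """Fixed-length record: the extra header must fit in the unused padding,
    otherwise the data have to be re-encoded/repacked into new records."""
    return used + extra <= record_size
```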
I'm on the fence about this one and would appreciate hearing about any other pros and cons regarding variable versus fixed record lengths.
thanks,
Chad
I agree with Philip, the proposed network extension blockette has a fundamental problem regarding backwards compatibility. It is only backwards compatible in that it can be read, but critical information will be quietly lost until a large number of legacy readers are replaced (which will take a very long time). Until then, when using legacy readers, all of the functions of a network code (ownership identification, logical station grouping) are lost with many implications. You can easily imagine older data converters being used for a long time and the expanded network code going missing right away. I predict it wouldn't take very long before network 99 shows up in publications.
I do not believe assertions that all users of SEED will think it obvious what is going on with network 99. The grad student doing their work with an old version of PQLX is simply not going to know.
As Philip says, it'd be better to break things than quietly continue to work while losing network identifiers.
Furthermore, even this small update would require modifications to all software chains, from data generation to data centers to users, along with database schemas, protocols, etc., etc. That is a huge amount of work for such a small change. If we are going to go through all of that, we should at least fix some of the other issues with miniSEED. And now we are back at the beginning of this conversation that started in ~2013.
Chad
A special 2-letter network code can be reserved. AFAIK there are even
some obvious network codes, such as "99" or "XX" that have never been
used. If data records are attributed to network "99", it is quite
obvious what is going on. Yet, if I use my old PQLX to quickly look at
the data, I don't care about the network code.
Wasn't the network code added in SEED 2.3 in the first place? Any issues
known?
Regards,
Andres.
As I wrote before, it *is* backward compatible with the *existing* MiniSEED, which is *all* MiniSEED currently existing in *all* archives. I didn't write "blockette-1002 MiniSEED", because it is obvious that attributes specific to blockette 1002 need to be retrieved from there.
The *only* compromise w.r.t. backward compatibility occurs if a blockette-1002 unaware software reads blockette-1002 MiniSEED. That is the price tag of the alternative solution. A minimal cost compared to, e.g., the recoding of entire data archives and disruption of complex data infrastructures. And as soon as that previously blockette-1002 unaware software is linked against an updated libmseed or qlib, the problem is gone anyway. In fact, for many data centers and infrastructures, the cost will be close to zero in practice.
Actually an updated libmseed or qlib would be made available long before the first blockette-1002 MiniSEED data actually start circulating publicly. Therefore all actively maintained software can be made 1002-ready well in advance.
Regards
Joachim
For instance, the Edge/CWB software takes advantage of fixed-length (or at least power-of-two) records to store all channels of miniSEED in one file, where regions of the file are reserved for each channel (generally extents of 64 512-byte blocks). It can accommodate 2.4 miniSEED of any size, but the blocks fit nicely into their extents, and the indexing needed to find a channel and time range only has to index the extents. This greatly speeds up access to queried data. I know a lot of the work uses “file per channel per day”, but we found that pretty inefficient. I know that many use the binary search method you mentioned, which also works better on fixed-length blocks.
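For readers unfamiliar with the technique, here is a minimal sketch of the kind of time-based binary search that fixed-length records make easy; the 512-byte record size and the get_record_start_time() helper are placeholders, not any particular data center's implementation.

```python
import os

RECORD_SIZE = 512   # placeholder fixed record size

def find_record(fh, target_time, get_record_start_time):
    """Return the file offset of the last record starting at or before
    target_time, assuming fixed-length, time-ordered records."""
    n_records = os.fstat(fh.fileno()).st_size // RECORD_SIZE
    lo, hi, best = 0, n_records - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        fh.seek(mid * RECORD_SIZE)
        start = get_record_start_time(fh.read(RECORD_SIZE))
        if start <= target_time:
            best, lo = mid * RECORD_SIZE, mid + 1
        else:
            hi = mid - 1
    return best
```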
I do not particularly think miniSEED is a very good choice for telemetry when short latency is desired - like for earthquake early warning. The fixed part of the header is so big relative to the payload that it is not bandwidth efficient. If variable-length records are desired for this, I think the alternative of using another, more efficient telemetry format should win out. Note that the current Q330 one-second packets are not in miniSEED form, but they are fairly efficient and variable length. The receiving software takes this format and generates miniSEED. The one-second packets are available for EEW and the miniSEED is generated for later use and archival. My take is that miniSEED 3 should not try to be a telemetry format, as it would be a bad one - it is a standard format used at data centers and after the real-time processing is done. Further, the telemetry format is a competitive function best left to the digitizer vendors. We should insist that their telemetry formats make good miniSEED 3, including all of the mandatory and optional flags etc., but how they achieve that should be left to them.
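Rough numbers behind the overhead point: the 48-byte fixed header and 8-byte Blockette 1000 follow the 2.x layout, while the compressed payload size for one second of data is an assumed, illustrative value.

```python
# Rough header-overhead estimate for a small low-latency packet. The 48-byte
# fixed header and 8-byte Blockette 1000 follow the 2.x layout; the 64-byte
# compressed payload for ~1 s of data is an assumed, illustrative value.
fixed_header = 48
blockette_1000 = 8
payload = 64

total = fixed_header + blockette_1000 + payload
overhead = (fixed_header + blockette_1000) / total
print(f"{total} bytes on the wire, {overhead:.0%} of it header")   # 120 bytes, 47% header
```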
Dave
I've read most of the topics and I'm inclined to follow Joachim's and
Andres's comments about the cost of the change, especially regarding the
network operators and the people who use data for operational purposes (as
David also pointed out).
I'm not worried about PhD students and those who work on off-line data.
And I understand the concerns about a reserved network code like 99 being
incorrectly used in publications because of legacy software not reading the
extended one.
So, all this preamble to simply ask the following question.
Will the extended network code be reserved for temporary networks? Or will
it also be available for new permanent networks as soon as it is adopted?
I ask this because if it's only for temporary networks, then we can have
more time to migrate all the operational systems.
If, on the contrary, it's also available for permanent networks, we would
very soon see new permanent stations not being used by most operational
entities (I'm thinking right now about Tsunami Service Providers and global
location and/or CMT providers) because their software doesn't support more
than two-letter network codes.
Consider that some Earthworm modules were made location-code compatible only
one or two years ago, and some of their users have not migrated to those
new modules!
Regards.
Jean-Marie SAUREL.
--
--------------------------------------
ICG-CARIBE EWS WG1 chair
Institut de Physique du Globe de Paris
Observatoire Volcanologique et Sismologique
1 rue Jussieu
75005 Paris
Hi Andres,
You have a point that we should not be limiting our thinking. I do think there is a sweet spot in the balance between a small patch on the current miniSEED (in particular one that could be very detrimental to data identification) and something radically different. The very first straw man was created with that particular balance in mind as a place to start discussion, with the full expectation that it would evolve. My feeling is that non-independent records, a la headers plus frames transmitted independently, are a more radical change than anything in the straw man from the perspective of code reading the data.
As for the concept of "just" adding a blockette to extend the network code: all of the software you mentioned (SeedLink, Web Services, all user software), in addition to data center schemas, data center software and, very importantly, data generation systems, would need to be updated in order to not lose network identifiers. The libraries that do this parsing at the data center and user levels are the easy part; pushing updates out to all the places that use them will simply take a lot of time. As D. Ketchum wrote, updates will not happen overnight. You can easily imagine there will be old versions of slink2ew, chain_plugin and many, many more pieces of middleware running for a very long time. In some cases they will be transforming the data from miniSEED to something else and silently stripping the network identifiers out. In other cases the new blockette(s) may be retained, but all miniSEED3 data will need to be referred to as network "99" (or whatever) because the old system doesn't know any better. The overall cost of this transition would be huge, even for just adding a blockette.
Surely we can address other fundamental issues, such as record byte order identification, which cannot be fixed with a simple blockette, if we are going to effectively go through a full software stack update. Much of the planning, such as getting systems and software updated well before any new-style data flows, would be the same.
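For context on the byte order issue: a 2.x record does not declare its byte order up front, so readers typically guess by testing whether a known header field looks plausible in each byte order. A hedged sketch of that heuristic, assuming the 2-byte year of the record start time sits at offset 20 of the fixed header:

```python
import struct

def guess_byte_order(record: bytes) -> str:
    """Guess the byte order of a miniSEED 2.x record by testing whether the
    2-byte year of the record start time (assumed here to be at offset 20 of
    the fixed header) is plausible as big- or little-endian."""
    year_be = struct.unpack_from(">H", record, 20)[0]
    year_le = struct.unpack_from("<H", record, 20)[0]
    if 1900 <= year_be <= 2100:
        return "big"
    if 1900 <= year_le <= 2100:
        return "little"
    raise ValueError("cannot determine byte order from the year field")
```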
> Back to the idea of "frames" -- indeed, some info that is needed for
> real-time transfer could be stripped in offline format. If records could
> be easily converted to frames and vice versa, it would be great.
> Currently the main problem is forward references (number of samples,
> detection flags, anything that refers to data that is not yet known when
> sending the header), so we need a "footer" in addition to header.
A footer would work. Alternatively, the "micro" header on each frame could contain: the start time of the primary header (for sequencing), the start time of the first sample in the frame, the number of samples in the frame and any optional headers relevant to the frame (e.g. detections). Reassembly into a full record would require summing up the sample counts, combining the optional headers and stripping the micro/frame headers. Some care would be needed with the details. If we created such a telemetry framing for otherwise complete "next generation" miniSEED, it would have the advantage of limiting the telemetry complexity to those systems that need it, allowing some degree of separation between the use cases of telemetry, archiving, etc. It's certainly an intriguing line of thought.
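A rough sketch of the reassembly step described above, using made-up in-memory frame objects rather than any concrete wire format: sum the sample counts, concatenate the payloads, merge the per-frame optional headers, and drop the frame headers.

```python
from dataclasses import dataclass, field

# Made-up in-memory representation of a frame; no wire format is implied.
@dataclass
class Frame:
    record_start_time: float   # start time of the primary header (sequencing key)
    first_sample_time: float   # start time of the first sample in this frame
    sample_count: int
    payload: bytes             # encoded samples
    optional_headers: dict = field(default_factory=dict)

def reassemble(frames):
    """Combine frames belonging to one record into (header_fields, payload)."""
    frames = sorted(frames, key=lambda f: f.first_sample_time)
    assert len({f.record_start_time for f in frames}) == 1, "frames from one record only"
    header = {
        "start_time": frames[0].first_sample_time,
        "sample_count": sum(f.sample_count for f in frames),
        "optional_headers": {k: v for f in frames for k, v in f.optional_headers.items()},
    }
    return header, b"".join(f.payload for f in frames)
```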
regards,
Chad
> Regards,
> Andres.
At the IRIS DMC we have thought quite a bit about the costs of a transition to a newer generation of miniSEED. In many respects I think the DMC has more at stake in terms of operational change than any other single group in the FDSN. This is a discussion intended to develop a proposal for the FDSN to consider in 2017, only after which an adoption plan can be finalized. Personally, depending on the transition discussions, I would be surprised if we have much traction on adoption by 2018; it could easily take longer.
The transition of SEED data to use new identifiers as outlined in the alternative proposal presented on July 8th by Angelo Strollo would also require most of the same data systems (data producers, middleware, data centers, user software) to be updated, which would take a long time. Also, until most software has been updated, we risk losing any extended network identifiers. The implication that we would simply add a new blockette, update a few libraries and the transition is over seems very unrealistic to me. Furthermore, that is a lot of cost to address a single issue in SEED.
Chad
I'd just like to point out that merely upgrading a library, like libmseed,
to parse a new blockette does not suddenly make older software
compatible with a longer network code. If the software itself is not
also upgraded to use the information in the new blockette then the new
information is effectively ignored. I feel that this idea that there
is a non-disruptive, easy "fix" to expanding the network code is
unrealistic.
Philip
On Wed, Aug 24, 2016 at 9:28 AM, Joachim Saul <sa...@gfz-potsdam.de> wrote:
> Chad Trabant wrote on 20.08.2016 02:01:
>> I agree with Philip, the proposed network extension blockette has a fundamental problem regarding backwards compatibility. It is only backwards compatible in that it can be read, but critical information will be quietly lost until a large number of legacy readers are replaced (which will take a very long time). Until then, when using legacy readers, all of the functions of a network code (ownership identification, logical station grouping) are lost with many implications.
>
> Hallo Chad,
>
> what would be "a very long time"?
>
> First of all note that most of the current infrastructures world-wide
> will not be affected by the blockette-1002 extension at all. The reason
> for this is that most institutions will simply not produce any data with
> 1002 blockettes because they don't need the extended attributes. They
> will continue to produce and exchange 2.4 MiniSEED just as they have
> been for many years. They will not have to upgrade their station
> hardware/software in order to produce up-to-date, valid MiniSEED. NO CHANGE!
>
> Of course, "most institutions" is not necessarily all and sooner or
> later data with blockette 1002 will start to circulate. This will
> require blockette-1002 aware decoders to make use of the extended
> attributes.
>
> The obvious question is now: How much time would it take to update
> libmseed, qlib, seedlink et al. to support blockette 1002? A week? A
> month? A year? A very long time?
>
> As soon as blockette-1002 aware versions of said libraries are
> available, the software using them needs to be re-compiled and linked
> against them. A lot of software, if not most, is going to be
> blockette-1002 enabled that way, without need for further modifications.
> And, very importantly, the software can be made blockette-1002-ready
> WELL IN ADVANCE of the actual circulation of blockette-1002 data!
>
> This means specifically: If a consensus about the blockette 1002
> structure can be found, say, by December (e.g. AGU), then the work to
> make libmseed, qlib, seedlink et al. blockette-1002 ready and
> subsequently the software that uses them will take at most a few more
> months. With an updated libmseed, software like ObsPy and SeisComP will
> support at least the extended attributes out of the box. I haven't
> looked at the PQLX details but since it also uses libmseed to read
> MiniSEED, a blockette-1002-ready libmseed should allow the transition
> with very little (if any) further effort. I am therefore sure that most
> relevant, actively maintained software can likewise be made
> blockette-1002 ready before the Kobe meeting.
>
> There are, of course, details that need to be addressed. For instance,
> the proposed 4-character location identifier and how it is converted to
> Earthworm's tracebuf format, as pointed out by Dave. But these problems
> would be the same for blockette-1002 MiniSEED and the proposed new format.
>
>> You can easily imagine older data converters being used for a long time and the expanded network code going missing right away.
>
> Older data converters WILL continue to work fine with all currently
> existing MiniSEED streams. Whereas NO older data converters will work
> with ANY data converted to the proposed new and entirely incompatible
> format!
>
>> I predict it wouldn't take very long before network 99 shows up in publications.
>
> This implies authors who don't have a clue about what a network code is.
> How would they be able to correctly use a network code? That's not an
> issue of data formats but channel naming in general.
>
>> I do not believe assertions that all users of SEED will think it obvious what is going on with network 99. The grad student doing their work with an old version of PQLX is simply not going to know.
>
> Why not inform the grad student? What does it take for the grad student
> to learn that in an FDSN network code context "IU" doesn't stand for
> "Indiana University"?
>
> http://www.fdsn.org/networks/detail/IU
>
> That's all! In case that grad student happens to stumble upon "99" then
> probably an explanation on http://www.fdsn.org/networks/detail/99 would
> help him or her.
>
>> As Philip says, it'd be better to break things than quietly continue to work while losing network identifiers.
>
> What do you mean by "things"? The proposed new format and its
> implementation would not just break the grad student's PQLX but it would
> break ENTIRE INFRASTRUCTURES. World-wide and from bottom to top!
>
> Do you want to disrupt the entire FDSN data exchange to protect the grad
> student using an old PQLX from getting a "99" network code? Is that what
> you are saying?
>
>> Furthermore, even this small update would require modifications to all software chains,
>
> You have a position and are trying your best to defend it. This is
> legitimate of course. But you are exaggerating minor problems in order to
> discredit an approach that you cannot deny would be a lot less
> disruptive and expensive than the proposed new format.
>
>> from data generations
>
> No modifications are needed at the stations. Stations continue to
> produce 2.4 MiniSEED, which will remain valid. There is no need to
> produce blockette 1002 except for stations that e.g. have extended
> network or location codes. There will not be many (if any) in currently
> existing networks.
>
>> to data centers
>
> Data centers are the ones that benefit most from a continuity that the
> blockette-1002 approach would allow because they neither need to recode
> entire archives nor have to provide "old" and "new" data formats in
> parallel.
>
>> to users
>
> Only users that actually use blockette-1002 data. If these users use
> up-to-date versions of actively maintained software such as ObsPy,
> SeisComP or MiniSEED-to-SAC converters they will not notice any
> difference. Legacy software will continue to work with the exception of
> the network code that will show up as "99".
>
>> along with database schemas, protocols, etc., etc.
>
> There are some cases where updates will require further efforts. We
> already read about Earthworm and the limited space for the location
> identifier in the current Tracebuf2 format. But the effort at the
> Earthworm end to accommodate a longer location identifier would be the
> same for blockette-1002 data as for the proposed new format. It is
> therefore understandable that the Earthworm community has reservations
> against an extended location code because it would have to pay the price
> for something it probably doesn't need.
>
> In general chances are high that most database schemas will remain
> unaffected as well as most protocols.
>
> But I am curious to hear about specific database schemas that would be
> more difficult to update to blockette-1002 MiniSEED than to the proposed
> new format.
>
>> That is a huge amount of work for such a small change.
>
> I hope to have pointed out by now that the work required to implement
> blockette 1002 would in fact be dramatically less compared to the work
> required to upgrade entire infrastructures (indeed from the data loggers
> all the way to data users) to a fully incompatible new format.
>
>> And now we are back at the beginning of this conversation that started in ~2013.
>
> What conversation are you referring to?
>
> Cheers
Hallo Chad,
> You can easily imagine older data converters being used for a long time and the expanded network code going missing right away.
Older data converters WILL continue to work fine with all currently
existing MiniSEED streams. Whereas NO older data converters will work
with ANY data converted to the proposed new and entirely incompatible
format!
> I predict it wouldn't take very long before network 99 shows up in publications.
This implies authors who don't have a clue about what a network code is.
How would they be able to correctly use a network code? That's not an
issue of data formats but channel naming in general.
> I do not believe assertions that all users of SEED will think it obvious what is going on with network 99. The grad student doing their work with an old version of PQLX is simply not going to know.
Why not inform the grad student? What does it take for the grad student
to learn that in an FDSN network code context "IU" doesn't stand for
"Indiana University"?
http://www.fdsn.org/networks/detail/IU
That's all! In case that grad student happens to stumble upon "99" then
probably an explanation on http://www.fdsn.org/networks/detail/99 would
help him or her.
> As Philip says, it'd be better to break things than quietly continue to work while losing network identifiers.
What do you mean by "things"? The proposed new format and its
implementation would not just break the grad student's PQLX but it would
break ENTIRE INFRASTRUCTURES. World-wide and from bottom to top!
Do you want to disrupt the entire FDSN data exchange to protect the grad
student using an old PQLX from getting a "99" network code? Is that what
you are saying?
> Furthermore, even this small update would require modifications to all software chains,
You have a position and are trying your best to defend it. This is
legitimate of course. But you are exaggerating minor problems in order to
discredit an approach that you cannot deny would be a lot less
disruptive and expensive than the proposed new format.
> from data generations
No modifications are needed at the stations. Stations continue to
produce 2.4 MiniSEED, which will remain valid. There is no need to
produce blockette 1002 except for stations that e.g. have extended
network or location codes. There will not be many (if any) in currently
existing networks.
> to data centers
Data centers are the ones that benefit most from a continuity that the
blockette-1002 approach would allow because they neither need to recode
entire archives nor have to provide "old" and "new" data formats in
parallel.
> to users
Only users that actually use blockette-1002 data. If these users use
up-to-date versions of actively maintained software such as ObsPy,
SeisComP or MiniSEED-to-SAC converters they will not notice any
difference. Legacy software will continue to work with the exception of
the network code that will show up as "99".
> along with database schemas, protocols, etc., etc.
There are some cases where updates will require further efforts. We
already read about Earthworm and the limited space for the location
identifier in the current Tracebuf2 format. But the effort at the
Earthworm end to accommodate a longer location identifier would be the
same for blockette-1002 data as for the proposed new format. It is
therefore understandable that the Earthworm community has reservations
against an extended location code because it would have to pay the price
for something it probably doesn't need.
In general chances are high that most database schemas will remain
unaffected as well as most protocols.
But I am curious to hear about specific database schemas that would be
more difficult to update to blockette-1002 MiniSEED than to the proposed
new format.
> That is a huge amount of work for such a small change.
I hope to have pointed out by now that the work required to implement
blockette 1002 would in fact be dramatically less compared to the work
required to upgrade entire infrastructures (indeed from the data loggers
all the way to data users) to a fully incompatible new format.
> And now we are back at the beginning of this conversation that started in ~2013.
What conversation are you referring to?
Cheers
Joachim
The structure in libmseed that holds the record header attributes is 'MSRecord'. If the decoder of an updated libmseed sees a blockette 1002 it will have to take the information about the network code etc. from there and populate the MSRecord accordingly. That's all. The software will then use or copy the content of MSRecord.network, which by the way is large enough already (10 characters plus '\0') to accommodate the extended network code.
Mission accomplished! Well... mostly.
> If the software itself is not
> also upgraded to use the information in the new blockette then the new
> information is effectively ignored.
There will of course be target data structures in which the network code is hard-coded to be only two characters long. In such cases (hopefully) only two characters are copied. I haven't found any software in which this would be an actual issue. There *is* a similar issue, though, with the extended location code and the Earthworm Tracebuf2 structure. This will be a pain to solve within the Earthworm community, but neither blockette 1002 nor the proposed new format can be blamed for it. It's a limitation of Earthworm that is due to the current SEED channel naming conventions.
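As a toy illustration of the hard-coded-width case being discussed (no real software's data structures are shown, and the extended code is hypothetical): a consumer that reserves only two characters for the network code silently truncates an extended code, which is the quiet loss the other side is worried about.

```python
# Toy example of the truncation problem: a downstream structure that reserves
# only two characters for the network code (no real software is depicted, and
# "XX2016" is a purely hypothetical extended network code).
def to_legacy_channel_id(network: str, station: str, channel: str) -> str:
    legacy_network = network[:2]          # silent truncation of extended codes
    return f"{legacy_network}.{station}.{channel}"

print(to_legacy_channel_id("XX2016", "STA01", "HHZ"))   # XX.STA01.HHZ
```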
ObsPy, SeisComP, SAC, to name a few, would have no problem at all accommodating the extended attributes. This is probably true for most other actively maintained software that uses either libmseed or qlib.
> I feel that this idea that there
> is a non-disruptive, easy "fix" to expanding the network code is
> unrealistic.
There will never be a solution involving zero effort.
The question is how much effort each of the proposals would require. The blockette-1002 solution would be by far the easiest to adopt. But most importantly, existing infrastructures not requiring extended headers would not be disrupted at all. In other words: all existing real-time data exchange world-wide can continue to work as it does now. This allows enough time to upgrade software to support blockette 1002, and once blockette-1002 data actually start to circulate, most software infrastructures should be able to handle them properly.
Cheers
Joachim