Next generation miniSEED - 2016-3-30 straw man change proposal 6 - Change CRC to represent encoded rather than decoded data

Chad Trabant

Aug 11, 2016, 10:46:39 PM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

Hi all,

Change proposal #6 to the 2016-3-30 straw man (iteration 1) is attached: Change CRC to represent encoded rather than decoded data.

Please use this thread to provide your feedback on this proposal by Wednesday August 24th.

thanks,
Chad

Doug Neuhauser

Aug 12, 2016, 2:17:02 AM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

On 08/11/2016 02:29 PM, Philip Crotwell wrote:
> Hi
>
> I am inclined to agree with this. Back in the old days, the first
> sample/last sample were part of the record, with I guess the idea that
> if there was a transmission error you could both check that the
> decompression was done correctly, as well as potentially decompress
> backwards in time to the point of error. In practice I think that
> errors in the compression are near zero and so this amounts to just a
> check on bit errors in transmission. In this case, there is no benefit
> to having the CRC on the decompressed data, and quite a lot of speed
> improvement to having it on the encoded.
>
> However, I question whether this is even needed as part of the file
> format. It adds complexity to data loggers, perhaps small, but not
> zero. It makes sense as part of a transmission protocol or a file
> system, but that is a separate issue. Perhaps it is cheap insurance,
> but unless it is actively used by receiving software, it doesn't
> really help. Perhaps a question to be asked is does anyone use the
> existing last sample check and has anyone actually encountered real
> miniseed packets with errors? If this type of error doesn't actually
> happen, then perhaps we should not add in a fix for it.

Yes, I have software that uses the last sample check.
Yes, I often use it.
Yes, I have found errors.

I use the "last sample vs last decompressed sample" check
often, and I do find that it occasionally detects bad packets.
Most of the time it is due to a datalogger crash creating a bad packet
on disk. However, with other data loggers using SeedLink across
TCP radios, I have seen MiniSEED packet corruption, both in
the data AND in the headers. So TCP does not flag all multi-bit errors.
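A minimal sketch of the "last sample vs last decompressed sample" check, for readers who have not seen it. It assumes the differences have already been unpacked from the compressed frames (real Steim frames pack them into bit fields, which is skipped here), and the function and parameter names are hypothetical:

```python
from itertools import accumulate

def last_sample_check(first_sample, differences, stored_last_sample):
    """Integrate first differences and compare the result with the stored last sample.

    Assumption: differences[i] is x[i+1] - x[i], already unpacked from the frames.
    Returns (check_passed, reconstructed_samples).
    """
    # Running sum starting from the first sample reconstructs the series.
    samples = list(accumulate(differences, initial=first_sample))
    return samples[-1] == stored_last_sample, samples

# A clean record passes; a single corrupted difference shifts the last sample.
print(last_sample_check(100, [3, -1, 4], 106)[0])
print(last_sample_check(100, [3, -1, 5], 106)[0])
```

The check only catches corruption that changes the final reconstructed value, and it says nothing about damage in the header fields.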

- Doug N

> I am not totally opposed to having a CRC in the format, but feel that
> its benefits vs. costs should be considered. There is value in
> simplicity.
>
> thanks
> Philip

>> ----------------------
>> Posted to multiple topics:
>> FDSN Working Group II
>> (http://www.fdsn.org/message-center/topic/fdsn-wg2-data/)
>> FDSN Working Group III
>> (http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
>>
>> Sent via IRIS Message Center (http://www.fdsn.org/message-center/)
>> Update subscription preferences at http://www.fdsn.org/account/profile/
>>

--
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
do...@seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 221 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)

Philip Crotwell

Aug 12, 2016, 3:29:02 AM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

Hi

I am inclined to agree with this. Back in the old days, the first
sample/last sample were part of the record, with I guess the idea that
if there was a transmission error you could both check that the
decompression was done correctly, as well as potentially decompress
backwards in time to the point of error. In practice I think that
errors in the compression are near zero and so this amounts to just a
check on bit errors in transmission. In this case, there is no benefit
to having the CRC on the decompressed data, and quite a lot of speed
improvement to having it on the encoded.

However, I question whether this is even needed as part of the file
format. It adds complexity to data loggers, perhaps small, but not
zero. It makes sense as part of a transmission protocol or a file
system, but that is a separate issue. Perhaps it is cheap insurance,
but unless it is actively used by receiving software, it doesn't
really help. Perhaps a question to be asked is does anyone use the
existing last sample check and has anyone actually encountered real
miniseed packets with errors? If this type of error doesn't actually
happen, then perhaps we should not add in a fix for it.

I am not totally opposed to having a CRC in the format, but feel that
its benefits vs. costs should be considered. There is value in
simplicity.

thanks
Philip


Philip Crotwell

Aug 12, 2016, 10:53:37 PM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

OK, then I am in favor of including the CRC.

However, in thinking more about this, I do not think it makes any
sense to protect only the data with the CRC. After all an error in the
headers could be more damaging than one in the data. So, perhaps it is
better to have the CRC field be over the entire record, with the CRC
bytes assumed to be zero for purposes of the calculation.
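A rough sketch of what "CRC over the entire record, with the CRC bytes assumed to be zero" could look like. The field offset, the little-endian storage and the choice of CRC-32 (via zlib.crc32) are assumptions for illustration only, not anything the straw man specifies:

```python
import struct
import zlib

CRC_OFFSET = 8  # hypothetical position of the 4-byte CRC field in the header
CRC_SIZE = 4

def record_crc(record: bytes) -> int:
    """CRC-32 over the whole record with the CRC field treated as zero."""
    zeroed = record[:CRC_OFFSET] + b"\x00" * CRC_SIZE + record[CRC_OFFSET + CRC_SIZE:]
    return zlib.crc32(zeroed) & 0xFFFFFFFF

def seal(record: bytes) -> bytes:
    """Return a copy of the record with the computed CRC stored in the CRC field."""
    crc = struct.pack("<I", record_crc(record))
    return record[:CRC_OFFSET] + crc + record[CRC_OFFSET + CRC_SIZE:]

def verify(record: bytes) -> bool:
    """Recompute the CRC (field zeroed) and compare it with the stored value."""
    (stored,) = struct.unpack_from("<I", record, CRC_OFFSET)
    return stored == record_crc(record)

record = b"HDR-DEMO" + b"\x00" * 4 + b"encoded payload bytes"
print(verify(seal(record)))
```

With this arrangement a flipped bit in the header is caught the same way as one in the data.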

Philip

Chad Trabant

Aug 19, 2016, 5:25:35 AM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

Hi,

I agree with the rationale for this proposal. Even though we lose the capability to validate that the data was properly decoded, that process is not well defined given byte order differences.
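To illustrate the byte order point, here is a small sketch in which the same decoded samples are serialized in two byte orders before the CRC is taken. It assumes 32-bit integer samples and CRC-32 via zlib.crc32; the function names are hypothetical:

```python
import struct
import zlib

def decoded_crc(samples, byte_order):
    """CRC-32 of decoded int32 samples serialized as '<' (little) or '>' (big) endian."""
    raw = struct.pack(f"{byte_order}{len(samples)}i", *samples)
    return zlib.crc32(raw)

def encoded_crc(payload: bytes) -> int:
    """CRC-32 of the encoded payload exactly as stored; no byte order choice arises."""
    return zlib.crc32(payload)

samples = [1, -2, 300, 70000]
# Same sample values, but the CRC input bytes depend on the reader's byte order.
print(hex(decoded_crc(samples, "<")), hex(decoded_crc(samples, ">")))
```

A CRC over decoded data would therefore need a rule fixing the serialization, whereas a CRC over the encoded bytes needs none.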

A further consideration is whether we should move the CRC near the top of the header and include as much of the header as possible along with the data.

Chad

Philip Crotwell

Aug 19, 2016, 8:22:19 PM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

The way the checksum is done in TCP packets is to initially set the
checksum bytes to zero, then calculate the checksum over the entire
packet, then afterwards set the checksum bytes to the calculated
value. This procedure has the advantage that you can checksum the
entire record, not just the parts after the checksum location in the
header. The TCP checksum is simpler than the CRC in gzip, and has the
property that the checksum over the packet, including the checksum
bytes, always gives zero. The equivalent in the CRC case would be to
skip the CRC bytes in the header (ie assume zero) and then compare the
value computed with the one in the header.
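For reference, a sketch of the TCP-style procedure described above, using the 16-bit ones' complement Internet checksum. The checksum offset is hypothetical, and the pseudo-header that real TCP also includes is left out:

```python
CHECKSUM_OFFSET = 16  # hypothetical field position; must be even (16-bit aligned)

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of big-endian words, zero-padded to even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def internet_checksum(data: bytes) -> int:
    """Complement of the ones' complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

def seal_packet(packet: bytes) -> bytes:
    """Zero the checksum field, compute over the whole packet, then store the value."""
    zeroed = packet[:CHECKSUM_OFFSET] + b"\x00\x00" + packet[CHECKSUM_OFFSET + 2:]
    value = internet_checksum(zeroed)
    return zeroed[:CHECKSUM_OFFSET] + value.to_bytes(2, "big") + zeroed[CHECKSUM_OFFSET + 2:]

sealed = seal_packet(bytes(range(20)))
# Recomputing over the sealed packet, checksum bytes included, gives zero.
print(internet_checksum(sealed))
```

A CRC-32 does not have this exact "recompute and get zero" form when the field sits in the middle of the record, which is why the CRC variant skips the field and compares instead.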

But the main point is that because the location of the checksum in the
header does not determine what can be part of the checksum, I would
argue that everything should be included (why not?), and we do not
need to be overly concerned about the location of the CRC in the
header. For efficiency reasons you might actually prefer it to be
after the data to allow it to be computed/stored in a streaming mode.
Not sure if that complication is worth it, but moving the CRC to the
final 4 bytes of the record might be worth thinking about.
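A sketch of the streaming idea, assuming the CRC occupies the final 4 bytes of the record and is CRC-32 stored little-endian (both are assumptions). zlib.crc32 accepts a running value, so the writer never needs the whole record in memory:

```python
import struct
import zlib

def write_record_streaming(chunks) -> bytes:
    """Emit chunks as they arrive, updating a running CRC-32, then append the CRC."""
    crc = 0
    out = bytearray()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # continue the CRC from the previous chunk
        out += chunk
    out += struct.pack("<I", crc & 0xFFFFFFFF)
    return bytes(out)

def verify_trailing_crc(record: bytes) -> bool:
    """Compare the CRC-32 of everything before the trailer with the stored trailer."""
    body, trailer = record[:-4], record[-4:]
    (stored,) = struct.unpack("<I", trailer)
    return (zlib.crc32(body) & 0xFFFFFFFF) == stored

record = write_record_streaming([b"header", b"payload-1", b"payload-2"])
print(verify_trailing_crc(record))
```

With the CRC in the header instead, the writer has to buffer the record or seek back to fill the field in.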

It may also be wise to allow the CRC to be left unset in cases where
computational speed matters more and errors are unlikely, such as a
read-modify-write cycle on disk. Setting the CRC to zero is probably
sufficient as the "unset" marker. A real CRC could also come out as
zero, but at one chance in 2^32 that is likely small enough not to
worry about.

Philip

Pete Evans

Aug 20, 2016, 8:24:23 AM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

On 19.08.2016 16:23, Philip Crotwell wrote:

> But the main point is that because the location of the checksum in the
> header does not determine what can be part of the checksum, I would
> argue that everything should be included (why not?), and we do not
> need to be overly concerned about the location of the CRC in the
> header. For efficiency reasons you might actually prefer it to be
> after the data to allow it to be computed/stored in a streaming mode.
> Not sure if that complication is worth it, but moving the CRC to the
> final 4 bytes of the record might be worth thinking about.

I have one thought about why the CRC might usefully cover only the data:
it is not unknown for network/station/location/channel
codes to be changed. Changing archived miniSEED would then require
recomputing the CRCs if they extend over the stream identifier
parts. I'm undecided whether that's a strength or a weakness.
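A small sketch of that trade-off, with a made-up layout: a 16-byte header whose first 5 bytes hold the station code, followed by the encoded payload. None of these sizes come from the straw man, and CRC-32 via zlib.crc32 is assumed:

```python
import zlib

HEADER_SIZE = 16     # hypothetical header length
STATION_FIELD = 5    # hypothetical: station code occupies the first 5 bytes

def data_only_crc(record: bytes) -> int:
    """CRC-32 over the payload only; header edits do not affect it."""
    return zlib.crc32(record[HEADER_SIZE:])

def whole_record_crc(record: bytes) -> int:
    """CRC-32 over header and payload; any header edit changes it."""
    return zlib.crc32(record)

def rename_station(record: bytes, new_code: bytes) -> bytes:
    """Overwrite the station field, padding or truncating to the field width."""
    return new_code.ljust(STATION_FIELD)[:STATION_FIELD] + record[STATION_FIELD:]

original = b"ABCDE" + bytes(11) + b"encoded payload"
renamed = rename_station(original, b"ABCDF")
print(data_only_crc(original) == data_only_crc(renamed))
print(whole_record_crc(original) == whole_record_crc(renamed))
```

So a whole-record CRC forces a recompute on every identifier change, while a data-only CRC survives the edit but cannot notice a damaged identifier.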

> It may also be wise to allow the CRC to be left unset in cases where
> computational speed matters more and errors are unlikely, such as a
> read-modify-write cycle on disk. Setting the CRC to zero is probably
> sufficient as the "unset" marker. A real CRC could also come out as
> zero, but at one chance in 2^32 that is likely small enough not to
> worry about.

We already have ~100TB of archived data: ~2^46 bytes. If it were
all in 512-byte (2^9) records, that's 2^37 records, and 2^37 checksums. I
suspect there are quite a few with CRC = 0.
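The arithmetic can be made explicit. Assuming CRC values are spread uniformly over the 2^32 possibilities (an idealization), the expected number of records whose genuine CRC is zero is the record count divided by 2^32:

```python
def expected_zero_crcs(archive_bytes: int, record_bytes: int, crc_bits: int = 32) -> float:
    """Expected count of records with CRC == 0, assuming uniformly distributed CRCs."""
    records = archive_bytes // record_bytes
    return records / 2**crc_bits

# 2^46 bytes in 2^9-byte records -> 2^37 records -> 2^37 / 2^32 = 32
print(expected_zero_crcs(2**46, 2**9))  # 32.0
```

So in an archive of that size a zero CRC used as an "unset" marker would be ambiguous for a few dozen records.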

Best wishes,
P.

--
Dr Peter L Evans, GEOFON Data Centre
GFZ German Research Centre for Geosciences
pev...@gfz-potsdam.de Tel. +49 (0)331 288-1261
http://geofon.gfz-potsdam.de/ Fax: +49 (0)331 288-1277

Joseph Steim

Aug 23, 2016, 11:09:07 PM

to fdsn-w...@fdsn.org, fdsn-wg3...@fdsn.org

Agree. A CRC should confirm the integrity of the verbatim bytes representing
the data record.

From: Chad Trabant [mailto:ch...@iris.washington.edu]
Sent: Thursday, August 11, 2016 3:47 PM
To: Tim Ahern <t...@iris.washington.edu>; Bob Hutt <Hu...@asl.cr.usgs.gov>;
Bruce Beaudoin <br...@passcal.nmt.edu>; Pete Davis
<pda...@epicenter.ucsd.edu>; Kent Anderson <ke...@iris.edu>; Bob Woodward
<wood...@iris.edu>; Katrin Hafner <haf...@iris.edu>; Justin Sweet
<jrs...@uw.edu>; Brent...@iris.edu; David Wilson <dwi...@usgs.gov>;
Mathias Franke <mathias...@kmi.com>; Ogie Kuraica <og...@kmi.com>; Neil
Spriggs <NeilS...@nanometrics.ca>; Angel Rodriguez
<an...@volcanbaru.com>; Leonid Zimakov <L.Zi...@reftek.com>; Edelvays
Spassov <e...@kmi.com>; Branden Christensen
<branden.c...@osop.com.pa>; Dieter Stoll
<dst...@lennartz-electronic.de>; Seiji Tsuboi <tsu...@jamstec.go.jp>; Jara
Salvador, Jose Antonio <JoseAnto...@icgc.cat>; David Easton
<david...@nanometrics.ca>; timp...@nanometrics.ca; John Clinton
<jcli...@sed.ethz.ch>; Vallee Martin <val...@ipgp.fr>; Constantin Ionescu
<vio...@info.ro>; Marmureanu...@dst.units.it; Bogdan Grecu
<bgr...@infp.ro>; Cristian Neagoe <cristia...@infp.ro>; Helle Pedersen
<Helle.P...@ujf-grenoble.fr>; Catherine Pequegnat
<catherine...@obs.ujf-grenoble.fr>; Pierre Volcke
<pierre...@obs.ujf-grenoble.fr>; Angelo Strollo
<str...@gfz-potsdam.de>; Sébastien Judenherc
<Sebastien...@fedd-scientific.com>; Tony Russell
<tony.r...@kenda.co.uk>; Robert Leugoud <rleu...@eentec.com>; Jiang Li
<liji...@geodevice.cn>; Lani Oncescu <la...@kmi.com>; Shawn Goessen
<sup...@guralp.com>; Dennis Pumphrey <d...@kmi.com>; Suzan Kowalski
<suzank...@nanometrics.ca>; Bruce Townsend
<brucet...@nanometrics.ca>; Joseph Steim <st...@quanterra.com>; Ian
Billings <reftek_...@trimble.com>; Claudio Parma <c.p...@solgeo.it>
Cc: fdsn Group II <fdsn-w...@lists.fdsn.org>; FDSN Working Group III
<fdsn-wg3...@lists.fdsn.org>
Subject: Next generation miniSEED - 2016-3-30 straw man change proposal 6 -
