[ANN] Reed-Solomon Parity Encoding at +1GB/sec


Klaus Post

Jun 22, 2015, 11:48:25 AM6/22/15
to golan...@googlegroups.com
Hi!

I am very happy to announce a new package that can perform Reed-Solomon erasure code creation at several gigabytes per second. This will allow you to create real-time parity sets for your file storage.

The original intention with the package was to port the recently announced Reed-Solomon library created by Backblaze, and it remains compatible with that library. The package does not use cgo, but uses AMD64 SSE3 assembler to achieve the fast encoding.



I hope the package interface is as easy and straightforward to use as I intended it to be, so I am very interested in feedback on that. There are some underlying "rules" of using RS that cannot be conveyed easily in the interface, but I hope the documentation and examples can help with that.
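To give an idea of the intended usage, here is a rough sketch (the shard counts and sizes are arbitrary placeholders - see the package documentation for the full details):

package main

import (
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 3 parity shards.
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		log.Fatal(err)
	}

	// All 13 shards must have the same size. The first 10 hold your data,
	// the last 3 are overwritten with the computed parity.
	shards := make([][]byte, 13)
	for i := range shards {
		shards[i] = make([]byte, 1024)
	}
	// ... fill shards[0:10] with your data ...

	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Verify checks that the parity still matches the data shards.
	ok, err := enc.Verify(shards)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("parity consistent:", ok)
}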

Feedback, questions and comments are very welcome!


/Klaus


Axel Wagner

Jun 22, 2015, 12:04:32 PM6/22/15
to Klaus Post, golan...@googlegroups.com
Awesome! I have wanted something like this for a while now.

May I ask what kind of decoder you use? I assume from the
description that you use a systematic code? And do you plan to add an
error decoder (as opposed to just an erasure decoder, like now)?
Because in theory you can also detect and correct up to parity/2
errors, and I would consider that a very useful thing.

Lastly, something to clarify in the docs is the total number of
shards, i.e. with
New(42, 23)
will the encoder use a) 42 data shards plus 23 parity shards, so 65 in
total, or b) 42 shards of which 23 are used for parity, so 42 in total,
of which 19 are usable for data? I assume you do the first, but the
literature diverges in notation on this, so some clarification is
useful, I think.

Best,

Axel

Klaus Post

Jun 22, 2015, 1:31:58 PM6/22/15
to golan...@googlegroups.com, klau...@gmail.com
Hi!


On Monday, 22 June 2015 18:04:32 UTC+2, Axel Wagner wrote:
Awesome! I have wanted something like this for a while now.

Great to hear!
 
May I ask what kind of decoder you use? I assume from the
description you use a systematic code?

I am not deeply into the math; I mostly did a lot of tests to ensure that it aligns with the Java client.

You can read more about it here: https://www.backblaze.com/blog/reed-solomon/
 
From what I understand of it - and please forgive my layman terms, I only understand it on an operational level - it computes an (invertible) Vandermonde matrix, which it uses with an 8-bit Galois field to compute the parity. I believe this paper was the basis of the original implementation: http://web.eecs.utk.edu/~plank/plank/papers/SPE-9-97.html
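If it helps, here is a tiny, self-contained illustration of what multiplication in an 8-bit Galois field looks like using log/exp tables. This is just to show the idea - the polynomial and table layout shown here are not necessarily the ones the package actually uses:

package main

import "fmt"

// Illustrative GF(2^8) arithmetic, using the common primitive
// polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d) with generator 2.
var expTable [512]byte
var logTable [256]int

func init() {
	x := 1
	for i := 0; i < 255; i++ {
		expTable[i] = byte(x)
		logTable[x] = i
		x <<= 1
		if x&0x100 != 0 {
			x ^= 0x11d // reduce modulo the field polynomial
		}
	}
	// Duplicate the table so galMul can skip the modulo-255 step.
	for i := 255; i < 512; i++ {
		expTable[i] = expTable[i-255]
	}
}

// galMul multiplies two field elements: a*b = exp(log(a) + log(b)).
func galMul(a, b byte) byte {
	if a == 0 || b == 0 {
		return 0
	}
	return expTable[logTable[a]+logTable[b]]
}

func main() {
	fmt.Printf("0x53 * 0xCA = 0x%02X\n", galMul(0x53, 0xCA))
}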


And do you plan to add an
error decoder (as opposed to just an erasure decoder, like now)?
Because in theory you can also detect and correct up to parity/2
errors, and I would consider that a very useful thing.

I don't have any current plans, but that is because I don't know what it is - could you perhaps give an example?

 

Lastly, something to clarify in the docs, is the total number of [...]

I have adjusted the docs to hopefully make that a bit clearer. 'Data shards' should always refer to the data you want to encode and 'parity shards' to the calculated parity. It might be cleaner to wrap this in a separate (struct) type, but on the other hand I like the directness of just supplying byte slices.
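So with Axel's numbers, the intention is (a): New(42, 23) gives you 42 data shards plus 23 parity shards, 65 in total. Roughly like this, continuing the example from the announcement (a sketch; error handling omitted and the shard size is arbitrary):

	// 42 data shards + 23 parity shards = 65 shards in total.
	enc, _ := reedsolomon.New(42, 23)

	shards := make([][]byte, 42+23) // data shards first, then parity
	for i := range shards {
		shards[i] = make([]byte, 1024)
	}
	// shards[0:42]  -> the data you supply
	// shards[42:65] -> written by Encode
	_ = enc.Encode(shards)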

 
Best,

Axel

Thanks for the feedback!

/Klaus 

Axel Wagner

Jun 22, 2015, 1:53:10 PM6/22/15
to Klaus Post, golan...@googlegroups.com, klau...@gmail.com
Klaus Post <klau...@gmail.com> writes:
> I am not deeply into the math, I mostly did a lot of tests to ensure that
> it aligns with the java client.
>
> You can read more about it here:
> https://www.backblaze.com/blog/reed-solomon/

Yipp, I have read that in the meantime, interesting read, though
they indeed don't go very far into the math.

> From what I understand of it - and please forgive me for my layman terms, I
> only understand it on an operational level: It computes a (reversible)
> Vandermonde matrix, which it uses with an 8 bit Galois Field to encode the
> erasure codes.

Yipp, that's how I read it too :)

> I don't have any current plans, but that is because I don't know what it
> is- could you perhaps give an example?

It is a pretty hard problem, which is why I think there aren't a
lot of implementations out there. The difference is basically that you
assume that you already know which blocks are corrupted. But in theory,
an algorithm can find that out by itself, as long as the number of
errors is below a certain threshold. Think of it as losing a hard
drive vs. bitrot.

Reed and Solomon described a theoretical decoder that explains pretty
well why that works in theory, and which is also described in the
Wikipedia article [0]. It's completely impractical though, so other,
more efficient (but also more involved) correction algorithms were
developed later.

In any case, I didn't really expect you to implement that (though it
would be cool if you did), because it's pretty non-trivial and involves
a lot of math. I actually tried to do it once and failed miserably :)
I guess for now another way to detect errors will have to do :)

> I have adjusted the docs to hopefully make that a bit more clear.

Cool.

So again, very nice package, I will keep it bookmarked for later use :)

Klaus Post

Jun 22, 2015, 2:14:08 PM6/22/15
to golan...@googlegroups.com, klau...@gmail.com
On Monday, 22 June 2015 19:53:10 UTC+2, Axel Wagner wrote:
It is a pretty hard problem, which is why I think there aren't a
lot of implementations out there. The difference is basically that you
assume that you already know which blocks are corrupted. But in theory,
an algorithm can find that out by itself, as long as the number of
errors is below a certain threshold. Think of it as losing a hard
drive vs. bitrot.

Ah, ok. I haven't heard about that. Definitely sounds interesting. 

I considered adding a per-shard hash as part of the library, to help keep track of shard number and order, but I couldn't come up with something I felt could easily be used in a lot of different scenarios, so for now it is up to the library user to ensure the validity of the individual shards. Either way, I think that will go into a separate package - the only thing I might do here is add a simpler way of "streaming" data to the encoder.
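For now, something like this - done by the user of the library, outside the package - is what I have in mind (just a sketch; the function names are made up for the example):

package shardcheck

import (
	"bytes"
	"crypto/sha256"
)

// Checksums returns one SHA-256 digest per shard; store these alongside
// the shards (e.g. in a small index file or header).
func Checksums(shards [][]byte) [][32]byte {
	sums := make([][32]byte, len(shards))
	for i, s := range shards {
		sums[i] = sha256.Sum256(s)
	}
	return sums
}

// MarkCorrupt nils out any shard whose digest no longer matches, so the
// erasure decoder can treat it as missing and reconstruct it.
func MarkCorrupt(shards [][]byte, sums [][32]byte) {
	for i, s := range shards {
		sum := sha256.Sum256(s)
		if !bytes.Equal(sum[:], sums[i][:]) {
			shards[i] = nil
		}
	}
}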

/Klaus
 

Vasiliy Tolstov

Jun 22, 2015, 5:38:29 PM6/22/15
to Klaus Post, golan...@googlegroups.com
Wow, I'm already very happy with pgzip, and now with this package I'm
happy twice =)

--
Vasiliy Tolstov,
e-mail: v.to...@selfip.ru

Nick Craig-Wood

Jun 23, 2015, 4:36:23 AM6/23/15
to Klaus Post, golan...@googlegroups.com
Very nice package!

If you wanted to implement a command line tool which uses it, you might
want to check out the par tool (the par2 package in Ubuntu)

http://parchive.sourceforge.net/

There seems to be a specification for writing files to disk too...

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Nick Craig-Wood

Jun 23, 2015, 4:46:10 AM6/23/15
to Klaus Post, golan...@googlegroups.com
On 22/06/15 19:14, Klaus Post wrote:
> On Monday, 22 June 2015 19:53:10 UTC+2, Axel Wagner wrote:
>
> It is a pretty hard problem, which is why I think there aren't a
> lot of implementations out there. The difference is basically that you
> assume that you already know which blocks are corrupted. But in theory,
> an algorithm can find that out by itself, as long as the number of
> errors is below a certain threshold. Think of it as losing a hard
> drive vs. bitrot.
>
> Ah, ok. I haven't heard about that. Definitely sounds interesting.

Erasure codes are usually used when you know you don't have the data at
all. That knowledge is extra information, provided by a checksum on a
packet or by the disk being missing. Reed-Solomon is a fine erasure
encoder (it is optimal).

For a Reed-Solomon code, if you can deal with 2k+1 erasures then you
can only deal with k corrections. There are other codes which do much
better than this at error correction. It has its advantages though, as
it is easy to implement in software over GF(256).

If you are interested in RS coding, then Russ Cox wrote lots of stuff
about it

http://research.swtch.com/field

And there is also a QR code encoder (QR codes use RS codes) in Go (but
no decoder?) here

https://godoc.org/code.google.com/p/rsc/qr

The general case of decoding Reed-Solomon codes is hard. Many years ago
I implemented the Berlekamp-Massey decoder (which was used for decoding
RS codes over satellite links)

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Berlekamp.E2.80.93Massey_decoder

There may be better algorithms now though.

If I were doing the same thing today, I'd consider a low-density
parity-check (LDPC) code, which performs much better. See David
MacKay's book for a good introduction

http://www.inference.phy.cam.ac.uk/mackay/itila/

> I considered adding a per shard hash as part of the library and help
> keep track of shard number and order, but I couldn't come up with
> something I felt could easily be used in a lot of different scenarios,
> so for now it is up to the library user to ensure validity of the
> individual shards.

And that is as it should be for a low-level library. The user might be
packing the data into Ethernet packets and relying on the 32-bit CRC to
check for corruption, or storing it in files - you just don't know...

> Either way, I think that will go into a separate
> package - the only thing I have that I might do is to add a simpler way
> of "streaming" data to the encoder.

In the docs you wrote on the Reconstruct function

//
// The reconstructed shard set is complete, but integrity is not verified.
// Use the Verify function to check if data set is ok.

The Verify function should always pass after you do error decoding so it
isn't an integrity check in any way.

If you want an additional integrity check then the user of the library
should add one with a CRC/MD5/SHA1 etc - calling Verify after
Reconstruct is just a waste of CPU cycles!

(In general you can use codes for detecting errors, or correcting
errors, but not both!)

Klaus Post

Jun 23, 2015, 5:48:30 AM6/23/15
to golan...@googlegroups.com, klau...@gmail.com
On Tuesday, 23 June 2015 10:46:10 UTC+2, Nick Craig-Wood wrote:
If you are interested in RS coding, then Russ Cox wrote lots of stuff
about it

http://research.swtch.com/field

Very nice introduction, and by a Gopher too :)

It is very interesting, even though much of the theory behind it is way above my math level.

In terms of speed, it is unfair to compare Russ' 22MB/s directly to my 1GB/s, since his benchmarks are on much smaller datasets.


The Verify function should always pass after you do error decoding so it
isn't an integrity check in any way.

Good spot. And you are right, it cannot be assumed to be an integrity check. However, it is a bit more complex than that.

If you Reconstruct and you are missing FEWER shards than you have parity for, Verify will be able to detect data corruption, since the parity will not line up.

If you run Reconstruct and you are missing the same number of shards as you have parity for, the Verify function will always succeed, even if the data has been corrupted, since the missing shards will be made to match the corrupted data.
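As a concrete example: with 4 parity shards, if Reconstruct only had to recreate 2 missing shards, there is still parity left over, so a corrupted shard will generally make Verify fail; if it had to recreate all 4, the recreated shards are by construction consistent with whatever data was used, and Verify will pass no matter what.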

But I think it should be made much clearer in the documentation that 'Verify' is sort of a red herring when running it after a 'Reconstruct', and that you should rely on checksums instead.

>par2 package [...]

Yes, I have used par2 for many years for backups - especially when I did long-term storage on DVDs. PAR2 is really good, although it has some rare failure modes, which should be fixed by the algorithm used in this package.

Great feedback! Thanks!
/Klaus 