Chunk Parity -- or RAID5 for BeeGFS


Jan Behrend

Jun 22, 2017, 10:59:56 AM
to fhgfs...@googlegroups.com
Hello list,

one of my latest internal BeeGFS customers is worried about losing a complete storage node
to lightning or a black hole, leaving his cluster with partially inaccessible data chunks.

Reluctant to buddy-mirror all of it and spend half of his valuable storage on this, he
naturally suggested RAID levels and found this project:
https://github.com/runefriborg/beegfs-chunk-parity
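
To put rough numbers on that trade-off (a back-of-the-envelope sketch with illustrative figures of my own, nothing measured):

    def usable_fraction_mirror():
        """Buddy mirroring stores every chunk twice."""
        return 0.5

    def usable_fraction_single_parity(n_targets):
        """RAID5-style single parity gives up one target's worth of capacity."""
        return (n_targets - 1) / n_targets

    for n in (10, 25, 50):
        print(f"{n} targets: mirror 50.0%, parity {usable_fraction_single_parity(n):.1%}")
    # e.g. 50 targets: mirror 50.0%, parity 98.0%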

IMO this project addresses a very special use case, and I don't think we could or should use it
in our environment.

The question is: is there something like this on the BeeGFS roadmap?
Does this make sense at all in this architecture? Does it kill performance, etc.?

I'm sure you have thought about this, and I would kindly ask for some insight.

Cheers Jan

--
MAX-PLANCK-INSTITUT fuer Radioastronomie
Jan Behrend - Rechenzentrum
----------------------------------------
Auf dem Huegel 69, D-53121 Bonn                                  
Tel: +49 (228) 525 359
http://www.mpifr-bonn.mpg.de


Rune M. Friborg

Jun 23, 2017, 3:01:18 AM
to fhgfs...@googlegroups.com
Hi Jan,

I cannot answer whether something like this is on the BeeGFS roadmap.

But I would like to give you some insight into our experience with the beegfs-chunk-parity project, of which I am the architect.
First, it is probably not ready for production use by anyone other than the developers of the tool.

Having said that, it has been in production use for almost two years at Aarhus University, where they run a BeeGFS installation with 50 storage targets (3.5 PB). With 50 storage targets, the
chance of one of them failing is dangerously high, which is why the beegfs-chunk-parity project was developed.
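
To illustrate (the 2% yearly per-target failure rate below is an assumption of mine, not an Aarhus figure):

    def p_any_failure(p_per_target, n_targets):
        """Chance that at least one of n independent targets fails."""
        return 1.0 - (1.0 - p_per_target) ** n_targets

    print(p_any_failure(0.02, 50))  # ~0.64: losing some target is more likely than not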


Every storage target records a change log. From this change log, a continuous parallel process running on the storage servers updates the parity in the background. This approach has the huge benefit that it does not affect the latency of file operations on the filesystem.
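
As a minimal sketch of that idea (my own simplification, not the actual beegfs-chunk-parity code; the block size and the resolve_stripe lookup are hypothetical):

    BLOCK = 1 << 20  # assumed streaming block size

    def xor_blocks(blocks):
        """XOR byte blocks of possibly unequal length (zero-padded)."""
        out = bytearray(max(len(b) for b in blocks))
        for b in blocks:
            for i, byte in enumerate(b):
                out[i] ^= byte
        return bytes(out)

    def update_parity(chunk_paths, parity_path):
        """Recompute the parity chunk for one stripe, block by block."""
        files = [open(p, "rb") for p in chunk_paths]
        try:
            with open(parity_path, "wb") as parity:
                while True:
                    blocks = [b for b in (f.read(BLOCK) for f in files) if b]
                    if not blocks:
                        break
                    parity.write(xor_blocks(blocks))
        finally:
            for f in files:
                f.close()

    def drain_change_log(log_path, resolve_stripe):
        """Background pass: re-parity every chunk the change log marks dirty."""
        with open(log_path) as log:
            for chunk_id in (line.strip() for line in log if line.strip()):
                chunk_paths, parity_path = resolve_stripe(chunk_id)  # hypothetical lookup
                update_parity(chunk_paths, parity_path)

Because the parity pass trails the change log, writes never wait on parity I/O -- but chunks modified after the last pass are not yet protected.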

In November last year, Aarhus University lost an entire storage target (~80 TB). It was successfully recovered (using this chunk parity), and the entire filesystem was up and running after 1-2 weeks of downtime, with a small number of recent files missing. This saved us from a restore from backup, which would have taken months, plus the pain of users realising that the backup only covers the most important files (about a fifth).
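
Recovery is then the same XOR run in reverse; a simplified sketch (file handling assumed, not taken from the project):

    from functools import reduce

    def xor_bytes(a, b):
        """XOR two byte strings, zero-padding the shorter one."""
        n = max(len(a), len(b))
        return bytes(x ^ y for x, y in zip(a.ljust(n, b"\0"), b.ljust(n, b"\0")))

    def rebuild_lost_chunk(surviving_chunk_paths, parity_path, out_path):
        """The lost chunk is the XOR of the parity chunk and all surviving chunks."""
        pieces = [open(p, "rb").read() for p in surviving_chunk_paths + [parity_path]]
        with open(out_path, "wb") as out:
            out.write(reduce(xor_bytes, pieces))

Anything written after the last background parity pass cannot be rebuilt consistently, which is why a few recent files were missing.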

cheers, Rune



Melpheos1er

unread,
Jun 26, 2017, 5:23:48 AM6/26/17
to beegfs-user
Looks like a great project.
Can this be thought of as some sort of RAIN architecture (not the same, but in the same category)?

Rune M. Friborg

Jun 27, 2017, 2:48:29 AM
to fhgfs...@googlegroups.com
Hi,

It cannot be used for RAIN. It's more of a disaster recovery method, since you always lose some recently modified files. We were actually hoping that it would never have to be used, but then we lost 3 drives in one RAID-6 configuration.

Also, I would like to emphasise that the current version of the chunk-parity software is very much tailored to the use case and setup at Aarhus University.

cheers, Rune
