Collection-wide fixity check stats

Ben Fino-Radin

Apr 30, 2015, 1:20:37 PM
to digital-...@googlegroups.com
Hi all,

I would love to hear from those of you managing large repositories, how long it takes you to fixity check your entire collection.

Of course, this only applies to those doing fixity checks linearly rather than according to an event-driven policy…

Also, of course, there are plenty of variables here, but for now I'm just interested in the two numbers: amount of data, amount of time. Any takers?

Best,
Ben


Andrew Berger

Apr 30, 2015, 2:43:10 PM
to digital-...@googlegroups.com
We have one storage area that had roughly 38 terabytes of data the last time I did a full check. When I ran the initial checksums - just MD5 - it had about 35 terabytes and the check took about 6 days, running continuously. I used md5deep. Other tools may be faster, but I was looking for the easiest set of command-line options for running a check recursively on a large set of folders.
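[For anyone curious what a recursive check like this looks like in practice: a minimal sketch using plain coreutils (find + md5sum) as an alternative to md5deep's -r flag. The demo directory below is illustrative and stands in for a real storage area.]

```shell
#!/bin/sh
# Sketch of a recursive fixity check with plain coreutils (md5sum).
# The temp directory stands in for a real multi-terabyte storage area.
set -e
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
mkdir "$demo/sub"
printf 'world\n' > "$demo/sub/b.txt"

# Generate a manifest of MD5 checksums for every file, recursively.
# The manifest lives outside the tree so it isn't checksummed itself.
find "$demo" -type f -exec md5sum {} + > "$demo.manifest.md5"

# Later, verify the collection against the manifest;
# --quiet prints only files that fail to match.
md5sum --quiet -c "$demo.manifest.md5" && echo "all files verified"
```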

I added to that list incrementally as more files were added and then ran another full check three months later, when there were about 38 terabytes of data. That took maybe 7 days; I didn't time it precisely, but based on the first check I expected it to take about a week. The second check isn't quite a direct comparison, since I used a more powerful computer, used md5sum instead of md5deep, and divided the list into parts instead of running one long check. I also didn't run the check quite continuously, since I could choose when to start each part.
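[Dividing the checksum list into parts, as described above, can be sketched like this with GNU split. The demo manifest and file names are illustrative; in practice each part would be checked in a separate session.]

```shell
#!/bin/sh
# Sketch of splitting a checksum manifest into parts so the check can
# run in sessions instead of one continuous multi-day job.
set -e
demo=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    printf 'file %s\n' "$i" > "$demo/f$i.txt"
done
manifest=$(mktemp)
find "$demo" -type f -exec md5sum {} + > "$manifest"

# GNU split: divide into 4 parts without breaking lines (-n l/4).
split -n l/4 "$manifest" "$demo.part."

# Check each part separately; this loop stands in for runs started
# on different days.
for part in "$demo.part."*; do
    md5sum --quiet -c "$part"
done
echo "all parts verified"
```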

Andrew