If you run fsck, the "status" output will show you how many files are
missing/etc. The fsck log will then have the full fid information for
each type of error:
http://code.google.com/p/mogilefs/wiki/FSCK#Interpreting_Results
There's also a thread titled "end-to-end checksums" from Nov 2011 where I
posted my "checksums" branch: git://bogomips.org/MogileFS-Server.git
Still hoping more eyes will look at it and maybe try it out.
I unfortunately haven't had much time to test further, but it seems
alright in my limited testing.
Yes, my checksums branch allows running fsck to get the list using
printlog/taillog. No special tweaks to whatever HTTP server you're
using are required, but running mogstored (from the checksums branch) is
/highly/ recommended.
> was looking for something like moglist but maybe you could correct me if I
> misunderstood .
You could probably write a quick script to transform fsck printlog
FID results to namespace/key names if it makes recovery easier...
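Something along these lines might do as a starting point for that script. It is only a rough sketch: the log line layout (a 4-letter event code followed by the fid) is a guess, so adjust the pattern to what printlog actually emits, and the query assumes the usual "file" (fid, dmid, dkey) and "domain" (dmid, namespace) tables in the tracker DB. Both function names are made up for illustration.

```python
import re

# ASSUMPTION: each log line contains a 4-letter event code (e.g. MISS, POVI)
# followed by the numeric fid. Adjust this to the real printlog layout.
LOG_LINE = re.compile(r"\b([A-Z]{4})\s+(\d+)\b")


def parse_fsck_log(lines):
    """Yield (evcode, fid) pairs; lines that do not match are skipped."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            yield match.group(1), int(match.group(2))


def fids_to_keys(conn, fids):
    """Return {fid: (namespace, key)} using any DB-API connection.

    ASSUMPTION: tables file(fid, dmid, dkey) and domain(dmid, namespace).
    The fids are forced through int() so they can be placed directly in
    the IN (...) list, which sidesteps driver-specific placeholders.
    """
    fids = sorted({int(f) for f in fids})
    if not fids:
        return {}
    id_list = ",".join(str(f) for f in fids)
    cur = conn.cursor()
    cur.execute(
        "SELECT f.fid, d.namespace, f.dkey FROM file f "
        "JOIN domain d ON d.dmid = f.dmid "
        "WHERE f.fid IN (" + id_list + ")"
    )
    return {fid: (namespace, key) for fid, namespace, key in cur.fetchall()}
```

With hundreds of thousands of fids you would want to feed fids_to_keys in batches (a few thousand at a time) rather than build one huge IN list.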
Not sure about the rest of your fsck questions below. These numbers
are /without/ my checksums changes, right?
> This is how my fsck status looks like now ..
> mogadm fsck status
>
> Running: Yes (on lfvsfcp58.dn.net)
> Status: 5740115 / 287432015 (2.00%)
> Time: 265m (360 fids/s; 13025m remain)
360 fids/s seems really low. What's the network latency between your
trackers <-> storage nodes and trackers <-> DB?
Adding checksums will make fsck much slower, especially with large
files, but the current size-only checks should be much faster...
Maybe somebody else can pipe up, I've never had performance issues
with plain fsck...
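For what it's worth, the numbers in your status output can be sanity-checked with simple arithmetic. This sketch assumes the rate stays constant for the rest of the run; how mogadm computes its own estimate is not something I've checked, and the function name is just for illustration.

```python
# Figures copied from the quoted "mogadm fsck status" output.
CHECKED = 5_740_115
TOTAL = 287_432_015
ELAPSED_MIN = 265


def fsck_estimate(checked, total, elapsed_min):
    """Estimate fsck progress, assuming a constant checking rate."""
    rate = checked / (elapsed_min * 60)            # fids per second
    remaining_min = (total - checked) / rate / 60  # minutes still to go
    return {
        "percent_done": 100.0 * checked / total,
        "fids_per_s": rate,
        "remaining_min": remaining_min,
        "remaining_days": remaining_min / (60 * 24),
    }


est = fsck_estimate(CHECKED, TOTAL, ELAPSED_MIN)
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

That works out to roughly 361 fids/s and about 13,000 minutes remaining, or around nine days, which is consistent with what status reports. So the estimate itself is fine; the rate is the problem.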
> Check Type: Normal (check policy + files)
>
> [num_GONE]: 2
> * [num_MISS]: 567716 (that's a LOT of missing copies that are supposed to
> exist ) ?*
> [num_NOPA]: 2
> * [num_POVI]: 709490 ( replication policy violation ) *
> * [num_REPL]: 709473 ( FID has been scheduled for replication to fix a
> policy violation ) *
> [num_SRCH]: 2
>
> if if we try manually to fix all these files that's a HUGE no .. is FSCK
> really supposed to run for *13025m* long ??