Thanks - that extra info is quite useful. Knowing that nothing else unusual
is happening can be quite valuable (and I don't like to assume).
I haven't found anything that would clearly cause your crash, but I have
found something that looks wrong and conceivably could.
Could you please try this patch on top of what you are currently using? By
the look of it you get a crash at least once a day, often more frequently.
So if this produces a day with no crashes, that would be promising.
The important aspect of the patch is that it moves the "atomic_inc" of
"sh->count" back under the protection of ->device_lock in the case when some
other thread might be using the same 'sh'.
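
To illustrate why the placement of the increment matters, here is a minimal
user-space sketch in C11. It is not kernel code: "struct stripe",
"stripe_get", "stripe_put" and the atomic_flag spinlock are hypothetical
stand-ins for 'sh', the lookup path, the release path and ->device_lock, and
the release side is a simplified assumption. The point is only that the
"is it idle?" check, the list removal and the count increment all happen
inside one hold of the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct stripe_head. */
struct stripe {
	atomic_int count;	/* active references, like sh->count */
	bool on_list;		/* true while parked on some list */
};

/* Hypothetical stand-in for conf->device_lock: a trivial spinlock. */
static atomic_flag device_lock = ATOMIC_FLAG_INIT;

static void lock_device(void)
{
	while (atomic_flag_test_and_set_explicit(&device_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void unlock_device(void)
{
	atomic_flag_clear_explicit(&device_lock, memory_order_release);
}

/*
 * Take a reference. The idle check, the list removal and the
 * increment are done under a single lock hold, so no other lock
 * holder can see count == 0 after we have claimed the stripe.
 */
static int stripe_get(struct stripe *sh)
{
	int refs;

	lock_device();
	if (atomic_load(&sh->count) == 0 && sh->on_list)
		sh->on_list = false;	/* idle stripe: take it off its list */
	refs = atomic_fetch_add(&sh->count, 1) + 1;
	unlock_device();
	return refs;
}

/* Drop a reference; the last holder parks the stripe on a list again. */
static int stripe_put(struct stripe *sh)
{
	int refs;

	lock_device();
	refs = atomic_fetch_sub(&sh->count, 1) - 1;
	assert(refs >= 0);	/* a negative count means an unbalanced put */
	if (refs == 0)
		sh->on_list = true;
	unlock_device();
	return refs;
}
```

If the increment in stripe_get() were moved to after unlock_device(), a
second thread taking the lock in between could still observe a count of
zero for a stripe that has already been claimed, which is the kind of
window the patch is meant to close.
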
Thanks,
NeilBrown
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3088d3af5a89..03f82ab87d9e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -675,8 +675,10 @@ get_active_stripe(struct r5conf *conf, sector_t sector,
 					 || !conf->inactive_blocked),
 					*(conf->hash_locks + hash));
 				conf->inactive_blocked = 0;
-			} else
+			} else {
 				init_stripe(sh, sector, previous);
+				atomic_inc(&sh->count);
+			}
 		} else {
 			spin_lock(&conf->device_lock);
 			if (atomic_read(&sh->count)) {
@@ -695,13 +697,11 @@ get_active_stripe(struct r5conf *conf, sector_t sector,
 					sh->group = NULL;
 				}
 			}
+			atomic_inc(&sh->count);
 			spin_unlock(&conf->device_lock);
 		}
 	} while (sh == NULL);
 
-	if (sh)
-		atomic_inc(&sh->count);
-
 	spin_unlock_irq(conf->hash_locks + hash);
 	return sh;
 }