From: Gui Hecheng <guihe...@cmss.chinamobile.com>
We encountered an ENOENT error while handling objects under the .stale
directory, and the log looks like:
ERROR [gway 121713] err_to_sderr(61) /shd/obj10/.stale corrupted
ERROR [gway 121743] err_to_sderr(61) /shd/obj10/.stale corrupted
ERROR [gway 121710] err_to_sderr(61) /shd/obj10/.stale corrupted
...
The ENOENT error occurs because the disk was unmounted by systemd due to
certain errors found on the disk.
But sheep should not report the .stale directory itself as corrupted.
What sheep actually intends to do is check the existence of the mountpoint
directory:
	if (stat(dir, &s) < 0) {
		sd_err("%s corrupted", dir);
		return md_handle_eio(dir);
	}
But here "dir" is a .stale directory, so md_handle_eio() checks whether the
disk exists by comparing the string "/shd/obj10/.stale" against all the disk
names like "/shd/obj10", "/shd/obj9", etc.
None of them match, so it fails to find the bad disk and gives up.
To fix it, we trim the ".stale" component with an extra dirname() call, so
that md_handle_eio() is given a searchable disk name.
Signed-off-by: Gui Hecheng <guihe...@cmss.chinamobile.com>
---
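Note (not part of the commit): the path trimming described in the commit
message can be sketched as a standalone fragment. This is only an
illustration under stated assumptions, not the sheep implementation:
is_stale_path_sketch(), the disks[] table, find_disk_sketch() and the object
file name used in the usage note are hypothetical stand-ins, and the real
is_stale_path() and md_handle_eio() may behave differently.

```c
#include <libgen.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for is_stale_path(): the real helper may differ. */
static int is_stale_path_sketch(const char *path)
{
	return strstr(path, "/.stale/") != NULL;
}

/*
 * Same shape as get_obj_dir() in this patch: strip the file name, and
 * strip one more component when the object lives under ".stale".
 * dirname() may modify its argument, so callers pass a writable copy.
 */
static char *get_obj_dir_sketch(char *path)
{
	if (is_stale_path_sketch(path))
		path = dirname(path);
	return dirname(path);
}

/* Hypothetical disk table; assumed to hold plain disk paths only. */
static const char *disks[] = { "/shd/obj9", "/shd/obj10" };

/* Exact string match, assumed to mirror how the disk lookup compares names. */
static int find_disk_sketch(const char *dir)
{
	size_t i;

	for (i = 0; i < sizeof(disks) / sizeof(disks[0]); i++)
		if (strcmp(disks[i], dir) == 0)
			return (int)i;
	return -1;
}
```

With these helpers, looking up "/shd/obj10/.stale" finds no disk, while
passing a writable copy of a path such as "/shd/obj10/.stale/00000000000000ab.1"
through get_obj_dir_sketch() yields "/shd/obj10", which the lookup does find.
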
sheep/store/common.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/sheep/store/common.c b/sheep/store/common.c
index 87b3fc0..ecd3c84 100644
--- a/sheep/store/common.c
+++ b/sheep/store/common.c
@@ -45,6 +45,13 @@ int prepare_iocb(uint64_t oid, const struct siocb *iocb, bool create)
 	return flags;
 }
 
+static char *get_obj_dir(char *path)
+{
+	if (is_stale_path(path))
+		path = dirname(path);
+	return dirname(path);
+}
+
 int err_to_sderr(const char *path, uint64_t oid, int err)
 {
 	struct stat s;
@@ -52,7 +59,7 @@ int err_to_sderr(const char *path, uint64_t oid, int err)
 	/* Use a temporary buffer since dirname() may modify its argument. */
 	pstrcpy(p, sizeof(p), path);
-	dir = dirname(p);
+	dir = get_obj_dir(p);
 	sd_debug("%s", path);
 	switch (err) {
--
1.8.3.1