Hi,
I'm trying to find the ERROR message that correlates with this INFO message:
INFO [my-cluster:e6a58f50-433c-11ea-a949-d1a066623dc1:e6a5ddb1-433c-11ea-a949-d1a066623dc1] i.c.s.SegmentRunner - Repair for segment e6a5ddb1-433c-11ea-a949-d1a066623dc1 started, status wait will timeout in 3600000 millis
Basically, we've discovered that in some cases the value of hangingRepairTimeoutMins is set too low, which causes segments to fail. But there doesn't appear to be a specific ERROR event when a segment hits this timeout. Is this a bug? Is it logged at a level other than ERROR? Or is there something else we should be looking for? We need a way to detect when we hit this limit, so we can either shrink the repair segment size or raise the timeout.
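For reference, here is a minimal sketch of how the INFO line quoted above could be parsed to pull out the segment ID and the timeout it was started with. The regex is based only on that one line, so the wording may differ in other versions, and parse_started is just a hypothetical helper name. It does not detect the timeout itself, since I don't know what (if anything) gets logged when it is hit.

```python
import re

# Pattern derived solely from the single INFO line quoted in this post;
# it is an assumption that other versions log the same wording.
STARTED_RE = re.compile(
    r"Repair for segment (?P<segment>[0-9a-f-]{36}) started, "
    r"status wait will timeout in (?P<millis>\d+) millis"
)


def parse_started(line):
    """Return (segment_id, timeout_minutes) for a 'started' line, else None."""
    match = STARTED_RE.search(line)
    if match is None:
        return None
    # Convert milliseconds to minutes so the value can be compared
    # directly with hangingRepairTimeoutMins.
    return match.group("segment"), int(match.group("millis")) / 60000


sample = (
    "INFO [my-cluster:e6a58f50-433c-11ea-a949-d1a066623dc1:"
    "e6a5ddb1-433c-11ea-a949-d1a066623dc1] i.c.s.SegmentRunner - "
    "Repair for segment e6a5ddb1-433c-11ea-a949-d1a066623dc1 started, "
    "status wait will timeout in 3600000 millis"
)

print(parse_started(sample))
```

With the start time and timeout per segment in hand, the missing piece is still whatever log line marks the segment as timed out or failed.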
-= Jay =-