I know this has been mentioned before, going back a few years at least, but I wonder whether other people are still seeing it.
I think the problem comes down to flaky networking, where a slave can detach or go offline briefly in the middle of a poll (why that happens is a different matter entirely). It looks like a polling request is sent to a slave, the slave disconnects, and the poll keeps waiting for a response that never comes back. That blocks all subsequent polls until Jenkins is restarted.
I guess the fix or workaround would be to have a timeout on the polling so it can recover. Can anyone who has had this issue confirm that the same thing happened? If so, I'll file a feature request / bug report for it, although obviously it's a difficult thing to reproduce.
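
To make the timeout idea concrete, here is a rough sketch in Python of what I have in mind. This is not Jenkins code and does not use any Jenkins API; `poll_with_timeout`, `PollTimeout` and `poll_fn` are hypothetical names I made up for illustration. The point is only that the caller stops waiting after a deadline instead of blocking forever.

```python
import threading


class PollTimeout(Exception):
    """Raised when a poll does not answer within the allowed time."""


def poll_with_timeout(poll_fn, timeout_s):
    """Run poll_fn in a daemon thread and give up after timeout_s seconds.

    poll_fn is a hypothetical stand-in for "ask the slave whether the
    repository changed"; it is an assumption, not a real Jenkins call.
    """
    result = {}

    def worker():
        try:
            result["value"] = poll_fn()
        except Exception as exc:
            # Hand the failure back to the caller instead of losing it.
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        # The poll is still blocked (e.g. the slave went away).
        # Abandon it so that later polls are not held up.
        raise PollTimeout(f"poll did not return within {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

With something like this, a poll against a slave that vanished mid-request would raise `PollTimeout` and the next scheduled poll could still run. Note that the stuck thread is only abandoned, not killed, so a real fix would also need to clean up the hung request.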