However, if I reboot the secondary, the primary starts freezing up for long periods:
Mar 9 22:46:27 cs04 hastd: [iscsi1] (primary) Unable to r: Socket is not connected.
Mar 9 22:46:27 cs04 hastd: [iscsi1] (primary) Unable to co: Connection refused.
Mar 9 22:46:42 cs04 last message repeated 3 times
Mar 9 22:46:53 cs04 istgt[14298]: ABORT_TASK
Mar 9 22:47:35 cs04 last message repeated 3 times
Mar 9 22:48:02 cs04 hastd: [iscsi1] (primary) Unable to co: Operation timed out.
Mar 9 22:48:02 cs04 istgt[14298]: CmdSN(45748), OP=0x2a, ElapsedTime=74 cleared
Mar 9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c: 640:istgt_iscsi_write_pdu: ***ERROR*** iscsi_write() failed (errno=32)
Mar 9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:3327:istgt_iscsi_op_task: ***ERROR*** iscsi_write_pdu() failed
Mar 9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:3867:istgt_iscsi_execute: ***ERROR*** iscsi_op_task() failed
Mar 9 22:48:02 cs04 istgt[14298]: istgt_iscsi.c:4337:worker: ***ERROR*** iscsi_execute() failed
Mar 9 22:48:02 cs04 istgt[14298]: CmdSN(490802), OP=0x2a, ElapsedTime=73 cleared
Mar 9 22:48:02 cs04 istgt[14298]: CmdSN(28387), OP=0x2a, ElapsedTime=73 cleared
Mar 9 22:48:14 cs04 istgt[14298]: ABORT_TASK
Mar 9 22:48:52 cs04 last message repeated 2 times
Mar 9 22:49:22 cs04 hastd: [iscsi1] (primary) Unable to co: Operation timed out.
As soon as the secondary comes back online, everything starts behaving again and all is well.
Is this expected behavior at this point, or should hastd not block like this?
-- Kevin
Of course it shouldn't block like this. There is a separate thread
responsible for reconnecting, and it shouldn't interact with the I/O threads.
I'll try to reproduce it and will let you know.
--
Pawel Jakub Dawidek http://www.wheelsystems.com
p...@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
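To illustrate the design described in the reply above (a dedicated reconnect thread that stays out of the I/O path), here is a minimal sketch. It is not hastd's actual code: hastd is written in C, and the names Remote, Primary, retry_interval and dirty are hypothetical, chosen only for this example. The assumption is that a write made while the secondary is unreachable completes locally and is recorded for later resynchronization, rather than waiting on a connect attempt.

```python
import threading
import time


class Remote:
    """Hypothetical stand-in for the link to the secondary node."""

    def __init__(self):
        self.up = False                     # whether the secondary is reachable

    def connect(self):
        # A real connect() could block until a timeout; this one fails fast.
        if not self.up:
            raise ConnectionRefusedError("secondary is down")
        return self


class Primary:
    """Hypothetical primary node: local writes plus a background reconnect thread."""

    def __init__(self, remote, retry_interval=0.05):
        self.remote = remote
        self.retry_interval = retry_interval
        self.conn = None                    # None means "disconnected"
        self.lock = threading.Lock()        # guards conn and dirty
        self.dirty = []                     # offsets written while disconnected
        self.local = {}                     # offset -> data, stands in for the local disk
        self.stop = threading.Event()
        self.guard = threading.Thread(target=self._reconnect_loop, daemon=True)
        self.guard.start()

    def _reconnect_loop(self):
        # Only this thread ever calls connect(), so a slow or failing
        # connect never delays write().
        while not self.stop.is_set():
            with self.lock:
                connected = self.conn is not None
            if not connected:
                try:
                    conn = self.remote.connect()   # called without holding the lock
                except OSError:
                    conn = None
                if conn is not None:
                    with self.lock:
                        self.conn = conn
            self.stop.wait(self.retry_interval)

    def write(self, offset, data):
        # I/O path: always complete the local write; never try to connect here.
        self.local[offset] = data
        with self.lock:
            if self.conn is None:
                self.dirty.append(offset)   # remember what to resync later
                return "local-only"
        # A real implementation would also send the data to the secondary here.
        return "replicated"

    def close(self):
        self.stop.set()
        self.guard.join()
```

With the secondary down, write() returns "local-only" right away instead of waiting on the connection. Once the background thread reconnects, later writes return "replicated". The lock is held only to read or update the connection state, never across connect(), which is the property that keeps the I/O path from stalling.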
Could you try the following patch?
http://people.freebsd.org/~pjd/patches/hastd_primary.c.patch
Sorry for the long delay.
This does seem to fix that problem, yes. :)
-- Kevin