monitoring iscsi connections


solidguy
Sep 10, 2009, 1:53:50 PM
to open-iscsi
Is there a good way to monitor iSCSI sessions for errors? I need to
report errors via SNMP (and perhaps kill the VM running over the
affected connection). Right now the only way I see is to watch syslog,
but syslog doesn't always tell me which session (or disk) is in
trouble.

Thanks
--
solidguy

Mike Christie
Sep 10, 2009, 3:08:54 PM
to open-...@googlegroups.com

You can monitor which sessions have errors by listening on the iscsi
netlink interface. I am not sure that is very user friendly: you would
have to listen for ISCSI_KEVENT_CONN_ERROR events and filter out the
other traffic. This does not tell you if a disk has errors, though.

For upstream we are working on a more complete solution for
dm-multipath, so it can figure out when sessions/connections have link
problems or when a disk is offlined by the scsi eh. But right now, I do
not think there is any easy way to do what you need.
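As a rough sketch of what such a listener could look like (in Python; the constant values are copied from linux/netlink.h and scsi/iscsi_if.h in the kernel tree, so verify them against your headers, and the event parsing here is simplified):

```python
import socket
import struct

# Constants from the kernel headers (verify against your tree):
NETLINK_ISCSI = 8                # linux/netlink.h
ISCSI_KEVENT_CONN_ERROR = 102    # scsi/iscsi_if.h: KEVENT_BASE (100) + 2

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, seq, pid

def conn_error_events(buf):
    """Yield the type of each ISCSI_KEVENT_CONN_ERROR message in a netlink buffer."""
    off = 0
    while off + NLMSG_HDR.size <= len(buf):
        nlmsg_len = NLMSG_HDR.unpack_from(buf, off)[0]
        if nlmsg_len < NLMSG_HDR.size + 4 or off + nlmsg_len > len(buf):
            break
        # struct iscsi_uevent begins with a u32 'type' right after the nlmsghdr
        (ev_type,) = struct.unpack_from("=I", buf, off + NLMSG_HDR.size)
        if ev_type == ISCSI_KEVENT_CONN_ERROR:
            yield ev_type
        off += (nlmsg_len + 3) & ~3  # NLMSG_ALIGN

def listen_for_conn_errors():
    # Needs root and the iscsi modules loaded; group 1 is the iscsi multicast group.
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ISCSI)
    sock.bind((0, 1))
    while True:
        for _ in conn_error_events(sock.recv(65536)):
            print("iscsi connection error event received")
```

As noted above, this only reports connection-level errors; disk errors are not covered.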

Chiradeep Vittal
Sep 10, 2009, 6:23:42 PM
to open-iscsi
Thanks. I'll take a look at the netlink interface. Not using multipath
for now, but will do so later.

For basic monitoring of storage network problems, here's what I am
thinking:
1. If there is a network failure, eventually cat /sys/block/<disk>/device/state should show "offline"?
2. How long will this take? I know this is a function of replacement_timeout, noop_out_interval, noop_out_timeout and the scsi command timeout, but the relationship is not clear.

Let us say
a=session.timeo.noop_out_interval=5
b=session.timeo.noop_out_timeout=5
c=session.timeo.replacement_timeout=120
d=`cat /sys/block/<disk>/device/timeout`=60

The disk should go offline in a maximum of a+b+c+d=190s after a
network failure?
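In code form, the worst case I am assuming is just the sum (using the example settings above; the values are illustrative):

```python
# Example tunables from above, in seconds (illustrative values only).
noop_out_interval = 5        # session.timeo.noop_out_interval
noop_out_timeout = 5         # session.timeo.noop_out_timeout
replacement_timeout = 120    # session.timeo.replacement_timeout
scsi_cmd_timeout = 60        # cat /sys/block/<disk>/device/timeout

worst_case = (noop_out_interval + noop_out_timeout +
              replacement_timeout + scsi_cmd_timeout)
print(worst_case)  # 190
```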

If the network comes back up, how soon will the disk state go to
'running' ?

Thanks
--
solidguy

Ulrich Windl
Sep 11, 2009, 2:24:30 AM
to open-...@googlegroups.com
Maybe something like "netstat -s"? For those who don't know it, here is an
example of "netstat -s" output on HP-UX (partial output only):

netstat -s
tcp:
2230243561 packets sent
2219104628 data packets (2507019105755 bytes)
733 data packets (583009 bytes) retransmitted
7150563 ack-only packets (3941611 delayed)
3465 URG only packets
271750 window probe packets
52973 window update packets
3992851 control packets
1443289314 packets received
1362450977 acks (for 2507016701224 bytes)
596 duplicate acks
0 acks for unsent data
646031312 packets (160540170224 bytes) received in-sequence
0 completely duplicate packets (0 bytes)
6499 packets with some dup data (1092308 bytes duped)
6499 out of order packets (545964 bytes)
0 packets (0 bytes) of data after window
271750 window probes
782772 window update packets
18729 packets received after close
0 segments discarded for bad checksum
0 bad TCP segments dropped due to state change
1193488 connection requests
671426 connection accepts
1864914 connections established (including accepts)
2601445 connections closed (including 774164 drops)
717930 embryonic connections dropped
1361096509 segments updated rtt (of 1361096509 attempts)
322 retransmit timeouts
3 connections dropped by rexmit timeout
271750 persist timeouts
308419 keepalive timeouts
306875 keepalive probes sent
0 connections dropped by keepalive
0 connect requests dropped due to full queue
718236 connect requests dropped due to no listener
0 suspect connect requests dropped due to aging
0 suspect connect requests dropped due to rate
udp:
0 incomplete headers
0 bad checksums
0 socket overflows
ip:
877280934 total packets received
0 bad IP headers
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets forwarded
0 packets not forwardable
icmp:
21109 calls to generate an ICMP error message
0 ICMP messages dropped
Output histogram:
echo reply: 6213
destination unreachable: 14896
source quench: 0
routing redirect: 0
echo: 0
time exceeded: 0
parameter problem: 0
time stamp: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
0 bad ICMP messages
Input histogram:
echo reply: 31569
destination unreachable: 14582
source quench: 0
routing redirect: 1
echo: 6213
time exceeded: 0
parameter problem: 0
time stamp request: 0
time stamp reply: 0
address mask request: 0
address mask reply: 0
6213 responses sent
[...]



Mike Christie
Sep 11, 2009, 10:53:13 AM
to open-...@googlegroups.com
On 09/10/2009 05:23 PM, Chiradeep Vittal wrote:
> Thanks. I'll take a look at the netlink interface. Not using multipath
> for now, but will do so later.
>
> For basic monitoring of storage network problems, here's what I am
> thinking:
> 1. If there is a network failure, eventually cat /sys/block/<disk>/
> device/state should show "offline" ?
> 2. How long will this take? I know that this is a function of
> replacement_timeout, noop_interval, noop_timeout and scsi timeout, but
> the relationship is not clear
>
> Let us say
> a=session.timeo.noop_out_interval=5
> b=session.timeo.noop_out_timeout=5
> c=session.timeo.replacement_timeout=120
> d=`cat /sys/block/<disk>/device/timeout`=60
>
> The disk should go offline in a maximum of a+b+c+d=190s after a
> network failure?

It is not really that easy, because if the nop times out the iscsi layer
will drop the session and the disk state will not change to offline. The
disk state will only change if the scsi command timer fires and the scsi
eh runs and fails. In this case the disk state will go to offline.
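If you do want to watch for that case, a simple sketch that just reads the sysfs files (assuming the /sys/block/<disk>/device/state layout discussed above) would be:

```python
import glob
import os

def offline_disks(pattern="/sys/block/*/device/state"):
    """Return names of block devices whose SCSI device state reads 'offline'.
    A sketch only; just reads the sysfs state files matching the pattern."""
    bad = []
    for path in glob.glob(pattern):
        with open(path) as f:
            state = f.read().strip()
        if state == "offline":
            # path looks like .../<disk>/device/state
            disk = os.path.basename(os.path.dirname(os.path.dirname(path)))
            bad.append(disk)
    return bad
```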

For the nop timeout case and the scsi eh failing case, the iscsi session
state will go to failed, so you could check that instead. That value is in

/sys/class/iscsi_session/session%SID/state
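For example (a sketch; the state strings are values like LOGGED_IN and FAILED, but verify the exact names against your kernel version):

```python
import glob
import os

def iscsi_session_states(pattern="/sys/class/iscsi_session/session*/state"):
    """Map each iscsi session sysfs directory to its state string (sketch)."""
    states = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            states[os.path.dirname(path)] = f.read().strip()
    return states

# Sessions that are not logged in could then be flagged like:
# failed = [s for s, st in iscsi_session_states().items() if st != "LOGGED_IN"]
```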


>
> If the network comes back up, how soon will the disk state go to
> 'running' ?

When the iscsi session is dropped due to a nop timeout or the scsi eh
failing, the initiator basically polls the network every couple of
seconds by trying to reconnect the tcp connection. So it depends on the
type of failure. If the initiator happens to be attempting a tcp
reconnect when the network comes back up, we could reconnect right
away; if the network layer cannot figure things out, that reconnect
attempt could time out and the next try would work; or, if the network
gave us an error right away on the attempt, the next reconnect attempt
would succeed.


Chiradeep Vittal
Sep 11, 2009, 3:07:23 PM
to open-iscsi
Very enlightening, thanks.
If there is a constant stream of traffic over the iscsi session and
there is a network failure, then the scsi eh timer should fire, right?
And the disk will then go offline (according to
/sys/block/<disk>/device/state)?

I think where this is leading is to use dm-multipath even if there is
only a single path, since dm-multipath will constantly test the link.
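Something like this minimal /etc/multipath.conf is what I would start from (a sketch only; directive names are from multipath.conf(5), and the values are illustrative, so verify them for your multipath-tools version):

```
# Sketch of a minimal /etc/multipath.conf for link testing on a single path.
defaults {
    polling_interval 5       # how often the path checker tests the link
    path_checker directio    # checker that issues a read to test the path
    no_path_retry queue      # queue I/O instead of failing when all paths are down
}
```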

Chiradeep Vittal
Sep 11, 2009, 3:09:18 PM
to open-iscsi
Thanks. I was looking for more granular error reporting
(e.g., "iscsi session X failed at time Y", or "iscsi session X had N
errors between time A and time B").

Mike Christie
Sep 14, 2009, 12:41:25 PM
to open-...@googlegroups.com
Chiradeep Vittal wrote:
> Very enlightening, thanks.
> If there is a constant stream of traffic over the iscsi session and
> there is a network failure,
> then the scsi eh timer should fire right?

If you have nops turned off then it will.

If you have nops on, then it could fire, but the iscsi layer will
prevent the scsi eh timer from causing the scsi eh to run. With nops on
we send a nop every noop_out_interval seconds. If we have some bad
timing and a scsi command timer fires while a nop is being sent, or
right before we want to send a nop to test the network, then starting
with open-iscsi 871 and upstream kernel 2.6.30 we reset the scsi cmd eh
timer and send a nop. If the nop times out during that reset cmd timer
period, then we drop the iscsi session and the scsi eh does not run. If
the nop completes ok, then when the cmd timer fires again the scsi eh
will run.

If there is a network problem while the scsi eh is running, the scsi eh
can fail if we cannot reconnect within replacement_timeout seconds, and
that will lead to the devices going offline as seen in the sysfs state
file mentioned earlier.