Information about iSCSI pings that almost timed out

Erez Zilber

Nov 18, 2009, 11:29:56 AM

to open-...@googlegroups.com

open-iscsi sends nop-outs to the target. If the target responds quickly
enough, we don't get a timeout. I'd like to know (for internal debug
purposes) how many times the ping timer almost expired. This sounds
like a useful feature also for other open-iscsi developers/users.

I was thinking about adding the following mechanism:

1. Add an array of some length to store long nop-outs. Protect it with
some lock.
2. If a nop-in (as a response to a nop-out) is received after at least 0.7 *
ping_to (or 0.8 or whatever), add some info about it to the array
(when the nop-out was sent, how much time until we got a nop-in, etc.).
3. The array should be used in a cyclic way - when it gets full,
overwrite the 1st entry.
4. We can dump the info from the array from time to time or the user
may use iscsiadm to do that. When this is done, we can delete the
contents of the array.
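Steps 1 to 4 can be sketched in C as a small ring buffer. This is a minimal sketch under stated assumptions, not open-iscsi code: the names (`struct slow_ping_log`, `slow_ping_record`, `slow_ping_drain`), the millisecond units and the 16-entry size are all hypothetical. It is written as plain userspace C so it compiles on its own; in the kernel, both functions would have to run under the lock from step 1, which is omitted here.

```c
#include <stddef.h>

#define SLOW_PING_LOG_LEN 16	/* hypothetical array size */

/* One ping that almost timed out. */
struct slow_ping {
	unsigned long sent_ms;	/* when the nop-out was sent */
	unsigned long delay_ms;	/* time until the matching nop-in arrived */
};

/* Cyclic array from step 3; the lock from step 1 is omitted in this sketch. */
struct slow_ping_log {
	struct slow_ping entries[SLOW_PING_LOG_LEN];
	size_t next;	/* slot written next; wraps to 0 when the array is full */
	size_t count;	/* number of valid entries, at most SLOW_PING_LOG_LEN */
};

/*
 * Step 2: record the ping only if delay_ms >= threshold_pct% of ping_to_ms.
 * Returns 1 if the ping was recorded, 0 otherwise. Integer math only.
 */
int slow_ping_record(struct slow_ping_log *plog, unsigned long sent_ms,
		     unsigned long delay_ms, unsigned long ping_to_ms,
		     unsigned int threshold_pct)
{
	if (delay_ms * 100 < ping_to_ms * threshold_pct)
		return 0;
	plog->entries[plog->next].sent_ms = sent_ms;
	plog->entries[plog->next].delay_ms = delay_ms;
	/* Step 3: once full, the oldest entry (the 1st on the first wrap) is overwritten. */
	plog->next = (plog->next + 1) % SLOW_PING_LOG_LEN;
	if (plog->count < SLOW_PING_LOG_LEN)
		plog->count++;
	return 1;
}

/*
 * Step 4: copy the entries out oldest-first (out must hold
 * SLOW_PING_LOG_LEN entries), then clear the array. Returns the count.
 */
size_t slow_ping_drain(struct slow_ping_log *plog, struct slow_ping *out)
{
	size_t n = plog->count;
	size_t start = (plog->next + SLOW_PING_LOG_LEN - n) % SLOW_PING_LOG_LEN;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = plog->entries[(start + i) % SLOW_PING_LOG_LEN];
	plog->next = 0;
	plog->count = 0;
	return n;
}
```

Passing `threshold_pct = 70` corresponds to the 0.7 * ping_to example (80 for the 0.8 variant); expressing it as a percentage keeps the comparison in integer arithmetic.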

Comments? Objections?

Erez

Spears, Steve

Nov 18, 2009, 11:30:31 AM

to open-...@googlegroups.com

Please take me off of your list.

--

You received this message because you are subscribed to the Google
Groups "open-iscsi" group.
To post to this group, send email to open-...@googlegroups.com.
To unsubscribe from this group, send email to
open-iscsi+...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/open-iscsi?hl=.


Mike Christie

Nov 18, 2009, 7:49:38 PM

to open-...@googlegroups.com

What info did you want to store?

What about some perf counters for it? Add a counter for if a ping took
a-b secs, b-c secs, and c-d secs. Userspace could then just get that
info with the other stats. Also maybe a new iscsi_connection sysfs entry
that exports the avg time to complete a nop might be nice. It might help
people configure the value.

You probably also want to add a way to reset the counters.
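The counter suggestion might look roughly like this minimal sketch. The bucket edges (1, 2 and 5 seconds), the names and the millisecond units are illustrative assumptions, not existing open-iscsi stats; the running sum and count give the average that a sysfs entry could export, and the reset function covers the last point.

```c
#include <stddef.h>

/* Hypothetical bucket upper bounds in ms; the last bucket is open-ended. */
static const unsigned long nop_bucket_edges_ms[] = { 1000, 2000, 5000 };
#define NOP_NR_BUCKETS \
	(sizeof(nop_bucket_edges_ms) / sizeof(nop_bucket_edges_ms[0]) + 1)

struct nop_stats {
	unsigned long bucket[NOP_NR_BUCKETS];	/* pings per latency range */
	unsigned long total_ms;			/* sum of all delays, for the average */
	unsigned long count;			/* number of completed nops */
};

/* Count one completed nop in the first bucket whose upper bound exceeds it. */
void nop_stats_add(struct nop_stats *s, unsigned long delay_ms)
{
	size_t i = 0;

	while (i < NOP_NR_BUCKETS - 1 && delay_ms >= nop_bucket_edges_ms[i])
		i++;
	s->bucket[i]++;
	s->total_ms += delay_ms;
	s->count++;
}

/* Average time to complete a nop, truncated to whole ms (0 if none yet). */
unsigned long nop_stats_avg_ms(const struct nop_stats *s)
{
	return s->count ? s->total_ms / s->count : 0;
}

/* Zero all counters. */
void nop_stats_reset(struct nop_stats *s)
{
	struct nop_stats zero = { {0}, 0, 0 };

	*s = zero;
}
```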

Erez Zilber

Nov 19, 2009, 2:36:10 AM

to open-...@googlegroups.com

On Thu, Nov 19, 2009 at 2:49 AM, Mike Christie <mich...@cs.wisc.edu> wrote:
> What info did you want to store?

The following info:
1. The exact time when the nop-out was sent.
2. How much time until we got the nop-in.

>
>
> What about some perf counters for it? Add a counter for if a ping took
> a-b secs, b-c secs, and c-d secs.

This is problematic because:
1. You can set your ping timeout to X. According to your suggestion,
we will need to have the following counters:
a. < 0.2*X
b. 0.2*X - 0.4*X
c. 0.4*X - 0.6*X
d. 0.6*X - 0.8*X
e. 0.8*X - X
This means that the names of the counters will change according to
the value of X. You may have different values of X for different
connections, which makes this more problematic.
2. If you have the array that I suggest instead of these counters and
you see that at 17:45:07 you had a long ping (that did not time out),
you can check what happened at that time in your target. This may be
very helpful.
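To make the five ranges a to e concrete, a delay can be mapped to a range with integer arithmetic against each connection's own X. This is a minimal sketch; `struct ping_ranges`, `ping_range`, `ping_ranges_add` and the millisecond units are hypothetical. The slot index for, say, range d is the same on every connection while its boundaries in seconds differ, which is the naming issue raised in point 1.

```c
#define NR_PING_RANGES 6	/* ranges a..e from the list, plus "timed out" */

/* Hypothetical per-connection counters keyed by fraction of that connection's X. */
struct ping_ranges {
	unsigned long ping_to_ms;		/* this connection's ping timeout, X */
	unsigned long count[NR_PING_RANGES];	/* [0]=a ... [4]=e, [5]=timed out */
};

/* Return 0..4 for ranges a..e (steps of 0.2*X), or 5 if delay >= X. */
unsigned int ping_range(unsigned long delay_ms, unsigned long ping_to_ms)
{
	if (delay_ms >= ping_to_ms)
		return 5;	/* also avoids dividing by a zero timeout */
	/* delay < X here, so delay*5/X is 0..4; integer math, no FPU */
	return (unsigned int)((delay_ms * 5) / ping_to_ms);
}

void ping_ranges_add(struct ping_ranges *r, unsigned long delay_ms)
{
	r->count[ping_range(delay_ms, r->ping_to_ms)]++;
}
```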

Erez

Ulrich Windl

Nov 19, 2009, 2:38:51 AM

to open-...@googlegroups.com

Hi!

Wouldn't it be more obvious to calculate the average delay of a ping request?
(Possibly an exponential average, as for the system load.) (Min and max would be good
as well, but standard deviation probably requires use of the FPU, so that's not
possible in kernel modules (AFAIK).)
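An exponential average can be kept without the FPU by storing it in fixed point, in the same spirit as the kernel's load average (which also uses fixed-point arithmetic). A minimal sketch, assuming a hypothetical weight of 1/8 per new sample (a shift of 3) and millisecond delays; `struct ping_lat` and its functions are made-up names. Min and max are tracked alongside.

```c
#define PING_EWMA_SHIFT 3	/* hypothetical weight: each new sample counts 1/8 */

struct ping_lat {
	unsigned long avg_fp;	/* average delay scaled by 2^PING_EWMA_SHIFT */
	unsigned long min_ms;
	unsigned long max_ms;
	unsigned long samples;
};

void ping_lat_add(struct ping_lat *p, unsigned long delay_ms)
{
	if (p->samples == 0) {
		/* Seed the average, min and max with the first sample. */
		p->avg_fp = delay_ms << PING_EWMA_SHIFT;
		p->min_ms = delay_ms;
		p->max_ms = delay_ms;
	} else {
		/* avg += (sample - avg) / 8, done in scaled integers */
		p->avg_fp = p->avg_fp - (p->avg_fp >> PING_EWMA_SHIFT) + delay_ms;
		if (delay_ms < p->min_ms)
			p->min_ms = delay_ms;
		if (delay_ms > p->max_ms)
			p->max_ms = delay_ms;
	}
	p->samples++;
}

/* Current exponential average, truncated to whole ms. */
unsigned long ping_lat_avg_ms(const struct ping_lat *p)
{
	return p->avg_fp >> PING_EWMA_SHIFT;
}
```

For example, samples of 800 ms and then 1600 ms give an average of 800 + (1600 - 800) / 8 = 900 ms.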

Regards,
Ulrich

Erez Zilber

Nov 19, 2009, 4:07:38 AM

to open-...@googlegroups.com

On Thu, Nov 19, 2009 at 9:38 AM, Ulrich Windl
<ulrich...@rz.uni-regensburg.de> wrote:
> Hi!
>
> Wouldn't it be more obvious to calculate the average delay to a ping request?
> (Possibly exponential average as for the system loads) (min and Max would be good
> as well, but standard deviation probably requires use of the FPU, so that's not
> possible in kernel modules (AFAIK)).

It's in userspace, so (almost) everything is possible. It's nice to
have counters, average delay etc, but I want to be able to know
exactly when bad things almost happened (i.e. timeout almost expired).
Counters/average delay will not help me.

Erez

Ulrich Windl

Nov 19, 2009, 5:10:09 AM

to open-...@googlegroups.com

I thought you wanted to tune the timeouts. If they are properly tuned, the kernel will log
when your measurements are unusual (i.e. the timeout is exceeded).

So what you really want to have (it seems) is not just a histogram of timeouts,
but a history over time as well.

Ulrich

Mike Christie

Nov 19, 2009, 3:02:45 PM

to open-...@googlegroups.com

Ulrich Windl wrote:
> I thought you want to tune the timeouts. So if properly tuned, the kernel will log
> when your measurements are unusual (i.e. timeout exceeded).
>

I think that is what I wanted. I think Erez wants something a little
different, right Erez?

> So what you really want to have (it seems) is not just a histogram of timeouts,
> but a history over time as well.
>
> Ulrich
>

Erez Zilber

Nov 22, 2009, 3:51:19 AM

to open-...@googlegroups.com

On Thu, Nov 19, 2009 at 10:02 PM, Mike Christie <mich...@cs.wisc.edu> wrote:
> I think that is what I wanted. I think Erez wants something a little
> different, right Erez?

I think that it would be nice if we had both:
1. The average delay of a ping request.
2. A list of ping requests that almost timed out, with some helpful
info (when the ping was sent and how much time passed until we got a
response). With this information, you can understand and debug the
whole system: you can check your target and see what caused it to be
so slow at that specific time, you can see if your network was very
busy during that time, etc.

Erez

Mike Christie

Nov 22, 2009, 3:44:24 PM

to open-...@googlegroups.com

I think this sounds good to me.

Erez Zilber

Nov 22, 2009, 4:24:37 PM

to open-...@googlegroups.com

Great. I will try to send a patch soon.

Erez

Erez Zilber

Dec 1, 2009, 8:04:30 AM

to open-...@googlegroups.com

Regarding the average delay of a ping request task: we need to have
the average delay, but we're interested only in the average delay of
pings that were sent recently (i.e. not pings that were sent a year
ago). Am I right?

I thought about having a cyclic array of delays in the kernel. It can
hold the delays of the last X pings (e.g. X = 1000). Whenever the user
runs 'iscsiadm -m session -s', this array will be sent to userspace
and we can calculate the average delay/standard deviation/whatever you
want in userland.
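Under the assumptions of this proposal (the kernel keeps only the last X delays and userspace does the math), the two halves might look like this minimal sketch. The structure and function names are hypothetical, X is fixed at the 1000 from the example, and the copy to userspace on 'iscsiadm -m session -s' is not shown; in the kernel, additions and that copy would presumably need to be serialized. The sketch stops at the variance so it needs no libm; the standard deviation is its square root.

```c
#include <stddef.h>

#define PING_HIST_LEN 1000	/* "the last X pings", with X = 1000 */

/* Kernel half (sketch): fixed-size ring of the most recent ping delays. */
struct ping_hist {
	unsigned long delay_ms[PING_HIST_LEN];
	size_t next;	/* slot written next, wraps around */
	size_t count;	/* valid entries; they always occupy delay_ms[0..count-1] */
};

void ping_hist_add(struct ping_hist *h, unsigned long delay_ms)
{
	h->delay_ms[h->next] = delay_ms;
	h->next = (h->next + 1) % PING_HIST_LEN;
	if (h->count < PING_HIST_LEN)
		h->count++;
}

/* Userspace half (sketch): statistics over a copy of the valid entries. */
struct ping_summary {
	double mean_ms;
	double var_ms2;	/* population variance; stddev = sqrt(var_ms2) */
};

struct ping_summary ping_summarize(const unsigned long *d, size_t n)
{
	struct ping_summary s = { 0.0, 0.0 };
	size_t i;

	if (n == 0)
		return s;
	for (i = 0; i < n; i++)
		s.mean_ms += (double)d[i];
	s.mean_ms /= (double)n;
	for (i = 0; i < n; i++) {
		double diff = (double)d[i] - s.mean_ms;

		s.var_ms2 += diff * diff;
	}
	s.var_ms2 /= (double)n;
	return s;
}
```

Because the ring fills from slot 0 and only wraps once it is full, `ping_summarize(h->delay_ms, h->count)` always covers exactly the recorded delays; their order does not matter for the mean or variance.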

Comments?

Erez

Erez Zilber

Dec 8, 2009, 11:21:31 AM

to open-...@googlegroups.com

Does anyone have comments on this? I'd like to start working on it and need
some feedback.

Thanks,
Erez
