Rejection I/O to dead device

aspasia

May 29, 2008, 5:06:20 PM
to open-iscsi
Hello all,

One of my servers/users reported that suddenly their iscsiRoot was
inaccessible, in the console I saw the following:

ext3 - find entry reading directory #132624 offset - rejecting I/O to
dead device

When I rebooted the host - the iscsiRoot seemed to be happy
again ... :)

I checked its /var/log/messages and did not notice any error.

However, I also noticed that the host up'ed the 2nd NIC - ... I
configured only iscsiRoot with the assumption of only eth0 being
up .... could having eth0 and eth1 possibly cause a confusion over the
iscsiRoot operations?

I have disabled eth1 now, and will observe this user/server and see if
they bring up the same issue in the future ...

Any thoughts or feedback will be greatly appreciated..

- aspasia.

Mike Christie

May 30, 2008, 3:00:17 PM
to open-...@googlegroups.com
aspasia wrote:
> Hello all,
>
> One of my servers/users reported that suddenly their iscsiRoot was
> inaccessible, in the console I saw the following:
>
> ext3 - find entry reading directory #132624 offset - rejecting I/O to
> dead device
>
> When I rebooted the host - the iscsiRoot seemed to be happy
> again ... :)
>
> I checked its /var/log/messages and did not notice any error.
>
> However, I also noticed that the host up'ed the 2nd NIC - ... I
> configured only iscsiRoot with the assumption of only eth0 being
> up .... could having eth0 and eth1 possibly cause a confusion over the
> iscsiRoot operations?
>

It could have if the routing changed, but the initiator would have tried
to reconnect and we would have started using eth1. However, if this
happened more than 5 times to the same IO then we could have seen IO
errors. Either way though you should have seen some iscsi and scsi
errors in the logs.

a s p a s i a

May 30, 2008, 5:00:33 PM
to open-...@googlegroups.com
>
> It could have if the routing changed, but the initiator would have tried
> to reconnect and we would have started using eth1. However, if this
> happened more than 5 times to the same IO then we could have seen IO
> errors. Either way though you should have seen some iscsi and scsi
> errors in the logs.

I noticed these occurred:

May 29 16:47:31 r05s23 iscsid: Kernel reported iSCSI connection 1:0
error (1011) state (3)
May 29 16:47:34 r05s23 iscsid: received iferror -38
May 29 16:47:34 r05s23 last message repeated 3 times
May 29 16:47:34 r05s23 iscsid: connection1:0 is operational after
recovery (1 attempts)

... what is the -38 error code?

thanks in advance.

- a.

--
A S P A S I A
. . . . . . . . . . ..

Mike Christie

Jun 1, 2008, 4:05:10 PM
to open-...@googlegroups.com
a s p a s i a wrote:
>> It could have if the routing changed, but the initiator would have tried
>> to reconnect and we would have started using eth1. However, if this
>> happened more than 5 times to the same IO then we could have seen IO
>> errors. Either way though you should have seen some iscsi and scsi
>> errors in the logs.
>
> I noticed these occurred:
>
> May 29 16:47:31 r05s23 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> May 29 16:47:34 r05s23 iscsid: received iferror -38
> May 29 16:47:34 r05s23 last message repeated 3 times
> May 29 16:47:34 r05s23 iscsid: connection1:0 is operational after
> recovery (1 attempts)
>
> ... what is the -38 error code?
>

It just means that userspace tried to set a feature the kernel did not
support. It is not serious.
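
For anyone wanting to decode such a value themselves, here is a minimal sketch. It assumes the number in "received iferror -38" is a negated Linux errno, which is an assumption based on the explanation above and not something the log line states. The helper name describe_errno is made up for illustration.

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, message) for an errno value.

    Kernel-style return codes are usually negated, so the sign is dropped.
    """
    number = abs(code)
    # errno.errorcode maps numeric values to names on the running platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# -38 is the value from the "received iferror -38" log line.
print(describe_errno(-38))
```

On Linux, errno 38 is ENOSYS ("Function not implemented"), which fits the description of userspace asking for a feature the kernel does not support. The numeric values are platform dependent, so run this on the initiator host itself.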

If there was nothing before that first line about the kernel reporting a
connection error, then the target could have initiated this. Do you see
anything in the target logs? What target is this again?

a s p a s i a

Jun 4, 2008, 6:54:12 PM
to open-...@googlegroups.com
Wanted to update on this issue again ...

> It just means that userspace tried to set a feature the kernel did not
> support. It is not serious.
>
> If there was nothing before that first line about the kernel reporting a
> connection error, then the target could have initiated this. Do you see
> anything in the target logs? What target is this again?
>

Target is: Centos51 box also. We have noticed this happening on this
one client (also centos51) every 2-3 days ... This client seems to do
a lot of FS-related testing, not against the iscsiRoot but as I/O to
another mounted filesystem. There are no messages in the target logs
pertaining to this particular client and its iscsi target.

Today the same symptom occurred again, and I was able to scribble down
the messages from the console ... (I now have a terminal server console
hooked up and am capturing its output to a file, so I will have more
complete logging info):


psid 0 of 1
psid 0 of 1
oversize name in 0000
psid 0 of 1
psid 0 of 1
|
|
readdir corruption in 000
psid 0 of 1
readdir corruption in 000
|
|
iscsi: cmd 0x2a is not queued (6)
end_request: I/O error dev sde; sector 50640
|
|
Buffer I/O error on device sde1, logical block 6323 last page write
due to I/O error on sde 1
iscsi: cmd 0x2a is not queued (6)
Aborting journal on device sde1
iscsi: cmd 0x2a is not queued (6)
sd 6:0:0:0 rejection I/O to device being mounted (this repeats a few times)
|
|
EXT3-fs error (device sde1): ext3_find_entry: reading directory 96001 offset 0

......
|

-------

I upgraded this server with a new iscsiRoot image that incorporates
Mike's suggestion to increase the timeout specific to iscsiRoot-type
connections ... is it possible that the inaccessibility of the root
device is caused by this?

thanks in advance,

- aspasia.

Mike Christie

Jun 4, 2008, 9:06:02 PM
to open-...@googlegroups.com
a s p a s i a wrote:
> Wanted to update on this issue again ...
>
>> It just means that userspace tried to set a feature the kernel did not
>> support. It is not serious.
>>
>> If there was nothing before that first line about the kernel reporting a
>> connection error, then the target could have initiated this. Do you see
>> anything in the target logs? What target is this again?
>>
>
> Target is: Centos51 box also.

What target is running on Centos? Is it IET or stgt or something else?

> Buffer I/O error on device sde1, logical block 6323 last page write
> due to I/O error on sde 1
> iscsi: cmd 0x2a is not queued (6)
> Aborting journal on device sde1
> iscsi: cmd 0x2a is not queued (6)

This means we got an error from the target and tried to relogin, but we
ended up killing the session, either because the target told us it was
going away permanently or because we bugged out and could not handle
the problem.
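
As a reading aid for those console lines, here is a small sketch that pulls the opcode and the number in parentheses out of an "is not queued" message. The opcode table is a tiny subset of the standard SCSI operation codes. The number in parentheses is returned as-is, because its meaning is internal to the initiator and may differ between kernel versions; this sketch does not try to interpret it. The function name decode_not_queued is hypothetical.

```python
import re

# Small subset of standard SCSI operation codes.
SCSI_OPCODES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x25: "READ CAPACITY(10)",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
}

# Matches lines such as: iscsi: cmd 0x2a is not queued (6)
NOT_QUEUED = re.compile(r"iscsi: cmd (0x[0-9a-fA-F]+) is not queued \((\d+)\)")


def decode_not_queued(line):
    """Return (opcode name, raw reason number), or None if the line does not match."""
    match = NOT_QUEUED.search(line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    reason = int(match.group(2))  # left uninterpreted on purpose
    return SCSI_OPCODES.get(opcode, "unknown opcode"), reason


print(decode_not_queued("iscsi: cmd 0x2a is not queued (6)"))
```

Opcode 0x2a is WRITE(10), so the commands being refused here are writes, which matches the ext3 journal abort that follows in the console output.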

What are you running on the initiator side? Is that Centos 5.1 too? Are
you using the initiator that comes with it or an open-iscsi.org release?

What we need is some iscsid output from a little before the log output
you sent, which would tell us why it decided to kill the session. It
would be something about a bug, a fatal error, or not being able to log
into the target.
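
One way to collect that output is sketched below. It assumes the iscsid and kernel messages end up in a single text log (for example /var/log/messages, or the captured terminal server console file) and that the first "is not queued" line marks the failure. The function name and the default window size are made up for illustration.

```python
from collections import deque


def iscsi_context_before(lines, marker="is not queued", window=50):
    """Return up to `window` iscsi-related lines seen before the first marker line.

    Returns an empty list if the marker never appears.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` matches
    for line in lines:
        if marker in line:
            return list(recent)
        # Case-insensitive so "iscsid:", "iscsi:" and "iSCSI" all match.
        if "iscsi" in line.lower():
            recent.append(line.rstrip("\n"))
    return []
```

Usage would look something like `with open("/var/log/messages") as f: print("\n".join(iscsi_context_before(f)))`. If the console messages never reach that file, point it at the captured console log instead.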

Mike Christie

Jun 4, 2008, 9:07:16 PM
to open-...@googlegroups.com
Mike Christie wrote:
> a s p a s i a wrote:
>> Wanted to update on this issue again ...
>>
>>> It just means that userspace tried to set a feature the kernel did not
>>> support. It is not serious.
>>>
>>> If there was nothing before that first line about the kernel reporting a
>>> connection error, then the target could have initiated this. Do you see
>>> anything in the target logs? What target is this again?
>>>
>>
>> Target is: Centos51 box also.
>
> What target is running on Centos? Is it IET or stgt or something else?
>

Oh yeah, are you doing anything on the target? Rebooting it? Changing
the config? Adding or removing targets or portals?

a s p a s i a

Jun 5, 2008, 12:06:55 AM
to open-...@googlegroups.com
> Oh yeah, are you doing anything on the target? Rebooting it? Changing
> the config? Adding or removing targets or portals?

.... on the target, when the /etc/ietd.conf file is updated with a new
iscsi root, a brief "service iscsi-target restart" occurs .... we're
thinking of just keeping a static list and not touching it any longer
....

- a.

Mike Christie

Jun 5, 2008, 12:14:11 PM
to open-...@googlegroups.com
a s p a s i a wrote:
>> Oh yeah, are you doing anything on the target? Rebooting it? Changing
>> the config? Adding or removing targets or portals?
>
> .... in the target when the /etc/ietd.conf file is updated with a new
> iscsi root, a brief "service iscsi-target restart" occurs .... we're
> thinking of just keeping a static list and not bother it any longer
> ....
>

Ah ok, as you saw in the other thread that can cause a disruption. We
should log back in. If you could try this rpm:
http://people.redhat.com/mchristi/iscsi/RHEL5/5.2/rpms/iscsi/iscsi-initiator-utils-6.2.0.868-0.7.el5.src.rpm

It is what will be in RHEL 5.2/Centos5.2. It fixes a couple of places
where we would try to log in while the target was rebooting/restarting.
The target would return errors that might have indicated that it was
coming back, but we did not interpret them correctly and would kill the
session.
