problem with kernel 2.6.28 but not with 2.6.27

Joe Landman

Mar 11, 2009, 1:47:02 AM
to open-...@googlegroups.com
Hi folks:

We are having a problem with kernel 2.6.28 and the code from the git
repository. We need the git code because, from what I could see, the
latest semi-stable release (870.3) doesn't support 2.6.28. With the same
hardware and the same targets, the 2.6.27 kernel can see our targets, but
2.6.28.4 can't. This is RHEL 5.3 on one machine and CentOS 5.2 on
another, both x86_64.

Here is what we see

[root@jackrabbit open-iscsi]# uname -r
2.6.27.5
[root@jackrabbit open-iscsi]# iscsiadm --mode discovery --type sendtargets --portal 192.168.5.117
192.168.5.117:3260,1 iqn.2008-08.com.scalableinformatics:tiburon.dos.boot.image
192.168.5.117:3260,1 iqn.2008-08.com.scalableinformatics:tiburon.seagate.flash.cd
192.168.5.117:3260,1 iqn.2008-08.com.scalableinformatics:tiburon.suse10.3.x64.install.dvd
192.168.5.117:3260,1 iqn.2008-08.com.scalableinformatics:tiburon.ubuntu.install.cd
192.168.5.117:3260,1 iqn.2008-10.com.scalableinformatics:tiburon.iscsi.boot.disk

but 2.6.28.4 doesn't

[root@jackrabbit open-iscsi]# uname -r
2.6.28.4

[root@jackrabbit open-iscsi]# iscsiadm --mode discovery --type sendtargets --portal 192.168.5.117
iscsiadm: Cannot perform discovery. Initiatorname required.
iscsiadm: Discovery process to 192.168.5.117:3260 failed to create a discovery session.

Moreover, performing an explicit restart yields this:

[root@jackrabbit ~]# /etc/init.d/open-iscsi restart
Stopping iSCSI initiator service: [ OK ]
Starting iSCSI initiator service: FATAL: Error inserting iscsi_tcp (/lib/modules/2.6.28.4/kernel/drivers/scsi/iscsi_tcp.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting ib_iser (/lib/modules/2.6.28.4/kernel/drivers/infiniband/ulp/iser/ib_iser.ko): Unknown symbol in module, or unknown parameter (see dmesg)
[ OK ]
Setting up iSCSI targets: Logging in to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.ubuntu.install.cd, portal: 192.168.5.117,3260]
Logging in to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.dos.boot.image, portal: 192.168.5.117,3260]
Logging in to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.suse10.3.x64.install.dvd, portal: 192.168.5.117,3260]
Logging in to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.seagate.flash.cd, portal: 192.168.5.117,3260]
Logging in to [iface: default, target: iqn.2008-10.com.scalableinformatics:tiburon.iscsi.boot.disk, portal: 192.168.5.117,3260]
iscsiadm: Could not login to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.ubuntu.install.cd, portal: 192.168.5.117,3260]:
iscsiadm: initiator reported error (13 - daemon access denied)
iscsiadm: Could not login to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.dos.boot.image, portal: 192.168.5.117,3260]:
iscsiadm: initiator reported error (13 - daemon access denied)
iscsiadm: Could not login to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.suse10.3.x64.install.dvd, portal: 192.168.5.117,3260]:
iscsiadm: initiator reported error (13 - daemon access denied)
iscsiadm: Could not login to [iface: default, target: iqn.2008-08.com.scalableinformatics:tiburon.seagate.flash.cd, portal: 192.168.5.117,3260]:
iscsiadm: initiator reported error (13 - daemon access denied)
iscsiadm: Could not login to [iface: default, target: iqn.2008-10.com.scalableinformatics:tiburon.iscsi.boot.disk, portal: 192.168.5.117,3260]:
iscsiadm: initiator reported error (13 - daemon access denied)
iscsiadm: Could not log into all portals. Err 13.

In the logs, I found this:

Mar 11 01:29:25 jackrabbit iscsid: peeruser_unix: unknown local user with uid 0

With some googling, I traced this to the usr/mgmt_ipc.c code. The
specific code that raises this error is:

    pass = getpwuid(peercred.uid);
    if (pass == NULL) {
        log_error("peeruser_unix: unknown local user with uid %d",
                  (int) peercred.uid);
        return 0;
    }

Basically the code is dying on the return from the getpwuid call.
Somehow pass is set to null for peercred.uid == 0. Strange.
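
For anyone who wants to exercise the same kind of check outside of iscsid, here is a minimal sketch. It assumes that peercred is a struct ucred filled in by getsockopt() with SO_PEERCRED on an AF_UNIX socket, which is the usual Linux way to get peer credentials. The helper names peer_uid and check_local_peer are hypothetical and are not taken from the open-iscsi source.

```c
#define _GNU_SOURCE             /* glibc exposes struct ucred only with this */
#include <pwd.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fetch the uid of the process on the other end of a connected AF_UNIX
 * socket. Returns 0 on success, -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
int peer_uid(int fd, uid_t *uid)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
        return -1;
    if (len != sizeof(cred))
        return -1;
    *uid = cred.uid;
    return 0;
}

/*
 * Create a connected socket pair inside this process, read the peer uid
 * from one end, and look it up in the password database.
 * Returns 1 if the uid maps to a known local user, 0 if getpwuid()
 * returned NULL, and -1 if the socket calls failed.
 */
int check_local_peer(void)
{
    int sv[2];
    uid_t uid;
    struct passwd *pw;
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    rc = peer_uid(sv[0], &uid);
    close(sv[0]);
    close(sv[1]);
    if (rc < 0)
        return -1;

    pw = getpwuid(uid);
    if (pw == NULL) {
        fprintf(stderr, "unknown local user with uid %d\n", (int) uid);
        return 0;
    }
    printf("peer uid %d is user %s\n", (int) uid, pw->pw_name);
    return 1;
}
```

Calling check_local_peer() from a small main() as root should exercise the same getsockopt() plus getpwuid() sequence, although it runs in an ordinary process and not in the daemon's environment.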

As a sanity check, I compiled and built this:

#include <sys/types.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>

int main(void)
{
    struct passwd *p;
    uid_t uid = 0;

    /* Clear errno so perror() reports only what getpwuid() set. */
    errno = 0;
    if ((p = getpwuid(uid)) == NULL) {
        perror("getpwuid() error");
        return 1;
    }

    /* Print the fields of the passwd entry that was found. */
    printf("getpwuid() returned the following info for uid %d:\n",
           (int) uid);
    printf(" pw_name : %s\n", p->pw_name);
    printf(" pw_uid : %d\n", (int) p->pw_uid);
    printf(" pw_gid : %d\n", (int) p->pw_gid);
    printf(" pw_dir : %s\n", p->pw_dir);
    printf(" pw_shell : %s\n", p->pw_shell);
    return 0;
}

compiling and running the code ...

[root@jackrabbit ~]# gcc t.c -o t.x
[root@jackrabbit ~]# ./t.x
getpwuid() returned the following info for uid 0:
pw_name : root
pw_uid : 0
pw_gid : 0
pw_dir : /root
pw_shell : /bin/bash

So getpwuid() is obviously doing the right thing for UID == 0 in a
standalone program.

The question is: why is this not working in the mgmt_ipc version?

So I instrumented that a bit, played with errno and added in some more
sanity checks:

    /* Cast to int so the arguments match the %d conversions. */
    log_error("sizeof(peercred): %d", (int) sizeof(peercred));
    log_error("so_len : %d", (int) so_len);

    /* Clear errno so a stale value is not reported below. */
    errno = 0;
    pass = getpwuid(peercred.uid);
    if (pass == NULL) {
        log_error("peeruser_unix: unknown local user with uid %d",
                  (int) peercred.uid);
        log_error("error return: %s", strerror(errno));
        return 0;
    }

and now the logs have this:

Mar 10 16:34:24 iscsid: sizeof(peercred): 12
Mar 10 16:34:24 iscsid: so_len : 12
Mar 10 16:34:24 iscsid: peeruser_unix: unknown local user with uid 0
Mar 10 16:34:24 iscsid: error return: Success
Mar 10 16:34:24 iscsid: sizeof(peercred): 12
Mar 10 16:34:24 iscsid: so_len : 12
Mar 10 16:34:24 iscsid: peeruser_unix: unknown local user with uid 0
Mar 10 16:34:24 iscsid: error return: Success

So either errno isn't being set, or the call is succeeding but still
returning NULL for pass. This seems wrong.
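
One way to separate the two cases is the reentrant getpwuid_r(), which returns an error number directly and does not rely on errno. POSIX specifies that getpwuid() leaves errno unchanged when no entry is found, so "Success" is at least consistent with a "no matching entry" result. Below is a minimal sketch. The function name lookup_user, the enum, and the fixed 4096-byte buffer are assumptions made for illustration, and some C libraries may return a nonzero code such as ENOENT for a missing entry.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

enum lookup_result { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_ERROR };

/*
 * Look up uid with getpwuid_r() and copy the user name into name.
 * A NULL result with a zero return value means "no such entry";
 * a NULL result with a nonzero return value is reported as an error.
 * (Hypothetical helper, for illustration only.)
 */
enum lookup_result lookup_user(uid_t uid, char *name, size_t name_len)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char buf[4096];     /* assumed large enough; ERANGE is returned if not */
    int err;

    err = getpwuid_r(uid, &pwd, buf, sizeof(buf), &result);
    if (result != NULL) {
        snprintf(name, name_len, "%s", pwd.pw_name);
        return LOOKUP_FOUND;
    }
    if (err == 0)
        return LOOKUP_NOT_FOUND;

    fprintf(stderr, "getpwuid_r(%d) failed: %s\n", (int) uid, strerror(err));
    return LOOKUP_ERROR;
}
```

Logging which of the three results comes back would show whether the lookup itself failed or whether the password database simply reported no entry for that uid.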

I suspected SELinux, but then I tried this on a second machine with
SELinux disabled. It fails in the same way on the non-SELinux machine,
so that isn't it.

Any clues as to why this works on 2.6.27 and not on 2.6.28? Some API
change that hasn't been factored in yet?

Or does someone have it working on 2.6.28 somewhere?

Thanks!

Joe

Mike Christie

Mar 11, 2009, 3:26:18 PM
to open-...@googlegroups.com
Joe Landman wrote:
>
> Any clues as to why this works on 2.6.27 and not on 2.6.28? Some API
> change that hasn't been factored in yet?
>

It is a bug in the 2.6.28.x stable kernel. I think you need to upgrade
to 2.6.28.7. The bug got into one of the 2.6.29-rc releases, and I think
it was then backported to a 2.6.28.x stable release. It got fixed around
2.6.29-rc4 or -rc5, and I think the fix then went into .28 stable too.

It was due to a kernel mm change related to mlockall.
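
To check whether a given kernel is affected, the standalone getpwuid() test can be repeated with memory locked first. This is only a sketch under an assumption: that the failing lookup happens in a process that has called mlockall(MCL_CURRENT | MCL_FUTURE) beforehand, as a daemon that pins its memory might. The function name lookup_with_locked_memory is hypothetical.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Lock all current and future pages, then look up uid.
 * Returns 1 if the user was found, 0 if getpwuid() returned NULL,
 * and -1 if mlockall() itself failed (for example due to RLIMIT_MEMLOCK).
 * (Hypothetical helper, for illustration only.)
 */
int lookup_with_locked_memory(uid_t uid)
{
    struct passwd *pw;
    int saved_errno;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }

    errno = 0;
    pw = getpwuid(uid);
    saved_errno = errno;    /* keep the value before munlockall() runs */
    munlockall();

    if (pw == NULL) {
        fprintf(stderr, "getpwuid(%d) returned NULL, errno: %s\n",
                (int) uid, strerror(saved_errno));
        return 0;
    }
    printf("uid %d is user %s\n", (int) uid, pw->pw_name);
    return 1;
}
```

Running lookup_with_locked_memory(0) as root on both kernels and comparing the results would be one way to compare their behavior. A return value of 0 only shows that the lookup failed with memory locked; it does not by itself confirm the cause.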
