ISCSI boot: working FC4 example

Mike Ingle

Feb 28, 2006, 20:18:14
to open-...@googlegroups.com
Booting from an ISCSI drive using Fedora Core 4 and open-iscsi-1.0.485
----------------------------------------------------------------------

There has been some discussion of booting on the list. Here is an
approach I am using, with an only slightly modified open-iscsi.

This writeup shows how to boot a Fedora Core 4 system from an ISCSI
root device, with no requirement for an internal disk. This process was
developed for failover purposes. We have a redundant ISCSI array and a
group of blades. Using this process we can boot any blade into any
function. If a blade crashes, another can be booted to take over the
same function.

The general approach is to modify the initrd to bring up ISCSI, mount
the root, and transfer control over to it. In doing this, I ran into a
number of problems. Open-iscsi required minor mods to run from initrd,
and some changes to the init process were required.

Normally when a Fedora Core 4 system boots, the "nash" interpreter in
the initrd runs the startup script, and then does a switch_root and
exits. The kernel then runs /sbin/init on the real root as PID 1. I
discovered that if you bring up "iscsid" in the initrd and leave it
running, this does not work. Init gets run as some pid other than 1, and
the system does not boot. Nor does a pivot_root approach in "nash" work.

You have to remove "K90network" from "/etc/rc.d/rc6.d",
"/etc/rc.d/rc1.d" and "/etc/rc.d/rc0.d", otherwise shutdowns and reboots
hang when the network is stopped and the root file system goes away.

Step 1: copy your system image to the ISCSI drive and prepare it. Copy
the working FC4 image onto the root of the ISCSI partition. Change the
/etc/fstab so /dev/sda1 is root and remove other invalid partitions.
Here I am assuming there is only one partition mounted as root. If you
have others you will have to set them up. Remove K90network and make a
/initrd directory under your ISCSI root.

Step 2: modify open-iscsi. On line 238 of mgmt_ipc.c iscsid checks the
userid of the connecting iscsiadm process and rejects unknown users.
That does not work in the initrd environment even if you put in a
passwd file, so hard-code root:


*** mgmt_ipc.c.orig 2006-02-28 13:11:09.000000000 -0800
--- mgmt_ipc.c 2006-02-28 13:14:14.000000000 -0800
***************
*** 237,246 ****
--- 237,251 ----

pass = getpwuid(peercred.uid);
if (pass == NULL) {
log_error("peeruser_unix: unknown local user with uid %d",
(int) peercred.uid);
+ if(peercred.uid == 0) {
+ strcpy(user, "root");
+ return 1;
+ }
+
return 0;
}

strncpy(user, pass->pw_name, PEERUSER_MAX);
return 1;
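
The fallback in the patch above can also be written with bounded copies. This is a minimal sketch, not the open-iscsi code: peer_uid_to_user is a hypothetical helper, and the value given to PEERUSER_MAX is an assumption (the real definition is in the open-iscsi source). It keeps the behaviour of the patch, falling back to "root" only when the lookup fails and the uid is 0, and uses snprintf so the name is always terminated, which strncpy does not guarantee when the source fills the buffer.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

#define PEERUSER_MAX 64   /* assumed size; the real value is set by open-iscsi */

/*
 * Hypothetical helper, not open-iscsi code: map a peer uid to a user name.
 * Returns 1 on success and 0 if the uid cannot be resolved.
 */
int peer_uid_to_user(uid_t uid, char *user, size_t len)
{
    struct passwd *pass = getpwuid(uid);

    if (pass != NULL) {
        /* Normal case: the passwd database knows this uid. */
        snprintf(user, len, "%s", pass->pw_name);
        return 1;
    }
    if (uid == 0) {
        /* Initrd case: no usable passwd database, but uid 0 is root. */
        snprintf(user, len, "root");
        return 1;
    }
    return 0;
}
```

A caller would pass a buffer such as "char user[PEERUSER_MAX];" together with "sizeof(user)".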


Modify iscsid.c to disable daemon logging and to stop it from redirecting
stdin, stdout and stderr to /dev/null:

*** iscsid.c.orig 2006-02-02 00:13:44.000000000 -0800
--- iscsid.c 2006-02-28 16:58:24.000000000 -0800
***************
*** 258,263 ****
--- 258,264 ----
pid_t pid;
int fd;

+ log_daemon = 0;
fd = open(pid_file, O_WRONLY|O_CREAT, 0644);
if (fd < 0) {
log_error("unable to create pid file");
***************
*** 285,294 ****
sprintf(buf, "%d\n", getpid());
write(fd, buf, strlen(buf));

! close(0);
! open("/dev/null", O_RDWR);
! dup2(0, 1);
! dup2(0, 2);
setsid();
} else {
if ((control_fd = ipc->ctldev_open()) < 0) {
--- 286,295 ----
sprintf(buf, "%d\n", getpid());
write(fd, buf, strlen(buf));

! // close(0);
! // open("/dev/null", O_RDWR);
! // dup2(0, 1);
! // dup2(0, 2);
setsid();
} else {
if ((control_fd = ipc->ctldev_open()) < 0) {


I also have to make the following change to io.c to get ISCSI working
in any environment, not just booting:

*** io.c.orig 2006-02-28 16:37:32.000000000 -0800
--- io.c 2006-02-28 16:38:19.000000000 -0800
***************
*** 145,151 ****
log_debug(1, "connecting to %s:%s", host, serv);
if (non_blocking)
set_non_blocking(conn->socket_fd);
! rc = connect(conn->socket_fd, (struct sockaddr *) ss, sizeof (*ss));
return rc;
}

--- 145,151 ----
log_debug(1, "connecting to %s:%s", host, serv);
if (non_blocking)
set_non_blocking(conn->socket_fd);
! rc = connect(conn->socket_fd, (struct sockaddr *) ss, 16);
return rc;
}
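
The literal 16 in the patch above equals the size of an IPv4 struct sockaddr_in, so it would presumably stop working for an IPv6 target, whose struct sockaddr_in6 is larger. Below is a minimal sketch of a family-aware alternative. It assumes that ss points to a struct sockaddr_storage, which the name and the cast suggest but the patch does not show. sockaddr_len is a hypothetical helper, not part of open-iscsi.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not open-iscsi code: return the address length that
 * connect() expects for the family stored in ss, or 0 if it is unsupported.
 */
socklen_t sockaddr_len(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);   /* 16 bytes on Linux */
    case AF_INET6:
        return sizeof(struct sockaddr_in6);  /* 28 bytes on Linux */
    default:
        return 0;
    }
}
```

With such a helper the call could read "rc = connect(conn->socket_fd, (struct sockaddr *) ss, sockaddr_len(ss));", which gives the same length as the patch for IPv4.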

Since there are no shared libraries in the initrd, we need to compile
the user-mode parts of open-iscsi static and fix a couple of library
references.

*** Makefile.orig 2006-02-28 13:17:25.000000000 -0800
--- Makefile 2006-02-28 13:19:30.000000000 -0800
***************
*** 21,27 ****
endif
endif
IPC_OBJ=netlink.o
! DBM_LIB=-ldb
else
ifeq ($(OSNAME),FreeBSD)
IPC_CFLAGS=
--- 21,27 ----
endif
endif
IPC_OBJ=netlink.o
! DBM_LIB=-ldb-4.3 -lpthread
else
ifeq ($(OSNAME),FreeBSD)
IPC_CFLAGS=
***************
*** 42,51 ****

iscsid: $(COMMON_SRCS) $(IPC_OBJ) iscsid.o mgmt_ipc.o initiator.o \
actor.o queue.o
! $(CC) $^ $(DBM_LIB) -o $@

iscsiadm: $(COMMON_SRCS) strings.o discovery.o iscsiadm.o
! $(CC) $^ $(DBM_LIB) -o $@

clean:
rm -f *.o $(PROGRAMS)
--- 42,51 ----

iscsid: $(COMMON_SRCS) $(IPC_OBJ) iscsid.o mgmt_ipc.o initiator.o \
actor.o queue.o
! $(CC) -static $^ $(DBM_LIB) -o $@

iscsiadm: $(COMMON_SRCS) strings.o discovery.o iscsiadm.o
! $(CC) -static $^ $(DBM_LIB) -o $@

clean:
rm -f *.o $(PROGRAMS)


Step 3: make your initrd.
You will need to make a bigger initrd to hold all this stuff. An initrd
is just a gzipped ext2 file system, so something like:

gzip -dc ORIGINAL-INITRD > /tmp/initrd-old
mkdir /initrd-old /initrd-new
mount -o loop /tmp/initrd-old /initrd-old
dd if=/dev/zero of=/tmp/initrd-new bs=1048576 count=8
mke2fs /tmp/initrd-new
mount -o loop /tmp/initrd-new /initrd-new
( cd /initrd-old ; tar cpf - . ) | \
( cd /initrd-new ; tar xvpf - )

Now copy in the iscsi components:
cp /root/open-iscsi-1.0-485/usr/iscsid /initrd-new/bin/
cp /root/open-iscsi-1.0-485/usr/iscsiadm /initrd-new/bin/
cp /root/open-iscsi-1.0-485/kernel/scsi_transport_iscsi.ko \
/initrd-new/lib/
cp /root/open-iscsi-1.0-485/kernel/iscsi_tcp.ko /initrd-new/lib/

You will need "ifconfig" and "route" from the net-tools distribution,
which will also need to be compiled static. Copy those under bin too.
Also copy your Ethernet card driver into lib. Mine is tg3.ko.

At the root of the initrd, the startup script is hard linked as linuxrc
and init. This has to change:

cd /initrd-new
mv linuxrc linuxrc.nash
rm init

I had no luck using "nash" as the init process. It has no "exec" so
there is no way to pass control over to the real /sbin/init as pid 1.
I wrote my own startup stub like this:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/wait.h>

int main(void)
{
    int junk;
    pid_t pid1, pid2;

    /* Reattach stdin, stdout and stderr to the console. */
    close(0);
    close(1);
    close(2);
    open("/dev/console", O_RDONLY);
    open("/dev/console", O_WRONLY);
    open("/dev/console", O_WRONLY);
    printf("Started as pid %d\n", (int) getpid());
    printf("Running nash\n");
    fflush(stdout);
    pid1 = fork();
    if (pid1 == 0) {
        /* Child: run the original startup script under nash. */
        printf("forked now execing\n");
        execl("/bin/nash", "nash", "--force", "/linuxrc.nash", (char *) NULL);
        printf("exec failed\n");
        exit(1);
    }
    /* Parent: hold on to pid 1 until nash has finished. */
    do {
        pid2 = wait(&junk);
    } while (pid1 != pid2);
    printf("Nash finished\n");
    printf("Pivot_root\n");
    /* glibc declares no pivot_root() wrapper, so go through syscall(). */
    if (syscall(SYS_pivot_root, "/sysroot", "/sysroot/initrd") < 0) {
        printf("Pivot_root returned errno=%d\n", errno);
    }
    chdir("/");
    printf("Running /sbin/init\n");
    fflush(stdout);
    execl("/sbin/init", "init", "auto", (char *) NULL);
    printf("Init failed\n");
    return 1;
}

Compile it -static and copy to "init" in the initrd. I also hard linked
it to "linuxrc" just in case. It expects the startup script to be
linuxrc.nash instead of linuxrc.

You should have already done iscsiadm discovery and login. Copy
/var/db/iscsi/discovery.db and /var/db/iscsi/node.db under initrd using
the full path, i.e. /initrd-new/var/db/iscsi (which you need to mkdir.)


My linuxrc.nash is this:
#!/bin/nash

mount -t proc /proc /proc
#setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mkdir /dev/pts
mkdir /dev/shm
echo Starting udev
/sbin/udevstart
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
echo "Loading network driver"
insmod /lib/tg3.ko
sleep 5
ifconfig lo up 127.0.0.1
ifconfig eth3 172.22.1.7 netmask 255.255.255.0 broadcast 172.22.1.255
ifconfig eth3 up
route add -net 172.22.1.0/24 dev eth3
ifconfig eth3
sleep 60
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading scsi_transport_iscsi.ko module"
insmod /lib/scsi_transport_iscsi.ko
echo "Loading iscsi_tcp.ko module"
insmod /lib/iscsi_tcp.ko
echo "Loading dm-mod.ko module"
insmod /lib/dm-mod.ko
echo "Loading jbd.ko module"
insmod /lib/jbd.ko
echo "Loading ext3.ko module"
insmod /lib/ext3.ko
echo "Loading reiserfs.ko module"
insmod /lib/reiserfs.ko
echo "Starting iscsid"
iscsid
sleep 5
echo "Logging into iscsi"
iscsiadm -m node --record=cea8e5 --login
sleep 10
echo Creating root device
mknod /dev/root b 8 1
echo Mounting root filesystem
mkdir /sysroot
mount -o defaults --ro -t reiserfs /dev/root /sysroot
umount /sys
umount /dev

---------------------------------

The "sleep 60" is required by a slow-to-initialize switch. You may need
less time. The record number in iscsiadm is the one expected by the
node.db you copied over. The sleep after iscsiadm leaves time for the
ISCSI connection to come up. You might have to increase it. If you are
using ext3 or another filesystem instead of reiserfs, you have to change
the "-t reiserfs" in the mount line too.
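
The fixed sleeps could in principle be replaced by polling for the thing being waited on. Below is a minimal sketch of a small helper in the same spirit as the init stub. wait_for_path is hypothetical and not part of nash or open-iscsi, and the path "/sys/block/sda" mentioned afterwards assumes that the ISCSI disk shows up as the first SCSI disk and that sysfs is mounted on /sys, as in the script above.

```c
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: poll every 100 ms until path exists.
 * Returns 0 once the path is there, -1 if timeout_sec elapses first.
 */
int wait_for_path(const char *path, int timeout_sec)
{
    struct stat st;
    struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int tries = timeout_sec * 10;

    for (;;) {
        if (stat(path, &st) == 0)
            return 0;
        if (tries-- <= 0)
            return -1;
        nanosleep(&tick, NULL);
    }
}
```

Wrapped in a static binary with a small main(), a call such as wait_for_path("/sys/block/sda", 60) could stand in for the "sleep 10" after the login, returning as soon as the disk is registered.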


Step 4: booting
Now you can umount the initrd-new, gzip the image, and you should have a
usable initrd. You will need some way to get the kernel and initrd into
memory on the machine you intend to boot. I am using PXE over the net.
My kernel parameters are:

KERNEL vmlinuz-whatever
APPEND root=/dev/ram0 rw initrd=initrd-iscsi.img vga=791 init=/init
IPAPPEND 1

Notice the initrd is the root filesystem, and /init is the real init, as
far as the kernel boot is concerned. If you let it start as /linuxrc you
do not get pid=1 and nothing else works.

When you boot, the small init stub gets initial control and runs
nash as a subprocess, while hanging onto pid 1. Once nash is finished,
the stub does pivot_root and then execs /sbin/init as pid=1 and
everything should work normally from there. The ramdisk stays mounted
under /initrd since there are processes running in it.


Aggarwal, VikasX

Mar 1, 2006, 0:48:14
to open-...@googlegroups.com
Hi,
How did you know that the boot interface has to be eth3, since you are
assigning the IP to it?
-vikas aggarwal

is...@digitaltadpole.com

Mar 1, 2006, 14:30:34
to open-iscsi
WOW!!! Great stuff! I haven't finished beating my way thru this all
yet, and I would like to compare it to the notes provided by Bryan
Black on http://www.linux-iscsi.org/index.php/Talk:Main_Page . The
notes in the thread above really complement and provide additional
insight into the "inner workings" of the boot process.

Thanks for sending this info out. I don't think there's a general
solution in place yet for iscsi boot, which would be a HUGE advantage
over regular SANs and normal storage practices.

I'll say a little more in a few days after I absorb all the information
and test it out a little.

Regards,

Mike Mazarick

PS - KEEP UP THE GREAT WORK!!

is...@digitaltadpole.com

Mar 1, 2006, 14:31:28
to open-iscsi

Mike Ingle

Mar 1, 2006, 20:57:17
to open-...@googlegroups.com
I found out by comparing the MAC address, which I had to put into the
DHCP server to boot. Single NIC machines will use eth0. This machine has
four NICs, and the one that Linux 2.6 considers eth3 shows up as eth0 in
Linux 2.4.

===============================================================

Thank you. This machine has been up all day today and has not crashed
yet (crossing fingers.) So far I have moved 50GB or so. This is using an
MPC Dataframe 420, which has no working Linux hardware ISCSI card that I
know of. A QLogic card failed miserably. I am using:

Linux 2.6.14-1.1656_FC4smp #1 SMP Thu Jan 5 22:24:06 EST 2006 i686 i686
i386 GNU/Linux

With older kernels, the ISCSI tended to hang.

Soon I will test a clone of our production environment, which has Oracle
10G and JBOSS. Let's see if the ISCSI is stable enough to host a
database.

Mike Christie

Mar 1, 2006, 21:15:25
to open-...@googlegroups.com
Mike Ingle wrote:
> Booting from an ISCSI drive using Fedora Core 4 and open-iscsi-1.0.485
> ----------------------------------------------------------------------
>

Nice stuff.

>
> You have to remove "K90network" from "/etc/rc.d/rc6.d",
> "/etc/rc.d/rc1.d" and "/etc/rc.d/rc0.d" otherwise shutdowns and reboots
> hang when the root file system goes away.
>

We have this network problem for other cases like when using mount by
label with DM or MD RAID or dm-multipath.

Instead of preventing the network script from running during shutdown,
distros could try to add a script to work around all the cases (like Red
Hat's netfs) but that may be a pain and for iscsi root I am not sure
what we can do. Does it make sense to have the install script just
automate the rm K90network (or whatever it is in other distros) or is
that pretty evil?


>
> I also have to make the following change to io.c to get ISCSI working
> in any environment, not just booting:
>
> *** io.c.orig 2006-02-28 16:37:32.000000000 -0800
> --- io.c 2006-02-28 16:38:19.000000000 -0800
> ***************
> *** 145,151 ****
> log_debug(1, "connecting to %s:%s", host, serv);
> if (non_blocking)
> set_non_blocking(conn->socket_fd);
> ! rc = connect(conn->socket_fd, (struct sockaddr *) ss, sizeof (*ss));
> return rc;
> }
>
> --- 145,151 ----
> log_debug(1, "connecting to %s:%s", host, serv);
> if (non_blocking)
> set_non_blocking(conn->socket_fd);
> ! rc = connect(conn->socket_fd, (struct sockaddr *) ss, 16);
> return rc;
> }
>

You do not by any chance know what the errno is after that failure so we
can try to fix this case? I thought we had fixed this in the past when
ipv6 was added.

Aggarwal, VikasX

Mar 1, 2006, 21:21:26
to open-...@googlegroups.com
Trying to clear my doubts further. Will it require rebuilding the
RAMDISK if the sysadmin wants to boot through a different NIC on a server
having multiple NICs?

Also, what will happen if the NIC enumerated as eth0 is physically
removed from the system? Will it still work with eth3? Or, even if it
works, will there be a mismatch between the MAC address recorded in
/etc/sysconfig/network-scripts/ifcfg-ethX and the ethX assigned to that
MAC address during the initrd (linuxrc) stage? I have this doubt
because /etc/* is not available during the initrd stage.


Thanks!!
-vikas

Mike Ingle

Mar 2, 2006, 17:46:58
to open-...@googlegroups.com
> You do not by any chance know what the errno is after that failure so we
> can try to fix this case? I thought we had fixed this in the past when
> ipv6 was added.

------------------------------

The sizeof() parameter is wrong. It is, as I recall, too long. This
problem has been there forever, at least in ES3.

Mike

Mike Ingle

Mar 2, 2006, 17:50:33
to open-...@googlegroups.com
Yes, there are a variety of gotchas there. With the existing setup you do
have to rebuild the RAMDISK if you want to change the NIC, the IP address,
etc. I suggest you either automate the rebuilding of the RAMDISK or put
some logic (using busybox perhaps) into the boot script within the RAMDISK.
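
One form such logic could take is picking the interface by MAC address instead of hard-coding ethX. Below is a minimal sketch, assuming sysfs is mounted and exposes /sys/class/net/<name>/address for each interface. find_iface_by_mac is a hypothetical helper, and how the MAC address reaches the initrd is left open here.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: scan a sysfs-style directory (normally
 * "/sys/class/net") for the interface whose "address" file matches mac.
 * Returns 0 and fills name on a match, -1 otherwise.
 */
int find_iface_by_mac(const char *net_dir, const char *mac,
                      char *name, size_t len)
{
    DIR *dir = opendir(net_dir);
    struct dirent *ent;
    int found = -1;

    if (dir == NULL)
        return -1;
    while (found != 0 && (ent = readdir(dir)) != NULL) {
        char path[512], addr[64];
        FILE *f;

        if (ent->d_name[0] == '.')
            continue;                           /* skip "." and ".." */
        snprintf(path, sizeof(path), "%s/%s/address", net_dir, ent->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(addr, sizeof(addr), f) != NULL) {
            addr[strcspn(addr, "\n")] = '\0';   /* strip trailing newline */
            if (strcasecmp(addr, mac) == 0) {   /* MAC case is ignored */
                snprintf(name, len, "%s", ent->d_name);
                found = 0;
            }
        }
        fclose(f);
    }
    closedir(dir);
    return found;
}
```

The name it returns could then be passed to ifconfig in place of the fixed "eth3", so the same RAMDISK would keep working if the kernel enumerates the NICs in a different order.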

Mike
