iscsid stop using system not working correctly

1,690 views
Skip to first unread message

The Lee-Man

unread,
Jan 30, 2015, 5:09:54 PM1/30/15
to open-...@googlegroups.com
Hi Mike:

Just a heads up that stopping the open-iscsi iscsid daemon using systemd doesn't seem to be working correctly, at least not on SUSE SLE 12.

When I have one or more sessions present, and their startup value is set to "manual", when I try to stop the iscsid service, I get:

    # systemctl stop iscsid.service
    Job for iscsid.service canceled.

And a "ps" shows that iscsid is still running, but under a new process id. And, at times, I see that "iscsiadm -k 0 2" is hung.

Note that my systemd iscsid.service unit file looks like:

    [Unit]
    Description=Open-iSCSI
    Documentation=man:iscsid(8) man:iscsiuio(8) man:iscsiadm(8)
    DefaultDependencies=no
    After=network.target iscsiuio.service
    Before=remote-fs-pre.target

    [Service]
    Type=simple
    ExecStart=/sbin/iscsid -f
    ExecStop=/sbin/iscsiadm -k 0 2

    [Install]
    WantedBy=multi-user.target
    Also=iscsid.socket

While trying to track down this problem, I found a couple of issues:

1. I don't know what the "2" is on the "iscsiadm -k 0 2" is for. I see that command in systemd unit file in your master branch, and I see that other distributions use that, but the "2" seems to be ignored.

2. It looks like the "iscsiadm -k 0" is stopping the iscsi daemon, but that systemd is restarting it! I'm guessing this because (a) I get the "cancelled" message, and (b) iscsid is still running, but as a new process, i.e. it's been restarted.

I thought perhaps the problem was related to running iscsid as a "simple" service, in the foreground, but changing Type to forking and removing the "-f" from the iscsid command line did not change anything.

Then I simply commented out the "ExecStop" line, and iscsid now shutdowns correctly. This, I believe, is because systemd is shutting it down by sending first SIGHUP then SIGKILL, as per "man systemd.kill".

Before I comment out the "ExecStop=" for all SLE 12 users, I wondered if you've heard of any problems along these lines.

The Lee-Man

unread,
Jan 30, 2015, 5:14:26 PM1/30/15
to open-...@googlegroups.com
I believe I found out more about the problem. It is because the iscsid service is socket-activated. There is an iscsid.socket service file, and that is used to wake up and start iscsid if iscsiadm tries to talk to iscsid and iscsid is not running.

I believe that what is happening is that:
1. iscsiadm tells iscsid to stop using a socket that systemd is monitoring
2. iscsid gets the message, cleans up, closes the socket, and exits
3. systemd restarts the service because the socket communication failed
4. systemd returns failure on the message (the "...cancelled" message) and restarts the iscsid service that just stopped

I looked for some systemd.socket(5) variable that would control this, but I see none.

I think the fundamental problem is that I am trying to stop a service via the socket that
was used to socket-activate the process. Kind of like trying to put away a ladder I am standing on. :)

Chris Leech

unread,
Feb 4, 2015, 1:44:33 PM2/4/15
to open-...@googlegroups.com, leeman...@gmail.com
On Fri, Jan 30, 2015 at 02:09:54PM -0800, The Lee-Man wrote:
> Hi Mike:
>
> Just a heads up that stopping the open-iscsi iscsid daemon using systemd
> doesn't seem to be working correctly, at least not on SUSE SLE 12.
>
> When I have one or more sessions present, and their startup value is set to
> "manual", when I try to stop the iscsid service, I get:
>
> # systemctl stop iscsid.service
> Job for iscsid.service canceled.

That's not the normal service stop message is it? Job canceled? Did
you somehow have a pending start job queued up in systemd even though
iscsid was running?

> And a "ps" shows that iscsid is still running, but under a new process id.
> And, at times, I see that "iscsiadm -k 0 2" is hung.

I was going to say that this is about the only problem I haven't run
into with systemd, but it looks like I can hit this (some of the time,
not always or even often) if I loop on socket activation and service
stopping.

> 2. It looks like the "iscsiadm -k 0" is stopping the iscsi daemon, but that
> systemd is restarting it! I'm guessing this because (a) I get the
> "cancelled" message, and (b) iscsid is still running, but as a new process,
> i.e. it's been restarted.

I'm wondering if it has something to do with the ExecStop command
returning before iscsid actually terminates, although I'm not sure
what's triggering the service restart.

> I thought perhaps the problem was related to running iscsid as a "simple"
> service, in the foreground, but changing Type to forking and removing the
> "-f" from the iscsid command line did not change anything.

That's about the only difference I see from what I have in Fedora/RHEL.

- Chris

The Lee-Man

unread,
Feb 4, 2015, 3:51:08 PM2/4/15
to open-...@googlegroups.com, leeman...@gmail.com, cle...@redhat.com
On Wednesday, February 4, 2015 at 10:44:33 AM UTC-8, Chris Leech wrote:
On Fri, Jan 30, 2015 at 02:09:54PM -0800, The Lee-Man wrote:
> Hi Mike:
>
> Just a heads up that stopping the open-iscsi iscsid daemon using systemd
> doesn't seem to be working correctly, at least not on SUSE SLE 12.
>
> When I have one or more sessions present, and their startup value is set to
> "manual", when I try to stop the iscsid service, I get:
>
>     # systemctl stop iscsid.service
>     Job for iscsid.service canceled.

That's not the normal service stop message is it?  Job canceled?  Did
you somehow have a pending start job queued up in systemd even though
iscsid was running?

No, "Job ... canceled." is systemd-speak for "your request failed". If I remove the "ExecStop=" attribute from the service file I do not get this message, and the "stop" succeeds. 

> And a "ps" shows that iscsid is still running, but under a new process id.
> And, at times, I see that "iscsiadm -k 0 2" is hung.

I was going to say that this is about the only problem I haven't run
into with systemd, but it looks like I can hit this (some of the time,
not always or even often) if I loop on socket activation and service
stopping.

> 2. It looks like the "iscsiadm -k 0" is stopping the iscsi daemon, but that
> systemd is restarting it! I'm guessing this because (a) I get the
> "cancelled" message, and (b) iscsid is still running, but as a new process,
> i.e. it's been restarted.

I'm wondering if it has something to do with the ExecStop command
returning before iscsid actually terminates, although I'm not sure
what's triggering the service restart.

I think you're onto the meat of the problem: the sequencing systemd does between the "ExecStop=" and the command itself. 

The systemd.service(5) man page says about ExecStop:

"...After the commands configured in this option are run, all processes remaining for a service are terminated according to the KillMode= setting (see systemd.kill(5))."

So I'm thinking systemd does not like that this command actually stops the service. But this only happens if the service is socket-activated, so it must be the socket service that is restarting iscsid. If I stop iscsid.socket before iscsid.service, this does not happen. But systemd doesn't normally shut things down that way, since iscsid.service depends on iscsid.socket.

> I thought perhaps the problem was related to running iscsid as a "simple"
> service, in the foreground, but changing Type to forking and removing the
> "-f" from the iscsid command line did not change anything.

That's about the only difference I see from what I have in Fedora/RHEL.

You might also have a different variant of systemd?

I really believe there are two long-term solutions to this:

1. find a way to tell systemd that an ExecStop command will actually stop the service, and that's ok. OR

2. Remove the ExecStop command. If there are problems with stopping, deal with them by beefing up the signal handling in iscsid, since that's the method systemd uses to stop such services.

- Chris

Chris Leech

unread,
Feb 4, 2015, 5:27:09 PM2/4/15
to The Lee-Man, open-...@googlegroups.com
On Wed, Feb 04, 2015 at 12:51:08PM -0800, The Lee-Man wrote:
> >
> > I'm wondering if it has something to do with the ExecStop command
> > returning before iscsid actually terminates, although I'm not sure
> > what's triggering the service restart.
> >
>
> I think you're onto the meat of the problem: the sequencing systemd does
> between the "ExecStop=" and the command itself.
>
> The systemd.service(5) man page says about ExecStop:
>
> "...After the commands configured in this option are run, all processes
> remaining for a service are terminated according to the KillMode= setting
> (see systemd.kill(5))."
>
> So I'm thinking systemd does not like that this command actually stops the
> service. But this only happens if the service is socket-activated, so it
> must be the socket service that is restarting iscsid. If I stop
> iscsid.socket before iscsid.service, this does not happen. But systemd
> doesn't normally shut things down that way, since iscsid.service depends on
> iscsid.socket.

OK, I ran this test script until it failed in the way you're seeing

---
#!/bin/sh

# make sure systemd is logging at debug level
LOGLEVELDEBUG=$(($(kill -l RTMIN) + 22))
kill -n $LOGLEVELDEBUG 1

while true; do
# use discovery to activate systemd
iscsiadm -m discovery -t st -p 192.168.122.1
sleep 2
systemctl stop iscsid.service
if [ $? -ne 0 ]; then
exit $?
fi
sleep 2
done

journalctl -b -x -u iscsid.service -u iscsid.socket | gzip > systemd-iscsid.gz
---

Here's the journal output from the final, failing, pass
(bear with me, this is kind of long)

TL;DR - systemd starts listening on the admin socket again before it
even runs iscsiadm, and the IMMEDIATE_STOP command actually enqueues a
service start job.

This first part starts iscsid via socket activation, no problems

systemd[1]: Incoming traffic on iscsid.socket
systemd[1]: Trying to enqueue job iscsid.service/start/replace
systemd[1]: Installed new job iscsid.service/start as 6572
systemd[1]: Enqueued job iscsid.service/start as 6572
systemd[1]: iscsid.socket changed listening -> running
systemd[1]: Starting Open-iSCSI...
-- Subject: Unit iscsid.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit iscsid.service has begun starting up.
systemd[1]: About to execute: /usr/sbin/iscsid -f
systemd[1]: Forked /usr/sbin/iscsid as 27674
systemd[1]: iscsid.service changed dead -> running
systemd[27674]: Executing: /usr/sbin/iscsid -f
systemd[1]: Job iscsid.service/start finished, result=done
systemd[1]: Started Open-iSCSI.
-- Subject: Unit iscsid.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit iscsid.service has finished starting up.
--
-- The start-up result is done.

Now systemd is sent the stop command

systemd[1]: Trying to enqueue job iscsid.service/stop/replace
systemd[1]: Installed new job iscsid.service/stop as 6577
systemd[1]: Enqueued job iscsid.service/stop as 6577
systemd[1]: Stopping Open-iSCSI...
-- Subject: Unit iscsid.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit iscsid.service has begun shutting down.

systemd forks a new process in preparation to run the ExecStop

systemd[1]: About to execute: /sbin/iscsiadm -k 0 2
systemd[1]: Forked /sbin/iscsiadm as 27679

but then goes ahead and declares the service stopped

systemd[1]: iscsid.service changed running -> stop
systemd[1]: iscsid.socket got notified about service death (failed permanently: no)

and starts listening on the socket already

systemd[1]: iscsid.socket changed running -> listening

and now actually gets around to execing iscsiadm

systemd[27679]: Executing: /sbin/iscsiadm -k 0 2

and the socket unit sees the MGMT_IPC_IMMEDIATE_STOP command as a reason
to _start_ iscsid.service

systemd[1]: Incoming traffic on iscsid.socket
systemd[1]: Trying to enqueue job iscsid.service/start/replace

the stop job isn't finished, but we've got a start job now so cancel it

systemd[1]: Job iscsid.service/stop finished, result=canceled
systemd[1]: Installed new job iscsid.service/start as 6579
systemd[1]: Enqueued job iscsid.service/start as 6579
systemd[1]: iscsid.socket changed listening -> running
systemd[1]: Child 27674 belongs to iscsid.service

iscsid goes ahead and exists, as it was told to

systemd[1]: iscsid.service: main process exited, code=exited, status=0/SUCCESS
systemd[1]: Child 27679 belongs to iscsid.service
systemd[1]: iscsid.service: control process exited, code=exited status=0
systemd[1]: iscsid.service got final SIGCHLD for state stop
systemd[1]: iscsid.service changed stop -> dead
systemd[1]: iscsid.socket got notified about service death (failed permanently: no)
systemd[1]: iscsid.socket changed running -> listening
systemd[1]: iscsid.service: cgroup is empty
iscsid[27674]: iscsid: iscsid shutting down.

but remember that start job that got enqueued

systemd[1]: Starting Open-iSCSI...
-- Subject: Unit iscsid.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit iscsid.service has begun starting up.
systemd[1]: About to execute: /usr/sbin/iscsid -f
systemd[1]: Forked /usr/sbin/iscsid as 27682
systemd[1]: iscsid.service changed dead -> running
systemd[1]: Job iscsid.service/start finished, result=done
systemd[1]: Started Open-iSCSI.
-- Subject: Unit iscsid.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit iscsid.service has finished starting up.
--
-- The start-up result is done.
systemd[1]: iscsid.socket changed listening -> running
systemd[27682]: Executing: /usr/sbin/iscsid -f

The Lee-Man

unread,
Feb 5, 2015, 7:58:37 PM2/5/15
to open-...@googlegroups.com
Thanks, Chris, for tracking down the systemd problem and reporting it.

Chris Leech

unread,
Feb 12, 2015, 11:56:16 PM2/12/15
to open-...@googlegroups.com
On Thu, Feb 05, 2015 at 04:58:37PM -0800, The Lee-Man wrote:
> Thanks, Chris, for tracking down the systemd problem and reporting it.

In case anyone else was following this thread, Lennart has commited a
change to systemd to fix this case. It will be in systemd 219.

commit: d54ddab8cbad46290306fc6e3346089fe3772d5c
"core: don't watch for socket events when service is in SERVICE_STOP"

- Chris
Reply all
Reply to author
Forward
0 new messages