[PATCHv2 0/2] Handle iscsid shutdown more cleanly

37 views
Skip to first unread message

leeman...@gmail.com

unread,
Nov 22, 2016, 5:44:56 PM11/22/16
to open-...@googlegroups.com, ha...@suse.com, Lee Duncan
From: Lee Duncan <ldu...@suse.com>

This set of two patches addresses the issues
I mentioned on the mailing list recently relating
to how systemd can interfere with shutting down
the iscsid daemon. The iscsiadm command "-k"
can hang if the daemon is not present, when
running under systemd, when socket activated.

The first patch, from Hannes, modifies the IPC
communications so that user-level commands, like
iscsiadm and iscsistart, do not hang when waiting
for a response from the iscsid daemon by using
the poll() system call. This version also has a
couple of small bug fixes on top of the version
that Hannes posted recently.

The second patch modifies iscsid to treat a
SIGTERM just like it had received the "immediate
stop" command from iscsiadm (via the "-k"
option), simplifying the shutdown of iscsid so
that it no longer requires IPC communications.

Hannes Reinecke (1):
Use timeout when waiting for responses from iscsid

Lee Duncan (1):
iscsid: treat SIGTERM like "iscsiadm -k 0".

usr/config.h | 1 +
usr/discovery.c | 16 +++++++-------
usr/host.c | 2 +-
usr/iscsi_sysfs.c | 1 +
usr/iscsiadm.c | 12 ++++++-----
usr/iscsid.c | 29 +++++++++++++++-----------
usr/iscsid_req.c | 52 +++++++++++++++++++++++++++++++++++++---------
usr/iscsid_req.h | 15 +++++++------
usr/iscsistart.c | 4 ++--
usr/session_info.c | 14 +++++++------
usr/session_info.h | 5 +++--
12 files changed, 99 insertions(+), 53 deletions(-)

--
1.8.5.6

leeman...@gmail.com

unread,
Nov 22, 2016, 5:45:10 PM11/22/16
to open-...@googlegroups.com, ha...@suse.com, Hannes Reinecke
From: Hannes Reinecke <ha...@suse.de>

The server might already been terminated when iscsiadm tries to
send a request to it, hence we might be waiting forever for a reply.
With this patchset we're waiting at most one minute before giving up,
avoiding a hang in iscsiadm.

Changes since v1:
* Break out of poll() loop on error
* Initialize discovery timeout

Signed-off-by: Hannes Reinecke <ha...@suse.com>
Tested-by: Lee Duncan <ldu...@suse.com>
---
usr/config.h | 1 +
usr/discovery.c | 16 ++++++++--------
usr/host.c | 2 +-
usr/iscsi_sysfs.c | 1 +
usr/iscsiadm.c | 12 +++++++-----
usr/iscsid.c | 2 +-
usr/iscsid_req.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
usr/iscsid_req.h | 15 +++++++++------
usr/iscsistart.c | 4 ++--
usr/session_info.c | 14 ++++++++------
usr/session_info.h | 5 +++--
11 files changed, 83 insertions(+), 41 deletions(-)

diff --git a/usr/config.h b/usr/config.h
index 2e36a0a164b6..a464cfd948d7 100644
--- a/usr/config.h
+++ b/usr/config.h
@@ -316,6 +316,7 @@ typedef struct discovery_rec {
discovery_type_e type;
char address[NI_MAXHOST];
int port;
+ int iscsid_req_tmo;
union {
struct iscsi_sendtargets_config sendtargets;
struct iscsi_slp_config slp;
diff --git a/usr/discovery.c b/usr/discovery.c
index 43c2359cdcbc..abf7a4feafde 100644
--- a/usr/discovery.c
+++ b/usr/discovery.c
@@ -64,7 +64,7 @@ static char initiator_name[TARGET_NAME_MAXLEN + 1];
static char initiator_alias[TARGET_NAME_MAXLEN + 1];
static struct iscsi_ev_context ipc_ev_context;

-static int request_initiator_name(void)
+static int request_initiator_name(int tmo)
{
int rc;
iscsiadm_req_t req;
@@ -78,7 +78,7 @@ static int request_initiator_name(void)
memset(&req, 0, sizeof(req));
req.command = MGMT_IPC_CONFIG_INAME;

- rc = iscsid_exec_req(&req, &rsp, 1);
+ rc = iscsid_exec_req(&req, &rsp, 1, tmo);
if (rc)
return rc;

@@ -88,7 +88,7 @@ static int request_initiator_name(void)
memset(&req, 0, sizeof(req));
req.command = MGMT_IPC_CONFIG_IALIAS;

- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, tmo);
if (rc)
/* alias is optional so return ok */
return 0;
@@ -344,7 +344,7 @@ int discovery_isns(void *data, struct iface_rec *iface,
if (iface && strlen(iface->iname))
iname = iface->iname;
else {
- rc = request_initiator_name();
+ rc = request_initiator_name(drec->iscsid_req_tmo);
if (rc) {
log_error("Cannot perform discovery. Initiatorname "
"required.");
@@ -454,7 +454,7 @@ int discovery_offload_sendtargets(int host_no, int do_login,
* and get back the results. We should do this since it would
* allows us to then process the results like software iscsi.
*/
- rc = iscsid_exec_req(&req, &rsp, 1);
+ rc = iscsid_exec_req(&req, &rsp, 1, drec->iscsid_req_tmo);
if (rc) {
log_error("Could not offload sendtargets to %s.",
drec->address);
@@ -802,7 +802,7 @@ static void iscsi_free_session(struct iscsi_session *session)

static iscsi_session_t *
iscsi_alloc_session(struct iscsi_sendtargets_config *config,
- struct iface_rec *iface, int *rc)
+ struct iface_rec *iface, int *rc, int tmo)
{
iscsi_session_t *session;

@@ -848,7 +848,7 @@ iscsi_alloc_session(struct iscsi_sendtargets_config *config,
strcpy(initiator_name, iface->iname);
/* MNC TODO add iface alias */
} else {
- *rc = request_initiator_name();
+ *rc = request_initiator_name(tmo);
if (*rc) {
log_error("Cannot perform discovery. Initiatorname "
"required.");
@@ -1571,7 +1571,7 @@ int discovery_sendtargets(void *fndata, struct iface_rec *iface,
iscsi_timer_clear(&connection_timer);

/* allocate a new session, and initialize default values */
- session = iscsi_alloc_session(config, iface, &rc);
+ session = iscsi_alloc_session(config, iface, &rc, drec->iscsid_req_tmo);
if (rc)
return rc;

diff --git a/usr/host.c b/usr/host.c
index f2052d33be73..9b65fe0ebed8 100644
--- a/usr/host.c
+++ b/usr/host.c
@@ -251,7 +251,7 @@ static int host_info_print_tree(void *data, struct host_info *hinfo)
printf("\tSessions:\n");
printf("\t*********\n");

- session_info_print_tree(&sessions, "\t", session_info_flags, 0);
+ session_info_print_tree(&sessions, "\t", session_info_flags, 0, -1);
session_info_free_list(&sessions);
return 0;
}
diff --git a/usr/iscsi_sysfs.c b/usr/iscsi_sysfs.c
index 3a37a481421f..d87250c8d4ef 100644
--- a/usr/iscsi_sysfs.c
+++ b/usr/iscsi_sysfs.c
@@ -1420,6 +1420,7 @@ int iscsi_sysfs_for_each_session(void *data, int *nr_found,
if (!info)
return ISCSI_ERR_NOMEM;

+ info->iscsid_req_tmo = -1;
n = scandir(ISCSI_SESSION_DIR, &namelist, trans_filter,
alphasort);
if (n <= 0)
diff --git a/usr/iscsiadm.c b/usr/iscsiadm.c
index c6705bd6c3d2..7943ff922e83 100644
--- a/usr/iscsiadm.c
+++ b/usr/iscsiadm.c
@@ -252,7 +252,7 @@ str_to_portal_type(char *str)
return ptype;
}

-static void kill_iscsid(int priority)
+static void kill_iscsid(int priority, int tmo)
{
iscsiadm_req_t req;
iscsiadm_rsp_t rsp;
@@ -274,7 +274,7 @@ static void kill_iscsid(int priority)

memset(&req, 0, sizeof(req));
req.command = MGMT_IPC_IMMEDIATE_STOP;
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, tmo);
if (rc) {
iscsi_err_print_msg(rc);
log_error("Could not stop iscsid. Trying sending iscsid "
@@ -741,7 +741,7 @@ static char *get_config_file(void)
memset(&req, 0, sizeof(req));
req.command = MGMT_IPC_CONFIG_FILE;

- rc = iscsid_exec_req(&req, &rsp, 1);
+ rc = iscsid_exec_req(&req, &rsp, 1, ISCSID_REQ_TIMEOUT);
if (rc)
return NULL;

@@ -791,7 +791,7 @@ session_stats(void *data, struct session_info *info)
req.command = MGMT_IPC_SESSION_STATS;
req.u.session.sid = info->sid;

- rc = iscsid_exec_req(&req, &rsp, 1);
+ rc = iscsid_exec_req(&req, &rsp, 1, info->iscsid_req_tmo);
if (rc)
return rc;

@@ -2933,6 +2933,7 @@ static int exec_disc_op(int disc_type, char *ip, int port,
int rc = 0;

memset(&drec, 0, sizeof(struct discovery_rec));
+ drec.iscsid_req_tmo = -1;

switch (disc_type) {
case DISCOVERY_TYPE_SENDTARGETS:
@@ -3240,6 +3241,7 @@ main(int argc, char **argv)
int packet_size=32, ping_count=1, ping_interval=0;
int do_discover = 0, sub_mode = -1;
int portal_type = -1;
+ int timeout = ISCSID_REQ_TIMEOUT;
struct sigaction sa_old;
struct sigaction sa_new;
struct list_head ifaces;
@@ -3425,7 +3427,7 @@ main(int argc, char **argv)
}

if (killiscsid >= 0) {
- kill_iscsid(killiscsid);
+ kill_iscsid(killiscsid, timeout);
goto free_ifaces;
}

diff --git a/usr/iscsid.c b/usr/iscsid.c
index f8ffd23c500f..c866432b5923 100644
--- a/usr/iscsid.c
+++ b/usr/iscsid.c
@@ -274,7 +274,7 @@ static int sync_session(void *data, struct session_info *info)
memcpy(&req.u.session.rec, &rec, sizeof(node_rec_t));

retry:
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, info->iscsid_req_tmo);
if (rc == ISCSI_ERR_ISCSID_NOTCONN && retries < 30) {
retries++;
sleep(1);
diff --git a/usr/iscsid_req.c b/usr/iscsid_req.c
index 2950d748c644..aebe1d25e90c 100644
--- a/usr/iscsid_req.c
+++ b/usr/iscsid_req.c
@@ -24,6 +24,7 @@
#include <errno.h>
#include <fcntl.h>
#include <sys/un.h>
+#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

@@ -33,6 +34,7 @@
#include "iscsi_util.h"
#include "config.h"
#include "iscsi_err.h"
+#include "iscsid_req.h"
#include "uip_mgmt_ipc.h"

static void iscsid_startup(void)
@@ -118,16 +120,46 @@ int iscsid_request(int *fd, iscsiadm_req_t *req, int start_iscsid)
return ISCSI_SUCCESS;
}

-int iscsid_response(int fd, iscsiadm_cmd_e cmd, iscsiadm_rsp_t *rsp)
+int iscsid_response(int fd, iscsiadm_cmd_e cmd, iscsiadm_rsp_t *rsp,
+ int timeout)
{
- int iscsi_err;
+ size_t len = sizeof(*rsp);
+ int iscsi_err = ISCSI_ERR_ISCSID_COMM_ERR;
int err;
+ int poll_wait = 0;

- if ((err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL)) != sizeof(*rsp)) {
- log_error("got read error (%d/%d), daemon died?", err, errno);
- iscsi_err = ISCSI_ERR_ISCSID_COMM_ERR;
- } else
- iscsi_err = rsp->err;
+ if (timeout == -1) {
+ timeout = ISCSID_REQ_TIMEOUT;
+ poll_wait = 1;
+ }
+ while (len) {
+ struct pollfd pfd;
+
+ pfd.fd = fd;
+ pfd.events = POLLIN;
+ err = poll(&pfd, 1, timeout);
+ if (!err) {
+ if (poll_wait)
+ continue;
+ return ISCSI_ERR_ISCSID_NOTCONN;
+ } else if (err < 0) {
+ if (errno == EINTR)
+ continue;
+ log_error("got poll error (%d/%d), daemon died?",
+ err, errno);
+ return ISCSI_ERR_ISCSID_COMM_ERR;
+ } else if (!pfd.revents & POLLIN)
+ continue;
+ err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL);
+ if (err < 0) {
+ log_error("got read error (%d/%d), daemon died?",
+ err, errno);
+ break;
+ } else {
+ len -= err;
+ iscsi_err = rsp->err;
+ }
+ }
close(fd);

if (!iscsi_err && cmd != rsp->command)
@@ -136,7 +168,7 @@ int iscsid_response(int fd, iscsiadm_cmd_e cmd, iscsiadm_rsp_t *rsp)
}

int iscsid_exec_req(iscsiadm_req_t *req, iscsiadm_rsp_t *rsp,
- int start_iscsid)
+ int start_iscsid, int tmo)
{
int fd;
int err;
@@ -145,7 +177,7 @@ int iscsid_exec_req(iscsiadm_req_t *req, iscsiadm_rsp_t *rsp,
if (err)
return err;

- return iscsid_response(fd, req->command, rsp);
+ return iscsid_response(fd, req->command, rsp, tmo);
}

int iscsid_req_wait(iscsiadm_cmd_e cmd, int fd)
@@ -153,7 +185,7 @@ int iscsid_req_wait(iscsiadm_cmd_e cmd, int fd)
iscsiadm_rsp_t rsp;

memset(&rsp, 0, sizeof(iscsiadm_rsp_t));
- return iscsid_response(fd, cmd, &rsp);
+ return iscsid_response(fd, cmd, &rsp, -1);
}

int iscsid_req_by_rec_async(iscsiadm_cmd_e cmd, node_rec_t *rec, int *fd)
diff --git a/usr/iscsid_req.h b/usr/iscsid_req.h
index 8cb4a922f46d..67e509e4607f 100644
--- a/usr/iscsid_req.h
+++ b/usr/iscsid_req.h
@@ -21,17 +21,20 @@
#ifndef ISCSID_REQ_H_
#define ISCSID_REQ_H

+#define ISCSID_REQ_TIMEOUT 1000
+
struct iscsiadm_req;
struct iscsiadm_rsp;
struct node_rec;

extern int iscsid_exec_req(struct iscsiadm_req *req, struct iscsiadm_rsp *rsp,
- int iscsid_start);
-extern int iscsid_req_wait(int cmd, int fd);
-extern int iscsid_req_by_rec_async(int cmd, struct node_rec *rec, int *fd);
-extern int iscsid_req_by_rec(int cmd, struct node_rec *rec);
-extern int iscsid_req_by_sid_async(int cmd, int sid, int *fd);
-extern int iscsid_req_by_sid(int cmd, int sid);
+ int iscsid_start, int tmo);
+extern int iscsid_req_wait(iscsiadm_cmd_e cmd, int fd);
+extern int iscsid_req_by_rec_async(iscsiadm_cmd_e cmd, struct node_rec *rec,
+ int *fd);
+extern int iscsid_req_by_rec(iscsiadm_cmd_e cmd, struct node_rec *rec);
+extern int iscsid_req_by_sid_async(iscsiadm_cmd_e cmd, int sid, int *fd);
+extern int iscsid_req_by_sid(iscsiadm_cmd_e cmd, int sid);

extern int uip_broadcast(void *buf, size_t buf_len, int fd_flags,
uint32_t *status);
diff --git a/usr/iscsistart.c b/usr/iscsistart.c
index 7ff2236c9c79..14b1eb5304ab 100644
--- a/usr/iscsistart.c
+++ b/usr/iscsistart.c
@@ -123,7 +123,7 @@ static int stop_event_loop(void)

memset(&req, 0, sizeof(req));
req.command = MGMT_IPC_IMMEDIATE_STOP;
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, -1);
if (rc) {
iscsi_err_print_msg(rc);
log_error("Could not stop event_loop");
@@ -235,7 +235,7 @@ static int login_session(struct node_rec *rec)
memcpy(&req.u.session.rec, rec, sizeof(*rec));

retry:
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, ISCSID_REQ_TIMEOUT);
/*
* handle race where iscsid proc is starting up while we are
* trying to connect.
diff --git a/usr/session_info.c b/usr/session_info.c
index 2f48e65402e0..9a89bfc4a138 100644
--- a/usr/session_info.c
+++ b/usr/session_info.c
@@ -93,7 +93,7 @@ static int session_info_print_flat(void *data, struct session_info *info)
return 0;
}

-static int print_iscsi_state(int sid, char *prefix)
+static int print_iscsi_state(int sid, char *prefix, int tmo)
{
iscsiadm_req_t req;
iscsiadm_rsp_t rsp;
@@ -120,7 +120,7 @@ static int print_iscsi_state(int sid, char *prefix)
req.command = MGMT_IPC_SESSION_INFO;
req.u.session.sid = sid;

- err = iscsid_exec_req(&req, &rsp, 1);
+ err = iscsid_exec_req(&req, &rsp, 1, tmo);
/*
* for drivers like qla4xxx, iscsid does not display
* anything here since it does not know about it.
@@ -236,7 +236,7 @@ static int print_scsi_state(int sid, char *prefix, unsigned int flags)
}

void session_info_print_tree(struct list_head *list, char *prefix,
- unsigned int flags, int do_show)
+ unsigned int flags, int do_show, int tmo)
{
struct session_info *curr, *prev = NULL;

@@ -289,7 +289,7 @@ void session_info_print_tree(struct list_head *list, char *prefix,

if (flags & SESSION_INFO_ISCSI_STATE) {
printf("%s\t\tSID: %d\n", prefix, curr->sid);
- print_iscsi_state(curr->sid, prefix);
+ print_iscsi_state(curr->sid, prefix, tmo);
}

if (flags & SESSION_INFO_ISCSI_TIM) {
@@ -406,7 +406,8 @@ int session_info_print(int info_level, struct session_info *info, int do_show)
if (info) {
INIT_LIST_HEAD(&info->list);
list_add_tail(&list, &info->list);
- session_info_print_tree(&list, "", flags, do_show);
+ session_info_print_tree(&list, "", flags, do_show,
+ info->iscsid_req_tmo);
num_found = 1;
break;
}
@@ -421,7 +422,8 @@ int session_info_print(int info_level, struct session_info *info, int do_show)
if (err || !num_found)
break;

- session_info_print_tree(&list, "", flags, do_show);
+ session_info_print_tree(&list, "", flags, do_show,
+ info->iscsid_req_tmo);
session_info_free_list(&list);
break;
default:
diff --git a/usr/session_info.h b/usr/session_info.h
index 726aefdce1b0..179b088d2070 100644
--- a/usr/session_info.h
+++ b/usr/session_info.h
@@ -28,6 +28,7 @@ struct session_info {
/* local info */
struct iface_rec iface;
int sid;
+ int iscsid_req_tmo;

struct session_timeout tmo;
struct session_CHAP chap;
@@ -60,8 +61,8 @@ struct session_link_info {
extern int session_info_create_list(void *data, struct session_info *info);
extern void session_info_free_list(struct list_head *list);
extern int session_info_print(int info_level, struct session_info *match_info,
- int do_show);
+ int do_show);
extern void session_info_print_tree(struct list_head *list, char *prefix,
- unsigned int flags, int do_show);
+ unsigned int flags, int do_show, int tmo);

#endif
--
1.8.5.6

leeman...@gmail.com

unread,
Nov 22, 2016, 5:46:06 PM11/22/16
to open-...@googlegroups.com, ha...@suse.com, Lee Duncan
From: Lee Duncan <ldu...@suse.com>

The same code that is executed by iscsid
when iscsiadm sends the "immediate stop"
command should be executed when iscsid
receives a SIGTERM.

Signed-off-by: Lee Duncan <ldu...@suse.com>
---
usr/iscsid.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/usr/iscsid.c b/usr/iscsid.c
index c866432b5923..7126da2fdb5e 100644
--- a/usr/iscsid.c
+++ b/usr/iscsid.c
@@ -303,6 +303,20 @@ static void iscsid_shutdown(void)
}
}

+static void iscsid_shutdown_and_cleanup(void)
+{
+ idbm_terminate();
+ sysfs_cleanup();
+ ipc->ctldev_close();
+ mgmt_ipc_close(mgmt_ipc_fd);
+ if (daemon_config.initiator_name)
+ free(daemon_config.initiator_name);
+ if (daemon_config.initiator_alias)
+ free(daemon_config.initiator_alias);
+ free_initiator();
+ iscsid_shutdown();
+}
+
static void catch_signal(int signo)
{
log_debug(1, "pid %d caught signal %d", getpid(), signo);
@@ -313,7 +327,7 @@ static void catch_signal(int signo)

switch (signo) {
case SIGTERM:
- iscsid_shutdown();
+ iscsid_shutdown_and_cleanup();
exit(0);
break;
default:
@@ -556,15 +570,6 @@ int main(int argc, char *argv[])

event_loop(ipc, control_fd, mgmt_ipc_fd);

- idbm_terminate();
- sysfs_cleanup();
- ipc->ctldev_close();
- mgmt_ipc_close(mgmt_ipc_fd);
- if (daemon_config.initiator_name)
- free(daemon_config.initiator_name);
- if (daemon_config.initiator_alias)
- free(daemon_config.initiator_alias);
- free_initiator();
- iscsid_shutdown();
+ iscsid_shutdown_and_cleanup();
return 0;
}
--
1.8.5.6

The Lee-Man

unread,
Nov 22, 2016, 5:50:34 PM11/22/16
to open-iscsi, ha...@suse.com, ldu...@suse.com
Please ignore this. I am going to post an updated version
that sorts out the "poll()" logic in iscsid_req.c, as Uli requested
with the earlier patch.

On Tuesday, November 22, 2016 at 2:44:56 PM UTC-8, The Lee-Man wrote:

leeman...@gmail.com

unread,
Nov 22, 2016, 6:18:52 PM11/22/16
to open-...@googlegroups.com, Lee Duncan
From: Lee Duncan <ldu...@suse.com>

This set of two patches addresses the issues
I mentioned on the mailing list recently relating
to how systemd can interfere with shutting down
the iscsid daemon. The iscsiadm command "-k"
can hang if the daemon is not present, when
running under systemd, when socket activated.

The first patch, from Hannes, modifies the IPC
communications so that user-level commands, like
iscsiadm and iscsistart, do not hang when waiting
for a response from the iscsid daemon by using
the poll() system call. This version also has a
couple of small bug fixes on top of the version
that Hannes posted recently.

The second patch modifies iscsid to treat a
SIGTERM just like it had received the "immediate
stop" command from iscsiadm (via the "-k"
option), simplifying the shutdown of iscsid so
that it no longer requires IPC communications.

Changes since v2:
* updated first patch to rework poll() logic

Changes since v1:
* added 2nd patch
* fixed 2 spots in the first patch

Hannes Reinecke (1):
Use timeout when waiting for responses from iscsid

Lee Duncan (1):
iscsid: treat SIGTERM like "iscsiadm -k 0".

usr/config.h | 1 +
usr/discovery.c | 16 ++++++++--------
usr/host.c | 2 +-
usr/iscsi_sysfs.c | 1 +
usr/iscsiadm.c | 12 +++++++-----
usr/iscsid.c | 29 +++++++++++++++++------------
usr/iscsid_req.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
usr/iscsid_req.h | 15 +++++++++------
usr/iscsistart.c | 4 ++--
usr/session_info.c | 14 ++++++++------
usr/session_info.h | 5 +++--
11 files changed, 98 insertions(+), 52 deletions(-)

--
2.1.4

leeman...@gmail.com

unread,
Nov 22, 2016, 6:18:57 PM11/22/16
to open-...@googlegroups.com, Hannes Reinecke, Hannes Reinecke, Lee Duncan
From: Hannes Reinecke <ha...@suse.de>

The server might already been terminated when iscsiadm tries to
send a request to it, hence we might be waiting forever for a reply.
With this patchset we're waiting at most one minute before giving up,
avoiding a hang in iscsiadm.

Changes since v2:
* rework readability in new poll() code

Changes since v1:
* Break out of poll() loop on error
* Initialize discovery timeout

Signed-off-by: Hannes Reinecke <ha...@suse.com>
Signed-off-by: Lee Duncan <ldu...@suse.com>
---
usr/config.h | 1 +
usr/discovery.c | 16 ++++++++--------
usr/host.c | 2 +-
usr/iscsi_sysfs.c | 1 +
usr/iscsiadm.c | 12 +++++++-----
usr/iscsid.c | 2 +-
usr/iscsid_req.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
usr/iscsid_req.h | 15 +++++++++------
usr/iscsistart.c | 4 ++--
usr/session_info.c | 14 ++++++++------
usr/session_info.h | 5 +++--
11 files changed, 82 insertions(+), 41 deletions(-)

diff --git a/usr/config.h b/usr/config.h
index fd31a54d0130..5b1bb1d624c5 100644
--- a/usr/config.h
+++ b/usr/config.h
@@ -315,6 +315,7 @@ typedef struct discovery_rec {
discovery_type_e type;
char address[NI_MAXHOST];
int port;
+ int iscsid_req_tmo;
union {
struct iscsi_sendtargets_config sendtargets;
struct iscsi_slp_config slp;
diff --git a/usr/discovery.c b/usr/discovery.c
index 593d22650f0d..6ee8bd915f5a 100644
@@ -1573,7 +1573,7 @@ int discovery_sendtargets(void *fndata, struct iface_rec *iface,
iscsi_timer_clear(&connection_timer);

/* allocate a new session, and initialize default values */
- session = iscsi_alloc_session(config, iface, &rc);
+ session = iscsi_alloc_session(config, iface, &rc, drec->iscsid_req_tmo);
if (rc)
return rc;

diff --git a/usr/host.c b/usr/host.c
index 63334907f2d1..7e88e5728ac3 100644
--- a/usr/host.c
+++ b/usr/host.c
@@ -251,7 +251,7 @@ static int host_info_print_tree(void *data, struct host_info *hinfo)
printf("\tSessions:\n");
printf("\t*********\n");

- session_info_print_tree(&sessions, "\t", session_info_flags, 0);
+ session_info_print_tree(&sessions, "\t", session_info_flags, 0, -1);
session_info_free_list(&sessions);
return 0;
}
diff --git a/usr/iscsi_sysfs.c b/usr/iscsi_sysfs.c
index 84c396cbf6f9..8ca668fdb3bc 100644
--- a/usr/iscsi_sysfs.c
+++ b/usr/iscsi_sysfs.c
@@ -1420,6 +1420,7 @@ int iscsi_sysfs_for_each_session(void *data, int *nr_found,
if (!info)
return ISCSI_ERR_NOMEM;

+ info->iscsid_req_tmo = -1;
n = scandir(ISCSI_SESSION_DIR, &namelist, trans_filter,
alphasort);
if (n <= 0)
diff --git a/usr/iscsiadm.c b/usr/iscsiadm.c
index 9602f6c331ec..4b2bd34cbb2e 100644
@@ -2956,6 +2956,7 @@ static int exec_disc_op(int disc_type, char *ip, int port,
int rc = 0;

memset(&drec, 0, sizeof(struct discovery_rec));
+ drec.iscsid_req_tmo = -1;

switch (disc_type) {
case DISCOVERY_TYPE_SENDTARGETS:
@@ -3263,6 +3264,7 @@ main(int argc, char **argv)
int packet_size=32, ping_count=1, ping_interval=0;
int do_discover = 0, sub_mode = -1;
int portal_type = -1;
+ int timeout = ISCSID_REQ_TIMEOUT;
struct sigaction sa_old;
struct sigaction sa_new;
struct list_head ifaces;
@@ -3448,7 +3450,7 @@ main(int argc, char **argv)
}

if (killiscsid >= 0) {
- kill_iscsid(killiscsid);
+ kill_iscsid(killiscsid, timeout);
goto free_ifaces;
}

diff --git a/usr/iscsid.c b/usr/iscsid.c
index 8dd72a43bb0b..0c2634448d09 100644
--- a/usr/iscsid.c
+++ b/usr/iscsid.c
@@ -274,7 +274,7 @@ static int sync_session(void *data, struct session_info *info)
memcpy(&req.u.session.rec, &rec, sizeof(node_rec_t));

retry:
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, info->iscsid_req_tmo);
if (rc == ISCSI_ERR_ISCSID_NOTCONN && retries < 30) {
retries++;
sleep(1);
diff --git a/usr/iscsid_req.c b/usr/iscsid_req.c
index 2950d748c644..a2e6d6e39ceb 100644
--- a/usr/iscsid_req.c
+++ b/usr/iscsid_req.c
@@ -24,6 +24,7 @@
#include <errno.h>
#include <fcntl.h>
#include <sys/un.h>
+#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

@@ -33,6 +34,7 @@
#include "iscsi_util.h"
#include "config.h"
#include "iscsi_err.h"
+#include "iscsid_req.h"
#include "uip_mgmt_ipc.h"

static void iscsid_startup(void)
@@ -118,16 +120,45 @@ int iscsid_request(int *fd, iscsiadm_req_t *req, int start_iscsid)
+ } else if (pfd.revents & POLLIN) {
+ err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL);
+ if (err < 0) {
+ log_error("read error (%d/%d), daemon died?",
+ err, errno);
+ break;
+ }
+ len -= err;
+ iscsi_err = rsp->err;
+ }
+ }
close(fd);

if (!iscsi_err && cmd != rsp->command)
@@ -136,7 +167,7 @@ int iscsid_response(int fd, iscsiadm_cmd_e cmd, iscsiadm_rsp_t *rsp)
}

int iscsid_exec_req(iscsiadm_req_t *req, iscsiadm_rsp_t *rsp,
- int start_iscsid)
+ int start_iscsid, int tmo)
{
int fd;
int err;
@@ -145,7 +176,7 @@ int iscsid_exec_req(iscsiadm_req_t *req, iscsiadm_rsp_t *rsp,
if (err)
return err;

- return iscsid_response(fd, req->command, rsp);
+ return iscsid_response(fd, req->command, rsp, tmo);
}

int iscsid_req_wait(iscsiadm_cmd_e cmd, int fd)
@@ -153,7 +184,7 @@ int iscsid_req_wait(iscsiadm_cmd_e cmd, int fd)
index 16a12efb29a7..5cf09721636b 100644
--- a/usr/iscsistart.c
+++ b/usr/iscsistart.c
@@ -123,7 +123,7 @@ static int stop_event_loop(void)

memset(&req, 0, sizeof(req));
req.command = MGMT_IPC_IMMEDIATE_STOP;
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, -1);
if (rc) {
iscsi_err_print_msg(rc);
log_error("Could not stop event_loop");
@@ -235,7 +235,7 @@ static int login_session(struct node_rec *rec)
memcpy(&req.u.session.rec, rec, sizeof(*rec));

retry:
- rc = iscsid_exec_req(&req, &rsp, 0);
+ rc = iscsid_exec_req(&req, &rsp, 0, ISCSID_REQ_TIMEOUT);
/*
* handle race where iscsid proc is starting up while we are
* trying to connect.
diff --git a/usr/session_info.c b/usr/session_info.c
index 89422d8e3933..9dc6e25b05fb 100644
@@ -407,7 +407,8 @@ int session_info_print(int info_level, struct session_info *info, int do_show)
if (info) {
INIT_LIST_HEAD(&info->list);
list_add_tail(&list, &info->list);
- session_info_print_tree(&list, "", flags, do_show);
+ session_info_print_tree(&list, "", flags, do_show,
+ info->iscsid_req_tmo);
num_found = 1;
break;
}
@@ -422,7 +423,8 @@ int session_info_print(int info_level, struct session_info *info, int do_show)
2.1.4

leeman...@gmail.com

unread,
Nov 22, 2016, 6:19:00 PM11/22/16
to open-...@googlegroups.com, Lee Duncan
From: Lee Duncan <ldu...@suse.com>

The same code that is executed by iscsid
when iscsiadm sends the "immediate stop"
command should be executed when iscsid
receives a SIGTERM.

Signed-off-by: Lee Duncan <ldu...@suse.com>
---
usr/iscsid.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/usr/iscsid.c b/usr/iscsid.c
index 0c2634448d09..c87e1e7a47ac 100644
--- a/usr/iscsid.c
+++ b/usr/iscsid.c
2.1.4

Ulrich Windl

unread,
Nov 23, 2016, 3:39:06 AM11/23/16
to open-...@googlegroups.com
>>> <leeman...@gmail.com> schrieb am 22.11.2016 um 23:42 in Nachricht
<cover.14798448...@suse.com>:
[...]
> The second patch modifies iscsid to treat a
> SIGTERM just like it had received the "immediate
> stop" command from iscsiadm (via the "-k"
> option), simplifying the shutdown of iscsid so
> that it no longer requires IPC communications.

Good common practice see to perform an orderly (clean) shutdown on first receipt of SIGTERM; if another SIGTERM arrives a few seconds (time depending on the application) later, a more immediate shutdown should be attempted (still trying to be as clean as possible).

I had implemented such a pattern in a network server, and I'll use it for an example here (forgive!):
1st signal prevents accepting new connections; server will shut down if all connections are gone
2nd signal will shutdown connections if current requests have been answered
3rd signal will terminate active connections (those who transfer data) (i.e. after a recv/send)
4th signal will terminate connections blocked on sending or receiving data (i.e. kill an "active" recv/send)
So eventually the server will arrive in the "1st signal received state" eventually...

Regards,
Ulrich

>
> Hannes Reinecke (1):
> Use timeout when waiting for responses from iscsid
>
> Lee Duncan (1):
> iscsid: treat SIGTERM like "iscsiadm -k 0".
>
> usr/config.h | 1 +
> usr/discovery.c | 16 +++++++-------
> usr/host.c | 2 +-
> usr/iscsi_sysfs.c | 1 +
> usr/iscsiadm.c | 12 ++++++-----
> usr/iscsid.c | 29 +++++++++++++++-----------
> usr/iscsid_req.c | 52
> +++++++++++++++++++++++++++++++++++++---------
> usr/iscsid_req.h | 15 +++++++------
> usr/iscsistart.c | 4 ++--
> usr/session_info.c | 14 +++++++------
> usr/session_info.h | 5 +++--
> 12 files changed, 99 insertions(+), 53 deletions(-)
>
> --
> 1.8.5.6
>
> --
> You received this message because you are subscribed to the Google Groups
> "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to open-iscsi+...@googlegroups.com.
> To post to this group, send email to open-...@googlegroups.com.
> Visit this group at https://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/d/optout.




Christian Seiler

unread,
Nov 23, 2016, 7:13:32 AM11/23/16
to open-...@googlegroups.com, ha...@suse.com, Lee Duncan
Hi,

On 11/22/2016 11:43 PM, leeman...@gmail.com wrote:
> static void catch_signal(int signo)
> [...]
> switch (signo) {
> case SIGTERM:
> - iscsid_shutdown();
> + iscsid_shutdown_and_cleanup();
> exit(0);
> break;

I think this is not a good idea because most functions called
here will not be async signal safe, and you are calling them
from a signal handler. Beforehand this was not a problem since
iscsid_shutdown() is just a kill of the process group.

This might lead to very ugly race condition and deadlocks, and
the race conditions could cause crashes or even arbitrary
memory to be overwritten in the worst case. (Think code that
is in the middle of updating a data structure, and the signal
comes at a time when the structure is in an inconsistent
state.)

Creating async-signal-safe data structures is not impossible
(as long as atomic operations are available), but very hard to
do properly, and even harder when shutdown of these data
structures in a signal handler is to be supported. Writing
thread-safe code is much, much easier, for example.

The recommended way of performing complicated actions in
response to signals in daemons is to have the signal handler
set a flag (either using atomic operations or with the good
old "volatile") and have the regular event loop always check
to see if the flag was changed, and if so initiate a
shutdown.

Regards,
Christian

Lee Duncan

unread,
Nov 23, 2016, 4:43:57 PM11/23/16
to open-...@googlegroups.com, ha...@suse.com, Lee Duncan
On Nov 23, 2016, at 4:13 AM, Christian Seiler <christi...@gmail.com> wrote:

Hi,

On 11/22/2016 11:43 PM, leeman...@gmail.com wrote:
static void catch_signal(int signo)
[...]
switch (signo) {
case SIGTERM:
- iscsid_shutdown();
+ iscsid_shutdown_and_cleanup();
exit(0);
break;

I think this is not a good idea because most functions called
here will not be async signal safe, and you are calling them
from a signal handler. Beforehand this was not a problem since
iscsid_shutdown() is just a kill of the process group.

The signal is ignored while in the signal handler, and I expect
not to get a SIGTERM storm. But perhaps you are right.


This might lead to very ugly race condition and deadlocks, and
the race conditions could cause crashes or even arbitrary
memory to be overwritten in the worst case. (Think code that
is in the middle of updating a data structure, and the signal
comes at a time when the structure is in an inconsistent
state.)

Creating async-signal-safe data structures is not impossible
(as long as atomic operations are available), but very hard to
do properly, and even harder when shutdown of these data
structures in a signal handler is to be supported. Writing
thread-safe code is much, much easier, for example.

The recommended way of performing complicated actions in
response to signals in daemons is to have the signal handler
set a flag (either using atomic operations or with the good
old "volatile") and have the regular event loop always check
to see if the flag was changed, and if so initiate a
shutdown.

I will resubmit the patch series with the 2nd patch updated.

I see now (looking at the code), that I can just set the “exit
the event loop” flag when SIGTERM is received. And it has
the added benefit of being idempotent, so I don’t need to
guard against receiving multiple signals.


Regards,
Christian


-- 

Lee-Man

"Reincarnated as love poetry." -- Big Head Todd


Lee Duncan

unread,
Nov 23, 2016, 4:47:51 PM11/23/16
to open-...@googlegroups.com
On Nov 23, 2016, at 12:39 AM, Ulrich Windl <Ulrich...@rz.uni-regensburg.de> wrote:
>
>>>> <leeman...@gmail.com> schrieb am 22.11.2016 um 23:42 in Nachricht
> <cover.14798448...@suse.com>:
> [...]
>> The second patch modifies iscsid to treat a
>> SIGTERM just like it had received the "immediate
>> stop" command from iscsiadm (via the "-k"
>> option), simplifying the shutdown of iscsid so
>> that it no longer requires IPC communications.
>
> Good common practice see to perform an orderly (clean) shutdown on first receipt of SIGTERM; if another SIGTERM arrives a few seconds (time depending on the application) later, a more immediate shutdown should be attempted (still trying to be as clean as possible).

In this case, I think this is overkill since systemd is already managing shutdown escalation for us.

Systemd will, by default, send a SIGTERM. I have configured it to send that SIGTERM to the main
process only by setting KillMode=mixed in the unit file.

Systemd will then send SIGKILL to all the processes in the process group after its default stop
timeout. This timeout defaults to 90 seconds but can of course be configured separately if desired
for iscsid.

I believe the best approach may be to handle it exactly like the IPC request for shutdown is
handled, i.e. set the “exit the polling loop” flag, then return. If this approach does not work
within the timeout period a SIGKILL is going to be received.
> You received this message because you are subscribed to a topic in the Google Groups "open-iscsi" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/open-iscsi/bwoKUjj6cdk/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to open-iscsi+...@googlegroups.com.

Christian Seiler

unread,
Nov 23, 2016, 5:14:50 PM11/23/16
to open-...@googlegroups.com, ha...@suse.com, Lee Duncan
On 11/23/2016 10:43 PM, Lee Duncan wrote:
> On Nov 23, 2016, at 4:13 AM, Christian Seiler <christi...@gmail.com> wrote:
>>
>> Hi,
>>
>> On 11/22/2016 11:43 PM, leeman...@gmail.com wrote:
>>> static void catch_signal(int signo)
>>> [...]
>>> switch (signo) {
>>> case SIGTERM:
>>> - iscsid_shutdown();
>>> + iscsid_shutdown_and_cleanup();
>>> exit(0);
>>> break;
>>
>> I think this is not a good idea because most functions called
>> here will not be async signal safe, and you are calling them
>> from a signal handler. Beforehand this was not a problem since
>> iscsid_shutdown() is just a kill of the process group.
>
> The signal is ignored while in the signal handler, and I expect
> not to get a SIGTERM storm. But perhaps you are right.

That's not what I meant when discussing async-signal
safety. My point was not about other signals interrupting
the signal handler itself - during the signal handler's
execution, they will be blocked, as you rightly stated.

My worry is the following scenario: the normal code is
calling a routine, and that routine gets interrupted by a
SIGTERM, then the data structures in memory could be in an
inconsistent state (because a part of the memory was changed,
but other parts haven't yet). In that case, the signal
handler would try to tear down an inconsistent data
structure.

> I see now (looking at the code), that I can just set the “exit
> the event loop” flag when SIGTERM is received. And it has
> the added benefit of being idempotent, so I don’t need to
> guard against receiving multiple signals.

Yes, I think that's the best solution.

Regards,
Christian

leeman...@gmail.com

unread,
Nov 23, 2016, 6:01:04 PM11/23/16
to open-...@googlegroups.com, Lee Duncan
From: Lee Duncan <ldu...@suse.com>

This set of two patches addresses the issues
I mentioned on the mailing list recently relating
to how systemd can interfere with shutting down
the iscsid daemon. The iscsiadm command "-k"
can hang if the daemon is not present, when
running under systemd, when socket activated.

The first patch, from Hannes, modifies the IPC
communications so that user-level commands, like
iscsiadm and iscsistart, do not hang when waiting
for a response from the iscsid daemon by using
the poll() system call. This version also has a
couple of small bug fixes on top of the version
that Hannes posted recently.

The second patch modifies iscsid to treat a
SIGTERM just like it had received the "immediate
stop" command from iscsiadm (via the "-k"
option), simplifying the shutdown of iscsid so
that it no longer requires IPC communications.

Changes since v3:
* reworked the second patch to just set the
"exit the polling loop" flag

Changes since v2:
* updated first patch to rework poll() logic

Changes since v1:
* added 2nd patch
* fixed 2 spots in the first patch

Hannes Reinecke (1):
Use timeout when waiting for responses from iscsid

Lee Duncan (1):
iscsid: treat SIGTERM like "iscsiadm -k 0"

usr/config.h | 1 +
usr/discovery.c | 16 ++++++++--------
usr/host.c | 2 +-
usr/iscsi_sysfs.c | 1 +
usr/iscsiadm.c | 12 +++++++-----
usr/iscsid.c | 5 ++---
usr/iscsid_req.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
usr/iscsid_req.h | 15 +++++++++------
usr/iscsistart.c | 4 ++--
usr/session_info.c | 14 ++++++++------
usr/session_info.h | 5 +++--
11 files changed, 83 insertions(+), 43 deletions(-)

--
2.1.4

leeman...@gmail.com

unread,
Nov 23, 2016, 6:01:09 PM11/23/16
to open-...@googlegroups.com, Hannes Reinecke, Hannes Reinecke, Lee Duncan
From: Hannes Reinecke <ha...@suse.de>

The server might already been terminated when iscsiadm tries to
send a request to it, hence we might be waiting forever for a reply.
With this patchset we're waiting at most one minute before giving up,
avoiding a hang in iscsiadm.

Changes since v2:
* rework readability in new poll() code

Changes since v1:
* Break out of poll() loop on error
* Initialize discovery timeout

Signed-off-by: Hannes Reinecke <ha...@suse.com>
Signed-off-by: Lee Duncan <ldu...@suse.com>
---
usr/config.h | 1 +
usr/discovery.c | 16 ++++++++--------
usr/host.c | 2 +-
usr/iscsi_sysfs.c | 1 +
usr/iscsiadm.c | 12 +++++++-----
usr/iscsid.c | 2 +-
usr/iscsid_req.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
usr/iscsid_req.h | 15 +++++++++------
usr/iscsistart.c | 4 ++--
usr/session_info.c | 14 ++++++++------
usr/session_info.h | 5 +++--
diff --git a/usr/iscsid.c b/usr/iscsid.c
index 8dd72a43bb0b..0c2634448d09 100644
--- a/usr/iscsid.c
+++ b/usr/iscsid.c

leeman...@gmail.com

unread,
Nov 23, 2016, 6:01:15 PM11/23/16
to open-...@googlegroups.com, Lee Duncan
From: Lee Duncan <ldu...@suse.com>

The same code that is executed by iscsid
when iscsiadm sends the "immediate stop"
command should be executed when iscsid
receives a SIGTERM.

Changes since v3:
* now just set the "event loop stop" flag

Signed-off-by: Lee Duncan <ldu...@suse.com>
---
usr/iscsid.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/usr/iscsid.c b/usr/iscsid.c
index 0c2634448d09..81a14f259b5f 100644
--- a/usr/iscsid.c
+++ b/usr/iscsid.c
@@ -313,8 +313,7 @@ static void catch_signal(int signo)

switch (signo) {
case SIGTERM:
- iscsid_shutdown();
- exit(0);
+ event_loop_exit(NULL);
break;
default:
break;
--
2.1.4

Lee Duncan

unread,
Nov 28, 2016, 7:01:57 PM11/28/16
to open-...@googlegroups.com, Lee Duncan, Hannes Reinecke
No comment on these? Since the previous version drew comments, can I assume that means everybody is happy with this version?

Hannes? Anyone?

--

Ignore the Lee-Man behind the curtain ...


"Life's a long song. But the tune ends too soon for us all." -- Ian Anderson


Christian Seiler

unread,
Nov 29, 2016, 7:33:48 AM11/29/16
to open-...@googlegroups.com, Lee Duncan, Hannes Reinecke
On 11/29/2016 01:01 AM, Lee Duncan wrote:
> No comment on these? Since the previous version drew comments, can I
> assume that means everybody is happy with this version?

From a Debian maintainer perspective, I'd be happy with it.

Only minor thing is that you might want to mark
static int event_loop_stop; (usr/event_poll.c)
to be volatile, to be on the safe side when modifying it
from within a signal handler. Probably not really required
here (the compiler is not allowed to optimize out the
access anyway, since you call non-static functions within
the loop), but it doesn't hurt either, just in case...

Regards,
Christian
Reply all
Reply to author
Forward
0 new messages