[PATCH] swupdate_async_start: fix hang with update containing many small steps

7 views
Skip to first unread message

Dominique Martinet

unread,
May 12, 2026, 7:11:37 AM (yesterday) May 12
to swup...@googlegroups.com, Dominique Martinet
Running swupdate with a SWU (streamed through swupdate_async_start such
as with `swupdate -i` or `swupdate-client`) can hang if the update
contains many small steps followed by a large step:
- swupdate_async_thread() is stuck trying to write the SWU content to
the server
- but the server sent many progress events and filled progressfd
backlog, and closes the server side of in send_progress_msg() after
retries failed (client is still blocked)
- when server continued installing more items the client is unblocked
and reads up on progress notifications, notices the socket is closed
(POLLHUP) and fails
- the server fails the update as it is not complete

This fails with logs similar to this:
```
progress_ipc_receive_nb failed (-1)
Cannot consume progress events. Fail.
[ERROR] : SWUPDATE failed [0] ERROR install_from_file.c : endupdate : 55 : SWUpdate *failed* !
[TRACE] : SWUPDATE running : [unlink_sockets] : unlink socket /tmp/swupdateprog
[TRACE] : SWUPDATE running : [unlink_sockets] : unlink socket /tmp/sockinstctrl
[ERROR] : SWUPDATE failed [0] ERROR cpio_utils.c : hash_compare : 521 : HASH mismatch : 4395313e6f79b6554521e1933304e95d0bea836d2bc72e9b3828fd62c5125baf <--> 3eb039c6666dfd614a73b930c98c38ab77ba362ce3b9d745c7684d0d10335726
```

This patch makes the sending socket nonblocking, allows partial sends of
SWU content (there is no need for it to be a full write unlike IPC
messages), and polls both progressfd and connfd to handle both sockets
simultaneously.

The poll is somewhat redundant with the other poll on progressfd in
progress_ipc_receive_nb(), but progress_ipc_receive_nb() is part of the
progress_ipc API and cannot be changed, so is left as is.

Fixes: 8b4efc0d23db ("swupdate_async_start: fix early termination blocking in client")
Link: https://groups.google.com/g/swupdate/c/F3XKUdP0KTc/m/m2jk4EdkBgAJ
---

This is an implementation of my second suggestion in the other thread,
which is clean enough even if it's a bit complex for my liking.

I've tested with the reproducer I made as well as the one our customer
gave us and things appear to work OK, but I'll play with it a bit more
tomorrow (I shouldn't spam any more unless something comes up :))

If you have any feedback, happy to throw this away for anything else
that'd resolve the issue - this code has always been a bit messy and I'm
not helping on that side...

Thanks!

ipc/network_ipc-if.c | 53 ++++++++++++++++++++++++++++++++++++--------
ipc/network_ipc.c | 5 +++++
2 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/ipc/network_ipc-if.c b/ipc/network_ipc-if.c
index 7c209ad484fc..4dd5d39fca45 100644
--- a/ipc/network_ipc-if.c
+++ b/ipc/network_ipc-if.c
@@ -5,13 +5,15 @@
* SPDX-License-Identifier: LGPL-2.1-or-later
*/

+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <poll.h>
+#include <pthread.h>
+#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
-#include <errno.h>
-#include <signal.h>
-#include <pthread.h>
-#include <inttypes.h>
#include <unistd.h>
#include "network_ipc.h"
#include "progress_ipc.h"
@@ -155,14 +157,15 @@ static int consume_progress_events(int *progressfd)
static void *swupdate_async_thread(void *data)
{
char *pbuf;
- int size;
+ int size = 0, buf_offset = 0;
+ struct pollfd pfds[2];
sigset_t sigpipe_mask;
sigset_t saved_mask;
struct timespec zerotime = {0, 0};
struct async_lib *rq = (struct async_lib *)data;
int swupdate_result = FAILURE;
int progressfd = -1;
- int ret;
+ int ret, nfd, old_flags;
int early_status = -1;

sigemptyset(&sigpipe_mask);
@@ -183,22 +186,54 @@ static void *swupdate_async_thread(void *data)
goto out;
}

+ /* make connfd non-blocking while we're writing image */
+ old_flags = fcntl(rq->connfd, F_GETFL);
+ if (old_flags < 0) {
+ fprintf(stderr, "Could not get connfd flags? %m\n");
+ goto out;
+ }
+ fcntl(rq->connfd, F_SETFL, old_flags | O_NONBLOCK);
+
/* Start writing the image until end */

do {
if (!rq->wr)
break;
+ if (size == buf_offset) {
+ buf_offset = 0;
+ rq->wr(&pbuf, &size);
+ }

- rq->wr(&pbuf, &size);
+ pfds[0].fd = progressfd;
+ pfds[0].events = POLLIN;
+ nfd = 1;
if (size) {
- if (swupdate_image_write(pbuf, size) != size) {
+ pfds[1].fd = rq->connfd;
+ pfds[1].events = POLLOUT;
+ nfd = 2;
+ }
+ do {
+ ret = poll(pfds, nfd, -1);
+ } while (ret < 0 && errno == EINTR);
+
+ if (ret < 0) {
+ fprintf(stderr, "poll failed: %m\n");
+ goto out;
+ }
+ if (nfd == 2 && pfds[1].revents & POLLOUT) {
+ ret = swupdate_image_write(pbuf + buf_offset, size - buf_offset);
+ if (ret < 0 && errno != EAGAIN) {
perror("swupdate_image_write failed");
swupdate_result = FAILURE;
goto out;
+ } else if (ret > 0) {
+ buf_offset += ret;
}
}
/* Consume progress events so that the pipe does not get full
- * and block the daemon */
+ * and block the daemon
+ * Note this polls progressfd again so we don't need to check
+ * pfds[0].revents */
ret = consume_progress_events(&progressfd);
if (ret == -1) {
/* If we cannot get events, then we won't be able to get the result.
diff --git a/ipc/network_ipc.c b/ipc/network_ipc.c
index 96a7701d0c9a..ede80dfcffe7 100644
--- a/ipc/network_ipc.c
+++ b/ipc/network_ipc.c
@@ -320,6 +320,11 @@ int ipc_send_data(int connfd, char *buf, int size)

while (len) {
ret = write(connfd, buf, (size_t)len);
+ if (ret < 0 && errno == EINTR)
+ continue;
+ /* return partially written length if any */
+ if (ret < 0 && errno == EAGAIN && len != size)
+ return size - len;
if (ret < 0)
return ret;
len -= ret;
--
2.47.3


Reply all
Reply to author
Forward
0 new messages