[Talk Proposal] Debugging a systemd PID 1 Crash: A Race Condition in Unit Alias Deserialization


Balakumaran Kannan

Mar 27, 2026, 8:31:08 AM
to Kernel Meetup Bangalore
Hi All,

Preferred Format: 25+5 mins

While systemd is not a kernel subsystem, it is the first userspace process (PID 1) on virtually every modern Linux distribution: the bridge between the kernel and the rest of the system. A crash in systemd is, for all practical purposes, as catastrophic as a kernel panic. This talk presents a real-world debugging journey through exactly such a crash, emphasizing the distributed nature of Linux: independently developed components (the kernel, systemd, service daemons, and application scripts) must compose correctly across version boundaries, and subtle breakages emerge when they don't.

When we rolled out Azure Linux 3.0 across 1,600+ cloud VMs, 29 machines froze hard — PID 1 crashed with `assert_not_reached()` in `service_sigchld_event()` at `service.c:3863`. No OOM, no kernel panic — systemd killed itself. We traced the crash from vague "Transport endpoint is not connected" errors to a race condition in systemd's daemon-reload serialization path, introduced in systemd 254: a symlink alias (`syslog.service` → `rsyslog.service`) causes the unit's deserialized state to be silently overwritten to DEAD while its `main_pid` remains set, leading to an assertion failure when SIGCHLD arrives.
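To make the failure mode above concrete, here is a toy Python model of it. This is not systemd code: `ToyService`, `toy_deserialize`, `toy_sigchld`, the record format, and PID 4242 are all hypothetical names and values invented for illustration. The post does not say how the overwriting record arises, so the sketch simply assumes that a second `state=dead` record is replayed under the alias name, and that both names resolve to the same unit object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyService:
    """Hypothetical stand-in for a service unit; not systemd's Service struct."""
    name: str
    state: str = "dead"
    main_pid: Optional[int] = None


def toy_deserialize(units, records):
    """Replay (unit_name, key, value) records; later records win."""
    for unit_name, key, value in records:
        unit = units[unit_name]  # alias and real name map to the same object
        if key == "state":
            unit.state = value
        elif key == "main-pid":
            unit.main_pid = int(value)


def toy_sigchld(unit, pid):
    """Handle a child exit; a tracked main PID in state 'dead' is 'impossible'."""
    if unit.main_pid != pid:
        return "ignored"
    if unit.state == "running":
        unit.state, unit.main_pid = "dead", None
        return "handled"
    raise AssertionError("assert_not_reached: main_pid set but state is dead")


def replay(records):
    """Deserialize the records, then deliver SIGCHLD for the main PID."""
    unit = ToyService("rsyslog.service")
    units = {"rsyslog.service": unit, "syslog.service": unit}  # symlink alias
    toy_deserialize(units, records)
    try:
        return toy_sigchld(unit, 4242)
    except AssertionError:
        return "assert_not_reached"


# Assumed record sets: one under the real name, one under the alias.
REAL = [("rsyslog.service", "state", "running"),
        ("rsyslog.service", "main-pid", "4242")]
ALIAS = [("syslog.service", "state", "dead")]

print(replay(ALIAS + REAL))  # alias record replayed first: state ends as running
print(replay(REAL + ALIAS))  # alias record replayed last: state clobbered to dead
```

Only the ordering of the two record sets differs between the two calls. In this model that is why the outcome depends on iteration order, and why a reproducer has to pin that order down.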

The talk covers: journal log forensics and initial triage; source-level root cause analysis of systemd's serialization and deserialization paths (`service.c`, `manager.c`); constructing a deterministic reproducer that defeats hashmap ordering non-determinism; and the mitigation we shipped, along with our upstream engagement (systemd/systemd#14141, #38817, PR #39703).

Attendees will walk away with: practical techniques for debugging PID 1 crashes using journal logs and systemd source code; an understanding of how daemon-reload serialization works internally; awareness of how unit symlink aliases can corrupt service state; and broader lessons on the risks of version upgrades in a distributed ecosystem where no single component owns the integration contract.

Presenter:
I am one of the maintainers of Azure Linux, Microsoft's Linux distribution. With 15 years of experience in Linux systems engineering, I have worked across the networking subsystem, scheduler, KVM, BSP, and device drivers at organizations including Flipkart, Hewlett Packard Enterprise, and Sony. I previously presented a talk on the EEVDF Scheduler at the Linux Kernel Meetup.

Suchit Karunakaran

Mar 27, 2026, 8:34:46 AM
to Balakumaran Kannan, Kernel Meetup Bangalore
+1

--
You received this message because you are subscribed to the Google Groups "Kernel Meetup Bangalore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kernel-meetup-ban...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/kernel-meetup-bangalore/CAHPKR9%2BuG5h9029%2BQu4LSkwqdJvEpQLEwXQDfc-1VqsjRdiQHA%40mail.gmail.com.

Archana choudhary

Mar 27, 2026, 8:36:30 AM
to Suchit Karunakaran, Balakumaran Kannan, Kernel Meetup Bangalore

Diksha

Mar 27, 2026, 8:41:23 AM
to Balakumaran Kannan, Kernel Meetup Bangalore
+1

Ankita Pareek

Mar 27, 2026, 10:18:17 AM
to Balakumaran Kannan, Kernel Meetup Bangalore
+1

Leo Mar

Mar 27, 2026, 12:30:03 PM
to Ankita Pareek, Balakumaran Kannan, Kernel Meetup Bangalore

Sudipta Pandit

Mar 27, 2026, 1:22:55 PM
to Leo Mar, Ankita Pareek, Balakumaran Kannan, Kernel Meetup Bangalore

Kanishk Bansal

Mar 27, 2026, 1:23:34 PM
to Kernel Meetup Bangalore
+1

Kaiwan N Billimoria

Mar 27, 2026, 11:28:37 PM
to Balakumaran Kannan, Kernel Meetup Bangalore
+1

Priyajit Ghosh

Mar 28, 2026, 8:18:08 AM
to Kernel Meetup Bangalore
+1

CoderDomain Opt

Mar 30, 2026, 1:44:42 PM
to Priyajit Ghosh, Kernel Meetup Bangalore

Kaitepalli Kavya Sree

Mar 30, 2026, 8:18:37 PM
to Balakumaran Kannan, Kernel Meetup Bangalore
+1

Nagamani Veerappa

Mar 31, 2026, 2:55:19 PM
to Balakumaran Kannan, Kernel Meetup Bangalore
+1
