Breaking semaphores

Jack Firth

Jan 17, 2020, 11:37:16 PM
to Racket Users
The docs for semaphores say this:

In general, it is impossible using only semaphore-wait to implement the guarantee that either the semaphore is decremented or an exception is raised, but not both. Racket therefore supplies semaphore-wait/enable-break (see Semaphores), which does permit the implementation of such an exclusive guarantee.

I understand the purpose of semaphore-wait/enable-break, but there's something about semaphore-wait that confuses me: why does it allow breaking at all? My understanding is that if breaks are enabled, semaphore-wait still tries to block and decrement the counter, even though a break at any time could destroy the integrity of the semaphore. Does that mean it's not kill-safe to use a semaphore as a lock? Wouldn't it be safer if semaphore-wait automatically disabled breaks while waiting?

Alexis King

Jan 18, 2020, 3:47:06 AM
to Jack Firth, Racket Users
Killing a thread is different from breaking a thread. Killing a thread kills the thread unrecoverably, and no cleanup actions are run. This usually isn’t what you want, but there’s always a tension between these kinds of things: defensive programmers ask “How do I make myself unkillable so I can safely clean up?” but then implementors of a dynamic environment (like, say, DrRacket) find themselves asking “How do I kill a runaway thread?” Assuming you’re not DrRacket, you usually want `break-thread`, not `kill-thread`.

But perhaps you know that already, and your question is just about breaking, so by “kill-safe” you mean “break-safe.” You ask why `semaphore-wait` doesn’t just disable breaking, but that wouldn’t help with the problem the documentation alludes to. The problem is that there’s fundamentally a race condition in code like this:

(semaphore-wait sem)
; do something important
(semaphore-post sem)

If this code is executed in a context where breaks are enabled, it’s not break-safe even if `semaphore-wait` disabled breaks while waiting on the semaphore. As soon as `semaphore-wait` returns, the queued break would be delivered, the stack would unwind, and the matching `semaphore-post` call would never execute, potentially holding the lock forever. So the issue isn’t that the semaphore’s internal state somehow gets corrupted, but that the state no longer reflects the value you want.

The right way to write that code is to disable breaks in the critical section:

(parameterize-break #f
  (semaphore-wait sem)
  ; do something important
  (semaphore-post sem))

This eliminates the race condition, since a break cannot be delivered until the `semaphore-post` executes (and synchronous, non-break exceptions can be protected against via `dynamic-wind` or an exception handler). But this creates a new problem, since if a break is delivered while the code is blocked on the semaphore, it won’t be delivered until the semaphore is posted/unlocked, which may be a very long time. You’d really rather just break the thread, since it hasn’t entered the critical section yet, anyway.

This is what `semaphore-wait/enable-break` is for. You can think of it as a version of `semaphore-wait` that re-enables breaks internally, inside its implementation, and it installs an exception handler to ensure that if a break is delivered at the worst possible moment (after the count has been decremented but before breaks are disabled again), it reverses the change and re-raises the break exception. (I have no idea if this is how it’s actually implemented, but I think it’s an accurate model of its behavior.) This does exactly what we want, since it ensures that if we do enter the critical section, breaks are disabled until we exit it, but we can still be interrupted if we’re blocked waiting to enter it.
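That pattern can be packaged into a helper. Here’s a minimal sketch, assuming the semantics just described; the name `call-with-lock` is made up, and Racket’s built-in `call-with-semaphore/enable-break` covers similar ground, so in practice you’d likely reach for that instead:

```racket
#lang racket

;; Sketch of the break-safe locking pattern described above:
;; breaks stay disabled inside the critical section, but the
;; thread can still be broken while blocked waiting to enter it.
;; dynamic-wind guards against synchronous exceptions in thunk.
(define (call-with-lock sem thunk)
  (parameterize-break #f
    (semaphore-wait/enable-break sem)
    (dynamic-wind
      void
      thunk
      (lambda () (semaphore-post sem)))))
```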

So it’s not so much that there’s anything really special going on here, but more that break safety is inherently anti-modular where state is involved, and you can’t implement `semaphore-wait/enable-break`-like constructs if you only have access to the `semaphore-wait`-like sibling.

Jack Firth

Jan 18, 2020, 3:54:23 AM
to Alexis King, Racket Users
I do understand all of that, and you're right that "kill-safe" isn't what I meant.

What I'm confused about is why, if it's inherently not guaranteed to leave the semaphore in a consistent state, semaphore-wait attempts to work at all if breaks are enabled. Why not raise some helpful error like "it's unsafe to wait on a semaphore while breaks are enabled, did you forget to disable breaks?". What's the actual use case for calling semaphore-wait (and not semaphore-wait/enable-break) while breaks are enabled?

Alexis King

Jan 18, 2020, 4:04:18 AM
to Jack Firth, Racket Users
It is guaranteed to leave the semaphore in a consistent state, from the perspective of the implementation of semaphores. No matter what you do, you won’t ever corrupt a semaphore (assuming you’re not using unsafe operations and assuming the runtime is not buggy).

But perhaps you mean inconsistent from the point of view of the application, not from the point of view of the Racket runtime. In that case, it’s true that when using semaphores as locks, using them in a context where breaks are enabled is almost certainly wrong. It’s not immediately clear to me that there aren’t any valid uses of semaphores where you would want breaks to be enabled, but I admit, I have no idea what they are.

Semaphores are low-level primitives, though, so I think it makes some sense for them to just do the minimal possible thing. Perhaps a library ought to offer a slightly more specialized “critical section” abstraction a la Windows (or perhaps something like Haskell’s MVars) that manages disabling interrupts in the critical section for you. (Why doesn’t this exist already? My guess is that most Racket programmers don’t worry about these details, since they don’t call `break-thread` anywhere, and they want SIGINT to just kill their process, anyway.)

Jack Firth

Jan 18, 2020, 4:10:33 AM
to Alexis King, Racket Users
I don't see how it has to do with semaphores being low-level. If waiting on a semaphore while breaks are enabled is almost certainly wrong, checking whether breaks are enabled and raising an error seems like a way more sensible default behavior than just silently doing something that's almost certainly wrong. If car and cdr can check their arguments by default, shouldn't semaphores guard against misuse too?

Alexis King

Jan 18, 2020, 4:14:31 AM
to Jack Firth, Racket Users
Like I said, it isn’t clear to me that all uses of `semaphore-wait` when breaks are enabled are incorrect. You could argue that then you should have a `semaphore-wait/trust-me-even-though-breaks-are-enabled`, and sure, I don’t think that would necessarily be bad. I imagine the API just wasn’t originally designed that way for one reason or another, possibly simply because it wasn’t considered at the time. Maybe Matthew can give a more satisfying answer, but I don’t know; I’m just speculating.

Jack Firth

Jan 18, 2020, 4:21:49 AM
to Alexis King, Racket Users
It isn't clear to me either. I can't think of a use case for it, but I'm hoping either somebody else can or somebody can confirm that it's not a good API precedent. I'm trying to build some concurrency libraries and I'd like to be sure there isn't some important use case I'm missing.

Alexis King

Jan 18, 2020, 4:28:46 AM
to Jack Firth, Racket Users
Actually, I change my mind, I can trivially think of a case where it’s fine: if you’re just using a semaphore as an event. One thread waits with `semaphore-wait`, another thread calls `semaphore-post`, and after the count is decremented, it’s never re-incremented. It’s just used to gate execution, not guard access to a resource. No need to disable breaks here.
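A minimal sketch of that one-shot event use (the names here are just illustrative):

```racket
#lang racket

;; Semaphore used purely as an event: the count only ever goes
;; 0 -> 1 -> 0 and is never re-incremented, so there is no
;; application invariant for a break to violate.
(define ready (make-semaphore 0))

(define worker
  (thread
   (lambda ()
     (semaphore-wait ready)   ; blocks until the event fires
     (displayln "event received"))))

(semaphore-post ready)        ; fire the event exactly once
(thread-wait worker)
```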

(Also, an aside: I think your `car`/`cdr` example is different, because `car`/`cdr`’s checks on pairs guard against memory corruption in the Racket runtime, and Racket is a memory-safe language. A better comparison would be that `car`/`cdr` don’t check whether or not their argument is a proper list—the higher-level `first`/`rest` do that, instead.)

Jack Firth

Jan 18, 2020, 4:37:05 AM
to Alexis King, Racket Users
Wouldn't you want to force the first thread to wait with semaphore-wait/enable-break in that case? If breaks are disabled, that thread can't be cooperatively terminated. If you use `semaphore-wait`, it seems like you completely hand off control over whether breaks are enabled, which seems like something that use sites should care about one way or the other. What sort of semaphore-based communication would be truly indifferent to whether breaking is enabled?

Alexis King

Jan 18, 2020, 4:45:30 AM
to Jack Firth, Racket Users
No, I don’t think so, and here’s why: imagine a library provides an abstraction that internally uses semaphores as events. The library uses `semaphore-wait` to wait on the event. The client of this library now has the option to disable breaks if it turns out this code is actually going to be used inside a larger critical section, and they don’t want breaks to be re-enabled by the library! They really want everything in the critical section to keep breaks disabled. So in that case, the break-agnostic behavior of `semaphore-wait` really is the right one.

This is what I mean by semaphores being a low-level primitive, though. There are lots of different behaviors one might want that could be better served by higher-level abstractions that can make more assumptions about how they’ll be used, but semaphores have to support all of them. I think it makes sense that they provide the minimal set of behaviors needed to implement those things—it keeps the building blocks as simple and modular as possible. You can always implement the more complex behavior on top, but it’d be annoying to discover you needed to work around the interface trying to protect you from yourself while you’re implementing a new concurrency abstraction.

Jack Firth

Jan 18, 2020, 5:04:33 AM
to Alexis King, Racket Users
I am making a new concurrency abstraction, and I already have to work around the interface because it forces me to make this choice at every use site. What I was planning to do was push this decision into the value itself, rather than the use site. So what if `make-semaphore` had a `#:break-handling-mode` argument that controlled whether waiting on that particular semaphore enables breaks, checks that breaks are disabled, or does neither?
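For concreteness, here's a hypothetical sketch of what I mean. The wrapper, the keyword, and the mode names are all made up; none of this exists in Racket:

```racket
#lang racket

;; Hypothetical API putting the break-handling decision in the
;; value itself rather than at each use site. All names here are
;; invented for illustration.
(struct checked-semaphore (sema mode))

(define (make-checked-semaphore count #:break-handling-mode [mode 'plain])
  (checked-semaphore (make-semaphore count) mode))

(define (checked-semaphore-wait cs)
  (define sema (checked-semaphore-sema cs))
  (case (checked-semaphore-mode cs)
    ;; re-enable breaks while blocked, like semaphore-wait/enable-break
    [(enable-breaks) (semaphore-wait/enable-break sema)]
    ;; refuse to wait at all unless the caller disabled breaks
    [(check-breaks-disabled)
     (when (break-enabled)
       (error 'checked-semaphore-wait
              "breaks must be disabled to wait on this semaphore"))
     (semaphore-wait sema)]
    ;; plain semaphore-wait, break-agnostic
    [(plain) (semaphore-wait sema)]))
```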

Alexis King

Jan 18, 2020, 5:27:48 AM
to Jack Firth, Racket Users
I don’t personally have any problems with Racket’s semaphore interface as it exists today. I think having the choice of whether or not to enable breaks mostly makes sense as something the ambient environment controls, not individual pieces of synchronization logic, since you usually want control structures like `with-handlers` and `dynamic-wind` to be the things that mask interrupts in the appropriate places. A hypothetical `with-critical-section` form would be similar in that respect. This allows a limited form of composability between concurrency constructs that is otherwise hard to achieve.

For the reasons I’ve already given, I think it would be more useful to offer higher-level concurrency primitives like events, mutexes, etc., since those could offer more structure based on the particular use case in question. (Also, I realized Haskell’s MVars are basically just Racket channels, though Racket’s channels don’t have a peek operation.)
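To illustrate that correspondence, here’s a sketch of an MVar-like one-slot box built from channels. Racket channels are synchronous rendezvous points rather than one-place buffers, so the slot’s empty/full state lives in a helper thread; all the names are made up:

```racket
#lang racket

;; MVar-like one-slot box from channels: mvar-put! blocks while
;; the slot is full, and mvar-take! blocks while it is empty.
;; The helper thread alternates between accepting a put and
;; serving a take, encoding the empty/full state machine.
(struct mvar (take-ch put-ch))

(define (make-mvar)
  (define take-ch (make-channel))
  (define put-ch (make-channel))
  (thread
   (lambda ()
     (let loop ()
       (define v (channel-get put-ch))   ; empty: wait for a put
       (channel-put take-ch v)           ; full: wait for a take
       (loop))))
  (mvar take-ch put-ch))

(define (mvar-put! mv v) (channel-put (mvar-put-ch mv) v))
(define (mvar-take! mv)  (channel-get (mvar-take-ch mv)))
```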

More generally, I think Haskell’s concurrency libraries are good prior art here that would be worth looking at. Haskell’s “asynchronous exceptions” are directly analogous to Racket’s breaks, though Haskell allows arbitrary exceptions to be raised asynchronously rather than only the more restrictive interface of `break-thread`. Haskell’s `mask` operator corresponds to Racket’s `parameterize-break`. Even though the primitives are essentially the same, Haskell’s libraries provide a much richer set of higher-level abstractions, both in the standard library (see Control.Exception and Control.Concurrent.*) and in other packages.

Alexis King

Jan 18, 2020, 5:34:45 AM
to Jack Firth, Racket Users
Oh, an addendum: I would be remiss not to mention the excellent paper on the design of Haskell’s asynchronous exception system, which provides both examples of problems in the wild and more general elaboration on both the design space and the particular point within it the authors chose for Haskell. The paper is “Asynchronous Exceptions in Haskell” by Marlow, Peyton Jones, Moran, and Reppy, and it is available here:


Another thing worth reading is this recent blog post by Simon Marlow (the first author of the aforementioned paper) on asynchronous exceptions:

Jack Firth

Jan 18, 2020, 5:46:55 AM
to Alexis King, Racket Users
I appreciate the sentiment about prior art, but I'm already familiar with both of those links and a significant part of my day job involves working on concurrency frameworks. Specific use cases are more what I'm after. For instance, what would you like to use mutexes for?

Alexis King

Jan 18, 2020, 6:00:40 AM
to Jack Firth, Racket Users
I would use mutexes in relatively standard ways, I think, to protect critical sections that access shared mutable state or external resources that may require some form of serialization. The usual approach of using a semaphore works fine, but it does require the aforementioned break manipulation song and dance to be entirely robust, and it would be nice to not have to worry about it.

Alexis King

Jan 18, 2020, 6:06:51 AM
to Jack Firth, Racket Users
Oh: something more ambitious that I would enjoy having would be an implementation of IVars and LVars to avoid needing to think about locking entirely.


Jack Firth

Jan 18, 2020, 6:11:14 AM
to Alexis King, Racket Users
Is there a specific instance where you were writing some code and thought mutexes would help, or is there a specific project with a problem you'd like to solve with mutexes? Or IVars and LVars?

I started down this road because I noticed that the test result counting in rackunit seems like it might experience data races if test cases in the same namespace are run from different threads. So I tried to rewrite it with a semaphore-based lock and found it frustrating, and I got it wrong to boot.