More thoughts on Rust: enums, confined proc-macros, SES for Rust


Jonathan S. Shapiro

Jan 7, 2026, 7:01:00 PM
to cap-talk
Some further musings about Rust. The first is minor, but it's a hole in the type system and a pragmatic annoyance. The second may actually be interesting if it's feasible.

Enums are not Sum Types

This is a small thing. For those who have more important things to spend time on, the TL;DR version is that [I think] conflating sum types with enumeration types was a bad idea. It leaves an obvious deficiency in the type system around the discriminators, and it mixes up keywords that have almost 70 years of PL history behind them in an unfortunate way.

The context is that I was throwing together a helper library for tokenizers. The library consists of a collection of recognizers that can be instantiated in various ways using closures or literal strings. Each of those instantiations recognizes a token of some sort (or in some cases, a non-token "match" like a comment). The token matches want to get assigned a token type, but the token type enumeration is not defined by the library. It wants to be supplied as a parameterized type. Which turns out to be a very painful thing to do in Rust.

A [closed] enumeration (as opposed to a Rust enum) is usually a way to connect identifiers to numbers within a convenience namespace. It has a concrete size known at compile time, and it is a value type. Setting aside namespaces and match dispatching, it's essentially a newtype on the underlying hardware integer type along with a whole bunch of constant definitions structured in a way that plays nice with the match construct. For all of these reasons, it's very easy to deal with as something used to instantiate a generic type parameter.
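To make the "newtype plus constants" view concrete, here is a sketch of what such a closed enumeration amounts to, morally speaking (an illustration, not what rustc actually emits):

```rust
// A sketch of the "newtype over a hardware integer plus constants"
// reading of a classic closed enumeration. Illustrative only; this is
// not how rustc actually lowers `enum`.
#[derive(Clone, Copy, PartialEq, Eq)]
struct TokKind(u8);

impl TokKind {
    pub const IDENT: TokKind = TokKind(0);
    pub const NUMBER: TokKind = TokKind(1);
    pub const COMMENT: TokKind = TokKind(2);
}

fn describe(k: TokKind) -> &'static str {
    // Plays reasonably well with `match`, minus the exhaustiveness
    // checking a real closed enum would provide.
    match k {
        TokKind::IDENT => "identifier",
        TokKind::NUMBER => "number",
        TokKind::COMMENT => "comment",
        _ => "unknown",
    }
}
```

Because TokKind is a Copy value type whose size is known everywhere, it instantiates a generic type parameter painlessly.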

A [closed] sum type - especially an unboxed sum type, which is the Rust default - does not have a length that can be known when an unrelated library is compiled, and therefore makes for interesting itchy problems when it is used to instantiate a generic type.

Once you introduce operations that expose sum type discriminators as values, those need a type, which would most naturally be some form of closed enumeration type. That type is missing in Rust.
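The closest thing the standard library offers makes the gap visible: std::mem::discriminant hands back a deliberately opaque tag.

```rust
use std::mem::{discriminant, Discriminant};

enum Expr {
    Lit(i64),
    Neg(Box<Expr>),
}

fn same_variant(a: &Expr, b: &Expr) -> bool {
    // Discriminant<Expr> supports equality and hashing, but there is
    // no closed enumeration type behind it and no conversion to an
    // integer; the discriminator has no nameable value type.
    let da: Discriminant<Expr> = discriminant(a);
    let db: Discriminant<Expr> = discriminant(b);
    da == db
}
```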

So if you go and build a library that defines a generic struct "struct Token<TokType>", where TokType is supplied by the library's consumer, the required constraints are unbelievably messy. And when you try to clean that up, you discover that the current implementation of type aliasing isn't type aliasing at all. Type aliasing is supposed to be handled by Beta reduction, but somewhere along the way somebody took a shortcut with the Rust substitution logic that I don't understand. It's a known issue, and I gather some people are looking at how to revise it. I wonder what compatibility issues (if any) will emerge.
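For a sense of the shape of the problem, here is roughly what the library side ends up looking like; the field names and the exact bound set are my invention, and the bounds are the kind that accumulate once the recognizers need to copy, compare, and debug-print the consumer's token type:

```rust
// A sketch of the library-side generic token type (field names and
// bounds are illustrative, not from any particular library).
use std::fmt::Debug;

pub struct Token<TokType>
where
    TokType: Copy + Eq + Debug,
{
    pub kind: TokType,
    pub text: String,
    pub offset: usize,
}

impl<TokType> Token<TokType>
where
    TokType: Copy + Eq + Debug,
{
    pub fn new(kind: TokType, text: &str, offset: usize) -> Self {
        Token { kind, text: text.to_string(), offset }
    }
}
```

And every impl block and every function that touches Token has to restate (or thread through) that same bound list.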

Stray Thoughts

This is mostly musing, but I think enums could be re-imagined in a compatible way as parameterized over their underlying discriminator's native integer type, such that the type produced by an enum declaration is something like "NewEnumType: Enum<PI: PrimInt>" where "Enum" is a primordial trait, and PI is the concrete type of the discriminator and is usually resolved by the compiler (this works for both discriminator and sum enumerations). In the presence of a size annotation, compatibility would require that type to be dealt with as well.

The Enum constraint would mean that

fn f(e: Enum<u8>)

accepts any parameter for e that (a) is a Rust enum type, and (b) has an unsigned byte as its discriminator. I don't think this contrived example is useful more than once or twice a millennium, but it would justify "trait<E: Enum<D>> Discrim { fn discriminator(&self) -> D }", which is currently awkward to explain within the type system.
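Something close to that Discrim trait can be written in today's Rust if the per-enum impls are supplied by hand (or by a derive macro); what the proposal adds is the compiler knowing that every enum satisfies it. A sketch, with names matching the hypothetical trait above:

```rust
// The hypothetical primordial trait, approximated as an ordinary user
// trait. The per-enum impl is hand-written here; under the proposal
// the compiler would supply it for every enum.
trait Enum<D> {
    fn discriminator(&self) -> D;
}

enum Tok {
    Ident = 0,
    Number = 1,
}

impl Enum<u8> for Tok {
    fn discriminator(&self) -> u8 {
        match self {
            Tok::Ident => 0,
            Tok::Number => 1,
        }
    }
}

// The contrived example from the text, in today's syntax:
fn f(e: impl Enum<u8>) -> u8 {
    e.discriminator()
}
```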

Which brings us to the "let's abuse the type theory" part you've all been sitting on the edge of your seats waiting for. Hang on to your chips....

A bunch of people on forums have asked for a way to require discriminator-style enums as parameters, for various reasons. I think it's a little late to introduce a kind system into Rust, but if you re-imagine enums as I've just described, then "E where E: Enum<D> + E: D" works quite nicely. The first part says that D is the discriminant type for E. The second part effectively says that E is D (which is actually what we want for discriminator enums).
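For discriminator-style enums specifically, the "E is D" half can be approximated today with Into<D>, since unit-only enums can be cast to their repr and the cast wrapped in a From impl. A sketch:

```rust
// Approximating "a u8-valued discriminator enum" with today's traits.
// The hypothetical "E: Enum<D> + E: D" bound is not expressible, but
// Into<u8> captures the "E is (a) u8" half for unit-only enums.
#[derive(Clone, Copy)]
#[repr(u8)]
enum Color {
    Red = 1,
    Green = 2,
}

impl From<Color> for u8 {
    fn from(c: Color) -> u8 {
        c as u8 // legal because Color is unit-only
    }
}

fn emit<E: Copy + Into<u8>>(e: E) -> u8 {
    e.into()
}

fn main() {
    assert_eq!(emit(Color::Red), 1);
    assert_eq!(emit(Color::Green), 2);
}
```

Nothing stops a consumer from implementing Into<u8> for a non-enum, of course, which is exactly the kind-system-shaped hole the proposed bound would close.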

Dang. Dropped my chips.

Proc-Macro

The fact that I can install a module that publishes a macro that runs arbitrary code at compile time, without realizing that I did so, is a little disturbing. Yeah, I get that if I'm shipping your module's code to all of my fans, neighbors, and customers, then it isn't unreasonable that I should share the experience. But it moves the problem from "audit before run" to "audit before compile", and I suspect a lot of people don't realize that the threshold of vulnerability moved.

For a while, I was contemplating a similar approach for macros in BitC. The difference is that I planned to require such proc-macros to be escape-free, leaving the compile stage safe.

I realized today that we probably want something a bit stronger than escape-free: a compiler-verifiable attribute that a function is confined, meaning that it is escape-free and that any function references in its result value are references to confined functions. The critical point about confined is that primordial procedures in the standard library that might leak information via system calls are considered "not confined".
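To make the property concrete, here is a purely illustrative pair of functions; no such checker exists today, and the comments describe what it would have to verify:

```rust
// Illustrative only: which functions would and wouldn't count as
// confined under the proposed rule.

// Would qualify: pure computation, and the returned closure
// references only confined code.
fn make_scaler(factor: i64) -> impl Fn(i64) -> i64 {
    move |x| x * factor
}

// Would NOT qualify: println! bottoms out in a write system call,
// which is exactly the primordial information leak the rule excludes.
fn chatty_scaler(factor: i64) -> impl Fn(i64) -> i64 {
    move |x| {
        println!("scaling {x}"); // information escapes via stdout
        x * factor
    }
}

fn main() {
    assert_eq!(make_scaler(3)(7), 21);
    assert_eq!(chatty_scaler(3)(7), 21);
}
```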

Assuming it's computable by the compiler, I think there are a few really nice and really valuable properties we might obtain from this:
  1. It's general: modules can annotate which of their exports are confined and the compiler can validate this.
  2. I suppose we could imagine "confined use", meaning that all imported identifiers are required to be confined.
  3. The crate repositories can perform this validation during the CI/CD process while the binary form of the crate is being compiled, and sign the binaries in a way that indicates this validation has been performed.
If that's possible, then the number of crates in the world that present certain kinds of security risks seems to drop quite a bit, and we have an already trusted third party (in this case, crates.io) attesting that the property holds for the binary forms.

We certainly don't want to require that crates on crates.io are confined, but it would be a really nice step forward.

The really tricky part, I think, is that we might need "confined" to become part of function types so that this is modular across independently authored crates, and we would need some scheme that connects the dots in the right ways.

It's sort of like "SES for Rust". Somebody tell MarkM, Crock, and the SES crowd so they can get a smile out of that. It's not dull, at least.

Matt Rice

Jan 7, 2026, 9:05:06 PM
to cap-...@googlegroups.com
On Thu, Jan 8, 2026 at 12:01 AM Jonathan S. Shapiro
<jonathan....@gmail.com> wrote:
>
> [snip: intro and the "Enums are not Sum Types" section, quoted in full above]
>

I have felt like this is a valid criticism, and been kicked by that horse too. I'm probably going to bleed into your next stray thought here as well, but I'm curious whether you've looked at #[repr(...)] yet, and at its RFC extension, pattern types:
https://doc.rust-lang.org/stable/unstable-book/language-features/pattern-types.html
Pattern types are kind of an extension of repr.

The following code fails to compile here, with the error below:

```rust
#[repr(u8)]
enum Foo {
    Bar = 1,
}

#[repr(u8)]
#[derive(Debug)]
enum Bar {
    Baz(u8) = 2,
}

fn main() {
    let x = Foo::Bar;
    let y = Bar::Baz(3);
    eprintln!("{}", x as u8);
    eprintln!("{}", y as u8);
}
```

```console
   Compiling playground v0.0.1 (/playground)
error[E0605]: non-primitive cast: `Bar` as `u8`
  --> src/main.rs:16:19
   |
16 |     eprintln!("{}", y as u8);
   |                     ^^^^^^^ an `as` expression can be used to convert enum types to numeric types only if the enum type is unit-only or field-less
   |
   = note: see https://doc.rust-lang.org/reference/items/enumerations.html#casting for more information
```

Now this doesn't really work in a generic context, for arbitrary T. There are some proc-macro crates that work with enums, in particular the strum crate... It acts kind of weird in that it has a function that gives each enum variant a tag.
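If I have strum's API right, the relevant derive is EnumDiscriminants, which generates a parallel fieldless enum plus From impls; roughly (assuming strum_macros is listed in Cargo.toml):

```rust
// Assumes strum_macros as a dependency, and that I'm remembering the
// derive correctly: EnumDiscriminants generates a fieldless companion
// enum (TokenDiscriminants) plus From impls.
use strum_macros::EnumDiscriminants;

#[derive(EnumDiscriminants)]
enum Token {
    Ident(String),
    Number(i64),
}

fn main() {
    let t = Token::Number(42);
    // The generated companion enum carries just the tag.
    let d: TokenDiscriminants = (&t).into();
    assert!(matches!(d, TokenDiscriminants::Number));
}
```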

Anyhow, the way I've approached the problem is based on a fork of the strum proc macro. I never released it on crates.io; the crate needs work, because I only managed to upstream half of it to strum, and it's better to just avoid proc_macro inheritance (for lack of a better term).

https://github.com/ratmice/enum_extra

This defines a proc macro which derives a trait. The trait is only derived for enums that are sum-type like, and it proves some property about the enum variants, like the following:

```rust
#[derive(NonZeroRepr, EnumMetadata)]
#[repr(i32)]
enum CompileFail {
    // Should fail to compile, because the NonZeroRepr proc_macro
    // panics the compilation.
    A = X + 1,
}
```

If your proc_macros then emit assertions about this kind of thing, you can get pretty good code generation that optimizes well, even though in this case we're routing through i32, which doesn't itself provide the property that it is NonZero.

The way this works is that `EnumMetadata` is a trait whose derive produces a bunch of associated consts, such as the maximum variant number, and associated types, like the `i32` of the given repr.

NonZeroRepr then maps the i32 -> NonZeroI32, and so on.
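A guess at the rough shape of that trait — NOT the actual enum_extra API, just a self-contained illustration of the "derive bakes concrete facts into a trait" pattern and the generic code it enables:

```rust
// A guess at the rough shape of such a metadata trait. NOT the actual
// enum_extra API; a self-contained illustration of the pattern.
trait EnumMetadata {
    /// The primitive type named in #[repr(...)].
    type Repr;
    /// Number of variants.
    const VARIANT_COUNT: usize;
    /// This value's discriminant, in the repr type.
    fn to_repr(&self) -> Self::Repr;
}

#[repr(i32)]
#[derive(Clone, Copy)]
enum Opcode {
    Add = 1,
    Sub = 2,
}

// What the derive would generate for this concrete enum.
impl EnumMetadata for Opcode {
    type Repr = i32;
    const VARIANT_COUNT: usize = 2;
    fn to_repr(&self) -> i32 {
        *self as i32 // legal: Opcode is fieldless
    }
}

// Generic code requires the bound instead of knowing the enum.
fn encode<E: EnumMetadata<Repr = i32>>(e: &E) -> i32 {
    e.to_repr()
}

fn main() {
    assert_eq!(encode(&Opcode::Sub), 2);
}
```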


Matt Rice

Jan 7, 2026, 9:41:38 PM
to cap-...@googlegroups.com
I guess to summarize my thoughts on this: when writing generic code over enums that needs to deal directly with discriminants, one way is proc_macros. They can inspect each variant and ensure that it has a numeric representation, or provide stronger invariants (in my example, that the variant reprs are all non-zero).

Unlike with normal generics, where a generic type doesn't actually have any kind of trait for casting to numeric types, the derive proc_macros run on the actual underlying concrete type; your generic code can then require bounds on the trait.
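The contrast is easy to demonstrate: a cast is simply unavailable on a type parameter, while a trait bound filled in on the concrete type works (the impl below is a hand-written stand-in for what a derive would generate):

```rust
// Casts don't exist for type parameters; this fails with E0605:
//
//     fn broken<T>(t: T) -> u8 {
//         t as u8 // error: non-primitive cast
//     }
//
// But a trait bound works, because the impl (a stand-in here for what
// a derive would generate) was produced for the concrete enum.
trait Repr8 {
    fn repr(&self) -> u8;
}

#[repr(u8)]
#[derive(Clone, Copy)]
enum Flag {
    Off = 0,
    On = 1,
}

impl Repr8 for Flag {
    fn repr(&self) -> u8 {
        *self as u8
    }
}

fn generic_ok<T: Repr8>(t: &T) -> u8 {
    t.repr()
}

fn main() {
    assert_eq!(generic_ok(&Flag::On), 1);
}
```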

Because proc_macros run across crate boundaries, I don't think there is a way to seal the trait, ensuring that a manual impl can't intentionally derive it for an enum that doesn't conform to the invariant.

It isn't that I like proc_macros for this, but I really don't know another way.

Kevin Reid

Jan 8, 2026, 12:11:00 AM
to cap-...@googlegroups.com
On Wed, Jan 7, 2026 at 4:01 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
Proc-Macro

The fact that I can install a module that publishes a macro that runs arbitrary code at compile time [...] it moves the problem from "audit before run" to "audit before compile", and I suspect a lot of people don't realize that the threshold of vulnerability moved.

A couple of pieces of context, nothing to do with the current implementation, but about future plans:
  • There is an accepted proposal that Cargo and rustc should support sandboxing of all build-time execution. In the case of proc macros, this may be able to take the form of confining them solely to accept tokens and produce tokens. Implementation work has not started yet; this is just “we are on board with moving in this direction eventually” as opposed to rejecting the idea.
  • However, the compiler team has also stated (though I don’t have a citation handy) that they do not want to declare the compiler itself a security boundary, that is, it is not intended to be robust against untrusted input code. This is primarily not wanting to take on the maintenance burden of responding to vulnerability reports, I believe.

Matt Rice

Jan 8, 2026, 6:41:19 PM
to cap-...@googlegroups.com
Only one thing to add, which is probably obvious: lockfiles are also important, to prevent automatic semver upgrades. "Audit before compile" is also "audit before upgrading each dependency version", and without lockfiles those upgrades happen automatically during compilation. So really, without lockfiles it's more like audit before each compile, and hope you win the race condition...
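For anyone who hasn't internalized Cargo's defaults: a bare version in Cargo.toml is a caret requirement, so a newer compatible release can be pulled in at build time unless Cargo.lock pins it.

```toml
# Cargo.toml: a bare version is a caret requirement.
[dependencies]
serde = "1.0"     # means ^1.0: any compatible 1.x release satisfies it
rand = "=0.8.5"   # an exact pin, the non-default choice

# Without a committed Cargo.lock, `cargo build` resolves "1.0" to the
# newest compatible release at that moment, so the code you audited and
# the code you compile can differ from build to build.
```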