On Thu, Jan 8, 2026 at 12:01 AM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
>
> Some further musings about Rust. The first is minor, but it's a hole in the type system and a pragmatic annoyance. The second may actually be interesting if it's feasible.
>
> Enums are not Sum Types
>
> This is a small thing. For those who have more important things to spend time on, the TL;DR version is that [I think] conflating sum types with enumeration types was a bad idea. It leaves an obvious deficiency in the type system around the discriminators, and it mixes up keywords that have almost 70 years of PL history behind them in an unfortunate way.
>
> The context is that I was throwing together a helper library for tokenizers. The library consists of a collection of recognizers that can be instantiated in various ways using closures or literal strings. Each of those instantiations recognizes a token of some sort (or in some cases, a non-token "match" like a comment). The token matches want to get assigned a token type, but the token type enumeration is not defined by the library. It wants to be supplied as a type parameter. Which turns out to be a very painful thing to do in Rust.
>
> A [closed] enumeration (as opposed to a Rust enum) is usually a way to connect identifiers to numbers within a convenience namespace. It has a concrete size known at compile time, and it is a value type. Setting aside namespaces and match dispatching, it's essentially a newtype on the underlying hardware integer type along with a whole bunch of constant definitions structured in a way that plays nice with the match construct. For all of these reasons, it's very easy to deal with as something used to instantiate a generic type parameter.
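To make that concrete, here is roughly the desugaring being described; a sketch of my own, with `TokKind` and its constants invented for illustration:

```rust
// A classic closed enumeration is essentially a newtype over a
// hardware integer plus a pile of constants in a namespace.
#[derive(Clone, Copy, PartialEq, Eq)]
struct TokKind(u8);

impl TokKind {
    const IDENT: TokKind = TokKind(0);
    const NUMBER: TokKind = TokKind(1);
    const COMMENT: TokKind = TokKind(2);
}

fn describe(k: TokKind) -> &'static str {
    // Constant patterns still play nice with match; the wildcard arm
    // is the price of the representation being an open u8.
    match k {
        TokKind::IDENT => "identifier",
        TokKind::NUMBER => "number",
        TokKind::COMMENT => "comment",
        _ => "unknown",
    }
}
```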
>
> A [closed] sum type - especially an unboxed sum type, which is the Rust default - does not have a size that can be known when an unrelated library is compiled, and therefore makes for interesting itchy problems when it is used to instantiate a generic type.
>
> Once you introduce operations that expose sum type discriminators as values, those need a type, which would most naturally be some form of closed enumeration type. That type is missing in Rust.
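Rust does gesture at this with std::mem::discriminant, but the Discriminant<T> it returns is deliberately opaque, which is exactly the missing-type problem; you can compare two of them for equality and that's about it:

```rust
use std::mem::{discriminant, Discriminant};

enum Expr {
    Lit(i64),
    Neg(Box<Expr>),
}

fn main() {
    let a = Expr::Lit(1);
    let b = Expr::Lit(2);
    let c = Expr::Neg(Box::new(Expr::Lit(3)));

    // Discriminant<Expr> supports == and Hash, but exposes no integer
    // value and no closed enumeration of its possible inhabitants.
    let d: Discriminant<Expr> = discriminant(&a);
    assert_eq!(d, discriminant(&b));
    assert_ne!(d, discriminant(&c));
}
```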
>
> So if you go and build a library that defines a generic struct "struct Token<TokType>", where TokType is supplied by the library's consumer, the required constraints are unbelievably messy. And when you try to clean that up, you discover that the current implementation of type aliasing isn't type aliasing at all. Type aliasing is supposed to be handled by Beta reduction, but somewhere along the way somebody took a shortcut with the Rust substitution logic that I don't understand. It's a known issue, and I gather some people are looking at how to revise it. I wonder what compatibility issues (if any) will emerge.
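For anyone who hasn't hit this, the shape of the mess is something like the following; a rough reconstruction on my part, not the actual library, with the bound list chosen for illustration:

```rust
use std::fmt::Debug;
use std::hash::Hash;

// The library knows nothing about the consumer's token-type enum, so
// every capability it needs is spelled out as a bound, and the bound
// list leaks into every impl and helper signature that touches Token.
struct Token<TokType>
where
    TokType: Copy + Debug + PartialEq + Eq + Hash + 'static,
{
    kind: TokType,
    text: String,
}

impl<TokType> Token<TokType>
where
    TokType: Copy + Debug + PartialEq + Eq + Hash + 'static,
{
    fn new(kind: TokType, text: impl Into<String>) -> Self {
        Token { kind, text: text.into() }
    }
}
```

The obvious cleanup is to name that bound list once and reuse it, and that is where the type-alias (and still-unstable trait-alias) limitations bite.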
>
I have felt like this is a valid criticism, and I've been kicked by that horse too. I'm probably going to bleed into your next stray thought here as well, but I'm curious whether you've looked at #[repr(...)] on enums yet, and at their RFC extension, pattern types (pattern types are, in a sense, an extension of repr):
https://doc.rust-lang.org/stable/unstable-book/language-features/pattern-types.html

The following code fails to compile, with the error below:
```rust
#[repr(u8)]
enum Foo {
    Bar = 1,
}

#[repr(u8)]
#[derive(Debug)]
enum Bar {
    Baz(u8) = 2,
}

fn main() {
    let x = Foo::Bar;
    let y = Bar::Baz(3);
    eprintln!("{}", x as u8);
    eprintln!("{}", y as u8);
}
```
```console
   Compiling playground v0.0.1 (/playground)
error[E0605]: non-primitive cast: `Bar` as `u8`
  --> src/main.rs:16:19
   |
16 |     eprintln!("{}", y as u8);
   |                     ^^^^^^^ an `as` expression can be used to convert enum types to numeric types only if the enum type is unit-only or field-less
   |
   = note: see https://doc.rust-lang.org/reference/items/enumerations.html#casting for more information
```
Now, this doesn't really work in a generic context, for arbitrary T. There are some proc-macro crates that work with enums, in particular the strum crate... It acts a bit oddly in that it has a function that gives each enum variant a tag.

Anyhow, the way I've approached the problem is based on a fork of the strum proc macro. I never released it on crates.io; the crate needs work, because I only managed to upstream half of it to strum, and it's better to just avoid proc-macro inheritance (for lack of a better term):
https://github.com/ratmice/enum_extra
This defines a proc macro which derives a trait; the trait is only derived for enums that are sum-type like, and it proves some property about the enum variants. For example:

```rust
#[derive(NonZeroRepr, EnumMetadata)]
#[repr(i32)]
enum CompileFail {
    // Should fail to compile, because the NonZeroRepr proc macro
    // panics the compilation.
    A = X + 1,
}
```
If your proc macros then emit assertions about this kind of thing, you can get pretty good code generation that optimizes well, even though in this case we're routing through i32, which doesn't itself provide the NonZero property.
The way this works is that `EnumMetadata` is a trait whose derive produces a bunch of associated consts, such as the maximum variant number, and associated types, like the `i32` of the given repr. NonZeroRepr then maps the i32 to NonZeroI32, and so on.
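In case it helps to see the shape, here is a hand-written sketch of what such a derive might produce; the names (`Repr`, `MAX_DISCRIMINANT`, `to_nonzero`) are invented for illustration rather than taken from enum_extra:

```rust
use std::num::NonZeroI32;

// Sketch of the metadata trait: associated consts plus an associated
// type recording the enum's declared repr.
trait EnumMetadata {
    type Repr;
    const VARIANT_COUNT: usize;
    const MAX_DISCRIMINANT: Self::Repr;
    fn to_repr(&self) -> Self::Repr;
}

// The NonZeroRepr refinement: only derivable when the macro has
// checked that every discriminant is nonzero.
trait NonZeroRepr: EnumMetadata {
    type NonZero;
    fn to_nonzero(&self) -> Self::NonZero;
}

#[repr(i32)]
#[derive(Clone, Copy)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 3,
}

impl EnumMetadata for Color {
    type Repr = i32;
    const VARIANT_COUNT: usize = 3;
    const MAX_DISCRIMINANT: i32 = 3;
    fn to_repr(&self) -> i32 {
        *self as i32
    }
}

impl NonZeroRepr for Color {
    type NonZero = NonZeroI32;
    fn to_nonzero(&self) -> NonZeroI32 {
        // Safe to unwrap: the derive has verified no discriminant is 0,
        // so the optimizer can usually elide the check entirely.
        NonZeroI32::new(self.to_repr()).unwrap()
    }
}
```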
> Stray Thoughts
>
> This is mostly musing, but I think enums could be re-imagined in a compatible way as parameterized over their underlying discriminator's native integer type, such that the type produced by an enum declaration is something like "NewEnumType: Enum<PI: PrimInt>", where "Enum" is a primordial trait and PI is the concrete type of the discriminator, usually resolved by the compiler (this works for both discriminator and sum enumerations). In the presence of a size annotation, compatibility would require that type to be dealt with as well.
>
> The Enum constraint would mean that
>
> fn f(e: Enum<u8>)
>
>
> accepts any parameter for e that (a) is a Rust enum type, and (b) has an unsigned byte as its discriminator. I don't think this contrived example is useful more than once or twice a millennium, but it would justify "trait<E: Enum<D>> Discrim { fn discriminator(&self) -> D }", which is currently awkward to explain within the type system.
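For comparison, here is roughly how that trait has to be spelled today, with nothing enforcing that Self is actually an enum or that the associated type really is its discriminant; my sketch, reusing the names from the quoted text:

```rust
// What the hypothetical `trait<E: Enum<D>> Discrim` collapses to in
// current Rust: the enum-ness and the "D is the discriminator" claim
// are both unchecked conventions.
trait Discrim {
    type D;
    fn discriminator(&self) -> Self::D;
}

#[repr(u8)]
#[derive(Clone, Copy)]
enum TokType {
    Ident = 0,
    Number = 1,
    Comment = 2,
}

impl Discrim for TokType {
    type D = u8;
    fn discriminator(&self) -> u8 {
        *self as u8
    }
}
```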
>
> Which brings us to the "let's abuse the type theory" part you've all been sitting on the edge of your seats waiting for. Hang on to your chips....
>
> A bunch of people on forums have asked for a way to require discriminator-style enums as parameters for various reasons. I think it's a little late to introduce a kind system into Rust, but if you re-imagine enums as I've just described, then "E where E: Enum<D> + E: D" works quite nicely. The first part says that D is the discriminant type for E. The second part effectively says that E is a D (which is actually what we want for discriminator enums).
>
> Dang. Dropped my chips.
>
> Proc-Macro
>
> The fact that I can install a module that publishes a macro that runs arbitrary code at compile time without realizing that I did so is a little disturbing. Yeah, I get that "if I'm depending on the code in your module for all of my fans, neighbors, and customers, then I ought to share the experience" isn't an unreasonable position. But it moves the problem from "audit before run" to "audit before compile", and I suspect a lot of people don't realize that the threshold of vulnerability has moved.
>
> For a while, I was contemplating a similar approach for macros in BitC. The difference is that I planned to require such proc-macros to be escape-free, leaving the compile stage safe.
>
> I realized today that we probably want something a bit stronger than escape-free: a compiler-verifiable attribute that a function is confined, meaning that it is escape-free and that any function references in its result value are references to confined functions. The critical point about confined is that primordial procedures in the standard library that might leak information via system calls are considered "not confined".
>
> Assuming it's computable by the compiler, I think there are two really nice and really valuable properties we might obtain from this:
>
> - It's general: modules can annotate which of their exports are confined, and the compiler can validate this. (I suppose we could imagine "confined use", meaning that all imported identifiers are required to be confined.)
> - The crate repositories can perform this validation during the CI/CD process while the binary form of the crate is being compiled, and sign the binaries in a way that indicates the validation has been performed.
>
> If that's possible, then the number of crates in the world that present certain kinds of security risks seems to drop quite a bit, and we have an already trusted third party (in this case, crates.io) attesting that the property holds for the binary forms.
>
> We certainly don't want to require that crates on crates.io are confined, but it would be a really nice step forward.
>
> The really tricky part, I think, is that we might need "confined" to become part of function types so that this is modular across independently authored crates, and we would need some scheme that connects the dots in the right ways.
>
> It's sort of like "SES for Rust". Somebody tell MarkM, Crock, and the SES crowd so they can get a smile out of that. It's not dull, at least.
>