--
You received this message because you are subscribed to the Google Groups "cap-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cap-talk+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/cap-talk/CAAP%3D3QOcKgj%3D%3Dc27G2kBV7f6KqTCPquT89YZ7SnSAZ1DhzRLtw%40mail.gmail.com.
Borrow Checker
You hear a lot of people talking about the borrow checker and how it is hard to understand. I've hit a couple of cases where an item was retired by a copy I didn't recognize, but the diagnostics from the compiler are quite good at explaining what happened - I'm sure this wasn't always the case.
I've hit two issues, one mostly just surprising and the other annoying. The surprising one is the tendency to introduce borrows of borrows of borrows that aren't necessary. Hypothesis: if "&(**foo)" works, the analyzer probably shouldn't have suggested deeper borrowing.
cargo-mk wants to do dependency ordering, which leaves me with two indexing structures: a HashMap for name lookup and a vector for order-of-build. There's a cycle check in there, and it would be very natural in other languages to have a boolean field saying "we've already checked this one, and it is cycle free". But this requires mutability, and rust won't let you do that straightforwardly when the data structure is indexed in more than one way. I'm aware of several workarounds and I used one.
Lifetime Variables
This is one of the weak spots, but it's mainly a matter of missing documentation. I think these are actually region type variables (cf. Cyclone), but even now I'm not sure.
When the compiler demands introduction of lifetime variables, it doesn't say why, and the inference rules are not clear.
I find that if I assume they are actually region type variables and decorate accordingly I can get by, but it would be really nice (a) to have that intuition confirmed or corrected, and (b) to better understand what the inference rules are. If I build a list of Box<& 'a Foo>, where Foo has lifetime 'a, what lifetime is assigned to the boxes?
The 'static Lifespan
The 'static lifespan, which the Coyotos kernel and any fast tokenizer are going to rely on a lot, is subtly dangerous. I don't have an intrinsic problem with this - it's doing what I told it to do - but (contrary to documentation) it isn't safe. It's surprisingly easy to generate crashes that the compiler does not diagnose.
If V is a vector, and Vslice is a slice of that vector, then any operation that re-sizes V invalidates Vslice leaving dangling pointers. For ordinary lifespans this is diagnosed by the compiler as a lifetime violation, because the slice lifetimes must exit before the vector lifetime, but for 'static there appears to be a reductio bug (I need to test this).
Since 'static is (effectively) the "forever" lifespan, it's reasonable to imagine that the reference lifespan induction "grounds out" at 'static, and the compiler therefore allows a 'static slice to be constructed from a 'static vector. The problem with this is that it isn't correct when the 'static vector is mutable and can be resized.
Exports Puzzle Me
If we have a workspace containing crates A and B-lib, where A depends on B-lib, we may not want to export B-lib public items to consumers outside the workspace. Same issue for a package having multiple sub-crates. It may also be true that we have fields that should be visible within the package but not visible to the public.
I think this is something I don't understand yet, but it provisionally feels to me as if visibility is half-baked. I obviously need to read up.
It would be really helpful to have a tool I could point at a crate, package, or workspace that would tell me what symbols are exported. Does such a tool exist?
The main issue I see with build.rs is that it causes common tools to get re-built in every consuming build. If capidl is bound as a library, then it is compiled into the build.rs binary in every consuming workspace. It is totally unclear to me why that should be necessary even given the desire for artifact traceability.
Indeed, the rustc developers realized this would be a problem, and put a lot of effort into building helpfully structured error messages.
> I've hit two issues, one mostly just surprising and the other annoying. The surprising one is the tendency to introduce borrows of borrows of borrows that aren't necessary. Hypothesis: if "&(**foo)" works, the analyzer probably shouldn't have suggested deeper borrowing.
Can you say more about where you are seeing this suggestion? In most cases (not all!), either one `&` or one `*` is sufficient. Perhaps you are confusing rust-analyzer’s “inlay hints”, which tell you what borrowing is automatically happening, with suggestions to add it explicitly?
> cargo-mk wants to do dependency ordering, which leaves me with two indexing structures: a HashMap for name lookup and a vector for order-of-build. There's a cycle check in there, and it would be very natural in other languages to have a boolean field saying "we've already checked this one, and it is cycle free". But this requires mutability, and rust won't let you do that straightforwardly when the data structure is indexed in more than one way. I'm aware of several workarounds and I used one.
I would say that the most appropriate solution *for a marking problem* is to use interior mutability: either Cell<bool> (for single-threaded-only programs) or AtomicBool.
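A minimal sketch of the Cell<bool> suggestion, assuming a hypothetical `Node` type and hypothetical helper names (`make_nodes`, `mark_by_name`); none of these come from cargo-mk. Both indexes hold plain shared references to the same nodes, and the flag is set through one index and read through the other:

```rust
use std::cell::Cell;
use std::collections::HashMap;

// Hypothetical node type: only `checked` changes after construction.
struct Node {
    name: String,
    checked: Cell<bool>,
}

// Builds owned nodes with the flag cleared.
fn make_nodes(names: &[&str]) -> Vec<Node> {
    names
        .iter()
        .map(|n| Node { name: n.to_string(), checked: Cell::new(false) })
        .collect()
}

// Indexes the same nodes two ways, sets the flag via the name index,
// and returns the flags as seen through the build-order index.
fn mark_by_name(nodes: &[Node], target: &str) -> Vec<bool> {
    let by_name: HashMap<&str, &Node> =
        nodes.iter().map(|n| (n.name.as_str(), n)).collect();
    let build_order: Vec<&Node> = nodes.iter().collect();

    if let Some(node) = by_name.get(target) {
        // Cell::set takes &self, so a shared reference is enough.
        node.checked.set(true);
    }
    build_order.iter().map(|n| n.checked.get()).collect()
}

fn main() {
    let nodes = make_nodes(&["a", "b", "c"]);
    println!("{:?}", mark_by_name(&nodes, "b"));
}
```

Everything except the flag stays immutable here, which seems to match the "rest of the structure immutable" requirement. For multi-threaded use, AtomicBool would take the place of Cell<bool>.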
let namemap: HashMap<String, &mut MyStruct> = HashMap::new()
let vec_of_map: Vector<& MyStruct> = vec![]
let safemap: HashMap<String, bool> ...
> Lifetime Variables
> This is one of the weak spots, but it's mainly a matter of missing documentation. I think these are actually region type variables (c.f. Cyclone), but even now I'm not sure.
I’m not up on programming-language-theory enough to tell you the right technical terms, but the way I think about it is that a lifetime (not a lifetime variable!) specifies an end-point in time, at which some set of references becomes invalid (when they must not be used). Note I said “end-point”: this is because the start-point is defined by the reference value coming to exist. (Therefore, it is possible to introduce new references under already-existing lifetimes, as long as they will continue to be valid for long enough.)
> When the compiler demands introduction of lifetime variables, it doesn't say why, and the inference rules are not clear.
The inference rules for contexts where an explicit variable might or might not be required are the lifetime elision rules: <https://doc.rust-lang.org/reference/lifetime-elision.html#lifetime-elision-in-functions>
Note that while these rules involve types, they are driven purely by the grammar of types (that is, “does this generic type have a lifetime parameter?”), not by more elaborate type inference.
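As an illustration of the elision rules for functions, here is a small sketch; the function names are made up for the example. With a single input reference, the output lifetime is taken from it. With two input references, elision has nothing to pick from, and that is when the compiler asks for an explicit lifetime variable:

```rust
// Elided form: one input reference, so the returned reference
// is assumed to borrow from it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the elided lifetime written out.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

// Two input references: elision cannot choose between them, so the
// lifetime variable has to be written explicitly.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));
    println!("{}", longer("ab", "abc"));
}
```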
& 'b Box::new(item: & 'a T) [implicitly: where 'b < 'a because the box is constructed from the item]
> I find that if I assume they are actually region type variables and decorate accordingly I can get by, but it would be really nice (a) to have that intuition confirmed or corrected, and (b) to better understand what the inference rules are. If I build a list of Box<& 'a Foo>, where Foo has lifetime 'a, what lifetime is assigned to the boxes?
“What lifetime is assigned to the boxes?” is an ill-formed question. The type Box<i32> contains no lifetimes, and the type Box<&'a Foo> contains one lifetime. A common confusion is to think that lifetimes describe how long values exist; this is not true. Lifetimes describe how long borrows exist, or how long references are valid for. Box<i32> is not a reference and does not contain any references.
> The 'static Lifespan
> The 'static lifespan, which the Coyotos kernel and any fast tokenizer are going to rely on a lot, is subtly dangerous. I don't have an intrinsic problem with this - it's doing what I told it to do - but (contrary to documentation) it isn't safe. It's surprisingly easy to generate crashes that the compiler does not diagnose.
> If V is a vector, and Vslice is a slice of that vector, then any operation that re-sizes V invalidates Vslice leaving dangling pointers. For ordinary lifespans this is diagnosed by the compiler as a lifetime violation, because the slice lifetimes must exit before the vector lifetime, but for 'static there appears to be a reductio bug (I need to test this).
> Since 'static is (effectively) the "forever" lifespan, it's reasonable to imagine that the reference lifespan induction "grounds out" at 'static, and the compiler therefore allows a 'static slice to be constructed from a 'static vector. The problem with this is that it isn't correct when the 'static vector is mutable and can be resized.
I don't have enough information to tell you what exactly is wrong, but from this description, you have written unsafe code that contains undefined behavior. It is not possible (without exploiting very tricky type-system holes) to use 'static to create dangling pointers.
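To make the Box<&'a Foo> case concrete, a small sketch with a hypothetical `Foo` and hypothetical helpers `box_all` and `ids`. The only lifetime that appears anywhere in the list type is the one on the reference inside each box; the boxes themselves are owned values:

```rust
// Hypothetical payload type for the example.
struct Foo {
    id: u32,
}

// The sole lifetime in `Vec<Box<&'a Foo>>` is the one on `&'a Foo`:
// it says how long the borrow of each Foo is valid, not how long
// the boxes (or the vector) exist.
fn box_all<'a>(foos: &'a [Foo]) -> Vec<Box<&'a Foo>> {
    foos.iter().map(Box::new).collect()
}

// Reads the ids back through the boxes.
fn ids(list: &[Box<&Foo>]) -> Vec<u32> {
    list.iter().map(|b| b.id).collect()
}

fn main() {
    let foos = vec![Foo { id: 1 }, Foo { id: 2 }];
    let boxed = box_all(&foos);
    println!("{:?}", ids(&boxed));
    // `boxed` may be dropped at any point; it just may not be used
    // after the borrow of `foos` has to end.
}
```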
If you have an &'static reference to a vector, then mutating that vector by other means than that specific reference is UB.
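A sketch of the 'static vector and slice case in safe Rust, using hypothetical helper names (`leak_vec`, `freeze_tail`). It is only meant to show the shape of the rule: a 'static slice can be taken from a leaked vector, but doing so uses up the exclusive reference, so the vector can no longer be resized through safe code:

```rust
// Leaks a vector so it lives for the rest of the program and returns
// the unique exclusive reference to it.
fn leak_vec(v: Vec<u8>) -> &'static mut Vec<u8> {
    Box::leak(Box::new(v))
}

// Turns the exclusive 'static reference into a shared 'static slice.
// The vector stays borrowed for 'static, so it cannot be mutated again.
fn freeze_tail(v: &'static mut Vec<u8>) -> &'static [u8] {
    &v[1..]
}

fn main() {
    let v = leak_vec(vec![1, 2, 3]);
    v.push(4); // fine: no slice of the vector exists yet
    let tail = freeze_tail(v);
    // v.push(5); // expected to be rejected by the borrow checker
    println!("{:?}", tail);
}
```

If a program ends up with both a live 'static slice and a way to resize the vector, the assumption here is that some unsafe code (or a static mut) is involved along the way.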
let buf & mut [u8] = vec![]
let buf_slice = buf[1:] // error
> Exports Puzzle Me
> If we have a workspace containing crates A and B-lib, where A depends on B-lib, we may not want to export B-lib public items to consumers outside the workspace. Same issue for a package having multiple sub-crates. It may also be true that we have fields that should be visible within the package but not visible to the public.
> I think this is something I don't understand yet, but it provisionally feels to me as if visibility is half-baked. I obviously need to read up.
Rust does not, currently, support any notion of fine-grained visibility beyond the crate, only visibility within modules. Are you sure you need separate library crates and not modules?
> It would be really helpful to have a tool I could point at a crate, package, or workspace that would tell me what symbols are exported. Does such a tool exist?
It is conventional to use rustdoc (cargo doc --open) for this purpose, and consider whatever you can see in the documentation to be what is public for API design and stability purposes.
> The main issue I see with build.rs is that it causes common tools to get re-built in every consuming build. If capidl is bound as a library, then it is compiled into the build.rs binary in every consuming workspace. It is totally unclear to me why that should be necessary even given the desire for artifact traceability.
There is work towards Cargo being able to share artifacts between multiple workspaces. (This will not completely prevent all rebuilds, because different workspaces can have different compilation options and package features, but it will help a lot.)
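For the modules-instead-of-crates suggestion, a minimal sketch of what visibility can express inside a single crate. The module name `b_lib`, the `Record` type, and its fields are invented for the example:

```rust
// A module standing in for what might otherwise be a separate library crate.
mod b_lib {
    pub struct Record {
        pub name: String,            // visible wherever Record is visible
        pub(crate) internal_id: u32, // visible anywhere in this crate, not outside it
        secret: u8,                  // visible only inside `b_lib`
    }

    impl Record {
        pub fn new(name: &str) -> Self {
            Record { name: name.to_string(), internal_id: 7, secret: 1 }
        }

        // Callable from the rest of the crate, but not part of the public API.
        pub(crate) fn bump(&mut self) {
            self.internal_id += 1;
            self.secret += 1;
        }

        pub fn secret_is_set(&self) -> bool {
            self.secret != 0
        }
    }
}

// Code elsewhere in the same crate can reach the pub(crate) items.
fn crate_internal_id() -> u32 {
    let mut r = b_lib::Record::new("demo");
    r.bump();
    r.internal_id
}

fn main() {
    let r = b_lib::Record::new("demo");
    println!("{} {} {}", r.name, crate_internal_id(), r.secret_is_set());
}
```

This only covers the within-one-crate case; it does not give "visible to crate A but not to the outside world" across separate crates.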
The main issue I see with build.rs is that it causes common tools to get re-built in every consuming build. If capidl is bound as a library, then it is compiled into the build.rs binary in every consuming workspace. It is totally unclear to me why that should be necessary even given the desire for artifact traceability. And it has the (perhaps unintended) consequence that pre-build artifacts are successfully obscured from the human developer - one must dig into build.rs to find out what they are.
On Sat, Jan 3, 2026 at 3:46 PM Kevin Reid <kpr...@switchb.org> wrote:
>> cargo-mk wants to do dependency ordering, which leaves me with two indexing structures: a HashMap for name lookup and a vector for order-of-build. There's a cycle check in there, and it would be very natural in other languages to have a boolean field saying "we've already checked this one, and it is cycle free". But this requires mutability, and rust won't let you do that straightforwardly when the data structure is indexed in more than one way. I'm aware of several workarounds and I used one.
> I would say that the most appropriate solution *for a marking problem* is to use interior mutability: either Cell<bool> (for single-threaded-only programs) or AtomicBool.
But if I understand the borrow rules, that doesn't actually work. In the case at hand, we have:
let namemap: HashMap<String, &mut MyStruct> = HashMap::new()
let vec_of_map: Vector<& MyStruct> = vec![]
We want those references to refer to the same object, with the result that setting HashMap["some/path"].safe = true is visible when accessing the same MyStruct within vec_of_map. But (a) the borrow rules don't allow this aliasing, and (b) we kind of want the rest of the structure to be immutable.
> I’m not up on programming-language-theory enough to tell you the right technical terms, but the way I think about it is that a lifetime (not a lifetime variable!) specifies an end-point in time, at which some set of references becomes invalid (when they must not be used). Note I said “end-point”: this is because the start-point is defined by the reference value coming to exist. (Therefore, it is possible to introduce new references under already-existing lifetimes, as long as they will continue to be valid for long enough.)
For clarity: when we are asked to add a lifetime annotation 'a, the 'a is not a lifetime. It is a lifetime type variable, used mainly to unify lifetimes within a type declaration.
> The inference rules for contexts where an explicit variable might or might not be required are the lifetime elision rules: <https://doc.rust-lang.org/reference/lifetime-elision.html#lifetime-elision-in-functions>
> Note that while these rules involve types, they are driven purely by the grammar of types (that is, “does this generic type have a lifetime parameter?”), not by more elaborate type inference.
We may be talking past each other here. If you have a construction involving many elements, some annotated with lifetime variable 'a and others annotated with lifetime variable 'b, the rule is that all 'a annotations denote the same lifetime, all 'b annotations denote the same lifetime, and the relationship between 'a and 'b (if any can be established) is determined by inference constraints. For example:
& 'b Box::new(item: & 'a T) [implicitly: where 'b < 'a because the box is constructed from the item]
means that the lifetime of the constructed box (the 'b) is less than the lifetime of the [interior] boxed type. Formally, the constraint here is 'b < 'a. Though in the absence of explicit T::Drop it would technically be acceptable for the constraint to be "'b <= 'a". That isn't the rust specification; I'm saying it's okay as a matter of correctness as long as there are no finalizers for T.
The reason I'm harping on this a little is that if you have something with 'static lifespan way up at the beginning of the call stack, it can unify its way down the call stack through the lifespan type variables, and you can find yourself very deep in the call stack constructing an instance with lifespan 'a = 'b = ... = 'z = 'static by virtue of chained lifetime variable inference. Which can lead to very interesting, and very useful, and occasionally very confusing lifetimes getting bound to allocations that appear deep in the call stack.
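A small sketch of how 'static can arrive through a lifetime variable, with hypothetical function names. `boxed` knows nothing about 'static; the caller's argument decides what 'a is instantiated to:

```rust
// Generic over the lifetime: the result carries whatever 'a the caller supplied.
fn boxed<'a>(item: &'a str) -> Box<&'a str> {
    Box::new(item)
}

// Accepts only boxes whose inner reference is valid for 'static.
fn needs_static(b: Box<&'static str>) -> usize {
    b.len()
}

// Stands in for code deep in a call stack: the string literal is a
// &'static str, so 'a is instantiated as 'static inside `boxed`.
fn deep_in_the_stack() -> usize {
    let b = boxed("literal");
    needs_static(b)
}

fn main() {
    println!("{}", deep_in_the_stack());
}
```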
> “What lifetime is assigned to the boxes?” is an ill-formed question. The type Box<i32> contains no lifetimes, and the type Box<&'a Foo> contains one lifetime. A common confusion is to think that lifetimes describe how long values exist; this is not true. Lifetimes describe how long borrows exist, or how long references are valid for. Box<i32> is not a reference and does not contain any references.
From a type theory perspective, that is not correct. Every construction implicitly has a lifetime variable for the constructed thing.
Well, I'm not sure how tricky the type system holes are, but I agree with you. My point was that these type system holes are undiagnosed by the compiler, and that is a bug.
> If you have an &'static reference to a vector, then mutating that vector by other means than that specific reference is UB.
Yes. But I'm saying it shouldn't be. I'm saying that it's a type error, and it should be diagnosed. If taking a slice of a mutable vector and then appending to the underlying vector is undefined behavior, then any Rust claim to memory safety is just flatly and unequivocally a lie. To be clear, I think this is a lifespan handling bug in the compiler rather than a fundamental flaw in the language. I'm just saying there's a bug. Given:
let buf & mut [u8] = vec![]
let buf_slice = buf[1:] // error
I claim that construction of the slice is a [static] type error because the vector is mutable and can therefore be appended with the result of invalidating the underlying storage locations and leaving the slice value unsafe.
> It is conventional to use rustdoc (cargo doc --open) for this purpose, and consider whatever you can see in the documentation to be what is public for API design and stability purposes.
I'm not sure that has any relationship to what is public from a language definition perspective. There's no guarantee that a publicly exported function has any documentation at all.
What I'm after is "here is the set of symbols that are recognized as resolvable at the linker level of abstraction."
> There is work towards Cargo being able to share artifacts between multiple workspaces. (This will not completely prevent all rebuilds, because different workspaces can have different compilation options and package features, but it will help a lot.)
That seems constructive, and yes, certain options dictate distinct builds. I'm a little suspicious that feature unification across target and host builds (workspaces and build-workspaces, if you will) may turn out to be entertaining.
I guess I have a little difficulty with Kevin's explanation that they are borrowings not objects when we look at a particularly weird case like, https://github.com/ratmice/fnmutant/blob/master/src/lib.rs
It isn't clear what is being borrowed here, the type variables are just generic types which can be owned/borrowed/whatever. the usage of the type variable `for <'a> Fn(...)` is saying that the function is valid for all lifetimes 'a.
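For readers unfamiliar with the `for<'a> Fn(...)` form, a minimal sketch that is unrelated to the linked fnmutant code; `apply_to_both` and `trim_it` are names made up for the example. The bound says the function argument must work for every lifetime, so it can be applied to two borrows that end at different times:

```rust
// `f` must accept a &str of any lifetime and return a &str of that same lifetime.
fn apply_to_both<F>(f: F, a: &str, b: &str) -> (usize, usize)
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    // `a` and `b` have unrelated lifetimes; the higher-ranked bound
    // is what lets the same `f` be used on both.
    (f(a).len(), f(b).len())
}

// An ordinary function with an elided lifetime satisfies the bound.
fn trim_it(s: &str) -> &str {
    s.trim()
}

fn main() {
    let owned = String::from("  hi  ");
    // One borrow of a local String, one 'static string literal.
    let lens = apply_to_both(trim_it, &owned, " static ");
    println!("{:?}", lens);
}
```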
I guess what I'm really trying to say here, regardless of that code, is that implied bounds seem to act in mysterious ways on *types* rather than on borrows.
On Sat, Jan 3, 2026 at 10:26 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
> But if I understand the borrow rules, that doesn't actually work. In the case at hand, we have:
> let namemap: HashMap<String, &mut MyStruct> = HashMap::new()
> let vec_of_map: Vector<& MyStruct> = vec![]
> We want those references to refer to the same object, with the result that setting HashMap["some/path"].safe = true is visible when accessing the same MyStruct within vec_of_map. But (a) the borrow rules don't allow this aliasing, and (b) we kind of want the rest of the structure to be immutable.
The point of using Cell or atomics is that you then don't require &mut to change the value of the cell. (&mut is a bit of a misnomer; it is often more accurate to describe it as an exclusive reference than a mutable reference.) But perhaps I don't understand the situation.
Also, my general advice to Rust beginners is that you should refrain from building primary data structures out of references.
let namemap: HashMap<String, &mut 'static FileInfo> = HashMap::new()
let vec_of_map: Vector<& 'static FileInfo> = vec![]
Cell<& 'static FileInfo>?
& 'static Cell<& 'static FileInfo>
The simple rule of thumb is that references should be used for temporary purposes only. As a refinement of this, in applications like compilers which have phases/passes where some results are computed in each pass, it's okay to take long-lived & references to pass 1 while in pass 2. But it’s not clear to me whether that’s what you are doing.
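One way to follow the "no references in primary data structures" advice is to let a single vector own the items and have the name index store positions. This is a sketch under that assumption; `Package`, `Graph`, and the method names are invented for the example:

```rust
use std::collections::HashMap;

// Hypothetical package record with a plain (non-Cell) flag.
struct Package {
    name: String,
    cycle_free: bool,
}

struct Graph {
    packages: Vec<Package>,          // sole owner; vector order is build order
    by_name: HashMap<String, usize>, // name -> index into `packages`
}

impl Graph {
    fn new(names: &[&str]) -> Self {
        let packages: Vec<Package> = names
            .iter()
            .map(|n| Package { name: n.to_string(), cycle_free: false })
            .collect();
        let by_name = packages
            .iter()
            .enumerate()
            .map(|(i, p)| (p.name.clone(), i))
            .collect();
        Graph { packages, by_name }
    }

    // Looks up by name, mutates through the owning vector.
    // Returns false if the name is unknown.
    fn mark_cycle_free(&mut self, name: &str) -> bool {
        match self.by_name.get(name) {
            Some(&i) => {
                self.packages[i].cycle_free = true;
                true
            }
            None => false,
        }
    }

    fn flags_in_build_order(&self) -> Vec<bool> {
        self.packages.iter().map(|p| p.cycle_free).collect()
    }
}

fn main() {
    let mut g = Graph::new(&["a", "b"]);
    g.mark_cycle_free("b");
    println!("{:?}", g.flags_in_build_order());
}
```

The trade-off is that the indices are only meaningful as long as the vector is not reordered or shrunk.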
> For clarity: when we are asked to add a lifetime annotation 'a, the 'a is not a lifetime. It is a lifetime type variable, used mainly to unify lifetimes within a type declaration.
Yes, that’s right. But lifetime variables are used to discuss lifetimes, so knowing what lifetimes are is the first step to understanding lifetime variables.
> The inference rules for contexts where an explicit variable might or might not be required are the lifetime elision rules: <https://doc.rust-lang.org/reference/lifetime-elision.html#lifetime-elision-in-functions>
> Note that while these rules involve types, they are driven purely by the grammar of types (that is, “does this generic type have a lifetime parameter?”), not by more elaborate type inference.
'b does not refer to a lifetime “of the box”. It is a lifetime of a particular borrow of the box. This is a key distinction, because whenever the box's contents are mutated, this invalidates all prior borrows of the box (ends their lifetime).
IIUC, the problem here is that Rust’s “lifetimes” are not the same thing as type theory’s “lifetimes”. As I said above, Rust lifetimes are of borrowings, not objects. It is certainly an unfortunate choice of terms, but you need to keep this distinction in mind or you will continue to be confused by Rust.
Compiler bugs are of course possible, but my claim is that it is more likely that you have written incorrect unsafe code, than that you have hit a compiler bug. (Of course, that is easily disproven if your program contains no unsafe code.) Stand-alone compilable sample code to discuss would be very helpful to this discussion.
To refine my previous message: you can already configure Cargo to use the same build directory for multiple workspaces...
The most common and dramatic example of this is that T: 'static does not mean that values of type T must continue to exist until the program exits; rather, it means that a value of type T can be kept around until the program exits, if its owner so chooses.
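A short sketch of the `T: 'static` point, with a hypothetical function `describe`. An owned String satisfies the bound even though it is dropped almost immediately:

```rust
use std::fmt::Debug;

// `T: 'static` means T holds no borrows shorter than 'static.
// It does not require the value itself to live for the whole program.
fn describe<T: Debug + 'static>(value: T) -> String {
    format!("{:?}", value)
    // `value` is dropped here, long before the program exits.
}

fn main() {
    // An owned String contains no borrowed data, so String: 'static holds.
    let s = String::from("owned");
    println!("{}", describe(s));

    // A reference to a local would not satisfy the bound:
    // let local = 5;
    // describe(&local); // expected to be rejected: `local` does not live long enough
}
```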
On Mon, Jan 5, 2026 at 5:37 AM Kevin Reid <kpr...@switchb.org> wrote:
> Or, in another perspective, it means “if values of type T contain any references, they must be &'static references” (but that’s a simplification, because it’s possible for a type to be meaningfully constrained by a lifetime parameter without any actual references per se existing within it).
Anyhow I think it is a nice simplification, since once you've gone
outside of that you are probably finding yourself in the weeds.
Turns out I simplified my example in counterproductive ways. The actual code is a tokenizer that uses permanent & 'static [u8] buffers for the content of each file, and creates & 'static FileInfo structures to associate a file name or input source with the buffer. So the actual data structure pair is closer to:
let namemap: HashMap<String, &mut 'static FileInfo> = HashMap::new()
let vec_of_map: Vector<& 'static FileInfo> = vec![]
and it is important here that the same FileInfo reference be able to be stored in both collections. The problem here is that one of them needs to be mutable so that one of its fields can be updated later. Once updated, that update needs to be visible when looked up from *either* container.
I'm not sure where the Cell wrapper would go here?
> 'b does not refer to a lifetime “of the box”. It is a lifetime of a particular borrow of the box. This is a key distinction, because whenever the box's contents are mutated, this invalidates all prior borrows of the box (ends their lifetime).
This statement was very helpful. It suggests that what Rust is calling "lifetimes" is actually something that has historically been called "extent". Extents describe the lifetimes of bindings. When prior borrows are invalidated, I now think what's actually happening is that their bindings are revoked and the extent of those bindings therefore ends. After that, those names can no longer be used to reference the data they used to reference.
This is distinct from "liveness", which describes the lifetime of the underlying value. Reclamation can only occur safely when liveness ends, and liveness cannot end while bindings with valid extents continue to exist.
Coming from a hardware point of view, the fact that rust has undefined behavior seems like a huge red flag. In the unsafe language subset, undefined behavior is sometimes unavoidable because hardware has it and then you're stuck with it. But in the safe part of the language, undefined behavior amounts to saying that the language doesn't have defined semantics.
On Sun, Jan 4, 2026 at 9:37 PM Kevin Reid <kpr...@switchb.org> wrote:
> The most common and dramatic example of this is that T: 'static does not mean that values of type T must continue to exist until the program exits; rather, it means that a value of type T can be kept around until the program exits, if its owner so chooses.
Hmm. I see what you are saying, but if I understand things correctly the only ways you can end up with 'static are something being global, something getting re-owned into a 'static container (a reductio case), something in the heap that has been leaked, or something that gets boxed by box::new() that has a lifetime variable in the right position that got unified with one of the previous cases.
At very least there is a difference in lifetime requirements once 'static enters the picture. For everything else, the rule is that if A [might] hold a reference to B then 'a < 'b (where these are the associated lifetime variables for the reference to A and the contained reference to B, respectively). But in the case where 'b == 'static, the lifetime constraint is actually 'a <= 'b.
In many cases, we could accept a <= constraint for the first case as well. The '=' only needs to be removed if there is something that imposes an ordering on reclamation, like a finalizer. I believe "drop" semantics imposes that requirement as well. Conceptually, things in 'static cannot have drop semantics even if they carry the trait. It's surprisingly hard to run drop code after you exit...
On Wed, Jan 7, 2026 at 12:37 PM Jonathan S. Shapiro <jonathan....@gmail.com> wrote:
The data structure you have declared and described here cannot be constructed in Rust. A &mut reference is an exclusive reference.
The borrow checker will not allow you to construct simultaneously usable & and &mut references, and if you use unsafe code to do it anyway, the program exhibits undefined behavior. Everything that follows from that cannot be attributed to any “type system hole”; the hole is that you’ve applied a sledgehammer to the wall and the building is falling down.
The correct way to have shared, mutable parts of your data is to put the shared, mutable parts in an interior-mutability type such as RefCell, Mutex, Cell, AtomicBool, etc, and then use & shared references or Rc to refer to the whole data structures as needed.
> I'm not sure where the Cell wrapper would go here?
The Cell goes on the boolean field.
struct FileInfo {
    ....
    marked: Cell<bool>,
}
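Putting the pieces together for the tokenizer case, here is a sketch that assumes the FileInfo values are leaked to obtain 'static references. The fields `name` and `content` and the helper `register` are guesses for illustration, not the actual tokenizer code. Both containers hold the same shared `&'static FileInfo`, and only the flag is mutable:

```rust
use std::cell::Cell;
use std::collections::HashMap;

// Hypothetical layout: everything is immutable except `marked`.
struct FileInfo {
    name: String,
    content: &'static [u8],
    marked: Cell<bool>,
}

// Leaks one FileInfo and stores the same shared reference in both containers.
fn register(
    name: &str,
    content: &'static [u8],
    by_name: &mut HashMap<String, &'static FileInfo>,
    in_order: &mut Vec<&'static FileInfo>,
) -> &'static FileInfo {
    let info: &'static FileInfo = Box::leak(Box::new(FileInfo {
        name: name.to_string(),
        content,
        marked: Cell::new(false),
    }));
    by_name.insert(name.to_string(), info);
    in_order.push(info);
    info
}

fn main() {
    let mut by_name = HashMap::new();
    let mut in_order = Vec::new();
    register("a.idl", b"abc", &mut by_name, &mut in_order);
    register("b.idl", b"xyz", &mut by_name, &mut in_order);

    // Update through the map; the change is visible through the vector.
    by_name["b.idl"].marked.set(true);
    for f in &in_order {
        println!("{} ({} bytes): marked={}", f.name, f.content.len(), f.marked.get());
    }
}
```

Cell is not thread-safe, so these references cannot be shared across threads; AtomicBool would be the substitute if that matters.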
This may be an accurate analogy — again, I don’t have enough PL theory to comment — but note that in Rust a “binding” is a variable or perhaps its introduction — `let (x, y) = z` has two bindings — and both references and the things they borrow do not necessarily exist in bindings, but are simply values. What you describe sounds more like the theory of a language like Lisp or Java where values may be implicitly shared among many bindings.
Safe Rust cannot be responsible for undefined behavior. Unsafe Rust can be.
Note that I’m not saying “Safe Rust cannot cause undefined behavior” because safe code can call a safe function that contains unsafe code, and therefore be part of the causality of the undefined behavior. But if such a call results in undefined behavior, we consider this a bug in the unsafe code; we say that the particular safe function containing unsafe code is unsound.
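To illustrate what "a safe function that contains unsafe code" looks like, a small sketch with a hypothetical `first_byte` function. The wrapper is sound because it establishes the precondition of the unsafe call itself; the comment describes the change that would make it unsound:

```rust
// Safe to call with any slice: the emptiness check guarantees the
// precondition of `get_unchecked`, so callers cannot trigger UB.
fn first_byte(buf: &[u8]) -> Option<u8> {
    if buf.is_empty() {
        None
    } else {
        // SAFETY: `buf` is non-empty, so index 0 is in bounds.
        Some(unsafe { *buf.get_unchecked(0) })
    }
}

// If the emptiness check were removed, the function would still have a
// safe signature, but calling it with an empty slice would be undefined
// behavior. That version would be called unsound, and the bug would be
// in this function, not in its (safe) callers.

fn main() {
    println!("{:?} {:?}", first_byte(&[9, 8]), first_byte(&[]));
}
```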
Actually, I believe all lifetime constraints are <= constraints (at least, the only kind you can write explicitly are). This is not a problem for finalization, because drop ordering is defined completely independently of lifetimes.
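The explicit form of a lifetime constraint is the outlives bound `'a: 'b`. A sketch with made-up function names, showing that the bound is satisfied when both sides are the same lifetime, i.e. it behaves as "<=" rather than "<":

```rust
// `'a: 'b` reads "'a outlives 'b" and includes the case 'a == 'b.
fn shorten<'a, 'b>(x: &'a str) -> &'b str
where
    'a: 'b,
{
    x
}

// Input and output share one lifetime, so both of `shorten`'s
// parameters end up instantiated with the same 'a. This is accepted
// because the bound is not strict.
fn same_lifetime<'a>(x: &'a str) -> &'a str {
    shorten(x)
}

fn main() {
    let s = String::from("hello");
    println!("{}", same_lifetime(&s));
}
```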
In general, lifetime analysis never affects the behavior of the program, only whether compilation succeeds or fails. This is an intentional design constraint; it allows borrow checking to be clever, and for future versions of Rust to increase how many programs the borrow checker accepts by redesigning the borrow checking algorithm, without any of that influencing the correctness of the program (such as it might if, say, the order of two drop side-effects were swapped).
On Wed, Jan 7, 2026 at 2:14 PM Kevin Reid <kpr...@switchb.org> wrote:
> Safe Rust cannot be responsible for undefined behavior. Unsafe Rust can be.
That hasn't always matched my experience so far, but I did say at the beginning that I chose my first programs with some amount of malice. Regardless, deficiencies in the compiler (if any) shouldn't be confused with deficiencies in the language specification. Could have been my error, but if I find that I've reconstructed them I'll pass examples along.
From a PL perspective, I'd argue that unsafety is both transitive and sticky, and that (absent external proof) any code that depends on unsafe code is, from an end-to-end perspective, unsafe by definition.
It does, however, mean that the casual statement that the lifetime of the reference must be shorter than that of the thing referred to is not strictly correct. A better formulation would be "the lifetime must be no longer than that of the thing referred to." Not a big deal, except that phrasing was used often enough with me that it led to a substantially incorrect understanding of how things had to be working.