Approaches to the AppSlice Aliasing Soundness Issues

Leon Schuermann

May 10, 2021, 10:38:28 AM

to Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Johnathan Van Why, Philip Levis

Hello!

As part of the development of Tock 2.0, some unsoundness regarding the
handling of Upcalls (formerly Callbacks) was discovered, which is
supposed to be fixed with the release of Tock 2.0. While the Upcall
story seems to be figured out (for the near future)[0], the
AppSlice-related issues still require a solution. In particular, we
have so far permitted apps to allow overlapping memory regions, which
are then converted to mutable Rust slices in the Tock kernel. This is
unsound as per the aliasing rules in the Rust reference[1].

This document is intended as a summary and analysis of the different
options we have here, in particular if we were to continue supporting
overlapping memory regions shared with the kernel. It also briefly
introduces Tock and the context we're in, so that it can be reviewed
by people outside of the Tock ecosystem who are more knowledgeable
about Rust's internals.

This document incorporates ideas from discussions with Hudson Ayers and
Amit Levy. Thanks!

======================================================================

# Using Application-Provided Buffers in the Tock OS Kernel

This document outlines potential approaches and issues with using
application-provided memory regions in the Tock OS kernel, with
specific emphasis on upholding all basic requirements for kernel code
to be sound w.r.t. the requirements of the Rust programming language.

## Environment Overview

Tock is an embedded operating system targeting low-power
microcontrollers, with a kernel written in Rust. It is designed to
only have a single thread of execution (i.e. run on a single CPU core
only, with no real concurrency in the kernel or applications).

Applications can be loaded dynamically and run independently of the
kernel, communicating and interacting with the kernel through a system
call interface. Applications can be preempted by the kernel, whereas
kernel code must cooperatively yield control back to an
application. Each application has dedicated RAM (RWX) and flash (RX)
sections, to which no other application has direct access.

As part of the system call interface, an application may provide a
kernel driver with arbitrary buffers fully contained in its assigned
memory regions, to either provide or receive large amounts of data
from the driver.

## Current Situation

Currently, only a limited number of checks are performed on a shared
(_allowed_) buffer from a userspace application: if the buffer is
read-only by the kernel, it must be fully contained either within the
application's RAM or flash region. If the buffer is read-writable by
the kernel, it must be fully contained in the application's RAM
region.

This can lead to issues in the Tock kernel, which is written in
Rust. For instance, when creating a mutable or immutable Rust slice
(`&mut [u8]`, `&[u8]`) from the application-provided pointer and
length information, this slice must not be kept alive across context
switches to userspace. For an immutable reference, Rust assumes that
the contents of the structure in question do not change while the
reference is in scope (except when using `UnsafeCell`, see below),
whereas a scheduled userspace application may change the buffer
contents at will. For a mutable reference, Rust may assume that it
has exclusive access to the structure in question. Again, a scheduled
userspace application can violate this assumption.

This specific problem does not affect the kernel as of today, as
userspace buffers are generally stored as an `AppSlice`
(`ReadOnlyAppSlice`, `ReadWriteAppSlice`) struct. This struct can be
used to acquire a slice (`&[u8]` or `&mut [u8]`) provided in a
closure, with an anonymous lifetime limited to the scope of the
closure. This guarantees that a slice created from a
userspace-provided buffer never escapes the closure, and thus such a
slice will never be in scope across context switches to userspace.

Furthermore, constructing an immutable slice from an `AppSlice`
requires an immutable borrow of said `AppSlice`. Constructing a
mutable slice from an `AppSlice` requires a mutable borrow. Thus any
given `AppSlice` can only either provide multiple immutable slices
pointing to the userspace memory regions, or a single mutable slice,
at any given time.
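The closure-based access pattern described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in (`AppSliceSketch`, `map` and `map_mut` are illustrative names, not the kernel's actual API) that only shows how the closure scope bounds the slice's lifetime and how `&self` / `&mut self` mirror the immutable / mutable slice rules. Note that nothing in it prevents two instances from covering overlapping memory, which is exactly the gap described in the following paragraph.

```rust
/// Hypothetical, simplified read-write `AppSlice`: it stores only the raw
/// pointer and length that the process passed to `allow`.
pub struct AppSliceSketch {
    ptr: *mut u8,
    len: usize,
}

impl AppSliceSketch {
    /// # Safety
    /// `ptr` must be valid for reads and writes of `len` bytes while this
    /// value exists, and (the crucial part) no other `AppSliceSketch` may
    /// cover overlapping memory.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        AppSliceSketch { ptr, len }
    }

    /// Shared access through `&self`, so several immutable views may coexist.
    /// The slice cannot escape the closure: its lifetime is higher-ranked,
    /// and `R` is chosen independently of it.
    pub fn map<R>(&self, f: impl FnOnce(&[u8]) -> R) -> R {
        f(unsafe { core::slice::from_raw_parts(self.ptr, self.len) })
    }

    /// Exclusive access through `&mut self`, so no other view of *this*
    /// AppSlice can exist at the same time.
    pub fn map_mut<R>(&mut self, f: impl FnOnce(&mut [u8]) -> R) -> R {
        f(unsafe { core::slice::from_raw_parts_mut(self.ptr, self.len) })
    }
}
```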

However, these invariants only hold when each `AppSlice` points to a
distinct memory location. As soon as two `AppSlice`s point to
overlapping memory regions, Rust's[1] / LLVM's[2] pointer aliasing rules
could be grossly violated, as either multiple mutable references, or a
single mutable and one or more immutable references could point to an
overlapping memory region. This is because the ownership / borrow rules
only affect a single `AppSlice`, but two overlapping userspace buffers
allowed to different drivers / sub-drivers create multiple `AppSlice`s.

## Strategies to Comply with Rust's / LLVM's Pointer Aliasing Rules

There seem to be two general strategies to work around the
aforementioned issues:

1. Perform runtime-checks to ensure that no two userspace buffers can
overlap.

This might have a significant performance & resource overhead, as
presumably a central data structure must keep information about all
buffers already shared by userspace. Furthermore, every `allow`
operation (sharing a new memory region) must walk over the entire
list to check for an overlapping region. In a sorted data structure,
it might be possible to perform a binary search instead.

2. Allow userspace to share aliased buffers, and use mechanisms in
the kernel to comply with Rust's / LLVM's pointer aliasing rules.

Potential candidates for these mechanisms seem to be volatile memory
operations, or utilizing Rust's `UnsafeCell` to allow interior
mutability.

The first option is conceptually simple. However, due to the persistent
overhead in terms of memory usage and performance implications on every
`allow` system-call, which is coupled to the number of shared buffers,
the second strategy might be preferable.
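As a rough illustration of the runtime check in option 1, the following sketch keeps the allowed regions sorted by start address, so that after a binary search a new region only has to be compared against its two neighbours. `Region`, `AllowRegistry` and `try_allow` are hypothetical names; a `Vec` is used purely for brevity, whereas the kernel has no heap and would need a fixed-capacity structure. Removing a region when it is un-allowed is omitted.

```rust
/// A half-open byte range `start..start + len` of process memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Region {
    pub start: usize,
    pub len: usize,
}

impl Region {
    /// Assumes `start + len` does not overflow, e.g. because the buffer was
    /// already checked to lie within the process's memory.
    fn end(&self) -> usize {
        self.start + self.len
    }

    /// Two non-empty half-open ranges overlap iff each starts before the other ends.
    fn overlaps(&self, other: &Region) -> bool {
        self.start < other.end() && other.start < self.end()
    }
}

/// Hypothetical registry of one process's currently allowed regions,
/// kept sorted by start address and pairwise disjoint.
#[derive(Default)]
pub struct AllowRegistry {
    regions: Vec<Region>,
}

impl AllowRegistry {
    /// Records `r`, or returns the already allowed region it overlaps.
    pub fn try_allow(&mut self, r: Region) -> Result<(), Region> {
        if r.len == 0 {
            return Ok(()); // empty buffers cannot alias anything; not tracked
        }
        // Binary search: index of the first stored region starting at or after `r`.
        let idx = self.regions.partition_point(|e| e.start < r.start);
        // Stored regions are non-empty, disjoint and sorted, so their ends are
        // sorted too: only the predecessor and the successor can overlap `r`.
        if idx > 0 && self.regions[idx - 1].overlaps(&r) {
            return Err(self.regions[idx - 1]);
        }
        if idx < self.regions.len() && self.regions[idx].overlaps(&r) {
            return Err(self.regions[idx]);
        }
        self.regions.insert(idx, r);
        Ok(())
    }
}
```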

To provide access to userspace buffers, there seem to be at least 3
options that might work:

- `core::ptr::{read,write}_volatile`
- `core::ptr::{read,write,copy}`
- `core::cell::UnsafeCell`

### Volatile Memory Accesses

Using volatile memory accesses to read from and write to raw pointers
seems to be the safest option. Using volatile means that the compiler
must assume the value behind the pointer may change at any time, and
as such cannot elide or merge these accesses. Volatile memory accesses
may however cause much less efficient code to be generated, in
particular w.r.t. optimizations done by LLVM on the IR, such as using
a `memcpy` intrinsic. If possible, this potential overhead should be
avoided.
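For illustration, copying between a process buffer and kernel memory with volatile accesses would look roughly like this (`copy_out_volatile` and `copy_in_volatile` are hypothetical helpers). Because each byte is accessed with its own `read_volatile` / `write_volatile`, the compiler has to perform every access as written, which is the kind of overhead referred to above.

```rust
use core::ptr;

/// Copies `dst.len()` bytes from process memory at `src` into kernel memory,
/// using one volatile read per byte.
///
/// # Safety
/// `src` must be valid for reads of `dst.len()` bytes.
pub unsafe fn copy_out_volatile(src: *const u8, dst: &mut [u8]) {
    for (i, d) in dst.iter_mut().enumerate() {
        // Each volatile read must be performed as written, so the compiler
        // cannot, for instance, replace this loop with a `memcpy`.
        *d = unsafe { ptr::read_volatile(src.add(i)) };
    }
}

/// The write direction, again with one volatile write per byte.
///
/// # Safety
/// `dst` must be valid for writes of `src.len()` bytes.
pub unsafe fn copy_in_volatile(src: &[u8], dst: *mut u8) {
    for (i, &s) in src.iter().enumerate() {
        unsafe { ptr::write_volatile(dst.add(i), s) };
    }
}
```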

### Direct Pointer Accesses

With `core::ptr::{read,write,copy, ...}`, Rust provides functions to
access values behind raw pointers, without ever obtaining a proper
Rust reference. If the soundness of these methods is not affected by
aliasing of the memory regions in question (`core::ptr::copy` is even
explicitly designed to work on overlapping memory regions), they could
be a viable alternative to using volatile reads/writes. Presumably,
none of the Rust reference aliasing rules would be violated, given
that no references to the userspace buffers are constructed at any
time.
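A minimal sketch of the raw-pointer approach for the overlapping case: two regions of the same process buffer, accessed only through `*mut u8`, with `core::ptr::copy` used because it has `memmove` semantics. `move_within` is a hypothetical helper; no `&[u8]` or `&mut [u8]` to the buffer is created inside it.

```rust
use core::ptr;

/// Moves `len` bytes from offset `from` to offset `to` within one process
/// buffer starting at `base`, using raw pointers only.
///
/// # Safety
/// Both `base+from .. base+from+len` and `base+to .. base+to+len` must lie
/// within one buffer that is valid for reads and writes.
pub unsafe fn move_within(base: *mut u8, from: usize, to: usize, len: usize) {
    unsafe {
        // `ptr::copy` has `memmove` semantics, so the two ranges may overlap;
        // `ptr::copy_nonoverlapping` (`memcpy`) would be undefined behavior here.
        ptr::copy(base.add(from), base.add(to), len);
    }
}
```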

Compared to the current interface, both of these approaches would have
the significant disadvantage of not being able to obtain a Rust slice
from an `AppSlice` that can be passed down to lower layers. Instead,
all buffer operations would need to go through a custom interface,
offering methods such as

- `AppSlice::copy_from_slice(&mut self, slice: &[u8])`
- `AppSlice::copy_to_slice(&mut self, slice: &mut [u8])`
- `AppSlice::copy_from_app_slice(&mut self, other: &AppSlice)`

It might be possible to implement the `Index` operator for convenience
and sub-slicing, but full compatibility with Rust slices cannot be
achieved.
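The custom interface listed above could be sketched along these lines. `RawAppSlice`, the `bool` return values, and the `get` / `subslice` methods (standing in for `Index` and range indexing) are assumptions for illustration only. Every access goes through `core::ptr`, so sub-views of the same buffer may overlap freely; `copy_nonoverlapping` is used for kernel slices under the assumption that no kernel `&[u8]` ever points into process memory.

```rust
use core::marker::PhantomData;
use core::ops::Range;
use core::ptr;

/// Hypothetical pointer-based `AppSlice`: it never creates a Rust reference
/// to process memory, so overlapping views are harmless.
pub struct RawAppSlice<'a> {
    ptr: *mut u8,
    len: usize,
    _mem: PhantomData<&'a mut [u8]>,
}

impl<'a> RawAppSlice<'a> {
    /// # Safety
    /// `ptr` must be valid for reads and writes of `len` bytes for `'a`.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        RawAppSlice { ptr, len, _mem: PhantomData }
    }

    /// Copies a kernel slice into the start of this buffer.
    pub fn copy_from_slice(&mut self, slice: &[u8]) -> bool {
        if slice.len() > self.len {
            return false;
        }
        // Assumes kernel slices never point into process memory: no overlap.
        unsafe { ptr::copy_nonoverlapping(slice.as_ptr(), self.ptr, slice.len()) };
        true
    }

    /// Fills a kernel slice from the start of this buffer.
    pub fn copy_to_slice(&mut self, slice: &mut [u8]) -> bool {
        if slice.len() > self.len {
            return false;
        }
        unsafe { ptr::copy_nonoverlapping(self.ptr, slice.as_mut_ptr(), slice.len()) };
        true
    }

    /// Copies all of `other` into the start of this buffer. Two allowed
    /// buffers may overlap, so `ptr::copy` (memmove semantics) is required.
    pub fn copy_from_app_slice(&mut self, other: &RawAppSlice) -> bool {
        if other.len > self.len {
            return false;
        }
        unsafe { ptr::copy(other.ptr, self.ptr, other.len) };
        true
    }

    /// Stand-in for `Index<usize>`: returns the byte by value.
    pub fn get(&self, idx: usize) -> Option<u8> {
        if idx < self.len {
            Some(unsafe { ptr::read(self.ptr.add(idx)) })
        } else {
            None
        }
    }

    /// Stand-in for range indexing: a new view into the same memory.
    pub fn subslice(&self, range: Range<usize>) -> Option<RawAppSlice<'a>> {
        if range.start > range.end || range.end > self.len {
            return None;
        }
        // The sub-range lies within memory that is valid for `'a`.
        Some(unsafe { RawAppSlice::new(self.ptr.add(range.start), range.end - range.start) })
    }
}
```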

### Slice of `UnsafeCell`s / `Cell`s

It might be possible to transmute the provided userspace buffers to
slices of `core::cell::Cell`s, which internally use
`core::cell::UnsafeCell`. By design, `UnsafeCell`s allow having
interior mutability behind an immutable reference. This might be a
viable alternative, if the `core::ptr::{read,write,copy}` methods
cannot be used, or could even be used to hand out slices of
`&[Cell<u8>]` to other code.
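A minimal sketch of the transmute described above, assuming the process pointer and length have already been validated by the `allow` checks. `app_buffer_as_cells` is a hypothetical helper; it relies on `Cell<u8>` having the same in-memory representation as `u8`. The usage shows two overlapping views mutating shared bytes through immutable references, the same situation as in the `UnsafeCell` documentation example.

```rust
use core::cell::Cell;
use core::slice;

/// Views a process buffer as `&[Cell<u8>]`.
///
/// # Safety
/// `ptr` must be valid for reads and writes of `len` bytes for `'a`, and
/// during `'a` the kernel must not access this memory through `&[u8]` or
/// `&mut [u8]`, only through `Cell`s or raw pointers.
pub unsafe fn app_buffer_as_cells<'a>(ptr: *mut u8, len: usize) -> &'a [Cell<u8>] {
    // `Cell<u8>` has the same in-memory representation as `u8`.
    unsafe { slice::from_raw_parts(ptr as *const Cell<u8>, len) }
}
```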

The `UnsafeCell` examples[3] include an example of a single value
being wrapped in an `UnsafeCell`, which is then accessed (and mutated)
through two different immutable references. When transmuting a shared
buffer of bytes into a slice of `Cell`s of `u8`s, essentially the same
scenario is constructed. It is worth noting that an `UnsafeCell`
cannot be used to soundly acquire a mutably aliased reference to the
contents, so the surrounding wrapper (i.e. `Cell`) must ensure that an
acquired mutable reference to the contents is the only one in
scope.

However, the availability of the `UnsafeCell::into_inner(self)` and
`UnsafeCell::get_mut(&mut self)` methods strongly suggests that this
type is not intended to be used (or rather, created) this way: the
soundness of `UnsafeCell` relies on the fact that immutable references
are taken to an instance of this type, which owns the contained
value. This way, by requiring a move or a mutable reference to `self`,
the two aforementioned methods can be safely implemented.

Contrary to the example in the `UnsafeCell` documentation, the
references to the `UnsafeCell`s would not be obtained from an existing
`UnsafeCell` instance, but rather created to point to arbitrary
memory. This assumes that

1. the memory layout of the `UnsafeCell` is exactly the same as its
underlying type (given through `#[repr(transparent)]`)

2. the `UnsafeCell` can be safely used and it will uphold the required
guarantees without its constructor having been called

3. the `UnsafeCell` can be safely used without its destructor being
called
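Assumption 1, and to some extent assumption 3, can at least be sanity-checked at compile time. The following sketch is not a proof of soundness: it checks size and alignment, and that `Cell<u8>` has no drop glue (so never running its destructor skips nothing), but it says nothing about assumption 2.

```rust
use core::cell::{Cell, UnsafeCell};
use core::mem::{align_of, needs_drop, size_of};

// Assumption 1: `Cell<u8>` (and the `UnsafeCell<u8>` inside it) is laid out
// exactly like `u8`. Compilation fails if any of these does not hold.
const _: () = assert!(size_of::<Cell<u8>>() == size_of::<u8>());
const _: () = assert!(align_of::<Cell<u8>>() == align_of::<u8>());
const _: () = assert!(size_of::<UnsafeCell<u8>>() == size_of::<u8>());

// Assumption 3: dropping a `Cell<u8>` runs no code, so never calling its
// destructor cannot skip any cleanup.
const _: () = assert!(!needs_drop::<Cell<u8>>());
```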

If these assumptions are valid, one could argue that an `UnsafeCell`
might be fit for the purposes described here, in particular as a
`&[Cell<u8>]` slice only allows retrieving immutable references to the
contained values, making `UnsafeCell::into_inner` and
`UnsafeCell::get_mut` inaccessible.

In general, I fear that using `UnsafeCell` would require a lot of
assumptions that are hard to check, and would have questionable
utility (as it would only allow us to get access to a `&[Cell<u8>]`,
vs. the current `&[u8]` type, and as such be incompatible).

### Next Steps

Having no mutable aliasing of buffers in Rust is a vital requirement
to ensure soundness of the kernel. Hence, this is an important issue
to solve.

I think it's worth exploring both the runtime checks (using
experiments to get an estimate of the introduced overhead) as well as
methods to access these unpredictable userspace memory regions in a
sound way from within the kernel.

I hope that presenting these different options can get the discussion
started. In particular, my knowledge of the precise inner workings of
Rust, such as the interactions with LLVM and how specific structs such
as `UnsafeCell` interact with the compiler, is rather limited. I can
imagine this raising a few interesting questions for people more
familiar with these areas.

======================================================================

I'd love to hear feedback regarding these issues/approaches. Maybe
there are other ways to access userspace memory regions we didn't think
about, or one of these options doesn't work for some reason. If we can
refine this document, we could also think about posting it to some other
Rust-related channel and try to get feedback on what the best
option/tradeoff is.

Thanks!

- Leon

[0]: https://github.com/tock/tock/pull/2462
[1]: https://github.com/rust-lang/reference/blob/8db4edd7eab7931859713ac8016ba123ce42b061/src/behavior-considered-undefined.md
[2]: https://llvm.org/docs/LangRef.html#pointer-aliasing-rules
[3]: https://github.com/rust-lang/rust/blob/673d0db5e393e9c64897005b470bfeb6d5aec61b/library/core/src/cell.rs#L1741

Johnathan Van Why

May 10, 2021, 5:03:38 PM

to Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Miguel Young de la Sota

+ Miguel Young de la Sota

Nitpick: I believe RAM is RW but not executable.

When the "unexpected" change to memory comes from the current thread of execution, core::ptr::{read,write}_volatile and core::ptr::{read,write} are equivalent. They're only different when memory is modified by something external to the current thread of execution, such as when hardware changes the value directly. Therefore I don't think we need to use volatile accesses.

> ### Direct Pointer Accesses
>
> With `core::ptr::{read,write,copy, ...}`, Rust provides functions to
> access values behind raw pointers, without ever obtaining a proper
> Rust reference. If the soundness of these methods is not affected by
> aliasing of the memory regions in question (`core::ptr::copy` is even
> explicitly designed to work on overlapping memory regions), they could
> be a viable alternative to using volatile reads/writes. Presumably,
> none of the Rust reference aliasing rules would be violated, given
> that no references to the userspace buffers are constructed at any
> time.

I can confirm the last sentence is correct. There is a bit of subtlety around "allocations" (for two raw pointers to be equal they need to be from the same allocation, or the allocations must be sequential), but semantically the process memory is all one big allocation so that doesn't cause any issues for the kernel.

> Compared to the current interface, both of these approaches would have
> the significant disadvantage of not being able to obtain a Rust slice
> from an `AppSlice`, that can be passed down to lower layers. Instead,
> all buffer operations would need to go through a custom interface,
> offering methods such as
>
> - `AppSlice::copy_from_slice(&mut self, slice: &[u8])`
> - `AppSlice::copy_to_slice(&mut self, slice: &mut [u8])`
> - `AppSlice::copy_from_app_slice(&mut self, other: &AppSlice)`
>
> It might be possible to implement the `Index` operator for convenience
> and sub-slicing, but full compatibility with Rust slices can not be
> achieved.

You should be able to implement Index<Output = Cell<u8>>, although you'd have to be careful to make sure those references don't overlap with other operations (e.g. by making other methods take &mut self).
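A sketch of this suggestion, assuming a pointer-based wrapper (`IndexableAppSlice` is a hypothetical name): `index` hands out a `&Cell<u8>` borrowed from `&self`, while the bulk operation takes `&mut self`, so no reference obtained through `index` can still be alive while it runs. This only covers references derived from the same wrapper, not from a different, overlapping one. Like slice indexing, `index` panics when out of bounds.

```rust
use core::cell::Cell;
use core::ops::Index;
use core::ptr;

/// Hypothetical pointer-based process buffer that supports `buf[i]`.
pub struct IndexableAppSlice {
    ptr: *mut u8,
    len: usize,
}

impl IndexableAppSlice {
    /// # Safety
    /// `ptr` must be valid for reads and writes of `len` bytes while this
    /// value exists.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        IndexableAppSlice { ptr, len }
    }

    /// Bulk write through the raw pointer. Taking `&mut self` guarantees that
    /// no `&Cell<u8>` handed out by `index` is still borrowed from `self`.
    pub fn fill(&mut self, value: u8) {
        unsafe { ptr::write_bytes(self.ptr, value, self.len) };
    }
}

impl Index<usize> for IndexableAppSlice {
    type Output = Cell<u8>;

    fn index(&self, i: usize) -> &Cell<u8> {
        assert!(i < self.len, "index out of bounds");
        // `Cell<u8>` has the same layout as `u8`; the returned reference's
        // lifetime is tied to the `&self` borrow.
        unsafe { &*(self.ptr.add(i) as *const Cell<u8>) }
    }
}
```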

I believe these are valid, although the app_memory slice (I think that's what it is called) would have to be a [Cell<u8>]. I agree this is somewhat tricky and therefore scary.

When we have a more specific design picked out, I can get some more experienced Rust devs to take a look at it.

Vadim Sukhomlinov

May 10, 2021, 7:31:22 PM

to Johnathan Van Why, Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Miguel Young de la Sota

Overlapping mutable references are quite useful for crypto: say the output can be in any location, but the same location as one of the sources can be used to save memory, assuming a correct implementation where the source data is read before the destination is modified.

In Rust's inline assembly this is expressed as 'inlateout', though in a different context. The above usage is common in C code, but Rust explicitly prohibits it (you can't borrow a shared reference while it is borrowed as mutable). This requires additional Rust wrappers around C code to handle this case, where only one mutable reference exists but is passed as a pointer twice to the C code. But I'm not sure whether it is a good idea to support this in Tock.

I'm not a big fan of volatile memory accesses, as they inhibit many optimizations while not buying a lot if the overlap happens between capsules which cooperatively update that location (say capsule A passes the buffer to capsule B, while capsule B is already doing something with the overlapping part). It's probably a rare case, and the much simpler case in C was solved by the proper choice of memmove vs. memcpy, where the former implements a run-time check only on the data being processed, which is quite inexpensive. A similar idea could be implemented in Tock: an API to check whether the provided buffer overlaps with anything other capsules have currently allowed, expected to be used in cases where capsules interact.

If capsules do not interact, this problem may in theory occur at the driver level, but I'm still not sure volatile would be a solution: you either know explicitly that the overlap is ok due to internal logic, or it's not ok and you have to handle it properly. Volatile will affect the results of computations, possibly making them more predictable, but logical correctness is up to the developer.

I'm more in favor of simple run-time checks, not on all allowed buffers but only on those currently in use, probably as part of .take(). Since most capsules are called sequentially in the kernel loop, this check will most likely be trivial. However, I'm not sure what the behavior of .take() should be when overlap is desired. Maybe take_overlap_ok() is a good enough solution, and .take() should return None if an overlap is detected for the app; this will probably result in an invalid command invocation and fail gracefully without a panic. We still need to maintain a list of taken slices (can we standardize it at the capsule level?) and a mechanism to drop them, but this can be part of the AppSlice logic.
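The bookkeeping described here could look roughly like the following. The method names `take` and `take_overlap_ok` come from the message above; the fixed capacity `N`, the slot handles, and `release` are assumptions for illustration. This only tracks which buffers are in use; it does not by itself make aliased `&mut [u8]` slices sound.

```rust
/// Hypothetical per-process record of buffers currently taken by capsules,
/// stored as `(start, len)` in a fixed-size array (no heap needed).
pub struct TakenSet<const N: usize> {
    taken: [Option<(usize, usize)>; N],
}

impl<const N: usize> TakenSet<N> {
    pub const fn new() -> Self {
        TakenSet { taken: [None; N] }
    }

    /// Half-open ranges overlap iff each starts before the other ends;
    /// empty ranges never overlap.
    fn overlaps((s1, l1): (usize, usize), (s2, l2): (usize, usize)) -> bool {
        l1 != 0 && l2 != 0 && s1 < s2 + l2 && s2 < s1 + l1
    }

    /// Marks `start..start+len` as in use and returns a slot handle, or
    /// `None` if it overlaps a currently taken buffer or no slot is free.
    pub fn take(&mut self, start: usize, len: usize) -> Option<usize> {
        let conflict = self
            .taken
            .iter()
            .flatten()
            .any(|&t| Self::overlaps(t, (start, len)));
        if conflict {
            return None;
        }
        self.take_overlap_ok(start, len)
    }

    /// For capsules that deliberately accept overlap (e.g. in-place crypto):
    /// records the buffer without checking it against the others.
    pub fn take_overlap_ok(&mut self, start: usize, len: usize) -> Option<usize> {
        let slot = self.taken.iter().position(Option::is_none)?;
        self.taken[slot] = Some((start, len));
        Some(slot)
    }

    /// Marks the buffer in `slot` as no longer in use.
    pub fn release(&mut self, slot: usize) {
        self.taken[slot] = None;
    }
}
```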

There are more exotic cases where the buffer contains pointers used by capsules to minimize the number of allow() calls; say, in some cases I need to allow 5 times in a row for a single command. Such cases are hard to check unless the capsule behaves politely towards others and registers these buffers as AppSlices.

With unsafe allowed in the kernel, I wouldn't rely too much on soundness enforced by the compiler. Say I have at most 5 allow buffers per syscall, but in some cases these are &[u8] and in others &[u32]: I declare all slices as &[u8] and just translate u8 slices to u32 slices, checking alignment and length. But this alone can result in overlapping references due to the different types. Only dynamic checks will actually detect it, and it's up to the developer to decide how to handle it.
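The u8-to-u32 translation mentioned above, with its alignment and length check, might look roughly like this (`as_u32_slice` is a hypothetical helper). Since the `&[u32]` view covers exactly the same bytes as the original `&[u8]`, any overlap check would have to compare byte ranges rather than element ranges.

```rust
use core::mem::{align_of, size_of};
use core::slice;

/// Reinterprets a byte slice as `&[u32]` if its start address is aligned for
/// `u32` and its length is a multiple of `size_of::<u32>()`; otherwise `None`.
pub fn as_u32_slice(bytes: &[u8]) -> Option<&[u32]> {
    let ptr = bytes.as_ptr();
    if ptr as usize % align_of::<u32>() != 0 || bytes.len() % size_of::<u32>() != 0 {
        return None;
    }
    // Alignment and length were checked above, every bit pattern is a valid
    // `u32`, and the result borrows `bytes`, covering exactly the same memory.
    Some(unsafe { slice::from_raw_parts(ptr as *const u32, bytes.len() / size_of::<u32>()) })
}
```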

--
You received this message because you are subscribed to the Google Groups "Tock Embedded OS Development Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tock-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tock-dev/CAJqTQ1hv7y0i3WbVF0jLsT5A0VNmuTzkM6S8zCzbztE_7mHiFQ%40mail.gmail.com.

Brad Campbell

May 11, 2021, 11:59:17 AM

to Vadim Sukhomlinov, Johnathan Van Why, Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Miguel Young de la Sota

It seems there are two key questions to be answered:

1. Is having overlapping `allow`ed buffers a desirable feature, regardless of any rust safety considerations?
2. What does "significant" mean in terms of what level of per-allow-call overhead would be acceptable?

I don't have a use case for #1 in mind. I would think that any capsule that wants to do something complicated with memory could have a process allow a (large) RW appslice and manage it internally.

As for #2, is this a case of we'll know it when we see it? Also, can an optimized process and `Driver` implementation just only `allow` buffers once? I think this is related to my question about TRD104.

- Brad


Leon Schuermann

May 11, 2021, 12:02:29 PM

to Johnathan Van Why, Miguel Young de la Sota, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Vadim Sukhomlinov

"'Johnathan Van Why' via Tock Embedded OS Development Discussion"

<tock...@googlegroups.com> writes:
> On Mon, May 10, 2021 at 7:38 AM Leon Schuermann <le...@is.currently.online>
> wrote:
>> ### Volatile Memory Accesses
>>
>> Using volatile memory accesses to read from and write to raw pointers
>> seems to be the safest option to use. Using volatile means that the
>> compiler must assume the value behind the pointer changes at any time,
>> and as such cannot perform any optimizations around that. Volatile
>> memory accesses may however cause much more inefficient code to be
>> generated, in particular w.r.t. optimizations done by LLVM on the IR,
>> such as using a `memcpy` intrinsic. If possible, this potential
>> overhead should be avoided.
>>
>
> When the "unexpected" change to memory comes from the current thread of
> execution, core::ptr::{read,write}_volatile and core::ptr::{read,write} are
> equivalent. They're only different when memory is modified by something
> external to the current thread of execution, such as when hardware changes
> the value directly. Therefore I don't think we need to use volatile
> accesses.

Great! This matches my understanding of the situation, it's good to have
confirmation.

Miguel Young de la Sota <mcy...@google.com> writes:
> (Also, volatile is absolutely wrong here. Using volatile to cheat the
> compiler is how you wind up with miscompilations. You might as well cheat
> at poker with the Devil.)

I agree that volatile is probably not warranted here; I only put it in
the document because I was reasonably confident that using purely and
exclusively volatile memory accesses to a particular region of memory
would _for sure_ not violate any of Rust's reference aliasing rules, if
only because there are no references involved.

Of course, once you have a single mutable or immutable Rust slice
pointing to this memory in scope, even with volatile accesses,
manipulation through raw pointers is out of the question. The purpose of
this discussion is to make sure we're not trying to outsmart the
compiler :). Hence, when talking about the `core::ptr::{read, write,
copy, ...}` functions, I would _never_ also create a proper Rust slice
over such a buffer.

"'Johnathan Van Why' via Tock Embedded OS Development Discussion"

<tock...@googlegroups.com> writes:
>> ### Direct Pointer Accesses
>>
>> With `core::ptr::{read,write,copy, ...}`, Rust provides functions to
>> access values behind raw pointers, without ever obtaining a proper
>> Rust reference. If the soundness of these methods is not affected by
>> aliasing of the memory regions in question (`core::ptr::copy` is even
>> explicitly designed to work on overlapping memory regions), they could
>> be a viable alternative to using volatile reads/writes. Presumably,
>> none of the Rust reference aliasing rules would be violated, given
>> that no references to the userspace buffers are constructed at any
>> time.
>>
>
> I can confirm the last sentence is correct. There is a bit of subtlety
> around "allocations" (for two raw pointers to be equal they need to be from
> the same allocation, or the allocations must be sequential), but
> semantically the process memory is all one big allocation so that doesn't
> cause any issues for the kernel.

I've read through the documentation on the `ptr` module, the primitive
pointer type, and `ptr::{read, write, ...}`. There were some specifics
regarding allocations (a pointer is valid only if it is dereferenceable,
i.e. the memory range of the given type's size starting at the pointer
must be within the bounds of a single allocated object). That definition
seems both fuzzy, considering we're operating on memory not managed by
Rust or by an allocator of the current execution context, and
unproblematic, given that we always access one or more `u8`s and never
risk running over the bounds of a single "u8 allocation".

Does that make sense, and is this the subtlety you're referring to?

>> Compared to the current interface, both of these approaches would have
>> the significant disadvantage of not being able to obtain a Rust slice
>> from an `AppSlice`, that can be passed down to lower layers. Instead,
>> all buffer operations would need to go through a custom interface,
>> offering methods such as
>>
>> - `AppSlice::copy_from_slice(&mut self, slice: &[u8])`
>> - `AppSlice::copy_to_slice(&mut self, slice: &mut [u8])`
>> - `AppSlice::copy_from_app_slice(&mut self, other: &AppSlice)`
>>
>> It might be possible to implement the `Index` operator for convenience
>> and sub-slicing, but full compatibility with Rust slices can not be
>> achieved.
>>
>
> You should be able to implement Index<Output = Cell<u8>>, although you'd
> have to be careful to make sure those references don't overlap with other
> operations (e.g. by making other methods take &mut self).

Absolutely. Although I would be slightly afraid to hand out a proper
&Cell<u8> when we're using `core::ptr` methods for other parts of the
code (as then other mutations would not necessarily go through a
Cell). Having our own wrapper reading from/writing to a pointer sounds
easier to check for correctness, as no Rust reference to the userspace
region (even through a Cell) is created.

Having custom types created and returned (moved) out of the Index trait
is unfortunately not possible, given that it only returns a reference.
Nonetheless, I've created a rough sketch of what indexing and
subslicing could look like, even if we can't use the `[...]` syntax
for that [0].

Miguel Young de la Sota <mcy...@google.com> writes:
> Though all this may be unnecessary. Cell is transparent so we can just use
> a &Cell<[u8]> and use as_slice_of_cells as needed. All we need is to define
> a function for copying from &[u8] into a &[Cell<u8>], which is probably
> worth proposing as an API in std proper.

How could I have missed that method? This does sound like a really good
option for this use case. I'm still a little afraid when it comes to
transmuting a userspace buffer into a &Cell<u8> for the reasons outlined
in the initial post, but looking at the implementation of
Cell::as_slice_of_cells, casting into a Cell<T> doesn't really seem to
be an issue for the standard library either...
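A sketch of Miguel's suggestion using the stable `Cell::from_mut` and `Cell::as_slice_of_cells` APIs, plus the kind of copy helpers he mentions (`copy_into_cells` and `copy_from_cells` are hypothetical names, returning `false` on a length mismatch instead of panicking). The sketch starts from a `&mut [u8]` only so that it runs standalone; in the kernel the `&Cell<[u8]>` would have to be produced directly from the process pointer, which is the transmute under discussion.

```rust
use core::cell::Cell;

/// Views a mutable byte slice as a slice of cells using only safe std APIs.
pub fn as_cells(bytes: &mut [u8]) -> &[Cell<u8>] {
    Cell::from_mut(bytes).as_slice_of_cells()
}

/// Copies a kernel `&[u8]` into the start of `dst`; `false` if `dst` is too short.
pub fn copy_into_cells(src: &[u8], dst: &[Cell<u8>]) -> bool {
    if src.len() > dst.len() {
        return false;
    }
    for (d, &s) in dst.iter().zip(src) {
        d.set(s);
    }
    true
}

/// Fills `dst` from the start of `src`; `false` if `src` is too short.
pub fn copy_from_cells(src: &[Cell<u8>], dst: &mut [u8]) -> bool {
    if dst.len() > src.len() {
        return false;
    }
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s.get();
    }
    true
}
```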

This is some great stuff already. I do think this topic is sufficiently
complex that a synchronous update might help to clarify a few
things. I'll be happy to send out a poll for a slot to everyone already
on this thread, and others who'd like to join and send me a quick
message.

- Leon

[0]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f8447940eaeea6364fc7ee2b2137e7ea

Hudson Randal Ayers

May 11, 2021, 12:09:32 PM

to Miguel Young de la Sota, Leon Schuermann, Johnathan Van Why, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Vadim Sukhomlinov

> Cell<T> is transparent, so the transmute should be fine (as long as the caller is aware the kernel is entitled to do this and also only uses a Cell<T>; given the user process is subordinate to the kernel I don't see an issue here).

What if the user process is written in C?

From: Miguel Young de la Sota <mcy...@google.com>
Sent: Tuesday, May 11, 2021 9:07 AM
To: Leon Schuermann <le...@is.currently.online>
Cc: Johnathan Van Why <jrva...@google.com>; Hudson Randal Ayers <hay...@stanford.edu>; Amit Levy <aal...@cs.princeton.edu>; Tock Embedded OS Development Discussion <tock...@googlegroups.com>; Philip Levis <p...@cs.stanford.edu>; Vadim Sukhomlinov <sukho...@google.com>
Subject: Re: [tock-dev] Re: Approaches to the AppSlice Aliasing Soundness Issues

> This does sound like a really good
> option for this use case. I'm still a little afraid when it comes to
> transmuting a userspace buffer into a &Cell<u8>

Cell<T> is transparent, so the transmute should be fine (as long as the caller is aware the kernel is entitled to do this and also only uses a Cell<T>; given the user process is subordinate to the kernel I don't see an issue here).

> There were some specifics regarding allocations

I think the rules around this are fuzzy because kernel programming tends to leave the bounds of what a language can define. In C, (contrary to popular belief, apparently) materializing pointers to the aether (e.g. to a userspace region) is always UB, but kernels do it anyway. My understanding is that Rust doesn't particularly want this to be UB, given how patently unuseful such a declaration is for an optimizer.

Amit Levy

May 11, 2021, 12:21:20 PM

to Miguel Young de la Sota, Tock Embedded OS Development Discussion

On 5/11/21 9:09 AM, Hudson Randal Ayers wrote:

>> Cell<T> is transparent, so the transmute should be fine (as long as the caller is aware the kernel is entitled to do this and also only uses a Cell<T>; given the user process is subordinate to the kernel I don't see an issue here).
>
> What if the user process is written in C?

C doesn't have restrictions on aliasing, so I don't think this would be
a problem. In other words, a C array of `uint8_t`s has a similar semantic
meaning to a Rust slice of `Cell<u8>`.

Jett ✈ Rink

May 11, 2021, 1:39:02 PM

to Amit Levy, Miguel Young de la Sota, Tock Embedded OS Development Discussion

I have only seen a few different Tock Apps at this point, but I haven't seen the use case for overlapping allow buffers yet. I also see very few logical allow buffers (i.e. fewer than 5) per App. I think a run-time check for overlapping buffers would be pretty cheap and gives us the easiest-to-work-with data structures in the kernel while still maintaining Rust safety.

-Jett


Miguel Young de la Sota

May 11, 2021, 2:21:33 PM

to Amit Levy, Tock Embedded OS Development Discussion

That's a good question. I suspect the answer is "it doesn't matter" because C's semantics for T* and const T* seem to be analogous to &UnsafeCell<T>.

Fundamentally, I think the question to ask is this: what compiler misoptimizations are we worried about? The most that can happen from alias analysis is "seemingly unnecessary" reads and writes being deleted. If that's a concern, we just need Rust to know that the memory is somehow "shared" even though it can mutate it (UnsafeCell is the thing that does this for us), and in C, we just don't care because C assumes everything aliases outside of some cases that don't matter here.

If what we care about is a write in the kernel becoming visible to the process in a single-core environment, you're gonna race no matter what, and volatile won't save you (though it sounds like this isn't the problem at hand).

(To be clear, I have little context on the problem being solved so I'm trying my best to apply the correct flavor of Rust/C lawyering. =P)

Miguel

Leon Schuermann

May 11, 2021, 2:33:32 PM

to Brad Campbell, Vadim Sukhomlinov, Johnathan Van Why, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Miguel Young de la Sota

Brad Campbell <bra...@gmail.com> writes:
> It seems there are two key questions to be answered:
>
> 1. Is having overlapping `allow`ed buffers a desirable feature,
> regardless of any rust safety considerations?

I'd argue that overlapping `allow`ed buffers might be a very desirable
feature, and I do think that Vadim's crypto use case is quite convincing,
along with implementations of network protocols (say, routing a packet
from 6LoWPAN to Ethernet). Most likely, the arbitration of when a crypto
operation is finished / a packet is received or transmitted is not
coupled to allow operations themselves, and so applications could take
advantage of the fact that well-chosen allows of userspace buffers,
pointing to the correct frame boundaries within a larger buffer, would
require neither changing the allowed buffers over time nor a copy in
userspace.

This does, however, collide with the requirement that buffers must not
be modified while shared with the kernel (related to your question on
TRD104). At the risk of derailing this discussion: at the time, that
seemed like a good requirement to me. However, given that the MPU's
granularity means we can't reasonably enforce it, it might be worth
questioning that restriction if it makes things easier here.

> 2. What does "significant" mean in terms of what level of per-allow-call
> overhead would be acceptable?

That's a good question, which I suppose will have to involve
benchmarking the actual cost. I'd like to make the following observation
nonetheless: If we don't support overlapping buffers, and don't support
modifying allowed buffers, applications will have to make `allow` system
calls much more frequently. This might emphasize any per-allow cost.

- Leon

Leon Schuermann

May 11, 2021, 2:42:11 PM

to Miguel Young de la Sota, Tock Embedded OS Development Discussion

"'Miguel Young de la Sota' via Tock Embedded OS Development Discussion"

<tock...@googlegroups.com> writes:
> That's a good question. I suspect the answer is "it doesn't matter" because
> C's semantics for T* and const T* seem to be analogous to &UnsafeCell<T>.
>
> Fundamentally, I think the question to ask is this: what compiler
> misoptimizations are we worried about? The most that can happen from alias
> analysis is "seemingly unnecessary" reads and writes being deleted. If
> that's a concern, we just need Rust to know that the memory is somehow
> "shared" even though it can mutate it (UnsafeCell is the thing that does
> this for us), and in C, we just don't care because C assumes everything
> aliases outside of some cases that don't matter here.

I think this describes the problem at hand. To be clear (you might
have already answered this): if we were to only use
`ptr::{read, write, copy}`, this issue would be taken care of as well,
right? I suppose using these methods on raw pointers has semantics
similar to raw pointer accesses in C?
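The raw-pointer route being asked about can be illustrated with a small sketch. The helper `shift_within` is hypothetical, and it takes a local `&mut [u8]` only so that it can be run; in the kernel the base pointer would come from the process. `core::ptr::copy` has memmove semantics and is specified to handle overlapping source and destination ranges, whereas `core::ptr::copy_nonoverlapping` would be undefined behavior for the same call.

```rust
use core::ptr;

/// Moves `len` bytes from offset `src_off` to offset `dst_off` inside `buf`
/// using only raw-pointer operations. The two ranges may overlap.
fn shift_within(buf: &mut [u8], src_off: usize, dst_off: usize, len: usize) {
    // Bounds checks so that the unsafe block below stays in-bounds.
    assert!(src_off.checked_add(len).map_or(false, |end| end <= buf.len()));
    assert!(dst_off.checked_add(len).map_or(false, |end| end <= buf.len()));
    let base = buf.as_mut_ptr();
    // SAFETY: both ranges were checked to lie within `buf`, and `ptr::copy`
    // (unlike `ptr::copy_nonoverlapping`) permits overlapping regions.
    unsafe {
        ptr::copy(base.add(src_off), base.add(dst_off), len);
    }
}
```

For example, shifting the first five bytes of `[1, 2, 3, 4, 5, 0, 0]` two positions to the right yields `[1, 2, 1, 2, 3, 4, 5]`.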

> If what we care about is a write in the kernel becoming visible to the
> process in a single-core environment, you're gonna race no matter what, and
> volatile won't save you (though it sounds like this isn't the problem at
> hand).

For this, a compiler fence[0] should help, right? Specifically, for a
single-core environment.
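For reference, `core::sync::atomic::compiler_fence` is typically used as sketched below. The pattern mirrors the signal-handler example in the core documentation, and `publish` and its parameters are made up. A compiler fence only stops the compiler from reordering memory accesses across it on the current core. It emits no hardware barrier instruction, so it does not by itself settle the visibility question being discussed.

```rust
use core::sync::atomic::{compiler_fence, AtomicBool, Ordering};

/// Fills `buf` with `value` and then raises `ready`.
/// The `Release` compiler fence keeps the compiler from moving the buffer
/// writes after the flag store; no hardware barrier is emitted.
fn publish(buf: &mut [u8], value: u8, ready: &AtomicBool) {
    for byte in buf.iter_mut() {
        *byte = value;
    }
    compiler_fence(Ordering::Release);
    ready.store(true, Ordering::Relaxed);
}
```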

- Leon

[0]: https://doc.rust-lang.org/core/sync/atomic/fn.compiler_fence.html

Miguel Young de la Sota

May 11, 2021, 3:14:01 PM

to Leon Schuermann, Tock Embedded OS Development Discussion

I think that what you describe about pointers is... probably correct? I think Rust believes all raw pointers alias unless told otherwise (e.g. `ptr::copy` is a memmove, while the memcpy equivalent, `ptr::copy_nonoverlapping`, has a scarier name because of its aliasing restrictions). But I think a slice of cells gets the same point across to rustc without having as many sharp corners.

Also, I posted https://internals.rust-lang.org/t/memcpy-ing-into-a-slice-of-cells/14682 as a result of this discussion, since it seems like an all-around useful API.
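The linked post asks for a dedicated API. Until one exists, a plain element-wise loop does the job. The helpers below (`copy_into_cells`, `copy_from_cells`) are hypothetical and not part of `core` or `std`. Whether the compiler turns these loops into wider loads and stores is not guaranteed.

```rust
use core::cell::Cell;

/// Hypothetical helper: copies `src` into a slice of cells that may be
/// shared with other holders. Panics on a length mismatch, mirroring
/// `<[T]>::copy_from_slice`.
fn copy_into_cells(dst: &[Cell<u8>], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "length mismatch");
    for (d, s) in dst.iter().zip(src) {
        d.set(*s);
    }
}

/// Hypothetical helper for the reverse direction: copies out of a slice of cells.
fn copy_from_cells(dst: &mut [u8], src: &[Cell<u8>]) {
    assert_eq!(dst.len(), src.len(), "length mismatch");
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s.get();
    }
}
```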

(Also, compiler fences are mostly about ordering of reads and writes locally. I'd need a specific example to know whether this is what you actually want.)

Miguel

Philip Levis

May 11, 2021, 3:18:16 PM

to Brad Campbell, Vadim Sukhomlinov, Johnathan Van Why, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Miguel Young de la Sota

On May 11, 2021, at 8:59 AM, Brad Campbell <bra...@gmail.com> wrote:

> It seems there are two key questions to be answered:
> 1. Is having overlapping `allow`ed buffers a desirable feature, regardless of any rust safety considerations?
> 2. What does "significant" mean in terms of what level of per-allow-call overhead would be acceptable?
>
> I don't have a use case for #1 in mind. I would think that any capsule that wants to do something complicated with memory could have a process allow a (large) RW appslice and manage it internally.

I think Brad has a good formulation here. I think there are three basic costs to consider:

1) Per-allow costs

2) Per-access costs to an allowed buffer (may be zero)

3) Code complexity of accessing AppSlices within a capsule

The first two are quantitative, the third is qualitative.

I chatted with Hudson yesterday, and am very interested in exploring how far down we can push cost 1 (per-allow costs). I think there are a lot of similarities with boundary/occlusion/interior checks in graphics, for which people have come up with highly, highly optimized approaches. We might be able to apply those techniques here and make it fast.

That being said, I think this discussion of the types and soundness is excellent: I’m currently interested in option 1 (a table and runtime checks), but the best way for us to make the decision between option 1 and option 2 (types) is to push each of them as far as we can.
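Option 1 (a table plus runtime checks) might look roughly like the following minimal sketch, assuming a fixed number of allow slots per process. `Region`, `AllowTable`, and `AllowError` are hypothetical names. The check is a plain linear scan rather than one of the optimized graphics-style techniques, and it does not distinguish read-only from read-write allows.

```rust
/// A half-open address range `[start, start + len)` shared by a process.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Region {
    start: usize,
    len: usize,
}

impl Region {
    /// Two half-open ranges overlap iff each starts before the other ends.
    /// Empty regions never overlap anything.
    fn overlaps(&self, other: &Region) -> bool {
        self.len != 0
            && other.len != 0
            && self.start < other.start + other.len
            && other.start < self.start + self.len
    }
}

#[derive(Debug, PartialEq, Eq)]
enum AllowError {
    BadSlot,
    Overflow,
    Overlap,
}

/// Hypothetical per-process table of currently allowed regions, one per slot.
struct AllowTable<const N: usize> {
    slots: [Option<Region>; N],
}

impl<const N: usize> AllowTable<N> {
    fn new() -> Self {
        AllowTable { slots: [None; N] }
    }

    /// Stores `region` in `slot` unless it overlaps a region held in any
    /// other slot; replacing a slot's own previous region is always fine.
    /// Cost: one linear scan over the table per allow.
    fn allow(&mut self, slot: usize, region: Region) -> Result<(), AllowError> {
        if slot >= N {
            return Err(AllowError::BadSlot);
        }
        // Stored regions are overflow-free, so `overlaps` cannot overflow.
        if region.start.checked_add(region.len).is_none() {
            return Err(AllowError::Overflow);
        }
        let conflict = self
            .slots
            .iter()
            .enumerate()
            .filter(|&(i, _)| i != slot)
            .filter_map(|(_, r)| r.as_ref())
            .any(|r| r.overlaps(&region));
        if conflict {
            return Err(AllowError::Overlap);
        }
        self.slots[slot] = Some(region);
        Ok(())
    }
}
```

With slots holding `[0x100, 0x120)` and `[0x120, 0x130)`, a third allow of `[0x110, 0x114)` is rejected with `AllowError::Overlap`, while re-allowing slot 0 with `[0x108, 0x120)` succeeds because only other slots are compared.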

Phil

———————
Philip Levis (he/him)
Associate Professor, Computer Science and Electrical Engineering

Faculty Director, lab64 Maker Space
Stanford University
http://csl.stanford.edu/~pal

Brad Campbell

May 11, 2021, 3:48:03 PM

to Philip Levis, Vadim Sukhomlinov, Johnathan Van Why, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Miguel Young de la Sota

For consideration, I turned on syscall tracing with the hail test app. The hail app is interesting because it uses 12 of the 14 grant regions configured for the hail board as it samples everything on the board periodically. I also upped the debug buffer to avoid it overflowing when trying to trace syscalls. Here is about the first 5 seconds of the log filtered for just the allows:

0] read-only allow(0x1, 1, @0x2000acf0, 0x11) = Allow
0] read-only allow(0x1, 1, @0x2000ad08, 0x1c) = Allow
0] read-only allow(0x1, 1, @0x2000acf0, 0x20) = Allow
0] read-only allow(0x1, 1, @0x2000acf0, 0x1c) = Allow
0] read-write allow(0x80004, 0, @0x20009581, 0x180) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0x10) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0x5) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0xd) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0x6) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0xd) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0x5) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0x8) = Allow
0] read-only allow(0x80004, 0, @0x2000a001, 0x11) = Allow
0] read-write allow
0] read-only allow(0x1, 1, @0x2000ace8, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000acf0, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000acf8, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ace0, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x14) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x14) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x14) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow
0] read-write allow(0x40001, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x40002, 0, @0x200087d0, 0x5) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x16) = Allow
0] read-only allow(0x1, 1, @0x2000ad30, 0x25) = Allow
0] read-only allow(0x1, 1, @0x2000ad38, 0x1b) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x13) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x15) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x18) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad40, 0x12) = Allow
0] read-only allow(0x1, 1, @0x2000ad28, 0x29) = Allow
0] read-only allow(0x1, 1, @0x2000ad18, 0x3e) = Allow
0] read-only allow(0x1, 1, @0x2000ad50, 0x1) = Allow

There is some initial setup, but the recurring allows come from RNG (0x40001), CRC (0x40002), and console (0x1).

Console is an interesting case, as we might not care about its performance in a real application: where would the console output go? Also, many of the consecutive allows are for the same buffer, so we could skip the kernel check in those cases.
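Skipping the check for a repeated allow could be sketched as follows. The sketch assumes a flat array of slots in which no two stored ranges overlap. `Slot` and `allow_with_fast_path` are hypothetical, and the sketch rejects every overlap without distinguishing read-only from read-write allows. If a slot is re-allowed with exactly the range it already holds, the scan is skipped.

```rust
/// Hypothetical allow slot contents: `(start address, length)`.
type Slot = Option<(usize, usize)>;

/// Records an allow of `[start, start + len)` in slot `idx` and returns the
/// number of overlap comparisons performed. `Err(())` means the slot index
/// was invalid, the range overflowed, or it overlaps another slot.
fn allow_with_fast_path(slots: &mut [Slot], idx: usize, start: usize, len: usize) -> Result<usize, ()> {
    if idx >= slots.len() {
        return Err(());
    }
    // Fast path: re-allowing the exact range this slot already holds cannot
    // create a new overlap, since every stored range was checked against all
    // others when it was recorded.
    if slots[idx] == Some((start, len)) {
        return Ok(0);
    }
    let end = start.checked_add(len).ok_or(())?;
    let mut comparisons = 0;
    for (i, slot) in slots.iter().enumerate() {
        if i == idx {
            continue;
        }
        if let Some((s_start, s_len)) = *slot {
            comparisons += 1;
            // Stored ranges were overflow-checked when they were recorded.
            let s_end = s_start + s_len;
            if len != 0 && s_len != 0 && start < s_end && s_start < end {
                return Err(());
            }
        }
    }
    slots[idx] = Some((start, len));
    Ok(comparisons)
}
```

Re-allowing a buffer such as `0x200067a0` with length `0x20` into the slot that already holds it returns `Ok(0)` without any comparisons.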

Here is the output for the ADC test app:

[0] read-only allow(0x1, 1, @0x20007078, 0x10) = AllowReadOnlySuccess(0x0, 0)
[0] read-only allow(0x1, 1, @0x20007090, 0x22) = AllowReadOnlySuccess(0x20007078, 16)
[0] read-only allow(0x1, 1, @0x20007098, 0x1) = AllowReadOnlySuccess(0x20007090, 34)
[0] read-only allow(0x1, 1, @0x20007080, 0x1b) = AllowReadOnlySuccess(0x20007098, 1)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007080, 27)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007078, 0x24) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007098, 0x1) = AllowReadOnlySuccess(0x20007078, 36)
[0] read-only allow(0x1, 1, @0x20007080, 0x1d) = AllowReadOnlySuccess(0x20007098, 1)
[0] read-only allow(0x1, 1, @0x20007088, 0x18) = AllowReadOnlySuccess(0x20007080, 29)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x0, 0)
[0] read-only allow(0x1, 1, @0x200070c0, 0x65) = AllowReadOnlySuccess(0x20007088, 24)
[0] read-only allow(0x1, 1, @0x200070f0, 0x19) = AllowReadOnlySuccess(0x200070c0, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 25)
[0] read-only allow(0x1, 1, @0x200070f0, 0x19) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 25)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1a) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 26)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1a) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 26)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1b) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 27)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1b) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 27)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1c) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 28)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1c) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 28)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1c) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 28)
[0] read-only allow(0x1, 1, @0x20007108, 0x1) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1b) = AllowReadOnlySuccess(0x20007108, 1)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070f0, 27)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x20007108, 0x1) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1d) = AllowReadOnlySuccess(0x20007108, 1)
[0] read-only allow(0x1, 1, @0x200070f8, 0x18) = AllowReadOnlySuccess(0x200070f0, 29)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f8, 24)
[0] read-only allow(0x1, 1, @0x200070f0, 0x19) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 25)
[0] read-only allow(0x1, 1, @0x200070f0, 0x19) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 25)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1a) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 26)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1a) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 26)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1b) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 27)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1b) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 27)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1c) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 28)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1c) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 28)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1c) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)
[0] read-only allow(0x1, 1, @0x200070a8, 0x65) = AllowReadOnlySuccess(0x200070f0, 28)
[0] read-only allow(0x1, 1, @0x20007108, 0x1) = AllowReadOnlySuccess(0x200070a8, 101)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1b) = AllowReadOnlySuccess(0x20007108, 1)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070f0, 27)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070e8, 0x24) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x20007108, 0x1) = AllowReadOnlySuccess(0x200070e8, 36)
[0] read-only allow(0x1, 1, @0x200070f0, 0x1d) = AllowReadOnlySuccess(0x20007108, 1)
[0] read-only allow(0x1, 1, @0x200070f8, 0x18) = AllowReadOnlySuccess(0x200070f0, 29)
[0] read-write allow(0x5, 0, @0x200067a0, 0x20) = AllowReadWriteSuccess(0x200067a0, 32)

The ADC libtock-c driver (0x5) does re-allow each time, but it's always the same buffer and therefore (I think) wouldn't need a check after the first time.

- Brad

Vadim Sukhomlinov

May 11, 2021, 4:15:50 PM

to Brad Campbell, Philip Levis, Johnathan Van Why, Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Miguel Young de la Sota

All apps are different though. As for same buffer used each time - what if this buffer would be allowed to another driver in the middle? I think the check for overlaps should happen at AppSlice::take() time, not on allow. We may also need options where overlaps with slices of the same driver are fine but overlaps with other drivers aren't, and where all overlaps are OK (the "trust me, I'm an engineer" approach :)). Maybe also an API to take a slice and indicate that it overlaps. I'm not sure what the Cell approach really brings beyond complexity in the source code. Setting aside what Rust promises the compiler, it is just a memory access; I'm struggling to find an example where it would be different. I'd just accept possibility of unsoundness, document it and use for benefits when possible. Beyond run-time check nothing will work since you can circumvent everything with unsafe code.

Run-time checks at take() time are pretty cheap: in many cases an app invokes just a single driver, so no actual checks take place.
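The proposed policy is to check at take time and tolerate overlaps within one driver. It could be sketched as below. `Allowed`, `overlaps`, and `may_take` are hypothetical names, not Tock's `AppSlice` API. The check only expresses the policy. It does not by itself make overlapping `&mut [u8]` views within one driver sound.

```rust
/// Hypothetical record of an allowed buffer: the driver it was allowed to
/// and the address range `[start, start + len)` it covers.
#[derive(Clone, Copy)]
struct Allowed {
    driver: u32,
    start: usize,
    len: usize,
}

/// Half-open range overlap test (ranges are assumed to have been checked
/// for overflow when they were allowed).
fn overlaps(a: &Allowed, b: &Allowed) -> bool {
    a.len != 0 && b.len != 0 && a.start < b.start + b.len && b.start < a.start + a.len
}

/// Called when a driver wants to take buffer `idx`: succeeds unless the
/// buffer overlaps one held by a different driver. Overlaps between
/// buffers of the same driver are tolerated.
fn may_take(all: &[Allowed], idx: usize) -> bool {
    let me = &all[idx];
    all.iter()
        .enumerate()
        .all(|(i, other)| i == idx || other.driver == me.driver || !overlaps(me, other))
}
```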

Johnathan Van Why

May 11, 2021, 4:24:12 PM

to Vadim Sukhomlinov, Brad Campbell, Philip Levis, Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Miguel Young de la Sota

I am vehemently opposed to accepting unsoundness.

Miguel Young de la Sota

May 11, 2021, 4:28:28 PM

to Johnathan Van Why, Vadim Sukhomlinov, Brad Campbell, Philip Levis, Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion

> I'd just accept possibility of unsoundness, document it and use for benefits when possible.

As Johnathan says, this is unacceptable for a number of reasons, including Tock's security posture. Opening yourself up to miscompilation means that any pain and suffering gone through to use Rust is lost, and we may well have written C. And as anyone who has been on the receiving end of one of my code reviews knows, not even C is that easy. =)

Vadim Sukhomlinov

May 11, 2021, 4:43:45 PM

to Miguel Young de la Sota, Johnathan Van Why, Brad Campbell, Philip Levis, Leon Schuermann, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion

If you treat it this way, fine, but the price would be efficiency, which is an important aspect in the embedded world. If the problem is narrowed to a user app modifying a buffer after allow() and before the upcall related to that command, why not think of syscalls as a multithreaded interaction? The kernel is just like another thread, so you need something like an Arc with a mutex. Implement Sync and Send properly and only accept these kinds of types in the allow() call. This would be enforced by libtock-rs.

Amit Levy

May 11, 2021, 4:51:48 PM

to tock...@googlegroups.com

> As for same buffer used each time - what if this buffer would be allowed to another driver in the middle?

I don't think this is a problem if we use the `&[Cell<u8>]` version. It's perfectly fine (type-safety-wise) for multiple capsules to have a handle on overlapping process memory of that type.
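In the minimal sketch below, two hypothetical capsule-like structs (`Filler` and `Checksum`, which are not Tock types) each hold a `&[Cell<u8>]` into the same buffer, and their ranges overlap. This compiles in safe Rust and is well-defined, because `Cell` permits mutation through shared references.

```rust
use core::cell::Cell;

/// Two hypothetical capsule-like holders of process memory.
struct Filler<'a> {
    buf: &'a [Cell<u8>],
}

struct Checksum<'a> {
    buf: &'a [Cell<u8>],
}

impl Filler<'_> {
    /// Overwrites every byte in this holder's view.
    fn fill(&self, value: u8) {
        for b in self.buf {
            b.set(value);
        }
    }
}

impl Checksum<'_> {
    /// Sums the bytes in this holder's view, observing writes made
    /// through any other overlapping view.
    fn sum(&self) -> u32 {
        self.buf.iter().map(|b| u32::from(b.get())).sum()
    }
}
```

Filling bytes 0..6 of an eight-byte buffer through one view and summing bytes 2..8 through the other gives 4.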

> I'd just accept possibility of unsoundness, document it and use for benefits when possible.

> As Johnathan says, this is unacceptable for a number of reasons, including Tock's security posture. Opening yourself up to miscompilation means that any pain and suffering gone through to use Rust is lost, and we may well have written C. And as anyone who has been on the receiving end of one of my code reviews knows, not even C is that easy. =)

Not to pile on, but unsoundness in the kernel at the system-call boundary is a non-starter (I mean, we have that now with allow, but we have to fix it).

In particular, a process cannot be allowed to interact with the kernel in such a way that could result in unsafe behavior, even if achieving unsafe behavior would require coordinating with a capsule.

Said another way, once the kernel crate has created Rust structs based on system call arguments, it must be the case that these constructs do not violate Rust guarantees and, thus, cannot invoke undefined behavior. To the extent this is not currently enforced, it is a bug and should be fixed.
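When handling an allow, the kernel crate could uphold this by validating the address range first and only then building a `&[Cell<u8>]` from it. The sketch below assumes a single contiguous process memory region. `ProcessMemory` and `slice_from_allow` are hypothetical names, and real validation (for example against grant regions or the MPU configuration) would be more involved.

```rust
use core::cell::Cell;

/// Hypothetical description of the RAM region a process owns.
struct ProcessMemory {
    start: usize,
    len: usize,
}

/// Converts the (address, length) arguments of an allow call into a
/// `&[Cell<u8>]`, returning `None` unless the whole range lies inside the
/// process's memory. `Cell<u8>` has the same layout as `u8`, and shared
/// `&[Cell<u8>]` views may overlap without breaking aliasing rules.
///
/// # Safety
/// `mem` must describe memory that stays valid for reads and writes for
/// `'a` and that is not accessed through `&[u8]` or `&mut [u8]` meanwhile.
unsafe fn slice_from_allow<'a>(mem: &ProcessMemory, addr: usize, len: usize) -> Option<&'a [Cell<u8>]> {
    let end = addr.checked_add(len)?;
    let mem_end = mem.start.checked_add(mem.len)?;
    if addr < mem.start || end > mem_end {
        return None;
    }
    if len == 0 {
        // An empty slice, without touching a possibly null address.
        return Some(Default::default());
    }
    // SAFETY: `[addr, end)` lies inside process memory (checked above),
    // `Cell<u8>` has alignment 1, and the caller upholds the contract above.
    Some(unsafe { core::slice::from_raw_parts(addr as *const Cell<u8>, len) })
}
```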

Aside: the same restrictions do not apply to processes. Whatever kernel interface is provided should allow processes to use internally safe language constructs, but that is less of a hard requirement; e.g., there might need to be a library layer within a process that enforces additional requirements on the kernel system call boundary to ensure process safety. (I don't anticipate this being an issue in this case, though.)

Philip Levis

May 11, 2021, 5:09:12 PM5/11/21

to Amit Levy, tock...@googlegroups.com

I’d like to raise one edge case I encounter on H1B, with the DCRYPTO engine.

https://github.com/google/tock-on-titan/blob/master/kernel/h1/src/crypto/dcrypto.rs

DCRYPTO is a big number crypto accelerator. It has its own assembly language, etc. It has two memory regions, instructions and data. These are memory mapped in the peripheral address space. Data and instructions in DCRYPTO are both 32 bits.

The edge case that came up is if you want userspace to be able to program DCRYPTO. There are security reasons why you might not want this (particularly, you have a whole separate processor), but suppose you do. The interface to do so is through allows (one data, one instructions), and commands that tell the driver to copy from the allowed buffer into the memory mapped regions for instructions and data.

The issue is that the memory mapped regions are word aligned, but allowed buffers might not be. So you might need to copy from a misaligned buffer into an aligned buffer. This is a standard performance edge case for memcpy or memmove. The long and the short of it is that you want copies to be done with word operations (or even better, pairs of ldmia/stmia instructions).

So my question is, if we have a &[Cell<u8>], how would we write code to perform fast copies to/from this buffer? I am wary of “trust the compiler.”

Phil

--
You received this message because you are subscribed to the Google Groups "Tock Embedded OS Development Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tock-dev+u...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/tock-dev/865ebe55-9001-4d53-7c0e-67146a7ad240%40amitlevy.com.

Amit Levy

May 11, 2021, 5:17:50 PM5/11/21

to tock...@googlegroups.com

Another aside on:

> Beyond run-time check nothing will work since you can circumvent everything with unsafe code.

I think it's worth re-iterating Tock's trust model here, so we're all on the same page.

## Who shouldn't be trusted for memory safety:

Neither processes nor capsules are trusted for memory safety (capsules are trusted for liveness, e.g. they can block system progress with a `loop {}`).

Processes are "easy" as memory safety is enforced in a similar way as is enforced in most operating systems: through dynamic hardware enforcement of memory accesses (of course most OSs use virtual memory while Tock primarily relies on memory protection of a single address space, but the spirit is the same).

Restricting capsules' use of memory relies on the Rust type system. Broadly, as long as the Rust type system's semantics are preserved, capsules can only access memory/values they are explicitly granted access to (via references, local variables, etc.), through narrow interfaces (e.g. traits or other types). Importantly, capsules are restricted, by the compiler, from using the `unsafe` keyword. In upstream capsules this is enforced with the `#![forbid(unsafe_code)]` directive at the top of `capsules/src/lib.rs`.

## What does this mean for constructing safe abstractions within the kernel?

"You can circumvent everything with unsafe code" is, of course, true, except we restrict who can use unsafe code. It is, therefore, both safety critical, and performance beneficial, to ensure that abstractions provided to capsules by the core kernel adhere to Rust's safety semantics---for example, that they don't exercise undefined behavior.

It's critical for safety because even hand-coded runtime checks might be incorrectly elided in the presence of undefined behavior. This isn't specific to Rust, and is an endemic problem in C ([1] is just one of many examples). If there is undefined behavior, all bets are off, especially as compilers become more aggressive (which is of course important for performance).

It's beneficial to performance because dynamic checks are expensive. System calls in Tock are already fairly expensive because the security model, and the fact that processes are not necessarily type-safe, require us to perform many dynamic checks (undoubtedly there is room for optimizations, but at least some large portion of the overhead is fundamental).

The more we can enforce statically, without relying on runtime checks, the better. Certainly in some cases there is an argument for defense in depth (so static enforcement shouldn't necessarily preclude dynamic checks).

In any case, this rant is an aside to the discussion of the options for preserving safety in this case, and should not be taken as an argument in favor of one of the options Leon laid out vs another.

[1]: https://www.usenix.org/system/files/conference/osdi12/osdi12-final-88.pdf

Miguel Young de la Sota

May 11, 2021, 5:28:41 PM5/11/21

to Amit Levy, Tock Embedded OS Development Discussion

> So my question is, if we have a &[Cell<u8>], how would we write code to perform fast copies to/from this buffer? I am wary of “trust the compiler.”

Memcpy is *guaranteed* to work fine, because Cell<T> is *guaranteed* to be layout-compatible with T. This is a promise the compiler makes. If you want it to be aligned, you should use `Cell<u32>` or whatever and check that userspace has given you an appropriately-aligned buffer before creating the slice[1]. Rust does not believe there is any difference between &u8 and &u32 except that the latter points to four bytes and is well-aligned.

[1]: Alternatively, you can implement Opentitan's pseudoaligned memcpy: https://github.com/lowRISC/opentitan/blob/master/sw/device/lib/base/mmio.c#L34
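Here is a minimal sketch of the memcpy Miguel describes, built around a hypothetical helper `copy_into_cells` (not an existing Tock or `core` API). It relies only on `Cell<u8>` having the same layout as `u8`, so `ptr::copy_nonoverlapping`, which has `memcpy` semantics, can fill a `&[Cell<u8>]` directly:

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical helper: copies `src` into `dst` with one memcpy.
/// Panics if the lengths differ, like `<[T]>::copy_from_slice`.
fn copy_into_cells(dst: &[Cell<u8>], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "length mismatch");
    // SAFETY:
    // - `Cell<u8>` has the same in-memory representation as `u8`.
    // - Writing through a pointer derived from `&[Cell<u8>]` is allowed,
    //   because `Cell` is built on `UnsafeCell`.
    // - The ranges cannot overlap: `src` is a `&[u8]`, which Rust assumes
    //   is not mutated while it is live.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_ptr() as *mut u8, src.len());
    }
}

fn main() {
    let mut buf = [0u8; 4];
    let cells = Cell::from_mut(&mut buf[..]).as_slice_of_cells();
    copy_into_cells(cells, &[1, 2, 3, 4]);
    assert_eq!(cells[3].get(), 4);
}
```

Whether the copy actually uses word or `ldm`/`stm` transfers for misaligned buffers still depends on the `memcpy` the target links against. This sketch only guarantees that such a routine may be used soundly.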

Vadim Sukhomlinov

May 11, 2021, 6:21:22 PM5/11/21

to Philip Levis, Amit Levy, Tock Embedded OS Development Discussion

Phil, I think that the case with DCRYPTO is due to a poor choice of abstraction primitives. Working with a similar engine, we exposed only high-level operations to user mode (like ECDSA P256 sign) instead of raw access to the device. The reason is that, in order to meet certification requirements, the crypto implementation should be hardened against side channels, fault injections, etc. on one side, and proper key management and hygienic rules for using crypto should be enforced on the other side, so we opted for having a crypto service provider as a capsule. Still, some operations require many memory parameters for input and output, sometimes with the result overlapping source parameters to save space. Say, encrypt where the output overwrites the input: this is perfectly fine as you read a block, process it, then write.

How will &[Cell<u8>] help? How do you cast it from &[Cell<u32>]? Userspace can still modify the content after the allow() call. It can't do that at the same time as the kernel on a single-core platform, but it can do it after kernel code has read a value to make a decision: some other driver issues an upcall to the process, the process modifies the region, and control returns to the kernel. This is a different problem than aliasing of buffers among allow()s, which can also be reliably solved only at run-time.
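One way to answer the casting question is sketched below. It assumes the buffers hold plain integers, and the function names are hypothetical. Viewing `&[Cell<u32>]` as `&[Cell<u8>]` only needs a length adjustment. The reverse direction needs the alignment and length checks Miguel mentions:

```rust
use std::cell::Cell;
use std::mem::{align_of, size_of};
use std::slice;

/// Views a word buffer as bytes. Always sound: `Cell<u8>` has alignment 1,
/// any byte of a `u32` is a valid `u8`, and both element types are `Cell`s.
fn words_as_bytes(words: &[Cell<u32>]) -> &[Cell<u8>] {
    let len = words.len() * size_of::<u32>();
    // SAFETY: same memory and lifetime; layout-compatible as argued above.
    unsafe { slice::from_raw_parts(words.as_ptr() as *const Cell<u8>, len) }
}

/// Views a byte buffer as words, but only if it is 4-byte aligned and a
/// whole number of words long (userspace buffers need be neither).
fn bytes_as_words(bytes: &[Cell<u8>]) -> Option<&[Cell<u32>]> {
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<u32>() != 0 || bytes.len() % size_of::<u32>() != 0 {
        return None;
    }
    let len = bytes.len() / size_of::<u32>();
    // SAFETY: alignment and length checked above; any 4 bytes form a valid u32.
    Some(unsafe { slice::from_raw_parts(ptr as *const Cell<u32>, len) })
}

fn main() {
    let words = [Cell::new(0u32), Cell::new(0u32)];
    let bytes = words_as_bytes(&words);
    assert_eq!(bytes.len(), 8);

    // Which byte of the word this is follows the target's native endianness.
    bytes[0].set(0xFF);
    assert_eq!(words[0].get().to_ne_bytes()[0], 0xFF);

    assert!(bytes_as_words(bytes).is_some()); // aligned, two whole words
    assert!(bytes_as_words(&bytes[1..5]).is_none()); // misaligned
    assert!(bytes_as_words(&bytes[0..3]).is_none()); // not a whole word
}
```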

You need a special type which would block access when it's allowed() and enable it on upcall. That would be a run-time check in libtock-rs and will require a flag serving as a mutex, locked on allow and unlocked on subscribe. This is rather tricky to implement. I think syscalls are much closer to multithreading, and the problem is that access to shared regions is not properly guarded. Rust has great means to handle it; we just need to make them usable for the embedded world. Think of allow & command syscalls as thread::spawn() and the upcall as thread::join(). Make sure whatever is passed to the 'thread' is guarded against improper use -> properly implement the Send trait for some variant of Cell with a guard, etc.
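Here is a minimal sketch of the userspace guard described above. All names are hypothetical and no real system calls are made. A flag is set when the buffer is handed to the kernel and cleared when it comes back, for example from the upcall. Safe access through the wrapper is refused in between:

```rust
use std::cell::Cell;

/// Hypothetical userspace wrapper around a buffer shared with the kernel.
/// `allowed` plays the role of the lock described above.
struct GuardedBuffer<const N: usize> {
    data: [u8; N],
    allowed: Cell<bool>,
}

impl<const N: usize> GuardedBuffer<N> {
    fn new() -> Self {
        GuardedBuffer { data: [0; N], allowed: Cell::new(false) }
    }

    /// Call right after a successful allow (the "spawn" side).
    fn mark_allowed(&self) {
        self.allowed.set(true);
    }

    /// Call from the upcall, or once the buffer is un-allowed (the "join" side).
    fn mark_returned(&self) {
        self.allowed.set(false);
    }

    /// Refuses access while the kernel holds the buffer.
    fn get_mut(&mut self) -> Option<&mut [u8; N]> {
        if self.allowed.get() {
            None
        } else {
            Some(&mut self.data)
        }
    }
}

fn main() {
    let mut buf: GuardedBuffer<4> = GuardedBuffer::new();
    buf.mark_allowed();
    assert!(buf.get_mut().is_none());
    buf.mark_returned();
    buf.get_mut().unwrap()[0] = 42;
    assert_eq!(buf.get_mut().unwrap()[0], 42);
}
```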

Philip Levis

May 11, 2021, 7:18:20 PM5/11/21

to Vadim Sukhomlinov, Amit Levy, Tock Embedded OS Development Discussion

> On May 11, 2021, at 3:21 PM, 'Vadim Sukhomlinov' via Tock Embedded OS Development Discussion <tock...@googlegroups.com> wrote:
>
> Phil, I think that the case with DCRYPTO is due to a poor choice of abstraction primitives. Working with a similar engine, we exposed only high-level operations to user mode (like ECDSA P256 sign) instead of raw access to the device. The reason is that, in order to meet certification requirements, the crypto implementation should be hardened against side channels, fault injections, etc. on one side, and proper key management and hygienic rules for using crypto should be enforced on the other side, so we opted for having a crypto service provider as a capsule. Still, some operations require many memory parameters for input and output, sometimes with the result overlapping source parameters to save space. Say, encrypt where the output overwrites the input: this is perfectly fine as you read a block, process it, then write.

Vadim,

Sure, if you encapsulate DCRYPTO in this way, you can solve this problem. And for the reasons you point out, it’s probably better to encapsulate. My point wasn’t that DCRYPTO in particular had this as a requirement, but rather that there are peripherals which have this twist to them. I.e., I was trying to give a concrete example of a particular design consideration, not argue it’s a requirement for DCRYPTO.

Phil

Alistair Francis

May 11, 2021, 11:56:49 PM5/11/21

to Leon Schuermann, Brad Campbell, Vadim Sukhomlinov, Johnathan Van Why, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Miguel Young de la Sota

On Wed, May 12, 2021 at 4:33 AM Leon Schuermann
<le...@is.currently.online> wrote:
>
>
> Brad Campbell <bra...@gmail.com> writes:
> > It seems there are two key questions to be answered:
> >
> > 1. Is having overlapping `allow`ed buffers a desirable feature,
> > regardless of any rust safety considerations?
>
> I'd argue that overlapping `allow`ed buffers might be a very desirable
> feature, and I do think that Vadim's crypto usecase is quite convincing,
> along with implementations of network protocols (say, routing a packet
> from 6LoWPAN to Ethernet). Most likely, the arbitration of when a crypto
> operation is finished / a packet is received or transmitted is not
> coupled to allow operations themselves, and so applications could take
> advantage of the fact that -- chosen wisely -- allows to certain
> userspace buffers pointing to the correct frame boundaries in a buffer
> would neither require changing the allowed buffers over time, nor a copy
> in userspace.
>
> This does however collide with the requirement that buffers must not be
> modified while shared with the kernel (related to your question on
> TRD104). At the risk of derailing this discussion: at the time that

That seems like an unenforceable requirement.

It seems reasonable that we can say app functionality relies on the
data not changing. For example if an app says "print this buffer" then
changes the buffer we don't make guarantees on what will be printed.
That could even be a capsule specific thing.

But requiring buffers not to be modified for any safety/soundness
doesn't seem like a viable option.

> seemed like a good requirement to me. However, given we can't really
> reasonably enforce it given the MPU's granularity, it might be worth
> questioning that restriction if it makes things easier here?

Technically RISC-V PMP could enforce this, but that would limit us to
a limited number of allow-able buffers which I don't think we want to
do.

Alistair

>
> > 2. What does "significant" mean in terms of what level of per-allow-call
> > overhead would be acceptable?
>
> That's a good question, which I suppose will have to involve
> benchmarking the actual cost. I'd like to make the following observation
> nonetheless: If we don't support overlapping buffers, and don't support
> modifying allowed buffers, applications will have to make `allow` system
> calls much more frequently. This might emphasize any per-allow cost.
>
> - Leon
>

Philip Levis

May 16, 2021, 8:14:41 PM5/16/21

to le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Johnathan Van Why, tock-at-...@lists.stanford.edu

(Cc’ing Tock-at-stanford in case folks there find this interesting)

On May 10, 2021, at 7:38 AM, Leon Schuermann <le...@is.currently.online> wrote:

> 1. Perform runtime-checks to ensure that no two userspace buffers can
>   overlap.
>
>   This might have a significant performance & resource overhead, as
>   presumably a central data structure must keep information about all
>   buffers already shared by userspace. Furthermore, every `allow`
>   operation (sharing a new memory region) must walk over the entire
>   list to check for an overlapping region. In a sorted data structure,
>   it might be possible to perform a binary search instead.

I’ve been poking at this for the past few days. Here are some initial results. The workload (not a very comprehensive one) allows 6 buffers, then removes 2 of them from the allowed set. Allows 0-3 are in increasing addresses, while allows 4 and 5 both have lower addresses than allow #3. These represent insertions (rather than appends) to the list of sorted buffers.

- Allow 0: 68 ticks
- Allow 1: 175 ticks
- Allow 2: 190 ticks
- Allow 3: 206 ticks
- Allow 4: 251 ticks
- Allow 5: 281 ticks
- Remove 5: 272 ticks
- Remove 1: 177 ticks

There is a ~7 cycle overhead on measurement (the duration between two reads to CYCCNT).

You can see there’s a cost jump between (0-3) and (4-5) because of the insertion.

Allow 0 is special cased, since you can just deterministically use the 0th region and 0th list element. So no searching is needed.

These cycle counts are higher than I expected. I looked at the assembly, and there is a lot of overhead in spilling registers. For example, insert looks like this:

    // Returns whether it was inserted successfully
    pub fn insert(&mut self, start: u32, size: u32) -> bool {
        let end = start + size;
        match self.count {
            0 => self.insert_first(start, end),
            SIZE => false,
            _ => self.insert_into(start, end),
        }
    }

Calling this for the first insertion (which then calls insert_first) takes 68 cycles. But if the caller of insert calls insert_first directly, that call takes only 18 cycles.

Where do those extra 50 cycles go?

It looks like it’s mostly due to register spilling, stack manipulation, and the fact that inlining prevents good table lookups for the match (using inline(never) hurts performance, though). For example, the assembly pushes 5 registers to the stack and stores an address from the stack into r7 before loading self.count, computing end, and then branching to insert_first with a cbz (compare-and-branch-if-zero) instruction.

What’s interesting is what happens with the cbz instruction. It jumps to a block of assembly in the middle of the function, which then has an unconditional branch later in the function to an instruction that is itself an unconditional branch to an auto-generated OUTLINED_FUNCTION_243. This seems to be a generated, reused 2-instruction function postamble (a way to save an instruction in exchange for an extra branch).

I tried compiling with opt-level = 3 rather than = “z” to see if these are weird artifacts due to focusing on code size. Level 3 is only a few cycles shorter.

A couple of other observations:

- The idea that helpers like .iter() and .iter().enumerate() will lead to more efficient code seems mistaken when you are dealing with small data structures. The cost of invoking the corresponding methods and the support code is much greater than the cost of a bounds check. I am going to do some tests to see when the savings of the check are worth it. If I had to guess, I think it will be around 20 elements.

- For similar reasons, calling supporting methods is also expensive. Directly shifting a small array right 1 element with memmove was >4x as expensive as doing it with a for loop.

The first one goes back to my experience of moving unaligned data to align it: I recall that a complex series of idiomatic Rust calls could do it, but they were 4-6x slower than a manual byte assembly.
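For reference, the two ways of shifting a small array right by one element that are compared above can be written as follows. The function names are hypothetical. Which version is faster, and whether `copy_within` becomes a `memmove` call or is inlined, depends on the compiler version, opt-level and target, so it has to be measured as above:

```rust
/// Inserts `value` at index `pos` into the first `len` (live) elements of
/// `list`, shifting the tail right by one with an explicit loop.
fn insert_loop(list: &mut [u8], len: usize, pos: usize, value: u8) {
    assert!(len < list.len() && pos <= len);
    // Walk backwards so each element is moved before it is overwritten.
    for i in (pos..len).rev() {
        list[i + 1] = list[i];
    }
    list[pos] = value;
}

/// Same operation via `copy_within`, which has `memmove` semantics.
fn insert_memmove(list: &mut [u8], len: usize, pos: usize, value: u8) {
    assert!(len < list.len() && pos <= len);
    list.copy_within(pos..len, pos + 1);
    list[pos] = value;
}

fn main() {
    let mut a = [1u8, 2, 4, 5, 0];
    let mut b = a;
    insert_loop(&mut a, 4, 2, 3);
    insert_memmove(&mut b, 4, 2, 3);
    assert_eq!(a, [1, 2, 3, 4, 5]);
    assert_eq!(a, b);
}
```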

I’ll keep on poking at this. I want to count the instructions and clock cycles to see where those 50 cycles go. I don’t think Allow 0 is a performance-critical use case, but I figure it’s the simplest of the lot so a good place to start.

One note: my implementation does use inline assembly in one place, to leverage the ARM clz instruction to find a free Region data structure (a word stores a bit mask of which are free). So you can find the first free (first 1 bit) in a single instruction rather than iterating through the bits.

Phil

Amit Levy

May 16, 2021, 8:18:36 PM5/16/21

to tock...@googlegroups.com

> One note: my implementation does use inline assembly in one place, to
> leverage the ARM clz instruction to find a free Region data structure
> (a word stores a bit mask of which are free). So you can find the
> first free (first 1 bit) in a single instruction rather than iterating
> through the bits.

Not a comprehensive response, but FYI, you should be able to get that
without assembly using `u32#leading_zeros`
(https://doc.rust-lang.org/std/primitive.u32.html#method.leading_zeros)
or `u32#trailing_zeros` (or the same methods for any other primitive
integer type). It's implemented in the compiler but I _believe_ it uses
arch specific instructions, such as `clz`, when possible.
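Here is a sketch of the free-slot allocation using `u32::leading_zeros` instead of inline assembly. As in Phil's description, it assumes that a set bit marks a free slot and that slot 0 corresponds to the most significant bit. The function names are hypothetical. `leading_zeros` returns 32 for an all-zero mask, so checking for that case first keeps the shift amount below 32:

```rust
/// Takes the first free slot from a 32-slot bitmask, where a set bit
/// means "free" and slot 0 is the most significant bit.
fn alloc_slot(free_mask: &mut u32) -> Option<u32> {
    // `leading_zeros` returns 32 when no bit is set, i.e. no free slot.
    let slot = free_mask.leading_zeros();
    if slot == 32 {
        return None;
    }
    // Clear the bit for the slot being handed out (shift is < 32 here).
    *free_mask &= !(0x8000_0000u32 >> slot);
    Some(slot)
}

/// Marks `slot` as free again.
fn free_slot(free_mask: &mut u32, slot: u32) {
    debug_assert!(slot < 32);
    *free_mask |= 0x8000_0000u32 >> slot;
}

fn main() {
    // Slots 0 and 2 are free (bits 31 and 29 set).
    let mut mask: u32 = 0xA000_0000;
    assert_eq!(alloc_slot(&mut mask), Some(0));
    assert_eq!(alloc_slot(&mut mask), Some(2));
    assert_eq!(alloc_slot(&mut mask), None);
    free_slot(&mut mask, 2);
    assert_eq!(alloc_slot(&mut mask), Some(2));
}
```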

Philip Levis

May 16, 2021, 10:17:27 PM5/16/21

to Amit Levy, tock...@googlegroups.com

On May 16, 2021, at 5:18 PM, Amit Levy <am...@amitlevy.com> wrote:

> Not a comprehensive response, but FYI, you should be able to get that without assembly using `u32#leading_zeros` (https://doc.rust-lang.org/std/primitive.u32.html#method.leading_zeros) or `u32#trailing_zeros` (or the same methods for any other primitive integer type). It's implemented in the compiler but I _believe_ it uses arch specific instructions, such as `clz`, when possible.

It does use clz, but using it instead of my assembly is 1-5 cycles slower. My assembly is effectively

r0 stores the free mask:

    ldr r2, [r0, #0]      @ Load the free mask
    clz r1, r2            @ Count leading 0s in the mask into r1
    mov r2, #0x80000000
    lsr r2, r2, r1        @ Put the first set 1 in r2
    bic r1, r1, r2        @ Remove the bit set in r2 from r1
    str r1, [r0, #0]      @ Store the new free mask back

It looks like leading_zeros adds an instruction, so it looks like this

    ldr r2, [r0, #0]      @ Load the free mask
    clz r1, r2            @ Count leading 0s in the mask into r1
    and r1, r1, #31       @ Keep only the bottom 5 bits
    mov r2, #0x80000000
    lsr r2, r2, r1        @ Put the first set 1 in r2
    bic r1, r1, r2        @ Remove the bit set in r2 from r1
    str r1, [r0, #0]      @ Store the free mask back
    movs r1, #0

This seems to be because the definition of the function is different than clz. This implementation is such that if it is all 0s (32 leading zeroes), it will return 0. However, CLZ produces 32.

It looks like the Rust version also adds a movs instruction after, clearing out a register.

I don’t know where the extra 3 cycles are coming from.

Vadim Sukhomlinov

May 16, 2021, 10:58:27 PM5/16/21

to Philip Levis, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Johnathan Van Why, tock-at-...@lists.stanford.edu

I wonder why it should be at allow() time and not at AppSlice::take() time? At take() time it's possible to gracefully handle intended overlaps if needed, by having a regular take() which would either panic or return None in case of overlap (this way the capsule will most likely fail due to incomplete data), and may provide a take_with_overlap() which would return the slice anyway but indicate that there is an overlap, so the capsule may handle it. Another useful variant of take would allow overlap within regions for the same driver. This is most likely intended in cases where data is fully read and processed before it's modified, e.g. the crypto use case with large inputs & outputs and the output overwriting the input to save memory.

To reduce overhead, per-application lists can be used, so inserts will be faster due to fewer items to look at. Maybe simple inserts in an array with moving elements would be faster for the scale of the problem (I'm not sure what the self.insert_into(start, end) implementation is, though, so please disregard if n/a).

Vadim

Hudson Randal Ayers

May 16, 2021, 11:13:08 PM5/16/21

to Vadim Sukhomlinov, Philip Levis, le...@is.currently.online, Amit Levy, Tock Embedded OS Development Discussion, Johnathan Van Why, tock-at-...@lists.stanford.edu

> I wonder why it should be at allow() time and not at AppSlice::take() time? At take() time it's possible to gracefully handle intended overlaps if needed by having regular take() which would either panic or return None in case of overlap - this way capsule will most likely fail due to incomplete data,

Currently, most uses of AppSlice do not use `take()` to modify the buffer -- map_or and other similar functions are more commonly used, and provide a better, less error-prone API (with take() it is easy to forget to replace the buffer once you get to more complex state machines). Of course, we could do the check on map_or() / mut_map_or() etc. as well, as those methods are already fallible. I think we want to avoid a design which creates another opportunity for apps to panic the kernel, but given that accessing an AppSlice can already fail if the process does not exist, it seems fine to fail in the case an AppSlice overlaps.

Choosing between checking at allow() time and checking at access time seems to come down to two points:

1. Whether higher overhead once per allow is preferable to lower overhead once per access, which probably depends on the access pattern and the maximum number of possible allows.
2. Whether we consider it important to return errors to userspace when overlapping buffers are allowed. Accepting an allow but then not letting a capsule use the allowed buffer is a pretty bad API, so I think we only want to make that the API if the performance win is pretty clear compared to checking at allow() time.

> and may provide take_with_overlap() which would return slice anyway, but indicate that there is an overlap, so capsule may handle it. Another useful variant of take would be overlap within regions for the same driver - this is most likely intended in cases where data is fully read and processed before it's modified - use case crypto with large inputs & outputs, and output overwriting input to save memory.

We cannot provide a `take_with_overlap()` that returns a mutable Rust slice without introducing UB. We could have a take_with_overlap() that provides a slice_of_cells, but it would make handling buffers accessed via take_with_overlap different than buffers used otherwise, which is somewhat confusing (and probably reduces some of the performance win of using overlapping buffers).

Hudson

From: Vadim Sukhomlinov <sukho...@google.com>
Sent: Sunday, May 16, 2021 7:58 PM
To: Philip Levis <p...@cs.stanford.edu>
Cc: le...@is.currently.online <le...@is.currently.online>; Hudson Randal Ayers <hay...@stanford.edu>; Amit Levy <aal...@cs.princeton.edu>; Tock Embedded OS Development Discussion <tock...@googlegroups.com>; Johnathan Van Why <jrva...@google.com>; tock-at-...@lists.stanford.edu <tock-at-...@lists.stanford.edu>

Subject: Re: [tock-dev] Re: Approaches to the AppSlice Aliasing Soundness Issues

Philip Levis

May 16, 2021, 11:13:23 PM5/16/21

to Vadim Sukhomlinov, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Johnathan Van Why, tock-at-...@lists.stanford.edu

On May 16, 2021, at 7:58 PM, 'Vadim Sukhomlinov' via Tock Embedded OS Development Discussion <tock...@googlegroups.com> wrote:

> To reduce overhead, per-application lists can be used, so inserts will be faster due to fewer items to look at. Maybe simple inserts in an array with moving elements would be faster for the scale of the problem (I'm not sure what the self.insert_into(start, end) implementation is, though, so please disregard if n/a).

Yeah, I assume there will be per-process lists.

The current data structure is an array of Region objects and an array-based vector of indices (u8s) into the Region array. I had thought that having an array of u8s would allow insertions with just a few instructions (only a few word shifts are needed). But since memmove cost too much, it just does a byte-level copy. This does mean that there are bounds checks when looking up in the array (e.g., self.regions[self.list[position]]), but these turn out to be less expensive than using iter/enumerate.

Honestly, given how simple the functions are, it seems like ~50% of the cycles are being spent spilling registers and in control flow. The problem with this is that it’s entirely compiler-generated; optimizations aren’t assured to be stable across compiler versions.

Vadim Sukhomlinov

May 17, 2021, 5:01:51 PM5/17/21

to Philip Levis, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Johnathan Van Why, tock-at-...@lists.stanford.edu

Phil, got it. But then you need to insert into an array of u8 indices, and still read the ranges. I'd try simple insertion into a sorted array first; it may be simpler and faster for the typical number of allowed buffers per application. Also, Regions are aligned, so moving elements can be done without the checks memmove typically does, as the direction is known for addition/removal of an element. Say, on our RISC-V core reading 1 byte and reading a 32-bit word have the same latency, so moving Regions would be only twice as slow, but you save by avoiding other instructions.
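Here is a minimal sketch of the sorted-array approach suggested above, combined with the binary search from Leon's option 1. Regions are kept sorted by start address in a fixed-capacity array, so a new region can only overlap the immediate neighbours of its insertion point. The following are assumptions of this sketch, not Tock's implementation:

- the type and method names;
- the capacity parameter;
- accepting zero-length buffers without tracking them.

```rust
/// A half-open address range `[start, end)`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    start: u32,
    end: u32,
}

/// Hypothetical per-process list of allowed regions, sorted by start address.
struct AllowList<const N: usize> {
    regions: [Region; N],
    count: usize,
}

impl<const N: usize> AllowList<N> {
    fn new() -> Self {
        AllowList { regions: [Region { start: 0, end: 0 }; N], count: 0 }
    }

    /// Tracks `[start, start + size)`. Returns `false` if it overlaps an
    /// existing region, wraps the address space, or the list is full.
    fn insert(&mut self, start: u32, size: u32) -> bool {
        if size == 0 {
            return true; // zero-length buffers: accepted but not tracked
        }
        let end = match start.checked_add(size) {
            Some(end) => end,
            None => return false,
        };
        if self.count == N {
            return false;
        }
        let live = &self.regions[..self.count];
        // Binary search for the first region starting at or after `start`.
        let pos = live.partition_point(|r| r.start < start);
        // The list is sorted and disjoint, so only the neighbours can overlap.
        if pos > 0 && live[pos - 1].end > start {
            return false;
        }
        if pos < live.len() && live[pos].start < end {
            return false;
        }
        // Shift the tail right by one slot and write the new entry.
        self.regions.copy_within(pos..self.count, pos + 1);
        self.regions[pos] = Region { start, end };
        self.count += 1;
        true
    }

    /// Stops tracking the region that starts at `start`, if any.
    fn remove(&mut self, start: u32) -> bool {
        match self.regions[..self.count].binary_search_by_key(&start, |r| r.start) {
            Ok(pos) => {
                self.regions.copy_within(pos + 1..self.count, pos);
                self.count -= 1;
                true
            }
            Err(_) => false,
        }
    }
}

fn main() {
    let mut list: AllowList<4> = AllowList::new();
    assert!(list.insert(0x100, 0x10));
    assert!(list.insert(0x200, 0x10));
    assert!(list.insert(0x180, 0x10)); // lands between the two
    assert!(!list.insert(0x108, 0x10)); // overlaps [0x100, 0x110)
    assert!(list.insert(0x110, 0x70)); // exactly fills [0x110, 0x180)
    assert!(list.remove(0x180));
}
```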

Leon Schuermann

May 19, 2021, 4:13:18 PM5/19/21

to Miguel Young de la Sota, Johnathan Van Why, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Philip Levis, Vadim Sukhomlinov

Hudson and I have been drawing up some rough sketches of what the Rust types
could look like, using a slice of Cells underneath. While we could
potentially hand out a `&[Cell<u8>]` directly, we've introduced some
#[repr(transparent)] wrappers (called views) which allow us to
implement convenience functions (such as the one Miguel proposed[1]).

Furthermore, a `&[Cell<u8>]` will always enable users to write to the
respective memory region. Unfortunately, this won't work for read-only
allows. We'd either need to introduce our very own "ReadOnlyCell" type
over which we'd create a slice, or wrap this in its own
#[repr(transparent)] view type, limiting access through the type's API.

A promising draft of what something like this might look like can be
found at this playground:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=aec17cf82fc4ce05c9cf32edda4ce088

It's still a little rough around the edges. _If_ this would work from a
soundness perspective however, it could be a promising API for Tock and
in most cases almost be a drop-in replacement for the old types (except
when passing the currently exposed Rust slice down). Notably, we could
closely mirror many methods otherwise available on a `&[u8]`, such as
the `copy_from_slice` as shown in the playground.
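The following is a condensed sketch of the `#[repr(transparent)]` view idea. The type names are hypothetical, and the playground above remains the actual draft. Because each wrapper has the same layout as `[Cell<u8>]`, a `&[Cell<u8>]` can be cast to a `&ReadOnlyView`, whose API only permits reads. A read-write view additionally allows writes and can be downgraded:

```rust
use std::cell::Cell;

/// Hypothetical read-only view: its API only exposes reads.
#[repr(transparent)]
struct ReadOnlyView([Cell<u8>]);

/// Hypothetical read-write view.
#[repr(transparent)]
struct ReadWriteView([Cell<u8>]);

impl ReadOnlyView {
    fn new(cells: &[Cell<u8>]) -> &ReadOnlyView {
        // SAFETY: `#[repr(transparent)]` guarantees the same layout and
        // pointer metadata as `[Cell<u8>]`.
        unsafe { &*(cells as *const [Cell<u8>] as *const ReadOnlyView) }
    }

    fn len(&self) -> usize {
        self.0.len()
    }

    fn get(&self, index: usize) -> Option<u8> {
        self.0.get(index).map(Cell::get)
    }

    /// Copies the whole view out into `dst` (lengths must match).
    fn copy_to_slice(&self, dst: &mut [u8]) {
        assert_eq!(self.len(), dst.len(), "length mismatch");
        for (d, s) in dst.iter_mut().zip(self.0.iter()) {
            *d = s.get();
        }
    }
}

impl ReadWriteView {
    fn new(cells: &[Cell<u8>]) -> &ReadWriteView {
        // SAFETY: as for `ReadOnlyView::new`.
        unsafe { &*(cells as *const [Cell<u8>] as *const ReadWriteView) }
    }

    /// A read-write view can always be downgraded to a read-only one.
    fn as_read_only(&self) -> &ReadOnlyView {
        ReadOnlyView::new(&self.0)
    }

    /// Mirrors `<[u8]>::copy_from_slice`.
    fn copy_from_slice(&self, src: &[u8]) {
        assert_eq!(self.0.len(), src.len(), "length mismatch");
        for (d, s) in self.0.iter().zip(src) {
            d.set(*s);
        }
    }
}

fn main() {
    let mut buf = [0u8; 3];
    let cells = Cell::from_mut(&mut buf[..]).as_slice_of_cells();
    let rw = ReadWriteView::new(cells);
    rw.copy_from_slice(&[4, 5, 6]);
    let ro = rw.as_read_only();
    assert_eq!(ro.get(1), Some(5));
    let mut out = [0u8; 3];
    ro.copy_to_slice(&mut out);
    assert_eq!(out, [4, 5, 6]);
}
```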

- Leon

[1]: https://internals.rust-lang.org/t/memcpy-ing-into-a-slice-of-cells/14682

Amit Levy

May 20, 2021, 12:26:17 PM5/20/21

to Tock Embedded OS Development Discussion

Leon Schuermann <le...@is.currently.online> writes:

This increasingly seems like the right approach to me.

My read is that dynamically checking to see if allowed slices overlap is
both a bit expensive _and_ less flexible (i.e. having overlapping
buffers could be useful in some scenarios), while using a `Cell`-based
(or `Cell`-like) abstraction is _nearly_ free (there are some current
HILs that won't work without an additional copy in the system call
driver, but those HILs are likely broken anyway).

-Amit

Philip Levis

May 20, 2021, 12:37:52 PM5/20/21

to Amit Levy, Tock Embedded OS Development Discussion

FWIW, TRD 104 says that you can’t pass overlapping buffers and that you have to return INVALID if they are. 4.4:

“When userspace shares a buffer, it can no longer access it.
…
The Tock kernel MUST check that the passed buffer is contained within the calling process's writeable address space. Every byte of the passed buffer must be readable and writeable by the process. Zero-length buffers may therefore have arbitrary addresses. If the passed buffer is not completely within the calling process’s writeable address space, the kernel MUST return a failure result with an error code of `INVALID`.

Because a process relinquishes access to a buffer when it makes a Read-Write Allow call with it, the buffer passed on the subsequent Read-Write Allow call cannot overlap with the first passed buffer. This is because the application does not have access to that memory. If an application needs to extend a buffer, it must first call Read-Write Allow to reclaim the buffer, then call Read-Write Allow again to re-allow it with a different size. If userspace passes an overlapping buffer, the kernel MUST return a failure result with an error code of `INVALID`."

Phil

———————

Amit Levy

May 21, 2021, 10:04:19 AM5/21/21

to Tock Embedded OS Development Discussion

Philip Levis <p...@cs.stanford.edu> writes:

> FWIW, TRD 104 says that you can’t pass overlapping buffers and that you have to return INVALID if they are. 4.4:
>
> “When userspace shares a buffer, it can no longer access it.

Hrm... Yes, well, that's also a good point... :)

Hudson Randal Ayers

May 21, 2021, 11:52:24 AM5/21/21

to Amit Levy, Tock Embedded OS Development Discussion

> FWIW, TRD 104 says that you can’t pass overlapping buffers and that you have to return INVALID if they are. 4.4:

My understanding was that TRD 104 was written that way because we assumed it to be necessary for soundness in the kernel. Given that no longer seems to be the case, I think that is something acceptable to revisit before actually releasing Tock 2.0

> “When userspace shares a buffer, it can no longer access it.

This has been mentioned elsewhere by Brad, but this requirement is unenforceable by the kernel on most devices currently supported by Tock (not enough MPU regions / granularity) and as a result is just untrue. We could change it to "should no longer access it in order to ensure correct operation", perhaps.

From: tock...@googlegroups.com <tock...@googlegroups.com> on behalf of Amit Levy <am...@amitlevy.com>
Sent: Thursday, May 20, 2021 9:41 AM
To: Tock Embedded OS Development Discussion <tock...@googlegroups.com>
Subject: Re: [tock-dev] Approaches to the AppSlice Aliasing Soundness Issues

Philip Levis

May 21, 2021, 12:12:49 PM

to Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion

On May 21, 2021, at 8:52 AM, Hudson Randal Ayers <hay...@stanford.edu> wrote:

>> FWIW, TRD 104 says that you can’t pass overlapping buffers and that you have to return INVALID if they are. 4.4:

> My understanding was that TRD 104 was written that way because we assumed it to be necessary for soundness in the kernel. Given that no longer seems to be the case, I think that is something acceptable to revisit before actually releasing Tock 2.0.

Not completely. It was also assuming that this is a user space error. We can revisit Vadim’s thought that there might be cases when you want to do this. You *can* pass overlapping read-only buffers. Just not read-write.

>> “When userspace shares a buffer, it can no longer access it.

> This has been mentioned elsewhere by Brad, but this requirement is unenforceable by the kernel on most devices currently supported by Tock (not enough MPU regions / granularity) and as a result is just untrue. We could change it to "should no longer access it in order to ensure correct operation", perhaps.

Yeah, we can fine-tune this wording. Note that it does not say “The kernel will throw an error if you do.” The issue with using "should" is possible confusion with the meaning of SHOULD.

Phil

———————

Philip Levis (he/him)
Associate Professor, Computer Science and Electrical Engineering

Philip Levis

May 21, 2021, 12:17:04 PM

to Vadim Sukhomlinov, Johnathan Van Why, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Miguel Young de la Sota

On May 10, 2021, at 4:31 PM, Vadim Sukhomlinov <sukho...@google.com> wrote:

> Overlapping mutable references are quite useful for crypto: say the output can be in any location, but the same location as one of the sources can be used to save memory, assuming a correct implementation where source data is read before the destination is modified.
> In Rust asm this is specified as 'inlateout', though it is used in different contexts. The above use is common in C code, but Rust explicitly prohibits it (you can't borrow a shared reference while also borrowing it as mutable). This requires additional Rust wrappers around C code to handle this case, where only one mutable reference exists but is passed as a pointer twice to the C code. But I'm not sure if it is a good idea to support it in Tock.

Can you give a C or Rust code example of what you mean here? I *think* you’re saying that you want to allow() a large region, but then allow() parts of it to do things like fill in encrypted parts of it through system calls.

Hudson Randal Ayers

May 21, 2021, 12:32:30 PM

to Philip Levis, Amit Levy, Tock Embedded OS Development Discussion

> Yeah, we can fine-tune this wording. Note that it does not say “The kernel will throw an error if you do.” The issue with using should is possible confusion with the meaning of SHOULD.

I submitted https://github.com/tock/tock/pull/2590 as a starting point for that discussion.

From: Philip Levis <p...@cs.stanford.edu>
Sent: Friday, May 21, 2021 9:12 AM
To: Hudson Randal Ayers <hay...@stanford.edu>
Cc: Amit Levy <am...@amitlevy.com>; Tock Embedded OS Development Discussion <tock...@googlegroups.com>

Subject: Re: [tock-dev] Approaches to the AppSlice Aliasing Soundness Issues

Leon Schuermann

Jun 4, 2021, 4:17:37 PM

to Miguel Young de la Sota, Vadim Sukhomlinov, Johnathan Van Why, Hudson Randal Ayers, Amit Levy, Philip Levis, Tock Embedded OS Development Discussion

It's been a while since this discussion came up, and there seem to be a
few great and vastly different approaches on how to solve the current
AppSlice aliasing soundness issues.

As discussed on some Tock calls already, I think it might be a good idea
to have a synchronous discussion on how to proceed here. I'd like to
invite anyone interested to participate in the following poll to figure
out a timeslot that works best:

https://doodle.com/poll/v4x39kcbz5cnhrst

(The times should be in PT and are chosen to work for American and
European timezones. If this doesn't work for you, we can try to find
other options as well.)

I'd propose to leave the poll open until Monday 12:00 AM PT and send an
update once the time is settled on.

Thanks!

- Leon

Vadim Sukhomlinov

Jun 4, 2021, 10:48:03 PM

to Philip Levis, Johnathan Van Why, le...@is.currently.online, Hudson Randal Ayers, Amit Levy, Tock Embedded OS Development Discussion, Miguel Young de la Sota

Phil, sorry for the late reply. An example of such code would be

    encrypt(key: &Key, input: &[u8], output: &mut [u8])

called as encrypt(&Key, &buf, &mut buf). While Rust prevents combining mutable and immutable borrows, if the underlying function is provided by a C function like

    encrypt(struct Key *key, const uint8_t *input, size_t input_len, uint8_t *out, size_t out_len);

which doesn't enforce such a restriction, one can develop a 'safe' Rust wrapper

    encrypt_in_place(key: &Key, input_output: &mut [u8])

If this is applied to large structures, the memory savings are significant. The internal processing of encrypt() will be safe (read 16 bytes, process, write 16 bytes), so there is no real issue with writing to the same buffer if needed, while still providing the ability to write to another buffer.
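A rough Rust sketch of the pattern described here: a block-wise routine taking separate input and output pointers (standing in for the C function), which tolerates the two being identical, exposed through two safe wrappers. `Key`, `BLOCK`, and `encrypt_raw` are hypothetical names, and the XOR-with-key step is only a placeholder transform, not a cipher.

```rust
// Sketch of an aliasing-tolerant block routine behind two safe wrappers.
// The XOR "cipher" is a placeholder, not real cryptography.

const BLOCK: usize = 16;

/// Placeholder key type (hypothetical).
struct Key([u8; BLOCK]);

/// Processes `len` bytes one block at a time: copy a block out of `input`,
/// transform it locally, then write it to `output`. Each block is fully read
/// before it is written, so `input == output` is fine.
///
/// Safety: `input` must be valid for reads and `output` valid for writes of
/// `len` bytes, and the two ranges must be either identical or disjoint.
unsafe fn encrypt_raw(key: &Key, input: *const u8, output: *mut u8, len: usize) {
    let mut off = 0;
    while off < len {
        let n = BLOCK.min(len - off);
        let mut block = [0u8; BLOCK];
        unsafe { std::ptr::copy_nonoverlapping(input.add(off), block.as_mut_ptr(), n) };
        for (b, k) in block[..n].iter_mut().zip(key.0.iter()) {
            *b ^= *k; // placeholder transform
        }
        unsafe { std::ptr::copy_nonoverlapping(block.as_ptr(), output.add(off), n) };
        off += n;
    }
}

/// Out-of-place: the borrow checker guarantees `input` and `output` are disjoint.
fn encrypt(key: &Key, input: &[u8], output: &mut [u8]) {
    assert_eq!(input.len(), output.len());
    unsafe { encrypt_raw(key, input.as_ptr(), output.as_mut_ptr(), input.len()) }
}

/// In-place: a single mutable borrow, passed to the raw routine as both pointers.
fn encrypt_in_place(key: &Key, buf: &mut [u8]) {
    let len = buf.len();
    let p = buf.as_mut_ptr();
    unsafe { encrypt_raw(key, p as *const u8, p, len) }
}

fn main() {
    let key = Key([0x5A; BLOCK]);
    let plain: Vec<u8> = (0u8..40).collect(); // 2.5 blocks
    let mut out = vec![0u8; plain.len()];
    encrypt(&key, &plain, &mut out);

    let mut buf = plain.clone();
    // encrypt(&key, &buf, &mut buf); // rejected by the borrow checker
    encrypt_in_place(&key, &mut buf);
    assert_eq!(buf, out); // in-place result matches the out-of-place result
}
```

Both wrappers are safe to call; the aliasing is confined to raw pointers derived from one `&mut [u8]`, which is the "only one mutable reference, passed as a pointer twice" situation described earlier in the thread.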

If encrypt() is exposed as a syscall, it will need the sequence allow_ro(&Key), allow_ro(&buf), allow_rw(&mut buf). But since that is disallowed, one would probably use a 'safe' wrapper with a single mutable buffer, though it's less convenient in some cases.

Leon Schuermann

Jun 7, 2021, 5:44:14 PM

to Hudson Randal Ayers, Philip Levis, Tock Embedded OS Development Discussion

Leon Schuermann <le...@is.currently.online> writes:
> As discussed on some Tock calls already, I think it might be a good idea
> to have a synchronous discussion on how to proceed here. I'd like to
> invite anyone interested to participate in the following poll to figure
> out a timeslot that works best:
>
> https://doodle.com/poll/v4x39kcbz5cnhrst
>
> (The times should be in PT and are chosen to work for American and
> European timezones. If this doesn't work for you, we can try to find
> other options as well.)

It looks like Wed, 2021-06-09 12:00 PM (noon) Pacific Time it is! I'll
send a meeting invite to everyone who voted. If anyone else wants to
participate in this discussion, please send me an email and I'll happily
send an invite.

> I'd propose to leave the poll open until Monday 12:00 AM PT and send an
> update once the time is settled on.

Well, that should've been 12:00 PM PT, my bad.

Thanks,

- Leon!

Jett ✈ Rink

Jun 23, 2021, 5:23:16 PM

to Leon Schuermann, Hudson Randal Ayers, Philip Levis, Tock Embedded OS Development Discussion

Hey Leon,

What was the outcome of the meeting? Are the semantics of ReadWriteAppSlice/ReadOnlyAppSlice going to change further from what is on tock/master right now?

-Jett

Hudson Randal Ayers

Jun 23, 2021, 5:41:37 PM

to Jett ✈ Rink, Leon Schuermann, Philip Levis, Tock Embedded OS Development Discussion

The outcome of that discussion was merged into TRD 104 in https://github.com/tock/tock/pull/2617 , I believe.

From: Jett ✈ Rink <jett...@google.com>
Sent: Wednesday, June 23, 2021 2:23 PM
To: Leon Schuermann <le...@is.currently.online>
Cc: Hudson Randal Ayers <hay...@stanford.edu>; Philip Levis <p...@cs.stanford.edu>; Tock Embedded OS Development Discussion <tock...@googlegroups.com>
Subject: Re: [tock-dev] Re: Approaches to the AppSlice Aliasing Soundness Issues

Leon Schuermann

Jul 19, 2021, 5:09:48 PM

to Jett Rink, Hudson Randal Ayers, Philip Levis, Tock Embedded OS Development Discussion

'Jett ✈ Rink' via Tock Embedded OS Development Discussion <tock...@googlegroups.com> writes:
> Hey Leon,
>
> What was the outcome of the meeting? Are the semantics
> of ReadWriteAppSlice/ReadOnlyAppSlice going to change further from what is
> on tock/master right now?

Sorry for the late reply. The approach as discussed and developed on
this thread has been integrated into tock/master, through
https://github.com/tock/tock/pull/2632.

- Leon
