## A hash over the application's flash memory
Tock could generate a kernel-internal application identifier by
calculating a cryptographic hash of the entire application's flash
memory region, or parts of it. This way, the application itself is its
own persistent ID. No tooling update would be required. This would only
work if the persistent nonvolatile storage region was outside of the
hashed memory region, or on external storage (SD card, EEPROM).
This is motivated by existing cryptographic devices such as TPMs, which
clear application data during a firmware update.
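
A minimal sketch of the idea, under stated assumptions: the names `AppFlash` and `app_id_from_flash` are hypothetical, and 64-bit FNV-1a is used only as a dependency-free stand-in for a cryptographic hash (it is not collision resistant; a real implementation would need something like SHA-256). The storage range is skipped so that writes to it do not change the ID.

```rust
/// Hypothetical view of one application's flash: the whole region plus the
/// byte range (relative to its start) used for persistent storage.
pub struct AppFlash<'a> {
    pub region: &'a [u8],
    pub storage: core::ops::Range<usize>,
}

/// 64-bit FNV-1a. NOT cryptographic; it only stands in for a real hash so
/// that this sketch has no dependencies.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, &b| (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3))
}

/// Derive an identifier from everything except the storage range, so that
/// writes to persistent storage do not change the application's ID.
pub fn app_id_from_flash(app: &AppFlash) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    // Clamp the storage range to the region so slicing cannot panic.
    let start = app.storage.start.min(app.region.len());
    let end = app.storage.end.clamp(start, app.region.len());
    let h = fnv1a(OFFSET_BASIS, &app.region[..start]);
    fnv1a(h, &app.region[end..])
}
```

With this layout, any change to the hashed part of the binary (including an upgrade) yields a different ID, which is exactly the behavior the issues listed next are concerned with.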
Possible issues include:
- Change of the application ID by writing to its own memory location.
- Persisting storage during application upgrades.
- Running multiple instances of an identical binary that are not
meant to form one compound application.
- Hash function in software (#1728)
- Identification of a compound of multiple application binaries (if that
were to be called a "Tock application" as per definition from [1]).
Side note: For the specific use case of storing sensitive information
that must not survive an application upgrade, it may be better to use
this approach not for identifying the application, but for deriving a
key from the cryptographic hash together with user-provided secrets,
and using that key for transparent encryption of the associated
nonvolatile storage region (less snake oil).
--
You received this message because you are subscribed to the Google Groups "Tock Embedded OS Development Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tock-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tock-dev/7e8bad13-0e18-4919-a951-be429ae696fe%40googlegroups.com.
Hi all,
It looks to me that we don't need an application id to add persistent storage support in Tock. It would be a sufficient condition, but it's not a necessary condition.
From a high-level point of view, what we need is the following relation:
has_access: Application -> Permission -> Storage -> bool
enum Permission {
Read,
Write,
Erase,
}
(The permission part may be dropped if we believe permissions should always be either none or all, but I'll keep it for the rest since it doesn't hurt.)
We have `has_access(application, permission, storage)` only if the owner of the storage agrees that the application may access it with that permission. How this is done is discussed later.
If we have `has_access(application, permission, storage)` then the application may access the storage with that permission. How this is done is discussed later.
What seems like the simplest way to implement this would be for each storage to come with a public key. This is how a storage is identified.
We extend the Tock Binary Format Header (TBF header) with the following:
- A new kind of header (either a different version or some other solution) that specifies a storage instead of an application. It contains the following fields:
  - The storage identity (i.e. its public key).
  - The storage location (e.g. start and end address).
- For application headers (header version 2), we add a list of storages in a similar way as writable flash regions. Each element of this list would contain the following fields:
  - The identity of the storage.
  - (optional: The location of the storage if the application does not support a storage at arbitrary location.)
  - A signature (with the storage private key) of the permission and the application binary. In particular, each application update needs a new approval from the storage owner. Different applications may access the same storage if the owner of the storage allows it.
When the kernel boots and goes through the linked list of TBF headers, it not only creates the processes, but also creates the storages metadata. This means that in addition to the list of processes, the kernel also stores the list of storages.
A storage in the kernel would contain its identity and location. When the kernel creates a process, it checks for each storage that the storage exists and the permission is valid.
Access to a storage is done in 2 places. When the process is created, the MPU is configured to give read access to the storage location (do we know any boards where the flash is not mapped to memory? If yes, this needs to be parametrized). When the application accesses the flash syscall driver, the driver checks the permission. For that, the kernel exposes in `Process` a method to check if the process has a given permission for a given flash slice, essentially looping through the storages where the slice fits and checking the permission for that storage.
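
The runtime check described above could look roughly like the following. This is only a sketch: the type and method names (`Storage`, `StorageAccess`, `ProcessStorages`, `has_permission`) are hypothetical, the storage's public key is left out, and the permissions are assumed to have been verified already when the process was created.

```rust
/// The three permissions from the proposal.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Permission {
    Read,
    Write,
    Erase,
}

/// A storage known to the kernel (location only in this sketch).
pub struct Storage {
    pub start: usize,
    pub end: usize, // exclusive
}

/// One verified entry from a process's storage list.
pub struct StorageAccess<'a> {
    pub storage: &'a Storage,
    pub read: bool,
    pub write: bool,
    pub erase: bool,
}

impl StorageAccess<'_> {
    fn allows(&self, perm: Permission) -> bool {
        match perm {
            Permission::Read => self.read,
            Permission::Write => self.write,
            Permission::Erase => self.erase,
        }
    }
}

/// The part of a process that the flash driver would query.
pub struct ProcessStorages<'a> {
    pub accesses: &'a [StorageAccess<'a>],
}

impl ProcessStorages<'_> {
    /// True if some storage fully contains `start..start + len` and grants `perm`.
    pub fn has_permission(&self, start: usize, len: usize, perm: Permission) -> bool {
        let end = match start.checked_add(len) {
            Some(end) => end,
            None => return false, // the range wraps around the address space
        };
        self.accesses
            .iter()
            .any(|a| a.storage.start <= start && end <= a.storage.end && a.allows(perm))
    }
}
```

A slice that spills over the end of a storage is rejected outright here; whether a slice may span two adjacent storages is a design question this sketch does not try to answer.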
Note that if we don't want the kernel to iterate twice through the linked list of TBF headers (first to parse storages then to parse processes), storages need to be defined before they are used. This should be easy to check in tockloader.
This is very high-level, but I think agreeing on the high-level APIs would help parallelize the work on sub-components. Does this high-level picture fit into Tock design principles?
On Wed, May 20, 2020 at 8:55 AM Julien Cretin <julien.cr...@polytechnique.org> wrote:
> Hi all,
> It looks to me that we don't need an application id to add persistent storage support in Tock. It would be a sufficient condition, but it's not a necessary condition.
If we keep this proposal scoped to persistent storage, then we will end up in a similar situation to "Tock has multiple distinct types of application ID". We just wouldn't call them "application IDs".
> - The storage location (e.g. start and end address).
Defining storage locations as address ranges in flash is extremely coarse and fairly wasteful. These storage regions would need to be aligned to the nearest flash page, and flash pages are quite large (e.g. 2 KiB in the H1 chip). This design also prevents wear levelling: if one storage is accessed heavily, its flash pages will wear out much quicker than the others. These concerns are some of the motivating reasons why OpenTitan is pursuing a key-value store API rather than raw flash access.
> - A signature (with the storage private key) of the permission and the application binary. In particular, each application update needs a new approval from the storage owner. Different applications may access the same storage if the owner of the storage allows it.
If I understand this correctly, this means that different processes may access a single storage concurrently. This causes race conditions that will need to be handled. Some of the possible choices for application ID don't have this issue.
> When the kernel boots and goes through the linked list of TBF headers, it not only creates the processes, but also creates the storages metadata. This means that in addition to the list of processes, the kernel also stores the list of storages.
Where would the kernel store this list? The only dynamic memory allocation supported in the Tock kernel is grant regions, which is per-process. That said, I don't see the need for this list either. Unless I am missing something, the kernel could reference the original storage list when it needs it.
> This is very high-level, but I think agreeing on the high-level APIs would help parallelize the work on sub-components. Does this high-level picture fit into Tock design principles?
Let me paint this design in a different light, using different terminology to refer to the same concepts. Instead of a "storage", you just have an "application". The global list of "storages" would be a global list of "applications". The "storage identity (public key)" would become the "application identity (public key)". A process binary can be signed by one or more "applications", indicating that process is part of that application and has access to the application's storage region.
With the renaming, we can then categorize this design based on the axes I outlined in my first message in this thread:
- Form: 5, a public key, with the binary signed by the corresponding key
- Verification time: 3 (app startup), plus maybe 5 (runtime -- this depends on how the runtime permissions check works)
- Use cases supported: 1 (if the kernel only boots applications in the application list, and that list is signed with the kernel image), 2 (storage), and 3 (cryptography -- each application could have its own crypto realm based on the public key).
I have the following concerns with this design:
- It requires crypto. Some Tock users may not want to involve crypto in the application build/deployment process. Not all hardware Tock supports has hardware-accelerated crypto, so this could have a lot of runtime overhead.
- It seems overly complex for what it does. I think we could handle everything in per-app headers, rather than introducing a new type of header into Tock.
Hi Johnathan,
On Wed, May 20, 2020 at 9:35 PM 'Johnathan Van Why' via Tock Embedded OS Development Discussion <tock...@googlegroups.com> wrote:
> On Wed, May 20, 2020 at 8:55 AM Julien Cretin <julien.cr...@polytechnique.org> wrote:
> > Hi all,
> > It looks to me that we don't need an application id to add persistent storage support in Tock. It would be a sufficient condition, but it's not a necessary condition.
> If we keep this proposal scoped to persistent storage, then we will end up in a similar situation to "Tock has multiple distinct types of application ID". We just wouldn't call them "application IDs".
I guess the issue would be about naming, because if there are distinct types of application IDs it's probably because they have different semantics. I guess what you suggest is to use the finest application ID definition (the one that can differentiate at least as much as all others) so that all other application IDs can be computed from this one. One issue when doing that is that modularity of updates is lost (updating a single application may require modifying its identity in different places even though the actual type of application ID needed would not have changed). So this is mostly a compromise between modularity of updates and multiplicity of application ID semantics. If the considered use-cases only update all applications simultaneously, then modularity of updates is not useful. In that case it's clearly preferable to have a single application ID definition (the finest one).
Another point to consider is that if coming up with a single application ID definition takes much more time than coming up with some other use-cases independently, it may be preferable to do it in 2 steps: first experiment with other application ID use-cases, then unify them with the hindsight gained. I don't know how long it may take to come up with a unified application ID definition with the Tock stamp of approval. If we believe it's less than 6 months, then the 2-step scenario is not worth doing.
> > - The storage location (e.g. start and end address).
> Defining storage locations as address ranges in flash is extremely coarse and fairly wasteful. These storage regions would need to be aligned to the nearest flash page, and flash pages are quite large (e.g. 2 KiB in the H1 chip). This design also prevents wear levelling: if one storage is accessed heavily, its flash pages will wear out much quicker than the others. These concerns are some of the motivating reasons why OpenTitan is pursuing a key-value store API rather than raw flash access.
I guess you have in mind use-cases with multiple applications using persistent storage. If there is a single application running on top of Tock, it's actually as efficient and more flexible for the application to have access to raw flash locations and implement flash-efficient data-structures on top of them, because the persistent storage would have a Rust API instead of a syscall API, so richer types can be used.
For multiple applications it's indeed better to share the flash-efficient data-structures in Tock, so that each application does not need its own copy of the library code in flash (this is actually a general issue of multiple-application use-cases, which induce code duplication between applications, for example some libtock code or some basic language libraries). Note also that an alternative that avoids complicating the kernel code would be to have a single application handling the persistent storage (in that case it may have access to raw flash locations) and providing some IPC interface for other applications. However, there is still the issue that the persistent storage API would be described in terms of simple data (integers and slices) instead of arbitrary Rust data (enums, lists, nested structures, etc).
Note that one problem with multiple applications accessing a single persistent storage is that some permissions need to be defined regarding capacity (how many words an application may store at a given time) and lifetime (how many words an application can write in the lifetime of the flash).
> > - For application headers (header version 2), we add a list of storages in a similar way as writable flash regions. Each element of this list would contain the following fields:
> >   - The identity of the storage.
> >   - (optional: The location of the storage if the application does not support a storage at arbitrary location.)
> >   - A signature (with the storage private key) of the permission and the application binary. In particular, each application update needs a new approval from the storage owner. Different applications may access the same storage if the owner of the storage allows it.
> If I understand this correctly, this means that different processes may access a single storage concurrently. This causes race conditions that will need to be handled. Some of the possible choices for application ID don't have this issue.
This decision is up to the storage owner. The storage owner reviews the application code before signing it.
> > When the kernel boots and goes through the linked list of TBF headers, it not only creates the processes, but also creates the storages metadata. This means that in addition to the list of processes, the kernel also stores the list of storages.
> Where would the kernel store this list? The only dynamic memory allocation supported in the Tock kernel is grant regions, which is per-process. That said, I don't see the need for this list either. Unless I am missing something, the kernel could reference the original storage list when it needs it.
The kernel would store this list next to the list of processes: https://github.com/tock/tock/blob/b2e4c162aa7e6764e8368dc64ceabadc4cef1d88/kernel/src/sched.rs#L34.
The process would access it through this reference: https://github.com/tock/tock/blob/b2e4c162aa7e6764e8368dc64ceabadc4cef1d88/kernel/src/process.rs#L702.
There is no need for dynamic allocation. The board would statically allocate the list of storages, as is currently the case for processes.
> > This is very high-level, but I think agreeing on the high-level APIs would help parallelize the work on sub-components. Does this high-level picture fit into Tock design principles?
> Let me paint this design in a different light, using different terminology to refer to the same concepts. Instead of a "storage", you just have an "application". The global list of "storages" would be a global list of "applications". The "storage identity (public key)" would become the "application identity (public key)". A process binary can be signed by one or more "applications", indicating that process is part of that application and has access to the application's storage region.
That's an interesting way to see it :-)
> With the renaming, we can then categorize this design based on the axes I outlined in my first message in this thread:
> - Form: 5, a public key, with the binary signed by the corresponding key
> - Verification time: 3 (app startup), plus maybe 5 (runtime -- this depends on how the runtime permissions check works)
> - Use cases supported: 1 (if the kernel only boots applications in the application list, and that list is signed with the kernel image), 2 (storage), and 3 (cryptography -- each application could have its own crypto realm based on the public key).
> I have the following concerns with this design:
> - It requires crypto. Some Tock users may not want to involve crypto in the application build/deployment process. Not all hardware Tock supports has hardware-accelerated crypto, so this could have a lot of runtime overhead.
The crypto part can be optional. But in that case there is no guarantee that the storage owner agrees that the application may access it. Is it possible to have this property without crypto? I guess you have in mind the use-case where all applications are bundled together and signed by a single person. That person would be responsible for checking all permissions before signing. But then I don't see how the signature would be checked if not by the chip during boot.
> - It seems overly complex for what it does. I think we could handle everything in per-app headers, rather than introducing a new type of header into Tock.
I think the main argument against this proposal, as I see it, is the multiple-applications scenario. This is not something I considered, and I don't know how widespread those scenarios are (they are quite wasteful in terms of flash due to code duplication and MPU alignment constraints).
A simple way to modify this proposal to remove the issues you see would be to have a single set of flash locations defined in the board (together with an optional public key). The kernel would provide a syscall API for a flash-efficient data-structure on top of those locations. This removes the need for the new storage header, since there is only one storage and it's defined in the board. The applications would still have a set of permissions in their headers (how much capacity and lifetime they are allowed to use), which would be optionally signed (if crypto is desired). As I see it, we don't even need a notion of application ID since the permissions are checked when the process is created and then the process identity is the same until the next power off. Does that make sense to you?
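
The capacity and lifetime permissions mentioned above could be enforced with bookkeeping along these lines. This is a rough sketch with hypothetical names (`StorageQuota`, `StorageUsage`, `QuotaError`); it assumes the limits are counted in words and keeps the counters in memory only, saying nothing about how they would be persisted.

```rust
/// Limits an application's header might declare (hypothetical names).
#[derive(Clone, Copy)]
pub struct StorageQuota {
    pub capacity_words: u32, // most words stored at any one time
    pub lifetime_words: u32, // most words written over the flash lifetime
}

/// Running totals the kernel would track for one application.
#[derive(Default)]
pub struct StorageUsage {
    pub stored_words: u32,
    pub written_words: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum QuotaError {
    CapacityExceeded,
    LifetimeExceeded,
}

impl StorageUsage {
    /// Account for writing `words` new words; no state changes on failure.
    pub fn record_write(&mut self, quota: &StorageQuota, words: u32) -> Result<(), QuotaError> {
        let stored = self
            .stored_words
            .checked_add(words)
            .filter(|&n| n <= quota.capacity_words)
            .ok_or(QuotaError::CapacityExceeded)?;
        let written = self
            .written_words
            .checked_add(words)
            .filter(|&n| n <= quota.lifetime_words)
            .ok_or(QuotaError::LifetimeExceeded)?;
        self.stored_words = stored;
        self.written_words = written;
        Ok(())
    }

    /// Deleting frees capacity but never gives back lifetime budget.
    pub fn record_delete(&mut self, words: u32) {
        self.stored_words = self.stored_words.saturating_sub(words);
    }
}
```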
> A simple way to modify this proposal to remove the issues you see would be to have a single set of flash locations defined in the board (together with an optional public key). The kernel would provide a syscall API for a flash-efficient data-structure on top of those locations. This removes the need for the new storage header, since there is only one storage and it's defined in the board. The applications would still have a set of permissions in their headers (how much capacity and lifetime they are allowed to use), which would be optionally signed (if crypto is desired). As I see it, we don't even need a notion of application ID since the permissions are checked when the process is created and then the process identity is the same until the next power off. Does that make sense to you?
I believe I understand you correctly.
Application IDs and the information required to verify them are encoded in a userspace binary's TBF headers. The format of the data in the header is board-specific. Boards that support application IDs are responsible for parsing and validating the application ID header themselves; the core kernel is only responsible for storing the computed ID. Each process may have at most 1 application ID, although each binary may have multiple TBF headers, each of which is supported by a single board. Application IDs are optional.
In-RAM data type:
The kernel will store the application ID in the Process struct as an Option<&'static [u8]>. Generally, the slice reference will point into the TBF header itself, although that is up to the board.
TLV Element:
We introduce a new TLV element type to store application ID information. The TLV is variable-sized, and its contents are board-specific. It is expected that the TLV's data starts with the application's intended ID, and that additional data used to verify the ID (such as a signature if the ID is a public key) follows the application ID.
Application ID TLV Parsing:
As the core kernel is parsing each userspace binary's TBF headers, when it encounters the application ID TLV element, it will call into a board-provided function to interpret the TLV element. To keep this explanation clear, let's call it decode_app_id(). decode_app_id() can return 2 possible values:
1. An application ID (&'static [u8]). When the kernel receives this it will store that ID and skip processing any additional application ID TLV elements it finds.
2. Invalid ID. This indicates the application ID TLV element is not recognized by the board file. The kernel will ignore that TLV element.
If the board does not recognize any application ID TLV elements (either because there are none or because they are all invalid), the application ID for the process will be set to None.
Properties of this design:
Hello,
Johnathan Van Why <jrva...@google.com> writes:
> Based on today's core working group call, here is my proposal for
> application IDs.
I think this is a good summary and emphasises the distinction between an
in-RAM identifier useful for allocating resources to apps, and the
(cryptographic) integrity and safety verifications - for whichever
purposes they may be used.
I do however see three issues:
> Application IDs are optional.
Is that really a good idea? I strongly believe once implemented
application IDs will be an integral part of the kernel and resource
assignment to apps. Think of partitioned storage: having _optional_
application IDs will require special handling of these cases in the
kernel. Furthermore, an app will have such drivers fail for a (from an
application code point of view) seemingly arbitrary reason - namely the
application ID not set in the TBF header.
Given the low overhead of such an identifier, as well as the ubiquitous
use cases, I'm in favor of making an application ID mandatory.
> The format of the data in the header is board-specific.
I believe this is a good idea, but in order to make the IDs mandatory
and to be able to explore the different concepts in the upstream Tock
boards, a default format should be provided. That default does not have
to serve any security or verification purposes, just sufficient for
allocating resources to individual apps. If a specific use case exists,
one can always swap the application ID out later.
> *In-RAM data type:*
>
> The kernel will store the application ID in the Process struct as a
> Option<&'static [u8]>. Generally, the slice reference will point into the
> TBF header itself, although that is up to the board.
I'm wary of using a slice here, as that introduces potentially
unbounded complexity when comparing application IDs. More importantly,
persistently allocating resources across reboots would require the
kernel to store the assigned identifier either in NVM or with the
resource itself. Given that your proposed identifier has no size
limit, this could prove difficult at best, or even impossible.
Storage
ID size sensitivity: Storage benefits from having a fixed-size ID, as a variable-size ID would require many of the storage layer's data structure elements to have variable size when it would otherwise be unnecessary. A storage layer with ACLs would benefit greatly from small IDs as application IDs will probably be stored repeatedly in the ACLs.
Does trust span across boots: Yes. The storage layer must trust data it wrote into non-volatile storage on a previous boot.
Requires cryptographic verification: No, cryptographic verification is an optional security improvement.
System call filtering
ID size sensitivity: I would expect system call filtering to benefit greatly from small IDs, as ACLs would contain repeated application IDs.
Does trust span across boots: No. The ACLs can be part of the kernel image or application image and can be cryptographically verified during the boot sequence.
Requires cryptographic verification: No, cryptographic verification is an optional security improvement.
Secure boot
ID size sensitivity: Secure boot is not size sensitive. Smaller IDs would improve the size of the kernel, but as the IDs must be cryptographic signatures of some sort I can't think of a way to do so that doesn't increase the size of the TBF headers by at least the same amount. Computing smaller IDs is slower than comparing large IDs.
Does trust span across boots: No, the cryptographic verification would occur at each boot.
Requires cryptographic verification: Yes. "Secure boot" without cryptographic verification would have identical security to just erasing unwanted apps.
Key derivation
ID size sensitivity: Smaller IDs would improve key derivation performance, but ID size is unlikely to be a deal-breaker.
Does trust span across boots: No, if the key verification is performed by dedicated cryptographic hardware (e.g. H1 does this).
Requires cryptographic verification: No. Security will be as strong as the app loading mechanism if unverified IDs are used.
IPC
ID size sensitivity: Moderate. Smaller IDs would reduce the size of process binaries that contain application IDs and the RAM usage of processes that manipulate application IDs at runtime.
Does trust span across boots: No, IPC is only between concurrently-running processes.
Requires cryptographic verification: No, it is an optional security improvement.
Application IDs are arbitrary 48-byte sequences. Application IDs and the information required to verify them are encoded in a userspace binary's TBF headers. The format of the data in the header is board-specific. Boards are responsible for parsing and validating the application ID header themselves; the core kernel is responsible for storing the computed ID. Although each process binary may have multiple IDs in its TBF headers, each process has 1 application ID. If a process binary does not have a recognized ID in the TBF headers, a board must "invent" an ID for that process to load.
In-RAM data type
The kernel will store the application ID in the Process struct as an Option<&'static [u8; 48]>. Generally, the reference will point into the TBF header itself, although that is up to the board.
TLV Element
We introduce a new TLV element type to store application ID information. The TLV is variable-sized, and its contents are board-specific. In most cases, the TLV will contain the application ID, and in many cases it will also contain ID verification information.
TLV Element Parsing
The board must provide two functions to the core kernel:
- decode_app_id, which validates an application ID TLV entry. If the ID is valid it returns the ID, otherwise it returns None.
- invent_app_id, which either invents and returns an application ID for the process or returns None.
When the core kernel tries to load a process, it will use the first application ID that decode_app_id returns (evaluated in the same order as the TLV entries in the TBF header). If it does not find an ID TLV entry or decode_app_id always returns None, it will call invent_app_id to invent an ID for the process. If invent_app_id returns None the kernel will skip loading that process.
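The loading rule above can be sketched roughly as follows. This is only a sketch under assumptions, not kernel code: the `Board` trait, `TlvEntry`, `select_app_id` and `DemoBoard` are hypothetical names introduced for illustration, and the demo board "validates" an entry by checking its length only. Only `decode_app_id` and `invent_app_id` come from the proposal.

```rust
use std::convert::TryFrom;

/// Proposal v2 application IDs are 48-byte sequences.
type AppId = [u8; 48];

/// Hypothetical stand-in for one application ID TLV entry in the TBF headers.
struct TlvEntry {
    data: &'static [u8],
}

/// Hypothetical trait grouping the two board-provided functions.
trait Board {
    /// Validates one ID TLV entry and returns the ID if it is valid.
    fn decode_app_id(&self, entry: &TlvEntry) -> Option<&'static AppId>;
    /// Invents an ID for a process that has no recognized ID, or refuses.
    fn invent_app_id(&self) -> Option<&'static AppId>;
}

/// Picks the ID for a process: the first entry the board accepts, otherwise
/// an invented ID. `None` means the kernel skips loading the process.
fn select_app_id<B: Board>(board: &B, entries: &[TlvEntry]) -> Option<&'static AppId> {
    entries
        .iter()
        .find_map(|entry| board.decode_app_id(entry))
        .or_else(|| board.invent_app_id())
}

/// Toy board (assumption): validation is a length check with no cryptography,
/// and inventing hands out one fixed ID when allowed.
struct DemoBoard {
    allow_invent: bool,
}

static INVENTED_ID: AppId = [0xAA; 48];

impl Board for DemoBoard {
    fn decode_app_id(&self, entry: &TlvEntry) -> Option<&'static AppId> {
        <&'static AppId>::try_from(entry.data).ok()
    }

    fn invent_app_id(&self) -> Option<&'static AppId> {
        if self.allow_invent {
            Some(&INVENTED_ID)
        } else {
            None
        }
    }
}
```

A board that wants every application verified would simply return `None` from `invent_app_id`, which is the case the `allow_invent: false` setting models.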
Syscall Filter ACL Compression
The system call filter ACLs can contain a map from 384-bit application IDs to short IDs (e.g. 32-bit integers). The ACLs can then refer to applications by their short ID rather than their application ID, in order to compress the list size. At runtime, the system call filter will need to search the map to find an application's short ID, then perform authorization checks using the short ID. For performance, the system call filter can cache the short ID in the process' grant region.
Storage ACL Compression
The storage system can maintain a map from 384-bit application IDs to short IDs (e.g. 32-bit integers). The storage system can then use the short IDs internally, performing lookups as necessary to associate the short IDs with processes at runtime. It can use the grant region to cache the short ID lookup results for a process.
Application IDs are arbitrary K-byte sequences, where K is a compile-time constant defined by the board. Application IDs and the information required to verify them are encoded in a userspace binary's TBF headers. Different types of app IDs are stored in different TLV entry types. Boards must decide what app ID TLV entry types they support, and may be responsible for parsing and verifying application IDs. The core kernel is responsible for storing the computed application ID. Although each process binary may have multiple IDs in its TBF headers, each process has 1 application ID. If a process binary does not have a supported ID type in the TBF headers, a board must "invent" an ID for that process, or the process will fail to load.
We will introduce a new type AppId which is a #[repr(C,align(4))] wrapper around [u8; K]. The kernel will store the application ID in the Process struct as a &'static AppId. Generally, the reference will point into the TBF headers, although that is up to the board.
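A minimal sketch of what such a type could look like, assuming K = 8 purely for illustration. The helper `from_header_bytes`, the accessor `as_bytes`, and the `FakeFlash` buffer are hypothetical and not part of the proposal; they only show why the alignment matters when the reference points directly into header bytes.

```rust
use std::mem::{align_of, size_of};

/// Board-defined ID length in bytes. The value 8 is an assumption made for
/// this sketch; the proposal only says K is a compile-time constant.
const K: usize = 8;

/// K-byte application ID, 4-byte aligned as described in the proposal.
#[repr(C, align(4))]
#[derive(Debug, PartialEq, Eq)]
pub struct AppId([u8; K]);

impl AppId {
    /// Hypothetical helper: reinterprets bytes inside a TBF header as an
    /// `AppId` without copying. Returns `None` if the slice has the wrong
    /// length or is not 4-byte aligned. (With K a multiple of 4,
    /// `size_of::<AppId>()` equals K.)
    pub fn from_header_bytes(bytes: &'static [u8]) -> Option<&'static AppId> {
        let aligned = (bytes.as_ptr() as usize) % align_of::<AppId>() == 0;
        if bytes.len() != size_of::<AppId>() || !aligned {
            return None;
        }
        // SAFETY: AppId is a repr(C) wrapper around [u8; K], so every bit
        // pattern is valid; length and alignment were checked above, and the
        // source bytes live for 'static.
        Some(unsafe { &*(bytes.as_ptr() as *const AppId) })
    }

    pub fn as_bytes(&self) -> &[u8; K] {
        &self.0
    }
}

/// Stand-in for flash contents: a 4-byte-aligned buffer of header bytes.
#[repr(align(4))]
pub struct FakeFlash(pub [u8; 12]);

pub static FLASH: FakeFlash = FakeFlash([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
```

The Process struct would then hold a `&'static AppId` obtained this way (or one pointing at board-owned memory, since the proposal leaves that to the board).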
We introduce a new TLV element for unverified IDs. The length field of the TLV element is fixed at 8. The data contained inside is the app ID with no verification data. For apps that are released publicly, the ID should be chosen in a manner that minimizes the chance of collision with other apps (e.g. generated randomly or by a truncated cryptographic hash of the app's name).
When a Tock board (either in-tree or out-of-tree) wants to implement cryptographically-verified app IDs, the board authors must introduce a new TLV element type to store their application ID and any verification information that may be necessary.
The board must tell the core kernel what types of application ID it is willing to accept. Additionally, it must provide the invent_app_id function, which either invents and returns a new application ID or returns None.
When the core kernel tries to load a process, it will search for the first application ID TLV element of a type the board accepts, and verify that TLV element. If verification fails, then the process will fail to load. If the process binary's TBF headers do not contain a form of app ID that the board accepts, then the core kernel will call invent_app_id to create an app ID for the process. If invent_app_id returns None the process will fail to load.
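The v3 rule differs from v2 in that a failed verification is fatal rather than a reason to try the next entry. A rough sketch of that decision, with several assumptions: K = 8, the `Board` trait shape, the `IdTlv` struct (whose `verifies` flag stands in for a real verification step), `LoadError`, and `DemoBoard` are all hypothetical names, and placing `verify` on the board is a simplification for the sketch.

```rust
/// ID length is board-defined; 8 bytes is assumed for this sketch.
type AppId = [u8; 8];

#[derive(Debug, PartialEq)]
enum LoadError {
    /// An accepted ID TLV was present but did not verify.
    VerificationFailed,
    /// No accepted ID TLV was present and the board declined to invent an ID.
    NoAppId,
}

/// Hypothetical stand-in for an app ID TLV element.
struct IdTlv {
    tlv_type: u16,
    id: &'static AppId,
    /// Stand-in for the outcome of a real verification step.
    verifies: bool,
}

/// Hypothetical board interface for this sketch.
trait Board {
    fn accepts(&self, tlv_type: u16) -> bool;
    fn verify(&self, tlv: &IdTlv) -> Option<&'static AppId>;
    fn invent_app_id(&self) -> Option<&'static AppId>;
}

/// Only the first TLV of an accepted type is considered. If it fails to
/// verify, the process fails to load; inventing is a fallback only when no
/// accepted TLV exists at all.
fn app_id_for_process<B: Board>(board: &B, tlvs: &[IdTlv]) -> Result<&'static AppId, LoadError> {
    match tlvs.iter().find(|tlv| board.accepts(tlv.tlv_type)) {
        Some(tlv) => board.verify(tlv).ok_or(LoadError::VerificationFailed),
        None => board.invent_app_id().ok_or(LoadError::NoAppId),
    }
}

static DEMO_ID: AppId = *b"app-A-id";
static INVENTED_ID: AppId = [0xAA; 8];

/// Toy board accepting a single (made-up) TLV type.
struct DemoBoard {
    accepted_type: u16,
    invented: Option<&'static AppId>,
}

impl Board for DemoBoard {
    fn accepts(&self, tlv_type: u16) -> bool {
        tlv_type == self.accepted_type
    }

    fn verify(&self, tlv: &IdTlv) -> Option<&'static AppId> {
        if tlv.verifies {
            Some(tlv.id)
        } else {
            None
        }
    }

    fn invent_app_id(&self) -> Option<&'static AppId> {
        self.invented
    }
}
```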
Syscall Filter ACL Compression
The system call filter ACLs can contain a map from K-byte application IDs to short IDs (e.g. 32-bit integers). The ACLs can then refer to applications by their short ID rather than their application ID, in order to compress the list size. At runtime, the system call filter will need to search the map to find an application's short ID, then perform authorization checks using the short ID. For performance, the system call filter can cache the short ID in the process' grant region.
Storage ACL Compression
The storage system can maintain a map from K-byte application IDs to short IDs (e.g. 32-bit integers). The storage system can then use the short IDs internally, performing lookups as necessary to associate the short IDs with processes at runtime. It can use the grant region to cache the short ID lookup results for a process.
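One possible shape for the storage-side map, as a sketch under assumptions: K = 8, a fixed-capacity table with no heap allocation, and the table index used as the short ID. `ShortIdTable` and its methods are hypothetical names, and the grant-region cache is not modelled here.

```rust
/// ID length is board-defined; 8 bytes is assumed for this sketch.
const K: usize = 8;
type AppId = [u8; K];
type ShortId = u32;

/// Fixed-capacity table mapping full application IDs to short IDs.
/// The short ID is the slot index, so it is assigned by the table rather
/// than derived from the ID's contents.
struct ShortIdTable<const N: usize> {
    entries: [Option<AppId>; N],
}

impl<const N: usize> ShortIdTable<N> {
    fn new() -> Self {
        ShortIdTable { entries: [None; N] }
    }

    /// Linear search for an ID that already has a short ID.
    fn lookup(&self, id: &AppId) -> Option<ShortId> {
        self.entries
            .iter()
            .position(|entry| entry.as_ref() == Some(id))
            .map(|index| index as ShortId)
    }

    /// Returns the existing short ID, or allocates the first free slot.
    /// Returns `None` when the table is full.
    fn lookup_or_allocate(&mut self, id: &AppId) -> Option<ShortId> {
        if let Some(short) = self.lookup(id) {
            return Some(short);
        }
        let free = self.entries.iter().position(|entry| entry.is_none())?;
        self.entries[free] = Some(*id);
        Some(free as ShortId)
    }
}
```

The system call filter case would need only `lookup` over a table fixed at build time, since its ACLs are static.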
Johnathan Van Why <jrva...@google.com> writes:
> I tried to word the proposal in a way that makes it clear that any of
> those options are fine. I didn't mean to imply a particular choice of
> implementation when I used a capital K for key length, I just used a
> capital K to be consistent with my complexity analysis a few posts
> back
That makes sense. Maybe I was misled by the 'in-RAM data type' section
of the proposal into thinking it would describe a precise implementation
strategy already. Handling the ID as an opaque fixed length byte
sequence makes sense however, and it is probably best to keep such
implementation details as discussed in my last email out of the
design. Thanks for the feedback.
Using a trait-based type system, it might work to couple the
mapping-function from application IDs to short IDs to both the target
subsystem and the ID type, by introducing traits such as
    trait StorageId: AppId {
        fn storage_id(&self) -> [u8; 8];
    }
which can then be used to define the mapping for each individual
subsystem, on an ID type granularity.
Assuming that you are talking about a "dynamic" (non-persistent,
`f(app_id) -> short_id` signature) app ID to short ID mapping, would
this solve the issue? Do I understand your proposal and the
issue correctly?
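To make the trait idea in this message a bit more concrete, here is a small sketch. Everything apart from the `StorageId: AppId` shape is an assumption for illustration: the `as_bytes` method on `AppId`, the two ID types, and `storage_key` are hypothetical.

```rust
/// Hypothetical base trait implemented by every application ID type.
trait AppId {
    fn as_bytes(&self) -> &[u8];
}

/// Storage-specific mapping, with the same shape as in the snippet above.
trait StorageId: AppId {
    fn storage_id(&self) -> [u8; 8];
}

/// Hypothetical 8-byte unverified ID: already short, so it maps to itself.
struct UnverifiedId([u8; 8]);

/// Hypothetical 32-byte verified ID whose storage ID was assigned
/// statically (e.g. when building the kernel), not derived by hashing.
struct VerifiedId {
    full: [u8; 32],
    assigned: [u8; 8],
}

impl AppId for UnverifiedId {
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

impl StorageId for UnverifiedId {
    fn storage_id(&self) -> [u8; 8] {
        self.0
    }
}

impl AppId for VerifiedId {
    fn as_bytes(&self) -> &[u8] {
        &self.full
    }
}

impl StorageId for VerifiedId {
    fn storage_id(&self) -> [u8; 8] {
        self.assigned
    }
}

/// A storage subsystem only needs the `StorageId` bound and does not care
/// which concrete ID type a board uses.
fn storage_key<I: StorageId>(id: &I) -> [u8; 8] {
    id.storage_id()
}
```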
"'Johnathan Van Why' via Tock Embedded OS Development Discussion"
<tock...@googlegroups.com> writes:
> Were you thinking the storage IDs would be computed by taking a 32-bit hash
> of the full app ID? That doesn't work from a security perspective: an
> adversary could produce their own app ID that hashes to the same short ID
> as a target app in order to impersonate that target app to subsystems using
> short IDs.
Indeed that was my initial assumption. I had hoped that something
along the lines of 64-bit short IDs would provide sufficient collision
resistance, given that on security-critical systems, apps are verified
beforehand and collisions would therefore have been under the control of
the app's signer anyway. I do however understand the issues with this
approach, and hence don't want to pursue it.
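The impersonation concern can be illustrated with a small brute-force sketch. Assumptions: FNV-1a is used purely as a simple, non-cryptographic stand-in for "some hash", app IDs are 8 bytes, and the names `short_id` and `find_impersonator` are made up for this example. The cost of the search depends on the width of the short ID rather than on the strength of the hash: a b-bit short ID takes on the order of 2^b trials, so 16 bits finishes instantly and 32 bits is still well within reach of an attacker.

```rust
/// FNV-1a (64-bit), a simple non-cryptographic hash used here only as a
/// stand-in for whatever hash a subsystem might pick.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Short ID = the low `bits` bits of the hash (`bits` must be below 64).
fn short_id(app_id: &[u8; 8], bits: u32) -> u64 {
    fnv1a64(app_id) & ((1u64 << bits) - 1)
}

/// Brute-force search for a different app ID with the same short ID as
/// `target`. Candidates are simply the little-endian bytes of a counter.
fn find_impersonator(target: &[u8; 8], bits: u32) -> Option<[u8; 8]> {
    let wanted = short_id(target, bits);
    (0..=u64::from(u32::MAX))
        .map(u64::to_le_bytes)
        .find(|candidate| candidate != target && short_id(candidate, bits) == wanted)
}
```

Calling `find_impersonator(b"app-A-id", 16)` returns a second, different ID that a subsystem keyed on the 16-bit short ID could not tell apart from the target.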
This might make it harder or impossible to have dynamic resource
allocations in peripherals without writable persistent storage. To
circumvent this, those peripherals can either use a one-way function
to map from application IDs to resources on their own, taking the
security implications (collisions) into account, or use a static
mapping introduced while compiling the kernel. Rereading the
proposal, the way it's written makes this sufficiently clear.
Alistair Francis <alist...@gmail.com> writes:
> As Phil has pointed out we actually have some more use cases here. It
> is possible that you want a second app to access the data. Phil gave
> the example of migrating the data from app A to app B.
>
> It's also possible that we would want to split or merge apps. So what
> used to be two separate apps with their own storage will be merged
> into 1 single app.
This is true. For storage, this single, one-to-one mapping of
applications to respective ids is likely insufficient to cover all of
the persistent storage use cases we want to support.
I'm convinced that this application ID will nonetheless benefit Tock,
since (1) it will enable the cryptographic verification use cases that
are required by Johnathan and others, while being agnostic to the precise
mechanisms and (2) it will be sufficient for resource allocations where
a one-to-one mapping is desirable.
Furthermore, this application ID can then be used by the individual
subsystems to grant access through more complex authorization
mechanisms, which can grant access based on a
(principal, request, object)
granularity. An example of such a mechanism could be an access control
list. This proposal introduces the principal (app) identification
mechanism only.
> The same problem as above. An appID doesn't fix the syscall filtering
> problem. How do we then specify the filters for each app?
Using a separate table, which this proposal does not want to
introduce. This explains the thread subject, which was my initial idea
with this discussion: "Nonvolatile application storage, part 1:
application ID". In a subsequent thread we can think about the precise
ACL design.
> 48-bytes for every application is a lot of space. That will have to be
> stored in flash and parsed in RAM at some point.
This is referring to an outdated version of the proposal. To cite the
new version, proposal v3:
> We will introduce a new type AppId which is a #[repr(C,align(4))]
> wrapper around [u8; K].
On Tue, Oct 27, 2020 at 5:18 PM 'Johnathan Van Why' via Tock Embedded
OS Development Discussion <tock...@googlegroups.com> wrote:
>
> Use Case Analysis
>
> First, I want to perform some more analysis of the use cases for application IDs that I have identified so far:
>
> Storage: Processes may want to store data into non-volatile storage that can be accessed by future processes that are part of the same application.
As Phil has pointed out we actually have some more use cases here. It
is possible that you want a second app to access the data. Phil gave
the example of migrating the data from app A to app B.
It's also possible that we would want to split or merge apps. So what
used to be two separate apps with their own storage will be merged
into 1 single app.
>
> Key derivation
>
> ID size sensitivity: Smaller IDs would improve key derivation performance, but ID size is unlikely to be a deal-breaker.
> Does trust span across boots: No, if the key verification is performed by dedicated cryptographic hardware (e.g. H1 does this).
> Requires cryptographic verification: No. Security will be as strong as the app loading mechanism if unverified IDs are used.
>
> IPC
>
> ID size sensitivity: Moderate. Smaller IDs would reduce the size of process binaries that contain application IDs and the RAM usage of processes that manipulate application IDs at runtime.
>
> Does trust span across boots: No, IPC is only between concurrently-running processes.
> Requires cryptographic verification: No, it is an optional security improvement.
>
>
> Proposal v2 Differences
>
> Based on Leon's concerns and the above analysis, I made a second proposal (below). Here are the differences between the second proposal and my first proposal from June 5:
>
> Application IDs are a fixed size rather than variable-size, to make them easier to store in a filesystem.
> Application IDs are no longer optional. I added a mechanism to allow boards to invent application IDs for processes that don't have a recognized ID.
> I expanded the proposal to discuss ACL storage to alleviate concerns about too-large application IDs for some use cases.
>
>
> Proposal v2
>
> Summary
>
> Application IDs are arbitrary 48-byte sequences. Application IDs and the information required to verify them are encoded in a userspace binary's TBF headers. The format of the data in the header is board-specific. Boards are responsible for parsing and validating the application ID header themselves; the core kernel is responsible for storing the computed ID. Although each process binary may have multiple IDs in its TBF headers, each process has 1 application ID. If a process binary does not have a recognized ID in the TBF headers, a board must "invent" an ID for that process to load.
48-bytes for every application is a lot of space. That will have to be
stored in flash and parsed in RAM at some point.
Also, this comes back to the same problem that we don't trust the TBF headers.
Is the idea here that each board will parse headers differently?
Doesn't that now mean that apps are board specific instead of being
architecture specific like they are now (at least on ARM)?
>
> In-RAM data type
>
> The kernel will store the application ID in the Process struct as a Option<&'static [u8; 48]>. Generally, the slice reference will point into the TBF header itself, although that is up to the board.
>
> TLV Element
>
> We introduce a new TLV element type to store application ID information. The TLV is variable-sized, and its contents are board-specific. In most cases, the TLV will contain the application ID, and in many cases it will also contain ID verification information.
I thought we were trying to avoid variable sized headers?
>
> TLV Element Parsing
>
> The board must provide two functions to the core kernel:
>
> decode_app_id, which validates an application ID TLV entry. If the ID is valid it returns the ID, otherwise it returns None.
> invent_app_id, which either invents and returns an application ID for the process or returns None.
>
> When the core kernel tries to load a process, it will use the first application ID that decode_app_id returns (evaluated in the same order as the TLV entries in the TBF header). If it does not find an ID TLV entry or decode_app_id always returns None, it will call invent_app_id to invent an ID for the process. If invent_app_id returns None the kernel will skip loading that process.
>
>
> Syscall Filter ACL Compression
>
> The system call filter ACLs can contain a map from 384-bit application IDs to short IDs (e.g. 32-bit integers). The ACLs can then refer to applications by their short ID rather than their application ID, in order to compress the list size. At runtime, the system call filter will need to search the map to find an application's short ID, then perform authorization checks using the short ID. For performance, the system call filter can cache the short ID in the process' grant region.
Where do these ACLs come from? This is one of the major use cases for
the appID and I don't see how the filtering actually happens?
>
> Storage ACL Compression
>
> The storage system can maintain a map from 384-bit application IDs to short IDs (e.g. 32-bit integers). The storage system can then use the short IDs internally, performing lookups as necessary to associate the short IDs with processes at runtime. It can use the grant region to cache the short ID lookup results for a process.
The same problem here, where does the storage ACL come from?
>
>
> Design notes
>
> A 48 byte key is 384 bits. A 384 bit hash provides 128 bits of collision resistance against an adversary that can run Grover's algorithm.
> I made the board responsible for inventing application IDs so that boards that want to verify every application can refuse to invent application IDs. I'm concerned that if we invent IDs outside the board file we'll end up with issues where a board that wants to verify IDs accidentally launches apps with unverified IDs.
> A board can invent IDs by compiling a fixed list of app IDs into its image and returning those IDs in order of app (so the first unverified app would use the first fixed ID, the second the second, etc.). The drawback of this approach is that deploying a new process binary (or disabling an existing process binary) could cause an existing process binary's app ID to change and/or move to a different process binary.
If we are just hard coding IDs in order what advantage do we get from
having the IDs in the first place?
> A board with access to an entropy source could allocate a fixed buffer for invented app IDs in RAM and generate the IDs on the fly. This would avoid confusion across reboots by changing every process binary's app ID at every boot.
Won't that slow down startup waiting for entropy?
> A board could hash a process binary and its location in flash to invent an ID.
That seems like a good option, but couldn't the loader do this instead
of the kernel? Generating a hash of flash could be slow on some
boards.
> decode_app_id and invent_app_id will probably return Result with error enums, not Option, but I left other error cases out of the proposal to keep it understandable.
> The system call filter short IDs and storage short IDs cannot be the same, as the ID allocation is different between them. The system call filter ACLs do not need to dynamically allocate short IDs, as the ACLs themselves are static (presumably compiled into the kernel). On the other hand, the storage system will need to allocate short IDs when new applications are loaded.
The more I think about appIDs the less of a reason I see for them.
Why can we not store the syscall and storage ACLs in the TBF header? A
secure system will do a crypto signature check of the app/TBF header
when loading it so we know they haven't been tampered with.
An insecure system can just trust the headers. Apps can't change their
headers so they can't give themselves more permissions. If every app
lists its syscalls we can then always enforce syscall filtering which
would be cool. I can auto-generate a list of all syscalls when
building an app so it should be easy to keep track of.
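As a rough illustration of what header-listed permissions could look like, here is a sketch. Everything in it is an assumption: `SyscallPermissions`, `filter_syscall`, the idea of listing driver numbers, and the deny-by-default choice for apps without a list are hypothetical and not an existing TBF header format.

```rust
/// Hypothetical per-app permission list, as it might be decoded from a
/// TBF header entry generated at app build time.
struct SyscallPermissions {
    /// Driver numbers the app declared that it uses.
    allowed_drivers: &'static [u32],
}

impl SyscallPermissions {
    fn allows(&self, driver_num: u32) -> bool {
        self.allowed_drivers.contains(&driver_num)
    }
}

/// Returns true if the system call may proceed. An app without a
/// permission list is denied (an assumption: filtering is always enforced).
fn filter_syscall(perms: Option<&SyscallPermissions>, driver_num: u32) -> bool {
    match perms {
        Some(list) => list.allows(driver_num),
        None => false,
    }
}
```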
Secure boot needs to be separate from appIDs anyway using crypto
signatures. For key derivation we could use the signature as an ID or
a hash of the app in flash. IPC could also have its own ACL
implementation (like the syscall/storage) or use a hash of the app in
flash as an ID.
Also, I think we might need to re-think the threat model and put more
trust in the TBF header. It looks like all approaches end up with at
least some trust in the TBF header.
On Tue, Nov 3, 2020 at 10:23 AM Leon Schuermann
<le...@is.currently.online> wrote:
>
> Alistair Francis <alist...@gmail.com> writes:
> > As Phil has pointed out we actually have some more use cases here. It
> > is possible that you want a second app to access the data. Phil gave
> > the example of migrating the data from app A to app B.
> >
> > It's also possible that we would want to split or merge apps. So what
> > used to be two separate apps with their own storage will be merged
> > into 1 single app.
>
> This is true. For storage, this single, one-to-one mapping of
> applications to respective ids is likely insufficient to cover all of
> the persistent storage use cases we want to support.
>
> I'm convinced that this application ID will nonetheless benefit Tock,
> since (1) it will enable the cryptographic verification use cases that are
> required by Johnathan and others, while being agnostic to the precise
Do you have an example of how an appID would be used for cryptographic
verification?
> mechanisms and (2) it will be sufficient for resource allocations where
> a one-to-one mapping is desirable.
So we will support both a 1:1 mapping and a more full featured ACL?
Now there are two ways to allocate resources in the kernel.
>
> Furthermore, this application ID can then be used by the individual
> subsystems to grant access using more complex mechanisms by use of more
> complex authorization mechanisms, which can grant access based on a
>
> (principal, request, object)
>
> granularity. An example for such a mechanism could be an access control
> list. This proposal introduces the principal (app) identification
> mechanism only.
>
> > The same problem as above. An appID doesn't fix the syscall filtering
> > problem. How do we then specify the filters for each app?
>
> Using a separate table, which this proposal does not want to
> introduce. This explains the thread subject, which was my initial idea
> with this discussion: "Nonvolatile application storage, part 1:
> application ID". In a subsequent thread we can think about the precise
> ACL design.
How can we pick an appID mechanism without any idea of what the ACL
will look like? It seems like we are just picking the first part and
hoping the rest will match up later.
For example, suppose we go with a security manifest that specifies all
apps and permissions, so that instead of a linked list of apps we have a
serialised JSON file and a list of apps (just an example). Why would we
then need unique appIDs?
--------------------------
| Security Manifest      |
| App1: ...              |
| App2: ...              |
--------------------------
| App1                   |
--------------------------
| App2                   |
--------------------------
If the entire bundle is signed and checked before loading what does an
appID give us? We could just use the order of the apps.
If we put the ACLs in the TBF headers do we need appIDs either?
It seems like we have settled on appIDs without a clear use case of
what they let us accomplish that we couldn't do without them.
At first I was all on board with appIDs, but the more I think about it
the fewer use cases I see. There are still fundamental
problems like how do we ensure that they are unique? Do we really want
every board to do its own thing so that app loading is even less
generic?
>
> > 48-bytes for every application is a lot of space. That will have to be
> > stored in flash and parsed in RAM at some point.
>
> This is referring to an outdated version of the proposal. The new
> version. To cite the proposal v3:
> > We will introduce a new type AppId which is a #[repr(C,align(4))]
> > wrapper around [u8; K].
Yep, sorry. I half wrote my reply on Friday, but then lost internet
access so only got around to sending it today.
>
> Refer to the message <87r1pek376.fsf@zirconium> for potential
> implementation strategies of this _generic_ but not _dynamic_
> application ID length. It should allow for efficient static
> implementations, while still being flexible for different needs and
> their respective ID width requirements.
Doesn't that make the elf2tab, tockloader and kernel implementations
more complex though? To support all these different length options?
Won't these different length options break the ability to run any
app compiled for an architecture on any board (at least on ARM, and
maybe one day on RISC-V with PIC)?
On Tue, Nov 3, 2020 at 10:56 AM 'Johnathan Van Why' via Tock Embedded
OS Development Discussion <tock...@googlegroups.com> wrote:
>
> On Tue, Nov 3, 2020 at 8:18 AM Alistair Francis <alist...@gmail.com> wrote:
>>
>> On Tue, Oct 27, 2020 at 5:18 PM 'Johnathan Van Why' via Tock Embedded
>> OS Development Discussion <tock...@googlegroups.com> wrote:
>> >
>> > Use Case Analysis
>> >
>> > First, I want to perform some more analysis of the use cases for application IDs that I have identified so far:
>> >
>> > Storage: Processes may want to store data into non-volatile storage that can be accessed by future processes that are part of the same application.
>>
>> As Phil has pointed out we actually have some more use cases here. It
>> is possible that you want a second app to access the data. Phil gave
>> the example of migrating the data from app A to app B.
>>
>> It's also possible that we would want to split or merge apps. So what
>> used to be two separate apps with their own storage will be merged
>> into 1 single app.
>
>
> Merging apps is a bit tricky under my proposal. To merge app B into app A, you would need to deploy a new version of app B that adds app A to the ACLs for the storage being merged, then deploy the merged app (and remove app B).
I don't see any mention of how this ACL is set up though. Am I missing
something?
>
> We could make this easier by allowing processes to have multiple app IDs, at the expense of added complexity (including dynamic allocation for app ID references).
Then it isn't really an appID and is more a list of permissions (which
I think is much more useful).
>
>>
>>
>> >
>> > TLV Element Parsing
>> >
>> > The board must provide two functions to the core kernel:
>> >
>> > decode_app_id, which validates an application ID TLV entry. If the ID is valid it returns the ID, otherwise it returns None.
>> > invent_app_id, which either invents and returns an application ID for the process or returns None.
>> >
>> > When the core kernel tries to load a process, it will use the first application ID that decode_app_id returns (evaluated in the same order as the TLV entries in the TBF header). If it does not find an ID TLV entry or decode_app_id always returns None, it will call invent_app_id to invent an ID for the process. If invent_app_id returns None the kernel will skip loading that process.
>> >
>> >
>> > Syscall Filter ACL Compression
>> >
>> > The system call filter ACLs can contain a map from 384-bit application IDs to short IDs (e.g. 32-bit integers). The ACLs can then refer to applications by their short ID rather than their application ID, in order to compress the list size. At runtime, the system call filter will need to search the map to find an application's short ID, then perform authorization checks using the short ID. For performance, the system call filter can cache the short ID in the process' grant region.
>>
>> Where do these ACLs come from? This is one of the major use cases for
>> the appID and I don't see how the filtering actually happens?
>
>
> The ACLs could be compiled into the kernel image or deployed separately from the kernel with their own cryptographic verification method.
Where these come from seems pretty important. Like I mentioned in
another reply if we used a security manifest I don't see the need for
appIDs. Compiling ACLs into the kernel directly seems like a bad idea
as then we need to change the kernel just to change an ACL.
>
> The filtering is performed by a board component called by the kernel, I think the hooks are already in place for that.
>
>>
>>
>> >
>> > Storage ACL Compression
>> >
>> > The storage system can maintain a map from 384-bit application IDs to short IDs (e.g. 32-bit integers). The storage system can then use the short IDs internally, performing lookups as necessary to associate the short IDs with processes at runtime. It can use the grant region to cache the short ID lookup results for a process.
>>
>> The same problem here, where does the storage ACL come from?
>
>
> I am assuming that storage ACL are part of the storage system. I would expect processes to specify the permissions of data when they ask the storage system to write the data into storage.
If an app can specify its storage settings then we expose ourselves
to a brute force attack. Imagine a compromised app can just keep
guessing 32-bit storage IDs until it can eventually read the secret
data. That doesn't seem like a good idea.
>
>>
>> >
>> >
>> > Design notes
>> >
>> > A 48 byte key is 384 bits. A 384 bit hash provides 128 bits of collision resistance against an adversary that can run Grover's algorithm.
>> > I made the board responsible for inventing application IDs so that boards that want to verify every application can refuse to invent application IDs. I'm concerned that if we invent IDs outside the board file we'll end up with issues where a board that wants to verify IDs accidentally launches apps with unverified IDs.
>> > A board can invent IDs by compiling a fixed list of app IDs into its image and returning those IDs in order of app (so the first unverified app would use the first fixed ID, the second the second, etc.). The drawback of this approach is that deploying a new process binary (or disabling an existing process binary) could cause an existing process binary's app ID to change and/or move to a different process binary.
>>
>> If we are just hard coding IDs in order what advantage do we get from
>> having the IDs in the first place?
>
>
> The advantage is we would be able to support apps that lack app IDs (backwards compatibility), or whose app IDs are not recognized by the board (cross-board compatibility).
>
> The alternative is making app IDs optional, as I did in my v1 proposal, but that idea wasn't popular.
That shouldn't be in the board though. Why does each board need a
different way of handling backwards compatible apps?
>
>>
>> > A board with access to an entropy source could allocate a fixed buffer for invented app IDs in RAM and generate the IDs on the fly. This would avoid confusion across reboots by changing every process binary's app ID at every boot.
>>
>> Won't that slow down startup waiting for entropy?
>
>
> Yes, but a board is already responsible for its own performance.
>
>>
>> > A board could hash a process binary and its location in flash to invent an ID.
>>
>> That seems like a good option, but couldn't the loader do this instead
>> of the kernel? Generating a hash of flash could be slow on some
>> boards.
>
>
> Hmm, I don't think we've ever had application loaders modify TBF data. I think that idea is somewhat awkward. It would require the application loader to understand *every* TBF header type in order to load an app, as it would need to change offsets into the app (as adding an app ID header would grow the TBF headers). It's also not a concept we can keep forever, as an application loader will not be able to modify the TBF headers of signed apps.
Tockloader can already do this. It can change the size of the TBF
header already to modify the header contents.
>
> Ostensibly that seems okay to me, but I think it would be very awkward.
This is the conclusion we reached in the OpenTitan meeting about how
to handle appIDs. My original appID PR added the idea with elf2tab,
but that got a lot of push back. Instead an idea was to do it in the
loader (tockloader) when we create the apps.
As for signed apps it would work then as well as the bundle would be
signed after we have set the IDs for the apps.
>
>>
>> > decode_app_id and invent_app_id will probably return Result with error enums, not Option, but I left other error cases out of the proposal to keep it understandable.
>> > The system call filter short IDs and storage short IDs cannot be the same, as the ID allocation is different between them. The system call filter ACLs do not need to dynamically allocate short IDs, as the ACLs themselves are static (presumably compiled into the kernel). On the other hand, the storage system will need to allocate short IDs when new applications are loaded.
>>
>> The more I think about appIDs the less of a reason I see for them.
>>
>> Why can we not store the syscall and storage ACLs in the TBF header? A
>> secure system will do a crypto signature check of the app/TBF header
>> when loading it so we know they haven't been tampered with.
>
>
> Who would sign the header? It can't be the application author, because they might be malicious. For OpenTitan, we generally won't trust the application loader, or even fully trust the flash memory itself.
In the case of OpenTitan it would be whoever signs the kernel and the
apps, so I'm assuming the vendor. The header is just part of the app.
>
>> An insecure system can just trust the headers. Apps can't change their
>> headers so they can't give themselves more permissions. If every app
>> lists its syscalls we can then always enforce syscall filtering which
>> would be cool. I can auto-generate a list of all syscalls when
>> building an app so it should be easy to keep track of.
>>
>> Secure boot needs to be separate from appIDs anyway using crypto
>> signatures. For key derivation we could use the signature as an ID or
>> a hash of the app in flash. IPC could also have its own ACL
>> implementation (like the syscall/storage) or use a hash of the app in
>> flash as an ID.
>
>
> Using a hash of a process binary as an app ID for IPC will make updating that app very difficult, as other apps would suddenly stop recognizing it.
Ah good point. We can use the Java style name that Tock already uses
for IPC then.
>
>>
>>
>> Also, I think we might need to re-think the threat model and put more
>> trust in the TBF header. It looks like all approaches end up with at
>> least some trust in the TBF header.
>
>
> I disagree, on the basis that:
>
> We don't need to trust the TBF headers stored in non-volatile storage.
But the appID is stored in the TBF headers in non-volatile storage and
you plan on using that?
But now what happens if the government wants to add a second app that
can also do these trusted operations?
>
> Before the kernel loads an app claiming to be app 1, it needs to verify that app is actually app 1 so that app 2 cannot lie and impersonate app 1. It does this by validating the app's signature against app 1's public key. System call filtering lists would then prevent app 2 from doing operations that only app 1 can do, such as controlling the machine's boot sequence.
Ok, but I'm still missing something. The appID is in the TBF header.
The kernel checks the signature to ensure that app1 is signed by the
government's public key then loads it. Why do we then need an appID?
Just for the ACL?
>
>>
>>
>> > mechanisms and (2) it will be sufficient for resource allocations where
>> > a one-to-one mapping is desirable.
>>
>> So we will support both a 1:1 mapping and a more full featured ACL?
>> Now there are two ways to allocate resources in the kernel.
>
>
> These are identification, authentication, and authorization mechanisms, not resource allocation mechanisms.
>
>>
>> >
>> > Furthermore, this application ID can then be used by the individual
>> > subsystems to grant access using more complex mechanisms by use of more
>> > complex authorization mechanisms, which can grant access based on a
>> >
>> > (principal, request, object)
>> >
>> > granularity. An example for such a mechanism could be an access control
>> > list. This proposal introduces the principal (app) identification
>> > mechanism only.
>> >
>> > > The same problem as above. An appID doesn't fix the syscall filtering
>> > > problem. How do we then specify the filters for each app?
>> >
>> > Using a separate table, which this proposal does not want to
>> > introduce. This explains the thread subject, which was my initial idea
>> > with this discussion: "Nonvolatile application storage, part 1:
>> > application ID". In a subsequent thread we can think about the precise
>> > ACL design.
>>
>> How can we pick an appID mechanism without any idea of what the ACL
>> will look like? It seems like we are just picking the first part and
>> hoping the rest will match up later.
>
>
> ACLs are built on top of app IDs, not the other way around.
Agreed, but they should at least be considered when we pick an appID mechanism.
>
>>
>> For example, if we go with a security manifest that specifies all apps
>> and permissions. So instead of a linked list of apps we have a
>> serialised json file and a list of apps (just an example) why do we
>> need unique appIDs?
>>
>> --------------------------
>> | Security Manifest |
>> | App1: ... |
>> | App2: ... |
>> --------------------------
>> | App1 |
>> --------------------------
>> | App2 |
>> --------------------------
>>
>> If the entire bundle is signed and checked before loading what does an
>> appID give us? We could just use the order of the apps.
>
>
> Who signs the bundle? OpenTitan's cryptographic use cases of app IDs require a higher level of trust than we have in the application loader, so it can't be the application loader.
I didn't realise that OT will support different vendors shipping apps.
That does make the problem harder.
>
>>
>> If we put the ACLs in the TBF headers do we need appIDs either?
>
>
> That would require trusting the TBF headers, which we don't want to do, as they might be malicious.
But the appID is in the same header.
>
> We don't need cryptographically-verified app IDs if we trust TBF headers.
>
>>
>> It seems like we have settled on appIDs without a clear use case of
>> what they let us accomplish that we couldn't do without them.
>
>
> I listed 5, and I've yet to see another efficient way to accomplish that.
I still don't fully see how appIDs link to all of them.
>
>>
>> At first I was all on board with appIDs, but the more I think about it
>> the fewer and fewer use cases I see. There are still fundamental
>> problems like how do we ensure that they are unique? Do we really want
>> every board to do its own thing so that app loading is even less
>> generic?
>
>
> I never said they need to be unique. Boards can't all be the same as they will use different cryptographic verification mechanisms. OpenTitan will likely perform cryptographic verification that other platforms can't afford.
That's fine, but why not just have a sign TLV entry that OT uses?
On Fri, Nov 20, 2020 at 12:31 PM Johnathan Van Why <jrva...@google.com> wrote:
>
> Alistair requested that I explain how my application ID proposal can be used in solutions to each of the five use cases I identified. Here are examples for how to solve each of the use cases (except for Secure Boot, where I instead argue the use case doesn't exist).
Thanks for writing this up.
>
> Proposal v3 Appendix A
>
> Storage
> Note: In this example I omit the short ID mechanism I described earlier to keep the description simpler.
>
> For each piece of data stored in storage, the storage layer would store the following metadata:
>
> enum Permissions { ReadOnly, ReadWrite }
>
>
> Owner: [u8]
>
> Name: [u8]
>
> Other users with access: [(app: [u8], Permissions)]
>
>
> When processes try to access a piece of data in storage, they specify both the owner of the data and the name of the data (different owners can own data with the same name -- this is to avoid the denial-of-service issues I brought up at the core WG call a week or two ago). If the process is the owner of the data (its app ID matches the "owner" byte sequence), then it is automatically granted access. If not, the system scans through the "other users with access" list to see if the app ID matches any of the entries in that list. If so, the permissions in that list are checked against the requested operation (i.e. read or write). If the process' app ID is not found the access is denied.
Where does this permission list come from though? Is it specified by
the app that creates the region? Is it hard coded in the kernel?
So for each piece of data we store an owner, name and list of users
with access? That seems like a lot of overhead for a small flash
storage.
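
To make the quoted access check concrete, here is a minimal Rust sketch. It assumes app IDs and names are plain byte slices already read from storage; the names `ObjectMetadata`, `Operation` and `access_allowed` are illustrative only and come neither from the proposal nor from Tock.

```rust
/// Operation a process requests on a stored object (illustrative type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Operation {
    Read,
    Write,
}

/// Permissions granted to a non-owner, as in the quoted proposal.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Permissions {
    ReadOnly,
    ReadWrite,
}

/// Per-object metadata: owner, name, and the other users with access.
pub struct ObjectMetadata<'a> {
    pub owner: &'a [u8],
    pub name: &'a [u8],
    pub other_users: &'a [(&'a [u8], Permissions)],
}

/// Returns true if the process with `app_id` may perform `op` on the object.
pub fn access_allowed(meta: &ObjectMetadata<'_>, app_id: &[u8], op: Operation) -> bool {
    // The owner is granted access automatically.
    if meta.owner == app_id {
        return true;
    }
    // Otherwise scan the "other users with access" list.
    for (user, perms) in meta.other_users {
        if *user == app_id {
            return match (*perms, op) {
                (Permissions::ReadWrite, _) => true,
                (Permissions::ReadOnly, Operation::Read) => true,
                (Permissions::ReadOnly, Operation::Write) => false,
            };
        }
    }
    // App ID not found: deny.
    false
}
```

With an object owned by `app1` that lists `app2` as `ReadOnly`, this lets `app1` read and write, lets `app2` read only, and denies every other app ID.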
>
> System call filtering
>
> The kernel image contains the following data structure:
>
> enum Filter { AllowAll, AllowOnly(&[&[u8]]) }
> static ACLs: &[(driver: usize, Filter)] = ...;
>
>
> When a process makes a system call, the system call filter scans through ACLs searching for the driver number. If it does not find the driver number, it denies the request. If it finds the driver number, along with Filter::AllowAll, it allows the syscall. If it finds the driver number along with Filter::AllowOnly, it scans through the list inside AllowOnly checking for the app ID. If it finds the app ID, the syscall is allowed, otherwise it denies the syscall.
The same question. I like this list, but I don't understand where it
comes from. Does it have to be hard coded in the kernel?
I also don't see why this wouldn't work with the method I mentioned.
For syscall filtering for example. Each TBF header lists the syscalls
the app is allowed to do. The kernel can then use this list exactly as
you mention above. We now have an easy way for apps to specify
permissions. Although note, this is not secure enough for OpenTitan.
For OpenTitan and other secure use cases the kernel can also have its
own list, exactly like you mention above. Then the kernel can compare
the app TBF header permissions with the one it already has. This
provides the same security you are mentioning here, but also allows
boards/users who don't need the extra complexity to just use apps.
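
A rough Rust sketch of the two checks discussed here: the kernel-side ACL scan from the quoted proposal, and the variant where the TBF header declares the drivers an app uses and a secure board additionally checks its own list. This is an illustration under assumptions, not Tock code. The `declared` driver list stands in for a hypothetical TBF header field, and `Filter` takes a lifetime parameter (instead of `'static` data compiled into the kernel) only to keep the example easy to exercise.

```rust
/// Which apps may use a driver, following the quoted `Filter` enum.
pub enum Filter<'a> {
    AllowAll,
    AllowOnly(&'a [&'a [u8]]),
}

/// One kernel ACL entry: (driver number, filter).
pub type AclEntry<'a> = (usize, Filter<'a>);

/// Kernel-side check: scan the ACLs for the driver number.
pub fn kernel_allows(acls: &[AclEntry<'_>], driver: usize, app_id: &[u8]) -> bool {
    for (num, filter) in acls {
        if *num == driver {
            return match filter {
                Filter::AllowAll => true,
                Filter::AllowOnly(ids) => ids.iter().any(|id| *id == app_id),
            };
        }
    }
    // Driver number not listed: deny.
    false
}

/// Header-side check: `declared` is a hypothetical list of driver
/// numbers taken from the app's TBF header.
pub fn header_allows(declared: &[usize], driver: usize) -> bool {
    declared.contains(&driver)
}

/// Combined check. Boards without a kernel list pass `None` and rely on
/// the header alone; secure boards pass `Some(acls)` and require both.
pub fn syscall_allowed(
    kernel_acls: Option<&[AclEntry<'_>]>,
    declared: &[usize],
    driver: usize,
    app_id: &[u8],
) -> bool {
    if !header_allows(declared, driver) {
        return false;
    }
    match kernel_acls {
        Some(acls) => kernel_allows(acls, driver, app_id),
        None => true,
    }
}
```

In this sketch the header can only narrow what the kernel list allows, never widen it, which is the property the comparison above relies on.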
>
> Secure boot
>
> I no longer think this use case is important. For secure boot to be useful, you would need a bootloader that cryptographically verifies the kernel but does not verify the apps. I would expect anyone wanting secure boot in Tock to just verify the entire image (kernel + apps) via the bootloader.
>
> Key derivation
>
> When a process asks for the encryption key named "key1", the kernel feeds the following data into the key derivation function: the hardware's secret key, "key1", and the process' app ID. This produces an encryption key that is unique to that hardware, the name "key1" (so the app can ask for multiple different encryption keys), and the application.
Agreed.
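
As a sketch of how the three inputs could be combined, the Rust below length-prefixes the key name and app ID before handing them to a key derivation function, so that ("ab", "c") and ("a", "bc") cannot collide. The framing, the function names, and the 32-byte output are assumptions for illustration; the actual KDF (for example a hardware key manager or HKDF) is passed in by the caller, and a real kernel would use a fixed buffer rather than `Vec`.

```rust
/// Builds the KDF "info" input from the key name and the app ID.
/// Each field is prefixed with its length as a big-endian u32 so the
/// boundary between the two fields is unambiguous.
pub fn kdf_input(key_name: &[u8], app_id: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + key_name.len() + app_id.len());
    for field in [key_name, app_id] {
        out.extend_from_slice(&(field.len() as u32).to_be_bytes());
        out.extend_from_slice(field);
    }
    out
}

/// Derives a per-hardware, per-name, per-app key.
/// `kdf` is a caller-supplied function taking (secret, info); this
/// sketch does not implement any cryptography itself.
pub fn derive_key<F>(kdf: F, hardware_secret: &[u8], key_name: &[u8], app_id: &[u8]) -> [u8; 32]
where
    F: Fn(&[u8], &[u8]) -> [u8; 32],
{
    kdf(hardware_secret, &kdf_input(key_name, app_id))
}
```

Changing the hardware secret, the key name, or the app ID changes the KDF input, which is what makes the resulting key unique to all three.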
>
> IPC
>
> When a process receives a message via the IPC capsule, the IPC capsule writes the application ID of the process that sent the message into an allow-ed buffer. The app can then use that information as it wishes, such as to separate data between apps (e.g. an app that implements UDP can route packets to other apps based on the port number) or only accept messages from another app it trusts.
>
> When a process sends a message via the IPC capsule, it specifies the app ID of the process that it wants to send the message to. The IPC capsule routes the message appropriately.
>
> IPC is the one use case where having distinct app IDs (no two processes concurrently executing on a Tock system may have the same app ID) is beneficial. If we allow duplicate app IDs, here are a few ways the IPC capsule can route requests:
>
> It could broadcast the message to all processes with that app ID
This seems like a way for a malicious app to eavesdrop on messages.
> It could return an error, forcing processes that want to use IPC to have their own app ID
The app that is sending data can't really handle that error though.
This will allow a DOS from a malicious appID.
> It could identify processes by a combination of app ID and a second, non-cryptographically-verified identifier encoded in the TBF header.
I think IPC is hard to do without unique IDs enforced on the board, at
least to do securely.
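
The three routing options quoted above can be compared side by side in a small Rust sketch. Everything here is hypothetical: the `Process` record, the `instance` field standing in for the second TBF header identifier, and the `Policy` enum are invented for illustration, and `Vec` is used only for brevity.

```rust
/// A running process as seen by a hypothetical IPC router.
pub struct Process<'a> {
    pub app_id: &'a [u8],
    /// Second, non-verified identifier from the TBF header (option 3).
    pub instance: u8,
    /// Index of the process in the kernel's process array.
    pub slot: usize,
}

/// How to handle several processes sharing one app ID.
pub enum Policy {
    /// Option 1: deliver to every process with that app ID.
    Broadcast,
    /// Option 2: return an error when the app ID is not unique.
    Reject,
    /// Option 3: select by app ID plus the second identifier.
    AppIdAndInstance(u8),
}

#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    NoSuchApp,
    AmbiguousAppId,
}

/// Returns the process slots a message for `target` is delivered to.
pub fn route(
    procs: &[Process<'_>],
    target: &[u8],
    policy: &Policy,
) -> Result<Vec<usize>, RouteError> {
    let slots: Vec<usize> = procs
        .iter()
        .filter(|p| p.app_id == target)
        .filter(|p| match policy {
            Policy::AppIdAndInstance(i) => p.instance == *i,
            _ => true,
        })
        .map(|p| p.slot)
        .collect();
    match (slots.len(), policy) {
        (0, _) => Err(RouteError::NoSuchApp),
        (1, _) | (_, Policy::Broadcast) => Ok(slots),
        _ => Err(RouteError::AmbiguousAppId),
    }
}
```

Under `Broadcast` a second process that reuses an app ID receives every message, and under `Reject` it can make delivery fail for the legitimate process, which matches the eavesdropping and denial-of-service concerns raised above.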
On Thu, Dec 3, 2020 at 12:40 PM Johnathan Van Why <jrva...@google.com> wrote:
>
> On Thu, Dec 3, 2020 at 11:39 AM Alistair Francis <alist...@gmail.com> wrote:
>>
>> On Fri, Nov 20, 2020 at 12:31 PM Johnathan Van Why <jrva...@google.com> wrote:
>> >
>> > Alistair requested that I explain how my application ID proposal can be used in solutions to each of the five use cases I identified. Here are examples for how to solve each of the use cases (except for Secure Boot, where I instead argue the use case doesn't exist).
>>
>> Thanks for writing this up.
>>
>> >
>> > Proposal v3 Appendix A
>> >
>> > Storage
>> > Note: In this example I omit the short ID mechanism I described earlier to keep the description simpler.
>> >
>> > For each piece of data stored in storage, the storage layer would store the following metadata:
>> >
>> > enum Permissions { ReadOnly, ReadWrite }
>> >
>> >
>> > Owner: [u8]
>> >
>> > Name: [u8]
>> >
>> > Other users with access: [(app: [u8], Permissions)]
>> >
>> >
>> > When processes try to access a piece of data in storage, they specify both the owner of the data and the name of the data (different owners can own data with the same name -- this is to avoid the denial-of-service issues I brought up at the core WG call a week or two ago). If the process is the owner of the data (its app ID matches the "owner" byte sequence), then it is automatically granted access. If not, the system scans through the "other users with access" list to see if the app ID matches any of the entries in that list. If so, the permissions in that list are checked against the requested operation (i.e. read or write). If the process' app ID is not found the access is denied.
>>
>> Where does this permission list come from though? Is it specified by
>> the app that creates the region? Is it hard coded in the kernel?
>
>
> It is specified by the app that creates the region.
Ok, this seems like a good idea. Can the list of users be changed?
>
>>
>> So for each piece of data we store an owner, name and list of users
>> with access? That seems like a lot of overhead for a small flash
>> storage.
>
>
> Yes.
That is a large amount of overhead. Assuming a 64-bit appID, 64-bit
name and 1 other user we are at around 25 bytes of overhead for every
object. Not including the key (for a KV store), CRCs or anything else.
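
One way to arrive at that figure, written out as Rust constants. The breakdown is an assumption: 8 bytes for the owner app ID, 8 bytes for the name, and for each additional user an 8-byte app ID plus a 1-byte permissions value, with no length fields or padding counted.

```rust
/// Size of a 64-bit app ID, in bytes.
pub const APP_ID_BYTES: usize = 8;
/// Size of a 64-bit object name, in bytes.
pub const NAME_BYTES: usize = 8;
/// Assumed size of one `Permissions` value, in bytes.
pub const PERMISSIONS_BYTES: usize = 1;

/// Per-object metadata overhead for a given number of non-owner users.
pub const fn metadata_overhead(other_users: usize) -> usize {
    APP_ID_BYTES + NAME_BYTES + other_users * (APP_ID_BYTES + PERMISSIONS_BYTES)
}
```

Under these assumptions `metadata_overhead(1)` is 8 + 8 + (8 + 1) = 25 bytes, and every further user with access adds another 9 bytes.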
>
>>
>> >
>> > System call filtering
>> >
>> > The kernel image contains the following data structure:
>> >
>> > enum Filter { AllowAll, AllowOnly(&[&[u8]]) }
>> > static ACLs: &[(driver: usize, Filter)] = ...;
>> >
>> >
>> > When a process makes a system call, the system call filter scans through ACLs searching for the driver number. If it does not find the driver number, it denies the request. If it finds the driver number, along with Filter::AllowAll, it allows the syscall. If it finds the driver number along with Filter::AllowOnly, it scans through the list inside AllowOnly checking for the app ID. If it finds the app ID, the syscall is allowed, otherwise it denies the syscall.
>>
>> The same question. I like this list, but I don't understand where it
>> comes from. Does it have to be hard coded in the kernel?
>
>
> Yes, it would be hardcoded in the kernel.
So now apps can not be updated independently from the kernel. That
seems like a large downside.
>
>>
>> I also don't see why this wouldn't work with the method I mentioned.
>>
>> For syscall filtering for example. Each TBF header lists the syscalls
>> the app is allowed to do. The kernel can then use this list exactly as
>> you mention above. We now have an easy way for apps to specify
>> permissions. Although note, this is not secure enough for OpenTitan.
>>
>> For OpenTitan and other secure use cases the kernel can also have its
>> own list, exactly like you mention above. Then the kernel can compare
>> the app TBF header permissions with the one it already has. This
>> provides the same security you are mentioning here, but also allows
>> boards/users who don't need the extra complexity to just use apps.
>
>
> That doesn't work if certification constraints prevent you from deploying a new kernel for each version of your app (how does the kernel know which app corresponds to each entry in its internal permissions list?).
I don't understand. What certification constraints allow you to
hardcode the list in the kernel (like described above), but not check
that list against an app?
The kernel would know exactly the same way you describe above, using the appID.
For example a CTAP app is exposed over USB and has access to sensitive
secrets. What if a compromise over USB allows the app to change the
permissions? Then a malicious app can read the secrets.
Actually, on top of that, what if a malicious attacker just changes
the app code to allow more permissions in the offline flash? I'm
assuming apps will be signed, which should prevent that, but at which
point why not move it to the TBF header so it isn't run-time changeable?
>
>>
>> >
>> >>
>> >> So for each piece of data we store an owner, name and list of users
>> >> with access? That seems like a lot of overhead for a small flash
>> >> storage.
>> >
>> >
>> > Yes.
>>
>> That is a large amount of overhead. Assuming a 64-bit appID, 64-bit
>> name and 1 other user we are at around 25 bytes of overhead for every
>> object. Not including the key (for a KV store), CRCs or anything else.
>>
>> >
>> >>
>> >> >
>> >> > System call filtering
>> >> >
>> >> > The kernel image contains the following data structure:
>> >> >
>> >> > enum Filter { AllowAll, AllowOnly(&[&[u8]]) }
>> >> > static ACLs: &[(driver: usize, Filter)] = ...;
>> >> >
>> >> >
>> >> > When a process makes a system call, the system call filter scans through ACLs searching for the driver number. If it does not find the driver number, it denies the request. If it finds the driver number, along with Filter::AllowAll, it allows the syscall. If it finds the driver number along with Filter::AllowOnly, it scans through the list inside AllowOnly checking for the app ID. If it finds the app ID, the syscall is allowed, otherwise it denies the syscall.
>> >>
>> >> The same question. I like this list, but I don't understand where it
>> >> comes from. Does it have to be hard coded in the kernel?
>> >
>> >
>> > Yes, it would be hardcoded in the kernel.
>>
>> So now apps can not be updated independently from the kernel. That
>> seems like a large downside.
>
>
> The system call filtering use case only makes sense if some apps (that the kernel knows about up front) are privileged relative to other apps (which the kernel may not know about up front).
>
> Even if the app list is hardcoded in the kernel, privileged apps can be updated independently from the kernel as long as their app ID remains the same.
True, but also only if their syscalls don't change.
>
> It seems to me implementing that isolation in a storage layer requires the storage layer to understand application IDs. The process-based isolation that we've used so far does not work for storage because storage must enforce isolation across reboots. I think any definition of a "storage ID" that attempts to provide that isolation would in fact be an application ID in disguise.
I agree that they end up being similar. The only possible differences
between a "storage ID" and application ID would be uniqueness
constraints, lengths and the ability to change owners. A storage ID
could change (to change the app that owns the data) while maintaining
the same appID.
I don't think that's very important though, as long as we have the
ability to provide permissions to others I think we are mostly ok
here.
I'm guessing there will be a translation between the full appID and a
short ID used inside the board?