Re: PKEY-support in PartitionAlloc

40 views
Skip to first unread message

Kentaro Hara

unread,
Sep 7, 2022, 11:48:06 AM9/7/22
to Stephen Röttger, memory-s...@chromium.org, Kostya Serebryany, platform-arc...@chromium.org, Igor Sheludko, Sergei Glazunov, V8 Security, Bartek Nowierski, Benoit Lize, Takashi Sakamoto

I strongly support this proposal.

At the moment, it's not clear to me how widely pkey will be available in the next few years and I was wondering if we should support pkey in PA now or we should revisit in the future. If there is a use case in V8 and you are willing to implement it, why not? :D I believe that the per-partition pkey support can be implemented without adding code complexity.

A few questions:

* reads from unprotected memory (pkey 0) should be rare and need to be handled as untrusted data

Do you really need to allow read from pkey 0? Will the code refactoring in PA make it unnecessary?

Why are you adding the pkey support to PA instead of V8's allocator if your intended use case is a V8 internal thing? I'm just curious -- technically it's possible to export PA's pkey supported partition to V8.


On Thu, Sep 8, 2022 at 12:00 AM Stephen Röttger <sroe...@google.com> wrote:
Hi,

tl;dr: would it make sense to add support for a pkey-protected partition to PA?

Some background:
We're working on CFI for v8 (go/v8-cfi) and one of the bigger challenges is making sure that our compilers don't generate code that leads to arbitrary control-flow even assuming that the attacker has an arbitrary read/write primitive in the renderer address space.
We're planning to add a verification step to the compiler that performs certain checks on the generated code before copying it into executable memory. This will include for example that the code can't include syscall instructions and that we can track the control flow. We explicitly don't care if the code will corrupt memory since we assume that the attacker can already do this.
We want to do this verification step in-process with the Intel Memory Protection Keys. I.e. we will create an isolated mode that you can switch in/out of where all data is protected with a pkey:
* a pkey-protected stack
* some pkey-protected globals
* a pkey-protected heap (or heaps)

When we're in this isolated mode, we have some stricter requirements about what the code should do.
* reads from unprotected memory (pkey 0) should be rare and need to be handled as untrusted data
  * in particular, we can't write to a pointer coming from the unprotected memory

So as mentioned above, we need a pkey-protected heap and I'm trying to figure out if it makes sense to add pkey-support to PA.
I imagine it could look like this:
* a separate PartitionAllocator with a new PartitionOption to control pkey support
* when changing page permissions, also set the requested pkey
* all allocator metadata should also live in pkey-protected memory

I implemented a quick proof-of-concept of this in crrev.com/c/3879104.

With some hacks, I can completely remove access to pkey 0 in that example and do an Alloc() and a Free() call.
Some issues I ran into:
* there is global state that is shared between PartitionAllocators and that I had to pkey_mprotect:
  * sentinel_slot_span_ (this sounds like it could be const / read-only?)
* PartitionAlloc calls out to external libraries, at least in debug mode. There's no guarantee that the code won't touch pkey 0 memory.
  * DebugMemset calls memset which on my machine reads from global variables (this case seems benign for our use case, but will break if disabling read access to pkey 0)
  * Lock debug code calls pthread_self() which reads from the TLS.

So to make this work, it would require:
* moving the required global state into the partition (or making it read-only after initialization)
* avoiding calls to shared libraries when in pkey-mode (in theory we could allow some benign-looking calls, but then we have no guarantee that they don't introduce CFI bypasses)

What do you think? Does this sound like a feature that PartitionAlloc could support?

Cheers,
Stephen


--
Kentaro Hara, Tokyo

Dave Tapuska

unread,
Sep 7, 2022, 12:03:22 PM9/7/22
to Kentaro Hara, Stephen Röttger, memory-s...@chromium.org, Kostya Serebryany, platform-arc...@chromium.org, Igor Sheludko, Sergei Glazunov, V8 Security, Bartek Nowierski, Benoit Lize, Takashi Sakamoto
Are pkeys already used in any code in the renderer already? (I see V8 does use them) Just curious how many pkeys are left to be allocated, could the "one per-partition" be problematic?

dave.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CABg10jwi9h6MZbfcSsfm1Y3Hewp%3D2Uzp2%2B9AWZmwh1rENB6v6Q%40mail.gmail.com.

Stephen Röttger

unread,
Sep 8, 2022, 3:53:58 AM9/8/22
to Dave Tapuska, Kentaro Hara, memory-s...@chromium.org, Kostya Serebryany, platform-arc...@chromium.org, Igor Sheludko, Sergei Glazunov, V8 Security, Bartek Nowierski, Benoit Lize, Takashi Sakamoto, Michael Lippautz
On Wed, Sep 7, 2022 at 6:03 PM Dave Tapuska <dtap...@chromium.org> wrote:
Are pkeys already used in any code in the renderer already? (I see V8 does use them) Just curious how many pkeys are left to be allocated, could the "one per-partition" be problematic?

We only use pkeys in v8 so far for protecting executable memory and AssemblerBuffers. PA wouldn't use a separate pkey per partition, but rather have the caller specify which pkey to use at partition initialization.
 

dave.

On Wed, Sep 7, 2022 at 11:48 AM 'Kentaro Hara' via platform-architecture-dev <platform-arc...@chromium.org> wrote:

I strongly support this proposal.

At the moment, it's not clear to me how widely pkey will be available in the next few years and I was wondering if we should support pkey in PA now or we should revisit in the future. If there is a use case in V8 and you are willing to implement it, why not? :D I believe that the per-partition pkey support can be implemented without adding code complexity.

A few questions:

* reads from unprotected memory (pkey 0) should be rare and need to be handled as untrusted data

Do you really need to allow read from pkey 0? Will the code refactoring in PA make it unnecessary?

If the code is hermetic then you can make it work without pkey 0 read access. Though note that this also includes .rodata. In the demo, I'm iterating over all the elf program headers and am tagging the RO segments with a pkey for that reason.

Why are you adding the pkey support to PA instead of V8's allocator if your intended use case is a V8 internal thing? I'm just curious -- technically it's possible to export PA's pkey supported partition to V8.

We'll need both in the end. A pkey PagedSpace for js heap objects (e.g. code objects) and a pkey system allocator for various other objects we need for the verification. The PagedSpace also keeps some metadata on the heap, so we can also use pkey-PA for those.

Does it sound feasible to you to move the global state into the PartitionAllocator or do you think that could lead to performance or other issues?
Do you have a perf benchmark I can use to test any changes?

I saw that Bartek is OOO, I'll wait for his return before I continue experimenting with this.

Kostya Serebryany

unread,
Sep 9, 2022, 7:37:21 PM9/9/22
to Stephen Röttger, Dave Tapuska, Kentaro Hara, memory-s...@chromium.org, platform-arc...@chromium.org, Igor Sheludko, Sergei Glazunov, V8 Security, Bartek Nowierski, Benoit Lize, Takashi Sakamoto, Michael Lippautz
In addition to everyone's reasons, I support the idea of 
pkey-based domains in PA for one more reason: 
we need to ensure that similar future hardware features work for us, before it's too late to change them. 
 * Arm's security overlays (similar to pkeys)
 * go/va-thread-keys, which we are about to propose to RISC-V. 

Bartek Nowierski

unread,
Sep 20, 2022, 10:04:14 PM9/20/22
to Kostya Serebryany, Stephen Röttger, Dave Tapuska, Kentaro Hara, memory-s...@chromium.org, platform-architecture-dev, Igor Sheludko, Sergei Glazunov, V8 Security, Benoit Lize, Takashi Sakamoto, Michael Lippautz
Sorry for the late reply...

I implemented a quick proof-of-concept of this in crrev.com/c/3879104.

With some hacks, I can completely remove access to pkey 0 in that example and do an Alloc() and a Free() call.
Some issues I ran into:
* there is global state that is shared between PartitionAllocators and that I had to pkey_mprotect:
Sorry, I don't see you doing this in your CL. Anyway, am I reading it correctly that this is merely meant as a hack and we need a proper solution to split the below global state per-partition?
 
  * sentinel_slot_span_ (this sounds like it could be const / read-only?)
Indeed read-only, and should be able to easily make it per-partition anyway.
These for tracking virtual address pool usage. Multiple partitions can allocate from the same pool. I don't see what we could do about this on 32-bit systems (are pkeys even available there?), but on 64-bit systems we could create a separate pool just for this. Note, that we can't pre-reserve an arbitrary amount of memory on Android so that may not fly there, unless the pool is meant to be small?

Note also that by switching pools you'll lose BRP protection. If you want to keep it, we'd have to add an additional check to IsManagedByPartitionAllocBRPPool() that is on a very hot path.

 
  * reservation_offset_tables_
On 64-bit, these are per pool, so if we split the pools the problem will go away. Otherwise, we could make a per-root copy... it's 16kB.
 
* PartitionAlloc calls out to external libraries, at least in debug mode. There's no guarantee that the code won't touch pkey 0 memory.
  * DebugMemset calls memset which on my machine reads from global variables (this case seems benign for our use case, but will break if disabling read access to pkey 0)
  * Lock debug code calls pthread_self() which reads from the TLS.
Not sure what to do about those.

Your mention of TLS reminded me you'll be losing thread-cache support for this partition, which helps performance. Are these allocations currently handled by malloc/new? If not, then nothing to worry about.


There is also a PartitionRoot object itself. You'd have to make sure to allocate it from a pkey-protected memory, so I guess get your own page and use placement `new`.


Cheers,
Bartek

Stephen Röttger

unread,
Sep 21, 2022, 12:21:53 PM9/21/22
to Bartek Nowierski, Kostya Serebryany, Dave Tapuska, Kentaro Hara, memory-s...@chromium.org, platform-architecture-dev, Igor Sheludko, Sergei Glazunov, V8 Security, Benoit Lize, Takashi Sakamoto, Michael Lippautz
On Wed, Sep 21, 2022 at 4:04 AM Bartek Nowierski <bar...@google.com> wrote:
Sorry for the late reply...

I implemented a quick proof-of-concept of this in crrev.com/c/3879104.

With some hacks, I can completely remove access to pkey 0 in that example and do an Alloc() and a Free() call.
Some issues I ran into:
* there is global state that is shared between PartitionAllocators and that I had to pkey_mprotect:
Sorry, I don't see you doing this in your CL. Anyway, am I reading it correctly that this is merely meant as a hack and we need a proper solution to split the below global state per-partition?

That's happening in the PkeyProtectPartitionAllocMemory() in the papku.cc. But yes, it's just a hack. If it's part of the partition state then you'd just have the PartitionAllocator object living in a pkey-protected section (like the IsolatedGlobals at the top of that file).
 
 
  * sentinel_slot_span_ (this sounds like it could be const / read-only?)
Indeed read-only, and should be able to easily make it per-partition anyway.
 
  * AddressPoolManager::singleton_
  * PartitionAddressSpace::setup_
These for tracking virtual address pool usage. Multiple partitions can allocate from the same pool. I don't see what we could do about this on 32-bit systems (are pkeys even available there?), but on 64-bit systems we could create a separate pool just for this. Note, that we can't pre-reserve an arbitrary amount of memory on Android so that may not fly there, unless the pool is meant to be small?

I don't know of any 32 bit devices with pkey support and I found this lwn article from 2015 that said "it will only be available in future 64-bit Intel processors". Similarly, since ARM doesn't support pkeys we probably don't have to worry about Android atm?

Note also that by switching pools you'll lose BRP protection. If you want to keep it, we'd have to add an additional check to IsManagedByPartitionAllocBRPPool() that is on a very hot path.

Do you mean losing BRP protection for the pkey partition or for all partitions?

  * reservation_offset_tables_
On 64-bit, these are per pool, so if we split the pools the problem will go away. Otherwise, we could make a per-root copy... it's 16kB.
 
* PartitionAlloc calls out to external libraries, at least in debug mode. There's no guarantee that the code won't touch pkey 0 memory.
  * DebugMemset calls memset which on my machine reads from global variables (this case seems benign for our use case, but will break if disabling read access to pkey 0)
  * Lock debug code calls pthread_self() which reads from the TLS.
Not sure what to do about those.

It seems ok if these calls are limited to debug mode. We could just skip the calls for pkey-partitions.
But I imagine this could introduce some headaches on your side since this requirement for the code to be hermetic is quite unusual. So whenever you introduce a new call to external code in the future, it would need to be guarded by an if (!pku_partition).

Your mention of TLS reminded me you'll be losing thread-cache support for this partition, which helps performance. Are these allocations currently handled by malloc/new? If not, then nothing to worry about.
Yeah, these allocations shouldn't be performance sensitive since changing the pkru state is already not super cheap.

There is also a PartitionRoot object itself. You'd have to make sure to allocate it from a pkey-protected memory, so I guess get your own page and use placement `new`.

That's just embedded in the PartitionAllocator class right (this)? In my demo, I put it in a `struct alignas(PAGE_SZ) IsolatedGlobals { }` which I pkey-tag after initialization.

Stephen Röttger

unread,
Sep 23, 2022, 4:57:48 AM9/23/22
to Bartek Nowierski, Kostya Serebryany, Dave Tapuska, Kentaro Hara, memory-s...@chromium.org, platform-architecture-dev, Igor Sheludko, Sergei Glazunov, V8 Security, Benoit Lize, Takashi Sakamoto, Michael Lippautz
I prototyped these changes in https://chromium-review.googlesource.com/c/chromium/src/+/3913506 and it seems to work.
I.e. that includes the following:
* move reservation_offset_table_ inside the pool
* introduce a new pkey pool
* pad pools_ array so that the last element (pkey pool) lands on a new page
  => this is not very elegant but saves having conditional checks in the code. Not sure if this is a hot path?
* pkey_mprotect all metadata used by the pkey pool (need to page align some things)
* make the GigaCageSetup data read-only after initialization
* add a debug-only LiftRestrictionScope object that can be used to temporarily remove all pkey restrictions (used in memset and lock debug code)
* make sentinel_slot_span_ const
* disable -fstack-protector
  => the alternative would be having a part of the TLS be tagged with a pkey. E.g. the stack cookie and other read-only data could live on a separate page in the TLS. But this would require compiler support.

What do you think about this? If it makes sense to you, I can clean it up, write tests etc.
Do you have a benchmark I can run to see if the memory layout changes introduce any performance regressions?

Kentaro Hara

unread,
Sep 23, 2022, 7:54:17 AM9/23/22
to Stephen Röttger, Bartek Nowierski, Kostya Serebryany, Dave Tapuska, memory-s...@chromium.org, platform-architecture-dev, Igor Sheludko, Sergei Glazunov, V8 Security, Benoit Lize, Takashi Sakamoto, Michael Lippautz
I'd suggest you run blink_perf.bindings.* and Speedometer2.

--
Kentaro Hara, Tokyo

Dave Hansen

unread,
Sep 23, 2022, 12:30:19 PM9/23/22
to memory-safety-dev, Kentaro Hara, Bartek Nowierski, Kostya Serebryany, Dave Tapuska, memory-s...@chromium.org, platform-architecture-dev, Igor Sheludko, glaz...@google.com, V8 Security, Benoit Lize, ta...@google.com, Michael Lippautz, sroe...@google.com
Hi Folks,

I don't know of any Intel CPUs that are 32-bit only and support PKU.  Second, the two Intel 32-bit paging modes ("32-bit Paging" and "PAE Paging") don't even have space defined in their page tables to store pkeys.  Third, the Linux support is 64-bit only (depends on X86_64 at compile-time).

I think it's pretty safe to assume that pkeys are a 64-bit-only thing on x86.  It's _possible_ that AMD did something different with their CPUs, but I really doubt it.

Dave Hansen

unread,
Sep 23, 2022, 1:22:11 PM9/23/22
to memory-safety-dev, Dave Hansen, Kentaro Hara, Bartek Nowierski, Kostya Serebryany, Dave Tapuska, memory-s...@chromium.org, platform-architecture-dev, Igor Sheludko, glaz...@google.com, V8 Security, Benoit Lize, ta...@google.com, Michael Lippautz, sroe...@google.com
Oh, and one other thing.  The 64-bit-only comment is only about what *kernel* is running.  It's perfectly supported to run a 32-bit userspace application on a 64-bit kernel and use pkeys.  The kernel even has a selftest/ for both 32-bit and 64-bit binaries to make sure we don't break either.

Stephen Röttger

unread,
Sep 27, 2022, 7:52:13 AM9/27/22
to Dave Hansen, memory-safety-dev, Kentaro Hara, Bartek Nowierski, Kostya Serebryany, Dave Tapuska, platform-architecture-dev, Igor Sheludko, glaz...@google.com, V8 Security, Benoit Lize, ta...@google.com, Michael Lippautz
Thank you Dave!

On Fri, Sep 23, 2022 at 7:22 PM Dave Hansen <hans...@gmail.com> wrote:
Oh, and one other thing.  The 64-bit-only comment is only about what *kernel* is running.  It's perfectly supported to run a 32-bit userspace application on a 64-bit kernel and use pkeys.  The kernel even has a selftest/ for both 32-bit and 64-bit binaries to make sure we don't break either.

On Friday, September 23, 2022 at 9:30:19 AM UTC-7 Dave Hansen wrote:
Hi Folks,

I don't know of any Intel CPUs that are 32-bit only and support PKU.  Second, the two Intel 32-bit paging modes ("32-bit Paging" and "PAE Paging") don't even have space defined in their page tables to store pkeys.  Third, the Linux support is 64-bit only (depends on X86_64 at compile-time).

I think it's pretty safe to assume that pkeys are a 64-bit-only thing on x86.  It's _possible_ that AMD did something different with their CPUs, but I really doubt it.

On Friday, September 23, 2022 at 4:54:17 AM UTC-7 Kentaro Hara wrote:
I'd suggest you run blink_perf.bindings.* and Speedometer2.

Reply all
Reply to author
Forward
0 new messages