On Wed, Oct 5, 2016 at 8:33 AM, Andrew Lutomirski <aml...@gmail.com> wrote:
> CPU B has a decent chance of getting an exception, but suppose it gets lucky
> and waits long enough before reading the page that it doesn't get an
> exception. What does it see? What TLB entry is created?
I didn't mention this specifically, but I already said you need page
table population IPIs, and if you're doing that you might as well also
have population IPIs for the PGD, PMD, and PUD levels, which handles
this case. Yucky, but with the "RAW dependencies" change I've mentioned
a couple of times today it goes away.
> I think that CPU A is going to need that fence (as a w,w fence) regardless
> of what the final RISC-V memory model says, but there are no relevant fence
> instructions or explicit dependencies at all on CPU B to synchronize with
> it. Unless I'm missing something (and I have no idea how Alpha deals with
> it), there are only two ways to make this safe. Either CPU A needs to send
There's a comment in the kernel source which implies that the TLB fill
PALcode on Alpha contains the necessary acquire fences.
> an IPI before writing a PTE (please, please don't do that. TLB flush IPIs
> are bad enough (and RISC-V should consider following ARM64's lead and adding
> a non-IPI way to handle this), but TLB population IPIs would be far worse),
> or the CPU should implicitly order all accesses involved in TLB fills. That
> is, reads of higher-level paging structures should be ordered before reads
> of lower-level paging structures, and reads of the final PTE should be
> ordered before any reads or writes that use the TLB entry that gets filled.
The page table walk is an address dependency chain, so it falls out of
the address dependency change. It would be good to be explicit about
that in the documentation, of course.
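
To make that concrete, here's a rough C sketch of a Sv39-style walk
(field layout simplified, physical memory pretended to be directly
addressable, leaf/non-leaf and permission checks omitted); the point is
just that each level's load address is computed from the value returned
by the previous load, so honoring address dependencies orders the whole
chain:

typedef unsigned long pte_t;

static inline unsigned long pte_ppn(pte_t pte)
{
        return pte >> 10;       /* PPN field of a RISC-V PTE */
}

/* Walk the three levels of a Sv39 table for virtual address va. */
pte_t walk(unsigned long root_ppn, unsigned long va)
{
        pte_t *l2 = (pte_t *)(root_ppn << 12);
        pte_t e2 = l2[(va >> 30) & 0x1ff];      /* load #1 */

        pte_t *l1 = (pte_t *)(pte_ppn(e2) << 12);
        pte_t e1 = l1[(va >> 21) & 0x1ff];      /* address depends on load #1 */

        pte_t *l0 = (pte_t *)(pte_ppn(e1) << 12);
        return l0[(va >> 12) & 0x1ff];          /* address depends on load #2 */
}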
> I don't think there's any need to order the first read involved in the TLB
> fill with anything prior -- even x86 permits fully speculative TLB fills,
> and I don't think it's ever caused a problem. *
Here's one way it could break: a single-threaded process on hart A
maps a file, unmaps it and maps something else (Linux elides TLB
shootdown on munmap for single-threaded processes, right?), gets put
to sleep and then migrated to another hart B. If hart B is allowed to
speculatively fill TLBs with no constraints whatsoever, then hart B
might have a stale copy of the process' first mapping, despite never
having seen the SPTBR value before. I think this might be addressable
by forcing SFENCE.VM the first time a given hart schedules a given
process, though.
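
Something along these lines, say; mm_cpumask() is the existing per-mm
mask of CPUs in Linux, but the hook itself and where it would be called
from are just illustrative:

#include <linux/cpumask.h>
#include <linux/mm_types.h>

/* Hypothetical hook run on hart `cpu` when it is about to switch to
 * address space `next`. */
static inline void first_run_sfence(struct mm_struct *next, unsigned int cpu)
{
        if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
                cpumask_set_cpu(cpu, mm_cpumask(next));
                /* This hart has never run this mm: throw away anything
                 * it may have speculatively cached for this SPTBR. */
                asm volatile ("sfence.vm" ::: "memory");
        }
}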
Speculative instruction caching is potentially worse; a programmer
might assume you can allocate a zeroed page, write instructions into
it, and jump to it, but the processor is allowed to speculatively cache
the zeros in the page and raise an illegal-instruction trap on the jump
if you don't include a FENCE.I. You can't even fix this up in a SIGILL
handler, because you might be using the C extension; misaligned 32-bit
instructions are not fetched atomically, so you might wind up with
_half_ of an instruction speculatively cached as zero, and the result
isn't always an invalid encoding.
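
To be concrete about the basic sequence, here's a minimal user-space
sketch (single hart assumed, error handling mostly omitted); without
the FENCE.I the jump is allowed to see the stale zeros:

#include <stdint.h>
#include <sys/mman.h>

int main(void)
{
        uint32_t *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
                return 1;

        page[0] = 0x00008067;   /* ret (jalr x0, 0(ra)) */

        /* Without this, the jump may execute whatever the hart
         * speculatively fetched from the page before the store. */
        asm volatile ("fence.i" ::: "memory");

        ((void (*)(void))page)();
        return 0;
}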
Also, the kernel needs to do a global FENCE.I shootdown when it zeros
a page, just in case the page's previous owner wrote sensitive
instructions in it. On Rocket, FENCE.I is implemented by throwing out
the entire 32 KB first-level I-cache; I think that could get expensive
if every hart does it once per clear_page, but I don't feel my
intuition is reliable here.
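
Roughly the pattern I have in mind, just to make the cost concrete
(on_each_cpu() is the stock Linux cross-call helper; the rest is
deliberately naive):

#include <linux/smp.h>
#include <asm/page.h>

static void ipi_fence_i(void *unused)
{
        asm volatile ("fence.i" ::: "memory");
}

static void clear_page_shootdown(void *page)
{
        clear_page(page);
        /* Every hart discards its I-cache; on Rocket that's the whole
         * 32 KB L1I, once per cleared page. */
        on_each_cpu(ipi_fence_i, NULL, 1);
}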
-s