I was going through section 12.3 (C.LOAD / C.STORE) of the V2.3-Draft
riscv-isa-manual in order to work out how to re-map RVV onto the
standard ISA, and came across some interesting comments at the bottom
of pages 75 and 76:
" A common mechanism used in other ISAs to further reduce
save/restore code size is load-
multiple and store-multiple instructions. "
Fascinatingly, due to Simple-V proposing to use the *standard*
register file, both C.LOAD / C.STORE *and* LOAD / STORE would in
effect be exactly that: load-multiple and store-multiple instructions.
Which brings us on to this comment:
"For virtual memory systems, some data accesses could be resident in
physical memory and
some could not, which requires a new restart mechanism for partially
executed instructions."
Which then of course brings us to the interesting question: how does
RVV cope with the scenario where, particularly with LD.X (indexed /
indirect loads), a page fault occurs part-way through the load?
Has this been noted or discussed before?
Is RVV designed in any way to be re-entrant?
What would the implications be for instructions that were in a FIFO at
the time, in out-of-order and VLIW implementations, where partial
decode had taken place?
Would it be reasonable at least to *bypass* (and freeze) the
instruction FIFO (dropping down to a single-issue execution model
temporarily) for the purposes of executing the instructions in the
interrupt handler (whilst setting up the VM page), then resume the
original instruction with all state intact?
Or would it be better to switch to an entirely separate secondary
hyperthread context?
Does anyone have any ideas or know if there is any academic literature
on solutions to this problem?
Greatly appreciated,
l.
hi andrew, great reply! really informative and helpful, thank you.
need time to absorb and research (and work out implications) e.g. one
must be that it implies that RVV has a means of restarting vector
operations from a specific index.
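To make "restarting from a specific index" concrete, here is a minimal
sketch of the element-level semantics, assuming a hypothetical
resume-index register (called "vstart" below purely for illustration)
that survives the trap; none of the names are taken from the RVV draft:

    /* Hypothetical element-wise semantics for an indexed load (LD.X):
     * a resume-index register records which element faulted, so that
     * re-executing the instruction after the page has been mapped in
     * starts from that element instead of from element 0.
     * addrs[i] = base + index[i], precomputed here for brevity.
     */
    #include <stdint.h>
    #include <stddef.h>

    extern int      translate(uintptr_t va, uintptr_t *pa); /* 0 = ok, -1 = would fault */
    extern uint64_t load64(uintptr_t pa);

    static size_t vstart;   /* hypothetical resume-index register */

    /* returns 0 on completion, -1 if a page fault must be taken */
    int ldx_indexed(uint64_t *vd, const uintptr_t *addrs, size_t vl)
    {
        for (size_t i = vstart; i < vl; i++) {
            uintptr_t pa;
            if (translate(addrs[i], &pa) != 0) {
                vstart = i;          /* remember where to resume        */
                return -1;           /* trap; earlier elements survive  */
            }
            vd[i] = load64(pa);
        }
        vstart = 0;                  /* instruction completed           */
        return 0;
    }

The only point being illustrated is that the faulting element index is
architectural state that survives the trap, so elements 0..i-1 are not
redone.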
On Wednesday, April 18, 2018 at 5:20:54 AM UTC+1, lk...@lkcl.net wrote:
> hi andrew, great reply! really informative and helpful, thank you.
> need time to absorb and research (and work out implications) e.g. one
> must be that it implies that RVV has a means of restarting vector
> operations from a specific index.

oh! I just had a thought.  Started reading section 4.6 of Krste's
thesis, noted the "IEEE 754 F.P. exceptions" and thought, "hmmm, that
could go into a CSR, must re-read the section on FP state CSRs in RVV
0.4-Draft again", then i suddenly thought, "ah ha! what if the memory
exceptions, instead of causing an immediate exception to be thrown,
were simply stored in a type of predication bit-field, with a
per-element flag saying "this element failed"?"

Then, *after* the vector load (or store, or even operation) was
performed, you could *then* raise an exception, at which point it
would be possible (yes, in software... I know...) to go "hmmm, these
indexed operations didn't work, let's get them into memory by
triggering page-loads", then *re-run the entire instruction* but this
time with a "memory-predication CSR" that stops the already-performed
operations (whether they be loads, stores or an arithmetic / FP
operation) from being carried out a second time.
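To pin the idea down, here is a rough sketch of that
"memory-predication CSR" scheme; the bit-field names and helper
functions are entirely made up for illustration (assume VL <= 64 so a
single 64-bit mask suffices):

    /* Sketch of deferred memory faults: instead of trapping on the first
     * faulting element, record every faulting element in a bit-field,
     * complete the non-faulting elements, and raise one exception at the
     * end.  On re-execution, elements already marked done are skipped.
     */
    #include <stdint.h>
    #include <stddef.h>

    extern int      translate(uintptr_t va, uintptr_t *pa); /* 0 = ok, -1 = would fault */
    extern uint64_t load64(uintptr_t pa);

    static uint64_t mem_fault_mask;  /* hypothetical CSR: 1 = element faulted   */
    static uint64_t done_mask;       /* hypothetical CSR: 1 = element completed */

    /* returns 0 if everything completed, -1 if the deferred fault is raised */
    int ldx_deferred(uint64_t *vd, const uintptr_t *addrs, size_t vl)
    {
        mem_fault_mask = 0;
        for (size_t i = 0; i < vl; i++) {
            if (done_mask & (1ull << i))
                continue;                        /* done on a previous pass    */
            uintptr_t pa;
            if (translate(addrs[i], &pa) != 0) {
                mem_fault_mask |= 1ull << i;     /* remember the fault, go on  */
                continue;
            }
            vd[i] = load64(pa);
            done_mask |= 1ull << i;
        }
        return mem_fault_mask ? -1 : 0;          /* one trap for the whole lot */
    }

The supervisor can then page in every address whose bit is set in
mem_fault_mask in one go, and the re-run skips everything already in
done_mask, so no load (or store, or FP operation) is performed twice.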
Right now, with only one element being executed at a time, it
basically means that several successive page-faults would occur, and
that's clearly far more expensive, particularly on LD.X, which is the
sparse case that, if
https://en.wikipedia.org/wiki/Translation_lookaside_buffer is to be
believed, could see a huge 20-40% TLB miss rate.  Being able to issue
*multiple* page-loads before continuing the instruction would seem to
me to be infinitely preferable.
There's also potentially another reason which makes the case for
allowing more than one vector element operation to be executed at the
same time, and that's when Vector takes over from SIMD.
SIMD's "big advantage" is that the parallelism is in the ALU.  Xtensa
was bought out right around the time they developed massively-wide
DSP-SIMD instructions for low-power LTE handsets: by trading width for
clock rate (width doubles while power reduces on a square-law), their
implementation was much more power-efficient than anything else on the
market at the time.
Fitting the SIMD paradigm into the Vector one therefore requires two
steps (a rough sketch follows the list):
(1) splitting the (128-bit? 256-bit? 512-bit?) SIMD register down
into smaller blocks (possibly as far as 16-bit if that's the base size
of the SIMD element) and then
(2) permitting the core to issue *multiple* elements as actual
hardware-parallel operations to get back to the same level of
throughput *as* the SIMD operation being replaced, before it was
"split".
Now if that's *not possible*, because RVV is *designed* to permit one
and only one element to be executed at once, the opportunity is
definitely lost for Vector to be recommended over SIMD (with the
subsequent O(N^5) opcode proliferation that will result).  Yes, really
O(N^5): I did the analysis a couple of days ago!  No wonder SIMD
multimedia architectures end up with thousands of unique opcodes.
On Wed, Apr 18, 2018 at 1:28 AM, Luke Kenneth Casson Leighton <lk...@lkcl.net> wrote:
> Best case you might find (particularly with sequential loads) that one
> page-fault covers them all.  If it turns out to be several page-faults
> that's great, because it's then possible to issue multiple page-loads
> at once, then *wait* for them all and only then return from the
> Exception to repeat the instruction.

In either model, the OS has the flexibility to examine the
architectural state and decide to fault in multiple pages at once.
But the absolute perf. is so low if you're faulting all the time that
this probably doesn't matter too much.
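As a purely illustrative sketch of "examine the architectural state
and fault in multiple pages at once" for an indexed load: the handler
reads the faulting instruction, recovers the base register and index
vector from the saved hart state, and maps every page the load will
touch before returning (all helper functions here are hypothetical):

    /* Sketch: on a page fault taken during an indexed vector load, the OS
     * decodes the faulting instruction, reads the base register and the
     * index vector from the saved architectural state, and pages in every
     * page the instruction will touch before returning.
     */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    struct trap_frame;                                            /* saved hart state */

    extern uint32_t  read_insn(const struct trap_frame *tf);      /* fetch at *epc    */
    extern uintptr_t read_base_reg(const struct trap_frame *tf, uint32_t insn);
    extern size_t    read_index_vector(const struct trap_frame *tf, uint32_t insn,
                                       int64_t *idx, size_t max);
    extern bool      page_resident(uintptr_t va);
    extern void      map_page(uintptr_t va);

    void handle_indexed_load_fault(struct trap_frame *tf)
    {
        int64_t   idx[64];
        uint32_t  insn = read_insn(tf);
        uintptr_t base = read_base_reg(tf, insn);
        size_t    vl   = read_index_vector(tf, insn, idx, 64);

        for (size_t i = 0; i < vl; i++) {
            uintptr_t va = base + (uintptr_t)idx[i];
            if (!page_resident(va))
                map_page(va);   /* fault in every page the load needs, in one trap */
        }
        /* return from the exception; the load re-executes with all pages present */
    }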
There's little inherent connection between TLB miss rates (even of that magnitude) and page-fault rates. Vector units optimized for scatter-gather have large TLBs with hardware page-table walkers, so exceptions on these workloads will be exceedingly rare.
This conflates the sequential ISA semantics with the microarchitecture. It is straightforward to build vector machines that can process many elements in parallel, despite the precise-exception model.
l.

p.s. i found this btw (expired patent, filed 1996, just after 1995
when US law changed to "filing" priority date rather than *published*
priority date, so it's *not* a submarine patent):
https://patentimages.storage.googleapis.com/fc/f6/e2/2cbee92fcd8743/US5895501.pdf
It describes three schemes (corresponding to unit, stride and indexed)
where the vector execution gives "hints" to the hardware-TLB RTL,
ultimately generating page-faults (when needed) *in advance* via a
hardware signal line.  kinda neat.
Andrew Waterman wrote:
> In either model, the OS has the flexibility to examine the
> architectural state and decide to fault in multiple pages at once.
> But the absolute perf. is so low if you're faulting all the time that
> this probably doesn't matter too much.
This suggests an interesting possibility to me: an "svbadaddr" "vector
CSR" that stores, on a vector LOAD or STORE that causes a page fault,
the set of faulting addresses.  Each element would be either zero or a
faulting address.
The "svbadaddr" vector would need to be inaccessible
to U-mode to prevent the same problem that length-speculative loads
introduced. On the other hand, a supervisor could work this out by
disassembling the faulting vector-load instruction and extracting the
addresses in software. An "SPROBE.VM" instruction that attempts to load
a page mapping and reports either a physical address or an indication
that a page fault would occur (but does not actually take the nested
page fault trap) might be useful here to avoid walking page tables in
software.
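A rough supervisor-side sketch of how the proposed "svbadaddr" vector
and "SPROBE.VM" might be used together; both are hypothetical, and
read_svbadaddr(), sprobe_vm() and map_page() merely stand in for
whatever the real mechanisms would be:

    /* Sketch of a supervisor page-fault handler using the *proposed* (not
     * existing) "svbadaddr" vector CSR and "SPROBE.VM" probe instruction.
     */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    #define MAX_VL 64

    extern size_t read_svbadaddr(uintptr_t out[MAX_VL]); /* returns element count   */
    extern bool   sprobe_vm(uintptr_t va);               /* true if mapping present */
    extern void   map_page(uintptr_t va);                /* fault the page in       */

    void handle_vector_page_fault(void)
    {
        uintptr_t bad[MAX_VL];
        size_t n = read_svbadaddr(bad);

        for (size_t i = 0; i < n; i++) {
            if (bad[i] == 0)
                continue;            /* zero = that element did not fault       */
            if (sprobe_vm(bad[i]))
                continue;            /* an earlier map_page covered this page   */
            map_page(bad[i]);        /* batch: fault in every needed page       */
        }
        /* return from the trap; the vector load/store re-executes with all
         * pages resident */
    }

Whether this batch processing pays for the extra architectural state
is exactly the question below.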
Are vector page faults expected to be frequent enough for such batch
processing to be worthwhile?
It remains necessary to support this case correctly, but attempting to accelerate it seems like spending cleverness beans in the wrong place.
-- Jacob
If a vector load encounters lots of faults, you're doing something
wrong.  A structure that is large enough to generate multiple faults
probably should be using superpages - and the whole problem is
avoided.  That's how you accelerate this particular problem - no
complicated state machines required.  KISS principle.  Cleverness
beans saved for something more important.

Having said that: you don't want to have to re-execute a vector load
with a single fault from the beginning, as opposed to from where it
faulted.  There is a "probing" version that sets a predicate register
bit on a fault.  That actually sounds backwards; wouldn't you want to
initialize the predicate to all 1s, have a predicated version that
loads only the vector elements with set predicate bits, and have it
clear the bits of the elements that don't fault?  I could be
misunderstanding how that is supposed to work...

In any case, there is still the issue of dealing with page faults in
the non-speculative version (live with re-execution?).
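To pin down the two readings, here is a sketch of both semantics side
by side (entirely illustrative, not taken from any actual proposal;
assume VL <= 64 so a 64-bit predicate suffices):

    /* Two possible fault-reporting semantics for a probing/predicated
     * vector load, sketched side by side.
     */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    extern bool try_load(uintptr_t va, uint64_t *out);   /* false = would fault */

    /* (a) "probe" flavour: predicate starts at 0, a bit is SET when the
     * corresponding element faults.
     */
    uint64_t vload_probe(uint64_t *vd, const uintptr_t *addrs, size_t vl)
    {
        uint64_t fault_mask = 0;
        for (size_t i = 0; i < vl; i++)
            if (!try_load(addrs[i], &vd[i]))
                fault_mask |= 1ull << i;
        return fault_mask;
    }

    /* (b) the variant suggested above: predicate starts all-1s, only
     * elements with a set bit are loaded, and the bit is CLEARED when the
     * load succeeds, so the bits still set afterwards mark the elements
     * that faulted and must be retried.
     */
    uint64_t vload_predicated(uint64_t *vd, const uintptr_t *addrs, size_t vl,
                              uint64_t pred /* initialise to all 1s */)
    {
        for (size_t i = 0; i < vl; i++) {
            if (!(pred & (1ull << i)))
                continue;                /* already loaded on an earlier pass  */
            if (try_load(addrs[i], &vd[i]))
                pred &= ~(1ull << i);    /* success: element no longer pending */
        }
        return pred;                     /* non-zero => faults remain          */
    }

In variant (b) the bits still set after the pass mark exactly the
elements left to retry, so re-running the same instruction with the
updated predicate skips the work already done.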
On Sat, Apr 21, 2018 at 12:10 AM, Andrew Waterman
<wate...@eecs.berkeley.edu> wrote:
>
>
> You're conflating the ISA-level semantics and the microarchitecture.
If you recall, we went over that and established a common frame of
reference in which that conflation, as a possible source of confusion,
was eliminated.  Rather than repeat that (because it took both of us
some time to establish), can we assume the same context?  (reminder
below of the discussion)
> Nothing prevents the vector microarchitecture from executing in an
> arbitrarily pipelined, parallel, or out-of-order fashion.
That would be a good summary of the context I wished to establish.
> It's no different than the scalar case, where precise exceptions also impose
> sequential semantics,
Yes! because Simple-V effectively and directly "remaps" Vector
operations *into* the scalar space.
> yet OOO microarchitectures can still speculatively
> reorder instructions.
... except... when RVV mandates "sequential semantics", would you
agree that it places artificial limitations on what the OOO
microarchitecture can and cannot do?
l.