=== Substantive questions and suggestions ===
1. Is there a reason that one cannot run vsetvl with rs2 = x0 to change
the vector length without setting vtype? I don't have a specific use
case in mind for this, but I was surprised that this wasn't the default
behavior.
3. The spec could call out vmadc/vadc and vmsbc/vsbc as macro-op fusion
candidates (in addition to vmulh/vmul and vrem/vdiv, which are not
suggested as such in the vector spec either).
4. Concerning the interaction between the vector extension and the
hypervisor extension: Are hypervisors expected to emulate vector memory
operations in all MMIO regions? I can understand how vector operations
can accelerate the process of talking to real hardware on a real bus,
but as it stands right now, the RISC-V privileged spec does not define
any PMA that marks a region of memory as being accessible only to
scalar memory operations. Thus if a hypervisor wants to emulate a CLINT
and a guest tries to use a vector-store to blanket the SSIP registers
with ones to send software interrupts to every hart, the hypervisor is
obligated to emulate that vector-store instruction. Maybe a fancy
hypervisor is willing to do all that work, but I can imagine legitimate
use cases where hypervisors don't want to do so for certain regions of
memory. This could be addressed by creating a "scalar-only" or better
yet, an "integer-scalar-only" PMA that allows hypervisors to mark
emulated device I/O regions as being accessible only with the RV32I or
RV64I base instructions. Main memory would not be allowed to use this
PMA. Note that I suggested this possibility in a separate email thread
about the hypervisor spec, but nobody has commented on it.
5. Sections 5.2, 5.3, and 7.8 reserve "encodings" for which:
- The instruction uses a register number that's not divisible by EMUL.
- The source and destination violate overlap constraints.
- The instruction cites a group which would have EMUL > 8.
- The destination is v0 and the instruction is masked, but is not
overwriting the destination with another mask.
- The instruction is a load/store-segment that would utilize vector
registers after v31.
How does this reservation work? For most instructions, EMUL depends
on the value of the vtype CSR at runtime. An assembler cannot determine
at compile time whether "vwadd.vv v2, v3, v4" violates any of these
constraints. (It violates none of them if LMUL = 1, and three of them
if LMUL = 8.) But for some instructions it can be determined, e.g.
"vadd.vv v0, v1, v2, v0.t" always violates the fourth rule, and the
load/store-segment violations are easy to check on the spot because
NFIELDS and the vector number are both written in the instruction.
Is it within the scope of this specification to mandate which reserved
encodings should be rejected by assemblers (and disassemblers)? The
specification suggests that the constraint on load/store-segment
instructions will eventually be relaxed by adding extra registers, so
it seems as though assemblers might as well accept instructions
violating this constraint even though it's the easiest violation to
detect!
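
The LMUL dependence described in item 5 can be sketched in Python. This is only an illustration under one reading of the alignment, EMUL, and overlap rules in Section 5.2; the function name and the violation labels are hypothetical, and it covers only widening .vv instructions such as vwadd.vv (destination EMUL = 2 * LMUL, source EMUL = LMUL).

```python
from fractions import Fraction


def widening_vv_violations(vd, vs2, vs1, lmul):
    """Return the set of rule labels a widening .vv op would violate.

    Hypothetical helper, not part of any assembler. The destination
    group uses EMUL = 2 * LMUL and both sources use EMUL = LMUL.
    """
    lmul = Fraction(lmul)
    dest_emul = 2 * lmul
    violations = set()

    # Rule: no register group may have EMUL > 8.
    if dest_emul > 8:
        violations.add("emul_gt_8")

    # Rule: a group with EMUL > 1 must start at a multiple of EMUL.
    for base, emul in ((vd, dest_emul), (vs2, lmul), (vs1, lmul)):
        if emul > 1 and base % int(emul) != 0:
            violations.add("misaligned")

    def group(base, emul):
        # Fractional EMUL still occupies one whole register.
        return range(base, base + max(1, int(emul)))

    # Rule (as assumed here): a narrower source may overlap a wider
    # destination only if the source EMUL is at least 1 and the source
    # sits in the highest-numbered part of the destination group.
    dest = group(vd, dest_emul)
    for src_base in (vs2, vs1):
        src = group(src_base, lmul)
        overlaps = src.start < dest.stop and dest.start < src.stop
        legal = lmul >= 1 and src.stop == dest.stop
        if overlaps and not legal:
            violations.add("overlap")
    return violations


print(sorted(widening_vv_violations(2, 3, 4, 1)))
print(sorted(widening_vv_violations(2, 3, 4, 8)))
```

For "vwadd.vv v2, v3, v4" the sketch reports no violations at LMUL = 1 and three at LMUL = 8, which is why the check cannot be made from the instruction bits alone.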
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/7a0128b3ec539042%40raines.redjes.us.
I had two more ideas regarding the vector proposal:
15. Define pseudoinstructions vsetvtypei and vsetvtype to clean up
the common case where the vector length isn't changing.
    vsetvli x0, x0, e16, m4, ta, ma   # Old
    vsetvtypei e16, m4, ta, ma        # New
    vsetvl x0, x0, a4                 # Old
    vsetvtype a4                      # New
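
As a rough sketch of how an assembler front end might expand the two proposed pseudoinstructions, assuming they are pure textual aliases: the names vsetvtypei and vsetvtype are the proposal from this message and are not recognized by any existing assembler, and expand_vsetvtype is a hypothetical helper.

```python
def expand_vsetvtype(line):
    """Expand the proposed pseudoinstructions into existing instructions.

    Hypothetical helper: vsetvtypei/vsetvtype are only a proposal here.
    """
    mnemonic, _, operands = line.strip().partition(" ")
    operands = operands.strip()
    if mnemonic == "vsetvtypei":
        # Immediate form: keep vl, set vtype from the listed fields.
        return f"vsetvli x0, x0, {operands}"
    if mnemonic == "vsetvtype":
        # Register form: keep vl, set vtype from a register.
        return f"vsetvl x0, x0, {operands}"
    return line.strip()  # anything else passes through unchanged


print(expand_vsetvtype("vsetvtypei e16, m4, ta, ma"))
print(expand_vsetvtype("vsetvtype a4"))
```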
16. In section 17.1, consider renaming "precise" traps to "restartable"
traps. This terminology corresponds more closely to other documentation
I've seen for other architectures. A "precise" trap should still be one
where the instruction pointed to by the sepc CSR has so far had no
architecturally-visible effects. This change shouldn't require any
changes to other documentation. Section 1.6 of the base spec calls for
"The execution environment determines for each trap whether it is
handled precisely, though the recommendation is to maintain preciseness
where possible"---this language is still good even if we call vector
traps "restartable." Impressively, the privileged spec's only
references to precise traps are for specific cases involving PMP and
PMA violations.
Anthony
Thank you Bruce for posting this on the list.
I couldn't find a substantive point here.
I also have some concerns about the amount of opcode space used relative to utility by some extensions, but NOT for RVV.
Last time I checked RVV is essentially 1.5 major opcodes (out of 32 possible).
This is very economical for the functionality,
especially compared to the 7 major opcodes used for scalar floating point.
Here we agree completely.
Had the XFinX extension been proposed and implemented first, it would not have consumed 7 major opcodes.
The 4 Multiply-ADD variants, each consuming a major opcode, would not exist.
Rather, it would have been obvious that a destructive form coupled with the C.mv would create a superior solution.
I am not suggesting I saw this at the time. I did not, but I wish I had.
FinX and DinX would then be contained in a single major opcode.
When scalar float implemented in 32 hardware float registers was
later introduced, I expect it would have used a strategy like RVV's,
which reduces the load/store offsets. It would not have consumed 7
major opcodes; rather, I expect it would have been incorporated into
OP-FP [which probably would have been called just FP].
This is a prime example of unnecessary opcode space over-allocation because we sought to provide standard industry features in the standard industry way.
We should learn from our mistakes, rather than use them as justification to continue to be wasteful, short-sighted and entrenched in old ideas and ways.
We have the potential to be so much more.
Regarding the Vector extensions, I'm hoping that ds2horner's technical points were addressed at length in the deliberations of the TG that produced the vector extension proposal, and that there is a record of those deliberations. Could someone tell me where to find that record?
https://github.com/riscv/riscv-v-spec/issues is the current location for discussion and resolution of Vector issues.
It is a very good starting place.
Notes on all the Task Group deliberations are publicly available.
Typically they only summarize the decisions recorded in GitHub
and are not explanatory.
The location of the TG notes has changed over the years.
I would like to see the links for each of the TGs, past and
present, reflected in the members' wiki and/or in the directory
documents housed on Google Drive:
I'm a relative newcomer to RISC-V, and one of the reasons I wanted to participate was that I saw a design that was doing so many things right. However, I have to say that I too feel uneasy at the amount of opcode space taken up by the fused multiply-add instructions. I see in the spec that giving up static rounding modes was considered and rejected, as was forcing rd == rs3, but I didn't see
Early discussions of design decisions prior to gifting the design to the foundation were not public.
To some extent that continues today.
What to exclude is a huge list, and the general justification to exclude is often:
1) no one thought to include the concept/construct. There are new ideas every day.
2) the original design goals were oriented to simplicity and thus efficiency in decode/execution.
3) another design goal was architecture acceptance.
Again, a simple design that was useful/acceptable to
academia, RISC purists and early adopters [cf. FPGA].
Much was made of the spartan nature of the design, the
base having only 40 instructions.
A large proliferation of instructions, even if
highly encoded and effective, is readily dismissed with that
mindset.
4) if the perceived cost to pursue an idea was high it would be
dismissed early as only limited time is ever available to a small
team.
(for example) consideration of specifying rs3 by combining the 2 high-order bits of rd with 3 explicit low-order bits, which would have reduced the 4 major opcodes to 1,
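
The arithmetic behind the "4 major opcodes to 1" remark can be sketched as follows. Shrinking the explicit rs3 field from 5 bits to 3 frees 2 bits, which is enough to select among the four fused multiply-add variants. The field layout, the use of the freed bits as an operation selector, and the name decode_rs3 are all assumptions made for illustration; nothing like this exists in the ratified encoding.

```python
FMA_OPS = ("fmadd", "fmsub", "fnmsub", "fnmadd")


def decode_rs3(rd, rs3_low, op_sel):
    """Hypothetical decode: rs3 shares the 2 high-order bits of rd.

    rd      -- 5-bit destination register number
    rs3_low -- 3 explicit low-order bits of rs3
    op_sel  -- the 2 freed bits, assumed here to pick the FMA variant
    """
    if not (0 <= rd < 32 and 0 <= rs3_low < 8 and 0 <= op_sel < 4):
        raise ValueError("field out of range")
    rs3 = (rd & 0b11000) | rs3_low
    return FMA_OPS[op_sel], rs3


print(decode_rs3(13, 0b101, 2))
```

Under this scheme rs3 is always within the same aligned block of 8 registers as rd, which is the constraint a compiler would have to satisfy or work around with moves.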
Early on opcode pressure was not a concern, it was secondary to the concerns above.
Apparently it is still not a major concern to many.
RISCV has a built-in opcode expansion design, described in detail in Chapter 21 of Volume 1 of the User Manual.
Bits 0 through 6 of the instruction define instruction lengths of 16, 32, 48 and 64 bits.
Three further bits in the first 16-bit instruction parcel allow longer lengths in 16-bit increments up to 192 bits.
This is an excellent design inclusion. However, it may have lulled some into thinking that it resolves opcode pressure concerns.
It does not. 32-bit instruction encodings are prime real estate that should be reserved for the highest value instructions [including static and dynamic use].
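
The length encoding described above can be sketched as a small decoder. This follows the expanded instruction-length encoding as recalled from the unprivileged spec, where the three extra bits (bits 14:12, "nnn") give lengths of 80 + 16 * nnn bits, i.e. 80 to 176 bits, and nnn = 111 is reserved for 192 bits and beyond. Treat the details as an assumption to check against the spec revision in use; instruction_length_bits is a hypothetical name.

```python
def instruction_length_bits(first_parcel):
    """Return the instruction length in bits, given the first 16-bit parcel.

    Returns None for the encoding reserved for lengths of 192 bits or more.
    """
    if first_parcel & 0b11 != 0b11:
        return 16  # bits [1:0] != 11: compressed
    if first_parcel & 0b11100 != 0b11100:
        return 32  # bits [4:2] != 111: standard 32-bit
    if first_parcel & 0b111111 == 0b011111:
        return 48
    if first_parcel & 0b1111111 == 0b0111111:
        return 64
    nnn = (first_parcel >> 12) & 0b111  # bits [14:12]
    if nnn != 0b111:
        return 80 + 16 * nnn
    return None  # reserved for >= 192 bits


print(instruction_length_bits(0x0013))  # low 7 bits 0010011 (OP-IMM)
```

Note that every length above 32 bits is carved out of the 32-bit major opcode space (bits [4:2] = 111), so the expansion scheme does not create room inside the 32-bit encodings themselves.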
as well as possibly playing nicely with the 3-bit register space for the compressed instructions,
RVC, the 16bit compressed encoding provides a good example of an attempt to rigorously determine the best candidates.
This is explicitly because such encoding consumes so much of the opcode space.
Justification of what to include and exclude is well documented.
while hopefully reducing the number of explicit moves significantly. (Of course, without real investigation, I don't know whether this would actually pay off.)
This is perpetually the problem. What application space will benefit, what future advances will nullify our current investigations.
I believe we have to anticipate where the toolchain and applications will be in 10 years and base decisions on such projections.
If we design to past entrenched limitations and current products
we will fail to effectively allocate our limited opcode space
resource.
I recognize that this is off the Vector topic, and too late to consider in any case, but I would appreciate being directed to the record of the discussions that led to the present specification of these instructions. I would participate happily in a thoughtful discussion about the mission of the RISC-V enterprise,
but isa-dev doesn't feel like the right place for it. Is there an existing group where that discussion would be at home?
I don't believe there is.
There is a "todo" Jira ticket that is accessible only to RISCV members:
https://jira.riscv.org/browse/RVTS-671 Fully Document Arch Review Process
However, that ticket is the means by which discussion forums can be established for these discussions.
I expect that relevant aspects will be generally available to the
public without a RISCV membership, and that public lists such as
this one will offer an opportunity for input from the general
public.
Thank you so much for your inputs and participation.
Welcome!
The FMA changes could be incorporated into XFinX.
It is not too late for that. Just as the Float load/store space is freed up, those 4 major opcodes can be mostly freed.
Once it is in XFinX then a migration to Revised F,D,Q with the efficient FMA encoding is a much simpler transition.
The necessity is to determine such opportunities now and to re-evaluate the worth of the 32bit opcode space [among other assumptions].
How much is correcting past errors worth for RISCV's future?
We can plan and put in place the mechanisms to effect those transitions/corrections when needed.
The better we plan for them now, the more readily we can converge to the optimal ISA [and variants in their respective domains].
We can do better in the "lessons learned" category. That will require recognizing and acknowledging what we did suboptimally.