However we’re looking at starting a P-extension charter (SIMD).
Chuanhua Chang, from Andes Technologies, is driving the effort. I CC-ed him on this email.
I had a call with the Technical Committee chair and vice-chair and they made it very clear that the P-extensions were only about packed-register instructions. In that sense a hardware loop instruction might be out of its scope.
Personal interest is for video (processing) and audio applications, which have very different needs, and both have odd bit-size requirements.
There are a few instructions from the abandoned B-extensions that might be useful. Bit-stuffing and bit-selection come to mind.
Richard
On Fri, Apr 6, 2018 at 8:25 AM, Richard Herveille <richard....@roalogic.com> wrote:
Chuanhua Chang, from Andes Technologies, is driving the effort. I CC-ed him on this email.
Thanks Richard, hello Chuanhua.
I had a call with the Technical Committee chair and vice-chair and they made it very clear that the P-extensions were only about packed-register instructions.
So looking up the definition of packed-register instructions, that's basically MMX, right? If so, i have a very direct question, based on this analysis, which was co-written by Andrew Waterman [1], whose opinion is especially highly regarded in the RISC-V community.
The fundamental basis of RISC-V is to learn from and not repeat the mistakes of past ISAs. Why would it be in the interests of RISC-V to replicate the mistake of polluting the RISC-V ISA with several thousand instructions, all of which require heavy setup, teardown and corner-cases?
Was this a question that was put to the Technical Committee chair and vice-chair? Were they aware of Andrew's analysis at the time of the call?
In that sense a hardware loop instruction might be out of its scope.
The SIMD/simple-V proposal [2] would make it an implementor's choice as to whether to implement a hardware loop in parallel, sequential, or a combination of both. Broadcom refer to this as "virtual parallelism" [3].
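To make the "implementor's choice" point concrete, here is a minimal sketch in C. It is only an illustration of the general idea, not the simple-V mechanism itself: the function name loop_add and the lanes parameter are hypothetical stand-ins for "how many elements this particular implementation handles per step". The point is that software sees the same result whichever width is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one element-wise loop. `lanes` stands in for the
 * number of elements an implementation processes per step: 1 models a
 * purely sequential loop, larger values model partial or full parallelism.
 * Unsigned arithmetic is used so that overflow wraps instead of being
 * undefined behaviour. */
static void loop_add(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                     size_t n, size_t lanes)
{
    assert(lanes > 0);
    for (size_t base = 0; base < n; base += lanes) {
        /* The last group may be shorter than `lanes`. */
        size_t end = (n - base < lanes) ? n : base + lanes;
        /* In hardware, the elements of one group could be computed at once;
         * this model simply walks through them in order. */
        for (size_t i = base; i < end; i++)
            dst[i] = a[i] + b[i];
    }
}
```

Calling loop_add with lanes set to 1 and again with lanes set to 4 on the same inputs fills the destination with identical values, which is the property the software would rely on.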
Personal interest is for video (processing) and audio applications, which have very different needs, and both have odd bit-size requirements.
Interesting. ok, so the others i can think of include crypto, Tensors (AI), image processing, radar, traditional DSP work and so on.
There are a few instructions from the abandoned B-extensions that might be useful. Bit-stuffing and bit-selection come to mind.
Ok so the other lesson from the V-Extension proposal is that adding what in effect turns out to be an entirely new (independent) engine is an awful lot of work that makes it extremely difficult for interested parties to consider contributing. Is this something that is intended to be taken into consideration?
l.
[1] https://www.sigarch.org/simd-instructions-considered-harmful/
[3] https://docs.broadcom.com/docs/12358545 Figure 2 P17 and Section 3 on P16.
Richard
On 06/04/2018, 08:51, "lkcl" <luke.l...@gmail.com> wrote:
On Thu, Apr 5, 2018 at 8:46 AM, Richard Herveille <richard....@roalogic.com> wrote:
However we’re looking at starting a P-extension charter (SIMD).
Hi Richard, would you be happy to elaborate? As you may have noticed i'm interested in leveraging RISC-V for 3D and Video processing, minimising the addition of special function blocks unless strictly necessary (tile-based z-buffers for 3D, DCT for MPEG, that sort of thing).
Also have you seen the "SIMD is Bad" exposition, which explains that the fixed width of SIMD tends to make a dog's dinner mess due to excessive corner-cases?
l.
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAPweEDx8%2Ba1Nv-3qLPe_H8FPKZrY_ym7Y3t3EuA3%2BtRBZsnKPw%40mail.gmail.com.
On Fri, Apr 6, 2018 at 8:25 AM, Richard Herveille <richard....@roalogic.com> wrote:
Chuanhua Chang, from Andes Technologies, is driving the effort. I CC-ed him on this email.
Thanks Richard, hello Chuanhua.
I had a call with the Technical Committee chair and vice-chair and they made it very clear that the P-extensions were only about packed-register instructions.
So looking up the definition of packed-register instructions, that's basically MMX, right? If so, i have a very direct question, based on this analysis which was co-written by Andrew Waterman [1] whose opinion is especially highly regarded in the RISC-V community.
The fundamental basis of RISC-V is to learn from and not repeat the mistakes of past ISAs. Why would it be in the interests of RISC-V to replicate the mistake of polluting RISC-V ISA with several thousand instructions, all of which require heavy setup, teardown and corner-cases?
Was this a question that was put to the Technical Committee chair and vice-chair? Were they aware of Andrew's analysis at the time of the call?
I think you’re jumping to conclusions or at least making premature assumptions. Your points are well taken and valid. However we will not be adding thousands of instructions, at least I hope not! MMX is an implementation of packed-register instructions, specifically one that (re)uses the floating point registers for SIMD. That doesn’t mean it is the only possible implementation.
Andes proposed their extensions https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/vYVi95gF2Mo which uses the integer register file.
The current P-extension placeholder (https://github.com/riscv/riscv-isa-manual/blob/master/src/p.tex) suggests it should target the floating point registers.
My personal preference is to use the integer register file instead of the floating point one, as that would also include designs without a floating point unit. The SIMD extensions should be simple; if you want to run high performance 3D video analysis, AI, or machine learning, then the V-extension is what you’re looking for.
However I would also like to see odd bit sizes, e.g. 10, 12, 24 (video/audio). But again, my preference. Nothing has been discussed, let alone decided yet.
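For readers unfamiliar with the term, the following is a minimal C sketch of what a packed-register add on an integer register computes. It is not taken from the Andes proposal or from any P-extension draft: the names packed_add8 and lane8, and the choice of four 8-bit lanes in a 32-bit word, are assumptions made for illustration. A real packed-SIMD instruction would do this in a single operation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add four unsigned 8-bit lanes packed into one 32-bit
 * word. Each lane wraps around on overflow and no carry leaks into the
 * neighbouring lane. */
static uint32_t packed_add8(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7F7F7F7Fu; /* low 7 bits of every lane */
    const uint32_t msb  = 0x80808080u; /* top bit of every lane */

    /* Adding only the low 7 bits means a carry can reach bit 7 of its own
     * lane but never cross into the next lane. */
    uint32_t sum = (a & low7) + (b & low7);

    /* Fold in the top bits with XOR, which discards the carry out. */
    return sum ^ ((a ^ b) & msb);
}

/* Hypothetical helper: extract lane i (0 is the least significant byte). */
static uint8_t lane8(uint32_t word, unsigned i)
{
    assert(i < 4);
    return (uint8_t)(word >> (8u * i));
}
```

Odd lane widths such as 10 or 12 bits follow the same pattern with different masks, which is part of why the lane sizes matter to the instruction count.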
In that sense a hardware loop instruction might be out of its scope.
The SIMD/simple-V proposal [2] would make it an implementor's choice as to whether to implement a hardware loop in parallel, sequential, or a combination of both. Broadcom refer to this as "virtual parallelism" [3].
It shouldn’t be an option; it’s either there or it isn’t. You can’t have parts of an extension be implemented and others not. Making it optional is a compiler’s nightmare.
My personal preference would be to add a hardware loop, but again … nothing has been discussed yet.
Personal interest is for video (processing) and audio applications, which have very different needs, and both have odd bit-size requirements.
Interesting. ok, so the others i can think of include crypto, Tensors (AI), image processing, radar, traditional DSP work and so on.
Yes. Although traditional DSP would mean parallel load/store (load/store in parallel with arithmetic operation), bit reverse addressing, …
Nothing has been discussed yet, but that might be out of the scope of a general purpose CPU like RISC-V. But maybe it isn’t …
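As a reminder of what bit-reverse addressing means, here is a small C sketch. The function name bit_reverse_index is hypothetical; traditional DSPs typically provide this as an addressing mode, used for example to reorder FFT data, rather than as a software loop like this one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the lowest `bits` bits of `index`.
 * For an 8-point FFT (bits = 3), index 1 (001) maps to 4 (100). */
static uint32_t bit_reverse_index(uint32_t index, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        /* Move the current lowest bit of index onto the low end of the
         * result, pushing earlier bits upward. */
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}
```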
There are a few instructions from the abandoned B-extensions that might be useful. Bit-stuffing and bit-selection come to mind.
Ok so the other lesson from the V-Extension proposal is that adding what in effect turns out to be an entirely new (independent) engine is an awful lot of work that makes it extremely difficult for interested parties to consider contributing. Is this something that is intended to be taken into consideration?
Personal preference again … it definitely should. The V-extension sounds like a separate out-of-order engine in parallel to the CPU (and it probably is). My preference for the P-extension is that these instructions are part of the regular instruction flow, very much like the M-extension.
Cheers,
Richard
On Fri, Apr 6, 2018 at 12:59 AM, lkcl <luke.l...@gmail.com> wrote:
On Fri, Apr 6, 2018 at 8:25 AM, Richard Herveille <richard....@roalogic.com> wrote:
I had a call with the Technical Committee chair and vice-chair and they made it very clear that the P-extensions were only about packed-register instructions.
The fundamental basis of RISC-V is to learn from and not repeat the mistakes of past ISAs. Why would it be in the interests of RISC-V to replicate the mistake of polluting the RISC-V ISA with several thousand instructions, all of which require heavy setup, teardown and corner-cases?
Was this a question that was put to the Technical Committee chair and vice-chair? Were they aware of Andrew's analysis at the time of the call?
I can't officially speak for the Technical Committee, but I believe the thinking is as follows: (a) several RISC-V vendors view packed-SIMD as desirable because it can be added to existing processors at low cost (both in silicon and in design effort);
(b) although vectors are vastly superior to packed-SIMD for general-purpose data parallelism, packed-SIMD is certainly sufficient for some applications;
and (c) since people are going to do packed-SIMD no matter what we think, we'd like to minimize ecosystem fragmentation by standardizing how it is done.
My expectation is that the community will quickly coalesce around vectors for general-purpose processors and for high-throughput embedded processing.
Packed SIMD will play an important role in the most area-constrained (but not necessarily the most energy-efficient) embedded processors.
Hopefully that's nuanced enough to avoid starting a religious war...
On Fri, Apr 6, 2018 at 10:18 AM, Andrew Waterman <and...@sifive.com> wrote:
On Fri, Apr 6, 2018 at 12:59 AM, lkcl <luke.l...@gmail.com> wrote:
On Fri, Apr 6, 2018 at 8:25 AM, Richard Herveille <richard....@roalogic.com> wrote:
I had a call with the Technical Committee chair and vice-chair and they made it very clear that the P-extensions were only about packed-register instructions.
The fundamental basis of RISC-V is to learn from and not repeat the mistakes of past ISAs. Why would it be in the interests of RISC-V to replicate the mistake of polluting the RISC-V ISA with several thousand instructions, all of which require heavy setup, teardown and corner-cases?
Was this a question that was put to the Technical Committee chair and vice-chair? Were they aware of Andrew's analysis at the time of the call?
I can't officially speak for the Technical Committee, but I believe the thinking is as follows: (a) several RISC-V vendors view packed-SIMD as desirable because it can be added to existing processors at low cost (both in silicon and in design effort);
... with the resultant huuuge proliferation that we all know and love in both Intel (god knows how many instructions but a count of the number of transistors on modern x86s hits the BILLION mark) and ARC (several THOUSAND for their video SIMD extensions) and many more...
... all of which is anathema to the entire principle on which RISC-V was founded. is it possible to beat these aberrant RISC-V vendors about the head with corporately-suitably-sized rubber ducks until they get the general message that just because it's always been done that way doesn't mean it has to be done that way *in RISC-V*? we're supposed to be *learning* from the mistakes of the past, not *repeating* them!
(b) although vectors are vastly superior to packed-SIMD for general-purpose data parallelism, packed-SIMD is certainly sufficient for some applications;
i would agree with that sentiment, very much. However.... looking *at* the proposal that Chuanhua put forward, 95% of it is actually DSP, *not* actual SIMD.
Which tends to support the thought, "well, if you're going to implement packed-SIMD, why not split out the SIMD bit from the DSP bit?"
and (c) since people are going to do packed-SIMD no matter what we think, we'd like to minimize ecosystem fragmentation by standardizing how it is done.
Whilst I appreciate that it's a "realistic" perspective, I cannot help but view it as "caving in to external pressure". You've done absolutely amazing things, attracted huge numbers of extremely skilled people round the RISC-V banner, which is based on certain guiding design principles that everyone I've encountered on here I can pretty much say is in awe of, and whole-heartedly supports.
Can I therefore recommend standing up to those vendors and saying, politely and diplomatically (which counts me out for sure), "scuse me, we don't do it that way in RISC-V if there's any choice in the matter"?
Unless anyone else would like to step forward I will be more than happy to take responsibility for coordinating an effort to create a "parallelism extension" that would be suitable for both P and V, augmenting both to provide *both* the "V" in vector *and* the "packed" in P-SIMD, and have begun writing that up here:
My expectation is that the community will quickly coalesce around vectors for general-purpose processors and for high-throughput embedded processing.
I don't know if you saw my analysis of the [draft 2.3] V-extension, but my feeling is that it's simply too much. The fact that there's only one implementor (Hwacha) and that's ended in 2017 *and hasn't published the RTL* should speak volumes. By splitting out the actual parallelism, many implementors I think would feel *much* more comfortable with a *general-purpose* parallelism that applies *across all extensions*, whether they be G, M, P-without-the-parallelism or V-without-the-parallelism.
Hwacha doesn't implement the V extension, and in fact the Hwacha ISA is a totally different ISA style from the V extension. So that's just a distraction.
The reason there are no implementations of the V extension is that the ISA is still being designed.
--l.
Vector units have a tendency to dwarf their companion scalar units. I expect that area-constrained implementations will not support RVV.
lkcl wrote:
(hi andrew thanks for responding, just picking up on richard's reply first)
Packed SIMD will play an important role in the most area-constrained (but not necessarily the most energy-efficient) embedded processors.
Yes I was particularly impressed with the zero-overhead loop mechanism proposed by Chuanhua, and noted (in the simple_v_extension page link above) that it is a really, really easy way to potentially keep a DSP pipeline 100% occupied even in a simple in-order non-super-scalar architecture where it really matters most: the inner loop of an algorithm. Really damn good idea that, which I've never seen in any architecture before (and over the past 30 years I've studied a *lot* of different architectures).