I hope I won't derail this thread, but I'm interested in how you choose here.
Wouldn't the more important question be which of the two models you can
best support in the toolchain (i.e., which is easier for the compiler to exploit)?
Thanks
Tommy
Why is SIMD inherently tighter in the FU matrix than a VPU? Assuming an SSE-style SIMD
with private registers, it would be fairly decoupled except for the explicit instructions to transport
data between RISC-V registers and the co-processor.
> But how good or bad a compromise depends on the workload. If we have a predominantly
> non-vector workload and some occasional SSE type code, then it is probably not worth the hassle
> of a VP.
Yes, I meant to write that; some workloads (e.g. media) may favor short, low-latency SIMD vectors,
whereas VPs supposedly have higher throughput at the cost of vector latency. Maybe it would be
possible to design a hybrid that could support both models.
Tommy
At the ISA level, a major distinction is that code generation for
vector machines is ignorant of the hardware vector length, whereas
subword-SIMD codegen requires knowledge of the vector length for
correctness. In the former case, binary code can leverage longer
vectors that future implementations might provide.
But back to the matter at hand: we intend to release the Hwacha source
code in 2015. We don't, at UCB, have plans to implement a RISC-V
subword-SIMD extension.
And as another example, the Cray-1 vector unit was much more tightly
coupled than the Intel or ARM SIMD extensions of current
implementations.
A common fallacy is that packed-SIMD extensions are lower latency than
true vectors for short application vectors. The opposite is usually
true, especially if the length and alignment of the short vectors are
not known at compile time.
| But back to the matter at hand: we intend to release the Hwacha
| source code in 2015. We don't, at UCB, have plans to implement a
| RISC-V subword-SIMD extension.
But we probably will have some thoughts on the ISA spec.
Krste
Very interesting paper, thanks for the pointer.
I do have an issue with the implicit premise, though. I worked in the CUDA group for 3.5 years, and
while toy examples can be elegant in an SPMD model such as CUDA, realistic examples quickly become
so convoluted that I suspect you would have been better off with a different model instead.
Example: http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
Relatedly, many problems map astonishingly poorly to the SPMD model.
I'm sure I'm ignorant of the academic literature comparing different models, however.
I look forward to your vector ISA and hope that it works well for arbitrary vector lengths.
Regards,
Tommy