Luke Kenneth Casson Leighton wrote:
> On Sun, Apr 8, 2018 at 9:46 PM, Andrew Waterman
> <wate...@eecs.berkeley.edu> wrote:
>
>
>> The cleanest way to do this sort of thing is to multithread the scalar
>> processor and the vector unit. Then the programming model remains
>> conventional: each thread appears to have its own vector unit.
>>
>
> see below: each thread would, under certain workloads, still have
> less than 100% utilisation. *sustained* less than 100% utilisation.
>
>
>> And no ISA changes are necessary.
>>
>
> ok so allow me to come up with a use-case which may demonstrate.
> there may be more. let's say the maximum vector length is 8. let's
> say that the vector being processed is 6 wide, or 3 wide. 3 wide is
> audio 24-bit or perhaps 3D is XYZ. so these are not obscure scenarios
> they're quite likely.
>
I think that I see a small misunderstanding of vectors here. No
surprise, since this is exactly the misunderstanding that SIMD marketing
efforts often promote.

For example, consider packed RGB888 image data. An application processing
this data with RVV would configure vectors with U8 element type and use
total vector lengths that are multiples of 3, since the data is 3-tuples
of U8 elements. For RGBA8888, the element type is still U8, but now the
elements can be grouped into 4-tuples. For some operations, like adding
RGB888 buffers together, the tuple boundaries are insignificant. For
other operations, like alpha-compositing RGBA8888 data, the tuple
boundaries are significant and the application must check that the
effective vector length is a multiple of the tuple size and round the
application vector length down if needed.
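
A minimal sketch of that rounding step, written as a scalar Python model
rather than real RVV code. The names tuple_aligned_vl, premultiply_alpha
and vlmax are hypothetical and only serve the illustration; lengths are
counted in U8 elements, and the premultiply stands in for any operation
where tuple boundaries matter.

```python
def tuple_aligned_vl(remaining, vlmax, tuple_size):
    """Model the effective vector length for a tuple-sensitive operation.

    remaining and vlmax are counted in elements (U8 here), not in tuples.
    """
    vl = min(remaining, vlmax)       # what the hardware would grant
    return vl - (vl % tuple_size)    # round down to whole tuples


def premultiply_alpha(pixels, vlmax):
    """Strip-mined alpha premultiply over packed RGBA8888 (scalar model)."""
    tuple_size = 4
    if len(pixels) % tuple_size:
        raise ValueError("buffer is not a whole number of RGBA tuples")
    if vlmax < tuple_size:
        raise ValueError("vlmax is too small to hold one tuple")
    out = bytearray(pixels)
    i = 0
    while i < len(pixels):
        vl = tuple_aligned_vl(len(pixels) - i, vlmax, tuple_size)
        # One "vector instruction" worth of work: vl elements, whole tuples only.
        for t in range(i, i + vl, tuple_size):
            alpha = pixels[t + 3]
            for c in range(3):
                out[t + c] = (pixels[t + c] * alpha + 127) // 255
        i += vl
    return bytes(out)
```

With a maximum vector length of 8 and 3-element tuples, as in the quoted
example, tuple_aligned_vl(8, 8, 3) gives 6: the last two element slots
go unused in that pass, and the loop simply continues with the next
whole tuple.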

This suggests possible "vector-tuple" operations, like "tuple insert"
(insert an alpha channel in RGB888 data, or unpack 24-bit RGB888 to
aligned 32-bit pixels), "tuple drop" (remove alpha channel from RGBA8888
data or pack aligned 32-bit pixels to 24-bit RGB888 or extract any
subset of channels from any of these by dropping all other channels),
and "tuple splat" (extend an alpha channel to prepare for scaling the
RGB pixel values). These would probably use vector predicates as
control inputs.
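
To make the intended semantics concrete, here is a rough scalar model of
the three operations in Python. None of these exist as RVV instructions;
the function names, argument order, and the use of a per-tuple "keep"
list as a stand-in for a vector predicate are all assumptions made for
illustration.

```python
def _tuples(data, tuple_size):
    """Yield consecutive tuple_size-element slices of data."""
    if len(data) % tuple_size:
        raise ValueError("length is not a multiple of the tuple size")
    return (data[i:i + tuple_size] for i in range(0, len(data), tuple_size))


def tuple_drop(data, keep):
    """Keep only the positions flagged true in keep, in every tuple.

    len(keep) is the tuple size; keep plays the role of the predicate.
    """
    out = bytearray()
    for t in _tuples(data, len(keep)):
        out.extend(b for b, k in zip(t, keep) if k)
    return bytes(out)


def tuple_insert(data, tuple_size, position, value):
    """Insert value at the given position of every tuple."""
    out = bytearray()
    for t in _tuples(data, tuple_size):
        out.extend(t[:position])
        out.append(value)
        out.extend(t[position:])
    return bytes(out)


def tuple_splat(data, tuple_size, source):
    """Replace every element of each tuple with its element at source."""
    out = bytearray()
    for t in _tuples(data, tuple_size):
        out.extend(bytes([t[source]]) * tuple_size)
    return bytes(out)
```

For instance, tuple_insert(rgb, 3, 3, 255) would turn packed RGB888 into
RGBA8888 with an opaque alpha channel, and tuple_drop(rgba, [True, True,
True, False]) would reverse it.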
-- Jacob