On Thu, May 13, 2010 at 4:45 AM, Pete Wilson <
pe...@kivadesigngroupe.com> wrote:
> I think that eschewing intrinsics for stuff like SIMD is a Very Good
> Idea.
>
> Register-SIMD is an idea whose hardware implementation is disarmingly
> simple, but whose programming model poses difficulties.
I'll just note that "everyone has an unconscious tendency to assume
that the kind of programs they write are the kind of programs everyone
writes". In theory, short-vector SIMD instructions are exactly what
you want for the image processing, image analysis and statistical
modelling that I do; for a lot of other things, including writing
compilers or text servers, they are almost completely useless.
Unfortunately, Intel SSE has developed piecemeal, implementing the
operations Intel believed programmers wanted rather than a complete
set of operations (eg, on an Atom CPU I can use a vector minimum on
16-bit integers, but there's no vector minimum on 32-bit integers
until SSE4), and the instruction set "limitation" that each operation
can only refer to two registers (removed in SSE5-something) gives
some weird instructions. There are far fewer, but not zero,
implementation idiocies in the better-designed ARM NEON SIMD
instructions, which makes programming with them simpler. These
instructions are undoubtedly very difficult to use for general
computation, but they were designed for a different niche: simple
data-parallel work within a low chip power budget.
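To make the 32-bit minimum example concrete, here is a rough sketch of
the usual workaround when the packed minimum is missing: do a packed
compare to get a per-lane mask, then select with and / and-not / or (as
far as I know, on SSE2 that would be PCMPGTD followed by PAND, PANDN and
POR). It is written as plain scalar Go standing in for the four lanes,
so it only illustrates the logic, not the actual instructions; the name
minInt32Lanes and the fixed width of four lanes are my own assumptions.

```go
package main

import "fmt"

// minInt32Lanes is a hypothetical helper that computes a lane-wise minimum
// of two 4-lane vectors using only a compare and bitwise selects, mirroring
// the compare / and / and-not / or fallback used when no packed 32-bit
// minimum instruction is available.
func minInt32Lanes(a, b [4]int32) [4]int32 {
	var out [4]int32
	for i := range a {
		// mask is all ones when a[i] > b[i] and all zeros otherwise,
		// like one lane of a packed greater-than compare result.
		var mask int32
		if a[i] > b[i] {
			mask = -1
		}
		// Take b where the mask is set and a everywhere else.
		out[i] = (b[i] & mask) | (a[i] &^ mask)
	}
	return out
}

func main() {
	a := [4]int32{7, -3, 100, 0}
	b := [4]int32{2, 5, 100, -1}
	fmt.Println(minInt32Lanes(a, b)) // prints [2 -3 100 -1]
}
```

The point is only that one missing operation turns into four, which is
the kind of gap I mean by "piecemeal".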
(The only reason I'm asking about Go is that I'm looking at various
next-generation languages to see whether there's a better language
than C++ for building a big interactive image processing and analysis
library. Go is clearly not the appropriate medium for the kind of code
that I will be writing, but I can entirely understand that simplicity
and machine independence are more important for Go. General-purpose
languages tend to develop the sprawl of C++, so staying focussed is
good.)
> Given the concepts of goroutines and channels, we'd probably be better
> off supposing the existence of different, but equally-simple, hardware
> that supported explicit parallelisation of loops (into goroutines) or
> the automatic vectorisation thereof (into goroutines).
>
> The hardware is straightforward - a swarm of execution units (like in
> SIMD, but lots), but with simple hardware surrounding each unit
> collection, forming a very simple processor (probably with HW support
> for lowcost messaging) with appropriate interprocessor interconnect.
> As described, one spends a bit too much power fetching instructions,
> but there are ways round that.
>
> And when there's no 'vector' work to do, the machines are rather
> useful general-purpose processors - unlike the idiot SIMD units...
I'll just note that I'm not aware of any current chip, particularly
any designed with low power usage in mind, that implements this kind
of circuitry.
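For what it's worth, the software half of the quoted idea (explicitly
splitting a loop into goroutines) can be sketched on ordinary hardware
without any of that circuitry. The following is a minimal sketch, not a
claim about how Go or any compiler does or would do it; parallelFor and
its chunking scheme are my own hypothetical names and choices.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelFor is a hypothetical helper: it splits the index range [0, n)
// into contiguous chunks and calls body(i) for every index, running each
// chunk in its own goroutine. It returns once all chunks have finished.
func parallelFor(n, workers int, body func(i int)) {
	if workers < 1 {
		workers = 1
	}
	// Round up so that at most `workers` chunks cover the whole range.
	chunk := (n + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				body(i)
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	src := []int32{1, 2, 3, 4, 5, 6, 7, 8}
	dst := make([]int32, len(src))
	parallelFor(len(src), runtime.NumCPU(), func(i int) {
		// Each index is written by exactly one goroutine, so no locking
		// is needed for this element-wise operation.
		dst[i] = src[i] * 2
	})
	fmt.Println(dst) // prints [2 4 6 8 10 12 14 16]
}
```

This only works this simply because every iteration is independent;
anything with cross-iteration dependencies needs channels or some other
coordination.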
Regards,
Orthochronous