George Neuner <gneu...@comcast.net> writes:
>On Sun, 18 Jul 2021 15:55:24 GMT, an...@mips.complang.tuwien.ac.at
>(Anton Ertl) wrote:
>
>>George Neuner <gneu...@comcast.net> writes:
>>
>>>The problem - at least with current hardware - is that programmers are
>>>much better at identifying what CAN be done in parallel than what
>>>SHOULD be done in parallel.
>>
>>You make it sound as if that's a problem with the programmers, not
>>with the hardware. But it's fundamental to programming (at least in
>>areas affected by the software crisis, i.e., not supercomputers), so
>>it has to be solved at the system level (i.e., hardware, compiler,
>>etc.).
>
>It /IS/ a problem with the programmers. The average "developer" now
>has no CS, engineering, or (advanced) mathematics education, and their
>programming skills are pitiful - only slightly above "script kiddie".
>This is an unfortunate fact of life that I think too often is lost on
>some denizens of this group.
One could have an interesting discussion about that, but it's beside
the point wrt the parallelization problem. Even if the developers
have all the education one could wish for, if they have to produce a
maintainable program for a big problem (resulting in a big program)
with minimal development effort, they will divide the problem into
subproblems, divide the program into parts that deal with those
subproblems, and so on. But parallelization with the current cost
structure has to be decided for the whole program and cannot be
subdivided in the same way.
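
To make that concrete, here is a minimal sketch (the module names,
data layout, and the use of std::async are just illustration, not a
proposal): two independently maintained parts each make a locally
reasonable parallelization decision, and composing them oversubscribes
the machine; the right total degree of parallelism is a whole-program
property that neither part can see.

#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

/* Module B: written and maintained on its own, it parallelizes its
   work across all hardware threads -- a locally reasonable choice. */
double analyze_chunk(const std::vector<double>& data)
{
  const unsigned n = std::max(1u, std::thread::hardware_concurrency());
  std::vector<std::future<double>> parts;
  for (unsigned t = 0; t < n; ++t)
    parts.push_back(std::async(std::launch::async, [&data, t, n] {
      double s = 0.0;
      for (std::size_t i = t; i < data.size(); i += n)
        s += data[i];
      return s;
    }));
  double sum = 0.0;
  for (auto& p : parts)
    sum += p.get();
  return sum;
}

/* Module A: also parallelizes across all hardware threads and calls
   module B from inside its parallel region.  Composed, the two locally
   sensible decisions give roughly chunks.size()*n threads competing
   for n cores; neither module can fix that on its own. */
double analyze_all(const std::vector<std::vector<double>>& chunks)
{
  std::vector<std::future<double>> parts;
  for (const auto& c : chunks)
    parts.push_back(std::async(std::launch::async,
                               [&c] { return analyze_chunk(c); }));
  double sum = 0.0;
  for (auto& p : parts)
    sum += p.get();
  return sum;
}
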
>Given the ability to create "parallel" tasks, an /average/ programmer
>is very likely to naively create large numbers of new tasks regardless
>of resources being available to actually execute them.
Yes, if you tell them to create parallel tasks. And good programmers
will do so, too, unless you tell them that efficient parallelization
is more important than maintainability.
>Which maybe is fine if the number of tasks (relatively) is small, or
>if many of them are I/O bound and the use is for /concurrency/. But
>most programmers do not understand the difference between "parallel"
>and "concurrent", and too many don't understand why spawning large
>numbers of tasks can slow down the program.
Sure. But that is the way of writing parallel programs that is in
line with the divide-and-conquer approach we have established for
writing programs for big problems. So if it slows programs down, the
solution is not to tell the programmers not to do that, but to build
systems that run such programs efficiently. E.g., have hardware where
having many more tasks than hardware threads does not slow the program
down. Or have a compiler and run-time system that combines the many
tasks written by the programmer into so few intermediate tasks that
the overhead of having more tasks than hardware threads plays only a
minor role. Or both.
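
What such a combining run-time would achieve can be sketched by hand
with a cut-off (the cut-off value, sizes, and names below are invented
for illustration, not a description of any existing system): the
logical decomposition stays as fine-grained as the programmer likes,
but only a handful of real tasks are created, so task overhead no
longer dominates.

#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

double par_sum(const double* a, std::size_t n, std::size_t cutoff)
{
  if (n <= cutoff)                       /* coarse leaf: run sequentially */
    return std::accumulate(a, a + n, 0.0);
  std::size_t half = n / 2;
  auto right = std::async(std::launch::async,      /* one real task */
                          par_sum, a + half, n - half, cutoff);
  double left = par_sum(a, half, cutoff);          /* recurse in this thread */
  return left + right.get();
}

int main()
{
  std::vector<double> v(1 << 20, 1.0);
  /* With cutoff = v.size()/16 we get on the order of 16 leaf tasks,
     however fine-grained the logical decomposition is. */
  double s = par_sum(v.data(), v.size(), v.size() / 16);
  return s == static_cast<double>(v.size()) ? 0 : 1;
}
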
>>Why is it fundamental? Because we build maintainable software by
>>splitting it into mostly-independent parts. Deciding how much to
>>parallelize on current hardware needs a global view of the program,
>>which programmers usually do not have; and even when they have it,
>>their decisions will probably be outdated after a while of maintaining
>>the program.
>>
>>We have similar problems with explicitly managed fast memory, which is
>>why we don't see that in general-purpose computers; instead, we see
>>caches (a software-crisis-compatible variant of fast memory).
>
>We have similar problems with programmer managed dynamic allocation.
>All the modern languages use GC /because/ repeated studies have shown
>that average programmers largely are incapable of writing leak-proof
>code without it.
Good example. Garbage collection is a good solution to the dynamic
memory allocation problem, including for good programmers. Now we
need such a solution for the parallelization problem.
>>Yet another problem of this kind is fixed-point scaling. That's why
>>we have floating-point.
>
>And the same people who, in the past, would not have understood the
>issues of using fixed-point now don't understand the issues of using
>floating point.
Sure, FP has its pitfalls, but it's possible to write, say, a general
FP matrix multiplication subroutine (and the pitfalls of FP typically
don't play much of a role there), while for fixed-point you would have
to write one with the right scaling for every application it is used
in; or maybe these days you would have a templated C++ library and
instantiate it with the appropriate scaling for each use.
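
A minimal sketch of that contrast (the routines and the Q formats in
the comments are invented examples, not a real library): the FP
version is written once and serves every caller, while the fixed-point
template has to be instantiated with a scaling chosen per application.

#include <cstddef>
#include <cstdint>

/* Generic FP dot product: one definition serves every caller. */
double dot(const double* a, const double* b, std::size_t n)
{
  double s = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i] * b[i];
  return s;
}

/* Fixed-point dot product with FRAC fraction bits: the caller must
   pick FRAC (and the accumulator width) to match the value ranges of
   *their* data, or the result silently overflows or loses all
   precision. */
template<int FRAC>
std::int32_t dot_fixed(const std::int32_t* a, const std::int32_t* b,
                       std::size_t n)
{
  std::int64_t acc = 0;                          /* wide accumulator */
  for (std::size_t i = 0; i < n; ++i)
    acc += static_cast<std::int64_t>(a[i]) * b[i];
  return static_cast<std::int32_t>(acc >> FRAC); /* rescale to Q.FRAC */
}

/* Each application instantiates its own scaling, e.g.:
     audio:   dot_fixed<15>(x, y, n);   // Q1.15 samples
     sensors: dot_fixed<8>(x, y, n);    // Q23.8 readings          */
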
>>So what do we need of the system? Ideally having more parallel parts
>>than needed should not cause a slowdown. This has two aspects:
>>
>>1) Thread creation and destruction should be cheap.
>>
>>2) The harder part is memory locality: Sequential code often works
>>very well on caches because it has a lot of temporal and spatial
>>locality. If the code is split into more tasks than necessary, how do
>>we avoid losing locality and thus losing some of the benefits of
>>caching?
>
>Agreed! But this has little to do with any of my points.
But it has to do with the parallelization problem, and with solutions
for it that are more realistic than perfect programmers with unlimited
time on their hands.
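
As an illustration of the locality question above (array size, task
count, and the two strategies are just invented examples): the same
work split into many tasks can keep or lose spatial locality purely
depending on how elements are assigned to tasks.

#include <cstddef>
#include <vector>

/* Contiguous blocks: each task streams through adjacent memory, so
   splitting into many tasks costs little cache-wise. */
void scale_block(std::vector<double>& v, std::size_t task,
                 std::size_t ntasks)
{
  std::size_t chunk = (v.size() + ntasks - 1) / ntasks;
  std::size_t lo = task * chunk;
  std::size_t hi = lo + chunk < v.size() ? lo + chunk : v.size();
  for (std::size_t i = lo; i < hi; ++i)
    v[i] *= 2.0;
}

/* Interleaved (cyclic) assignment: every task touches only one element
   per cache line, so splitting into many tasks multiplies the lines
   each core must pull in. */
void scale_cyclic(std::vector<double>& v, std::size_t task,
                  std::size_t ntasks)
{
  for (std::size_t i = task; i < v.size(); i += ntasks)
    v[i] *= 2.0;
}
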