On 2019-02-21 10:15:52 +0000,
already...@yahoo.com said:
> On Wednesday, February 20, 2019 at 8:28:55 PM UTC+2, Stephen Hoffman wrote:
>> On 2019-02-20 14:16:54 +0000,
>> already...@yahoo.com said:
>>
>>> Did VAX have formally (or semi-formally, as x86) specified memory
>>> ordering rules?
>>
>> Yes, and in some detail. Those rules were documented in DEC Standard
>> 32, and in (some versions of) the VAX Architecture Handbook. The
>> latter series of documentation was less detailed, and sometimes far
>> too sparse. A few versions of the VAX handbook were... bookshelf filler.
>>
>> Here's the canonical DEC Standard 32, the VAX architecture reference:
>>
>> https://archive.org/details/bitsavers_decvaxarch32Jan90_36555387
>>
>
> That's a document that John mentioned above:
> [This section will be added by a later ECO.]
> I have no idea what an "ECO" is.
That's DECspeak for "patch", more or less.
An Engineering Change Order (ECO) was a change initiated and
distributed by DEC central engineering, as distinct from a Field Change
Order (FCO), which was a change initiated and distributed by DEC field
service and applied to systems in the field. FCOs usually involved
hardware or sometimes firmware, variously involved site visits and
tools, and tended to get expensive as the number of customer sites
requiring the change increased.
There was once a DEC Dictionary, and a DEC Software Engineering
Handbook (1988, IIRC), so that Digits could keep up on DECjargon and on
DEC processes and procedures.
Oddly enough, various other processors that were retired a
~quarter-century ago also aren't mentioned.
There's a version of the VAX rules of memory references starting around
page 269 here:
http://www.bitsavers.org/pdf/dec/vax/archSpec/EY-3459E-DP_VAX_Architecture_Reference_Manual_1987.pdf
The VAX rules were concisely documented and often rather murky to read,
and little if any of the available developer tooling helped developers
avoid related coding mistakes.
VAX wasn't superscalar, didn't reorder instructions, and didn't reorder
memory references. And most instructions set the condition codes, which
also made things slow.
Rattle around under the following path for some DEC-internal
discussions of where VAX and VMS were both found lacking, and why, and
what was planned, and for what effectively turned into NT...
http://www.bitsavers.org/pdf/dec/prism/
There are issues and problems discussed there that OpenVMS still
doesn't handle very well, too.
>> We probably won't see another major architecture that was as aggressive
>> as was Alpha.
>>
>
> According to my understanding, the implementation of memory ordering on
> EV6 and EV7 (or maybe only in EV7?) was not really weak. Probably
> pretty similar to today's Intel/AMD with exception of non-coherent
> instruction cache.
> But they preferred not to codify the new, stricter behavior in the
> Alpha architecture books.
All sorts of memory reordering and coalescing were permissible with
Alpha, which meant the use of memory barriers was necessary.
Two versions of the same discussion:
http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf
https://www.linuxjournal.com/article/8212
x86 was, is, and will likely remain nowhere near as aggressive as was Alpha.
>> There was a wonderfully subtle bug a while back, with two adjacent
>> variables were being accessed in some concurrent code, and the
>> variables were getting torn because of that adjacency. Fun with
>> granularity.
>>
>
> Cache line tearing?
> I had the misfortune to suffer from such a thing on much smaller devices.
> But in my case it was totally my own fault, because the architecture
> explicitly stated there was no cache coherence between CPU and I/O bus masters.
See the discussion of /ALIGNMENT and /GRANULARITY here:
http://h30266.www3.hpe.com/odl/i64os/opsys/vmsos84/5841/5841pro_075.html
VAX had a version of this, around tearing and natural alignment.
http://www.itec.suny.edu/scsys/vms/ovmsdoc072/72final/6493/6101pro_007.html
But again, VAX was far less aggressive than Alpha.
>> Pragmatically, there's not a whole lot of difference between supporting
>> 2 processors and supporting 8 processors and supporting 16, and 32, etc.
>
> I have to disagree.
> You need a minimum of 4 processors in order to just illustrate a
> difference between iAMD64 memory ordering and sequential consistency.
Disagree all you want. From my own code and from what I've worked on
elsewhere, going from one processor to two processors broke a whole lot
of poorly-synchronized code.
Code that's correctly marked as volatile and correctly generated can
still have issues with scaling.
Memory ordering and code generation are certainly aspects of what can
go wrong with OpenVMS app designs, but these are very far from the only
pitfalls that developers encounter.
Old VAX code was often buggy, and more than a little of what's left (on
VAX, or what's still being built with /STANDARD=VAXC) is usually still
buggy.
Pile security considerations atop all this. Security deals with the
sorts of bugs that effectively spread across multiple sites and that
increase in frequency and prevalence, sometimes very quickly, as
differentiated from what usually happens with the more traditional
sorts of bugs, and thus vulnerabilities usually get handled somewhat
differently.
>> Yes, there are cases where the since-deprecated quadword-core-limited
>> API references for processors will have to be changed in the app source
>> code, certainly.
>
> I suppose, you are talking about APIs related to affinity?
No, I'm referring to some of the older code around that called system
service APIs and passed around processors (cores, threads) as quadword
masks; see SYI$_ACTIVE_CPU_MASK et al.
There's code around that assumed 32- or 64-processor configurations
were the limit. Same as the eight-byte-password-hash mess. Parts of
the OpenVMS internals once also used quadwords here, though that's
reportedly been remediated.
>> But it was going from one processor to two processors that tended to
>> expose latent bugs in the app code.
>>
>> More generally, OpenVMS didn't do all that well past about 6 or 8
>> processors, and for many years. Work has been underway for decades to
>> break up system and device locks, but apps and OpenVMS itself have
>> always saturated, and will always saturate, on something.
>>
>> Eventually.
>
> Sounds pessimistic.
Welcome to parallelism. Very few apps scale linearly with cores.
Adding cores often doesn't scale linearly either, particularly with
system designs that seek to provide cache coherency. Programming and
scheduling gets more interesting as memory access becomes non-uniform,
and OpenVMS has been dealing with NUMA designs for a while. Many apps,
not so much. Things get even more interesting when
you're working with a mix of very different types of processors of
different architectures accessing and sharing memory, too. And MPSYNC
state and friends are not at all unusual on larger OpenVMS
multiprocessors.
http://aviral.lab.asu.edu/non-coherent-cache-multi-core-processors/
http://people.ee.duke.edu/~sorin/papers/tr2013-1-coherence.pdf
http://www.archive.ece.cmu.edu/~ece600/lectures/lecture17.pdf
http://www-5.unipv.it/mferretti/cdol/aca/Charts/07-multiprocessors-MF.pdf
etc...