On 2019-02-21 10:15:52 +0000,
already...@yahoo.com said:
> On Wednesday, February 20, 2019 at 8:28:55 PM UTC+2, Stephen Hoffman wrote:
>> On 2019-02-20 14:16:54 +0000,
>> already...@yahoo.com said:
>>
>>> Did VAX have formally (or semi-formally, as x86) specified memory
>>> ordering rules?
>>
>> Yes, and in some detail. Those rules were documented in DEC Standard
>> 32, and in (some versions of) the VAX Architecture Handbook. The
>> latter series of documentation was less detailed, and sometimes far
>> too sparse. A few versions of the VAX handbook were... bookshelf filler.
>>
>> Here's the canonical DEC Standard 32, the VAX architecture reference:
>>
>> https://archive.org/details/bitsavers_decvaxarch32Jan90_36555387
>>
>
> That's a document that John mentioned above:
> [This section will be added by a later ECO.]
> I have no idea what an "ECO" is.
That's DECspeak for "patch", more or less.
An Engineering Change Order (ECO) was a change initiated and
distributed by DEC central engineering, as distinct from a Field Change
Order (FCO), which was a change initiated and distributed by DEC field
service and applied to systems in the field. FCOs usually involved
hardware or sometimes firmware, variously involved site visits and
tools, and tended to get expensive as the number of customer sites
requiring the change increased.
There was once a DEC Dictionary, and a DEC Software Engineering
Handbook (1988, IIRC), so that Digits could keep up on DECjargon and on
DEC processes and procedures.
Oddly enough, various other processors that were retired a
~quarter-century ago also aren't mentioned.
There's a version of the VAX rules of memory references starting around
page 269 here:
http://www.bitsavers.org/pdf/dec/vax/archSpec/EY-3459E-DP_VAX_Architecture_Reference_Manual_1987.pdf
The VAX rules were concisely documented and often rather murky to read,
and little if any of the available developer tooling helped developers
avoid related coding mistakes.
VAX wasn't superscalar, didn't reorder instructions, and didn't reorder
memory references. And most instructions set the condition codes, which
also made things slow.
Rattle around under the following path for some DEC-internal
discussions of where VAX and VMS were both found lacking, and why, and
what was planned, and for what effectively turned into NT...
http://www.bitsavers.org/pdf/dec/prism/
There are issues and problems discussed there that OpenVMS still
doesn't handle very well, too.
>> We probably won't see another major architecture that was as aggressive
>> as was Alpha.
>>
>
> According to my understanding, the implementation of memory ordering on
> EV6 and EV7 (or maybe only in EV7?) was not really weak. Probably
> pretty similar to today's Intel/AMD with exception of non-coherent
> instruction cache.
> But they preferred not to codify the new, stricter behavior in the
> Alpha architecture books.
All sorts of memory reordering and coalescing were permissible with
Alpha, which meant the use of memory barriers was necessary.
Two versions of the same discussion:
http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf
https://www.linuxjournal.com/article/8212
x86 was, is, and will likely remain nowhere near as aggressive as was Alpha.
>> There was a wonderfully subtle bug a while back, with two adjacent
>> variables were being accessed in some concurrent code, and the
>> variables were getting torn because of that adjacency. Fun with
>> granularity.
>>
>
> Cache line tearing?
> I had the misfortune to suffer from such a thing on much smaller devices.
> But in my case it was totally my own fault, because the architecture
> explicitly stated there was no cache coherence between CPU and I/O bus masters.
See the discussion of /ALIGNMENT and /GRANULARITY here:
http://h30266.www3.hpe.com/odl/i64os/opsys/vmsos84/5841/5841pro_075.html
VAX had a version of this, around tearing and natural alignment.
http://www.itec.suny.edu/scsys/vms/ovmsdoc072/72final/6493/6101pro_007.html
But again, VAX was far less aggressive than Alpha.
>> Pragmatically, there's not a whole lot of difference between supporting
>> 2 processors and supporting 8 processors and supporting 16, and 32, etc.
>
> I have to disagree.
> You need a minimum of 4 processors in order to just illustrate a
> difference between iAMD64 memory ordering and sequential consistency.
Disagree all you want. From my own code and from what I've worked on
elsewhere, going from one processor to two processors broke a whole lot
of poorly-synchronized code.
Code that's correctly marked as volatile and correctly generated can
still have issues with scaling.
Memory ordering and code generation are certainly aspects of what can
go wrong with OpenVMS app designs, but these are very far from the only
pitfalls that developers encounter.
Old VAX code was often buggy, and more than a little of what's left (on
VAX, or what's still being built with /STANDARD=VAXC) is usually still
buggy.
Pile security considerations atop all this. Security deals with the
sorts of bugs that effectively spread across multiple sites and that
increase in frequency and prevalence, sometimes very quickly, as
differentiated from what usually happens with the more traditional
sorts of bugs, and thus vulnerabilities usually get handled somewhat
differently.
>> Yes, there are cases where the since-deprecated quadword-core-limited
>> API references for processors will have to be changed in the app source
>> code, certainly.
>
> I suppose, you are talking about APIs related to affinity?
No, I'm referring to some of the older code around that called system
service APIs and passed around processors (cores, threads) as quadword
masks; see SYI$_ACTIVE_CPU_MASK et al.
There's code around that assumed 32- or 64-processor configurations
were the limit. Same as the eight-byte-password-hash mess. Parts of
the OpenVMS internals once also used quadwords here, though that's
reportedly been remediated.
>> But it was going from one processor to two processors that tended to
>> expose latent bugs in the app code.
>>
>> More generally, OpenVMS didn't do all that well past about 6 or 8
>> processors, and for many years. Work has been underway for decades to
>> break up system and device locks, but apps and OpenVMS itself have
>> always saturated, and will always saturate, on something.
>>
>> Eventually.
>
> Sounds pessimistic.
Welcome to parallelism. Very few apps scale linearly with cores.
Adding cores often doesn't scale linearly either, particularly with
system designs that seek to provide cache coherency. Programming and
scheduling gets more interesting as memory access becomes non-uniform,
and OpenVMS has been dealing with NUMA designs for a while. Many apps,
not so much. Things get even more interesting when
you're working with a mix of very different types of processors of
different architectures accessing and sharing memory, too. And MPSYNC
state and friends are not at all unusual on larger OpenVMS
multiprocessors.
http://aviral.lab.asu.edu/non-coherent-cache-multi-core-processors/
http://people.ee.duke.edu/~sorin/papers/tr2013-1-coherence.pdf
http://www.archive.ece.cmu.edu/~ece600/lectures/lecture17.pdf
http://www-5.unipv.it/mferretti/cdol/aca/Charts/07-multiprocessors-MF.pdf
etc...