DCL crashing bug update

Simon Clubley

Jul 24, 2017, 9:50:34 PM

I finally found some time to look in some detail at the DCL bug I found
and have managed to get some firm answers as well as a surprisingly
simple reproducer now that I understand the problem better.

There are also some questions for VSI at the end of this post.

On the plus side, I can't think of a way to turn this into a security
vulnerability because it isn't what I originally thought it was. On the
minus side, it turns out that Unix isn't the only operating system with
an in-band null character (i.e. 0x00) problem. :-)

Basically, if you craft a specific sequence of command lines with lots
of embedded null characters in one of the command lines and inject
those command lines directly into DCL, then you will get a crash on
VMS Alpha v8.4. I don't know if VAX or IA64 is also affected.

Before going any further, here's a sample command procedure which
demonstrates the problem. READ THE NOTES FULLY BEFORE USING.

=======================================================================
$ set verify
$ !
$ ! WARNING: You run this command procedure and use the output from it at
$ ! your own risk! Do NOT run this command procedure or the bug triggering
$ ! sequence on production systems or on any system which is important to you.
$ !
$ ! While exploring this bug on VMS Alpha running under the FreeAXP emulator,
$ ! I experienced multiple crashes during a reboot. I do not know for sure
$ ! if this has anything to do with the triggering of this bug or not so
$ ! you have been warned.
$ !
$ ! ==> To be safe, I recommend you perform a reboot immediately after you
$ ! have finished examining this bug even though I am having a hard time
$ ! seeing how supervisor mode code could mess up the system to this level
$ ! so for now I am assuming this is probably just an emulator artifact.
$ !
$ ! Run this command procedure to generate simon_test.tmp and then do the
$ ! following:
$ !
$ ! $ recall/erase
$ ! $ recall/input=simon_test.tmp
$ ! $ recall/all
$ !
$ ! At this point, DCL crashes on me when using the VMS Alpha v8.4 hobbyist
$ ! distribution.
$ !
$ ! Simon.
$ !
$ set noverify
$ write sys$output "Generating test file"
$
$ open/write outch simon_test.tmp
$ call generate_line "2ab" 1197 0
$ call generate_line "" 1500 66
$ call generate_line "" 1400 67
$
$ close outch
$ write sys$output "Test file generated"
$ exit
$
$ generate_line:
$ subroutine
$ chr[0,7] = f$integer(p3)
$ oline = p1
$ i = 1
$
$ next_char:
$ if i .gt. p2 then $ goto line_complete
$ oline = oline + chr
$ i = i + 1
$ goto next_char
$
$ line_complete:
$ write/symbol outch oline
$ exit
$
$ endsubroutine
=======================================================================

and here is the crash I get with the above sequence:

[snip]

$ set noverify
Generating test file
Test file generated
$ recall/erase
$ recall/input=simon_test.tmp
$ recall/all

[snip first two command lines which are ok]

3

Improperly handled condition, bad stack or no handler specified.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000000000
000000007FFC9CD0
000000007AF96F98
0000000000000012

Register dump:
R0 = 000000007FF9CC30 R1 = 000000007FF9FEA3 R2 = 000000000000000A
R3 = 000000007FF9CC30 R4 = 000000007AEF85F0 R5 = 0000203320200000
R6 = 0000000000000001 R7 = 000000007FFA4F28 R8 = 000000007FF9CDE8
R9 = 000000007FFC9CD1 R10 = 000000007FFA4F28 R11 = 000000007FFCDC18
R12 = 000000007FFCDA98 R13 = 00000000000000FB R14 = 0000000000000000
R15 = 000000000000002F R16 = 000000007FFCEDE2 R17 = 00000000000011CA
R18 = 000000007FFC8CAF R19 = 0000000000000001 R20 = 000000007FFC8CB1
R21 = 000000007AEF84B0 R22 = 000000000000612F R23 = 000000007FF9FEA4
R24 = 000000007FF9FEA6 R25 = 0000000000000001 R26 = 000000007AF97268
R27 = 000000007AEF84B0 R28 = 000000007AF97404 R29 = 000000007FF9CC00
SP = 000000007FF9CC00 PC = 000000007AF96F98 PS = 0000000000000012
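
As a reading aid, the signal arguments above can be labelled with a short
Python sketch. It assumes the usual layout of an access-violation signal
array (condition value, reason mask, faulting virtual address, PC, PS).
The field names and the decode_accvio helper are illustrative only and are
not part of any VMS API.

```python
# Assumed layout of an access-violation signal array; names are illustrative.
ACCVIO_FIELDS = ("condition", "reason_mask", "virtual_address", "pc", "ps")
SS_ACCVIO = 0x0C  # condition value reported as "Name" in the dump


def decode_accvio(count, args):
    """Label the arguments of an access-violation signal array."""
    if count != len(ACCVIO_FIELDS) or len(args) != count:
        raise ValueError("expected a 5-argument signal array")
    if args[0] != SS_ACCVIO:
        raise ValueError("not an access violation")
    return dict(zip(ACCVIO_FIELDS, args))


# Values copied from the "Signal arguments" block above.
signal = decode_accvio(5, [0x0C, 0x0, 0x7FFC9CD0, 0x7AF96F98, 0x12])

if __name__ == "__main__":
    for name, value in signal.items():
        print(f"{name:16} {value:016X}")
```

Read this way, the dump reports an access violation on address 7FFC9CD0
at PC 7AF96F98, which matches the PC shown in the register dump.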

My test payloads were generated using HP Pascal programs because it's
a lot easier to work with binary records with nulls embedded in them
in Pascal than in C. The above command procedure is just a Q&D
command procedure so you can easily explore this bug if you wish.

Note that it's only 0x00 which causes the crash. If you generate a
file with the same structure but with 0x00 replaced with 0x01 then
DCL does not crash (although it doesn't show the command line with
the 0x01 characters embedded in it).
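
For anyone who wants to inspect the payload away from VMS, the three
records can be modelled in a few lines of Python. This is a sketch under
assumptions, not the Pascal programs mentioned above: build_records and
its fill parameter are illustrative names, and the sketch only builds the
record contents in memory. Writing them out as RMS variable-length records,
which is what write/symbol does on VMS, is not attempted here.

```python
def build_records(fill=0x00):
    """Model the three records written by the command procedure.

    `fill` replaces the 0x00 filler of the first record, so passing 0x01
    gives the variant that is reported not to crash DCL.
    """
    return [
        b"2ab" + bytes([fill]) * 1197,  # "2ab" followed by 1197 filler bytes
        bytes([66]) * 1500,             # 1500 x "B" (decimal 66)
        bytes([67]) * 1400,             # 1400 x "C" (decimal 67)
    ]


if __name__ == "__main__":
    for label, fill in (("0x00", 0x00), ("0x01", 0x01)):
        records = build_records(fill)
        print(label, [len(r) for r in records], records[0][:6])
```

The first record is the only one that differs between the crashing and
non-crashing variants; the second and third are plain printable text.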

The DCL command history buffer is a fixed size wrap-around buffer.
Originally, I thought this was a problem with my fuzzing data causing
DCL to get confused about where the start of the buffer is[*], but my
attempts to embed a valid internal representation of a DCL command
within another larger command always resulted in DCL displaying the
larger command correctly (and ignoring the embedded command) so it
became clear that wasn't the problem.

[*] I quickly found the longword which identifies the next free byte
to write to within the buffer (it's immediately prior to the buffer
data itself) but I couldn't find anything which identified the offset
to the first command within the history buffer area. This is why
I originally thought DCL might be scanning for all the 0x00 characters
which are immediately prior to the first command when the first command
is not at the start of the buffer.

If you want to examine the command history buffer from another process
then running SDA, setting the target PID you want to examine and then
using the following command will get you started:

SDA> examine CTL$AG_CLIDATA;1400

Also note that if you increase the first record from 1200 to 1300 bytes,
the buffer wraps around. If that first record is all nulls, the trailing
line length field is not removed from the command history when the first
record is purged from the history. This does not occur if 0x01 is used
instead of 0x00 (for example).
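
The size of the history buffer is not stated anywhere in this post, but
the two first-record sizes give a rough bracket. The sketch below only adds
up payload bytes. It assumes the second and third records stay at 1500 and
1400 bytes, and it ignores any per-record overhead, since that overhead is
not given here. payload_total is an illustrative name.

```python
def payload_total(first_record_len, other_lens=(1500, 1400)):
    """Sum of record payload bytes; per-record overhead is not counted."""
    return first_record_len + sum(other_lens)


if __name__ == "__main__":
    for first in (1200, 1300):
        print(first, payload_total(first))
```

That gives 4100 payload bytes for the original test file and 4200 for the
variant that wraps, which suggests the wrap point lies somewhere around
those totals once the unknown overhead is included.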

My questions for VSI are these:

1) I've been concerned for some time that new functionality added to
VMS may allow existing functionality to be manipulated in ways that
could cause problems, because the new functionality was never considered
as part of the original design and security audit process.

DCL clearly has code which assumes it is never going to see a stream
of 0x00 characters because you could probably never get those characters
into the history buffer via the terminal driver. However, $ recall/input
now gives you the ability to directly inject any characters you choose
directly into the DCL command history buffer.

When adding new functionality to VMS, how much work is done to evaluate
the original design assumptions in light of any new functionality ?

2) While this specific problem turned out not to be an actual vector
for a security exploit, does this same style of buffer management
exist elsewhere in VMS and in a form which could cause problems if
the utility/command allows direct injection of command history ?

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Simon Clubley

Jul 24, 2017, 10:02:22 PM

On 2017-07-24, Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
> $ !
> $ ! While exploring this bug on VMS Alpha running under the FreeAXP emulator,
> $ ! I experienced multiple crashes during a reboot. I do not know for sure
> $ ! if this has anything to do with the triggering of this bug or not so
> $ ! you have been warned.
> $ !
> $ ! ==> To be safe, I recommend you perform a reboot immediately after you
> $ ! have finished examining this bug even though I am having a hard time
> $ ! seeing how supervisor mode code could mess up the system to this level
> $ ! so for now I am assuming this is probably just an emulator artifact.
> $ !

About those crashes. Here's one that occurred this weekend during
a reboot:

============================================================================
Avanti System Machine Check Through Vector 00000660
logout frame address 0x6048 code 0x10000020c

IPRs:
EXC_ADD:0000000000110904 ICCSR: 000044F800000004 HIER: 0000000000000080
HIRR: 0000000000001040 MM_CSR: 00000000000051C0 DC_STAT:0000000000000007
DC_ADDR:0000000000000000 BIU_STAT:00000000000000C1 BIU_ADD:0000000045210000
FILL_SY:0000000000000000 FILL_ADD:0000000000000000 VA: 0000000000106C98
EXC_SUM:0000000000000000 BC_TAG: 0000000000000000
EDSR (Comanche): 00002008--> ,nxm error
DCSR ( Epic): 8000001c-->
SEAR ( SysAddr): 00000000
PEAR ( PciAddr): 00000000
============================================================================

The above repeated itself over and over and I eventually had to
shut down FreeAXP because the console was not responding.

The next one was a VMS crash which occurred a couple of weeks ago when
I last looked at this problem.

============================================================================
Crashdump Summary Information:
------------------------------
Crash Time: 9-JUL-2017 12:02:55.00
Bugcheck Type: MACHINECHK, Machine check while in kernel mode
Node: AXPA (Standalone)
CPU Type:
VMS Version: V8.4
Current Process: NULL
Current Image: <not available>
Failing PC: FFFFFFFF.800157A4 EXE$SYSTEM_CORRECTED_ERROR_C+00154
Failing PS: 00000000.00001F04
Module: SYS$CPU_ROUTINES_0D02 (Link Date/Time: 31-MAR-2010 02:39:07.40)
Offset: 000017A4

Boot Time: 17-NOV-1858 00:00:00.00
System Uptime: 0 00:07:09.49
Crash/Primary CPU: 0./0.
System/CPU Type: 0D02
Saved Processes: 0
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 128 MByte (16384 PFNs, contiguous memory)
Dumpfile Pagelets: 13498 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,bugdump
Paging Files: 0 Pagefiles and 0 Swapfiles installed

Stack Pointers:
KSP = FFFFFFFF.831FBF00 ESP = FFFFFFFF.831FD000 SSP = FFFFFFFF.831F7000
USP = FFFFFFFF.831F7000

General Registers:
R0 = 00000000.0000004B R1 = 00000000.00000210 R2 = FFFFFFFF.8183EE38
R3 = FFFFFFFF.8183F290 R4 = 00000000.00000048 R5 = 00000000.00001F04
R6 = FFFFFFFF.81C16000 R7 = 00000000.00000800 R8 = 00000000.00000000
R9 = 00000000.2009A000 R10 = FFFFFFFF.81C172D0 R11 = FFFFFFFF.8300E000
R12 = FFFFFFFF.81C16E40 R13 = FFFFFFFF.81C16000 R14 = FFFFFFFF.8325ADE8
R15 = FFFFFFFF.81808000 R16 = 00000000.00000215 R17 = 00000000.00004000
R18 = 00000000.00000002 R19 = 00000000.00000002 R20 = 00000000.00000004
R21 = 81C12080.00000000 R22 = 00000000.00000000 R23 = FFFFFFFF.8180A0F8
R24 = 00000000.00000000 AI = 00000000.00000001 RA = FFFFFFFF.80015708
PV = FFFFFFFF.818C07D8 R28 = 00000000.00000000 FP = FFFFFFFF.831FBF00
PC = FFFFFFFF.800157A8 PS = 00000000.00001F04
============================================================================

As already mentioned, I don't know if this is caused by the DCL crash
or (more likely) is an artifact of the emulator itself.

clairg...@gmail.com

Jul 25, 2017, 6:38:04 AM

I may be a little confused here but... I executed your command procedure and then the recall commands and got a process crash but the system is still up and running. Are you saying you got a system crash?

BTW: I am on an i2 running 8.4-2L1.

Clair

abrsvc

Jul 25, 2017, 6:58:39 AM

I think that the key point here may have been missed in the comments:

"While exploring this bug on VMS Alpha running under the FreeAXP emulator"

I plan on trying this on a real Alpha this week to confirm. I too would expect a process crash not a system issue.

Dan

Volker Halle

Jul 25, 2017, 7:07:03 AM

Simon,

I can easily reproduce the process crash on OpenVMS Alpha V8.2 running on a CHARON-AXP A1000 emulated Alpha.

The failing instruction is inside DCL:

SDA> exa/ins 7AF98598
DCL+8A598: LDQ_U R21,(R9)

with R9 pointing to an invalid address

Improperly handled condition, bad stack or no handler specified.
Signal arguments: Number = 0000000000000005

Name = 000000000000000C <<< Access violation
0000000000000000
000000007FFC9C78 <<< invalid address
000000007AF98598 <<< failing PC
0000000000000012

Register dump:
R0 = 000000007FF9CC20 R1 = 000000007AF0E1E0 R2 = 000000000000000A
R3 = 000000007FF9CC20 R4 = 000000007AF0E320 R5 = 0000203320205324
R6 = 0000000000000001 R7 = 000000007FF9CD3C R8 = 000000007FFA4F28
R9 = 000000007FFC9C79<<< R10 = 000000007FFA4F28 R11 = 000000007FFCDBE8
R12 = 000000007FFCDA68 R13 = FFFFFFFFFB03C08C R14 = 00000000000000FB
R15 = 000000000000002F R16 = 000000007FFCED8A R17 = 00000000000011A2
R18 = 000000007FFC8C57 R19 = 0000000000000001 R20 = 000000007FFCED88
R21 = 000000007FFC8C59 R22 = 000000007FFCED88 R23 = 000000007FF9FEA4
R24 = 000000000000612F R25 = 0000003400000000 R26 = 000000007AF98820
R27 = 000000007AF0E1E0 R28 = 000000007AF989AC R29 = 000000007FF9CBF0
SP = 000000007FF9CBF0 PC = 000000007AF98598 PS = 3000000000000012
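
One detail worth spelling out is why the faulting address is one byte lower
than R9. LDQ_U is the Alpha unaligned quadword load, and it ignores the low
three bits of the computed address. The Python sketch below checks that
against both dumps in this thread. ldq_u_address is an illustrative helper
name, and applying it to the V8.4 dump assumes that crash is at the
equivalent LDQ_U instruction.

```python
def ldq_u_address(register_value):
    """Address actually accessed by LDQ_U: the low three bits are ignored."""
    return register_value & ~0x7


# (R9 from the register dump, faulting address from the signal array)
dumps = {
    "OpenVMS Alpha V8.2 (CHARON-AXP)": (0x7FFC9C79, 0x7FFC9C78),
    "VMS Alpha V8.4 (FreeAXP)": (0x7FFC9CD1, 0x7FFC9CD0),
}

if __name__ == "__main__":
    for system, (r9, fault_va) in dumps.items():
        accessed = ldq_u_address(r9)
        print(system, hex(accessed), accessed == fault_va)
```

In both cases the masked R9 equals the reported invalid address, so the two
dumps are consistent with the same failure.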

The MACHINECHK Crash you've seen is most likely NOT related to this Problem.
A MACHINECHK Crash is typically a 'hardware' Problem. In this case (running on the FreeAXP Emulator), it's most likely an Emulator Software Problem.

Did you get a crashdump written ? Did you extract the errlog Information from the Crash (using SDA> CLUE ERRLOG) and analyze CLUE$ERRLOG.SYS with DECevent ? This would tell you more about the underlying reason for the MACHINECHK.

Volker.

Jan-Erik Soderholm

Jul 25, 2017, 7:09:03 AM

I think Simon has talked about a "crash in DCL" *in the specific process*
all the time. Not any system related crash. The issue was if it was
possible to execute something in that process while in SUP mode...

Volker Halle

Jul 25, 2017, 7:20:55 AM

Am Dienstag, 25. Juli 2017 03:50:34 UTC+2 schrieb Simon Clubley:
> .... I don't know if VAX or IA64 is also affected.

OpenVMS VAX (V7.3) is not affected in the same way, as it gets %DCL-W-BUFOVF errors, when trying to create the SIMON_TEST.TMP file.

Volker.

Volker Halle

Jul 25, 2017, 7:41:39 AM

The DCL process crash happens in [DCL]RECALLSUB GLOBAL Routine dcl$put_segment in the following instruction stream:

SDA> exa/ins 7AF98598-20;20
DCL+8A578: STQ FP,#X0018(SP)
DCL+8A57C: BIS R31,SP,FP
DCL+8A580: BIS R31,R27,R1
DCL+8A584: LDL R9,#XFFD8(R10)
DCL+8A588: ADDL R9,#X02,R9
DCL+8A58C: SUBL R9,R18,R18
DCL+8A590: CMPULT R9,R16,R19
DCL+8A594: CMOVEQ R19,R18,R9
DCL+8A598: LDQ_U R21,(R9) <<< ACCVIO here
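
The arithmetic in that instruction stream can be replayed in Python. This
is a sketch under stated assumptions: it models only the five instructions
from DCL+8A584 to DCL+8A594, and two of its inputs are not in the thread.
The longword loaded from -40(R10) and the value of R18 on entry are
back-derived here (0x7FFC9C77 and 0x1022) from the register dump posted
earlier for this system, assuming that dump belongs to the same crash.
R16 is taken from that dump unchanged. step_through is an illustrative name.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32(value):
    """Sign-extend the low 32 bits to 64 bits, as Alpha longword ops do."""
    value &= MASK32
    return value | (MASK64 ^ MASK32) if value & 0x80000000 else value


def step_through(loaded, r16, r18):
    """Replay DCL+8A584 .. DCL+8A594 and return (r9, r18, r19)."""
    r9 = sext32(loaded)          # LDL    R9,#XFFD8(R10)
    r9 = sext32(r9 + 2)          # ADDL   R9,#X02,R9
    r18 = sext32(r9 - r18)       # SUBL   R9,R18,R18
    r19 = 1 if r9 < r16 else 0   # CMPULT R9,R16,R19 (unsigned compare)
    if r19 == 0:                 # CMOVEQ R19,R18,R9
        r9 = r18
    return r9, r18, r19


if __name__ == "__main__":
    # loaded and r18 are inferred values, not taken from the thread.
    r9, r18, r19 = step_through(loaded=0x7FFC9C77, r16=0x7FFCED8A, r18=0x1022)
    print(f"R9={r9:016X} R18={r18:016X} R19={r19}")
```

With those assumed inputs the sketch ends with R9 = 7FFC9C79, R18 = 7FFC8C57
and R19 = 1, the same values as in the register dump, so the conditional
move is not taken and the LDQ_U uses the loaded value plus 2. The sequence
reads like a pointer adjustment with a range check, but that is an
interpretation, not something the listing states.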

I'll leave it to the current DCL maintainer to debug and fix this 'artificially created problem'.

Volker.

VAXman-

Jul 25, 2017, 8:53:41 AM

Proving the age old axiom: garbage in; garbage out.

I'll have a look at this in a bit.
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.

dgordo...@gmail.com

Jul 25, 2017, 9:27:57 AM

On Tuesday, July 25, 2017 at 7:41:39 AM UTC-4, Volker Halle wrote:

>
> I'll leave it to the current DCL maintainer to debug and fix this 'artificially created problem'.
>
> Volker.

Noted. (Although I don't claim to maintain it, I'm just the most likely person to look at it.)

WaysIKnowToCrashDCL++;

Stephen Hoffman

Jul 25, 2017, 11:05:15 AM

On 2017-07-25 01:46:31 +0000, Simon Clubley said:

> On the plus side, I can't think of a way to turn this into a security
> vulnerability because it isn't what I originally thought it was. On the
> minus side, it turns out that Unix isn't the only operating system with
> an in-band null character (ie: 0x00) problem. :-)

This is also an example of multiple disjoint parser implementations
processing related data, and of problems that arise when a parser fails
when reading untrusted data. In the present and likely future
environment, more than a little data is untrusted. This includes
files, whole volumes, network connections, and suchlike. Similar
disjoint parser implementations have nailed other platforms. As have
problems in parsing routines presented with data that wasn't quite
entirely as was expected.

Crashes are always interesting, whether as an attacker or as a defender.
Whether this local denial-of-service case is exploitable? Donno.
Probable worst case here would be a local privilege escalation, and
that seems somewhat unlikely.

--
Pure Personal Opinion | HoffmanLabs LLC

VAXman-

Jul 25, 2017, 11:37:22 AM

FWIW, this is an error within the population of the DCL recall command buffer;
not in the displaying of it.

David Froble

Jul 25, 2017, 12:30:01 PM

If you trap the condition, the proper message might be "don't be an idiot".

:-)

David Froble

Jul 25, 2017, 12:32:45 PM

I think Simon wrote:

"About those crashes. Here's one that occurred this weekend during a reboot:"

So, most likely two separate things.

Simon Clubley

Jul 25, 2017, 3:12:41 PM

On 2017-07-25, clair...@vmssoftware.com <clairg...@gmail.com> wrote:
>
> I may be a little confused here but........I executed your command
> procedure and then the recall commands and got a process crash but the
> system is still up and running. Are you saying you got a system crash?
>

I may have confused you by using the word crash to describe both
a process level crash and a system level crash. Sorry.

The direct result of running the recall commands with the test data
is just a process level crash and a register dump with the rest of
the system _apparently_ unaffected after the process was terminated.

A process level crash only was the expected result and the rest of
the system apparently continued running ok afterwards.

What was _completely_ unexpected was that I later rebooted the system
for some reason or other and the system crashed during the reboot.

The reboot command is the standard one in the SYSTEM login.com:

$ REBOOT == "@SYS$SYSTEM:SHUTDOWN 0 SHUTDOWN NO YES LATER YES NONE"

I don't normally reboot this system; I normally just do a straight boot
and then shut down VMS when I've finished. As such, I don't know if there
is a bug in the version of FreeAXP I am using which can cause a crash
during a reboot.

So far, I have only ever seen a system crash during the reboot process
after I have run my recall tests since the last boot and I only see
those infrequently. The examples I posted earlier are from crashes which
occurred after using the reboot command while logged into SYSTEM.

Given the infrequent and random nature of the system level crashes,
I am going to assume this is merely some emulator artifact until
someone reports they have caused a system crash while rebooting their
physical Alpha system.

> BTW: I am on an i2 running 8.4-2L1.
>

So it affects IA64 as well. Thanks for the report.

Given that it affects IA64, should I notify HPE or is this something
you can pass along to them ?

I'm not fully sure what would be an appropriate email address to
use to make sure HPE's VMS Engineering got the problem report.

It's not a security issue so using the HPE security reporting
mechanism would be inappropriate.

Thanks,

Simon Clubley

Jul 25, 2017, 4:08:50 PM

On 2017-07-25, Volker Halle <volker...@hotmail.com> wrote:
> Simon,
>
> I can easily reproduce the process crash on OpenVMS Alpha V8.2 running on a
> CHARON-AXP A1000 emulated Alpha.
>

Yet another data point, thanks.

>
> The MACHINECHK Crash you've seen is most likely NOT related to this Problem.

I agree. I think it's some emulator artifact.

> A MACHINECHK Crash is typically a 'hardware' Problem. In this case (running
> on the FreeAXP Emulator), it's most likely an Emulator Software Problem.
>
> Did you get a crashdump written ? Did you extract the errlog Information from
> the Crash (using SDA> CLUE ERRLOG) and analyze CLUE$ERRLOG.SYS with DECevent ?
> This would tell you more about the underlying reason for the MACHINECHK.
>

I don't really want to faff around with installing DECevent for what
is almost certainly an emulation problem. I did do a:

anal/error/elv translate clue$errlog.sys/output=a.lis

and the results are below in case they tell you something.

Output file SYS$SYSROOT:[SYSMGR]A.LIS;1 created at 25-JUL-2017 20:08:14.95

Output for SYS$SYSROOT:[SYSMGR]CLUE$ERRLOG.SYS;1

EVENT EVENT_TYPE_____________________________ TIMESTAMP______________ NODE__ EVENT_CLASS____________________________
1 Machine Check 670 - Processor UCE 9-JUL-2017 12:02:55.00 AXPA MACHINE_CHECKS

DESCRIPTION__________________________________ RANGE___ VALUE_____________ TRANSLATED_VALUE_______________________
Hardware Architecture 4 Alpha
Hardware System Type 13
Logging CPU 0
Number of CPU's in Active Set 1
System Marketing Model 1152 AlphaStation 400 4/233
Seconds Since Boot 0
Chip Type 2 EV4 (21064)
Error Sequence Number 0
Operating System Version V8.4

EVENT EVENT_TYPE_____________________________ TIMESTAMP______________ NODE__ EVENT_CLASS____________________________
2 Crash Restart 9-JUL-2017 12:02:55.00 AXPA BUGCHECKS

DESCRIPTION__________________________________ RANGE___ VALUE_____________ TRANSLATED_VALUE_______________________
Hardware Architecture 4 Alpha
Hardware System Type 13
Logging CPU 0
Number of CPU's in Active Set 0
System Marketing Model 1152 AlphaStation 400 4/233
Seconds Since Boot 0
Chip Type 2 EV4 (21064)
Error Sequence Number 1
Operating System Version V8.4

Kernel Stack Pointer 0xFFFFFFFF831FBF00
Executive Stack Pointer 0xFFFFFFFF831FD000
Supervisor Stack Pointer 0xFFFFFFFF831F7000
User Stack Pointer 0xFFFFFFFF831F7000
Register R0 0x000000000000004B
Register R1 0x0000000000000210
Register R2 0xFFFFFFFF8183EE38
Register R3 0xFFFFFFFF8183F290
Register R4 0x0000000000000048
Register R5 0x0000000000001F04
Register R6 0xFFFFFFFF81C16000
Register R7 0x0000000000000800
Register R8 0x0000000000000000
Register R9 0x000000002009A000
Register R10 0xFFFFFFFF81C172D0
Register R11 0xFFFFFFFF8300E000
Register R12 0xFFFFFFFF81C16E40
Register R13 0xFFFFFFFF81C16000
Register R14 0xFFFFFFFF8325ADE8
Register R15 0xFFFFFFFF81808000
Register R16 0x0000000000000215
Register R17 0x0000000000004000
Register R18 0x0000000000000002
Register R19 0x0000000000000002
Register R20 0x0000000000000004
Register R21 0x81C1208000000000
Register R22 0x0000000000000000
Register R23 0xFFFFFFFF8180A0F8
Register R24 0x0000000000000000
Register R25 0x0000000000000001
Register R26 0xFFFFFFFF80015708
Register R27 0xFFFFFFFF818C07D8
Register R28 0x0000000000000000
Frame Pointer 0xFFFFFFFF831FBF00
Current Stack Pointer 0xFFFFFFFF831FBF00
Program Counter 0xFFFFFFFF800157A8
Processor Status <63:00>: 0x0000000000001F04
Interrupt Pending <02>: 0x1
Current Mode <04:03>: 0x0 Kernel
Interrupt Priority Level (IPL) 0x1F
Stack Alignment <61:56>: 0x00
Page Table Base Register (PTBR) 0x00000000000000F2
Privileged Context Block Base (PCBB) 0x0000000001016080
Processor Base Register (PRBR) 0xFFFFFFFF81C16000
Virtual Page Table Base (VPTB) 0xFFFFFEFC00000000
System Control Block Base (SCBB) 0x000000000000096B
Software Interrupt Summary (SISR) <63:00>: 0x0000000000000140
IPL 6 Interrupt Pending
IPL 8 Interrupt Pending
Address Space Number (ASN) 0
AST Enable/AST Summary (ASTEN/ASTSR) <63:00>: 0x0000000000000000
Floating-Point Enable (FEN) <63:00>: 0x0000000000000000
Interrupt Priority Level (IPL) 31
Machine Check Error Summary (MCES) <63:00>: 0x0000000000000000
Bugcheck/Crash Code <31:00>: 0x00000215
Reboot Type <00>: 0x1 COLD
Severity <02>: 0x1 FATAL
Type <31:03>: 0x00000042 MACHINECHK
Current Process ID 0x00010000
Current Process Name .NULL

ERROR_LOG_SUMMARY______________________________________________________

Total number of events: 2
Number of the first event: 1
Number of the last event: 2
Earliest event occurred: 9-JUL-2017 12:02:55.00
Latest event occurred: 9-JUL-2017 12:02:55.00
Number of events by event class:
BUGCHECKS 1
MACHINE_CHECKS 1

Simon Clubley

Jul 25, 2017, 4:16:12 PM

On 2017-07-25, Volker Halle <volker...@hotmail.com> wrote:

You are correct that it is an artificially created problem.

Unfortunately, those are also the types of problems that have a habit
of exposing security issues because someone didn't anticipate some
particular input sequence during the design process.

Simon Clubley

Jul 25, 2017, 4:22:11 PM

On 2017-07-25, David Froble <da...@tsoft-inc.com> wrote:
>
> I think Simon wrote:
>
> "About those crashes. Here's one that occurred this weekend during a reboot:"
>
> So, most likely two separate things.

My thinking was that maybe DCL crashing did some undetected latent
damage which didn't show up until the reboot.

However, overall I find that very unlikely for what is supervisor mode
code; it's far more likely to be an emulation issue.

dgordo...@gmail.com

Jul 25, 2017, 6:01:54 PM

On Tuesday, July 25, 2017 at 4:22:11 PM UTC-4, Simon Clubley wrote:

> On 2017-07-25, David Froble wrote:
> >
> > I think Simon wrote:
> >
> > "About those crashes. Here's one that occurred this weekend during a reboot:"
> >
> > So, most likely two separate things.
>
> My thinking was that maybe DCL crashing did some undetected latent
> damage which didn't show up until the reboot.

Machine Check is a hardware error. Or in your case, emulator.

>
> However, overall I find that very unlikely for what is supervisor mode
> code; it's far more likely to be an emulation issue.

Yes.

I've looked into the recall buffer code. I know what the issue is. I've opened an internal bug report, assigned to me.

To this question:

> When adding new functionality to VMS, how much work is done to evaluate
> the original design assumptions in light of any new functionality ?

/INPUT and /OUTPUT were added in 1993 - 3 years before I originally joined VMS Engineering. "New" is relative.

Simon Clubley

Jul 26, 2017, 7:29:01 PM

On 2017-07-25, dgordo...@gmail.com <dgordo...@gmail.com> wrote:
> On Tuesday, July 25, 2017 at 4:22:11 PM UTC-4, Simon Clubley wrote:
>
> I've looked into the recall buffer code. I know what the issue is. I've
> opened an internal bug report, assigned to me.
>

Excellent. :-)

Are you able to tell HPE about it so they can work on a fix in their own
version of the code base ?

If you want me to report it to them directly, can someone associated
with HPE please tell me the best email address to use to contact HPE
VMS Engineering directly ?

It's not a security issue so using the HPE security reporting mechanism
would be inappropriate.

Thanks.

> To this question:
>
>> When adding new functionality to VMS, how much work is done to evaluate
>> the original design assumptions in light of any new functionality ?
>
> /INPUT and /OUTPUT were added in 1993 - 3 years before I originally joined VMS Engineering. "New" is relative.

It was more a general question about how much re-evaluation of any
original design assumptions is done when a new feature is added.

clairg...@gmail.com

Jul 26, 2017, 8:58:53 PM

OK, I'll give this a shot. Whenever a code change is proposed, however big or small, bug fix or new feature, we try to evaluate everything that could be affected. Original intent (design) is a factor but the bottom line is how the existing code gets changed. People with relevant expertise review code and discuss. Sounds simplistic but that's what happens, has for years. Obviously, every situation is different but generally that's the way it works. Also, everything is reviewed by the test engineers to see if existing tests or new tests are needed to properly exercise the changes.

Stephen Hoffman

Jul 27, 2017, 10:40:32 AM

On 2017-07-27 00:58:51 +0000, clair...@vmssoftware.com said:

> OK, I'll give this a shot. Whenever a code change is proposed, however
> big or small, bug fix or new feature, we try to evaluate everything
> that could be affected. Original intent (design) is a factor but the
> bottom line is how the existing code gets changed. People with relevant
> expertise review code and discuss. Sounds simplistic but that's what
> happens, has for years. Obviously, every situation is different but
> generally that's the way it works. Also, everything is reviewed by the
> test engineers to see if existing tests or new tests are needed to
> properly exercise the changes.

There's a corollary to this design and review process, too.

Years ago, we were dealing with the sorts of shenanigans arising just
from local and locally-connected folks, and most servers had staff
assigned. Our servers ran code written locally, or purchased from
vendors or third-parties. Our servers communicated locally with other
servers we controlled, over a network we controlled. Then we got
dial-up. Our servers started connecting remotely. The number of
folks accessing our servers got larger, and we were on the receiving
end of war-dialing. Which led to some of us using the "system"
system-wide password mechanism, too. Now we have planetary-scale
access to many of our servers, whether that's directly or
indirectly, and with far fewer staff covering far more servers. Our
servers are parts of massively-connected and networked systems. We're
increasingly running code from open-source archives and pre-built
images from repositories, and it's routine to be running code on
various of our local networked systems that has been loaded from remote
servers and variously from servers we don't control. Some of the
networks we're connecting to or through and even some of the servers
we're accessing can be accessible to attackers. And botnets with
massive resources are continuously probing and testing our security,
with ever more sophisticated networking and reversing
tools available.

For some of us and what we're doing with our servers, we're already in
a position where we can't trust what's running on our servers, and
we're increasingly in an environment when we can't trust ourselves and
what's in our source code repositories and our installed system and
application configurations, and that's an even larger shift. Where we
have to design some of our own systems such that we can't get into them.

All code and all designs that haven't been reviewed in response to
these changes are suspect. Code that hasn't been fuzzed is suspect,
and system and application installations that aren't or can't be
cryptographically verified are all suspect. Why? Because somebody
is inevitably going to review and fuzz the code. If that's not us
reviewing our own code looking for these latent assumptions and these
now-mistakes, and verifying that our hardware and software and network
environments are as intended, well, that can be a Bad Day for us.
Because attackers don't care what combination of vulnerabilities they
chain together — some down-revision printer exploit gets the attacker
network access, network access sniffs SCS packets or private DECnet or
telnet traffic or ARP spoofs and MITM's some HTTP traffic, and those
unencrypted packets expose the known-weak Purdy password hash (yes, VSI
is reportedly addressing that) or cleartext credentials or some other
authentication, and....

I routinely end up working in source code that's ten and twenty years
and variously older. Code that was well-designed and that's been
stable for a decade or two can now be operating in system and network
environments that it was never designed nor intended nor tested for.
Some other code was vulnerable as originally designed, but was
supposedly isolated and all it takes is one bad coffee pot connected on
the supposedly-still isolated network.

Beyond design reviews and testing reviews and source code reviews and
fuzzing, we all certainly remember to actively review the state of our
supposedly-isolated and secure private networks, right? We've all
checked our switch and firewall firmware revisions? For our own
access control and badge readers, we've all retired and replaced any
125 kHz or Mifare NFC hardware, right?

Environments and vulnerabilities and risks change. So must our
responses and our implementations. So must our designs, reviews and
testing.

Simon Clubley

Jul 30, 2017, 4:09:18 PM

On 2017-07-25, Volker Halle <volker...@hotmail.com> wrote:

I installed a bare-bones VAX/VMS V7.3 system under SIMH and modified
the command procedure to work with VAX/VMS. VAX isn't affected due
to the following error:

$ reca/era
$ reca/input=simon_test.tmp
%RMS-W-RTB, 0 byte record too large for user's buffer
$

dgordo...@gmail.com

Jul 31, 2017, 10:08:05 AM

On Sunday, July 30, 2017 at 4:09:18 PM UTC-4, Simon Clubley wrote:

> I installed a bare-bones VAX/VMS V7.3 system under SIMH and modified
> the command procedure to work with VAX/VMS. VAX isn't affected due
> to the following error:
>
> $ reca/era
> $ reca/input=simon_test.tmp
> %RMS-W-RTB, 0 byte record too large for user's buffer
> $
>

You'll get the same on Alpha if any record in the input file exceeds 4096 bytes.

dgordo...@gmail.com

Aug 9, 2017, 12:12:01 PM

On Tuesday, July 25, 2017 at 9:27:57 AM UTC-4, Some Dude from VSI wrote:

> >
> > I'll leave it to the current DCL maintainer to debug and fix this 'artificially created problem'.
> >
> > Volker.
>
> Noted. (Although I don't claim to maintain it, I'm just the most likely person to look at it.)
>
> WaysIKnowToCrashDCL++;

I've coded up changes to the RECALL code to fix the issue. The interesting thing is that after using Simon's file to load up the recall buffer, three up-arrows will successfully recall the three "commands" with no ill effects, nor does a fourth cause a process crash. Going into SDA, CLUE PROCESS/RECALL also displays the three "commands" with no trouble.

DCL expects a single NUL character to terminate the command in the work buffer. Try the following:

$ X[0,32]=0
$ Y = "This is a null ''X' test"
$ SHOW SYMBOL Y

Need to do some more testing including on Alpha.
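
To make the hazard concrete, here is a minimal Python model of the general problem. It is not DCL's actual code, and the helper names `read_terminated` and `read_counted` are made up for this sketch. It contrasts a consumer that scans for the first NUL with one that trusts a separately stored length.

```python
def read_terminated(buf: bytes) -> bytes:
    """Return everything up to (not including) the first NUL byte."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


def read_counted(buf: bytes, length: int) -> bytes:
    """Return exactly `length` bytes, embedded NULs included."""
    return buf[:length]


# A command with a NUL in the middle, followed by the terminating NUL
# that a NUL-terminated work buffer would carry (assumed layout).
command = b"This is a null \x00 test"
work_buffer = command + b"\x00"

terminated = read_terminated(work_buffer)       # stops at the embedded NUL
counted = read_counted(work_buffer, len(command))  # keeps the whole command
```

The terminated reader silently loses everything after the embedded NUL, while the counted reader sees the full text. Any code path that mixes the two views of the same buffer can end up disagreeing about where the command ends.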

Simon Clubley

Aug 9, 2017, 1:38:07 PM

On 2017-08-09, dgordo...@gmail.com <dgordo...@gmail.com> wrote:
>
> I've coded up changes to the RECALL code to fix the issue. The interesting thing is that after using Simon's file to load up the recall buffer, three up-arrows will successfully recall the three "commands" with no ill effects, nor does a fourth cause a process crash. Going into SDA, CLUE PROCESS/RECALL also displays the three "commands" with no trouble.
>
> DCL expects a single NUL character to terminate the command in the work buffer. Try the following:
>
> $ X[0,32]=0
> $ Y = "This is a null ''X' test"
> $ SHOW SYMBOL Y
>

[Tested on Eisner, Alpha v8.3]

Now that _is_ interesting, especially given that when DCL was being
designed, someone went to the trouble of storing the length of the
command when the command was placed into the recall buffer.

dgordo...@gmail.com

Aug 9, 2017, 1:49:31 PM

On Wednesday, August 9, 2017 at 1:38:07 PM UTC-4, Simon Clubley wrote:

> On 2017-08-09, Simon wrote:

> Now that _is_ interesting, especially given that when DCL was being
> designed, someone went to the trouble of storing the length of the
> command when the command was placed into the recall buffer.
>

Don't confuse the DCL work buffer with the recall buffer. The recall buffer is structured as:

word-of-zero word-of-length command-text word-of-length [...]

This allows the recall buffer to be read forward (RECALL/OUT or down arrow) or backward (up arrow or RECALL/ALL).

The recall buffer layout has changed slightly over the years to accommodate larger commands.
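
As a rough illustration of why a length word on both sides of the text allows traversal in either direction, here is a small Python model. It is a sketch under assumptions, not the real DCL layout: words are taken to be 16-bit little-endian, every entry is assumed to carry its own leading and trailing length word, the leading zero word is assumed to be what stops a backward read, and the function names are hypothetical.

```python
import struct

WORD = struct.Struct("<H")  # 16-bit little-endian word (an assumption)


def pack_recall(commands):
    """Build: zero word, then (length, text, length) for each command."""
    out = bytearray(WORD.pack(0))
    for text in commands:
        n = WORD.pack(len(text))
        out += n + text + n
    return bytes(out)


def walk_forward(buf):
    """Oldest to newest, using each entry's leading length word."""
    pos = WORD.size  # skip the leading zero word
    while pos < len(buf):
        (n,) = WORD.unpack_from(buf, pos)
        start = pos + WORD.size
        yield buf[start:start + n]
        pos = start + n + WORD.size  # step over text and trailing length


def walk_backward(buf):
    """Newest to oldest, using each entry's trailing length word."""
    pos = len(buf)
    while True:
        (n,) = WORD.unpack_from(buf, pos - WORD.size)
        if n == 0:  # reached the leading zero word (no empty entries assumed)
            return
        end = pos - WORD.size
        yield buf[end - n:end]
        pos = end - n - WORD.size  # step over text and leading length


entries = [b"show time", b"dir\x00\x00/full", b"recall/all"]
buf = pack_recall(entries)
forward = list(walk_forward(buf))
backward = list(walk_backward(buf))
```

Because every entry is counted, the embedded NULs in the second entry survive a round trip in both directions. That fits the observation that the recalled "commands" display without trouble while they are still in the recall buffer.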

Simon Clubley

Aug 9, 2017, 2:32:27 PM

On 2017-08-09, dgordo...@gmail.com <dgordo...@gmail.com> wrote:

> On Wednesday, August 9, 2017 at 1:38:07 PM UTC-4, Simon Clubley wrote:
>> On 2017-08-09, Simon wrote:
>
>> Now that _is_ interesting, especially given that when DCL was being
>> designed, someone went to the trouble of storing the length of the
>> command when the command was placed into the recall buffer.
>>
>
> Don't confuse the DCL work buffer with the recall buffer.

Understood. The point I was making was that someone has gone to the
trouble of using counted strings in the recall buffer and I was a bit
surprised they were not therefore being used everywhere in DCL.

Bill Gunshannon

Aug 9, 2017, 5:44:47 PM

On 8/9/2017 1:33 PM, Simon Clubley wrote:
> On 2017-08-09, dgordo...@gmail.com <dgordo...@gmail.com> wrote:
>>
>> I've coded up changes to the RECALL code to fix the issue. The interesting thing is that after using Simon's file to load up the recall buffer, three up-arrows will successfully recall the three "commands" with no ill effects, nor does a fourth cause a process crash. Going into SDA, CLUE PROCESS/RECALL also displays the three "commands" with no trouble.
>>
>> DCL expects a single NUL character to terminate the command in the work buffer. Try the following:
>>
>> $ X[0,32]=0
>> $ Y = "This is a null ''X' test"
>> $ SHOW SYMBOL Y
>>
>
> [Tested on Eisner, Alpha v8.3]
>
> Now that _is_ interesting, especially given that when DCL was being
> designed, someone went to the trouble of storing the length of the
> command when the command was placed into the recall buffer.
>
> Simon.
>

Wait, you're saying that something designed specifically for and used
only in VMS and not written in C uses a null terminated string!!

Imagine that. :-)

bill

David Froble

Aug 9, 2017, 6:05:54 PM

Simon Clubley wrote:
> On 2017-08-09, dgordo...@gmail.com <dgordo...@gmail.com> wrote:
>> On Wednesday, August 9, 2017 at 1:38:07 PM UTC-4, Simon Clubley wrote:
>>> On 2017-08-09, Simon wrote:
>>> Now that _is_ interesting, especially given that when DCL was being
>>> designed, someone went to the trouble of storing the length of the
>>> command when the command was placed into the recall buffer.
>>>
>> Don't confuse the DCL work buffer with the recall buffer.
>
> Understood. The point I was making was that someone has gone to the
> trouble of using counted strings in the recall buffer and I was a bit
> surprised they were not therefore being used everywhere in DCL.
>
> Simon.
>

Well, there was the guy that was working on Galaxy, and used null terminated
strings. I seem to recall that Brian almost went into cardiac arrest, but
instead had some rather choice things to say on the subject.

:-)

David Froble

Aug 9, 2017, 6:08:12 PM

Good help can be so hard to find ...

:-)

Arne Vajhøj

Aug 9, 2017, 7:43:40 PM

On 8/9/2017 6:08 PM, David Froble wrote:

> Bill Gunshannon wrote:
>> Wait, you're saying that something designed specifically for and used
>> only in VMS and not written in C uses a null terminated string!!
>>
>> Imagine that. :-)
>

> Good help can be so hard to find ...

Given that Macro-32 has the .ASCIZ directive, then ...

Arne

Robert A. Brooks

Aug 9, 2017, 8:31:53 PM

As do MACRO-10 and MACRO-11 . . .

--
-- Rob

VAXman-

Aug 9, 2017, 8:40:20 PM

In article <ev1e2b...@mid.individual.net>, Bill Gunshannon <bill.gu...@gmail.com> writes:
>On 8/9/2017 1:33 PM, Simon Clubley wrote:

>> On 2017-08-09, dgordo...@gmail.com <dgordo...@gmail.com> wrote:
>>>
>>> I've coded up changes to the RECALL code to fix the issue. The interesting thing is that after using Simon's file to load up the recall buffer, three up-arrows will successfully recall the three "commands" with no ill effects, nor does a fourth cause
>>>

>>> DCL expects a single NUL character to terminate the command in the work buffer. Try the following:
>>>
>>> $ X[0,32]=0
>>> $ Y = "This is a null ''X' test"
>>> $ SHOW SYMBOL Y
>>>
>>
>> [Tested on Eisner, Alpha v8.3]
>>
>> Now that _is_ interesting, especially given that when DCL was being
>> designed, someone went to the trouble of storing the length of the
>> command when the command was placed into the recall buffer.
>>
>> Simon.
>>
>

>Wait, you're saying that something designed specifically for and used
>only in VMS and not written in C uses a null terminated string!!
>
>Imagine that. :-)

Not quite.

David Froble

Aug 9, 2017, 8:58:02 PM

Having it, and using it, are two different things ...

Arne Vajhøj

Aug 9, 2017, 9:15:15 PM

On 8/9/2017 8:58 PM, David Froble wrote:
> Arne Vajhøj wrote:
>> On 8/9/2017 6:08 PM, David Froble wrote:
>>> Bill Gunshannon wrote:
>>>> Wait, you're saying that something designed specifically for and used
>>>> only in VMS and not written in C uses a null terminated string!!
>>>>
>>>> Imagine that. :-)
>>>
>>> Good help can be so hard to find ...
>>
>> Given that Macro-32 has the .ASCIZ directive, then ...
>

> Having it, and using it, are two different things ...

True.

But I assume it was added because someone wanted to use it.

Arne

Arne Vajhøj

Aug 9, 2017, 9:20:42 PM

There is also:
FAO !AZ

Arne

mcle...@gmail.com

Aug 9, 2017, 9:31:46 PM

It seems sensible to keep both the string length and the string itself for recall. After all, we're talking here about a chunk of P1 space, and memory is precious.
When the command is recalled from the buffer, I guess there's the option of:
(a) passing the command plus the length to the command interpreter (which would need the length calculated for every newly entered command)
(b) appending CR and LF or whatever to make it look like terminal or DCL input
(c) appending 0 to NUL-terminate it
And don't forget that the command interpreter might do some replacing of symbols, which means that the length might change. Is it easier to keep and update a separate counter for the length, or just to append a character (and maybe copy it when shrinking or expanding the command)?
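
A toy Python sketch of the trade-off between options (a) and (c) once symbol substitution changes the length. Nothing here is taken from DCL itself: `substitute` is a hypothetical stand-in for apostrophe substitution, and the symbol table is invented for the example.

```python
def substitute(text: bytes, symbols: dict) -> bytes:
    """Toy stand-in for symbol substitution: replace each 'NAME' with its value."""
    for name, value in symbols.items():
        text = text.replace(b"'" + name + b"'", value)
    return text


symbols = {b"X": b"\x00"}               # a symbol whose value is a single NUL
typed = b"write sys$output 'X' done"    # 25 bytes as typed
expanded = substitute(typed, symbols)   # 23 bytes after substitution

# Option (a): carry an explicit length, recomputed after substitution.
counted = (len(expanded), expanded)

# Option (c): append a terminator and let the consumer scan for it.
terminated = expanded + b"\x00"
seen_by_scanner = terminated[:terminated.index(b"\x00")]
```

With option (a) the counter has to be updated every time the text shrinks or grows, but the result is unambiguous. With option (c) the bookkeeping is a single appended byte, yet a NUL introduced by substitution makes the scanner stop early.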

Bill Gunshannon

Aug 10, 2017, 7:52:21 AM

On 8/9/2017 8:40 PM, VAX...@SendSpamHere.ORG wrote:
> In article <ev1e2b...@mid.individual.net>, Bill Gunshannon <bill.gu...@gmail.com> writes:
>> On 8/9/2017 1:33 PM, Simon Clubley wrote:
>>> On 2017-08-09, dgordo...@gmail.com <dgordo...@gmail.com> wrote:
>>>>
>>>> I've coded up changes to the RECALL code to fix the issue. The interesting thing is that after using Simon's file to load up the recall buffer, three up-arrows will successfully recall the three "commands" with no ill effects, nor does a fourth cause
>>>>
>>>> DCL expects a single NUL character to terminate the command in the work buffer. Try the following:
>>>>
>>>> $ X[0,32]=0
>>>> $ Y = "This is a null ''X' test"
>>>> $ SHOW SYMBOL Y
>>>>
>>>
>>> [Tested on Eisner, Alpha v8.3]
>>>
>>> Now that _is_ interesting, especially given that when DCL was being
>>> designed, someone went to the trouble of storing the length of the
>>> command when the command was placed into the recall buffer.
>>>
>>> Simon.
>>>
>>
>> Wait, you're saying that something designed specifically for and used
>> only in VMS and not written in C uses a null terminated string!!
>>
>> Imagine that. :-)
>
> Not quite.
>

"DCL expects a single NUL character to terminate the command in the work
buffer."

Sounds like it to me. What am I missing?

bill

Bill Gunshannon

Aug 10, 2017, 7:54:09 AM

On 8/9/2017 7:43 PM, Arne Vajhøj wrote:

I pointed this out about not only Macro-32 but also Macro-11
(and I would expect Macro for every other DEC box), but people
usually just told me that just because it was there didn't mean
anyone ever used it.
:-)

bill

Bill Gunshannon

Aug 10, 2017, 7:54:32 AM

On 8/9/2017 8:58 PM, David Froble wrote:

Ta da!!!!

bill

Bob Koehler

Aug 10, 2017, 9:17:57 AM

In article <ev1e2b...@mid.individual.net>, Bill Gunshannon <bill.gu...@gmail.com> writes:
>

> Wait, you're saying that something designed specifically for and used
> only in VMS and not written in C uses a null terminated string!!
>
> Imagine that. :-)

Must have had one of the old TOPS-20 programmers working on DCL.

Stephen Hoffman

Aug 10, 2017, 10:36:28 AM

On 2017-08-09 22:05:53 +0000, David Froble said:

> Well, there was the guy that was working on Galaxy, and used null
> terminated strings.

Alas, C isn't integrated with OpenVMS. And C is simply horrible with
strings and string handling.

Yes, BASIC does somewhat better there. But then BASIC string handling
is very dated.

Macro32 and Bliss both use ASCIZ strings, too. ASCIZ is one of the
types that's been in the calling standard for aeons.

What hasn't yet been added in the calling standard are more modern ways
to do string and related processing: objects.

We're not going to be replacing ASCIZ with ASCID, and ASCID doesn't get
us what we need, and more than a little {C, BASIC, etc} code won't work
when ASCII/MCS is replaced with UTF-8.
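
A small Python illustration of the sort of assumption that breaks. Latin-1 is used here only as a stand-in for MCS (the two agree on the character used in this example), and the five-byte field width is an arbitrary choice for the sketch.

```python
text = "Vajhøj"

mcs = text.encode("latin-1")   # stand-in for MCS: one byte per character
utf8 = text.encode("utf-8")    # "ø" becomes two bytes here

char_count = len(text)
mcs_bytes = len(mcs)
utf8_bytes = len(utf8)

# Code that clips to a fixed byte width can cut a character in half.
clipped = utf8[:5]
try:
    clipped.decode("utf-8")
    clip_is_valid = True
except UnicodeDecodeError:
    clip_is_valid = False
```

With the single-byte encoding, byte count and character count agree, so code that treats a descriptor length as a character count happens to work. With UTF-8 they diverge, and a byte-oriented truncation leaves an invalid sequence behind.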

So we can look back at a thirty year old problem and deride the sorts
of problems caused by the inevitable coding mistakes and the
increasingly less-than-helpful coding tools we're all working with, or
we can look forward.

Toward 2022. 2027.

At how we avoid the problems caused by ASCIZ. And caused by or
limited by our present string descriptors, for that matter. And by
code that knows the innards of descriptors, and that has no concept of
Unicode and of UTF-8 encodings, and that has no provisions for app
isolation beyond what little that privileges and identifiers can
provide.

Because that's where we're headed.

Lee Gleason

Aug 10, 2017, 3:58:04 PM

"Robert A. Brooks" wrote in message news:omg99s$tm6$1...@dont-email.me...

And RSX actually uses ASCIZ in places, for example, $EDMSG.

--

Lee K. Gleason N5ZMR
Control-G Consultants
lee.g...@comcast.net


VAXman-

Aug 10, 2017, 5:05:41 PM

In article <dV2jB.200859$bO5....@fx16.iad>, "Lee Gleason" <lee.g...@comcast.net> writes:
>
>
>"Robert A. Brooks" wrote in message news:omg99s$tm6$1...@dont-email.me...
>

> And RSX actually uses ASCIZ in places, for example, $EDMSG.

... but there's not an .ASCIZ anywhere in DCL. .ASCIIs and .ASCICs

That doesn't mean that nulls appearing in a string do not have consequences.