
It's "Shut her down Scotty, she's sucking mud again!" (was Re: Model 16B or Tandy 6000)


Frank Durda IV

Aug 27, 2004, 12:37:18 AM
Neil Morrison <now...@to.me> wrote:
: Or "Beam her up Scotty. She's sucking mud".

Please, it was "Shut her down Scotty, she's sucking mud again!"

Not "Shut 'er" or "Beam her" or any of the others I see used.

Depending on the OS release, it was preceded or followed by
BugChk: SckMud
or
BugHlt: SckMud


I selected the text of that message from an unknown person's post that I had
seen on USENET News around 1983/1984. At the time, I and some of the other
engineers thought it was very funny*, so I saved it for use in a special
message, one of those that only one guy in a zillion would ever get, and
more likely, would be discovered only by disassembling the code.

* Right up there with this description for a newsgroup:
rec.pets.cats.anecdotes I remember once when he climbed out onto the primary...

(Seems like that anecdote would end about there.
Maybe you had to be there.)


Unfortunately, because of a then-unknown hardware glitch in the 16B/6000
hardware, the "Sucking Mud" message started popping out on quite a few
systems, and of course, didn't appear on the systems in-house, which were
a bit more up to date on mods or had fewer-recalled parts than the customer
systems. (Systems we had to use in-house were frequently not even
assembled in the factory.) The message is actually warning of a real
hardware problem that needed attention, but pointing out hardware problems
was an unhealthy thing to do back then.** At Tandy, I was informed that
Software was supposed to hide hardware flaws if at all possible (rather
than have Tandy actually fix the hardware), and you definitely were not
to draw attention to the hardware problems.

"It's cheaper to fix hardware with software", the director of hardware would
constantly say. It was probably printed on his business cards.
Tandy was definitely a vast untapped source of Dilbert material.

**There was quite a fuss when I proved that the Arcnet cards for the
Model II/12/16/16B/6000 had a flaw that garbled data, which was repairable
with a complex mod. The hardware folks seriously wanted us to code
around this random addition or loss of a byte in transmissions.
It did explain why Arcnet had been so notoriously unreliable and
trouble-plagued for its first four years in the Tandy product line...


The drivers on the Z80 for XENIX had a dispatching loop, and the error
indicates that the Program Counter had reached the beginning of the loop
but the Stack Pointer had a different value than it had the last
time we passed this point, something that should not be possible. A coding
error that caused a stack imbalance would usually walk the stack through
memory and crash the system, or be unlikely to return to any desired location
in any predictable way, since the loop was full of CALL instructions. But
what was happening here was something that happened very rarely, more
frequently on some chassis than others, and in most cases seemed to leave
the system operational, but would you trust it to keep running for long?
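
The check described above can be sketched roughly as follows. This is a toy
model in Python, not the original Z80 code: the class name, the handler
functions and the starting stack address are all hypothetical, and the only
things taken from the post are the idea (compare the Stack Pointer at the top
of the loop with the value seen on the previous pass) and the message text.

```python
class DispatchLoop:
    """Toy model of a driver dispatch loop that checks its stack pointer."""

    def __init__(self, initial_sp):
        self.sp = initial_sp          # simulated stack pointer
        self.sp_at_loop_top = None    # value recorded on the first pass
        self.log = []

    def loop_top_ok(self):
        """Return True if SP matches the value recorded at the loop top."""
        if self.sp_at_loop_top is None:
            self.sp_at_loop_top = self.sp
            return True
        if self.sp != self.sp_at_loop_top:
            self.log.append("BugChk: SckMud")
            return False
        return True

    def run_handler(self, handler):
        """Model CALL/RET around a handler."""
        self.sp -= 2                  # Z80 CALL pushes a 2-byte return address
        handler(self)
        self.sp += 2                  # RET pops it again


def well_behaved(loop):
    """Hypothetical handler that leaves the stack balanced."""


def glitchy(loop):
    """Hypothetical handler during which SP changes with no code to explain it."""
    loop.sp += 2


def run(handlers, initial_sp=0xFF00):
    """Run the handlers in order, checking SP at the top of every pass."""
    loop = DispatchLoop(initial_sp)
    for handler in handlers:
        if not loop.loop_top_ok():
            return loop.log
        loop.run_handler(handler)
    loop.loop_top_ok()                # one more check after the last pass
    return loop.log
```

Calling run([well_behaved, glitchy, well_behaved]) logs the warning once and
stops dispatching, which is roughly the BugChk behavior; a BugHlt variant
would halt instead of returning.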

Initially I had the message emitted as a warning, and recommended that
the system be rebooted immediately but cleanly in response to that message.
Reports indicated that in 99.999% of the cases if you got in this state,
you could shut down without issue and avoid dirty filesystems.

However, I had so many people give me a hard time about trying to be nice
and printing out the warning when this problem was detected (griping at
me was easier than actually investigating and fixing the hardware problem),
I changed the behavior in the next release from a BugChk to a BugHlt,
which crashed the system on the spot. I also dumped out some additional
registers and such.

This change did the job, and forced the hardware people to actually
investigate and largely fix the problem, because now it wasn't a
strangely-worded message coming out anymore, it was a
make-the-customer-angry crash problem (accompanied by a quirky message).
It's a shame that this sort of tactic had to be used, but prying re-works
and mods out of the hardware department to fix problems was very hard to
do.


The phrases BugChk and BugHlt were both copied from the DECsystem TOPS-20
operating system's BUGCHK and BUGHLT classifications of system problems.
I used this to achieve some consistent format to operating system error
messages, something that was completely lacking in the Z80 code that existed
when I got the job. Some messages were terse, and some went on for four
lines. I also had a tight RAM footprint to fit all the drivers in.

BugHlt messages (Bug-Halt) always resulted in the system stopping.
BugChk messages (Bug-Check) were a warning that things were odd and it
would be a lot better if you shut down and rebooted soon, maybe even start
making backups.
BugInf messages (Bug-Information) were a less well-defined category that
included things that might be problems, or might just be the user trying
to do something with hardware that they hadn't attached to the system yet.
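
As an illustration of the three classes and the consistent message format,
here is a small Python sketch. The enum and function names are hypothetical;
the class names and the "BugChk: SckMud" layout come from the post, and the
rule that only BugHlt stops the system follows the description above.

```python
from enum import Enum


class Severity(Enum):
    BUGHLT = "BugHlt"   # system stops on the spot
    BUGCHK = "BugChk"   # warning: shut down and reboot soon
    BUGINF = "BugInf"   # informational, may or may not be a problem


def format_bug(severity, code):
    """Render a message as '<class>: <code>', e.g. 'BugChk: SckMud'."""
    return f"{severity.value}: {code}"


def must_halt(severity):
    """Only the BugHlt class stops the system."""
    return severity is Severity.BUGHLT
```

With every message built through one formatter, a terse six-character code
per condition keeps the text short, which matters when the RAM footprint is
tight.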


This stack imbalance issue was eventually proved to be a hardware problem,
one of many with the 12/16B/6000 systems, although the II/16 had their
share too. The 16B/6000 was particularly hurt by all the faulty 74AS/ALSxx
chips that TI sold us after they had been recalled. Several additional
BugChk/BugHlt messages were added as more ways the hardware could get
itself "weird" were discovered.

These systems had stuff like RST 38 opcodes being fetched and executed
by the Z80 (a problem, because we didn't have any RST38 opcodes anywhere
in the code), AND if you read the memory location again, the RST38 opcode
value wasn't the value that was really at that memory location. So another
bit of XENIX Z80 recovery code I added in there would go back and try to
execute the same PC location a second time, and kick out a message warning
that this was done and things are okay - for the moment. 99.9999999% of
the time, the system would keep on ticking despite this obvious glitch.
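
The recovery idea can be sketched like this in Python. It is an assumption
about the general shape, not the original z80ctl code: the function name and
the wording of the log message are hypothetical. What is certain is that
RST 38h is the one-byte Z80 opcode 0xFF and that it pushes the address of the
following instruction, so the faulting location is the return address minus
one.

```python
RST_38 = 0xFF  # Z80 opcode for RST 38h


def handle_rst38(memory, return_addr, log):
    """Decide what to do after an unexpected RST 38h trap.

    Returns the address to resume at if the fetch looks spurious,
    or None if the opcode really is stored in memory.
    """
    # RST is one byte long, so the opcode was fetched just before
    # the return address that it pushed.
    fault_pc = (return_addr - 1) & 0xFFFF

    # Re-read the location. If memory really holds 0xFF there, the
    # trap was genuine and retrying would only trap again.
    if memory[fault_pc] == RST_38:
        return None

    # Memory holds something else, so the fetch was a glitch:
    # warn, then execute the same location a second time.
    log.append(f"spurious RST 38 at {fault_pc:04X}, retrying")
    return fault_pc
```

For example, with a 64K bytearray whose byte at 0x1234 is 0x3E, calling
handle_rst38(memory, 0x1235, log) returns 0x1234 and records one warning.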


As to the "Shut her down" text, a search tonight of Google and their old
repository of USENET posts finds the very message in a post from July of
1984 in "net.jokes" (this group does not exist today, as this was before
the re-structuring of USENET in 1987/1988). This post is titled
"A TRUE STORY" and was from a Gregory M. Mandas. His name is not familiar
to me, and this isn't the precise situation I recall where I first saw this
message, but then it was twenty years ago. Regardless, that post is
undoubtedly the earliest sample that I can easily search. He also claims
to be quoting another source for that text but doesn't mention the source.


As a point of interest, I started work on the Z80 code for 68000 XENIX just
two months later (September 1984), having escaped the Model 2000 and then
the 2000 XENIX debacles that I had been stuck on since January, so the
timing of this may be about right to have been the actual place where I
got the message.


Frank Durda IV - only this address works:|"The Knights who say "LETNi"
<uhclemLOSE.aug04%nemesis.lonestar.org> | demand... A SEGMENT REGISTER!!!"
You must remove the "LOSE" to mail me. |"A what?"
http://nemesis.lonestar.org |"LETNi! LETNi! LETNi!" - 1983
Copr. 2004, ask before reprinting.

Kenneth Brody

Aug 27, 2004, 11:38:45 AM
Frank Durda IV wrote:
>
> Neil Morrison <now...@to.me> wrote:
> : Or "Beam her up Scotty. She's sucking mud".
>
> Please, it was "Shut her down Scotty, she's sucking mud again!"
[...]

> I selected the text of that message from an unknown person's post that I had
> seen on USENET News around 1983/1984. At the time, I and some of the other
> engineers thought it was very funny*, so I saved it for use in a special
> message, one of those that only one guy in a zillion would ever get, and
> more likely, would be discovered only by disassembling the code.

I guess I was one of those "lucky" few that actually saw it in action?

As I recall, the actual text was encrypted in the z80 control code, so a
"strings" wouldn't show it.

[...]


> Unfortunately, because of a then-unknown hardware glitch in the 16B/6000
> hardware, the "Sucking Mud" message started popping out on quite a few
> systems,

Oh well, I guess I wasn't so "lucky" after all. ;-)

[...]


> "It's cheaper to fix hardware with software", the director of hardware would
> constantly say. It was probably printed on his business cards.
> Tandy was definitely a vast untapped source of Dilbert material.

Well, _technically_ he's probably (at least partially) correct. Software
only needs to be written once. But to fix a few thousand pieces of hardware
already out in the field... ;-)

[...]


> The drivers on the Z80 for XENIX had a dispatching loop and the error
> indicates that the Program Counter had reached the beginning of the loop,
> but the Stack Pointer has a different value than it had the last
> time we passed this point, something that should not be possible.

That was the explanation I heard years ago for the "sucking mud" error.
I guess this is a "horse's mouth" thing now?

[...]


> This change did the job, and forced the hardware people to actually
> investigate and largely fix the problem, because now it wasn't a
> strangely-worded message coming out anymore, it was a
> make-the-customer-angry crash problem (accompanied by a quirky message).
> It's a shame that this sort of tactic had to be used, but prying re-works
> and mods out of the hardware department to fix problems was very hard to
> do.

;-)

[...]


--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+

Mike Yetsko

Aug 27, 2004, 8:48:21 PM
> "It's cheaper to fix hardware with software", the director of hardware
> would constantly say. It was probably printed on his business cards.
> Tandy was definitely a vast untapped source of Dilbert material.

Hmm, would that be the guy who went to a credit card offshoot company
managing the insurance program?

Frank Durda IV

Aug 27, 2004, 11:15:41 PM
Kenneth Brody <kenb...@spamcop.net> wrote:
: As I recall, the actual text was encrypted in the z80 control code, so a
: "strings" wouldn't show it.

Not in the original version. I scrambled it and some other stuff in
subsequent versions so that the crude tests Tower I QA would do looking for it
(a "strings" or hexdump) would not find it. I also responded to a persistent
third party that was now using part of the message as a patch space for
local hacks (you can probably figure out who that was), so additional
countermeasures against undetectable tinkering were added at the same time.
I knew I couldn't stop them from altering the system, but I wanted the
fact that this had occurred to be obvious at the store, repair center
and customer support, so any problem reports on modified systems would never
make it as far as my department.
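
The post does not say how the text was scrambled or what the tamper
countermeasures were, so the following Python sketch is purely illustrative.
It assumes a single-byte XOR key and a simple additive checksum (both
hypothetical choices) to show why a "strings"-style scan stops finding the
message and how a patched image can be made to stand out.

```python
KEY = 0xA5  # hypothetical single-byte key, high bit set


def scramble(data, key=KEY):
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def printable_runs(blob, min_len=4):
    """Rough stand-in for strings(1): collect runs of printable ASCII."""
    runs, current = [], bytearray()
    for b in blob:
        if 0x20 <= b <= 0x7E:
            current.append(b)
        else:
            if len(current) >= min_len:
                runs.append(bytes(current))
            current = bytearray()
    if len(current) >= min_len:
        runs.append(bytes(current))
    return runs


def checksum(blob):
    """16-bit additive checksum over the image (illustrative only)."""
    return sum(blob) & 0xFFFF


MESSAGE = b"Shut her down Scotty, she's sucking mud again!"
STORED = scramble(MESSAGE)          # what would sit in the binary
EXPECTED_SUM = checksum(STORED)     # recorded when the image is built


def is_tampered(image):
    """True if the image no longer matches the recorded checksum."""
    return checksum(image) != EXPECTED_SUM
```

Because the key has its high bit set, every scrambled ASCII byte lands at
0x80 or above, so a scan for printable runs comes back empty. An additive
checksum is easy to defeat deliberately; it only shows the principle of
making casual patching visible.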

I got really tired of getting yelled at for problem reports on my code
that was doing things the code I wrote would not do, was displaying error
messages the code I wrote didn't have, and so on. Invariably, later we
would find that the customer had applied a few K of patches to extend this
or allow more of that, and now these patched/customized systems were failing,
and this was somehow my fault.

To make matters worse, there was some sort of penalty for having problems
reported against software that impacted the given software manager's bonus
(even if the problem reports were totally bogus or misdirected), so they
got quite rabid about getting any.


: Well, _technically_ he's probably (at least partially) correct. Software
: only needs to be written once. But to fix a few thousand pieces of hardware
: already out in the field... ;-)

Sure, and Tandy Systems Software - prior to original release or in subsequent
patches - covered hundreds if not thousands of hardware flaws and defects over
the years in every model ever produced that we did OS or applications for.
However, there was a significant number of hardware problems that were not
fixable or not even vaguely concealable with software, no matter how much
time was spent on it or how many programmers you threw at it.

Even clever tricks did not help, like starting an oscillation in the cones of
speakers connected to a system to begin motion in the same direction the cone
would eventually and abruptly take when a large voltage change came
out of the hardware speaker output as the result of changing chipset modes,
in an attempt to lessen the audible level of that transition. The
voltage level you ended up at was one beyond any level software could
reach by command or setting. Software could tell the hardware to generate
a signal level between 0 and 1 V (as an example), and the other state
was always 1.5V.

Of course, that jump was so big that it would be heard as a "POP" no
matter what I did, and amplified through enough big speakers in
an auditorium (like at the Tandy Annual Shareholders meeting), everybody in
the room would still hear (or feel) it. "Reducing the POP" was something
I was ordered to waste perhaps two months on, and was something that could
never ever be cured by software.


The head of Tandy hardware would have stated that a software fix should be
developed and used to convince passengers to not notice the rising water in
their cabins on the Titanic, and this would somehow "solve" the problem
of the gaping hole in the side of the ship, or if you prefer, hardware.
Plus, with the software fix, there would be a bonus for the stores! No
defective units would ever be returned!

A Dilbert strip from 15-Apr-1994 speaks on this subject and goes like this:

Dilbert: "You've got to delay the beta trial with customers
until we figure out why it keeps exploding!"

Pointy-hair Manager: "You engineers are such pessimists. Just once,
try to focus on the positive aspects of the trial!"

Dilbert (later, typing): "We don't need to hassle with 'Non-Disclosure'
agreements."

At Tandy, the "Beta Trial" was usually known as "First Customer Ship". :-(


My complaint all along was that when the problem was clearly something
that could never be made to work reliably (or at all) without a hardware
fix, why did we spend months researching a software fix anyway, and then
also have to reluctantly come up with a cut-n-jump that fixed the problem.
Meanwhile, we continued to manufacture the broken design, and wasted customer
money running them in and out of repair centers getting mods and upgrades
that we knew would not address the problem (but probably helped other
broken stuff).

This "code around the hardware bugs" way of doing business had to end (but
it took years to completely die) when Tandy started making "PC-Compatible"
hardware, and the hardware group no longer had the power to just go upstairs
and browbeat some programmer(s) into not using certain opcodes or hardware
functions that didn't really work and it was deemed too late or too hard
to fix them right.

I remember them trying that with Microsoft on the Model 2000, when the early
steps of the 80186 processors could not perform the MOVSB/MOVSW instructions
reliably (and not at all with a REP prefix), and they asked Microsoft
to not use that opcode in anything they did for the 2000. Hysterical
laughter was all we heard back from Microsoft. Tandy ended-up having
to wait for Intel to turn out a new stepping of the chip where more of
the opcodes worked the same as they did on the 8088/8086.

And that was back when Microsoft still feared a few people and things,
like IBM. We would not even get a return telephone call from Microsoft if
that sort of request was attempted today.

Quite a few 2000s still went out with a couple of opcodes still broken,
but those were new 80186/80188 opcodes, and no one was using them - yet.
Eventually the Intel 80186 masks were stable, but by that time, the 2000
was a flop, and the 80186 was doomed to no longer be part of the
PC-compatible world. Homeless, it finally got a job running traffic lights,
running high-end 3COM ethernet cards, USR Courier modems, etc.

The "Can you Code around this" behavior pretty-much died when Tandy started
making machines for Digital Equipment Corporation. DEC would have none
of this "there is a bug, can we hide it" stuff, and they would sometimes
halt production of their machines at Tandy's factories completely until a
fix was available and implemented.


: That was the explanation I heard years ago for the "sucking mud" error.
: I guess this is a "horse's mouth" thing now?

Well, you may recall the document I published internally around 1986
that described many (if not all) messages that could come out of z80ctl in
gory detail, although I don't think I elaborated on the history of the
"Shut her down" message. A second document was produced by John Elliott IV
at the same time covering interesting kernel messages. Both were meant for
and delivered to Technical Support.

The z80ctl document went into some detail on things known to trigger the
given situation (power supply low, missing mod "TB:6000:8", missing +5V "big
wire" 68000 jumper, etc.), and I'm sure that Gary and others down there added
their own findings to that information, and maybe the result made it out
of central unit and to the repair centers in some form.

I came across the original documents in the past few years. It was
originally written in troff (I was pretty good at troff), but it can
probably be made HTMLable, assuming I come across it again. (I've got a
room almost completely full of manuals, binders of memos, software diskettes,
Iomega cartridges and DC-61x0 tapes, full of all sorts of goodies from
those days, all waiting for enough time to do something with them before
they go bad or the bugs eat them.) With little time, it is tough deciding
what to recover first.


Frank Durda IV - only this address works:|"Your company has become synonymous
<uhclemLOSE.aug04%nemesis.lonestar.org> | with incompetence and crime. Stop
You must remove the "LOSE" to mail me. | trying to be all things to all
http://nemesis.lonestar.org | people. Focus on either the
Copr. 2004, ask before reprinting. | incompetence or the crime."-Dogbert

Mike Y

Aug 28, 2004, 4:11:44 PM
Hey, to be fair, the software side had its own toads too.

You know about the director that took the program I wrote while I was in
another department and sent it out
as something someone in his group wrote, right?

He got caught in an incident that went all the way to
Pack (is that how you spelled it?)

Mike


Mike Yetsko

Aug 28, 2004, 4:20:33 PM
"Frank Durda IV" <uhclemLO...@nemesis.lonestar.org> wrote in message
news:I34zq...@nemesis.lonestar.org...

> I remember them trying that with Microsoft on the Model 2000, when the early
> steps of the 80186 processors could not perform the MOVSB/MOVSW instructions
> reliably (and not at all with a REP prefix), and they asked Microsoft
> to not use that opcode in anything they did for the 2000. Hysterical
> laughter was all we heard back from Microsoft. Tandy ended-up having
> to wait for Intel to turn out a new stepping of the chip where more of
> the opcodes worked the same as they did on the 8088/8086.

I thought that was a 'Tandy' bug, not an Intel bug. Hmm, are you talking
about the REP MOVSB when the rep takes it over a transition of the hardware
memory cs lines? Or was that only when there were multiple push/pop cycles
over the boundary?

Then again... there was an Intel bug too, but I can't remember exactly what
it was. Unless it was one of the times when the rep would stop without the
proper terminal count.

Then again, if you want to really PO some people, you could talk about the
6MHz 186 parts... Or how about all the 2000s that went out the door with
hard disks that passed all the diags? The diags that wouldn't report a
failure even if you pulled the drive data cable off in the middle of the test...

Kenneth Brody

Aug 30, 2004, 1:56:13 PM
Frank Durda IV wrote:
[... snip interesting history lesson ...]

>
> I came across the original documents in the past few years. It was
> originally written in troff (I was pretty good at troff), but it can
> probably be made HTMLable, assuming I come across it again. (I've got a
> room almost completely full of manuals, binders of memos, software diskettes,
> Iomega cartridges and DC-61x0 tapes, full of all sorts of goodies from
> those days, all waiting for enough time to do something with them before
> they go bad or the bugs eat them.) With little time, it is tough deciding
> what to recover first.

Umm... "the ones where the only drive I have left that can still read this
media is about to go belly up"?

col...@precisehire.com

Feb 5, 2018, 8:05:06 AM
Hi Frank,

I thought this quote was related to you. I just made the same comment about the pending government shutdown.

I still have fond memories of working on the doomed VIS project.

-Mark C

michel...@gmail.com

Aug 29, 2018, 9:07:17 PM
At KSU in 1985 the lab used a VAX 11-780 and the OS was UNIX. The assignment was to write an AI application as an example of recursion. When my program crashed, the error message was "shut her down, Scotty, she's sucking mud." At first I thought a friend who wanted me to quit and go to dinner was playing a joke. Later I was told it was an error message UNIX generated when the condition was undefined. UNIX is a living language and it just told me someone had a nice sense of humor - something to laugh at before going back to the drawing board.

My first computer was a Tandy 2000 but without graphics or a hard drive so I didn't have the problems mentioned above. It let me code at home and sometimes compile to test before uploading to the VAX where it compiled, ran and printed the output for me to pick up when I got to the lab.

My first job was Tandy Training and Support in Chicago and I think I got the job because I owned a 2000. After a year I transferred to N.E. Ohio and soon after the Support department was downsized and we were assigned to our store managers. Mine was an interior decorator.

A salesman would sell a toy computer and a suite of accounting software that the toy couldn't run. When I brought this to the attention of the sales manager who was my boss's boss, his reply was, "Make it work!"

I knew there was something fundamentally wrong with the company's ethics. After reading this thread, I know now it wasn't just the bean counters.

