WANTED: Unix administration horror stories !

Arne Asplem

Oct 8, 1992, 5:35:50 AM

>>On 7 Oct 92 12:02:46 GMT, ar...@multix.no (Arne Asplem) said:

>> I'm looking for actual horror stories of what have gone wrong because
>> of bad system administration, as an early morning wakeup.

>> I'll summarise to the net if there is any interest.

There has been a lot of interest in creating a summary of the
system administration "horror stories" - but so far I've only got a few
stories, and not any really scary ones :-)

I guess companies/system administrators are afraid of telling about
their real mistakes, and what we see in the press and magazines is just
the tip of the iceberg.

I'll keep all references to companies and persons confidential if you want !

-- Arne
--
// Arne Asplem // Office: Gjerdrums vei 12, N-0486 Oslo, Norway //
// Multix A/S // Phone/fax: +47-2-950800 / 950790 //
// // Email: ar...@multix.no //
// NUUG Chairman // Email: ar...@nuug.no //

Tim Smith

Oct 9, 1992, 6:04:44 AM

I was working on a line printer spooler, which lived in /etc. I wanted
to remove it, and so issued the command "rm /etc/lpspl." There was only
one problem. Out of habit, I typed "passwd" after "/etc/" and removed
the password file. Oops.

I called up the person who handled backups, and he restored the password
file.

A couple of days later, I did it again! This time, after he restored it,
he made a link, /etc/safe_from_tim.

About a week later, I overwrote /etc/passwd, rather than removing it.

After he restored it again, he installed a daemon that kept a copy of
/etc/passwd, on another file system, and automatically restored it if
it appeared to have been damaged.
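
A minimal sketch of that kind of watchdog, in Python. The damage test (file
missing, unreadable, or without a "root:" entry) and both function names are
assumptions made for illustration; the post does not say how the original
daemon decided the file was damaged.

```python
import shutil


def looks_damaged(path):
    """Return True if the file is missing, unreadable, or has no root entry."""
    try:
        with open(path, "r", errors="replace") as f:
            lines = f.read().splitlines()
    except OSError:
        return True
    return not any(line.startswith("root:") for line in lines)


def restore_if_damaged(path, backup):
    """Copy `backup` over `path` when `path` looks damaged.

    Returns True if a restore was performed, False otherwise.
    """
    if looks_damaged(path):
        shutil.copy2(backup, path)
        return True
    return False
```

A real daemon would call restore_if_damaged() in a loop with a sleep between
checks, and would keep the backup copy on a different file system.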

Fortunately, I finished my work on /etc/lpspl around this time, so we
didn't have to see if I could find a way to wipe out a couple of
filesystems...

--Tim Smith

Bill Roberts

Oct 9, 1992, 9:04:12 PM

My most interesting experience in this regard was when I deleted "/dev/null". Of
course it was soon recreated as a "regular file", then permission problems
started to show up.

I was new at the game at the time and couldn't figure out what happened!
It looked good to me. I didn't know about "special files" and "mknod" and
major and minor device codes. A friend finally helped out and started
laughing and put me on the right track. That one episode taught me a
lot about my system.
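
A quick way to spot this kind of damage is to check that the node is still
a character device rather than a regular file. A minimal sketch in Python;
the function names and the default path list are illustrative assumptions,
not part of any standard tool.

```python
import os
import stat


def node_kind(path):
    """Classify a path as 'char', 'block', 'regular', 'other', or 'missing'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISCHR(mode):
        return "char"
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"


def check_device_nodes(paths=("/dev/null", "/dev/tty")):
    """Return the paths that are NOT character devices (i.e. the suspects)."""
    return [p for p in paths if node_kind(p) != "char"]
```

On a healthy system check_device_nodes() should come back empty. A path
reported as 'regular' has to be removed and recreated with mknod, using the
major and minor numbers documented for that system.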
--
| Bill Roberts, Va Beach VA | In the field of observation, chance |
| brob...@waggen.twuug.com | favors the prepared mind. - Pasteur |

Frank T Lofaro

Oct 10, 1992, 3:20:39 PM

Well, one time I was installing a minimal base system of Linux on a
friend's PC, so that we would have all the necessary utilities to bring
over the rest of the stuff. His 3 1/2 inch disk was dead, so we had to
get the 5 1/4 inch version of the boot/root disk. Too bad that version,
having to fit in 1.2M instead of 1.44, didn't have tar. We could get a
version of tar, but it was in a tar file (nice chicken and egg
scenario). I said, okay, since we don't have tar, we can't use that to
copy the files from floppy to the hard disk, I'll use cp instead (bad
move). It actually seemed to work for a while, then the machine
rebooted! I did it again, and the same thing happened. Then I realized cp
wouldn't work on device files! (This is what happens when you try to
install un*x at 3 AM.) It just read the contents of the device and made
a file containing such, which is undesirable in any event. (When it
read /dev/port, the device file that references I/O ports, it must've
done something to reboot the machine; that was the file that was causing
the reboots.)

I finally got it working by having him get the tar archive of the
linux binaries (including the tar we needed), and untarring it on one of
the public decstations here, so we could ftp tar to his PC using his dos
tcp/ip stuff. A funny aside was that it untarred into ~/bin, and
superseded all his normal commands. We were wondering why everything
wouldn't run. Luckily it wasn't too hard to fix after we realized what
happened.

Mitch Wright

Oct 11, 1992, 6:02:32 PM

I guess I should add a story (or maybe not). Anyway, a fellow sysadmin
was looking to free up some much needed disk space. Since it was purely
a production machine I suggested that he go through and "strip" his binaries.
Unfortunately I made the assumption that he knew what strip does and would
use it wisely -- flashes of the Bad News Bears come to mind now.
To make it short, he stripped /vmunix which didn't destroy the system, but
certainly caused some interesting problems.

~mitch

Eiji Hirai

Oct 11, 1992, 3:27:50 PM

Some of these are stories of pure stupidity rather than of interesting horror,
but they did happen.

[ BTW, these happened at a different place at a different time than where I
am now. Don't bother my current employer about it. ]

(1) A consultant we had hired (and not a very good one) was installing Unix
on one of our workstations. He was mucking with creating and deleting
/dev/tty* files and made /dev/tty a regular file. Weird things started to
happen. Commands would only print their output if you pressed return twice,
etc. Fortunately, we solved the problem by re-mknod-ing /dev/tty. However,
it took a while to realize what was causing this problem.

(2) I wanted to create a second swap partition on another disk and made the
partition start at sector 0 of the disk! (which sounded ok at the time since
all other regular 'a' partitions started on sector 0) Every time I rebooted,
fsck would complain about missing partition tables - I initially suspected
that the disk was bad but I later realized that swapping was overwriting the
partition table. I had lost an unknown percentage of the financial data for
the institution that I was working for at the time, right when they were
being audited! Yikes! Anyway, we were able to recover the data and life
returned to normal but I did wonder at the time whether I could still keep
my job there.

(3) At the same institution, we were running a system software that had a
serious bug where if anyone had logged out ungracefully, the system wouldn't
let any more users onto the system and users who were logged on couldn't
execute any new commands. (The newest release of the software later on did
fix this bug.) I had to reboot the machine to restore the system to a sane
state. I did a wall <<EOF We need to shut down blah blah... EOF and then
shutdown. Well, I should've waited, since at that precise moment one of our
users was doing a once-a-year massive conversion of our financial data (talk
about bad luck). I had shut down in the middle of a very long disk write and
thus data was lost. We did recover that data and life went on. Moral:
make damn sure that *no one* is doing anything on your system before you
reboot, even if other users are vociferously clamoring for you to reboot.

(4) I heard this from a fellow sysadmin friend. My friend was forced to
work with some sysadmins who didn't have their act together. One day, one
of them was "cleaning" the filesystem and saw a file called "vmunix" in /.
"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".

My friend had to reinstall the entire OS on that machine after his coworker
did this "cleanup". Ahh, the hazards of working with sysadmins who really
shouldn't be sysadmins in the first place.

Moral of all these stories: if I had to hire a Unix sysadmin, the first
thing I'd look for is experience. NOTHING can substitute for down-to-earth,
real-life grungy experience in this field.

--
hi...@cc.swarthmore.edu (Eiji Hirai) : : : : : :: ::: :::: :::::
Unix Geek for Swarthmore College : : : : : :: ::: :::: :::::
Information Services, Swarthmore, PA, US. Copyright 1992 by Eiji Hirai.
I don't speak for Swarthmore College. All Rights Reserved.

Tim Smith

Oct 12, 1992, 3:12:42 AM

>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>work with some sysadmins who didn't have their act together. One day, one
>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>
>My friend had to reinstall the entire OS on that machine after his coworker
>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>shouldn't be sysadmins in the first place.

This is why God invented chroot. Have everything one level down, and arrange
for login to chroot everyone. Have one link out to the real root in some
place that only competent sysadmins know about. Don't tell the incompetent
ones about this. Furthermore, don't tell them that they are really running
chroot'ed down one level.

Alternatively, if you are on a system that can boot a kernel from a
subdirectory, put vmunix in /usr/vmunix. Then, assuming that something
gets mounted on /usr when you go multiuser, vmunix will be safe (unless
this is a System V Unix from before they fixed the namei bug that let
you, under the right conditions, get to directories that had things
mounted on top of them...).

--Tim Smith

David J Stevenson

Oct 12, 1992, 4:09:44 AM

In <W1NR...@cc.swarthmore.edu> hi...@cc.swarthmore.edu (Eiji Hirai) writes:
>...[some deleted]

>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>work with some sysadmins who didn't have their act together. One day, one
>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".

>My friend had to reinstall the entire OS on that machine after his coworker
>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>shouldn't be sysadmins in the first place.

When this happened to a colleague (when I worked somewhere else) he restored
vmunix by copying from another machine. Unfortunately, a 68000 kernel does
not run very well on a Sparc...

--
+---------------------------------------------------+
| David Stevenson d...@jet.uk Tel: +44 235 465028 |
+---------------------------------------------------+
- Disclaimer: Please note that the above is a personal view and should not
be construed as an official comment from the JET project.

Steve McKinty - Sun ICNC

Oct 12, 1992, 4:22:29 AM

In article <W1NR...@cc.swarthmore.edu>, hi...@cc.swarthmore.edu (Eiji Hirai) writes:

> (4) I heard this from a fellow sysadmin friend. My friend was forced to
> work with some sysadmins who didn't have their act together. One day, one
> of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
> "Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>
> My friend had to reinstall the entire OS on that machine after his coworker
> did this "cleanup". Ahh, the hazards of working with sysadmins who really
> shouldn't be sysadmins in the first place.

Hmm. A colleague of mine did much the same by accident on one of
our test machines. After discovering it, fortunately while the machine
was still up & running, he FTPed a copy of /vmunix from the other lab
system (both running exactly the same kernel).

After rebooting his machine everything (to his relief) worked fine.

--
Steve McKinty
SUN Microsystems ICNC
38240 Meylan, France
email: smck...@france.sun.com BIX: smckinty

Peter da Silva

Oct 12, 1992, 7:54:15 AM

Well, we had one system on which you couldn't log in on the console for a
while after rebooting, but it'd start working sometimes. What was happening
was that the manufacturer had, for some idiot reason, hardcoded the names
of the terminals they wanted to support into getty (this manufacturer's own
terminals, that I can understand, but also a handful of common types like
adm3a) so getty could clear the screen properly (I guess hacking that into
gettydefs was too obvious or something). If getty couldn't recognise the
terminal type on the command line, it'd display a message on the console
reading "Unknown terminal type pc100". We ignored this flamage, which was
a pity. Cos that was the problem.

It did this *before* opening the terminal, so if it happened to run between
the time rc completed and the getty on the console started the console got
attached to some random terminal somewhere, so when login attempted to open
/dev/tty to prompt for a password it failed.

Moral: always deal with error messages even when you *know* they're bogus.
Moral: never cry wolf.
--
Peter da Silva. <pe...@sugar.neosoft.com>.
`-_-' "Segodnja volka obnimal?"
'U`
Dette kan umulig vaere mitt rom, eftersom jeg ikke puster ammoniakk.

Anselm Lingnau

Oct 12, 1992, 5:02:13 AM

In article <1992Oct10....@waggen.twuug.com>, brob...@waggen.twuug.com
(Bill Roberts) writes:

> My most interesting in the reguard was when I deleted "/dev/null". Of
> course it was soon recreated as a "regular file", then permission problems
> started to show up.

Years ago when I was working in the Graphics Workshop at Edinburgh University,
we used to have a small UNIX machine for testing. The machine wasn't used too
much, so nobody bothered to set up user accounts, and so everybody was running
as root all the time. Now one of the chaps who used to come in was fond of
reading fortunes (/usr/games/fortune having been removed from the University's
real machines along with all the other games). Guess what happened when the
machine said

# fortune
fortune: write error on /dev/null --- please empty the bit bucket

Quite a lot of stuff wouldn't work after the chap was done with the machine
for the day. You bet we put up proper accounts after that!

Anselm
--
Anselm Lingnau .................................. lin...@math.uni-frankfurt.de
[Sendmail] can do just about anything. Its main problem is that it can do just
about anything. --- Chris Lewis, *UNIX Email Software Survey FAQ*

Rick Furniss

Oct 12, 1992, 10:46:33 AM

Horror stories:
Did this myself many years ago, and have come close to it since.

Murphy's law #??: preventive maintenance doesn't.

try this one: /etc/dump /dev/rmt/0m /dev/dsk/0s1
Or: tar cvf /dev/root /dev/rmt0

Backups on unix can be one of the most dangerous commands used,
and they are used to prevent rather than cause a problem. If any Unix
utility were a candidate for a warning message, or error checking, this
would be it.

Just in case you didn't catch the HORROR above, the parameters are backwards,
causing a TOTAL wipeout of the root file systems.

More systems have been wiped out by admins than any hacker could manage in
a lifetime.

***** standard DISKclamer *****
personal views of my person only

CPSMEL/IA
C210 N3877Y
ri...@pmafire.inel.gov
ri...@servprod.inel.gov


Rich Payne

Oct 12, 1992, 10:35:45 AM

In article <1992Oct12.0...@jet.uk> d...@jet.uk (David J Stevenson) writes:
>In <W1NR...@cc.swarthmore.edu> hi...@cc.swarthmore.edu (Eiji Hirai) writes:
>>...[some deleted]
>>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>>work with some sysadmins who didn't have their act together. One day, one
>>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>
>>My friend had to reinstall the entire OS on that machine after his coworker
>>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>>shouldn't be sysadmins in the first place.
>When this happened to a colleague (when I worked somewhere else) he restored
>vmunix by copying from another machine. Unfortunately, a 68000 kernel does
>not run very well on a Sparc...

If it was a Sparc and still running, could you not have re-compiled the
kernel and copied it back to root?

>--
>+---------------------------------------------------+
>| David Stevenson d...@jet.uk Tel: +44 235 465028 |
>+---------------------------------------------------+
>- Disclaimer: Please note that the above is a personal view and should not
> be construed as an official comment from the JET project.

Rich

pay...@netcom.com

Gary Fowler

Oct 12, 1992, 10:38:09 AM

Once I was going to make a new file system using mkfs. The device I wanted to
make it on was /dev/c0d1s8. The device name that I used, however, was
/dev/c0d0s8 which held a very important application. I had always been a little
annoyed by the 10 second wait that mkfs has before it actually makes the file
system. I'm sure glad it waited that time though. I probably waited 9.9
seconds before I realized my mistake and hit that DEL key just in time. That
was a near disaster avoided.

Another time I wasn't so lucky. I was a very new SA, and I was trying to clean
some junk out of a system. I was in /usr/bin when I noticed a sub directory
that didn't belong there. A former SA had put it there. I did an ls on it and
determined that it could be zapped. Forgetting that I was still in /usr/bin, I
did an rm *. No 10 second idiot proofing with rm. Now if some one would only
create an OS with a "Do what I mean, not what I say" feature.

Gary "Experience is what allows you to recognize a mistake the second time you
make it." Fowler

Bill Broadley

Oct 12, 1992, 11:13:11 AM

On an old DECstation 3100 I was deleting last semester's users to try to
dig up some disk space; I also deleted some test users at the same time.

One user took longer than usual, so I hit control-c and tried ls.
"ls: command not found"

Turns out that the test user had / as the home directory and the remove
user script in ultrix just happily blew away the whole disk.

ftp, telnet, rcp, rsh, etc were all gone. Had to go to tapes, and had
one LONG rebuild of X11R5.

Fortunately it wasn't our primary system, and I'm only a student....
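
A remove-user script can guard against this by refusing any home directory
that is not strictly below the tree where user homes live. A minimal sketch
in Python; the deny-list, the "/home" default, and the function name are
assumptions for illustration and have nothing to do with the actual Ultrix
script.

```python
import os

# Hypothetical deny-list; adjust for the local system.
PROTECTED = {"/", "/bin", "/dev", "/etc", "/usr", "/usr/bin", "/var"}


def safe_to_remove_home(home, home_root="/home"):
    """Return True only if `home` lies strictly below `home_root`."""
    home = os.path.realpath(home)
    root = os.path.realpath(home_root)
    if home in PROTECTED or home == root:
        return False
    # commonpath() equals root only when home is somewhere under root.
    return os.path.commonpath([home, root]) == root
```

With this check, an account whose home directory is "/" is skipped, and so
is a path such as "/home/../etc" that only looks like a home directory.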

--
Bill 1st> Broa...@neurocog.lrdc.pitt.edu
Broa...@schneider3.lrdc.pitt.edu <2nd 3rd> Broa...@pitt.edu
Novell, AFS just say NO!

John Kochmar

Oct 12, 1992, 11:33:11 AM

A long time ago, back when the Apollo 460 was around and I had just
graduated from college, I had the good fortune of being one of two
administrators in charge of making a cluster of 460's a part of our
environment. One of the things I was tasked with was getting them onto
our network.

Well, I was young, I had the manuals, and a guy from Apollo tech
support was there to help. How hard could it be, right?

Well, we got out the manuals, configured the system (relying heavily on
the defaults), and within 2 hours, we had that puppy on the network.
Life was good.

About 3 hours later, I get a phone call from a systems programmer /
developer from CMU campus (the SEI is a part of CMU, and we are on their
network.) He told me that if I didn't take the &%@*ing Apollo off the
network, he was going to do hurtful things to me physically.
Life was not so good.

As it turned out, in default mode, the Apollo answered every address
request it saw, even if it was not the machine the request was for.
Kind of a "hey, I'm not who you are looking for, but I'm out here in
case you decide you'd rather talk to me." Apollo considered this a
feature, and they took advantage of it in their OS environment.

However, one of the earlier versions of a heavily network-dependent OS
developed at CMU considered this a bug. The OS would issue a request,
and expect only the machine it was looking for to answer it. Of
course, it would assume that if it got an answer to its request, it
must be the machine it expected to talk to. It didn't look at the
address of the answer it got, so if it wasn't the correct machine, most
of the time the OS would hang or panic.

The outcome? Over about 3 hours time, more and more of campus was
talking to our little 460, which had just enough muscle to keep up with
the requests. By the time campus figured out what was going on, we had
an Apollo merrily answering the network requests for hundreds of
machines (the ones that were still up, that is). The result: the part
of campus that used the new OS going to hell in a bucket, one very busy
Apollo 460, and one very warm ethernet.

Well, we turned off the Apollo, configured it not to chat to all of
campus before putting it back on the ethernet (this time, we did it
while talking with campus, making sure we didn't cause the same
problems we did the last time -- we didn't have a packet monitor at the
time), and campus changed their OS to look at the request response
before assuming it was the correct one. I also learned to think very
carefully about default values before using them.
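
The fix on the campus side amounts to checking who sent a reply before
trusting it. A toy sketch of that check in Python; the Reply record and the
host names are invented for illustration and do not model the real protocol
or either operating system.

```python
from collections import namedtuple

# Hypothetical reply record: the address it came from, plus its contents.
Reply = namedtuple("Reply", ["sender", "payload"])


def pick_reply(target, replies):
    """Return the first reply that really came from `target`, else None."""
    for reply in replies:
        if reply.sender == target:
            return reply
    return None
```

Given replies from both an over-eager host and the intended one, only the
reply whose sender matches the requested address is accepted; if nothing
matches, the caller gets None and can retry instead of hanging or panicking.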

John
Manager, Systems and Tools admin
SEI Computing Facilities

-----------------------------------------------------------------------------
John Kochmar | Estimated amount of glucose used by an adult human
koc...@sei.cmu.edu | brain each day, expressed in M&Ms: 250
SEI Computing Facilities | -Harper's Index, October 1989

Alan Saunders

Oct 12, 1992, 2:28:27 AM

About inexperienced sysadmins .. One such had been on a Sun sysadmin
course, and learned all about security. One of the topics was on file
and group access. On his return, he decided to put what he had learned
into practice, and changed the ownership of all files in /bin, /usr/bin
to bin.bin! I was called in when no one could log in to the system
(of course /bin/login needs to be setuid root!)
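
Before a bulk ownership change it helps to record which files carry the
set-uid bit and who owns them, because a set-uid program handed to bin runs
at best as bin rather than root. A minimal sketch in Python; the function
name is an illustrative assumption.

```python
import os
import stat


def setuid_files(directory):
    """Return {path: owner_uid} for regular files with the set-uid bit set."""
    found = {}
    for entry in os.scandir(directory):
        st = entry.stat(follow_symlinks=False)
        if stat.S_ISREG(st.st_mode) and st.st_mode & stat.S_ISUID:
            found[entry.path] = st.st_uid
    return found
```

Running it on /bin and /usr/bin before and after the change, and comparing
the two results, shows exactly which programs lost their privileges.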

Regards .. Alan
--

* Meeeow ! Call Spuddy on (0203) 638780/638693 for FREE mail & Usenet access *

Mike Matthews

Oct 12, 1992, 2:00:25 PM

When I had first gotten my NeXTstation, it had the lil' 105M hard drive in
it. I had a 330M external, but alas, no cable for it. (Life was not fun
when I was essentially netbooting off a "test" machine.... ".. um, guys, did
you just reboot is-next?")

Finally got the cable, just in time for the winter holiday (read: no
network). Brought the machine home, and I figured I'd just copy the
configuration files over from the internal to the external (as a nice gesture
to my users so they wouldn't have to change their passwords and everything).

The external was a brand new BuildDisk'd disk (had stock NeXTstep on it).
NeXT keeps the private information of each machine (/dev, /etc, stuff like
that) in a /private directory to make netbooting easier.

Hey, I'll just move /private from the 105M to /private on the external. So I
deleted the external's /private and tried to move it via the workspace.

/dev is in /private.

/dev contains device files. Can't move them.

BUT. The workspace happily deleted all the files it DID copy, so the
internal couldn't boot (no /etc) and the external couldn't boot (no /dev).
This is before the advent of boot floppies so I was stuck for about a week at
home with $5000 of NeXT computer that I couldn't boot.

The moral? *NEVER* move something important. Copy, VERIFY, and THEN delete.
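
That moral can be sketched in Python as follows, assuming a tree of ordinary
files and directories. The function names are illustrative; device nodes
such as those in /dev cannot be handled this way and would have to be
recreated with mknod.

```python
import filecmp
import os
import shutil


def trees_match(src, dst):
    """Return True if every file under src has an identical copy under dst."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        for name in filenames:
            a = os.path.join(dirpath, name)
            b = os.path.join(dst, rel, name)
            # shallow=False forces a byte-by-byte comparison.
            if not os.path.isfile(b) or not filecmp.cmp(a, b, shallow=False):
                return False
    return True


def careful_move(src, dst):
    """Copy src to dst, verify the copy, and only then delete src."""
    shutil.copytree(src, dst)
    if not trees_match(src, dst):
        raise RuntimeError("verification failed; source left untouched")
    shutil.rmtree(src)
```

If the copy or the verification fails, an exception is raised before
rmtree() is reached, so the source is never deleted on a partial copy.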
------
Mike Matthews, matt...@oberon.umd.edu (NeXTmail accepted)
------
There has been an alarming increase in the number of things you know
nothing about.

Casper H.S. Dik

Oct 12, 1992, 3:41:10 PM

al...@spuddy.uucp (Alan Saunders) writes:

>About inexperienced sysadmins .. One such had been on a Sun syasadmin
>course, and learned all about security. One of the topics was on file
>and group access. On his return, he decided to put what he had learned
>into practice, and changed the ownership of all files in /bin, /usr/bin
>to bin.bin! I was called in when no one could log in to the system
>(of course /bin/login needs to be setuid root!)

That's not true.
% ls -l /bin/login
-r-xr-xr-x 1 root 40960 Jul 2 15:42 /bin/login

This on SunOS 4.1.x with a hacked login.
Root ownership of most files is preferred, BTW.
All commands executed by root should be owned by
root and reside in directories owned by root. Otherwise,
a non-root user can get to root far too easily.
This is especially true in NFS environments.

Casper

PS: In case you're wondering why login need not be set-uid root:
getty/rlogind/telnetd run as root, those execute login.
You cannot remove the set-uid bit from a standard login, as
it will call setreuid(2) and exec the user's shell without checks.
This can be confusing if a user types login in his/her shell.

Dave Butterfield

Oct 12, 1992, 6:57:02 PM

cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>>(of course /bin/login needs to be setuid root!)
>
>That's not true.

Whether it's true or not depends on which version of Unix you're
running.
--
Vote for *anybody* but Quayle!

Obi Thomas

Oct 12, 1992, 10:24:28 PM

This isn't nearly as bad as some of the stories in this thread, but...

I once mistakenly partitioned my Sun's boot disk so that the swap
partition overlapped the usr partition. The machine ran fine for a long
time (many months), presumably because the swap space was always nearly
empty. Then, one day there was a memory parity error and the system crash
dumped at the *end* of the swap partition. What should have been a simple
reboot after the crash dump turned into a long and painful re-install of
the entire system (Suns cannot boot without a /usr partition).

Now when I partition a disk I sit there with a calculator and make sure
all the numbers add up correctly (offsets, number of cylinders, number of
blocks, and so on).
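
The calculator check can be automated. A minimal sketch in Python, assuming
each partition is given as a start cylinder and a cylinder count; the
function name and the sample layout in the note below are made up. A
partition that deliberately covers the whole disk, if there is one, should
be left out of the input.

```python
def find_overlaps(partitions):
    """Return pairs of partition names whose cylinder ranges overlap.

    `partitions` maps a name to (start_cylinder, cylinder_count).
    """
    spans = sorted(
        (start, start + count, name)
        for name, (start, count) in partitions.items()
        if count > 0
    )
    overlaps = []
    for i, (_start_a, end_a, name_a) in enumerate(spans):
        for start_b, _end_b, name_b in spans[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so nothing later can overlap a
            overlaps.append((name_a, name_b))
    return overlaps
```

For a layout such as {"a": (0, 100), "b": (100, 200), "g": (299, 500)},
partition b ends at cylinder 300 while g starts at 299, so the function
reports the pair ("b", "g"): a one-cylinder overlap.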

Tim Pierce

Oct 12, 1992, 11:59:34 PM

Comp.unix.admin taken out of the distribution on this one, since it's
not really about Unix.

In article <1992Oct12.1...@pmafire.inel.gov> ri...@pmafire.inel.gov (Rick Furniss) writes:

> Backups on unix can be one of the most dangerous commands used,
>and they are used to prevent rather than cause a problem. If any Unix
>utility were a candidate for a warning message, or error checking, this
>would be it.

This is true on any system, though. Unix is not an exception.

> Just in case you didnt catch the HORROR above, the parameters are backworks
>causing a TOTAL wipe out of the root file systems.

I nearly did this once whilst performing a standalone backup of a VAX
8550 running VMS 5.1. I was fortunate, though: VMS's BACKUP command
requires you to specify a filename for the saveset to save on the
tape. I had misspelled the filename to restore from the tape, so VMS
ignored the whole command.

Whew.

--
____ Tim Pierce / "You are just naive and repressed because
\ / twpi...@unix.amherst.edu / penis envy is here and it's now and it's
\/ (BITnet: TWPIERCE@AMHERST) / all around you." -- Neal C. Wickham

James Cummings

Oct 12, 1992, 8:57:08 PM

In article <1992Oct9.1...@u.washington.edu> t...@stein.u.washington.edu (Tim Smith) writes:
|I was working on a line printer spooler, which lived in /etc. I wanted
|to remove it, and so issued the command "rm /etc/lpspl." There was only
|one problem. Out of habit, I typed "passwd" after "/etc/" and removed
|the password file. Oops.
|
|I called up the person who handled backups, and he restored the password
|file.
|
|A couple of days later, I did it again! This time, after he restored it,
|he made a link, /etc/safe_from_tim.
|
|About a week later, I overwrote /etc/passwd, rather than removing it.
|
|After he restored it again, he installed a daemon that kept a copy of
|/etc/passwd, on another file system, and automatically restored it if
|it appeared to have been damaged.

Hmmm.....you were either a very good friend OR he was an unusually
easy-going sysadmin. I think I would have fixed the problem by DELETING
YOUR password entry and changing the root password....probably the latter on
the FIRST occurrence.

All this just to remove an old spooler directory???

Jeff DelPapa

Oct 13, 1992, 2:17:17 AM

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:
>This isn't nearly as bad as some of the stories in this thread, but...
>
>I once mistakenly partitioned my Sun's boot disk so that the swap
>partition overlapped the usr partition. The machine ran fine for a long
>time (many months), presumably because the swap space was always nearly
>empty.

I remember a similar thing once - on a Symbolics machine, a customer
declared a file in the FEP filesystem as a paging file, and as part of
the file system (it was one way to solve their disk space crunch). It
was caught before damage was done - we weren't sure if it was because
they hadn't done anything real yet, or simply the machine knew not to
mess with the IRS (the customer).

<dp>

David J Stevenson

Oct 13, 1992, 4:14:10 AM

In <1992Oct12....@netcom.com> pay...@netcom.com (Rich Payne) writes:

>In article <1992Oct12.0...@jet.uk> d...@jet.uk (David J Stevenson) writes:
>>In <W1NR...@cc.swarthmore.edu> hi...@cc.swarthmore.edu (Eiji Hirai) writes:
>>>...[some deleted]
>>>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>>>work with some sysadmins who didn't have their act together. One day, one
>>>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>>>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>>
>>>My friend had to reinstall the entire OS on that machine after his coworker
>>>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>>>shouldn't be sysadmins in the first place.
>>When this happened to a colleague (when I worked somewhere else) he restored
>>vmunix by copying from another machine. Unfortunately, a 68000 kernel does
>>not run very well on a Sparc...
>
>If it was a Sparc and still running, could you not have re-compiled the
>kernal and copied ti back to root?
>

But I think that someone who deleted vmunix, then copied from an incompatible
machine, didn't know what the file was. Therefore, he wasn't expected to know
how to rebuild it! [I was working on IBM OS/2 at the time, so I didn't have
such problems, it (then) only ran on PS/2 machines].

J.Rowe

Oct 13, 1992, 7:00:34 AM

In article <1992Oct12....@javelin.sim.es.com> gfo...@javelin.sim.es.com (Gary Fowler) writes:

> Another time I wasn't so lucky. I was a very new SA, and I was trying
> to clean some junk out of a system. I was in /usr/bin when I noticed
> a sub directory that didn't belong there. A former SA had put it
> there. I did an ls on it and determined that it could be zapped.
> Forgetting that I was still in /usr/bin, I did an rm *. No 10 second
> idiot proofing with rm. Now if some one would only create an OS with
> a "Do what I mean, not what I say" feature.

That's why I *always*always*always* have the directory in my prompt.
And set 'rmstar' in tcsh to avoid those unwanted 'rm foo *' problems.
(If the variable rmstar is set tcsh always asks before doing an rm *).

BTW Gary, please format your lines to less than 80 characters wide :-)

John

Wm. L. Ranck

Oct 13, 1992, 10:24:46 AM

Hello folks,
Well, after reading some of the stories in this thread I guess I can
tell mine. I got an RS/6000 mod. 220 for my office about 6 months ago.
The OS was preloaded so I had little chance to learn that process. Being
used to a full-screen editor I was not happy with vi so I read in the manual
that INED (IBM's editor for AIX) was full-screen and I logged in as root and
installed it. I immediately started to play with the new editor and somehow
found a series of keys that told the editor to delete the current directory.
To this day I don't know what that sequence of keys was, but I was
unfortunately in the /etc directory when I found it, and I got a prompt that
said "do you want to remove this?" and I thought I was just removing the
file I had been playing with, but instead I removed /etc!
I got the chance to learn how to install AIX from scratch. I did reinstall
INED even though I was a little gun-shy but I made sure that whenever I used
it from then on I was *not* root. I have since decided that EMACS may be a
better choice.

--

*******************************************************************************
* Bill Ranck ra...@joesbar.cc.vt.edu *
* DoD #496 Bikes past and present: CB175, CB550F, Norton 750, CB350F, XV535 *
*******************************************************************************

Hans Mulder

Oct 13, 1992, 1:12:54 PM

In <1992Oct12.1...@fwi.uva.nl> cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>al...@spuddy.uucp (Alan Saunders) writes:

>>About inexperienced sysadmins .. One such had been on a Sun sysadmin
>>course, and learned all about security. One of the topics was on file
>>and group access. On his return, he decided to put what he had learned
>>into practice, and changed the ownership of all files in /bin, /usr/bin
>>to bin.bin! I was called in when no one could log in to the system
>>(of course /bin/login needs to be setuid root!)

>That's not true.
>% ls -l /bin/login
>-r-xr-xr-x 1 root 40960 Jul 2 15:42 /bin/login

Errhm, Casper, did you notice that the login builtin of the shell no
longer works? The error message is "Permission denied".

Or is that intentional?

--
Hope this helps,

Hans Mulder h...@fwi.uva.nl

Mike Matthews

Oct 13, 1992, 2:17:46 PM

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:

>Now when I partition a disk I sit there with a calculator and make sure
>all the numbers add up correctly (offsets, number of cylinders, number of
>blocks, and so on).

Heh heh, now that you mention that...

We had just gotten a 1.2G disk drive for our Sun (which direly needed it) so
we felt we'd repartition everything.

All went well, except... on reboot, one of the partitions that was newly
restored from backup got a fsck error. Fixed it, it rebooted, then another
one got an error. fscked that one, rebooted it, and doggone it, the first
error was back!

We had a one cylinder overlap. Sheesh.

At least Ultrix WARNS you of that.

------
Mike Matthews, matt...@oberon.umd.edu (NeXTmail accepted)
------

Don't kiss an elephant on the lips today.

Casper H.S. Dik

Oct 13, 1992, 1:59:36 PM

h...@fwi.uva.nl (Hans Mulder) writes:

>Or is that intentional?

In current environments there is no need for a login command in the
shell. My version of login does a:
`if (geteuid() != 0) { fprintf(stderr,"Permission denied\n"); exit(1); }'
If I hadn't done that, the login builtin would have behaved slightly
oddly. The shell must have permission to exec /bin/login, though.
/bin/sh does not exit when an exec fails. /bin/csh does, but
it is broken anyway.

The reason not to have a set-uid login is simple: there are many
ways to subvert passwordless accounts or non-shell accounts
by mucking with the environment and doing a ``login -p''.

Casper

Martin Tomes

Oct 13, 1992, 6:03:02 AM

We had something really weird happen one day. I copied a file to
/usr/local on someone else's machine and all seemed to be OK. A bit
later the user of the machine noticed that the files and directories they
were using on another disk partition were corrupted. There were 2
gigabyte files on a 650Mb disk - and lots of them, with weird names and
permissions. At first I did not connect the two events. This disk
had given trouble when the power failed a week before, so I fsck'ed
it. Now I have run fsck more times than I can begin to imagine and
seen plenty of errors, some needing 'manual intervention', but I had
never seen anything like this before! It was spectacular. And what
was more, when I ran it a second time things got worse. Then I tried
to back up the /usr/local partition before restoring this corrupt data
and lo, that was corrupt too. It turned out that our sysadmin had
created the /usr/local disk partition in the wrong place on the disk
and put it over the top of the alternate sectors partition. By
writing to the /usr/local disk I had written all over the alts which
were mapped into the user's partition. Oh dear, what a mess.

Solution, rebuild all the partitions so they don't overlap and
restore, also buy the sysadmin a calculator.

Moral, always do your sums on the /etc/partitions file very carefully
before using mkpart.
--
Martin Tomes
Janet: mto...@uk.co.eurotherm
Internet: mto...@eurotherm.co.uk
UUCP: {uknet,uunet}!etherm!mtomes

Chris A. Anderson

Oct 13, 1992, 1:28:43 PM

Ok, here's one...

At a company that I used to work for, the CEO's brother was the
"system operator". It was his job to do backups, maintenance,
etc. Problem was, he didn't have a clue about Unix. We were
required to go through him to do anything, though.

Well, I was setting up a Plexus P-95 to be a
news/mail/communications machine and needed to wipe the disks and
install a new OS. El CEO requested that his brother do the
installation and disk partitioning. He had done this before, so I
gave him the partition maps and let him at it. When he was done,
everything seemed to be ok. Great, on with the install and
setup.

Things went fine until I started compiling the news and mail
software. All of a sudden, the machine panicked. I brought it
back up and the root file system was amazingly corrupt. After
rebuilding things, it all seemed to be fine -- diagnostics all
ran fine, etc. So I started again -- this time keeping an eye on
things. Sure enough, the root file system became corrupted again
when the system started to load.

This time I brought it down and checked everything. The problem?
Swap space started at block zero and so did the root file system.
ARRRGGGHHHHH!!

Oh yes, the brother still works there.

Chris
--
+------------------------------------------------------------+
| Chris Anderson, Unify Corp. c...@unify.com |
+------------------------------------------------------------+

Kristian Koehntopp

Oct 14, 1992, 8:39:35 AM

In <1992Oct12....@javelin.sim.es.com> gfo...@javelin.sim.es.com (Gary Fowler) writes:
>determined that it could be zapped. Forgetting that I was still in /usr/bin, I
>did an rm *. No 10 second idiot proofing with rm. Now if some one would only

That's what csh's !$ is for.

Kristian
--
Kristian Koehntopp, Harmsstrasse 98, FRG W-2300 Kiel, +49 431 676689
"One who has never hacked sendmail.cf has no soul.
One who has hacked it twice has no brain." -- Peter da Silva

Bruce Krawetz

Oct 13, 1992, 7:08:48 PM

Back when I was installing X-windows on a Sun-3, I accidentally deleted
the console's font. Not only would that machine not boot, it wouldn't
tell me _why_ it wouldn't boot. It seems that without that font, /vmunix
dies most ungracefully very quickly.

Wietse Venema

Oct 13, 1992, 6:26:34 PM

d...@ism.isc.com (Dave Butterfield) writes:

>cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>>>(of course /bin/login needs to be setuid root!)
>>
>>That's not true.

>Whether it's true or not depends on which version of Unix you're
>running.

If login is executable only for root processes (getty, telnetd, etc.)
it does not have to be set-uid.

Wietse

Peter da Silva

Oct 14, 1992, 7:39:03 AM

In article <1992Oct13.1...@fwi.uva.nl> cas...@fwi.uva.nl (Casper H.S. Dik) writes:
> In current environments there is no need for a login command in the
> shell.

There never was. I don't understand why the shell treated login and newgrp
specially in V7. It was terribly inconvenient. In older environments I use
a version of newgrp that works like "su", and if you want to get the effect
of the shell "login" command there's always "exec login".

> My version of login does a:
> `if (geteuid() != 0) { fprintf(stderr,"Permission denied\n"); exit(1); }'

Just dike the "login" builtin right out. It's a waste of bytes.
--
Peter da Silva. <pe...@sugar.neosoft.com>.
`-_-' "Megulegetted ma mar a farkasodat ?"
'U`
Dette kan umulig vaere mitt rom, eftersom jeg ikke puster ammoniakk.

Peter da Silva

Oct 14, 1992, 7:42:37 AM

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:

> Now when I partition a disk I sit there with a calculator and make sure
> all the numbers add up correctly (offsets, number of cylinders, number of
> blocks, and so on).

When I partition the disk I sit there with "sc" and build a spreadsheet for
the purpose. That way I know all my numbers work, and the saved spreadsheet
is documentation on *why* things are that way.

dvs...@minster.york.ac.uk

Oct 14, 1992, 6:20:41 AM

I remember my first (and only, so far) major mistake in unix
admin:

I was changing the UIDs of a few users on one of our major
servers, due to a clash with some machines newly connected to the
net. Fine, edit /etc/passwd then chown all their files to the new
UID. So, rather than just assume that all files owned by "fred"
live in /home/machine/fred I did this:

machine# find / -user old_uid -exec chown username {} \;

This was fine... except it was late at night and I was tired, and
in a hurry to get home. I had six of these commands to type, and
as they would take a long time I'd just let them run in the
background over night.....

So, you come in the next morning and a user complains... I can't
log in to the 4/490 - it says "/bin/login: setgid: not owner".

Okay.... naive user problem no?

rlogin machine -l root
/bin/login: setgid: not owner

machine console
login: root
/bin/login: setgid: not owner

Okay - I REALLY can't get in... let's reboot single user and see
what's on... this worked. /bin/login is owned by (and setuid to) one
of the users whose UID I changed the previous day... in fact ALL
FILES in the ENTIRE filesystem are owned by this user..problem!

We `only' lost about 200 man hours through my little typing
mistake: the moral of the story.. beware anything recursive
when logged in as root!

find / -exec chown user {} \;

Oh dear...
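
One way to make that particular slip harder, sketched as a small shell function. The name `rechown` and the `RECHOWN_DOIT` switch are invented for this example; the idea is only that the old UID is a required argument and that the default is a dry run.

```sh
# rechown: hypothetical wrapper around the find/chown pair from the story.
# Usage: rechown OLD_UID NEW_OWNER [START_DIR]
# Refuses to run unless both the old UID and the new owner are given,
# so a dropped "-user old_uid" cannot silently match every file.
rechown() {
    old_uid=${1-} new_owner=${2-} start=${3:-/}
    case $old_uid in
        ''|*[!0-9]*) echo "rechown: OLD_UID must be numeric" >&2; return 2 ;;
    esac
    [ -n "$new_owner" ] || { echo "rechown: NEW_OWNER missing" >&2; return 2; }
    # Dry run by default: only list what would change.
    if [ "${RECHOWN_DOIT:-no}" = yes ]; then
        find "$start" -user "$old_uid" -exec chown "$new_owner" {} \;
    else
        find "$start" -user "$old_uid" -print
    fi
}
```

Reading the dry-run list before setting `RECHOWN_DOIT=yes` is the real safeguard here; a list that starts with /bin/login is hard to miss.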

Dave

Casper H.S. Dik

Oct 14, 1992, 6:33:39 PM

pe...@NeoSoft.com (Peter da Silva) writes:

>In article <1992Oct13.1...@fwi.uva.nl> cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>> In current environments there is no need for a login command in the
>> shell.

>There never was. I don't understand why the shell treated login and newgrp
>specially in V7. It was terribly inconvenient. In older environments I use
>a version of newgrp that works like "su", and if you want to get the effect
>of the shell "login" command there's always "exec login".

Once upon a long ago, we had one PDP 11 to do Unix on. It was
connected through some crummy port selector (with not enough lines)
and there weren't enough terminals. Using the built-in login was
useful, but exec login would have done the trick. (Except that
my login is no longer set-uid, so ...)

>> My version of login does a:
>> `if (geteuid() != 0) { fprintf(stderr,"Permission denied\n"); exit(1); }'

>Just dike the "login" builtin right out. It's a waste of bytes.

Replacing /bin/login on all our machines to enhance logging and
implement access controls, we can make time for that.
But installing new versions of all shells, yuck.

But someday someone should remove the login and newgrp built-in
from sh/csh etc.

Casper

Geoff (cbird) Newton

Oct 14, 1992, 8:28:13 PM

Well, if everyone else can admit their failures, so can I.
My only severe disruption of a box occurred on our administration machine,
where they do lots of, well, administration type stuff.

It's not so much a machine as a toy. It is running xenix.

I had just been adding terminals to the mips box, and was doing the same on
xenix.

Of course to re-read the inittab we just send init a SIGHUP. So I happily did:

kill -1 1

Not a good idea on Xenix. Everything froze. What's happened? Realization. OOPS,
think quick, call out - "Anybody got problems with dibbler (the machine)?"
[ Various responses saying it was very slow/dead. ]
"Fear not", say I, "I will fix it - but it looks like it may need a reboot".

Of course I never did tell them it was *my* fault. We can't have them thinking
I can make a mistake. Besides, they would have lynched me.

Of course I deny the entire episode :-)

--
gjn | Email: g...@cs.uq.oz.au (Geoff Newton) | _-_|\
(-| | ITI Systems Obliterator^H^H^H^H^H^H^H^H^H^HAdministrator| / *
| | \_.-._/
| "It was working this morning - honest" | v

Mike Stefanik

Oct 14, 1992, 2:14:53 PM

In an article, h...@fwi.uva.nl (Hans Mulder) writes:
>>% ls -l /bin/login
>>-r-xr-xr-x 1 root 40960 Jul 2 15:42 /bin/login
>
>Errhm, Casper, did you notice that the login builtin of the shell no
>longer works? The error message is "Permission denied".

Then you must think we're really deprived ... ;-)
$ ls -l /bin/login
---x------ 1 root bin 113106 Sep 06 1991 /bin/login

--
Mike Stefanik mi...@pacsoft.com ...!uunet!pacsoft!mike (714) 681-2623
Pacific Software Group, Riverside, CA

Jim Frost

Oct 15, 1992, 9:50:19 AM

Isn't the console font stored in ROM? It sounds to me like you set it
up to automatically bring up the X server (maybe using xdm?) rather
than spawning a getty on the console. Normally the ROM console code
would handle things (as it does when you're in the monitor), and I
find it unlikely that you could delete the fonts stored in ROM while
installing software.

jim frost
ji...@centerline.com

Erkka Sutinen

Oct 15, 1992, 2:37:47 PM

In article <1992Oct12.0...@jet.uk> d...@jet.uk (David J Stevenson)
writes:

> When this happened to a colleague (when I worked somewhere else) he restored
> vmunix by copying from another machine. Unfortunately, a 68000 kernel does
> not run very well on a Sparc...

Uh. This reminds me of a long and hard day a couple of weeks ago.....

I was walking merrily towards my room while the world was beautiful and
the sun was shining. I got the first hint that something might be wrong when I saw
our sysadmin banging his head against a nearby door. I asked if something
might be wrong, and he told me so while continuing banging. He had made
backups of our old faithful, tolsun (sun3/160), and made a minor typo.
Instead of tar cf /dev/rst0 . he had written tar xf /dev/rst0 .
(Scripts?? We don't use any bleeding scripts!! They restrict creativity
and make improvising impossible!) Oh well. And the tape happened to be
an old backup from a sparc.

So. No binaries worked, except that one could login as box. inetd had
crashed (at which point it did is irrelevant). There were no active
sessions on that system and there was no way to get in.....

Let's take a break, have some coffee and think it over. There was a
lighter side: all of the disks were mounted on another system via nfs,
and that daemon was still working. The list of overwritten files was in
a log file, and there weren't very many of them. Backups were on 8mm tapes
and our only 8mm drive was on our server, but with nfs that wasn't a problem.
On the other hand we had lost /usr/bin, /usr/etc,
/usr/kvm and /ucb . Ugh! I think I don't like this.

Fine. Let's take everything back from tape.... wait a minute... We had
just installed a new operating system, which had taken several
days since the additional upgrade hadn't worked due to lack of disk space,
and we had a rather unorthodox system running, which worked fine
with our add-ons, such as the appletalk box... And we hadn't made backups
of this new system yet... Aww shit. We couldn't afford to lose this system;
it had too much sweat in it already.

Fortunately we had all the binaries of the upgraded system on our server's
disks, and with nfs we copied everything to the faulty system. Blessed
is the network, who rules our lives!
Nothing happened. Nothing worked. Ah! We just have to restart inetd.
Hmmmm. But how? Do not worry, this is Unix, full of possibilities! First
cron, but the clock of that system was totally out of this world, showing
probably the time of Ouagadougou, and we didn't know where that was! Ah!
Making some false login attempts will bring error messages to the console.
Nothing happened... I guess syslog had died too.... When daemons are in
agony, no program can stand without feeling some sympathy towards those who
suffer so....

Oh no... But of course. Removing root's password.... Why are you brain cells
sometimes so slow.... This made it.... But still..... Nothing worked...
Ah, we had the upgraded binaries... Hm. Ahh. The Sun's unupgrade option
had been used while upgrading the system, so we just reunupgraded the system.
No problem. Everything is fine again... ?? Why doesn't that system
work?? Why doesn't our server let anyone in... Awwww...Jesus Christ.
What had I done!!! Unupgraded our SparcServer with sun3's binaries....
Awww shit and <add your favourite curses here. Mine is sys V curses>.

At this point I was trying to find a good way to get rid of all of our
computers.... Would anyone notice if I just dumped our systems in the river
and claimed that we have never had any computers.... No hope. Back to work.

And at the same moment, the cron miraculously worked, and we had two inetd's
running. No harm done. It is just a nuisance.

No problem. Don't panic. Both of the systems are up and running....
Well aware of the fact that if there are any more mistakes, the systems
will crash and they will not come up. We didn't even have our server's
operating system handy (one disk in the whole university...). I took the
most recent backup of our server and carefully extracted the rpc.passwdauth
and a couple of other files from the tape, and lo ... everything worked again.

The reunupgrade procedure was done this time on the right computer,
and to our surprise it started to work again.... The world is a miraculous
place......With ps operational again, we checked the daemons still
running.... nfsd and getty were the only ones that had stayed up in addition
to the usual bunch (init etc...)

At this point we thought everything was fine... Let's make the backup now
and take good care not to touch anything..... It worked....

A couple of days later I noticed that the restore of our server's daemons
had gone to the wrong directory and the mail hadn't been running for
two days... No problem... Except that delivering two days' worth of mail
brought our server to its knees... Fortunately it didn't crash but it
was ssslllllooooowwwwww for a couple of hours....

.....and once again our heroes had defeated evil nazis and were heading
home for meal.......

Moral of the story: DON'T PANIC. There are several ways to handle unix
as long as it is running. Do NOT try to reboot
it. It only brings sorrow with it.
Moral of the story 2: Double-check every command you give. If one of your
commands has an error, it is the one which causes
the biggest damage.
Moral of the story 3: Check which machine you are on before you give any even
possibly destructive commands... Networks can be
a real nuisance you know.....
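
Moral 3 can be turned into a habit with a few lines of shell. This is only a sketch; the function name `only_on` is invented here, and it assumes that `uname -n` reports the name you expect for the machine.

```sh
# only_on: hypothetical guard. Runs the given command only when this
# machine's node name (uname -n) matches the expected one.
# Usage: only_on HOSTNAME COMMAND [ARGS...]
only_on() {
    expected=${1-}
    [ -n "$expected" ] && [ "$#" -ge 2 ] || { echo "usage: only_on HOST CMD..." >&2; return 2; }
    shift
    actual=$(uname -n)
    if [ "$actual" != "$expected" ]; then
        echo "only_on: this is $actual, not $expected; not running: $*" >&2
        return 1
    fi
    "$@"
}

# Example, using the host and device names from the story:
# only_on tolsun tar cf /dev/rst0 .
```

It does nothing about typing x for c, but it would at least have kept the sun3 binaries off the SparcServer.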

--
======================================#=======================================
Erkka Pietari Sutinen #Eke what availeth Maner and Gentlinesse
e...@rieska.oulu.fi / sut...@csc.fi # Without yow, benygne creature?
University of Oulu. Finland # Shal Cruelte be your governesse?
Dep. of Information Prosessing Science# -Chaucher

Anthony DeBoer

Oct 16, 1992, 9:39:43 AM

In article <1992Oct10....@waggen.twuug.com> brob...@waggen.twuug.com (Bill Roberts) writes:
>My most interesting in the reguard was when I deleted "/dev/null". Of
>course it was soon recreated as a "regular file", then permission problems
>started to show up.

I was once called in to save a system where most things worked, but the
main application package being used on it hung the moment you entered it
(leaving the system more than a little useless for getting things done).
I poked around for awhile, verified that the application's files were all
present, undamaged, and had the right permissions. The folks who
normally used the machine had also discovered that all was well if root
tried to run it. But nothing was visibly wrong anywhere. So, being a
bit hungry by then, I took a break for supper, and about halfway through,
the little voice at the back of my head that sometimes helps me said,
"/dev/tty". Sure enough, somebody had chmod'ded it to 0644, and the
application directed (or tried to direct, in this case) all its I/O
through it rather than just using stdin/stdout like a sane normal process.
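
A quick check along these lines might have shortened the hunt. It is a sketch under the assumption that nodes like /dev/null and /dev/tty should be character devices readable and writable by everyone; `check_devnode` is a made-up name.

```sh
# check_devnode: hypothetical sanity check for nodes such as /dev/null and
# /dev/tty. Expects a character device that everyone can read and write.
check_devnode() {
    node=${1-}
    if [ ! -c "$node" ]; then
        echo "$node: not a character device" >&2
        return 1
    fi
    # First field of ls -ld is the mode string, e.g. crw-rw-rw-
    mode=$(ls -ld "$node" | awk '{ print $1 }')
    case $mode in
        crw?rw?rw?*) return 0 ;;
        *) echo "$node: mode is $mode, expected crw-rw-rw-" >&2; return 1 ;;
    esac
}

# Example: for n in /dev/null /dev/tty; do check_devnode "$n"; done
```

The first test catches a /dev/null that has been recreated as a regular file; the mode test catches a /dev/tty that has been chmod'ded to 0644.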
--
Anthony DeBoer a...@geac.com | uunet!geac!adb | GEM: ANTHONY.DEBOER

Greg Lehey

Oct 16, 1992, 7:44:14 AM

In article <16...@umd5.umd.edu> matt...@oberon.umd.edu (Mike Matthews) writes:
>The moral? *NEVER* move something important. Copy, VERIFY, and THEN delete.

Something like this bit me just yesterday. I'm currently trying to
work out how ISC Unix/386 handles COFF files, and discovered the
/shlib directory, which I suspected wasn't really used (*wrong*). So,
to try it out, I did:

+ root adagio:/ 819 -> mv shlib slob
+ root adagio:/ 820 -> xterm
+ /usr/bin/X11/xterm: Can not access a needed shared library

So far, so good. So, put it back:

+ root adagio:/ 821 -> mv slob shlib
+ /bin/mv: Can not access a needed shared library

Oops! So, tried it from a different system, but didn't have
permission, so:

+ root adagio:/ 822 -> chmod 777 slob
+ /bin/chmod: Can not access a needed shared library

OK, so let's just cp them across.

+ root adagio:/ 823 -> cd slob
+ root adagio:/slob 824 -> mkdir /shlib
+ /bin/mkdir: Can not access a needed shared library
+ root adagio:/slob 825 ->

Then I wrote a program which just did a link(2) of the directories.
Yes, gcc and ld didn't have any problems, but even after the link was
in place, it still didn't work. I had to reboot (but nothing else),
after which it did work. No idea why that made any difference.
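
For single files, the quoted moral ("Copy, VERIFY, and THEN delete") might be sketched like this. The name `safe_move` is hypothetical, and it would not have rescued the /shlib experiment above, where cp itself stopped working once the directory was gone.

```sh
# safe_move: hypothetical helper following "copy, verify, and THEN delete".
# Works on a single regular file; the original is removed only after the
# copy compares byte-for-byte equal.
safe_move() {
    src=${1-} dst=${2-}
    [ -f "$src" ] && [ -n "$dst" ] || { echo "usage: safe_move FILE DEST" >&2; return 2; }
    cp -p "$src" "$dst" || return 1
    if cmp -s "$src" "$dst"; then
        rm -- "$src"
    else
        echo "safe_move: $dst differs from $src; original kept" >&2
        return 1
    fi
}
```

If the destination is a directory the comparison fails and the original is kept, which errs on the safe side.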
--
Greg Lehey | Tel: +49-6637-1488
LEMIS | Fax: +49-6637-1489
Schellnhausen 2, W-6324 Feldatal, Germany

The Eno

Oct 16, 1992, 11:13:35 AM

d...@ism.isc.com (Dave Butterfield) writes:

>cas...@fwi.uva.nl (Casper H.S. Dik) writes:

>>>(of course /bin/login needs to be setuid root!)
>>
>>That's not true.

>Whether it's true or not depends on which version of Unix you're
>running.

I suppose it depends whether you have plain vanilla or chocolate unix? ;-)

Aj.
--
-------------------------------[ The Eno ]------------------------------------.
JANET : da...@uk.ac.city da...@uk.ac.city.cs
I'net : da...@city.ac.uk da...@cs.city.ac.uk
BITNET: da188%city....@cunyvm.cuny.edu da188%cs.cit...@cunyvm.cuny.edu

Ian Chard

Oct 16, 1992, 9:05:21 AM

I had the dubious honour of administrating a 3B2/300. No manual pages,
no nothing. Lovely. Anyway, back to the story.

The company in question _survives_ because it has a mailing list with x
thousand names and addresses (where x is a constant the value of which I have
forgotten). The very nasty DBMS being used can remain anonymous as I don't
really want to get sued into the Earth's core. Suffice it to say that there
was no direct access to the database other than building a form and some code,
and someone had asked me to delete one record (a previous sysadm had removed
the "delete" option from the user form, along with all the source code, just
before he walked out :-) ).

I set about writing a very short program in the DBMS's 4GL which looked
something like:

use file 'fmlist'
if c_name = 'Foo Inc' then
delete;

...or similar. You get the idea. Unfortunately the DBMS didn't, and set about
deleting every record. Thirty seconds later (the box had 16 terminals; our
maintenance people recommended no more than 6) I realised that something was
wrong and interrupted it. A quick query revealed that the world no longer
contained anyone whose name starts with A or B. Damn. Oh God, damn, damn,
erm, oh, damn, {face turns white} etc. I reach for the backup tapes. Oh
God, damn, no, please, not that. The tapes were there. Unfortunately they
were not numbered, and the somewhat hacked-together backup software we were
using core dumped if it came across a tape that was out-of-sequence. Now there
were six tapes, giving a total of... oh GOD! combinations. The users are
beginning to jump up and down, asking why "routine maintenance" (all I could
think of under pressure) was taking so damn long, and why I was looking so
pale.

Finally I hit the right combination at 2:15am the following morning.

I thought I never wanted to see another backup tape ever again.

And then, six weeks later, the disk crashed.

I changed my mind very quickly.

Ian.
--

[ Ian Chard, Systems Integration | "Kryten, unpack Rachel and get out the ]
[ University of Manchester, UK | puncture repair kit." ]
[ Email: cha...@cs.man.ac.uk | -- Rimmer (un-H), Red Dwarf ]

Rob Slade

Oct 16, 1992, 8:04:56 PM

Hope this fits.

I had a job one time teaching Pascal at a "visa school". The machine was a
multi-user micro that ran UNIX. I have enough stories from that one course
to keep a group of computer educators in stitches for at least half an hour.

The finale of the course was on the last day of classes. When I showed up
and powered up the system, it refused to boot. Since all the students' term
projects and papers were in the computer, it was fairly important. After
a few hours of work, and consultation with the other teacher, who did the
sysadmin and maintenance, we were finally informed that the new admin
assistant around the place had decided that the layout of the computer lab
was unsuitable. (I had noticed that all the desks were repositioned: I thought
the other teacher had done it, he thought I had.) The AA had, the night
before, moved all the furniture, including the terminals and the micro. She
did not know anything about parking hard disks.

We knew now, that we were in trouble, but we didn't realize how much until
we started reading up on emergency procedures. For some unknown reason,
booting the micro from the original system disks would automatically reformat
the hard disk.

(The visa school refunded the tuition for all the students in that course.)

Joseph M. Newcomer

Oct 16, 1992, 9:55:18 PM

Excerpts from netnews.alt.folklore.computers: 11-Oct-92 Re: WANTED: Unix
administra.. by Eiji Hi...@cc.swarthmore
> Moral of all these stories: if I had to hire a Unix sysadmin, the first
> thing I'd look for is experience. NOTHING can substitute for down-to-earth,
> real-life grungy experience in this field.

First quoted to me by Fred Brooks: "Good judgment is the result of
experience. Experience is the result of bad judgment".
joe

Joseph M. Newcomer

Oct 16, 1992, 10:59:57 PM

Excerpts from netnews.alt.folklore.computers: 13-Oct-92 Re: WANTED: Unix
administra.. by Obi Tho...@gumby.ocs.com

> Now when I partition a disk I sit there with a calculator and make sure
> all the numbers add up correctly (offsets, number of cylinders, number of
> blocks, and so on).

Hey! Just the job for a computer! (I wonder sometimes if we have ever
made any progress in this field. Only a totally brain-dead programmer
would REQUIRE that the sysadmin do this by hand, but I've found that some
of the most important code, such as partitioning code, which SHOULD be
automated, check for sensibility, etc. has been designed and implemented
by high school students on summer jobs, or so it appears)
joe

Peter da Silva

Oct 17, 1992, 8:51:39 AM

In article <23...@adagio.lemis.uucp> gr...@lemis.uucp (Greg Lehey) writes:
> + root adagio:/ 819 -> mv shlib slob
> + root adagio:/ 820 -> xterm
> + /usr/bin/X11/xterm: Can not access a needed shared library

> So far, so good. So, put it back:

> + root adagio:/ 821 -> mv slob shlib
> + /bin/mv: Can not access a needed shared library

setenv LD_LIBRARY_PATH /slob

We had a horror story like this: we just lost the superblock on /home1 on
a Sparc2. No problem, just recover from one of the spares. Where are they?
Well, let's pull up Answerbook and see...

% openwin
ld.so, Can't open /usr/openwin/lib/libXmu.so.4.0, Errno 2
% ls /usr/openwin/lib/libXmu.so.4.0
/usr/openwin/lib/libXmu.so.4.0
% file !$
file /usr/openwin/lib/libXmu.so.4.0
/usr/openwin/lib/libXmu.so.4.0: Symbolic link to /home1/motif113/libXmu.so.4.0

ARGH

I *like* online documentation. I hate it when it's the ONLY documentation.
If we didn't have about 70 other Sparcs I'd be really pissed.
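
To find this kind of hidden dependency before the disk goes, something like the following sketch could be run over the library directories. `links_into` is a made-up name, and the sketch assumes readlink(1) is installed, which is common but not guaranteed on older systems.

```sh
# links_into: hypothetical helper. Lists symbolic links under DIR whose
# target begins with PREFIX, e.g. libraries that really live on /home1.
# Assumes readlink(1) is available.
links_into() {
    dir=${1-} prefix=${2-}
    [ -d "$dir" ] && [ -n "$prefix" ] || { echo "usage: links_into DIR PREFIX" >&2; return 2; }
    find "$dir" -type l -print | while IFS= read -r link; do
        target=$(readlink "$link") || continue
        case $target in
            "$prefix"*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}

# Example, using the paths from the story:
# links_into /usr/openwin/lib /home1
```

Only absolute targets that start with the prefix are reported; relative links that wander onto another filesystem would need a different test.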

--
Peter da Silva. <pe...@sugar.neosoft.com>.

`-_-' "Megőlegetted ma már a farkasodat ?"

Peter da Silva

Oct 17, 1992, 8:59:48 AM

That's BSD for you.

Now System V has a nice automated mechanism that lets you build all sorts of
standard disk configurations automagically. The problem comes when you want to
do something weird, and have to edit the partitions file by hand. Like make a
larger ESDI disk look exactly like a smaller MFM disk with different geometry
except the last partition is bigger.

(not that I don't have gripes about System V. For all the money AT&T has spent
on it, couldn't they have got a summer hire to go through all the utilities
and at least make sure they called perror?)

Daniel Weise

Oct 17, 1992, 9:27:10 AM

Who started this thread, and why? Please reply directly to me.

Keith Smith

Oct 17, 1992, 5:50:57 PM

My dumbest move ever. Client in Charlotte, NC (3 hours + away) has
Xenix box with like 15 users running single app. They have a tape
backup of course. Anyway they ran slam out of space on the 70MB disk
drive so I upgraded them from an MFM to a SCSI 150MB disk. Restored
their app & data files, and they were off and running. Anyway they did
an application directories backup (tar) on a daily basis and backed the
rest of the system up with tar on Monday morning.

Being a nice guy I built a menu system and installed the backups on the
menu so they could do it with a push of the button. Swell, It's Monday
Call if anything else comes up. 1 week later I get a call. Console is
scrolling messages, App seems to be missing yesterday's orders, etc.
Call in, and cannot log in. 'w' doesn't work. Crazy stuff. Really
strange.

Grab old drive/controller, fly to Charlotte replace drive, install
app backup tape. They re-key missing stuff, etc. Bring new disk back.
Won't boot, won't do anything. Boot emergency floppy set. Looking
around. Can't figure but have backup tape from that morning that
"completed successfully". tar tvf /dev/rct0. Hmm, why all these
files look very OLD. Uh, Where, Uh. Look at menu command for the
"backup" is 'tar xvf /dev/rct0 /'

Anyway, I owned up to the mistake, re-loaded the SCSI drivers and
changed the command to 'tar cvf ..'

Hehehe, Now I DOUBLE check what I put on a menu, and try not to be in a
*HURRY* when I do this stuff.
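
A menu entry can be made a little harder to get wrong by hiding the tar mode inside a function. What follows is a sketch, not what was actually installed; `backup_to` is an invented name.

```sh
# backup_to: hypothetical menu action. The tar mode is fixed to "c" inside
# the function, so the menu entry can never extract by accident.
# Usage: backup_to ARCHIVE_OR_DEVICE DIR...
backup_to() {
    dest=${1-}
    [ -n "$dest" ] && [ "$#" -ge 2 ] || { echo "usage: backup_to DEST DIR..." >&2; return 2; }
    shift
    tar cf "$dest" "$@" || return 1
    # Read the archive back; an unreadable archive means the backup failed.
    tar tf "$dest" >/dev/null || { echo "backup_to: verify of $dest failed" >&2; return 1; }
    echo "backup_to: wrote and listed $dest"
}

# Example, with the tape device from the story:
# backup_to /dev/rct0 /
```

The read-back only proves the archive can be listed. Checking that the listing contains today's files is still a job for a person.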
--
Keith Smith uunet!ksmith!keith 5719 Archer Rd.
Digital Designs BBS 1-919-423-4216 Hope Mills, NC 28348-2201
Somewhere in the Styx of North Carolina ...

Bruce Albrecht

Oct 17, 1992, 9:51:52 PM

In article <1992Oct13.1...@eurotherm.co.uk> mt...@eurotherm.co.uk (Martin Tomes) writes:
(story about overlapping partitions deleted)

>Solution, rebuild all the partitions so they don't overlap and
>restore, also buy the sysadmin a calculator.
>
>Moral, always do your sums on the /etc/partitions file very carefully
>before using mkpart.

The right solution is for the system software to maintain enough information
about the partitions for it to figure out that partitions will overlap,
and not let one configure it without at least a prompt warning that this
is happening.
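
Until the system software does that, the check is easy to sketch outside it. The input format below (name, start, size, one partition per line, all in the same units) is an assumption made for this example and is not the /etc/partitions syntax; `check_overlap` is a made-up name.

```sh
# check_overlap: hypothetical helper. Reads "name start size" lines
# (whitespace-separated, same units throughout) on stdin and reports any
# partition that begins before an earlier one ends.
check_overlap() {
    sort -k2,2n | awk '
        NR > 1 && $2 < prev_end {
            printf "OVERLAP: %s starts at %d, but %s ends at %d\n", $1, $2, prev_name, prev_end - 1
            bad = 1
        }
        # Track the furthest end seen so far, so that a partition nested
        # inside an earlier one is still caught.
        NR == 1 || $2 + $3 > prev_end { prev_end = $2 + $3; prev_name = $1 }
        END { exit bad }
    '
}

# Example: swap and root both starting at block zero, as in an earlier story.
check_overlap <<'EOF' || echo "layout rejected"
root  0     1000
swap  0     500
usr   1000  4000
EOF
```

The exit status is nonzero when anything overlaps, so the function can gate whatever step writes the partition table.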

--
br...@zuhause.mn.org
Youth is wasted on the young.

Mike Stefanik

Oct 18, 1992, 1:03:20 PM

One of the more interesting problems that I ran into was a customer that
was having problems with their SCSI tape drive on a XENIX box. Around midnight,
every night, the system would automatically back up and verify their data. One
day, the customer needed to restore some data files from the last night's
backup. She called because, although the restore worked just fine, she didn't
see the busy light on the drive come on, and it didn't sound like the tape was
moving. I dialed up the system, had her put a tape in and did a retension --
the drive started winding the tape back and forth, and we both concluded that
she was mistaken. After all, the tape was retensioning, and she wasn't getting
any backup or verify errors at all. I just chalked this one up to user
confusion.

A few days later, she called back saying that there really is something wrong
with the tape. She needed to restore some data from a few days ago, and like
before, the busy light on the drive didn't come on, but files did restore.
However when she started the application program, the data hadn't changed. I
dialed up the system again, and just on a fluke, issued a "df" -- it showed
their rather large root filesystem to be nearly full. Confused, I did a "find",
searching for files over 1MB. Of course, what I found was this huge file named
/dev/rct0. As I later discovered, their system had crashed a few weeks ago,
and she had simply answered "yes" to a bunch of questions that it asked when
she brought it back up. The /dev/rct0 device was removed (but /dev/xct0 was
still there, which allowed me to retension the tape) and the backup script
never checked to make sure that it was actually writing to a character device.

Needless to say, I modified the backup program to make sure that it was really
writing to a device, and I made her promise to call me whenever the system
crashed or asked "funny questions" when it was booting.
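
(The check described above might look something like the sketch below. The helper names are hypothetical and this is not the actual modification made to the customer's script.)

```shell
# Sketch only: require_char_device is a hypothetical helper name.
# Succeeds only if the path exists and is a character special file.
require_char_device() {
    if [ -c "$1" ]; then
        return 0
    fi
    echo "backup aborted: $1 is not a character device" >&2
    return 1
}

# List ordinary files that have crept into a device directory. Apart from
# the odd script, anything large that turns up here deserves a look.
stray_device_files() {
    find "$1" -type f -print
}
```

In a backup script this would sit ahead of the tar, along the lines of `require_char_device /dev/rct0 || exit 1`, so a missing device node stops the job instead of quietly filling the root filesystem.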

Daniel Briggs

Oct 19, 1992, 3:25:39 PM

Did anyone by chance archive the post of a year or so ago where someone
described the recovery of a Unix box from a partial "rm -r *" (where root
forgot that he was in /) ? They had lost everything up to (and including?)
/etc before the command was stopped. I seem to recall that they would lose
everything on the disk if they reinstalled the system, so there were very
good reasons to try and restore the barely running system. Of course
almost all of the utilities that they needed to do it had lived in /bin.
There were a few goodies in /usr/5bin that helped them out. The fix
eventually involved writing a bootstrap network utility on another machine,
and assembling it there, typing in the binary in an emacs process that was
still running, and overwriting some other system utility that had the
correct execute permissions, (since they couldn't chmod anything!). It was
a wonderful example of recovery from a near fatal error. If it floats my
way again, I'd love to get a copy of that post.

Trip Martin

Oct 19, 1992, 11:01:51 PM

In <1992Oct19....@zia.aoc.nrao.edu> dbr...@zia.aoc.nrao.edu (Daniel Briggs) writes:

>Did anyone by chance archive the post of a year or so ago where someone
>described the recovery of a Unix box from a partial "rm -r *" (where root
>forgot that he was in /) ? They had lost everything up to (and including?)
>/etc before the command was stopped. I seem to recall that they would lose
>everything on the disk if they reinstalled the system, so there were very
>good reasons to try and restore the barely running system. Of course
>almost all of the utilities that they needed to do it had lived in /bin.
>There were a few goodies in /usr/5bin that helped them out. The fix
>eventually involved writing a bootstrap network utility on another machine,
>and assembling it there, typing in the binary in an emacs process that was
>still running, and overwriting some other system utility that had the
>correct execute permissions, (since they couldn't chmod anything!). It was
>a wonderful example of recovery from a near fatal error. If it floats my
>way again, I'd love to get a copy of that post.

Yup, I saved a copy because it was such a classic story. It's apparently
been re-posted every so often for a number of years, and it's worth
posting again. So here it is...
-------------------------------------------------------------------
From alt.folklore.computers Fri Nov 9 11:16:43 1990
Path: rpi!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!utgpu!utzoo!sq!msb
From: m...@sq.sq.com (Mark Brader)
Newsgroups: alt.folklore.computers
Subject: rm -rf / (was Hex vs. Octal)
Summary: repost
Message-ID: <1990Nov8.0...@sq.sq.com>
Date: 8 Nov 90 08:25:50 GMT
References: <1990Nov5.1...@hq.demos.su>
Organization: SoftQuad Inc., Toronto, Canada
Lines: 184
Status: OR

> ... if you're trying rm -rf / you'll NEVER get a clear disk - at least
> /bin/rm (and if it reached /bin/rmdir before scanning some directories
> then add a lot of empty directories). I've seen it once...

Then it must be version-dependent. On this Sun, "cp /bin/rm foo"
followed by "./foo foo" does not leave a foo behind, and strings
shows that rm appears not to call rmdir (which makes sense, as it
can just use unlink()).

In any case, I'm reminded of the following article. This is a classic
which, like the story of Mel, has been on the net several times;
it was in this newsgroup in January. It was first posted in 1986.

--------------------------------------------------------------------

Have you ever left your terminal logged in, only to find when you came
back to it that a (supposed) friend had typed "rm -rf ~/*" and was
hovering over the keyboard with threats along the lines of "lend me a
fiver 'til Thursday, or I hit return"? Undoubtedly the person in
question would not have had the nerve to inflict such a trauma upon
you, and was doing it in jest. So you've probably never experienced the
worst of such disasters....

It was a quiet Wednesday afternoon. Wednesday, 1st October, 15:15
BST, to be precise, when Peter, an office-mate of mine, leaned away
from his terminal and said to me, "Mario, I'm having a little trouble
sending mail." Knowing that msg was capable of confusing even the
most capable of people, I sauntered over to his terminal to see what
was wrong. A strange error message of the form (I forget the exact
details) "cannot access /foo/bar for userid 147" had been issued by
msg. My first thought was "Who's userid 147?; the sender of the
message, the destination, or what?" So I leant over to another
terminal, already logged in, and typed
grep 147 /etc/passwd
only to receive the response
/etc/passwd: No such file or directory.

Instantly, I guessed that something was amiss. This was confirmed
when in response to
ls /etc
I got
ls: not found.

I suggested to Peter that it would be a good idea not to try anything
for a while, and went off to find our system manager.

When I arrived at his office, his door was ajar, and within ten
seconds I realised what the problem was. James, our manager, was
sat down, head in hands, hands between knees, as one whose world has
just come to an end. Our newly-appointed system programmer, Neil, was
beside him, gazing listlessly at the screen of his terminal. And at
the top of the screen I spied the following lines:
# cd
# rm -rf *

Oh, shit, I thought. That would just about explain it.

I can't remember what happened in the succeeding minutes; my memory is
just a blur. I do remember trying ls (again), ps, who and maybe a few
other commands beside, all to no avail. The next thing I remember was
being at my terminal again (a multi-window graphics terminal), and
typing
cd /
echo *
I owe a debt of thanks to David Korn for making echo a built-in of his
shell; needless to say, /bin, together with /bin/echo, had been
deleted. What transpired in the next few minutes was that /dev, /etc
and /lib had also gone in their entirety; fortunately Neil had
interrupted rm while it was somewhere down below /news, and /tmp, /usr
and /users were all untouched.

Meanwhile James had made for our tape cupboard and had retrieved what
claimed to be a dump tape of the root filesystem, taken four weeks
earlier. The pressing question was, "How do we recover the contents
of the tape?". Not only had we lost /etc/restore, but all of the
device entries for the tape deck had vanished. And where does mknod
live? You guessed it, /etc. How about recovery across Ethernet of
any of this from another VAX? Well, /bin/tar had gone, and
thoughtfully the Berkeley people had put rcp in /bin in the 4.3
distribution. What's more, none of the Ether stuff wanted to know
without /etc/hosts at least. We found a version of cpio in
/usr/local, but that was unlikely to do us any good without a tape
deck.

Alternatively, we could get the boot tape out and rebuild the root
filesystem, but neither James nor Neil had done that before, and we
weren't sure whether the first thing to happen would be that the whole
disk would be re-formatted, losing all our user files. (We take dumps
of the user files every Thursday; by Murphy's Law this had to happen
on a Wednesday). Another solution might be to borrow a disk from
another VAX, boot off that, and tidy up later, but that would have
entailed calling the DEC engineer out, at the very least. We had a
number of users in the final throes of writing up PhD theses and the
loss of maybe a week's work (not to mention the machine down time)
was unthinkable.

So, what to do? The next idea was to write a program to make a device
descriptor for the tape deck, but we all know where cc, as and ld
live. Or maybe make skeletal entries for /etc/passwd, /etc/hosts and
so on, so that /usr/bin/ftp would work. By sheer luck, I had a
gnuemacs still running in one of my windows, which we could use to
create passwd, etc., but the first step was to create a directory to
put them in. Of course /bin/mkdir had gone, and so had /bin/mv, so we
couldn't rename /tmp to /etc. However, this looked like a reasonable
line of attack.

By now we had been joined by Alasdair, our resident UNIX guru, and as
luck would have it, someone who knows VAX assembler. So our plan
became this: write a program in assembler which would either rename
/tmp to /etc, or make /etc, assemble it on another VAX, uuencode it,
type in the uuencoded file using my gnu, uudecode it (some bright
spark had thought to put uudecode in /usr/bin), run it, and hey
presto, it would all be plain sailing from there. By yet another
miracle of good fortune, the terminal from which the damage had been
done was still su'd to root (su is in /bin, remember?), so at least we
stood a chance of all this working.

Off we set on our merry way, and within only an hour we had managed to
concoct the dozen or so lines of assembler to create /etc. The
stripped binary was only 76 bytes long, so we converted it to hex
(slightly more readable than the output of uuencode), and typed it in
using my editor. If any of you ever have the same problem, here's the
hex for future reference:
070100002c000000000000000000000000000000000000000000000000000000
0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f
8800040000bc012f65746300

I had a handy program around (doesn't everybody?) for converting ASCII
hex to binary, and the output of /usr/bin/sum tallied with our
original binary. But hang on---how do you set execute permission
without /bin/chmod? A few seconds thought (which as usual, lasted a
couple of minutes) suggested that we write the binary on top of an
already existing binary, owned by me...problem solved.
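
(For the curious, a minimal sketch of both tricks in present-day shell. It is not the program used at the time; hex2bin and install_over are invented names, and it assumes a printf that accepts 0x constants and octal escapes, as current shells do.)

```shell
# Sketch only: hex2bin and install_over are hypothetical names.

# Convert ASCII hex on stdin (whitespace ignored) to raw bytes on stdout.
hex2bin() {
    tr -d ' \t\n' | fold -w 2 | while read -r pair || [ -n "$pair" ]; do
        # Turn each hex pair into a three-digit octal escape for printf.
        printf "\\$(printf '%03o' "0x$pair")"
    done
}

# Overwrite an existing file in place. Redirection truncates and rewrites
# the same file, so its mode bits (including execute) are left alone.
install_over() {
    cat "$1" > "$2"
}
```

Comparing checksums of the result on both machines, as was done with /usr/bin/sum, is what confirms that the typing was accurate.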

So along we trotted to the terminal with the root login, carefully
remembered to set the umask to 0 (so that I could create files in it
using my gnu), and ran the binary. So now we had a /etc, writable by
all. From there it was but a few easy steps to creating passwd,
hosts, services, protocols, (etc), and then ftp was willing to play
ball. Then we recovered the contents of /bin across the ether (it's
amazing how much you come to miss ls after just a few, short hours),
and selected files from /etc. The key file was /etc/rrestore, with
which we recovered /dev from the dump tape, and the rest is history.

Now, you're asking yourself (as I am), what's the moral of this story?
Well, for one thing, you must always remember the immortal words,
DON'T PANIC. Our initial reaction was to reboot the machine and try
everything as single user, but it's unlikely it would have come up
without /etc/init and /bin/sh. Rational thought saved us from this
one.

The next thing to remember is that UNIX tools really can be put to
unusual purposes. Even without my gnuemacs, we could have survived by
using, say, /usr/bin/grep as a substitute for /bin/cat.
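
(A small sketch of the substitutions being hinted at. The function names are made up, and whether printf is a built-in depends on the shell; it is in current ones.)

```shell
# Sketch only: function names are hypothetical. Everything here is either
# a shell built-in (in current shells) or grep, so it does not depend on
# /bin/ls or /bin/cat being present.

# Poor man's ls: let the shell expand the glob, one name per line.
builtin_ls() {
    for name in "${1:-.}"/*; do
        echo "$name"
    done
}

# Poor man's cat: an empty pattern matches every line.
grep_cat() {
    grep '' "$1"
}

# Cat with no external program at all.
builtin_cat() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}
```

Like "echo *", builtin_ls prints the unexpanded pattern when the directory is empty, which is a fair price for not needing /bin.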

And the final thing is, it's amazing how much of the system you can
delete without it falling apart completely. Apart from the fact that
nobody could login (/bin/login?), and most of the useful commands
had gone, everything else seemed normal. Of course, some things can't
stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but
by and large it all hangs together.

I shall leave you with this question: if you were placed in the same
situation, and had the presence of mind that always comes with
hindsight, could you have got out of it in a simpler or easier way?
Answers on a postage stamp to:

Mario Wolczko
------------------------------------------------------------------------
Dept. of Computer Science ARPA: miw%uk.ac.m...@cs.ucl.ac.uk
The University USENET: mcvax!ukc!man.cs.ux!miw
Manchester M13 9PL JANET: m...@uk.ac.man.cs.ux
U.K. 061-273 7121 x 5699
------------------------------------------------------------------------

Reposted to alt.folklore.computers this time around by:
--
Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, m...@sq.com
"UNIX are quality sectional bookcases, made of solid oak.
Open or glass-fronted, in three sizes and three finishes,
UNIX gives unapproached flexibility."
-- Daily Mail Ideal Home Book, 1951-52

--
Trip Martin
ni...@acm.rpi.edu

Dan Prener

Oct 24, 1992, 3:41:30 AM

In article <1992Oct24.0...@gossip.urich.edu> he...@nomad.urich.edu (Helen C. O'Boyle) writes:

[ ... horror stories deleted ... ]

>Backups (along with network cabling) just seem to be a great place
>for trouble to hide, waiting for the unwary. In theory it's simple,
>and in practice, it's not *too* hard to do *mostly* right (online
>file catalog, dd at optimal streaming blocksize, tape file checksum
>verification, etc.) and not impossible to do *really* right (for one,
>toss QIC tapes altogether, use a backup method that accounts for
>special files, etc.) ....... but in the real world of non-UNIX-wizard
>SA's, "right" and "recommended by the vendor, so it MUST be the best
>way, right?" are often considered synonyms when they're really closer
>to being antonyms.

The problem is not that backups are difficult to do; they're not.

The real difficulty is that anything that doesn't get exercised can't be
expected to work. This is true not only for programs but for mechanical
devices. (If you never used a car, but kept one around for emergencies,
never running it and never inspecting it, you could be sure that it would
not work when you finally needed it for some emergency.) Even when programs
work, you can't reasonably expect them to keep working for long periods of
time if they are never used. Enough other things change around them so
that you might as well assume that there really is such a phenomenon as
bit rot.

Backups, by their nature, tend not to be exercised. If you want to
make sure your backup procedures are really working, you should probably
do something like randomly choosing a file and restoring it once a month.
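
(One way such a monthly drill could be scripted, as a sketch only. It assumes the archive was made with "tar cf ARCHIVE -C SRC ." using a tar that understands -C, and restore_drill is an invented name.)

```shell
# Sketch only: restore_drill is a hypothetical name, and it assumes the
# archive was made with "tar cf ARCHIVE -C SRC ." so member names are
# relative to SRC.
# Usage: restore_drill ARCHIVE SRC
restore_drill() {
    archive=$1
    src=$2
    scratch=$(mktemp -d) || return 1

    # Pick one entry at random from the archive listing.
    # Directories are listed with a trailing slash, so skip those.
    member=$(tar tf "$archive" | grep -v '/$' |
        awk 'BEGIN { srand() } rand() * NR < 1 { pick = $0 } END { print pick }')
    [ -n "$member" ] || { rm -rf "$scratch"; return 1; }

    # Restore just that file into a scratch directory and compare it
    # byte for byte with the live copy.
    tar xf "$archive" -C "$scratch" "$member" &&
        cmp -s "$scratch/$member" "$src/$member"
    status=$?
    rm -rf "$scratch"
    [ "$status" -eq 0 ] && echo "restore of $member OK"
    return "$status"
}
```

A file that has legitimately changed since the backup will also fail the comparison, so a failure is a prompt to go and look, not proof that the tape is bad.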
--
Dan Prener (pre...@watson.ibm.com)

Helen C. O'Boyle

Oct 24, 1992, 1:02:26 AM

In article <14...@pacsoft.com> mi...@pacsoft.com (Mike Stefanik) writes:
> [ beginnings of backup horror story deleted ]

>However when she started the application program, the data hadn't changed. I
>dialed up the system again, and just on a fluke, issued a "df" -- it showed
>their rather large root filesystem to be nearly full. Confused, I did a "find",
>searching for files over 1MB. Of course, what I found was this huge file named
>/dev/rct0.

That sounds familiar. A couple years ago, when I was out at a client site
doing some network support, they asked me to look at their main UNIX box,
to see if some files could be deleted to make more space. ZAPPPP went the
13 meg cron log. ;-) ZAPPPP went the 17 meg pacct file. Then for the
"find", just to make sure there weren't some other accounting files
growing without bound in an unfamiliar place. "GOLLLL-LLLEEEEE," said I,
quickly followed by "UHHHHH OHHHHHH," when I saw a 55 meg file in /dev,
whose name sounded like a tape device. An ls confirmed that it was
ALMOST, but not quite, the tape drive device name. ;-) I checked their
backup script (written by the manufacturer's techs when the system was
installed over a year prior to that date) and saw the typo. I fixed
the typo, pointed out the problem to the SA, and made a few friends at
that company. They had never noticed that their tape drive didn't
run during backups, because backups ran at 2am when they were asleep.
Likewise, they'd never realized that the dozens of tapes they'd
purchased (and some, duly discarded after N "uses"!!! ;-) had never
been written to, because they never needed to do a restore.

I don't know what's more amazing:
1) That the techs didn't at least test the backup script
before they left the site after the installation.
2) That this customer did not, in the space of more than
a year, *ever* need to do a restore (at least, that is
what the admin maintained to me with a straight face,
not really comprehending that anyone would ever WANT
to restore the previous day's contents of a file (???)).

Of course, if I'm getting some yukkks in at others' expense, might
as well throw some stones at my house, too. Another backup story.

Six years ago or so, AT&T 3B2's had a command which would (allegedly ;-)
write verified backups. By failing verify periodically, it even did a
good job of convincing the user that it did a decent verify. Until
Programmer Z decided she needed a file back. Admin H grabbed the evening
backup clerk's logs, found the tape set, put the first tape in, and after
a little while, SPLAT, I/O error. Skipping past that, very soon, SPLAT,
another I/O error. The online backup record indicated that the backup
had completed without errors. Puzzled was I, but I grabbed the weeklies.
Same problem. A good copy of the file was eventually retrieved from
SOME backup set, but most of the tapes were full of I/O errors, despite
system claims of successful verifies. This pleased management no end.
It was a consulting company, and the tapes being written by that machine
contained the company's bread and butter. In many cases, the only copies
of the sources to systems at client sites were found on that machine.

Yawn. I don't recall how the utility did its verify, but it didn't
verify nearly enough for most users/programmers *I* know to be
comfortable with it. So, I added a manual verify pass to the backup
(well, like probably everyone else who's ever seriously attacked the
job of being an SA, I added LOTS of things when I rewrote it in disgust,
but that's another story).

Backups (along with network cabling) just seem to be a great place
for trouble to hide, waiting for the unwary. In theory it's simple,
and in practice, it's not *too* hard to do *mostly* right (online
file catalog, dd at optimal streaming blocksize, tape file checksum
verification, etc.) and not impossible to do *really* right (for one,
toss QIC tapes altogether, use a backup method that accounts for
special files, etc.) ....... but in the real world of non-UNIX-wizard
SA's, "right" and "recommended by the vendor, so it MUST be the best
way, right?" are often considered synonyms when they're really closer
to being antonyms.

--
Helen C. O'Boyle | Netnews admin, GNU support beastie, C hacker,
he...@nomad.urich.edu | UNIX contractor and hanger-on in Richmond, VA

Lars Peter Fischer

Nov 2, 1992, 8:37:26 PM

[ Our news server has been badly broken, so I'm resending this.
Sorry if you've seen it before. ]

>>>>> "Gary" == Gary Fowler (gfo...@javelin.sim.es.com)

Gary> Now if some one would only create an OS with a "Do what I mean,
Gary> not what I say" feature.

:DWIM: /dwim/ [acronym, `Do What I Mean'] 1. adj. Able to guess,
sometimes even correctly, the result intended when bogus input was
provided.
[...]
In one notorious incident, Warren added a DWIM feature to the
command interpreter used at Xerox PARC. One day another hacker
there typed `delete *$' to free up some disk space. (The
editor there named backup files by appending `$' to the
original file name, so he was trying to delete any backup files
left over from old editing sessions.) It happened that there
weren't any editor backup files, so DWIM helpfully reported
`*$ not found, assuming you meant 'delete *'.' It then started
to delete all the files on the disk! The hacker managed to stop it
with a {Vulcan nerve pinch} after only a half dozen or so files
were lost.

(Jargon File 2.9.10)

/Lars
--
Lars Fischer, fis...@iesd.auc.dk | It takes an uncommon mind to think of
CS Dept., Aalborg Univ., DENMARK. | these things. -- Calvin

Lars Peter Fischer

Nov 2, 1992, 8:15:14 PM

[ Our news server has been badly broken, so I'm resending this.
Sorry if you've seen it before. ]

>>>>> "J.Rowe" == J.Rowe (JR...@cen.ex.ac.uk)

J.Rowe> That's why I *always*always*always* have the directory in my prompt.
J.Rowe> And set 'rmstar' in tcsh to avoid those unwanted 'rm foo *' problems.
J.Rowe> (If the variable rmstar is set tcsh always asks before doing an rm *).

I hate those long prompts, and I hate getting all those "do you really
mean that" questions. I've been hit by "rm *" twice, and both times it
was the dreaded delete-return key (most keyboards have delete just
above return, so it's easy to hit both, with nice roll-over action).
Nasty when doing "rm *~" -- I have an alias for that now :-)
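
(The alias itself isn't shown, so the following is only a guess at the idea, written as a shell function. The name rmbak is invented.)

```shell
# Sketch only: rmbak is a hypothetical name, not the poster's actual alias.
# Removes editor backup files (names ending in "~") from the current
# directory without anyone having to type "rm *" followed by anything.
rmbak() {
    for f in ./*~; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$f" ] || continue
        rm -- "$f"
    done
}
```

Since the pattern lives inside the function, a slip of the finger between "*" and "~" can no longer turn into a bare "rm *".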

Lars Peter Fischer

Nov 2, 1992, 8:15:33 PM

[ Our news server has been badly broken, so I'm resending this.
Sorry if you've seen it before. ]

>>>>> "Alan" == Alan Saunders (al...@spuddy.uucp)

Alan> On his return, he decided to put what he had learned
Alan> into practice, and changed the ownership of all files in /bin, /usr/bin
Alan> to bin.bin!

In a related move, a new computing center sysadmin here once decided
that there was no reason for people to be able to copy system
commands, so he did a "chmod a-r /bin/* /usr/bin/*".

Lars Peter Fischer

Nov 2, 1992, 8:14:48 PM

[ Our news server has been badly broken, so I'm resending this.
Sorry if you've seen it before. ]

>>>>> "Peter" == Peter da Silva (pe...@NeoSoft.com)

Peter> When I partition the disk I sit there with "sc" and build a
Peter> spreadsheet for the purpose.

Good idea.

Peter> That way I know all my numbers work, and the saved spreadsheet
Peter> is documentation on *why* things are that way.

I hope you print it out -- reminds me of a friend that proudly showed
me a fine little database he had hacked together with info about the
contents of all his tapes, including backups. Made it very easy to
find things, find the next tape to reuse, etc. Lots of neat searching
commands and stuff. When I asked to see a printout, he said "Printout?
why?".

Darin Cowan - root

Nov 3, 1992, 8:13:52 AM

fis...@iesd.auc.dk (Lars Peter Fischer) writes:

> In a related move, a new computing center sysadmin here once decided
> that there was no reason for people to be able to copy system
> commands, so he did a "chmod a-r /bin/* /usr/bin/*".
>

I used to be the tech support for a large system of unix machines in the
Military. Military system managers are often "volunteers", so you don't
always get knowledgeable people there...

One day I get a call from a Master Corporal who says his system was locked
up. Seems he just got back from a DOS course where he noted that when
you type "del *.*" it asks "Are you sure?"

Wondering if that was true with Unix as well, he logged in as root and
typed "rm -r *" at the prompt.

====
co...@cerianthus.pinetree.org
(Darin Cowan - root)

This beautiful post is the 553th on this system!

Chuck Tomasi

Nov 4, 1992, 7:47:07 AM

I was assigned to be part of a team to upgrade our board test stations
back in January. There were about six or seven systems needing an OS
upgrade (HP-UX 7.0 -> 8.0) and new board test software. Before
starting each system we made a backup of the data to cartridge tape.
This was taking an incredibly long time for each system (located in
different buildings) so we got the idea of doing a backup across the
network to a 4mm DAT located on yet another system.

When the backup had completed cranking away we reformatted the disks
(to reconfigure swap and other things), loaded the new OS, loaded the
new software, and then found out to our horror that not all the data had
been backed up. Approximately 12 designs were lost to varying degrees,
and none were usable. Each design is valued at $30,000-40,000. My
stomach sank.

We spent the rest of the night and all the next day trying to figure
out where the data might be recovered from and how much data we may
have lost. It was a 36 hour work day and we were supposed to go out
to dinner with some friends that night (Friday). I dropped my
flatware continuously, couldn't make simple decisions (eg: "What
would you like to drink?") and for the most part was out of the
conversation entirely unless someone mentioned backups or Unix. As it
turns out we were able to recover (from a backup two months ago) all
but one design which I believe eventually surfaced. We had egg on our
face - especially since I knew that backups over NFS (as root) had
problems. I should have seen it coming and didn't.
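
(A sketch of a completeness check that could be run before reformatting anything. It assumes the archive was written with "tar cf ARCHIVE -C SRC ." and that the find is run somewhere it can actually see every file, ideally on the machine that owns the disk. The name missing_from_archive is invented.)

```shell
# Sketch only: missing_from_archive is a hypothetical name; it assumes the
# archive was written with "tar cf ARCHIVE -C SRC ." so that names line up
# with the output of "find .".
# Prints every path under SRC that is absent from ARCHIVE; fails if any is.
missing_from_archive() {
    archive=$1
    src=$2
    want=$(mktemp) || return 1
    have=$(mktemp) || return 1

    # Normalise both lists: strip trailing slashes, drop the top-level ".".
    (cd "$src" && find . -print) | sed -e 's,/$,,' -e '/^\.$/d' | sort > "$want"
    tar tf "$archive" | sed -e 's,/$,,' -e '/^\.$/d' | sort > "$have"

    # comm -23 keeps lines that appear only in the first list.
    missing=$(comm -23 "$want" "$have")
    rm -f "$want" "$have"

    [ -z "$missing" ] && return 0
    printf '%s\n' "$missing"
    return 1
}
```

An empty result doesn't prove the contents are good, only that nothing was skipped outright, which is the failure described here.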

It wasn't long thereafter that I got permission to order an 8mm
Exabyte drive to improve our backup policy.

The good thing about this is that I got to do a talk about backups at
HP two months later (it started as a joke "Hey let's get Chuck to talk
about backups, ha ha.") and received an HP 95LX for my efforts. I
can't imagine life without the 95LX as a System Administrator.
--
Chuck Tomasi | "A munk a clone and a Ferengi
ch...@edsi.plexus.COM | decide to go bowling together..."
spool!cserver!edsi!chuck | -Data "The Outrageous Okana"
------<Enterprise Data Systems Incorporated, Appleton Wisconsin>------

peter da silva

Nov 5, 1992, 1:03:44 PM

In article <FISCHER.92...@steiner.iesd.auc.dk> fis...@iesd.auc.dk (Lars Peter Fischer) writes:
> Peter> That way I know all my numbers work, and the saved spreadsheet
> Peter> is documentation on *why* things are that way.

> I hope you print it out

Generally, no. I *do* keep it on a floppy, but we have so many systems around
here that:
1. All the printouts would be more than slightly wasteful.
2. If things are ever so bad I can't get a single system up to
read the floppy I'm hosed anyway.
--
% Peter da Silva % 77487-5012 % +1 713 274 5180 %
true(<<VV$@\\$'&O 9$O%'$LT$&$"V6"$&$<4$?'&$ #I&&?$=$<<@)24 24 scale 3 21 moveto
{dup 36 eq{pop not}{dup 7 and 4 sub exch 56 and 8 div 4 sub 2 index{rlineto}{
rmoveto}ifelse}ifelse}forall stroke pop showpage % Har du kramat din varg idag?

Mitch Davis

Nov 9, 1992, 7:11:55 PM

In article <1992Nov04....@edsi.plexus.COM> ch...@edsi.plexus.COM (Chuck Tomasi) writes:

>The good thing about this is that I got to do a talk about backups at
>HP two months later (it started as a joke "Hey let's get Chuck to talk
>about backups, ha ha.") and received an HP 95LX for my efforts. I
>can't imagine life without the 95LX as a System Administrator.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Do you still have to come in for work, or does the 95LX answer the phone
as well?

Mitch.