WANTED: Unix administration horror stories !

Arne Asplem

Oct 7, 1992, 8:02:46 AM

I'm the program chair for a one-day conference on Unix system
administration in Oslo in 3 weeks, including topics like network
management, system administration tools, integration, print/file-servers,
security, etc.

I'm looking for actual horror stories of what has gone wrong because
of bad system administration, as an early morning wakeup.

I'll summarise to the net if there is any interest.

-- Arne
--
// Arne Asplem // Office: Gjerdrums vei 12, N-0486 Oslo, Norway //
// Multix A/S // Phone/fax: +47-2-950800 / 950790 //
// // Email: ar...@multix.no //
// NUUG Chairman // Email: ar...@nuug.no //

John Ellithorpe

Oct 7, 1992, 7:12:07 PM

Here's a pretty bad story. I wanted to have root use tcsh instead of the
Bourne shell. So I decided to copy tcsh to /usr/local/bin. I created the
file, /etc/shells, and put in /usr/local/bin/tcsh, along with /bin/sh and
/bin/csh.

All seemed fine, so I used the chsh command and changed root's shell to
/usr/local/bin/tcsh. I logged out and tried to log back in, only to find
that I couldn't. Every time I tried to log in, I only got
the statement: /usr/local/bin/tcsh: permission denied!

I instantly realized what I had done. I forgot to check that tcsh had
execute permission, and now I couldn't get in as root!

After about 30 minutes of getting mad at myself, I finally figured out to just
bring the system down to single-user mode, which ONLY uses the /bin/sh,
thankfully, and edited the password file back to /bin/sh.

I'll never do that again. This wasn't that much of a horror story, but good
enough if you aren't that familiar with the system.
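
[Editor's sketch] A check along these lines before running chsh would
probably have caught the problem. It is only a sketch, not something from
the original post: the helper name shell_is_usable is made up, and it
assumes a POSIX sh and an /etc/shells-style file with one path per line.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the candidate shell is an executable
# regular file AND is listed in the shells file (default: /etc/shells).
shell_is_usable() {
    candidate=$1
    shells_file=${2:-/etc/shells}

    # The file must exist and carry execute permission.
    if [ ! -f "$candidate" ] || [ ! -x "$candidate" ]; then
        echo "$candidate: missing or not executable" >&2
        return 1
    fi

    # It must appear as a whole line in the shells file.
    if ! grep -Fqx -- "$candidate" "$shells_file" 2>/dev/null; then
        echo "$candidate: not listed in $shells_file" >&2
        return 1
    fi
    return 0
}

# Example use, commented out so nothing changes by accident. The "-s"
# option is an assumption; chsh syntax differs between systems.
# shell_is_usable /usr/local/bin/tcsh && chsh -s /usr/local/bin/tcsh root
```

Keeping a second root session open until the new login has been tested
is the other cheap safeguard.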

John
--

===============================================================================
John Ellithorpe | Internet: jd...@maggie.mit.edu
Dept. of Physics, Rm 26-349 | Phone : (617) 253-3074 Office
Massachusetts Institute of Technology | (617) 253-3072 Lab
Cambridge, MA 02139 | (617) 236-4910 Home
===============================================================================

Arne Asplem

Oct 8, 1992, 5:35:50 AM

>>On 7 Oct 92 12:02:46 GMT, ar...@multix.no (Arne Asplem) said:

>> I'm looking for actual horror stories of what have gone wrong because
>> of bad system administration, as an early morning wakeup.

>> I'll summarise to the net if there is any interest.

There has been a lot of interest in creating a summary of the
system administration "horror stories" - but so far I've only got a few
stories, and not any really scary ones :-)

I guess companies and system administrators are afraid to talk about
their real mistakes, and what we see in the press and magazines is just
the tip of the iceberg.

I'll keep all references to companies and persons confidential if you want !

Dave Brillhart

Oct 8, 1992, 7:54:04 AM

Arne Asplem (ar...@multix.no) wrote:
: I'm the program chair for a one-day conference on Unix system
: administration in Oslo in 3 weeks, including topics like network
: management, system administration tools, integration, print/file-servers,
: security, etc.
:
: I'm looking for actual horror stories of what has gone wrong because
: of bad system administration, as an early morning wakeup.

We can laugh (almost) about it now, but...

Our operations group, a VMS group but trying to learn UNIX, was assigned
account administration. They were cleaning up a few unused accounts
like they do on VMS - backup and purge. When they came across the
account "sccs", which had never been accessed, away it went. The
"deleteuser" utility from DEC asks if you would like to delete all
the files in the account. Seems reasonable, huh?

Well, the home directory for "sccs" is "/". Enough said :-(
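
[Editor's sketch] An account-removal script can refuse to touch a home
directory that resolves to "/" or another system directory. The helper
name home_is_safe_to_delete and the list of protected directories are
assumptions for illustration; this is not how DEC's utility worked.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the given home directory exists and
# does not resolve to "/" or one of a few well-known system directories.
home_is_safe_to_delete() {
    home=$1
    if [ -z "$home" ]; then
        echo "empty home directory" >&2
        return 1
    fi

    # Resolve symlinks and spellings like "//" or "/." to one canonical path.
    resolved=$(cd "$home" 2>/dev/null && pwd -P) || {
        echo "cannot resolve '$home'" >&2
        return 1
    }

    # Illustrative list only; extend it for the local system.
    case $resolved in
        /|/bin|/dev|/etc|/sbin|/usr|/usr/bin|/var)
            echo "refusing to delete system directory '$resolved'" >&2
            return 1 ;;
    esac
    return 0
}

# Example use (looks up field 6 of the passwd entry for account "sccs"):
# home=$(awk -F: -v u=sccs '$1 == u { print $6 }' /etc/passwd)
# home_is_safe_to_delete "$home" && echo "ok to remove $home"
```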

--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Dave Brillhart Harris Semiconductor
dbri...@dave.mis.semi.harris.com Mail Stop 62A-024
Voice: (407) 729-5430 P.O. Box 883
Fax: (407) 724-7486 Melbourne, FL 32902-0883
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Tim Smith

Oct 9, 1992, 6:04:44 AM

I was working on a line printer spooler, which lived in /etc. I wanted
to remove it, and so issued the command "rm /etc/lpspl." There was only
one problem. Out of habit, I typed "passwd" after "/etc/" and removed
the password file. Oops.

I called up the person who handled backups, and he restored the password
file.

A couple of days later, I did it again! This time, after he restored it,
he made a link, /etc/safe_from_tim.

About a week later, I overwrote /etc/passwd, rather than removing it.

After he restored it again, he installed a daemon that kept a copy of
/etc/passwd, on another file system, and automatically restored it if
it appeared to have been damaged.

Fortunately, I finished my work on /etc/lpspl around this time, so we
didn't have to see if I could find a way to wipe out a couple of
filesystems...

--Tim Smith

Paul Davey

Oct 9, 1992, 2:11:30 PM

How about the following (true) story which I used to tell system
administration trainees.

A friend of mine went to upgrade the Unix on a customer's machine.

Before he arrived one of the users decided to back up all the
application data to save time.

Before doing this he decided to delete an application specific
directory called ``&saved&'' which contained old and unwanted files.

So: The user typed (as root) ``rm -rf /&saved/&''

(He knew you had to escape the ``&''s.)
(You know what he did wrong I hope.)

When my friend arrived the whole filesystem was (needless to say) empty.
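
[Editor's sketch] For anyone who did not spot it: the escape character is
the backslash, not the slash. Unescaped, each "&" ends a command and runs
it in the background, so the line was read as "rm -rf /" followed by an
attempt to run "saved/". The sketch below shows two safe spellings,
using a throwaway directory (the names scratch and keep are made up).

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is at risk.
scratch=$(mktemp -d)
mkdir "$scratch/&saved&" "$scratch/keep"

# Backslashes keep "&" from being read as the background operator.
rm -rf "$scratch"/\&saved\&

# Quoting does the same job and is harder to get wrong.
mkdir "$scratch/&saved&"
rm -rf "$scratch/&saved&"

# Only the untouched directory remains.
ls "$scratch"
```

Running the intended command with "echo" in front of it first is another
cheap way to see what the shell will really do.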

Oh well says he, let's restore the backups...

(Yes they had made backups with cpio regularly.)

The first physical tape of the backup restored OK, but subsequent
volumes would not read.

It turned out that for 2 years the backup script had been giving
messages of the following form as each tape was filled:

Write error on archive
Reached end of medium on /dev/tape
If you want to go on, type device/file name when ready.

Cpio had failed to write the volume header on all but the first tape.

Morals:
1) Unix is unforgiving if you make a mistake.
2) Don't ignore any sort of error message
3) Check that your backups read back (at least occasionally)
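
[Editor's sketch] Moral 3 can be automated: after writing an archive,
extract it into a scratch area and compare it with the source. The
helper name verify_archive and the example paths are assumptions. The
sketch uses tar with the -C option (supported by GNU tar and bsdtar)
rather than the cpio of the story.

```shell
#!/bin/sh
# Hypothetical helper: prove that an archive of directory SRC reads back
# by extracting it into a scratch directory and comparing file by file.
verify_archive() {
    src=$1
    archive=$2
    name=$(basename "$src")
    check=$(mktemp -d) || return 1

    if tar -xf "$archive" -C "$check" 2>/dev/null &&
       diff -r "$src" "$check/$name" >/dev/null 2>&1
    then rc=0
    else rc=1
    fi

    rm -rf "$check"
    return "$rc"
}

# Example use (paths are made up):
# tar -cf /backups/appl.tar -C /u appl &&
#     verify_archive /u/appl /backups/appl.tar ||
#     echo "BACKUP DID NOT READ BACK" >&2
```

A full restore test needs as much scratch space as the data itself, so
on a tight system a periodic spot check of a few directories may be the
realistic compromise.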

Happy Ending:

My friend was able to dd in the backup tapes and recreate the
volume header from the actual data with a specially written C program.

--
Regards, IXI Internet: p...@x.co.uk
Paul Davey Vision Park UUCP: p...@ixi.uucp
Cambridge Bang: ...!uunet!ixi!pd
"These are interesting times" CB4 4ZR, UK Tel: +44 223 236 555

Philip Enteles

Oct 10, 1992, 2:09:19 AM

As a new system administrator of a Unix machine with limited space I
thought I was doing myself a favor by keeping things neat and clean. One
day as I was 'cleaning up' I removed a file called 'bzero'. Strange
things started to happen: vi didn't work, then the complaints started
coming in. Mail didn't work. The compilers didn't work. About this time
the REAL system administrator poked his head in and asked what I had
done. Further examination showed that bzero is the zeroed memory without
which the OS had no operating space so anything using temporary memory
was non-functional. The repair? Well things are tough to do when most of
the utilities don't work. Eventually the REAL system administrator took
the system to single user and rebuilt the system including full
restores from a tape system. The Moral is don't be too anal about things
you don't understand. Take the time to learn what those strange files are
before removing them and screwing yourself.

Philip Enteles Network Administrator
phi...@haas.berkeley.edu University of California, Berkeley
510-642-4436 "Shake your hand,,,,Shake your hand"

Bill Roberts

Oct 9, 1992, 9:04:12 PM

My most interesting experience in this regard was when I deleted "/dev/null". Of
course it was soon recreated as a "regular file", then permission problems
started to show up.

I was new at the game at the time and couldn't figure out what happened!
It looked good to me. I didn't know about "special files" and "mknod" and
major and minor device codes. A friend finally helped out and started
laughing and put me on the right track. That one episode taught me a
lot about my system.
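
[Editor's sketch] The difference is easy to test for: a healthy /dev/null
is a character special file, not a regular file. The helper name
is_char_device is made up, and the mknod line in the comment uses the
Linux device numbers, which do not apply to other systems.

```shell
#!/bin/sh
# Hypothetical check: succeed if the path is a character special file.
is_char_device() {
    [ -c "$1" ]
}

if is_char_device /dev/null; then
    echo "/dev/null looks fine"
else
    echo "/dev/null is NOT a character device" >&2
    # On Linux the usual repair, as root, is:
    #   rm -f /dev/null && mknod -m 666 /dev/null c 1 3
    # Major and minor numbers differ elsewhere; check the system's docs.
fi
```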
--
| Bill Roberts, Va Beach VA | In the field of observation, chance |
| brob...@waggen.twuug.com | favors the prepared mind. - Pasteur |

Frank T Lofaro

Oct 10, 1992, 3:20:39 PM

Well one time I was installing a minimal base system of Linux on a
friend's PC, so that we would have all the necessary utilities to bring
over the rest of the stuff. His 3 1/2 inch disk was dead, so we had to
get the 5 1/4 inch version of the boot/root disk. Too bad that version,
having to fit in 1.2M instead of 1.44, didn't have tar. We could get a
version of tar, but it was in a tar file (nice chicken and egg
scenario). I said, okay, since we don't have tar, we can't use that to
copy the files from floppy to the hard disk, I'll use cp instead (bad
move). It actually seemed to work for a while, then the machine
rebooted! I did it again, the same thing happened. Then I realized cp
wouldn't work on device files! (this is what happens when you try to
install un*x at 3 AM). It just read the contents of the device and made
a file containing such, which is undesirable in any event. (when it
read /dev/port, the device file that references I/O ports, it must've
done something to reboot the machine, that was the file that was causing
the reboots).

I finally got it working by having him get the tar archive of the
linux binaries (including the tar we needed), and untarring it on one of
the public decstations here, so we could ftp tar to his PC using his dos
tcp/ip stuff. A funny aside was that it untarred into ~/bin, and
superseded all his normal commands. We were wondering why everything
wouldn't run. Luckily it wasn't too hard to fix after we realized what
happened.

Mitch Wright

Oct 11, 1992, 6:02:32 PM

I guess I should add a story (or maybe not). Anyway, a fellow sysadmin
was looking to free up some much needed disk space. Since it was purely
a production machine I suggested that he go through and "strip" his binaries.
Unfortunately I made the assumption that he knew what strip does and would
use it wisely -- flashes of the Bad News Bears come to mind now.
To make it short, he stripped /vmunix which didn't destroy the system, but
certainly caused some interesting problems.

~mitch

Marc Fraioli

Oct 11, 1992, 5:01:41 PM

Well, here's a good one for you:

I was happily churning along developing something on a Sun workstation,
and was getting a number of annoying permission denieds from trying to
write into a directory hierarchy that I didn't own. Getting tired of
that, I decided to set the permissions on that subtree to 777 while I
was working, so I wouldn't have to worry about it. Someone had recently
told me that rather than using plain "su", it was good to use "su -",
but the implications had not yet sunk in. (You can probably see where
this is going already, but I'll go to the bitter end.) Anyway, I cd'd
to where I wanted to be, the top of my subtree, and did su -. Then I
did chmod -R 777. I then started to wonder why it was taking so damn
long when there were only about 45 files in 20 directories under where I
(thought) I was. Well, needless to say, su - simulates a real login,
and had put me into root's home directory, /, so I was proceeding to set
file permissions for the whole system to wide open. I aborted it before
it finished, realizing that something was wrong, but this took quite a
while to straighten out.
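
[Editor's sketch] The trap is a recursive command that relies on the
current directory after "su -" has silently changed it. One defence is a
wrapper that insists on an explicit target, refuses "/", and prints what
it is about to touch. The name chmod_tree is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical wrapper: recursive chmod that needs an explicit directory
# and refuses to run on the root directory.
chmod_tree() {
    mode=$1
    target=$2
    if [ -z "$mode" ] || [ -z "$target" ]; then
        echo "usage: chmod_tree MODE DIRECTORY" >&2
        return 2
    fi

    # Resolve the target so that "." or a symlink cannot hide where we are.
    resolved=$(cd "$target" 2>/dev/null && pwd -P) || {
        echo "chmod_tree: cannot enter '$target'" >&2
        return 1
    }
    if [ "$resolved" = / ]; then
        echo "chmod_tree: refusing to run on /" >&2
        return 1
    fi

    echo "chmod -R $mode $resolved"     # show exactly what will be touched
    chmod -R "$mode" "$resolved"
}
```

Typing "pwd" after every su costs a second and catches the same mistake.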
--
Marc Fraioli
mfra...@grebyn.com (So I'm a minimalist...)

Richard H. E. Eiger

Oct 11, 1992, 8:53:10 AM

In article <1992Oct9.1...@u.washington.edu> t...@stein.u.washington.edu
(Tim Smith) writes:
> I was working on a line printer spooler, which lived in /etc. I wanted
> to remove it, and so issued the command "rm /etc/lpspl." There was only
> one problem. Out of habit, I typed "passwd" after "/etc/" and removed
> the password file. Oops.
>

[deleted to save space]
>
> --Tim Smith

Here's another story. Just imagine having the sendmail.cf file in /etc. Now, I
was working on the sendmail stuff and had come up with lots of sendmail.cf.xxx
which I wanted to get rid of so I typed "rm -f sendmail.cf. *". At first I was
surprised about how much time it took to remove some 10 files or so. By the time
I finally saw what had happened, hitting the interrupt key was way too late,
though.

Fortune has it that I'm a very lazy person. That's why I never bothered to just
back up directories with data that changes often. Therefore I managed to
restore /etc successfully before rebooting... :-) Happy end, after all. Of
course I had lost the only well working version of my sendmail.cf...
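
[Editor's sketch] The stray space turned one pattern into two arguments:
the literal name "sendmail.cf." and "*", which matches everything.
Expanding the arguments with a harmless command first shows the damage
before it happens. The helper name preview_args is made up.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line and delete
# nothing. The shell expands any glob before the function is called,
# exactly as it would for rm.
preview_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Intended:  preview_args sendmail.cf.*    lists only the scratch copies
# Typo:      preview_args sendmail.cf. *   lists "sendmail.cf." plus
#                                          every file in the directory
```

Using "rm -i" for anything involving a wildcard in /etc gives a similar
second chance.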

Richard
--

___ _____2
/ ) / / / Richard H. E. Eiger
/_ _/ /__/ /__ Ing. Informatik HTL
/\ / / / Unregistered NeXT fan
/ \ / / /_____ rhe...@renext.open.ch (NeXT mail welcome)

Jerry Rocteur

Oct 11, 1992, 6:18:10 PM

Horror story,

I sent one of my support guys to do an Oracle update in Madrid.

As instructed he created a new user called esf and changed the files
in /u/appl to owner esf, however in doing so he *must* have cocked up
his find command, the command was:

find /u/appl -user appl -exec chown esf {} \;

He rang me up to tell me there was a problem. I logged in via x25 and
found that about 75% of the files on the system belonged to owner esf.

VERY little worked on the system.

What a mess, it took me a while and I came up with a brain wave to
fix it but it really screwed up the system.

Moral: be *very* careful of find execs, get the syntax right!!!!
--
__^__ __^__
( ___ )------------------------------------------------------------( ___ )
| / | Jerry Rocteur Email: je...@InCC.COM | \ |
|___| Independent Computer Consultants division DELDRA s.p.r.l. |___|
(_____)------------------------------------------------------------(_____)
^ ^

Eiji Hirai

Oct 11, 1992, 3:27:50 PM

Some of these are stories of pure stupidity rather than of interesting horror,
but they did happen.

[ BTW, these happened at a different place at a different time than where I
am now. Don't bother my current employer about it. ]

(1) A consultant we had hired (and not a very good one) was installing Unix
on one of our workstations. He was mucking with creating and deleting
/dev/tty* files and made /dev/tty a regular file. Weird things started to
happen. Commands would only print their output if you pressed return twice,
etc. Fortunately, we solved the problem by re-mknod-ing /dev/tty. However,
it took a while to realize what was causing this problem.

(2) I wanted to create a second swap partition on another disk and made the
partition start at sector 0 of the disk! (which sounded ok at the time since
all other regular 'a' partitions started on sector 0) Every time I rebooted,
fsck would complain about missing partition tables - I initially suspected
that the disk was bad but I later realized that swapping was overwriting the
partition table. I had lost an unknown percentage of the financial data for
the institution that I was working for at the time, right when they were
being audited! Yikes! Anyway, we were able to recover the data and life
returned to normal but I did wonder at the time whether I could still keep
my job there.

(3) At the same institution, we were running system software that had a
serious bug where if anyone had logged out ungracefully, the system wouldn't
let any more users onto the system and users who were logged on couldn't
execute any new commands. (The newest release of the software later on did
fix this bug.) I had to reboot the machine to restore the system to a sane
state. I did a wall <<EOF We need to shutdown blah blah... EOF and then
shutdown. Well, I should've waited since at that precise moment, one of our
users was doing a once-a-year massive conversion of our financial data (talk
about bad luck). I had shut down in the middle of a very long disk write and
thus, data was lost. We did recover that data and life went on. Moral:
make damn sure that *no one* is doing anything on your system before you
reboot, even if other users are vociferously clamoring for you to reboot.

(4) I heard this from a fellow sysadmin friend. My friend was forced to
work with some sysadmins who didn't have their act together. One day, one
of them was "cleaning" the filesystem and saw a file called "vmunix" in /.
"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".

My friend had to reinstall the entire OS on that machine after his coworker
did this "cleanup". Ahh, the hazards of working with sysadmins who really
shouldn't be sysadmins in the first place.

Moral of all these stories: if I had to hire a Unix sysadmin, the first
thing I'd look for is experience. NOTHING can substitute for down-to-earth,
real-life grungy experience in this field.

--
hi...@cc.swarthmore.edu (Eiji Hirai) : : : : : :: ::: :::: :::::
Unix Geek for Swarthmore College : : : : : :: ::: :::: :::::
Information Services, Swarthmore, PA, US. Copyright 1992 by Eiji Hirai.
I don't speak for Swarthmore College. All Rights Reserved.

Ken Weaverling

Oct 11, 1992, 7:34:22 PM

A friend of mine called me up saying he no longer could log into his
system. I asked him what he had done recently, and found out that he
thought that all executable programs in /bin /usr/bin /etc and so on
should be owned by bin, since they were all binaries! So he had
chown'ed them all.

--
Ken Weaverling (Delaware Tech College) we...@dtcc.edu -or- we...@bach.udel.edu

"Is it too late to change the way we're bound to go? Surely one of us must know"
-- Sandy Denny, 1948-1978

Lars Peter Fischer

Oct 11, 1992, 8:40:45 PM

>>>>> "Jerry" == Jerry Rocteur (je...@incc.com)

Jerry> Moral: be *very* careful of find execs, get the syntax right!!!!

Yup. Use "find <whatever> -print" and look at what you get. If it
seems reasonable, add " | xargs <whatever>" at the end. -exec is slow
anyway.
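
[Editor's sketch] The two-step habit described here might look like the
following. The helper name list_owned_by is made up; the path and the
"appl" and "esf" names come from Jerry's story. The second step is left
as a comment because it needs root and real accounts.

```shell
#!/bin/sh
# Hypothetical step 1: only list what would be affected, change nothing.
list_owned_by() {
    # $1 = directory tree to search, $2 = current owner (name or numeric id)
    find "$1" -user "$2" -print
}

# Step 1: look before you leap.
#   list_owned_by /u/appl appl | more
#
# Step 2: only if the list looked right (as root):
#   find /u/appl -user appl -exec chown esf {} +
#
# "-exec ... {} +" batches file names much as xargs does, and it copes
# with spaces in file names, which a plain "| xargs" does not.
```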

/Lars
--
Lars Fischer, fis...@iesd.auc.dk | It takes an uncommon mind to think of
CS Dept., Aalborg Univ., DENMARK. | these things. -- Calvin

Randy Jarrett

Oct 11, 1992, 11:28:26 PM

Here's one that will show that you shouldn't work on a system
that you don't thoroughly understand.

At my "previous" employer I was instructed to install a new
(larger) disk drive in an RS/6000 system. Since a full backup
of the system was done the previous day I just looked at the file
systems via df to see which were on the drive that I was replacing.
After this I did a tape backup of these filesystems, ran smit and
did a remove of these filesystems. I then installed the new disk
and brought the system back up. When I ran smit and was able
to do the installation of the new drive and set up the file systems,
I figured that this was going to be an easy one. WRONG!! I was
aware that you could expand filesystems under AIX but was not aware
that it would expand them 'across physical drives'!!! I first
realized that I was in trouble when I went to read in the backup tape
and cpio was not found. I did an ls of the /usr/bin directory and it
said that the file was there but when I tried to run it it was not
found. And of course when I went looking for the original install tape
it was not to be found....

Randy

--
Randy Jarrett WA4MEI
UUCP ...!{emory,gatech}!wa4mei!rsj | MAIL: 54 Patterson Rd.
PHONE +1 404 822 4073 | Lawrenceville, GA 30244

Tim Smith

Oct 12, 1992, 3:12:42 AM

>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>work with some sysadmins who didn't have their act together. One day, one
>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>
>My friend had to reinstall the entire OS on that machine after his coworker
>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>shouldn't be sysadmins in the first place.

This is why God invented chroot. Have everything one level down, and arrange
for login to chroot everyone. Have one link out to the real root in some
place that only competent sysadmins know about. Don't tell the incompetent
ones about this. Furthermore, don't tell them that they are really running
chroot'ed down one level.

Alternatively, if you are on a system that can boot a kernel from a
subdirectory, put vmunix in /usr/vmunix. Then, assuming that something
gets mounted on /usr when you go multiuser, vmunix will be safe (unless
this is a System V Unix from before they fixed the namei bug that let
you, under the right conditions, get to directories that had things
mounted on top of them...).

--Tim Smith

David J Stevenson

Oct 12, 1992, 4:09:44 AM

In <W1NR...@cc.swarthmore.edu> hi...@cc.swarthmore.edu (Eiji Hirai) writes:
>...[some deleted]

>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>work with some sysadmins who didn't have their act together. One day, one
>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".

>My friend had to reinstall the entire OS on that machine after his coworker
>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>shouldn't be sysadmins in the first place.

When this happened to a colleague (when I worked somewhere else) he restored
vmunix by copying from another machine. Unfortunately, a 68000 kernel does
not run very well on a Sparc...

--
+---------------------------------------------------+
| David Stevenson d...@jet.uk Tel: +44 235 465028 |
+---------------------------------------------------+
- Disclaimer: Please note that the above is a personal view and should not
be construed as an official comment from the JET project.

Steven Tepper

Oct 12, 1992, 12:17:13 AM

This may not exactly fit the "administration horror story" category, but...

At one place where I worked, someone had set up cron to delete any
file named "core" more than a few days old, since disk space was
always tight and most users wouldn't know what core files were or care
about them. Unfortunately not everyone knew about this and one user
lost a plain text file (a project proposal) he'd spent a lot of
time working on because he called it "core". This was around 1976,
when Unix was still considered exotic and before bookstores carried
entire sections of Unix books.
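
[Editor's sketch] A cleanup job can look inside a file instead of
trusting its name. This sketch assumes core dumps in ELF format, as on
current Linux and most current Unix systems; that was certainly not the
format in 1976. The helper name looks_like_elf, the /home path and the
3-day age are all assumptions.

```shell
#!/bin/sh
# Hypothetical check: does the file start with the ELF magic number
# (0x7f 'E' 'L' 'F')? A project proposal that happens to be named
# "core" does not.
looks_like_elf() {
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d '[:space:]')
    [ "$magic" = 7f454c46 ]
}

# Possible cron job body:
#   find /home -type f -name core -mtime +3 -print |
#   while IFS= read -r f; do
#       looks_like_elf "$f" && rm -f "$f"
#   done
```

Telling the users about the job is still the cheaper fix.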

-greep

Steve McKinty - Sun ICNC

Oct 12, 1992, 4:22:29 AM

In article <W1NR...@cc.swarthmore.edu>, hi...@cc.swarthmore.edu (Eiji Hirai) writes:

> (4) I heard this from a fellow sysadmin friend. My friend was forced to
> work with some sysadmins who didn't have their act together. One day, one
> of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
> "Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>
> My friend had to reinstall the entire OS on that machine after his coworker
> did this "cleanup". Ahh, the hazards of working with sysadmins who really
> shouldn't be sysadmins in the first place.

Hmm. A colleague of mine did much the same by accident on one of
our test machines. After discovering it, fortunately while the machine
was still up & running, he FTPed a copy of /vmunix from the other lab
system (both running exactly the same kernel).

After rebooting his machine everything (to his relief) worked fine.

--
Steve McKinty
SUN Microsystems ICNC
38240 Meylan, France
email: smck...@france.sun.com BIX: smckinty

Peter da Silva

Oct 12, 1992, 7:54:15 AM

Well, we had one system on which you couldn't log in on the console for a
while after rebooting, but it'd start working sometimes. What was happening
was that the manufacturer had, for some idiot reason, hardcoded the names
of the terminals they wanted to support into getty (this manufacturer's own
terminals, that I can understand, but also a handful of common types like
adm3a) so getty could clear the screen properly (I guess hacking that into
gettydefs was too obvious or something). If getty couldn't recognise the
terminal type on the command line, it'd display a message on the console
reading "Unknown terminal type pc100". We ignored this flamage, which was
a pity. Cos that was the problem.

It did this *before* opening the terminal, so if it happened to run between
the time rc completed and the getty on the console started the console got
attached to some random terminal somewhere, so when login attempted to open
/dev/tty to prompt for a password it failed.

Moral: always deal with error messages even when you *know* they're bogus.
Moral: never cry wolf.
--
Peter da Silva. <pe...@sugar.neosoft.com>.
`-_-' "Segodnja volka obnimal?"
'U`
Dette kan umulig vaere mitt rom, eftersom jeg ikke puster ammoniakk.

Anselm Lingnau

Oct 12, 1992, 5:02:13 AM

In article <1992Oct10....@waggen.twuug.com>, brob...@waggen.twuug.com
(Bill Roberts) writes:

> My most interesting in the reguard was when I deleted "/dev/null". Of
> course it was soon recreated as a "regular file", then permission problems
> started to show up.

Years ago when I was working in the Graphics Workshop at Edinburgh University,
we used to have a small UNIX machine for testing. The machine wasn't used too
much, so nobody bothered to set up user accounts, and so everybody was running
as root all the time. Now one of the chaps who used to come in was fond of
reading fortunes (/usr/games/fortune having been removed from the University's
real machines along with all the other games). Guess what happened when the
machine said

# fortune
fortune: write error on /dev/null --- please empty the bit bucket

Quite a lot of stuff wouldn't work after the chap was done with the machine
for the day. You bet we put up proper accounts after that!

Anselm
--
Anselm Lingnau .................................. lin...@math.uni-frankfurt.de
[Sendmail] can do just about anything. Its main problem is that it can do just
about anything. --- Chris Lewis, *UNIX Email Software Survey FAQ*

Rick Furniss

Oct 12, 1992, 10:46:33 AM

Horror stories:
Did this myself many years ago, and have come close to it since.

Murphy's law #?? , preventive maintenance doesn't.

try this one: /etc/dump /dev/rmt/0m /dev/dsk/0s1
Or: tar cvf /dev/root /dev/rmt0

Backup commands can be among the most dangerous commands used on Unix,
even though they are meant to prevent problems rather than cause them.
If any Unix utility were a candidate for a warning message, or error
checking, this would be it.

Just in case you didn't catch the HORROR above, the parameters are backwards,
causing a TOTAL wipe out of the root file systems.

More systems have been wiped out by admins than any hacker could do in
a lifetime.

***** standard DISKclamer *****
personal views of my person only

CPSMEL/IA
C210 N3877Y
ri...@pmafire.inel.gov
ri...@servprod.inel.gov


Rich Payne

Oct 12, 1992, 10:35:45 AM

In article <1992Oct12.0...@jet.uk> d...@jet.uk (David J Stevenson) writes:
>In <W1NR...@cc.swarthmore.edu> hi...@cc.swarthmore.edu (Eiji Hirai) writes:
>>...[some deleted]
>>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>>work with some sysadmins who didn't have their act together. One day, one
>>of them was "cleaning" the filesytem and saw a file called "vmunix" in /.
>>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>
>>My friend had to reinstall the entire OS on that machine after his coworker
>>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>>shouldn't be sysadmins in the first place.
>When this happened to a colleague (when I worked somewhere else) he restored
>vmunix by copying from another machine. Unfortunately, a 68000 kernel does
>not run very well on a Sparc...

If it was a Sparc and still running, could you not have re-compiled the
kernel and copied it back to root?

>--
>+---------------------------------------------------+
>| David Stevenson d...@jet.uk Tel: +44 235 465028 |
>+---------------------------------------------------+
>- Disclaimer: Please note that the above is a personal view and should not
> be construed as an official comment from the JET project.

Rich

pay...@netcom.com

Gary Fowler

Oct 12, 1992, 10:38:09 AM

Once I was going to make a new file system using mkfs. The device I wanted to
make it on was /dev/c0d1s8. The device name that I used, however, was
/dev/c0d0s8 which held a very important application. I had always been a little
annoyed by the 10 second wait that mkfs has before it actually makes the file
system. I'm sure glad it waited that time though. I probably waited 9.9
seconds before I realized my mistake and hit that DEL key just in time. That
was a near disaster avoided.

Another time I wasn't so lucky. I was a very new SA, and I was trying to clean
some junk out of a system. I was in /usr/bin when I noticed a sub directory
that didn't belong there. A former SA had put it there. I did an ls on it and
determined that it could be zapped. Forgetting that I was still in /usr/bin, I
did an rm *. No 10 second idiot proofing with rm. Now if someone would only
create an OS with a "Do what I mean, not what I say" feature.

Gary "Experience is what allows you to recognize a mistake the second time you
make it." Fowler

Bill Broadley

Oct 12, 1992, 11:13:11 AM

On an old decstation 3100 I was deleting last semester's users to try to
dig up some disk space, I also deleted some test users at the same time.

One user took longer than usual, so I hit control-c and tried ls.
"ls: command not found"

Turns out that the test user had / as the home directory and the remove
user script in ultrix just happily blew away the whole disk.

ftp, telnet, rcp, rsh, etc were all gone. Had to go to tapes, and had
one LONG rebuild of X11R5.

Fortunately it wasn't our primary system, and I'm only a student....

--
Bill 1st> Broa...@neurocog.lrdc.pitt.edu
Broa...@schneider3.lrdc.pitt.edu <2nd 3rd> Broa...@pitt.edu
Novell, AFS just say NO!

John Kochmar

Oct 12, 1992, 11:33:11 AM

A long time ago, back when the Apollo 460 was around and I had just
graduated from college, I had the good fortune of being one of two
administrators in charge of making a cluster of 460's a part of our
environment. One of the things I was tasked with was getting them onto
our network.

Well, I was young, I had the manuals, and a guy from Apollo tech
support was there to help. How hard could it be, right?

Well, we got out the manuals, configured the system (relying heavily on
the defaults), and within 2 hours, we had that puppy on the network.
Life was good.

About 3 hours later, I get a phone call from a systems programmer /
developer from CMU campus (the SEI is a part of CMU, and we are on their
network.) He told me that if I didn't take the &%@*ing Apollo off the
network, he was going to do hurtful things to me physically.
Life was not so good.

As it turned out, in default mode, the Apollo answered every address
request it saw, even if it was not the machine the request was for.
Kind of a "hey, I'm not who you are looking for, but I'm out here in
case you decide you'd rather talk to me." Apollo considered this a
feature, and they took advantage of it in their OS environment.

However, one of the earlier versions of a heavily network dependent OS
developed at CMU considered this a bug. The OS would issue a request,
and expect only the machine it was looking for to answer it. Of
course, it would assume that if it got an answer to its request, it
must be the machine it expected to talk to. It didn't look at the
address of the answer it got, so if it wasn't the correct machine, most
of the time the OS would hang or panic.

The outcome? Over about 3 hours time, more and more of campus was
talking to our little 460, which had just enough muscle to keep up with
the requests. By the time campus figured out what was going on, we had
an Apollo merrily answering the network requests for hundreds of
machines (the ones that were still up, that is.) The result was the part
of campus that used the new OS going to hell in a bucket, one very busy
Apollo 460, and one very warm ethernet.

Well, we turned off the Apollo, configured it not to chat to all of
campus before putting it back on the ethernet (this time, we did it
while talking with campus, making sure we didn't cause the same
problems we did the last time -- we didn't have a packet monitor at the
time), and campus changed their OS to look at the request response
before assuming it was the correct one. I also learned to think very
carefully about default values before using them.

John
Manager, Systems and Tools admin
SEI Computing Facilities

-----------------------------------------------------------------------------
John Kochmar | Estimated amount of glucose used by an adult human
koc...@sei.cmu.edu | brain each day, expressed in M&Ms: 250
SEI Computing Facilities | -Harper's Index, October 1989

David J Dawkins

Oct 12, 1992, 10:37:23 AM

we...@bach.udel.edu (Ken Weaverling) writes:

>A friend of mine called me up saying he no longer could log into his
>system. I asked him what he had done recently, and found out that he
>thought that all executable programs in /bin /usr/bin /etc and so on
>should be owned by bin, since they were all binaries! So he had
>chown'ed them all.

Oh you bastards. I was hoping that a thread like this would never
appear, because if it did, I knew I would have to confess. Oh well...

About a year back, I was looking through /etc and found that a few
system files had world write permission. Gasping with horror, I went
to put it right with something like

dipshit# chmod -r 664 /etc/*

(I know, I know, goddamnit!.. now)

Everything was OK for about two to three weeks, then the machine went
down for some reason (other than the obvious). Well, I expect that you
can imagine the result. The booting procedure was unable to run fsck,
so barfed and mounted the file systems read-only, and bunged me into
single-user mode. Dumb expression..gradual realisation..cold sweat. Of
course, now I can't do a frigging chmod +x on anything because it's all
read-only. In fact I can't run anything that isn't part of sh.
Wedgerama. Hysteria time. Consider reformatting disks. All sorts of
crap ideas. Headless chicken scene. Confession.

"You did WHAT??!!"

Much forehead slapping, solemn oaths and floor pacing.

Luckily, we have a local MegaUnixGenius who, having sat puzzled for an hour
or more, decided to boot from a cdrom and take things from there. He fixed
it.

My boss, totally amazed at the fix I'd got the system into, luckily
saw the funny side of it. I didn't. Even though at that stage, I didn't
know much about unix/suns/booting/admin, I did actually know enough to NOT
use a command like the one above. Don't ask. Must be the drugs.
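
For comparison, a narrower fix can be sketched as follows, assuming a find that supports the POSIX `-perm -mode` test and the `-exec ... {} +` form. The function names are made up for illustration. The idea is to clear only the world-write bit on the files that actually have it, so execute bits are never touched.

```shell
# List regular files under a directory that are world-writable.
world_writable() {
    find "$1" -type f -perm -002 -print
}

# Remove only the world-write bit from those files; every other
# permission bit, including execute, is left as it was.
strip_world_write() {
    find "$1" -type f -perm -002 -exec chmod o-w {} +
}
```

Running world_writable first gives a list to read over before anything is changed.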

BTW, if my future employer _is_ reading this (like they say he/she might),
then I have certainly learned tonnes of stuff in the last year, especially
having had to set up a complete Sun system, fix local problems, etc :-)

Anyone else got a tale of SGS (Spontaneous Gross Stupidity) ?

-dave "I'm much better now, honest.. no, really.. hey what's this button
doooooooooOOOOOO..."

--
d...@csg.cs.rdg.ac.uk
d...@csug.cs.rdg.ac.uk
dav...@integ.uucp

Alan Saunders

Oct 12, 1992, 2:28:27 AM

About inexperienced sysadmins .. One such had been on a Sun sysadmin
course, and learned all about security. One of the topics was on file
and group access. On his return, he decided to put what he had learned
into practice, and changed the ownership of all files in /bin, /usr/bin
to bin.bin! I was called in when no one could log in to the system
(of course /bin/login needs to be setuid root!)

Regards .. Alan
--

* Meeeow ! Call Spuddy on (0203) 638780/638693 for FREE mail & Usenet access *

Mike Kelley

Oct 12, 1992, 12:41:50 PM

Sometimes you just can't win . . .

We have a cluster of HP workstations and, once upon a time, were using
1/4-inch tape as the backup medium. This was very slow and cumbersome, as
we were forever increasing the amount of disk space on our system, and
we decided to purchase HP's optical jukebox to use both as large
removable media and as the primary backup device.

We had been experiencing occasional problems with the 1/4-inch tape
backups, but HP's hardware service engineer convinced us that the
problems were resolved. A complete backup was performed prior to
installation (by the HP engineer) of the jukebox. Two unfortunate
things happened. First, the problems on our backup tapes were due to
intermittent hardware problems on the tape drive which were not
discovered by the extensive diagnostics performed on the tape drive.
Second, the engineer installed the jukebox with the same hardware SCSI
address as our root file system.

As you may have anticipated, the attempt to mediainit the first
optical cartridge resulted in a rather ungraceful failure of the root
file system. This was compounded by the fact that much of the data on
the backup tapes was not recoverable.

--
Mike Kelley
Bldg 220, Rm B206
National Institute of Standards and Technology
Gaithersburg, MD 20899
(301) 975-3722 FAX (301) 926-2746
INTERNET: kel...@epg.nist.gov
BITNET: mke...@nbsenh.BITNET

Eric Wedaa

Oct 12, 1992, 12:43:34 PM

The moral(s) of the story here:
-NEVER use 'rm <any pattern>', use 'rm -i <any pattern>' instead.
-Do backups more often than you go to church.
-Read the backup media at least as often as you go to church.
-Set up your prompt to do a `pwd` every time you cd.
-Always do a `cd .` before doing anything.
-DOCUMENT all your changes to the system (We use a text file
called /Changes)
-Don't nuke stuff you are not sure about.
-Do major changes to the system on Saturday morning so you will
have all weekend to fix it.
-Have a shadow watching you when you do anything major.
-Don't do systems work on a Friday afternoon. (or any other time
when you are tired and not paying attention.)

>>>Ericw
(Paranoia is a "Good Thing" when you can really muck things up!)
--
Eric Wedaa - eric....@amd.com | Two more kinds of lies...
{ames apple uunet}!amd!ericw | Release Dates, and Benchmarks
Advanced Micro Devices, M/S 167 PO Box 3453 Sunnyvale, CA 94088-3453
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Rob J. Nauta

Oct 12, 1992, 11:27:52 AM

mfra...@grebyn.com (Marc Fraioli) writes:

>Well, here's a good one for you:

> I was happily churning along developing something on a Sun workstation,
>and was getting a number of annoying permission denieds from trying to
>write into a directory hierarchy that I didn't own. Getting tired of
>that, I decided to set the permissions on that subtree to 777 while I
>was working, so I wouldn't have to worry about it.

At my previous employer, the sysadmin would create new user accounts by
hand by editing the passwd file, create a home dir, put some files in
it, and chown '*' and '.*' to that new user. Thus, /home/machine
was also chowned ('.*' also matches '..'). It was quite handy to see
who was added last, but after a while I slipped him the hint to
chown '.[a-z]*' which works much better of course.
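
A small sketch of a pattern that is often suggested for this job, `.[!.]*`, which unlike `.[a-z]*` also catches names such as .Xdefaults. The helper name dotfiles is invented for the example. Whether a bare `.*` matches `.` and `..` depends on the shell, so the sketch avoids relying on it.

```shell
# Print the dotfiles in directory $1, one per line, skipping "." and "..".
# The pattern .[!.]* requires the second character to be something other
# than a dot, so it can never expand to "..".
dotfiles() {
    (
        cd "$1" || exit 1
        for f in .[!.]*; do
            # An unmatched pattern is left as literal text; skip it.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    )
}
```

One limitation to keep in mind: names that begin with two dots, such as "..old", are skipped by this pattern as well.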

But the stories told now are more folklore than real horror. Having read
2 Stephen Kings this weekend I beg everyone to tell more interesting
stories, about demons, the system clock running backwards, old files
reappearing etc !

Mike Matthews

Oct 12, 1992, 2:00:25 PM

When I had first gotten my NeXTstation, it had the lil' 105M hard drive in
it. I had a 330M external, but alas, no cable for it. (Life was not fun
when I was essentially netbooting off a "test" machine.... ".. um, guys, did
you just reboot is-next?")

Finally got the cable, just in time for the winter holiday (read: no
network). Brought the machine home, and I figured I'd just copy the
configuration files over from the internal to the external (as a nice gesture
to my users so they wouldn't have to change their passwords and everything).

The external was a brand new BuildDisk'd disk (had stock NeXTstep on it).
NeXT keeps the private information of each machine (/dev, /etc, stuff like
that) in a /private directory to make netbooting easier.

Hey, I'll just move /private from the 105M to /private on the external. So I
deleted the external's /private and tried to move it via the workspace.

/dev is in /private.

/dev contains device files. Can't move them.

BUT. The workspace happily deleted all the files it DID copy, so the
internal couldn't boot (no /etc) and the external couldn't boot (no /dev).
This is before the advent of boot floppies so I was stuck for about a week at
home with $5000 of NeXT computer that I couldn't boot.

The moral? *NEVER* move something important. Copy, VERIFY, and THEN delete.
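
The copy, verify, then delete order can be sketched like this. The function name safe_move is hypothetical, and the sketch only suits trees of ordinary files and directories: diff -r cannot meaningfully compare device nodes like the ones in /dev, which need a tool such as tar or cpio run as root.

```shell
# Move a directory tree in three separate steps, so that a failed copy
# never costs you the original.
safe_move() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "destination already exists: $dst" >&2
        return 1
    fi
    cp -Rp "$src" "$dst" || return 1            # 1. copy, keeping modes and times
    if ! diff -r "$src" "$dst" >/dev/null; then # 2. verify the copy matches
        echo "verification failed; source left untouched" >&2
        return 1
    fi
    rm -rf -- "$src"                            # 3. only now delete the original
}
```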
------
Mike Matthews, matt...@oberon.umd.edu (NeXTmail accepted)
------
There has been an alarming increase in the number of things you know
nothing about.

John F Carr

Oct 12, 1992, 1:19:39 PM

In article <djd.718900643@reading> d...@csug.cs.rdg.ac.uk writes:

>dipshit# chmod -r 664 /etc/*

>Everything was OK for about two to three weeks, then the machine went

>down for some reason (other than the obvious).

>The booting procedure was unable to run fsck,

Your OS must run /bin/init instead of /etc/init. If init isn't
executable the system will hang or panic and reboot forever.

--
John Carr (j...@athena.mit.edu)

Contractor Bob Johnson

Oct 12, 1992, 9:51:03 AM

>Arne Asplem (ar...@multix.no) wrote:
> I'm the program chair for a one day conference on Unix system
> administration in Oslo in 3 weeks, including topics like network
> management, system admininistration tools, integration, print/file-servers,
> securitym, etc.
>
> I'm looking for actual horror stories of what have gone wrong because
> of bad system administration, as an early morning wakeup.

Management told us to email a security notice to every user on our
system (at that time, around 3000 users). A certain novice administrator
on our system wanted to do it, so I instructed them to extract a list of
users from /etc/passwd, write a simple shell loop to do the job, and
throw it in the background. Here's what they wrote (bourne shell)...

for USER in `cat user.list`; do
mail $USER <message.text &
done

Have you ever seen a load average of over 300 ???
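
The trouble is the "&" inside the loop, which starts every mail process at once. A sketch of the same job with the sends done one after another, and only the loop as a whole put in the background, might look like this. MAILER and send_to_all are names invented for the example; MAILER exists so the loop can be tried with a harmless stand-in.

```shell
# Command used to deliver one message; defaults to mail, as in the story.
MAILER=${MAILER:-mail}

# Send the file $2 to each user named in the file $1, one at a time.
send_to_all() {
    list=$1
    message=$2
    while read -r user; do
        [ -n "$user" ] || continue          # skip blank lines
        "$MAILER" "$user" <"$message"       # no "&": wait for each send to finish
    done <"$list"
}

# Usage: a single background job instead of thousands.
#   send_to_all user.list message.text &
```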

Bob Johnson, Systems Administrator
Tinker AFB, Oklahoma

Contractor Bob Johnson

Oct 12, 1992, 10:51:34 AM

Another horror story (mine this time)...

Cleaning out an old directory, I did 'rm *', then noticed several files
that began with dot (.profile, etc) still there. So, in a fit of obtuse
brilliance, I typed...

rm -rf .* &

By the time I got it stopped, it had chewed through 3 filesystems which
all had to be restored from tape (.* expands to ../*, and the -r makes
it keep walking up the directory tree). Live and learn...

And another...

After changing my /etc/inittab file, I was going to kick init by sending
it a HUP signal to tell it the file had changed. Unfortunately, I missed
and the 1 became a Q... kill -q 1. Large systems die in interesting ways
when you lose init!

But the best (IMHO)...

We had an operator lay a book on the console keyboard, throwing the console
into system monitor mode. This stops the system clock, which locks every
session dead in its tracks. At that time we had over 100 user sessions
running. Most of our inbound lines are essentially modem lines on a very
large "rotor". After their session hung for a minute or so, many users
disconnected and called back. They got connected, but received no login
prompt (the system was in a sort of suspended animation). Little did they
know that they were now on a different port than the one they just abandoned.

A call to the computer room soon identified the problem, and the operator was
given the commands to resume normal system operation. As near as we can
figure, somewhere around half of the users had disconnected but the system
didn't notice because it never saw carrier drop on those ports (being dead).
New, different users had now connected to those ports. We received several
semi-confused user calls, realized what had happened and invoked the magic
"/etc/shutdown NOW" command. The procedure (should this ever happen again)
will be to manually panic the system and reboot. I also surgically removed
the keycap from that particular key on our terminal - you have to work to
press it now!

Attilio Dinicola

Oct 12, 1992, 1:33:16 PM

Once upon a time...
I was more'ing something at the system console, Ultrix OS under me!

I wanted to press a ^L and, unfortunately, the nearest ^P suspended
system activities: a console mode prompt appeared.

So, I pressed:
res
Thinking .. resume .. but res became restart and the system
rebooted destroying all processes.

Naturally, Murphy was in front of me and some batch jobs were
running since four or five days before. WERE .. RUNNING!

#############################################################

Just use abbreviated commands!
--
Attilio Dinicola, C.d.C. Fac. di Ingegn., loc. Mesiano, 77 38050 Trento, Italia
tel. 0461+881975 / 881919 fax. 0461+881999
Internet addr. dini...@itnux2.cineca.it (dini...@130.186.12.2)
Decnet addr. ITNING::DINICOLA (37.72::DINICOLA)

Steven Tepper

Oct 12, 1992, 1:28:31 PM

> But the stories told now are more folklore than real horror. Having read
> 2 Stephen Kings this weekend I beg everyone to tell more interesting
> stories, about demons, the system clock running backwards, old files
> reappearing etc !

I once had problems with files that mysteriously refused to stay
changed for very long. It was a PDP-11 Unix system that had crashed,
and I brought it up single-user. I would change some file and it
would stay changed for a minute or so but then revert to its earlier
state (contents, protection mode, etc). What happened was that the
write-protect switch on the disk drive had gotten bumped into the "on"
position but the device driver failed to report any write errors. As
long as the data stayed in kernel buffers the changes "took", but they
would disappear once the buffers were reused and the system had to
reread the disk.

-greep

B. Samuel Blanchard

Oct 12, 1992, 3:33:14 PM

#1 I never actually verified it but I think I deleted some of my
boss's files as a very novice sysadmin. He found some things missing
after I had a minor tangle with rm. When he asked I said I had run into
a problem and he smiled and let it go. Sorry Raul!

#2 I had a boss continue to reboot a dying system in an attempt to print
out material for his conference presentation. He was not interested
in waiting until I worked on the system; if he couldn't get it working,
he assumed I couldn't.
I quit :-(
Then he quit :-)
Then I spent a week fixing the system. :-0 <--words edited
Some things have improved there they tell me.
Disclaimer: This is purely my interpretation and not intended to offend.
It was my pre-assumption that you didn't read this group.

#3 Recently had someone recover an old full backup over a running system.
A manager 2 levels up noticed that our automatic backup, written by his
staff, was failing far too often. Even worse, it did not always report
errors. Since I was gone, he felt free to assign a manual backup to
another group. The guy doing the "backup" called a member of his group
at 8pm, that person finally called me at some ungodly hour in the
morning (I was glad he called!).

The best part was the end result. We now do backups in our group.
Don't you love how progress slaps you awake sometimes.
--
B. Sam Blanchard s...@bsu-cs.bsu.edu
418 Winfield Dr (317) 284-7131 work
Greenfield, IN 46140

Eiji Hirai

Oct 12, 1992, 4:35:50 PM

er...@hobbes.amd.com (Eric Wedaa) writes:
> -NEVER use 'rm <any pattern>', use 'rm -i <any pattern>' instead.

4:33pm,gingko,[~]% echo $version
tcsh 6.02.00 (Cornell) 92/05/15 options 8b,nls,dl,al,dir
4:34pm,gingko,[~]% set rmstar
4:34pm,gingko,[~]% rm *
Do you really want to delete all files? [n/y] n
4:34pm,gingko,[~]%

> -Set up your prompt to do a `pwd` every time you cd.

4:34pm,gingko,[~]% echo "$prompt"
%@,%m,[%~]%
4:34pm,gingko,[~]%

--
hi...@cc.swarthmore.edu (Eiji Hirai) : : : : : :: ::: :::: :::::
Unix Geek for Swarthmore College : : : : : :: ::: :::: :::::
Information Services, Swarthmore, PA, US. Copyright 1992 by Eiji Hirai.
I don't speak for Swarthmore College. All Rights Reserved.

Casper H.S. Dik

Oct 12, 1992, 3:41:10 PM

al...@spuddy.uucp (Alan Saunders) writes:

>About inexperienced sysadmins .. One such had been on a Sun sysadmin
>course, and learned all about security. One of the topics was on file
>and group access. On his return, he decided to put what he had learned
>into practice, and changed the ownership of all files in /bin, /usr/bin
>to bin.bin! I was called in when no one could log in to the system
>(of course /bin/login needs to be setuid root!)

That's not true.
% ls -l /bin/login
-r-xr-xr-x 1 root 40960 Jul 2 15:42 /bin/login

This on SunOS 4.1.x with a hacked login.
Root ownership of most files is preferred, BTW.
All commands executed by root should be owned by
root and reside in directories owned by root. Otherwise,
a non-root user can get to root far too easily.
This is especially true in NFS environments.

Casper

PS: In case you're wondering why login need not be set-uid root:
getty/rlogind/telnetd run as root, those execute login.
You cannot remove the set-uid bit from a standard login, as
it will call setreuid(2) and exec the user's shell without checks.
This can be confusing if a user types login in his/her shell.
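
Since ownership and set-uid bits interact (on some systems chown itself clears the set-uid bit), it can help to record both before any bulk change. A sketch, assuming a find with the POSIX `-perm -mode` test; the function names are made up for the example.

```shell
# Record owner, group and mode of everything under $1, so a bulk
# chown or chmod can be compared against the earlier state, or undone.
snapshot_perms() {
    find "$1" -exec ls -ld {} +
}

# List the regular files under $1 that carry the set-uid bit.
list_setuid() {
    find "$1" -type f -perm -4000 -print
}
```

For example, `snapshot_perms /bin > /var/tmp/bin.perms` before the change leaves a listing to consult afterwards; the output path here is only an illustration.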

B. Samuel Blanchard

Oct 12, 1992, 4:10:18 PM

Oh yea, I recalled 2 more

kill -1 1 on an Altos SV box is not good. I pulled this one trying to show
off. No more gettys appeared when users logged off. When I went to the console,
I calmly typed 0 to the Run Level request prompt. 2 would have been nice?
It was my first System V-like box, and it seemed to have such nice Berkeley
commands.

A control-s on a Sequent S27 console can cause processes to hang waiting to
write to the console. Unfortunately, su is one such process. No real problem
since I don't blindly reboot on request ;-)

Pete Bentley

Oct 12, 1992, 3:42:33 PM

David J Dawkins (d...@csg.cs.reading.ac.uk) wrote:
: About a year back, I was looking through /etc and found that a few
: system files had world write permission. Gasping with horror, I went
: to put it right with something like
:
: dipshit# chmod -r 664 /etc/*
:

A similar thing happened at a place I used to work 3 or 4 years back.
The guys next door had just got a Sun 3/360 (or some such) to host a
VME-bus image processing system - none of them knew much (or cared
much) about Un*x and so early on a student on loan to them got a
space in the wrong place and did
pillock# chmod -r -x ~ /*
with the same results (system in single user, refusing to run any commands
or go multi-user).

As it happened
a) This was a government establishment, and so the order for the QIC tapes
for backups had not yet been approved, hence no backups...
b) The install script for the kernel drivers for the image processing stuff
had not worked 'out of the box', and so the company had sent an
engineer down to install it. I hadn't been around when he came and
built their drivers, and they hadn't a clue what he had done. So,
there was no way to rebuild the drivers without another engineer call
and because of (a) there were no backups of the driver...Anyway, a complete
reload was therefore out of the question.

These were the days before SunOS on CD-ROM. In the end I managed to get
the thing up by booting from tape, installing the miniroot into the swap
partition and booting from that. This gave me a working tar and a
working mount, but no chmod. Also no mt command. Also at this time
very little of my Un*x experience was on Suns, so I had no idea of
the layout of the distribution tape. Various experiments
with dd and the non-rewinding tape device eventually found the file on
the tape with a chmod I could extract. chmod +x /etc/* /bin/* /usr/bin/*
on the system's existing disk was enough to make it bootable. After that
I sat the student down with a SunOS manual and let him figure out the
mess and correct the permissions that had been todged all over the system...

Pete.

Jerry Rocteur

Oct 12, 1992, 4:58:53 PM

In article <FISCHER.92...@steiner.iesd.auc.dk>, fis...@iesd.auc.dk (Lars Peter Fischer) writes:
>
> >>>>> "Jerry" == Jerry Rocteur (je...@incc.com)
>
> Jerry> Moral: be *very* careful of find execs, get the syntax right!!!!
>
> Yup. Use "find <whatever> -print" and look at what you get. If it
> seems reasonable, add " | xargs <whatever>" at the end. -exec is slow
> anyway.

Be careful here:

- xargs is not on all systems
- xargs places file names one after each other: <whatever> file1 file2 ...
which means you can have trouble with large file lists as all files
will not be acted upon
- the big advantage with xargs of course is that it runs the command
once, unlike exec which is one file at a time.

I have had so many problems with xargs that I only use it when my
list of files is modest!
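
Where xargs is missing or misbehaves, POSIX find's `-exec ... {} +` form is one alternative: it batches file names much as xargs does and passes each name as a single argument. A sketch of the preview-then-act habit described above, assuming a find that supports that form; the function names are invented for the example.

```shell
# Step 1: look at what would be affected.
preview_matches() {
    find "$1" -type f -name "$2" -print
}

# Step 2: once the list looks right, run the same search with the action.
# "{} +" hands many names to each rm invocation instead of one at a time.
remove_matches() {
    find "$1" -type f -name "$2" -exec rm -f -- {} +
}
```

Usage would be along the lines of `preview_matches /var/tmp '*.tmp'` followed, after reading the output, by `remove_matches /var/tmp '*.tmp'`.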

--
__^__ __^__
( ___ )------------------------------------------------------------( ___ )
| / | Jerry Rocteur Email: je...@InCC.COM | \ |
| / | Independent Computer Consultants | \ |
| / | Phone +32.2.235.7045 +32.81.65.53.58 fax +32.81.65.70.20 | \ |
| / |--------------------------------------------------------------| \ |
| / | This week's quote: | \ |
|___| I would if I could but I can't so I won't. |___|
(_____)------------------------------------------------------------(_____)
^ ^

Bob Arnold

Oct 12, 1992, 8:34:48 PM

In article <1992Oct12.2...@pony.Ingres.COM> I wrote:
>I was brave and bold, not to mention boneheaded, and formatted the user disk.
>
> [ rest of story deleted ... Bob ]
>
>Morals:
> 1) The "man" pages don't tell you everything you need to know.
> 2) Don't do backups to floppies.
> 3) Test your backups to make sure they are readable.
> 4) Handle the format program (and anything else that writes directly
> to disk devices) like nitroglycerine.
> 5) Strenuously avoid systems with inadequate backup and restore
> programs wherever possible (thank goodness for "restore" with
> an "e"!).
> 6) If you've never done sysadmin work before, take a formal
> training class.

Just thought of a few more related morals (managers pay attention now):

7) You get what you pay for.
8) There's no substitute for experience.
9) It's a lot less painful to learn from someone else's experience
than your own (that's what this thread is about, I guess :-) )

Part of the story I should tell here. My employer had been looking for
a way to cut costs. I was 15% cheaper than their previous sysadmin so
they let him go and hired me. It wasn't as nasty as it sounds, since
they kept him on as a consultant at 4 hours a week and he ended up with
a better job too (so did I). Everyone benefited in the end. I leaned
heavily on his consulting, which was great. He was older and wiser, and
probably had his own horror stories to tell. After this one, so did I!

Bob
--
__ _ _ Bob Arnold Ingres, An ASK Corporation
|/ \ / \ / \| 1080 Marina Village Parkway
| / / | Alameda, CA, 94501
| \__/ \__/| r...@ingres.com 510/748-2819

Bob Arnold

Oct 12, 1992, 7:35:24 PM

Many moons ago, in my first sysadmin job, learning via "on-the-job
training", I was in charge of a UNIX box whose user disk developed a
bad block. (Maybe you can see it already ...)

The "format" man page seemed to indicate that it could repair bad
blocks. (Can you see it now?) I read the man page very carefully.
Nowhere did it indicate any kind of destructive behavior.

I was brave and bold, not to mention boneheaded, and formatted the user disk.

Heh.

The good news:
1) The bad block was gone.
2) I was about to learn a lot real fast :-)
The bad news:
1) The user data was gone too.
2) The users weren't happy, to say the least.

Having recently made a full backup of the disk, I knew I was in for a
miserable all day restore. Why all day? It took 8 hours to dump
that disk to 40 floppies. And I had incrementals (levels 1, 2, 3, 4,
and 5, which were another sign of my novice state) to layer on top
of the full.

Only it got worse. The floppy drive had intermittent problems reading
some of the floppies. So I had to go back and retry to get the files
which were missed on the first attempt.

This was also a port of Version 7 UNIX (like I said, this was many
moons ago). It had a program called "restor", primordial ancestor of
BSD's "restore". If you used the "x" option to extract selected files
(the ones missed on earlier attempts), "restor" would use the *inode
number* as the name of the extracted files. You had to move the
extracted files to their correct locations yourself (the man page said
to write a shellscript to do this :-(). I didn't know much about shell
scripts at the time, but I learned a lot more that week.

Yes, it took me a full week, including the weekend, maybe 120 hours or
more, to get what I could (probably 95% of the data) off the backups.
And there were a few ownership and permissions problems to be cleaned up
after that.

Once burned twice shy. This is the only truly catastrophic mistake I've
ever made as a sysadmin, I'm glad to be able to say.

I kept a copy of my memo to the users after I had done what I could.
Reading it over now is sobering indeed! I also kept my extensive notes
on the restore process - thank goodness I've never had to use them since.

Morals:
1) The "man" pages don't tell you everything you need to know.
2) Don't do backups to floppies.
3) Test your backups to make sure they are readable.
4) Handle the format program (and anything else that writes directly
to disk devices) like nitroglycerine.
5) Strenuously avoid systems with inadequate backup and restore
programs wherever possible (thank goodness for "restore" with
an "e"!).
6) If you've never done sysadmin work before, take a formal
training class.
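
Moral 3 can be sketched as a backup that is not trusted until it has been read back. This assumes tar, diff -r and mktemp are available, and that the archive is written somewhere outside the tree being saved; backup_and_verify is a made-up name.

```shell
# Archive the tree $1 into the file $2, then prove the archive is
# readable by unpacking it into a scratch directory and comparing.
backup_and_verify() {
    src=$1
    archive=$2
    scratch=$(mktemp -d) || return 1
    ( cd "$src" && tar cf - . ) >"$archive" || return 1
    ( cd "$scratch" && tar xf - ) <"$archive" &&
        diff -r "$src" "$scratch" >/dev/null
    status=$?
    rm -rf -- "$scratch"
    return $status
}
```

A non-zero return means the archive could not be read back or did not match, which is much better learned on backup day than on restore day.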

Well, I haven't thought about that one in a while! I can laugh about
it now ....

Dave Butterfield

Oct 12, 1992, 6:57:02 PM

cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>>(of course /bin/login needs to be setuid root!)
>
>That's not true.

Whether it's true or not depends on which version of Unix you're
running.
--
Vote for *anybody* but Quayle!

Obi Thomas

Oct 12, 1992, 10:24:28 PM

This isn't nearly as bad as some of the stories in this thread, but...

I once mistakenly partitioned my Sun's boot disk so that the swap
partition overlapped the usr partition. The machine ran fine for a long
time (many months), presumably because the swap space was always nearly
empty. Then, one day there was a memory parity error and the system crash
dumped at the *end* of the swap partition. What should have been a simple
reboot after the crash dump turned into a long and painful re-install of
the entire system (Suns cannot boot without a /usr partition).

Now when I partition a disk I sit there with a calculator and make sure
all the numbers add up correctly (offsets, number of cylinders, number of
blocks, and so on).
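
The calculator step can also be handed to a script. A sketch, assuming the planned layout has been typed into a text file with one partition per line as "name start size", all in the same unit; that file format and the name check_overlaps are inventions for this example, not anything the Sun tools produce. A slice that deliberately covers the whole disk should be left out of the file.

```shell
# Report any partition whose start lies before the end of an earlier one.
# Returns non-zero if an overlap is found.
check_overlaps() {
    sort -k2,2n "$1" | awk '
        NR > 1 && $2 < prev_end {
            printf "%s overlaps %s\n", prev_name, $1
            bad = 1
        }
        {
            end = $2 + $3                 # first unit past this partition
            if (end > prev_end) { prev_end = end; prev_name = $1 }
        }
        END { exit bad + 0 }
    '
}
```

Given a file containing "root 0 1000", "swap 1000 500" and "usr 1400 2000", the swap partition ends at 1500 while usr starts at 1400, so the function reports the pair and returns a failure status.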

Russell Street

Oct 12, 1992, 9:42:45 PM

r...@Ingres.COM (Bob Arnold) writes:
> 9) It's a lot less painful to learn from someone else's experience
> than your own (that's what this thread is about, I guess :-) )

Without trying to wander off the thread tooooo much ... In my
experience the best experiences to learn from are your own :)
I wonder how many stories we have got so far about "I will never
type rm -r /" as root. (And no I have not done that _yet_, but
the day will come :()

I guess it is like the line from the science labs "I have all my
data. I just have to analyse it. I will do that at home" :)

--------------------------------------------------------------------
Russell (sync; sync; sync; halt) Street (russ...@ccu1.aukuni.ac.nz)

James Cummings

Oct 12, 1992, 8:57:08 PM

In article <1992Oct9.1...@u.washington.edu> t...@stein.u.washington.edu (Tim Smith) writes:
|I was working on a line printer spooler, which lived in /etc. I wanted
|to remove it, and so issued the command "rm /etc/lpspl." There was only
|one problem. Out of habit, I typed "passwd" after "/etc/" and removed
|the password file. Oops.
|
|I called up the person who handled backups, and he restored the password
|file.
|
|A couple of days later, I did it again! This time, after he restored it,
|he made a link, /etc/safe_from_tim.
|
|About a week later, I overwrote /etc/passwd, rather than removing it.
|
|After he restored it again, he installed a daemon that kept a copy of
|/etc/passwd, on another file system, and automatically restored it if
|it appeared to have been damaged.

Hmmm.....you were either a very good friend OR he was an unusually
easy going sysadmin. I think I would have fixed the problem by DELETING
YOUR password entry and changing root password....probably the latter on
the FIRST occurrence.

All this just to remove an old spooler directory???

Jeff DelPapa

Oct 13, 1992, 2:17:17 AM

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:
>This isn't nearly as bad as some of the stories in this thread, but...
>
>I once mistakenly partitioned my Sun's boot disk so that the swap
>partition overlapped the usr partition. The machine ran fine for a long
>time (many months), presumably because the swap space was always nearly
>empty.

I remember a similar thing once - on a symbolics machine, a customer
declared a file in the FEP filesystem as a paging file, and as part of
the file system (it was one way to solve their disk space crunch). It
was caught before damage was done - we weren't sure if it was because
they hadn't done anything real yet, or simply the machine knew not to
mess with the IRS (the customer).

<dp>

Rick Furniss

Oct 13, 1992, 3:12:30 AM

In article <ericw.718908214@hobbes> er...@hobbes.amd.com (Eric Wedaa) writes:
>
>The moral(s) of the story here:
> -NEVER use 'rm <any pattern>', use 'rm -i <any pattern>' instead.
> -Do backups more often than you go to church.
> -Read the backup media at least as often as you go to church.
> -Set up your prompt to do a `pwd` every time you cd.
> -Always do a `cd .` before doing anything.
> -DOCUMENT all your changes to the system (We use a text file
> called /Changes)
> -Don't nuke stuff you are not sure about.
> -Do major changes to the system on Saturday morning so you will
> have all weekend to fix it.
> -Have a shadow watching you when you do anything major.
> -Don't do systems work on a Friday afternoon. (or any other time
> when you are tired and not paying attention.)
>

Current backups usually mean backups of current BAD data.
Any problem that takes a backup cycle's time to locate is there to stay.
Better have some old stuff around.

Backups can be as dangerous as no backups ! CATCH 22.

***** standard DISKclamer *****
personal views of my person only

CPSMEL/IA
C210 N3877Y
ri...@pmafire.inel.gov
ri...@servprod.inel.gov


David J Stevenson

Oct 13, 1992, 4:14:10 AM

In <1992Oct12....@netcom.com> pay...@netcom.com (Rich Payne) writes:

>In article <1992Oct12.0...@jet.uk> d...@jet.uk (David J Stevenson) writes:
>>In <W1NR...@cc.swarthmore.edu> hi...@cc.swarthmore.edu (Eiji Hirai) writes:
>>>...[some deleted]
>>>(4) I heard this from a fellow sysadmin friend. My friend was forced to
>>>work with some sysadmins who didn't have their act together. One day, one
>>>of them was "cleaning" the filesystem and saw a file called "vmunix" in /.
>>>"Hmm, this is taking up a lot of space - let's delete it". "rm /vmunix".
>>
>>>My friend had to reinstall the entire OS on that machine after his coworker
>>>did this "cleanup". Ahh, the hazards of working with sysadmins who really
>>>shouldn't be sysadmins in the first place.
>>When this happened to a colleague (when I worked somewhere else) he restored
>>vmunix by copying from another machine. Unfortunately, a 68000 kernel does
>>not run very well on a Sparc...
>
>If it was a Sparc and still running, could you not have re-compiled the
>kernel and copied it back to root?
>
But I think that someone who deleted vmunix, then copied from an incompatible
machine, didn't know what the file was. Therefore, he wasn't expected to know
how to rebuild it! [I was working on IBM OS/2 at the time, so I didn't have
such problems, it (then) only ran on PS/2 machines].
--
+---------------------------------------------------+
| David Stevenson d...@jet.uk Tel: +44 235 465028 |
+---------------------------------------------------+
- Disclaimer: Please note that the above is a personal view and should not
be construed as an official comment from the JET project.

J.Rowe

Oct 13, 1992, 7:00:34 AM

In article <1992Oct12....@javelin.sim.es.com> gfo...@javelin.sim.es.com (Gary Fowler) writes:

> Another time I wasn't so lucky. I was a very new SA, and I was trying
> to clean some junk out of a system. I was in /usr/bin when I noticed
> a sub directory that didn't belong there. A former SA had put it
> there. I did an ls on it and determined that it could be zapped.
> Forgetting that I was still in /usr/bin, I did an rm *. No 10 second
> idiot proofing with rm. Now if some one would only create an OS with
> a "Do what I mean, not what I say" feature.

That's why I *always*always*always* have the directory in my prompt.
And set 'rmstar' in tcsh to avoid those unwanted 'rm foo *' problems.
(If the variable rmstar is set tcsh always asks before doing an rm *).

BTW Gary, please format your lines to less than 80 characters wide :-)

John

Simon Leinen

Oct 13, 1992, 9:13:39 AM

In article <168....@incc.com> je...@incc.com (Jerry Rocteur) writes:

- xargs is not on all systems

Yeah, well, neither is find I assume. And there's always GNU xargs...

- xargs places file names one after each other <whatever file1
file... which means you can have trouble with large file lists
as all files will not be acted upon

Xargs should split the list into reasonable chunks and execute the
command multiple times if necessary.

- the big advantage with xargs of course is that it runs the
command once, unlike exec which is one file at a time.

(see above).

A problem with find ... -print | xargs ... is that it has trouble with
filenames containing whitespace. GNU find/xargs have options to use
ASCII NUL as the separator, which is much safer (since it is difficult
to create files with NULs in their names).
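
A minimal sketch of the difference, assuming a GNU (or recent BSD) find
and xargs, where the options are spelled -print0 and -0. It only counts
arguments in a scratch directory, so nothing is removed; the file names
are invented for the demonstration.

```shell
# Assumes GNU (or recent BSD) find/xargs with -print0 and -0.
scratch=$(mktemp -d)
touch "$scratch/plain" "$scratch/has space"

# -n 1 runs echo once per argument, so the number of output lines is the
# number of arguments xargs believes it was given.
args_plain=$(find "$scratch" -type f -print  | xargs    -n 1 echo | wc -l)
args_nul=$(find "$scratch" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "2 files: plain xargs saw $args_plain args, xargs -0 saw $args_nul"
rm -rf "$scratch"
```

With the plain pipeline the name "has space" is cut in two, so a command
like rm would be handed a path that was never in the list.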
--
Simon.

John Stoffel

Oct 13, 1992, 5:21:52 AM

>>>>> On 12 Oct 92 20:58:53 GMT, je...@incc.com (Jerry Rocteur) said:

Jerry> In article <FISCHER.92...@steiner.iesd.auc.dk>, fis...@iesd.auc.dk (Lars Peter Fischer) writes:
>
> >>>>> "Jerry" == Jerry Rocteur (je...@incc.com)
>
> Jerry> Moral: be *very* careful of find execs, get the syntax right!!!!
>
> Yup. Use "find <whatever> -print" and look at what you get. If it
> seems reasonable, add " | xargs <whatever>" at the end. -exec is slow
> anyway.

Jerry>
Jerry> Be careful here:

Jerry> - xargs is not on all systems
Jerry> - xargs places file names one after each other <whatever file1 file...
Jerry> which means you can have trouble with large file lists as all files
Jerry> will not be acted upon
Jerry> - the big advantage with xargs of course is that it runs the command
Jerry> once, unlike exec which is one file at a time.

Try and use perl. This does a really nice job. Of course you still
aren't protected from yourself... :)

find <whatever> -print | perl -ne 'print; chop; unlink;'

And this doesn't have the potential problem of running out of space
for all the file names like xargs, and it doesn't have to fork a new
exec for every command like find would do.

--
Youth of today! Join me in a mass rally for traditional mental attitudes!
-------------------------------------------------------------------------------
jo...@wpi.wpi.edu | Work Station Specialist | Worcester Polytechnic Institute
John Stoffel | 508-831-5512 (work) | Worcester, MA 01609

Rik Harris

Oct 13, 1992, 7:55:15 AM

Sometimes it takes a few tries to get it through the tired brain...

Most of our disks reside on a single, high-powered server. We decided
this probably wasn't too good an idea, and put a new disk on one of
the workstations (particularly since the w/s has a faster transfer
rate than the server does!). It's still really useful to be able to
use all disks from the one machine, so I mounted the w/s disk on the
server. I said to myself (being a Friday afternoon...see previous
post) "it's only temporary.../mnt is already being used...I'll mount
it in /tmp". So, I mounted on /tmp/a (or something). This was fine
for a few hours, but then the auto-cleanup script kicked in, and blew
away half of my source (the stuff over 2 weeks old). I didn't notice
this for a few days, though. After I figured out what had happened,
and restored the files (we _do_ have a good backup strategy),
everything was OK.

Until a few months later. We were trying to convince a sysadmin from
another site that he shouldn't NFS export his disks rw,root to everyone,
so I mounted the disk to put a few suid root programs in his home
directory to convince him. Well, it's only a temporary mount, so....

You guessed it, another Friday afternoon. I did a umount /tmp/b, and
forgot about it. I noticed this one about halfway through the next
day. (NFS over a couple of 64k links is pretty slow). The disk had
not unmounted because it was busy...busy with two find scripts, happily
checking for suid programs, and deleting anything over a week old. A
df on the filesystem later showed about 12% full :-( Sorry Craig.

Now, I create /mnt1, /mnt2, /mnt3.... :-)

Remember....Friday afternoons are BAD news.

rik.
--
Rik Harris - rik.h...@fcit.monash.edu.au
+61 3 571-2895 (AH & ans.mach) +61 3 573-2679 (BH)
Faculty of Computing and Information Technology,
Caulfield Campus, Monash University, Australia

Wm. L. Ranck

Oct 13, 1992, 10:24:46 AM

Hello folks,
Well, after reading some of the stories in this thread I guess I can
tell mine. I got an RS/6000 mod. 220 for my office about 6 months ago.
The OS was preloaded so I had little chance to learn that process. Being
used to a full-screen editor I was not happy with vi so I read in the manual
that INED (IBM's editor for AIX) was full-screen and I logged in as root and
installed it. I immediately started to play with the new editor and somehow
found a series of keys that told the editor to delete the current directory.
To this day I don't know what that sequence of keys was, but I was
unfortunately in the /etc directory when I found it, and I got a prompt that
said "do you want to remove this?" and I thought I was just removing the
file I had been playing with but instead I removed /etc!
I got the chance to learn how to install AIX from scratch. I did reinstall
INED even though I was a little gun-shy but I made sure that whenever I used
it from then on I was *not* root. I have since decided that EMACS may be a
better choice.

--

*******************************************************************************
* Bill Ranck ra...@joesbar.cc.vt.edu *
* DoD #496 Bikes past and present: CB175, CB550F, Norton 750, CB350F, XV535 *
*******************************************************************************

Casper H.S. Dik

Oct 13, 1992, 11:09:55 AM

si...@lia.di.epfl.ch (Simon Leinen) writes:

>A problem with find ... -print | xargs ... is that it has trouble with
>filenames containing whitespace. GNU find/xargs have options to use
>ASCII NUL as the separator, which is much safer (since it is difficult
>to create files with NULs in their names).

Not difficult, impossible.

Casper

Eiji Hirai

Oct 13, 1992, 12:00:28 PM

rik.h...@fcit.monash.edu.au writes:
> I'll mount it in /tmp

Though this may strike most sane sysadmins as bad practice, SunOS (3.4 or so
- my memory is vague) shipped a command called "on". If you were logged on
machine A and wanted to execute a command on machine B, you said "on B
command", sort of like rsh.

However, A would mount B's disks under some invocations of "on" and it would
mount it in /tmp! Of course, lots of folks got bitten by this stupid
command and it was taken out after a long delay by Sun.

Anyone remember the details? I've blocked out my memory of pre-4.0 SunOS.
Am I just hallucinating?

Paul Bijnens

Oct 13, 1992, 5:22:39 PM

In article <168....@incc.com>, je...@incc.com (Jerry Rocteur) says:
> - xargs places file names one after each other <whatever file1 file...
> which means you can have trouble with large file lists as all files
> will not be acted upon

Read that manual page again please: xargs limits the number of arguments
in different ways (number of args, and total number of bytes). There
is NO problem with large file lists, it is designed to handle just
this kind of problem.
However, be careful if there are filenames with newlines in them:

$ mkdir '/tmp/
'

Now you have a directory called "/tmp/\n"

$ mkdir '/tmp/
/etc'
$ touch '/tmp/
/etc/passwd'

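Now we have a file called "/tmp/\n/etc/passwd". Because xargs splits its
input on newlines, it sees "/tmp/" and "/etc/passwd" as two separate
arguments, so we can do nasty things like: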

$ find /tmp -print | xargs rm -f

Or just wait till cron runs that command. That's why for the
cron cleanup script it is safer (but a little less efficient) to say:

$ find /tmp -exec rm {} \;
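
To watch the splitting happen without risking anything, here is a small
sketch that only counts arguments inside a scratch directory. The layout
and the name "victim" are invented for the demo, and it assumes a find and
xargs that behave like the common GNU/BSD ones.

```shell
scratch=$(mktemp -d)
# A directory whose name is a single newline, holding one file.
mkdir "$scratch/
"
touch "$scratch/
/victim"

# Piped through xargs, the single path is cut at the newline into two
# arguments (-n 1 makes echo print one line per argument).
via_xargs=$(find "$scratch" -type f -print | xargs -n 1 echo | wc -l)

# -exec hands the path over as one argument; sh reports how many it got.
via_exec=$(find "$scratch" -type f -exec sh -c 'echo "$#"' sh {} \;)

echo "one file: xargs saw $via_xargs arguments, -exec saw $via_exec"
rm -rf "$scratch"
```

The second half of the split path starts with "/", which is exactly how
an absolute name like /etc/passwd ends up on rm's command line.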

> I have had so many problems with xargs that I only use it when my
> list of file is modest!

Read that manual page again then.
--
Paul Bijnens -- MS-DOS is the world's most widespread virus.
Linguistics dept., K. University Leuven, Belgium
Pol...@cc1.kuleuven.ac.be

Hans Mulder

Oct 13, 1992, 1:12:54 PM

In <1992Oct12.1...@fwi.uva.nl> cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>al...@spuddy.uucp (Alan Saunders) writes:

>>About inexperienced sysadmins .. One such had been on a Sun sysadmin
>>course, and learned all about security. One of the topics was on file
>>and group access. On his return, he decided to put what he had learned
>>into practice, and changed the ownership of all files in /bin, /usr/bin
>>to bin.bin! I was called in when no one could log in to the system
>>(of course /bin/login needs to be setuid root!)

>That's not true.
>% ls -l /bin/login
>-r-xr-xr-x 1 root 40960 Jul 2 15:42 /bin/login

Errhm, Casper, did you notice that the login builtin of the shell no
longer works? The error message is "Permission denied".

Or is that intentional?

--
Hope this helps,

Hans Mulder h...@fwi.uva.nl

Randal L. Schwartz

Oct 13, 1992, 1:21:27 PM

In article <JOHN.92Oc...@sekrit.WPI.EDU> jo...@sekrit.WPI.EDU (John Stoffel) writes:
Try and use perl. This does a really nice job. Of course you still
aren't protected from yourself... :)

find <whatever> -print | perl -ne 'print; chop; unlink;'

And this doesn't have the potential problem of running out of space
for all the file names like xargs, and it doesn't have to fork a new
exec for every command like find would do.

Simpler:

find <whatever> -print | perl -lne 'print; unlink;'

Weirder: :-)

find2perl <whatever> -print -eval unlink | perl

(Both should work with most modern versions of Perl.)

print "Just another Perl hacker,"
--
Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
mer...@reed.edu (guest account) mer...@ora.com (better for permanent record)
cute quote: "Welcome to Portland, Oregon -- home of the California Raisins!"

Mike Matthews

Oct 13, 1992, 2:17:46 PM

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:

>Now when I partition a disk I sit there with a calculator and make sure
>all the numbers add up correctly (offsets, number of cylinders, number of
>blocks, and so on).

Heh heh, now that you mention that...

We had just gotten a 1.2G disk drive for our Sun (which direly needed it) so
we felt we'd repartition everything.

All went well, except... on reboot, one of the partitions that was newly
restored from backup got a fsck error. Fixed it, it rebooted, then another
one got an error. fscked that one, rebooted it, and doggone it, the first
error was back!

We had a one cylinder overlap. Sheesh.

At least Ultrix WARNS you of that.
------
Mike Matthews, matt...@oberon.umd.edu (NeXTmail accepted)
------
Don't kiss an elephant on the lips today.

Casper H.S. Dik

Oct 13, 1992, 1:59:36 PM

h...@fwi.uva.nl (Hans Mulder) writes:

>Or is that intentional?

In current environments there is no need for a login command in the
shell. My version of login does a:
`if (geteuid() != 0) { fprintf(stderr,"Permission denied\n"); exit(1); }'
If I hadn't done that, the login builtin would have behaved slightly
odd. The shell must have permission to exec /bin/login, though.
/bin/sh does not exit when an exec fails. /bin/csh does, but
it is broken anyway.

The reason not to have a set-uid login is simple: there are many
ways to subvert passwordless accounts or non-shell accounts
by mucking with the environment and doing a ``login -p''.

Casper

Martin Tomes

Oct 13, 1992, 6:03:02 AM

We had something really weird happen one day. I copied a file to
/usr/local on someone else's machine and all seemed to be OK. A bit
later the user of the machine noticed that the files and directories they
were using on another disk partition were corrupted. There were 2
gigabyte files on a 650Mb disk - and lots of them, with weird names and
permissions. At first I did not connect the two events. This disk
had given trouble when the power failed a week before, so I fsck'ed
it. Now I have run fsck more times than I can begin to imagine and
seen plenty of errors, some needing 'manual intervention', but I had
never seen anything like this before! It was spectacular. And what
was more, when I ran it a second time things got worse. Then I tried
to back up the /usr/local partition before restoring this corrupt data
and lo, that was corrupt too. It turned out that our sysadmin had
created the /usr/local disk partition in the wrong place on the disk
and put it over the top of the alternate sectors partition. By
writing to the /usr/local disk I had written all over the alts which
were mapped into the user's partition. Oh dear, what a mess.

Solution, rebuild all the partitions so they don't overlap and
restore, also buy the sysadmin a calculator.

Moral, always do your sums on the /etc/partitions file very carefully
before using mkpart.
--
Martin Tomes
Janet: mto...@uk.co.eurotherm
Internet: mto...@eurotherm.co.uk
UUCP: {uknet,uunet}!etherm!mtomes

Chris A. Anderson

Oct 13, 1992, 1:28:43 PM

Ok, here's one...

At a company that I used to work for, the CEO's brother was the
"system operator". It was his job to do backups, maintenance,
etc. Problem was, he didn't have a clue about Unix. We were
required to go through him to do anything, though.

Well, I was setting up a Plexus P-95 to be a
news/mail/communications machine and needed to wipe the disks and
install a new OS. El CEO requested that his brother do the
installation and disk partitioning. He had done this before, so I
gave him the partition maps and let him at it. When he was done,
everything seemed to be ok. Great, on with the install and setup.

Things went fine until I started compiling the news and mail
software. All of a sudden, the machine panicked. I brought it
back up and the root file system was amazingly corrupt. After
rebuilding things, it all seemed to be fine -- diagnostics all
ran fine, etc. So I started again -- this time keeping an eye on
things. Sure enough, the root file system became corrupted again
when the system started to load.

This time I brought it down and checked everything. The problem?
Swap space started at block zero and so did the root file system.
ARRRGGGHHHHH!!

Oh yes, the brother still works there.

Chris
--
+------------------------------------------------------------+
| Chris Anderson, Unify Corp. c...@unify.com |
+------------------------------------------------------------+

Roger Miles

Oct 13, 1992, 3:33:04 PM

A year ago we moved to a brand spanking new building. All the equipment
was moved by professional movers. The last piece of equipment I wanted
moved was the computer (a Zilog s8000, 6ft. tall, with 3 disk drives,
cartridge drive and reel tape drive all mounted in one cabinet. It must have
weighed 250 to 300 lbs) because I wanted to keep an eye on the movers.
Actually, I was hoping they'd drop it so I could get a new computer. Anyway,
much to my surprise the movers said they would not move the computer because
of the liability. One of my co-workers owned a Ford pickup so we hoisted it
up and drove off with me riding in the back hanging on to the Zilog. It
was the longest 15 minute drive I was ever on in my life.

Roger Miles
KSU

Tim Miller

Oct 13, 1992, 5:04:23 PM

This one qualified for Stupid Act of the Month:

All this happened on my sparcII...

I was making room on / because I needed to test run something
(which was using a tmp file in, of all places, /var/tmp. I could have
recompiled the application to use more memory and/or /tmp, but I'm too
lazy for that), so I figure "I'll just compress this, and this, and
this..." One of those "this'" was vmunix.

Well, of course the application crashes the machine, and stupid
me had forgotten that I'd compressed vmunix, so the damn thing won't
boot. checksum: Bad value or some such error. Took me most of the day
to figure out just what I'd done to the dang thing. 8)

Moral(s):

1) Never, ever, EVER play with vmunix.
2) Always keep a log of what you do to the root file system.
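
A log only gets kept if writing to it is effortless. Here is a sketch of
a one-line helper in the spirit of the /Changes file mentioned earlier in
the thread; the name logchange and the CHANGES_FILE variable are
assumptions made up for illustration.

```shell
# logchange MESSAGE...: append a timestamped line to the change log.
# CHANGES_FILE is a hypothetical override so the demo never touches /Changes.
logchange() {
    printf '%s %s: %s\n' \
        "$(date '+%Y-%m-%d %H:%M')" \
        "${LOGNAME:-uid$(id -u)}" \
        "$*" >> "${CHANGES_FILE:-/Changes}"
}

# Demo against a throwaway file.
CHANGES_FILE=$(mktemp)
logchange "compressed /vmunix.old to free space on /"
cat "$CHANGES_FILE"
rm -f "$CHANGES_FILE"
```

Had a line like that been written before compressing things on /, the
day spent working out why the machine would not boot might have been an
hour.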

-- Cerebus <t...@hrt213.brooks.af.mil>
"'Course, being military, it's difficult to get myself fired..."

Kristian Koehntopp

Oct 14, 1992, 8:39:35 AM

In <1992Oct12....@javelin.sim.es.com> gfo...@javelin.sim.es.com (Gary Fowler) writes:
>determined that it could be zapped. Forgetting that I was still in /usr/bin, I
>did an rm *. No 10 second idiot proofing with rm. Now if some one would only

That's what csh's !$ is for.

Kristian
--
Kristian Koehntopp, Harmsstrasse 98, FRG W-2300 Kiel, +49 431 676689
"One who has never hacked sendmail.cf has no soul.
One who has hacked it twice has no brain." -- Peter da Silva

Bruce Krawetz

Oct 13, 1992, 7:08:48 PM

Back when I was installing X-windows on a Sun-3, I accidentally deleted
the console's font. Not only would that machine not boot, it wouldn't
tell me _why_ it wouldn't boot. It seems that without that font, /vmunix
dies most ungracefully very quickly.

Wietse Venema

Oct 13, 1992, 6:26:34 PM

d...@ism.isc.com (Dave Butterfield) writes:

>cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>>>(of course /bin/login needs to be setuid root!)
>>
>>That's not true.

>Whether it's true or not depends on which version of Unix you're
>running.

If login is executable only for root processes (getty, telnetd, etc.)
it does not have to be set-uid.

Wietse

John Jarocki

Oct 13, 1992, 5:12:44 PM

In article <ericw.718908214@hobbes> er...@hobbes.amd.com (Eric Wedaa) writes:
>
>The moral(s) of the story here:

[Eric's "Guidebook to Being a Good Paranoid UNIX Sysadmin" Deleted]
>
>>>>Ericw
>(Paranoia is a "Good Thing" when you can really muck things up!)
>--
>Eric Wedaa - eric....@amd.com | Two more kinds of lies...
>{ames apple uunet}!amd!ericw | Release Dates, and Benchmarks
>Advanced Micro Devices, M/S 167 PO Box 3453 Sunnyvale, CA 94088-3453
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Eric,

You left out an important one:
- Never hand out directions on "how to" do some sysadmin task
until the directions have been tested thoroughly.
- Corollary: Just because it works on one flavor
of *nix says nothing about the others. '-}
- Corollary: This goes for changes to rc.local (and
other such "vital" scripties).

``The only justifiable purpose of political institutions is to
insure the unhindered development of the individual.''
-- Albert Einstein
---------john.j...@amd.com voice:512-462-4098 fax:512-462-5156-------
--
John Jarocki - john.j...@amd.com

Hari Seldon ... psychohistorian

Oct 13, 1992, 11:20:13 PM

In <1992Oct13.0...@ccu1.aukuni.ac.nz> russ...@ccu1.aukuni.ac.nz (Russell Street) writes:

>r...@Ingres.COM (Bob Arnold) writes:
>> 9) It's a lot less painful to learn from someone else's experience
>> than your own (that's what this thread is about, I guess :-) )

>With out trying to wander off the thread tooooo much ... In my
>experience the best experiences to learn off are your own :)
>I wonder how many stories we have got so far about "I will never
>type rm -r /" as root. (And no I have not done that _yet_, but
>the day will come :()

after a real bad crash (tm) and having been an admin (on an rs/6000)
for less than a month (honest it wasn't my fault, yea right stupid)
we got to test our backup by doing:
# cd /
# rm -rf *
ohhhhhhhh sh*t i hope those tapes are good

ya know it's kinda funny (in a perverse way) to watch the system just
slowly go away.

bill pociengel
--
llbi locgeenpi " i'm a little confused right now"

Barrie Spence

Oct 13, 1992, 12:54:41 PM

In article <1992Oct13.0...@ccu1.aukuni.ac.nz> russ...@ccu1.aukuni.ac.nz (Russell Street) writes:
>r...@Ingres.COM (Bob Arnold) writes:
>> 9) It's a lot less painful to learn from someone else's experience
>> than your own (that's what this thread is about, I guess :-) )
>
>With out trying to wander off the thread tooooo much ... In my
>experience the best experiences to learn off are your own :)
>I wonder how many stories we have got so far about "I will never
>type rm -r /" as root. (And no I have not done that _yet_, but
>the day will come :()
>

My mistake on SunOS (with OpenWindows) was to try and clean up all the
'.*' directories in /tmp. Obviously "rm -rf /tmp/*" missed these, so I
was very careful and made sure I was in /tmp and then executed
"rm -rf ./.*".

I will never do this again. If I am in any doubt as to how a wildcard
will expand I will echo it first.
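
The reason this one bites: in many shells the pattern .* also matches
"." and "..", so "rm -rf ./.*" climbs into the parent directory. Below is
a minimal sketch of echoing first, plus a narrower pair of patterns that
never match those two entries. The function name list_dotfiles is made up
for illustration.

```shell
# list_dotfiles DIR: print the hidden entries of DIR, never "." or "..".
# (Hypothetical helper.) The two patterns are the usual idiom:
#   .[!.]*  a dot, one non-dot character, then anything (.lock, .profile)
#   ..?*    two dots followed by at least one more character (..odd)
list_dotfiles() {
    for entry in "$1"/.[!.]* "$1"/..?*; do
        # An unmatched pattern is left as literal text; skip those.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            printf '%s\n' "${entry##*/}"
        fi
    done
}

scratch=$(mktemp -d)
mkdir "$scratch/.X11-unix"
touch "$scratch/.lock" "$scratch/..odd" "$scratch/visible"

# Look before you leap: echo shows what rm would be handed. In shells
# where .* matches them, "." and ".." appear in this list.
(cd "$scratch" && echo ./.*)

list_dotfiles "$scratch"
rm -rf "$scratch"
```

Whether the echo line shows "." and ".." depends on the shell and its
options, which is one more argument for checking rather than assuming.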

Barrie
--
Barrie Spence DataCAD Ltd, Caird Street
bar...@calvin.demon.co.uk Hamilton, Lanarkshire ML3 0AL, Scotland
Telephone +44 (0) 698 425599 / Fax +44 (0) 698 419173

Bob Stockler

Oct 13, 1992, 9:24:12 PM

r...@Ingres.COM (Bob Arnold) writes:

>Morals:
> 2) Don't do backups to floppies.

Once, Tandy Xenix had the largest installed base of *NIX systems extant.

My friend, mentor and guru Bob Snapp and I undertook to write a systematic
backup set of shell scripts to do what the *NIX programs then available would
not do: make a reliable compressed Master Backup, and reliable compressed
incremental backups (so 'cron' could do it) to available 8" floppy drives.

We've never found that our programs failed. Now, on SCO *NIX systems we
prefer CTAR. We've never found it to fail either.

Jean-Louis Faraut

Oct 14, 1992, 5:02:56 AM

In article <1992Oct7.1...@multix.no>, ar...@multix.no (Arne Asplem) writes:
|> I'm the program chair for a one day conference on Unix system
|> administration in Oslo in 3 weeks, including topics like network
|> management, system administration tools, integration, print/file-servers,
|> security, etc.
|>
|> I'm looking for actual horror stories of what have gone wrong because
|> of bad system administration, as an early morning wakeup.
|>
|> I'll summarise to the net if there is any interest.
|>
|> -- Arne
|> --
|> // Arne Asplem // Office: Gjerdrums vei 12, N-0486 Oslo, Norway //
|> // Multix A/S // Phone/fax: +47-2-950800 / 950790 //
|> // // Email: ar...@multix.no //
|> // NUUG Chairman // Email: ar...@nuug.no //

Yes, I've got one for you: I tried to use the Sun automounter :-)

[jlf]

J.Rowe

Oct 14, 1992, 7:41:23 AM

In article <rik.71...@nella15.cc.monash.edu.au> r...@nella15.cc.monash.edu.au (Rik Harris) writes:

> I said to myself (being a Friday afternoon...see previous
> post) "it's only temporary.../mnt is already being used...I'll mount
> it in /tmp". So, I mounted on /tmp/a (or something). This was fine
> for a few hours, but then the auto-cleanup script kicked in, and blew
> away half of my source (the stuff over 2 weeks old). I didn't notice
> this for a few days, though. After I figured out what had happened,
> and restored the files (we _do_ have a good backup strategy),
> everything was OK.

If you're doing this using find always put -xdev in:

find /tmp/ -xdev -fstype 4.2 -type f -atime +5 -exec rm {} \;

This stops find from working its way down filesystems mounted under
/tmp/. If you're using, say, perl you have to stat . and .. and see if
they are mounted on the same device. The fstype 4.2 is pure paranoia.
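
A cautious way to try this out is to run the same expression with -print
first and read the list before anything is deleted. A minimal sketch in a
scratch directory; the helper name stale_files is hypothetical, and
-fstype is left out because its accepted values differ between systems.

```shell
# stale_files DIR DAYS: list regular files under DIR not accessed for more
# than DAYS days, without descending into other mounted filesystems.
stale_files() {
    find "$1" -xdev -type f -atime +"$2" -print
}

scratch=$(mktemp -d)
touch "$scratch/fresh"
touch -t 199210140741 "$scratch/stale"   # access/modify time set to 1992

# Review the list; only then swap -print for the -exec rm {} \; form.
stale_files "$scratch" 5
rm -rf "$scratch"
```

The scratch directory cannot show the mount-crossing part, of course;
that only becomes visible when something is actually mounted below the
starting point.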

Needless to say, I once forgot to do this. All was well for some weeks
until Convex's version of NQS decided to temporarily mount /mnt under
/tmp... Interestingly, only two people noticed. Yes, the chief op.
keeps good backups!

Other triumphs: I created a list of a user's files that hadn't been
accessed for three months and a perl script for him to delete them.
Of course, it had to be tested, I mislaid a quote from a print
statement... This did turn into a triumph, he only wanted a small
fraction of them back so we saved 20 MB.

I once deleted the only line from within an if.. then statement in
rc.local, the sun refused to come up, and it was surprisingly
difficult to come up single user with a writeable file system.

AIX is a whole system of nightmares strung together. If you stray
outside of the sort of setup IBM implicitly assume you have (all IBM
kit, no non IBM hosts on the network, etc.) you're liable to end up in
deep doodoo.

One thing I would like all vendors to do (I know one or two do) is
to give root the option of logging in using another shell. Am I the
only one to have mangled a root shell?

John Rowe
Dept. Physics
Exeter University
UK.

Peter da Silva

Oct 14, 1992, 7:39:03 AM

In article <1992Oct13.1...@fwi.uva.nl> cas...@fwi.uva.nl (Casper H.S. Dik) writes:
> In current environments there is no need for a login command in the
> shell.

There never was. I don't understand why the shell treated login and newgrp
specially in V7. It was terribly inconvenient. In older environments I use
a version of newgrp that works like "su", and if you want to get the effect
of the shell "login" command there's always "exec login".

> My version of login does a:
> `if (geteuid() != 0) { fprintf(stderr,"Permission denied\n"); exit(1); }'

Just dike the "login" builtin right out. It's a waste of bytes.
--
Peter da Silva. <pe...@sugar.neosoft.com>.
`-_-' "Megulegetted ma mar a farkasodat ?"
'U`
Dette kan umulig vaere mitt rom, eftersom jeg ikke puster ammoniakk.

Peter da Silva

Oct 14, 1992, 7:42:37 AM

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:
> Now when I partition a disk I sit there with a calculator and make sure
> all the numbers add up correctly (offsets, number of cylinders, number of
> blocks, and so on).

When I partition the disk I sit there with "sc" and build a spreadsheet for
the purpose. That way I know all my numbers work, and the saved spreadsheet
is documentation on *why* things are that way.
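
The same sums can be scripted. A minimal sketch, assuming the partition
table has been typed out as "name start size" lines in a single unit
(cylinders or blocks). The function check_overlap and the sample layout
are made up for illustration and are not any vendor's tool. A whole-disk
slice (such as the Sun "c" partition) has to be left out of the input,
since it overlaps everything by design.

```shell
# check_overlap: read "name start size" lines on stdin, sort them by
# start, and report every partition that begins before an earlier one
# ends. Exit status is 1 if any overlap was found, 0 otherwise.
check_overlap() {
    sort -k2,2n | awk '
        NR > 1 && $2 < prev_end {
            printf "OVERLAP: %s starts at %d, but %s runs to %d\n",
                   $1, $2, prev, prev_end - 1
            bad = 1
        }
        # Remember whichever partition reaches furthest so far.
        { if ($2 + $3 > prev_end) { prev = $1; prev_end = $2 + $3 } }
        END { exit bad + 0 }
    '
}

# Invented layout in cylinders: swap starts one cylinder too early.
check_overlap <<'EOF' || echo "fix the table before partitioning"
root   0    100
swap   99   200
usr    299  500
EOF
```

Keeping the input file next to the script serves the same purpose as the
saved spreadsheet: it records why the numbers are what they are.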

Jeff Stehman

Oct 14, 1992, 8:43:24 AM

From article <39...@wzv.win.tue.nl>, by r...@wzv.win.tue.nl (Rob J. Nauta):
>
> But the stories told now are more folklore than real horror. Having read
> 2 Stephen Kings this weekend I beg everyone to tell more interesting
> stories, about demons, the system clock running backwards, old files
> reappearing etc !

Hmmm. Maybe this is a little closer to what you're looking for...

Many years ago a tiny little college in the middle of nowhere purchased an
NCR tower, then a newfangled contraption. A half-dozen of us were using it
for an assembly class. The prof should have made his warnings about TRAP a
little more clear. One student ran his program and it suddenly began
spawning processes, rapidly filling the machine. The prof came in, amused,
logged on as superuser, and killed a process. Another process was
immediately spawned. The prof tried again. He was ignored. He was also no
longer amused. After several minutes he gave up and turned off the box.
The tower didn't even flinch. He pulled the plug. Nothing. He ripped the
back off the box and dug around. Finally he found the fuse and pulled it,
killing the machine. Some of us later claimed we heard laughter as it went
down.

(Many times since then I have wished other computers came with a backup
battery as standard issue.)

--
Jeff Stehman Systems Staff, G-18A Jordan
ste...@cs.clemson.edu Dept. of Computer Science
(803)656-2639 Clemson University

Tim Pierce

Oct 14, 1992, 9:00:26 AM

In article <92287.142...@cc1.kuleuven.ac.be> Paul Bijnens <FFA...@cc1.kuleuven.ac.be> writes:

>In article <168....@incc.com>, je...@incc.com (Jerry Rocteur) says:
>
>> - xargs places file names one after each other <whatever file1 file...
>> which means you can have trouble with large file lists as all files
>> will not be acted upon
>
>Read that manual page again please: xargs limits the number of arguments
>in different ways (number of args, and total number of bytes). There
>is NO problem with large file lists, it is designed to handle just
>this kind of problems.

I'll believe my eyes before I believe the man page, thanks. xargs
has a history of barfing on long argument lists. (Thank heavens we
installed GNU xargs, to get around this.)

--
____ Tim Pierce / "You are just naive and repressed because
\ / twpi...@unix.amherst.edu / penis envy is here and it's now and it's
\/ (BITnet: TWPIERCE@AMHERST) / all around you." -- Neal C. Wickham

grover davidson

Oct 14, 1992, 4:32:38 AM

Several months ago here, we were reorganizing our disk space on an
RS/6000 with AIX 3.1. I have done this many times before, but for some
reason, I was rushing through expanding a file system. Instead of entering
the new file system size where it belongs, I entered it into the mount
point. It also turns out that I was attached 2 levels down in the file
system. Since the size was entered as a number ('234567') and was
INTERPRETED as a mount point directory, the result was a
circular hard link that basically left the file system unusable.
IBM was not able to help, and since we had done quite a bit of work that
day, we had to somehow recover some of the stuff. We ended up doing a dd
of the raw volume, and then read it back in a couple MB at a time and
extracted the pieces that we needed from the mess.

The other day while reading Stevens' new book, "Advanced Programming in
the UNIX Environment", I saw that he had done the exact same thing
during the preparation of his book. At least I am not alone.....

--
Grover Davidson II | The opinions expressed here are solely mine
Conley, Canitano, & Assoc., Inc. | and in no way reflect any opinions or
voice: 216-831-6240 | policies of my employer, even if they may
internet: gro...@ccai.clv.oh.us | agree with them, and especially if they

dvs...@minster.york.ac.uk

unread,

Oct 14, 1992, 6:20:41 AM10/14/92

to

I remember my first (and only, so far) major mistake in unix
admin:

I was changing the UIDs of a few users on one of our major
servers, due to a clash with some machines newly connected to the
net. Fine, edit /etc/passwd then chown all their files to the new
UID. So, rather than just assume that all files owned by "fred"
live in /home/machine/fred I did this:

machine# find / -user old_uid -exec chown username {} \;

This was fine... except it was late at night and I was tired, and
in a hurry to get home. I had six of these commands to type, and
as they would take a long time I'd just let them run in the
background over night.....

So, you come in the next morning and a user complains... I can't
login to the 4/490 - it says "/bin/login: setgid: not owner".

Okay.... naive user problem no?

rlogin machine -l root
/bin/login: setgid: not owner

machine console
login: root
/bin/login: setgid: not owner

Okay - I REALLY can't get in... let's reboot single user and see
what's on... this worked. /bin/login is owned by (and setuid to) one
of the users whose UID I changed the previous day... in fact ALL
FILES in the ENTIRE filesystem are owned by this user..problem!

We `only' lost about 200 man hours through my little typing
mistake: the moral of the story.. beware anything recursive
when logged in as root!

find / -exec chown user {} \;

Oh dear...
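
One way to make that slip harder is to wrap the command so the owner
test can't be left out and nothing changes until the match list has
been looked at. This is only a sketch: the function name rechown and
the RECHOWN_APPLY switch are invented for illustration, not anything
from the original post.

```shell
# rechown OLD NEW [ROOT]: list files under ROOT owned by OLD, and chown
# them to NEW only when RECHOWN_APPLY=yes is set in the environment.
# Refuses to run if OLD or NEW is empty, so a dropped "-user" argument
# cannot silently match every file on the system.
rechown() {
    old=$1 new=$2 root=${3:-.}
    if [ -z "$old" ] || [ -z "$new" ]; then
        echo "usage: rechown OLD_UID NEW_USER [ROOT]" >&2
        return 2
    fi
    # Dry run: always show what matches first.
    find "$root" -user "$old" -print
    if [ "$RECHOWN_APPLY" = "yes" ]; then
        find "$root" -user "$old" -exec chown "$new" {} \;
    fi
}
```

Run it once as "rechown 1234 fred /home" to read the list, then again
with RECHOWN_APPLY=yes once the list looks sane.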

Dave

Belinda Asbell

unread,

Oct 14, 1992, 9:00:31 AM10/14/92

to

In article <Bw40G...@cen.ex.ac.uk>, JR...@cen.ex.ac.uk (J.Rowe) writes:

Probably not. I learned the hard way to be careful if messing with /etc/passwd.
One day, for some reason, I couldn't login as root (pretty scary, since I knew
the root passwd and hadn't changed it).

Turned out that I'd somehow blitzed the first letter of /etc/passwd (vi
does bizarre things sometimes). So I logged in as 'oot' and fixed it.

NEVER do a "chmod -R u-s .", especially not in /usr....

I think that "mount -o" or something similar will mount a filesystem read-write
if it's come up in singleuser mode and is mounted read-only.....

Just my tuppence....
--
=============================================================================
Belinda Asbell + System Administrator, Harris Controls Division
m...@ccd.harris.com + Any opinions presented are mine alone.
-

Andy Dennie

unread,

Oct 14, 1992, 1:08:05 PM10/14/92

to

In article <MERLYN.92O...@romulus.reed.edu>, mer...@ora.com (Randal L. Schwartz) writes:
>In article <JOHN.92Oc...@sekrit.WPI.EDU> jo...@sekrit.WPI.EDU (John Stoffel) writes:
>>
>> find <whatever> -print | perl -ne 'print; chop; unlink;'
>

>Simpler:
>
> find <whatever> -print | perl -lne 'print; unlink;'

Simpler still (although I hesitate to try to one-up Randal :-) )

find <whatever> -print | perl -lnpe 'unlink;'

--
o o| Andy Dennie, HyperDesk Corporation and...@hyperdesk.com
o o| 2000 West Park Drive, Suite 300 Phone: (508) 366-5050 x109
---+ Westboro, MA 01581 Fax: (508) 898-3841

Bill Vermillion

unread,

Oct 14, 1992, 10:28:28 AM10/14/92

to

In article <39...@unix.SRI.COM> gr...@Speech.SRI.COM (Steven Tepper) writes:
>This may not exactly fit the "administration horror story" category, but...

>At one place where I worked, someone had set up cron to delete any
>file named "core" more than a few days old, since disk space was
>always tight and most users wouldn't know what core files were or care
>about them. Unfortunately not everyone knew about this and one user
>lost a plain text file (a project proposal) he'd spent a one lot of
>time working on because he called it "core". This was around 1976,
>when Unix was still considered exotic and before bookstores carried
>entire sections of Unix books.

It probably was just a matter of timing. Sooner or later he would have
done something to dump core in his own directory, and the results would
have been the same. Well, almost - he would have had a file, but not the
one he expected.
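
A cron job like that can at least look inside the file before deleting
it. The sketch below assumes a present-day system where core dumps are
ELF files (certainly not true of the 1976 machine in the story), and
the function name prune_cores is hypothetical. File names containing
newlines are not handled.

```shell
# prune_cores DIR DAYS: delete regular files named "core" under DIR that
# are older than DAYS days AND start with the ELF magic number.
# Anything else named "core" is reported and left alone.
prune_cores() {
    find "$1" -type f -name core -mtime +"$2" -print |
    while IFS= read -r f; do
        # First four bytes as hex; an ELF file begins 7f 45 4c 46.
        magic=$(od -An -N4 -tx1 "$f" | tr -d ' \n')
        if [ "$magic" = "7f454c46" ]; then
            rm -f "$f"
        else
            echo "skipping non-core file: $f" >&2
        fi
    done
}
```

With this check a project proposal that happens to be called "core"
produces a warning instead of vanishing.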

--
Bill Vermillion - bi...@bilver.oau.org bill.ve...@oau.org
- bi...@bilver.uucp
- ..!{peora|ge-dab|tous|tarpit}!bilver!bill

Russell Street

unread,

Oct 14, 1992, 3:01:10 PM10/14/92

to

I have not seen one yet about machine generated logs going crazy,
but here is one I had to fix just this morning:

Ultrix (for VAX) has this nice little hardware error logger that
records things like "Your NFS Server has died but I am still trying",
and "I just found a bad block on your swap partition" etc.
It also logs when a file system fills up completely.

So when I came in this morning a user's session had crashed while
he was replying to mail and emacs had spent the night quietly
filling up the root partition (where /tmp was).

Once it had filled /tmp the error logging daemon started
filling /usr with messages about / being full! By the time
I found the offending process the error log file had grown
to 6Meg!

Trimming this file in gnu-emacs was not a pretty sight either :)
Have you ever seen a 16M gnu-emacs process? (On a MicroVAX???)

-----------------------------------------------------------------
Russell Street (russ...@ccu1.aukuni.ac.nz)
Listeners are requested to make the necessary adjustments

John Jarocki

unread,

Oct 14, 1992, 3:02:45 PM10/14/92

to

In article <6...@ocdis01.UUCP> rob...@ocdis01.UUCP (Contractor Bob Johnson) writes:
>
>Management told us to email a security notice to every user on the our
>system (at that time, around 3000 users). A certain novice administrator
>on our system wanted to do it, so I instructed them to extract a list of
>users from /etc/passwd, write a simple shell loop to do the job, and
>throw it in the background. Here's what they wrote (bourne shell)...
>
> for USER in `cat user.list`; do
> mail $USER <message.text &
> done
>
>Have you ever seen a load average of over 300 ???
>
>Bob Johnson, Systems Administrator
>Tinker AFB, Oklahoma

Oh, (tee-hee) that reminds me of what I did as a student on an old
Sun 3 (or was it a Sun 2?) once.

This was *way* before anyone let me do any administration ;-), and
just after I had learned UNIX (because I needed to write a resume in
troff). I was curious as to how many users out there had troff source
that I could look at as an example, and so I thought: "Hey, I'll just
generate a list of who has what files" (please, no flames, I know all
about privacy issues now).

Well (being a novice) I typed:

% foreach i in ( /home/*/*/* )
? ll $i &
? end

Well, since the University had a *lot* of user directories NFS
mounted on that machine, it *really* began to crawl. Can you
believe it? -- It actually *crashed* eventually.

The best part was: I went over to look at the console right before
it crashed and it looked like television static. ;) I never *did*
find out if that was related, or not. I can just see the bug report:
"Spawning hundreds of background processes causes console to wig out".

<sigh>, I can't laugh about stuff like this as much anymore, now that
it's my job to make sure the machines are up. :-)
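
Both loops above fail the same way: every iteration is put in the
background, so the machine ends up running all of them at once. A
hedged sketch of a throttled version in Bourne-style shell follows;
run_throttled is a made-up helper, not a standard command.

```shell
# run_throttled MAX CMD...: read one item per line on stdin and run
# "CMD item" in the background, waiting for the whole batch to finish
# after every MAX launches so at most MAX jobs run at a time.
run_throttled() {
    max=$1; shift
    count=0
    while IFS= read -r item; do
        "$@" "$item" &
        count=$((count + 1))
        if [ "$count" -ge "$max" ]; then
            wait
            count=0
        fi
    done
    wait    # collect the final, possibly partial, batch
}
```

For the mail case one could define send_notice() { mail "$1" < message.text; }
and call "run_throttled 10 send_notice < user.list". Dropping the "&"
from the original loop altogether would be simpler still, just slower.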

--john

Casper H.S. Dik

unread,

Oct 14, 1992, 6:33:39 PM10/14/92

to

pe...@NeoSoft.com (Peter da Silva) writes:

>In article <1992Oct13.1...@fwi.uva.nl> cas...@fwi.uva.nl (Casper H.S. Dik) writes:
>> In current environments there is no need for a login command in the
>> shell.

>There never was. I don't understand why the shell treated login and newgrp
>specially in V7. It was terribly inconvenient. In older environments I use
>a version of newgrp that works like "su", and if you want to get the effect
>of the shell "login" command there's always "exec login".

Once upon a long ago, we had one PDP 11 to do Unix on. It was
connected through some crummy port selector (with not enough lines)
and there weren't enough terminals. Using the built-in login was
useful, but exec login would have done the trick. (Except that
my login is no longer set-uid, so ...)

>> My version of login does a:
>> `if (geteuid() != 0) { fprintf(stderr,"Permission denied\n"); exit(1); }'

>Just dike the "login" builtin right out. It's a waste of bytes.

Replacing /bin/login on all our machines to enhance logging and
implement access controls, we can make time for that.
But installing new versions of all shells, yuck.

But someday someone should remove the login and newgrp built-in
from sh/csh etc.

Casper

Gary Heston

unread,

Oct 14, 1992, 5:45:35 PM10/14/92

to

In article <1992Oct7.1...@multix.no>, ar...@multix.no (Arne Asplem) writes:
|> I'm the program chair for a one day conference on Unix system
|> administration in Oslo in 3 weeks, including topics like network
|> management, system admininistration tools, integration, print/file-servers,
|> securitym, etc.

|> I'm looking for actual horror stories of what have gone wrong because
|> of bad system administration, as an early morning wakeup.

With all these stories, I'm surprised nobody has posted the "scratch monkey"
story. Has that admin gone onto bigger and better things?

--
Gary Heston SCI Systems, Inc. ga...@sci34hub.sci.com site admin
The Chairman of the Board and the CFO speak for SCI. I'm neither.
"...I looked out my window, and saw Kyle Pettys' car upside down, then I
thought 'One of us is in real trouble'." Davey Allison, re: a 150MPH crash

Bob Arnold

unread,

Oct 14, 1992, 10:59:59 PM10/14/92

to

In article <SIMON.92O...@liasg2.epfl.ch> si...@lia.di.epfl.ch (Simon Leinen) writes:
>In article <168....@incc.com> je...@incc.com (Jerry Rocteur) writes:
>
> - xargs is not on all systems
>
>Yeah, well, neither is find I assume. And there's always GNU xargs...

Out of curiosity, I got out my Version 6 and Version 7 manuals.
"find" is in both of them. And thus it is probably in virtually every
UNIX variant available today. Also out of curiosity, does anyone know
of any exceptions?

Bob

--
__ _ _ Bob Arnold Ingres, An ASK Corporation
|/ \ / \ / \| 1080 Marina Village Parkway
| / / | Alameda, CA, 94501
| \__/ \__/| r...@ingres.com 510/748-2819

Bob Arnold

unread,

Oct 14, 1992, 10:57:35 PM10/14/92

to

In article <Bw1G0...@gumby.ocs.com> o...@gumby.ocs.com writes:

>I once mistakenly partitioned my Sun's boot disk so that the swap
>partition overlapped the usr partition. The machine ran fine for a long
>time (many months), presumably because the swap space was always nearly
>empty. Then, one day there was a memory parity error and the system crash
>dumped at the *end* of the swap partition. What should have been a simple
>reboot after the crash dump turned into a long and painful re-install of
>the entire system (Suns cannot boot without a /usr partition).

Seems we've had a number of overlapping partition horror stories in
this thread. After a similar and very recent painful struggle with one
here, I decided to check our Suns.

That required a bit of coding, which at 5k is worth the bandwidth :-)

It would have to be modified for other OS, and perhaps for your
environment as well. "df" outputs, checklist/fstab/vfstab formats,
available filesystem types, and methods of discovering disk
partitioning info (dkinfo, chpt -q, disktab, diskpart, prtvtoc, ...)
are notoriously different across UNIXs.

We use raw partitions for raw ingres log files on some hosts (surprise :-)
so the code attempts to tease those partitions out of fstab entries and
comments. Swap partitions are also snagged out of fstab.

This code has been tested by tweaking the partition table of an
otherwise unused disk and running the script against it. It has also
been run on most of our Suns, and we found one small time bomb waiting
to go off. We now run it on every new Sun and newly attached disks.

Use it at your own risk, of course. No warranty or guarantee of any sort
is implied. I hope you find it useful, though, giving you neither a false
sense of security nor any false alarms.

Proactive is better than Reactive, for sure!

Bob

__ _ _ Bob Arnold Ingres, An ASK Corporation
|/ \ / \ / \| 1080 Marina Village Parkway
| / / | Alameda, CA, 94501
| \__/ \__/| r...@ingres.com 510/748-2819

-------------------------------------------------------------------------------
#!/bin/sh
# chkoverlap - check overlapping disk partitions on suns
# * works only on suns (so far)
# * written by Bob Arnold 9/22/92
# Use at your own risk. No warranty or guarantee of any sort is implied.
# Hopefully, you will find it useful.

pr=`basename "$0"` # script name, used in temp file names and messages
badnewsonly=false # assume we report good news too
showhost=false # assume we don't show hostname
overlap=false # assume we don't find overlap
filter=/tmp/$pr.filter # filter egrep-ified from $usedparts
rpt=/tmp/$pr.rpt # final report
tmp=/tmp/$pr.tmp # scratch file
usedparts=/tmp/$pr.used # list of used partitions
splithead=/tmp/chko. # for split of egrep RE if it's too long
arch=`arch` # Sun architecture, we hope
host=`hostname` # get hostname of this machine
rmlist="$filter $rpt $tmp $usedparts ${splithead}*" # files to clean up
USAGE="usage: $pr [-b|badnewsonly] [-s|showhost]
$pr [-h|help]"

case "$arch" in
sun*) : do nothing since we are ok ;;
*) echo "$pr: This script only works on Suns." ; exit 1 ;;
esac

for arg in $* ; do
case $arg in
-b*|b*) badnewsonly=true ;;
-h*|h*) echo "$USAGE" ; exit ;;
-s*|s*) showhost=true ;;
esac
done

## build list of used partitions
# first, make sure we haven't left anything around
rm -f $rmlist
# first, get filesystems and root swap partition from df of local filesystems
df -t 4.2 \
| awk '
/^\/dev\// {
print $1, $NF
if ($NF == "/") {
print $1, "root_swap"
}
}' \
| sed -e '/ root_swap$/s/. /b /' > $usedparts
# a brief diversion here to get the name of the root device
rootdev=`sed -n -e 's,/dev/\([^ ]*\) /$,\1,p' $usedparts`
# next, get used partitions from fstab; this list will mostly overlap the
# df output. But it would include any swap partitions and "ignored"
# partitions ("ignored" partitions are probably Ingres raw log partitions).
egrep '^/dev/' /etc/fstab \
| awk '{print $1, $2}' \
| sed -e 's/ swap$/ fstab_swap/' >> $usedparts
# third, get raw ingres log partitions from fstab comments
awk '
/^\#/ {
# if this line mentions an Ingres raw log partition
if ( $0 ~ /[iI]ngres.*[lL]og|[rR]aw.*[lL]og/ ) {
# try to get the device name from this line
gotdev=0
for (i=1;i<=NF;i++) {
if ($i~/\/dev\//) {
print $i, "ingres_raw_log"
gotdev=1
}
}
# if this line did not have the device name, try the next line
if (gotdev==0) {
trynext=1
}
} else if (trynext==1) {
trynext=0
for (i=1; i<=NF; i++) {
if ($i~/\/dev\//) {
print $i, "ingres_raw_log"
gotdev=1
}
}
}
}
' /etc/fstab >> $usedparts
# finally, strip floppies, cdroms, and leading "/dev/" from $usedparts,
# and sort uniq lines too
sed -e '/^\/dev\/fd/d' -e '/^\/dev\/sr/d' -e 's,\#*/dev/,,' $usedparts \
| sort +0u -1 > $tmp
mv $tmp $usedparts

## build list of used disks
disklist=`sed -e 's/. .*//' $usedparts | sort -u`

## build egrep command to grab *used* partitions from list
## of *available* partitions
# create egrep filter
awk '{print "^" $1 " "}' $usedparts > $filter
# If there are too many filesystems, egrep will barf on the filter, saying
# egrep: regular expression too long
# In that case we have to split the filter into pieces and then create
# an egrep command that can handle them all
# To test for that condition, we have to feed the filter something it is
# guaranteed to find, i.e. the root device (which we figured out earlier)
# followed by a <SPACE>
if echo "$rootdev " | egrep -f $filter > /dev/null 2>&1 ; then
egrepcmd="egrep -f $filter"
else
split -5 $filter $splithead
for i in ${splithead}* ; do
if test -z "$egrepcmd" ; then
egrepcmd="egrep -f $i"
else
egrepcmd="$egrepcmd | egrep -f $i"
fi
done
fi

## put check of each disk into $rpt
# gnarly code gets info about partitioning of each disk,
# egreps for partitions we're really using,
# joins that info to $usedparts and sorts it by starting cylinder
# (which is *key* because the following awk algorithm depends on it),
# shoving that output into an awk script to do the actual overlap check
for disk in $disklist ; do
dkinfo $disk 2>&1 \
| sed -e '/^[a-z][a-z][0-9]:/d' \
-e '/No such device or address/d' \
-e '/cylinders.*heads.*track/d' \
| awk '
$1 ~ /^[a-h]:/ {printf("%s%s %s ", "'$disk'", $1, $4)}
$1 !~ /^[a-h]:/ {print $NF}
' \
| sed -e 's/[:(]//g' \
| awk '{print $1, $3, $2}' \
| eval $egrepcmd \
| join - $usedparts \
| sort +1n -2 \
| awk '
BEGIN {n=0}
# load arrays from each input line
{ part[n]=$1; startcyl[n]=$2; ncyl[n]=$3; mtpt[n]=$4; line[n]=$0; n++ }
# now check arrays to see if any partitions overlap
END {
for (i=0; i+1<n; i++) {
if ((startcyl[i]+ncyl[i]-1) >= startcyl[i+1]) {
print line[i], "OVERLAP", line[i+1]
}
}
}
' >> $rpt
done

## final results !!
test -s $rpt && overlap=true
case $badnewsonly in
false) test ! -s $rpt && echo OK > $rpt ;;
esac

case $showhost$badnewsonly in
truetrue) echo -n "$host " ; test $overlap = true && cat $rpt ;;
falsetrue) : ; test $overlap = true && cat $rpt ;;
truefalse) echo -n "$host " ; cat $rpt ;;
falsefalse) : ; cat $rpt ;;
esac

## clean up and quit
rm -f $rmlist
if test $overlap = "true" ; then
exit 1
else
exit 0
fi
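
The core of the script is the final awk stage: sort the partitions by
starting cylinder, then compare each one's last cylinder with the start
of the next. That step can be tried in isolation on made-up data. The
sketch below assumes input lines of the form "name startcyl ncyl"; the
function name check_overlap and the sample numbers are illustrative
only and do not come from any real disk.

```shell
# check_overlap: read "name startcyl ncyl" lines on stdin, sort them by
# starting cylinder, and report each adjacent pair where the first
# partition's last cylinder reaches or passes the start of the next.
check_overlap() {
    sort -k2,2n | awk '
        BEGIN { n = 0 }
        { part[n] = $1; startcyl[n] = $2; ncyl[n] = $3; n++ }
        END {
            for (i = 0; i + 1 < n; i++) {
                if (startcyl[i] + ncyl[i] - 1 >= startcyl[i+1]) {
                    print part[i], "OVERLAP", part[i+1]
                }
            }
        }'
}

# Sample table: sd0a covers cylinders 0-99 and sd0b starts at 50.
printf '%s\n' 'sd0g 150 200' 'sd0a 0 100' 'sd0b 50 100' | check_overlap
```

Like the original, this only compares neighbours in sorted order, so a
large partition that swallows several later ones is reported once,
against the first of them.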

Nancy Milligan

unread,

Oct 14, 1992, 6:13:22 PM10/14/92

to

After 8 years of system administrating, I've had more than my share of
horror stories. But the ones from the early years were the scariest,
and now the funniest.

Way back in the mid-eighties we had this awful looking Motorola machine
with a Bernoulli disk that ran some ancient version of System V.
I used to torment that poor machine, 'cause I didn't know any better.
One day we ran out of room in the root file system so I deleted this
big file called "unix"...

Another time I was messing with /etc/gettydefs and I managed to
screw up the console so that every time you pressed a key on the
console it would change baud rates. That made logging in a real
challenge.

I'm so grateful my friend and mentor, Drake Coker, was always
on hand to straighten out my messes.

And then there was a time when I was a newish system administrator.
Kind of a hot shot too. Most everyone was impressed with my experience
and prowess. Until one night at about 3:00 a.m. when I was
installing a new disk drive on a machine that already had one, and I
newfs'd the wrong disk.

Ha! I'd nearly forgot about this one. I wrote a program to
compress or delete stuff in various directories, like /tmp, depending
on the contents of a configuration file and the age of files that it
found. I wrote it, ran it, debugged it, wrote the configuration file
and installed the bugger in crontab.

About three days later almost every file on this machine had been deleted or
compressed. Apparently I got distracted by something while I was writing
the config file, and the entry that was supposed to be for /tmp said /.
Boy, did I feel like an ijjit.

Just because I have God-like powers doesn't mean I can't be a fool.
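
A cleanup program driven by a config file can check each entry before
touching anything. The following is a sketch under assumptions: the
function name safe_target and the list of protected directories are
invented here and would need adjusting for a real system.

```shell
# safe_target DIR: succeed only if DIR is an absolute path to an existing
# directory that does not resolve to "/" or one of a few system
# directories. Intended to be called before any recursive cleanup.
safe_target() {
    case $1 in
        /*) ;;
        *) echo "refusing non-absolute path: '$1'" >&2; return 1 ;;
    esac
    # Resolve "..", "." and symlinks so that "/tmp/.." is seen as "/".
    resolved=$(cd "$1" 2>/dev/null && pwd -P) || {
        echo "refusing missing directory: $1" >&2
        return 1
    }
    case $resolved in
        /|/bin|/etc|/usr|/dev|/lib|/sbin)
            echo "refusing protected directory: $resolved" >&2
            return 1 ;;
    esac
    return 0
}
```

The cleaner would then skip any config entry for which safe_target
fails, so a line reading "/" where "/tmp" was meant gets rejected
instead of acted upon.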

--
Nancy P. Milligan n...@dale.cts.com
Titan Linkabit cts!dale!npm
3033 Science Park Road
San Diego, CA 92121

Nancy Milligan

unread,

Oct 14, 1992, 8:21:15 PM10/14/92

to

Eiji Hirai (hi...@cc.swarthmore.edu) wrote:

> rik.h...@fcit.monash.edu.au writes:
> > I'll mount it in /tmp
>

> Though this may strike most sane sysadmins as bad practice, SunOS (3.4 or so
> - my memory is vague) shipped a command called "on". If you were logged on
> machine A and wanted to execute a command on machine B, you said "on B
> command", sort of like rsh.
>
> However, A would mount B's disks under some invokations of "on" and it would
> mount it in /tmp! Of course, lots of folks got bitten by this stupid
> command and it was taken out after a long delay by Sun.
>
> Anyone remember the details? I've blocked out my memory of pre-4.0 SunOS.
> Am I just hallucinating?

No you aren't hallucinating, I remember this one VERY well. Fortunately
the Sun kernel had to be patched in order to allow root to touch anything
on an NFS file system, so when the "find /tmp/. -mtime +3 -exec rm {} \;"
command ran, it couldn't do anything on the NFS system. Scared me witless
when I saw what it was doing, until I realized that the "nobody" feature
protected us.

But those poor devils that patched their kernels so that root could access
an NFS disk (which we did later)... I think that was one of the dumbest
things I ever saw come from Sun.

Keith Warren Rickert

unread,

Oct 14, 1992, 11:42:58 PM10/14/92

to

Well, there was the time one of the /dev/tty* things got messed up,
and I decided to remake all of them from some big nasty
script that came with the system.
Unfortunately, that script deleted all the old /dev files,
but remade them in the current dir...which in this case
happened to be /etc. :(
Needless to say, they didn't do much good there, and when I
later tried to login and got an out of tty's error....ugh.
Lucky there weren't any long jobs running, so I could
reboot it to single user mode (using a tty def that is apparently in
PROM or some such...*whew*) and remake the tty's correctly.

Keith

David J. MacKenzie

unread,

Oct 14, 1992, 11:34:16 PM10/14/92

to

: >A problem with find ... -print | xargs ... is that it has trouble with
: >filenames containing whitespace. GNU find/xargs have options to use
: >ASCII NUL as the separator, which is much safer (since it is difficult
: >to create files with NULs in their names).
:
: Not difficult, impossible.

Yes, this is infinitely more secure. Note that you can also feed such
filename lists to Perl using the -0 switch.

Credit where credit is due . . . Dan Bernstein originated the idea,
in October '90. GNU got it from him, and Perl got it from GNU, I believe.
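
For anyone who hasn't used the NUL-separator options, here is a short
sketch. It assumes a find with -print0 and an xargs with -0, which are
extensions (GNU has them, as noted above) rather than something every
vendor ships; remove_matching is a hypothetical helper name.

```shell
# remove_matching DIR PATTERN: delete regular files under DIR whose names
# match PATTERN. Names are passed NUL-terminated, so embedded spaces or
# newlines cannot split one file name into several arguments.
remove_matching() {
    find "$1" -type f -name "$2" -print0 | xargs -0 rm -f --
}
```

With plain -print, a file called "old report.bak" would reach rm as the
two arguments "old" and "report.bak"; with -print0 and -0 it arrives as
one.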

David J. MacKenzie

unread,

Oct 15, 1992, 1:56:54 AM10/15/92

to

> - xargs is not on all systems
>
>Yeah, well, neither is find I assume. And there's always GNU xargs...

Out of curiosity, I got out my Version 6 and Version 7 manuals.
"find" is in both of them. And thus it is probably in virtually every
UNIX variant available today. Also out of curiosity, does anyone know
of any exceptions?

I'm sure every AT&T derived system has find. Maybe Minix 1.0 or early
Coherents didn't (I don't know), but those are the only possible
exceptions I can imagine.

Now, some of those systems have pretty bare-bones or broken versions
of find (see comp.unix.solaris), but that's another story . . . .

Mike Stefanik

unread,

Oct 14, 1992, 2:14:53 PM10/14/92

to

In an article, h...@fwi.uva.nl (Hans Mulder) writes:
>>% ls -l /bin/login
>>-r-xr-xr-x 1 root 40960 Jul 2 15:42 /bin/login
>
>Errhm, Casper, did you notice that the login builtin of the shell no
>longer works? The error message is "Permission denied".

Then you must think that we're really deprived ... ;-)
$ ls -l /bin/login
---x------ 1 root bin 113106 Sep 06 1991 /bin/login

--
Mike Stefanik mi...@pacsoft.com ...!uunet!pacsoft!mike (714) 681-2623
Pacific Software Group, Riverside, CA