... so I wanted to add hd capacity to my Proliant 1600; since all the bays were
full, I decided to pull one of the 9gig drives and pop in a 73 gig drive (see
my prior post, "Adding a SCSI HD" from 10/28). Now, I first contemplated
removing the old drive in software and then adding the new one via mkdev hd.
Took a look in the TAs and found 105864; it looked really formidable and
forbidding (why can't SCO, er, Caldera automate this stuff?) Mentioned my plan
to one of the denizens of the NG, and he suggested skipping the sw incantations
and simply swapping the two drives ("If the bus fits, you must send bits.").
Sounded like a plan to me.
So, I did that, and the problem was that the old drive seemed to remain in
memory. By that I mean, there seemed no difference in how the system responded
to hd commands targeted to the disk; mount, umount, df, ls, etc., all responded
as if I'd done nothing. Divvy, invoked via mkdev hd, showed the old disk's
filesystems; fdisk, the old partition table. Yes, I verified manually (by
pulling other drives and watching the errors) that I was talking to the correct
hd. (Parenthetically, these are SCA hot-swap drives, and therefore I had not
yet rebooted). Presumably, a hd shipped from the factory does not contain any
filesystems, let alone a HTFS filesystem with identical files as the old one!
Therefore, I deduced that something is wrong. So, I did the Windows thing and
rebooted. (On the SCSI bios screen, it correctly noted the vendor and model
number).
And got a flurry of error messages from fsck (before going multiuser),
presumably invoked with the '-y' option: LOTS of things were wrong with my
disk. Hmmmm. Lots of phantom files reconnected, lots of new lost+found entries,
etc. And still the disk looked like the old one: showing, e.g. 9gig in capacity
instead of 73gig. Divvy showed the old filesystem. What the hey?
At this point, I decided I was getting myself in trouble. The remedy would be
to go back to plan A and remove traces of the old disk from the OS.
[Parenthetically, there was a phantom disk in software caused, I guess, by
previously not following the removal instructions in the TA to the letter; it
never bothered anyone, but now was as good a time as any to fix it. So,
following the TA's instructions to the letter, I removed disks 4 and 3.5. :-)]
And then rebooted. And then re-ran mkdev hd (this time, it acted as if I had
run it for the first time, which seemed logical) and rebooted and re-ran it.
This time, fdisk showed a 9gig partition table and divvy showed a filesystem of
yet ANOTHER hd which I had removed from the system - months ago! Encouraging
sign, though - badtrk (I think) in reporting the vendor ID, correctly noted
that it was a Compaq. Not knowing what to do here, I took a deep breath and re-
made the filesystems (taking steps to make sure it was talking to the right
drive).
And now, everything works (it fsck's and mounts beautifully and I can copy
to/from it all day long), but it's still a 9gig drive, meaning I wasted $700
and an hour and a half of my time.
So,
a) What did I do wrong? and
b) Does Compaq's UW-SCSI-2 bus have something against hard drives > 9gig? or,
c) Am I missing a Compaq ceremony to initialize the disk somehow? (I doubt
this, b/c I did not do it for the other drives.) or,
d) Do I need to do something special to tell 5.0.6 about this hd (dparam,
maybe)?
e) Are there (better) foolproof ways of ensuring that one is talking to the
correct disk?
--
_________________________________________
Nachman Yaakov Ziskind, EA, LLM aw...@egps.com
Attorney and Counselor-at-Law http://yankel.com
Economic Group Pension Services http://egps.com
Actuaries and Employee Benefit Consultants
I would have done....:

shutdown
swap the drives
boot up, and at the boot: prompt enter "DEFBOOTSTR biosgeom"
enter the root passwd to go into single-user mode instead of booting
    all the way up.
run fdisk -f /dev/whatever-it-is
    use entire disk for unix
    write, exit
    ... note: if fdisk did not show the correct full size of the drive,
    might as well not proceed past this point...
run divvy <options for this device>
    create the new (bigger) filesystems, give them the same names and
    ordering
    (if you want, don't have to of course)
    write, exit.
exit single-user, allow it to boot up. done.

now for the lots of talking...
if you created all the same names in divvy, just bigger sizes, then
when /etc/default/filesys gets read and goes to auto-mount the old
filesystems, it should just silently mount them no problem. if you
made new names or a different arrangement, then you should edit
/etc/default/filesys while still in single user mode. you can test
both the new fs's and /etc/default/filesys by mounting and then
umounting each new fs, giving just the mount-point as the only
argument to mount ("mount /u5", etc.).
reboot without putting anything on the boot: command line just to be
sure it still works without the biosgeom option. that is just a way to
allow the os to "see" the real size of the disk even if the drive has
been "stamped" with the wrong geometry. which generally only happens
as a result of doing something like using dd to write a raw copy of
another disk. which you didn't say anything like that so don't worry.
when fdisk sees the wrong size for the entire drive, that is when you
should try booting with the biosgeom option and then see if fdisk sees
a different size. dparam can be used to fix a bad stamp, but I confess
I maybe only did it that way once or twice, I usually just take the
easy way out and use an airbag boot floppy which has a simple menu
choice to do it for you, and the correct numbers are even already
filled in if you boot with biosgeom (assuming your bios can correctly
handle the size of your disk) and the problem practically never comes
up anyways so it's not like I do that very often either, although I
did just happen to have the problem recently thanks to some less than
delicious backup software.
you can boot with the option, go only into single user mode, and run
fdisk on the device, just to see what size it reports, then exit fdisk
without writing and reboot without the option. that way you can take a
look without damaging your data.
another thing is...
you are certain this drive is not part of a raid array right??? :)
it can be on a raid card, even on the same cable/channel/bus as other
disks that are part of a raid array, just not part of one itself.
And at this point, your method becomes superfluous.
He has SCSI, SCSI disks. The bios doesn't care or know about the
geometry of scsi disks.
--
==========================================================================
Tom Parsons t...@tegan.com
==========================================================================
You didn't say if this disk was part of an array. Some of the things
you experienced would indicate it was in an array. If you have a scsi
array of four 9 gig disks and replace one with a 73.4 gig disk, the array
will only use an amount of the disk equivalent to the original disk.
As a rule, arrays want identical disk sizes.
Making an assumption since we don't know if/how/what array you are
using.
If you had an array and you merely pulled out the old disk and inserted
the larger one, odds are the array controller would rebuild the new
disk to be identical to the old one.
Again, a few details missing.
You do not run mkdev hd unless you need to add a new drive device to
the kernel.
Replacing disks in an array is somewhat dependent on the array.
To replace a scsi disk that is not part of an array, the steps are:

  fdisk (see man HW hd for the correct device name)
     create the partition layout you desire
  divvy (see man HW hd for the correct device name)
     create/modify the division tables
  If you changed division names or mount points, you have two choices:
     edit /etc/default/filesys
     create your own lost+found directory in each filesystem.
     (my method)
        cd {filesystem}
        mkdir lost+found
        cd lost+found
        copy /usr/lib/terminfo/a/* .
        rm -f a*
     <or>
     run mkdev fs and carefully make the filesystem changes, letting
     it create the lost+found directories.
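
The manual lost+found steps above work by filling the new directory with entries and then deleting them again, so the directory already has room when fsck needs to reconnect orphaned files. Below is a minimal Python sketch of the same idea. It assumes a filesystem where an emptied directory keeps its allocated size (the usual behaviour of traditional Unix filesystems, not verified here for HTFS). The function name, the slot count, and the placeholder file names are illustrative assumptions, not part of any SCO tool.

```python
import os
import tempfile


def make_lost_found(fs_root, slots=64):
    """Create fs_root/lost+found and pre-size it with throwaway entries.

    Mirrors the manual steps: mkdir, fill with files, remove them again.
    'slots' is an arbitrary illustrative number, not a documented value.
    """
    lf = os.path.join(fs_root, "lost+found")
    os.makedirs(lf, exist_ok=True)
    names = [os.path.join(lf, "a%04d" % i) for i in range(slots)]
    for name in names:
        # Creating an empty file allocates one directory entry.
        with open(name, "w"):
            pass
    for name in names:
        # Removing the file leaves the directory empty again.
        os.remove(name)
    return lf


# Demonstration in a scratch directory standing in for a mounted filesystem.
scratch = tempfile.mkdtemp()
lost_found = make_lost_found(scratch)
print(lost_found, os.listdir(lost_found))
```

On a real system the argument would be the mount point of the new filesystem rather than a scratch directory, and letting mkdev fs create the directory remains the other route Tom describes.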
Tom, I didn't mention an array b/c I don't have one. Why should I mention what
I *don't* have?
| Again, a few details missing.
|
| You do not run mkdev hd unless you need to add a new drive device to
| the kernel.
Oh? That's not what SCO says (see, e.g., 'man mkdev'). You have to run
mkdev hd twice. Anyway, it's a convenient way to run fdisk, badtrk (don't
forget that one) and divvy, right? Especially if you're not so good on device
names and nodes and stuff and don't want to overwrite the wrong disk. :-(
| Replacing disks in an array is somewhat dependent on the array.
|
| To replace a scsi disk that is not part of an array.
| The steps are:
|
| fdisk (see man HW hd for the correct device name)
| create the partition layout you desire)
| divvy (see man HW hd for the correct device name)
| create/modify the division tables
| If you changed division names or mount points, you have two choices:
| edit /etc/default/filesys
| create your own lost+found directory in each filesystem.
| (my method)
| cd {filesystem}
| mkdir lost+found
| cd lost+found
| copy /usr/lib/terminfo/a/* .
| rm -f a*
| <or>
| run mkdev fs and carefully make the filesystem changes, letting
| it create the lost+found directories.
But what if fdisk shows the wrong number of tracks, and divvy the wrong number
of blocks? What does one do then?
| Oh? That's not what SCO says (see, e.g., 'man mkdev'). You have to run
| mkdev hd twice. Anyway, it's a convenient way to run fdisk, badtrk (don't
| forget that one) and divvy, right? Especially if you're not so good on device
| names and nodes and stuff and don't want to overwrite the wrong disk. :-(
The man pages say absolutely nothing about running mkdev hd when you
change hard disks. It specifically states:
    mkdev hd
        add a hard disk to the system by creating the necessary device files
                                                                ^^^^^^^^^^^^
You already have the device files. If the man page said you HAD to run
mkdev hd to change a hard drive, I would have bitched long ago and had
it changed.
Forget badtrk? I most certainly never use it. There is no logical reason
to run badtrk on a SCSI drive unless you want to perform a minimal erase
of everything on the drive. SCSI drives handle bad tracks internally. If
a scsi drive is in a condition where you need to run badtrk, the drive should
be thrown away.
| | To replace a scsi disk that is not part of an array.
| | The steps are:
| |
| | fdisk (see man HW hd for the correct device name)
| | create the partition layout you desire)
| | divvy (see man HW hd for the correct device name)
| | create/modify the division tables
| | If you changed division names or mount points, you have two choices:
| | edit /etc/default/filesys
| | create your own lost+found directory in each filesystem.
| | (my method)
| | cd {filesystem}
| | mkdir lost+found
| | cd lost+found
| | copy /usr/lib/terminfo/a/* .
| | rm -f a*
| | <or>
| | run mkdev fs and carefully make the filesystem changes, letting
| | it create the lost+found directories.
|
| But what if fdisk shows the wrong number of tracks, and divvy the wrong number
| of blocks? What does one do then?
If fdisk shows the wrong number of tracks (are you sure?), then you should
first determine if you are looking at the right hard disk. It is rather
obvious that if fdisk doesn't display proper size, divvy isn't going to
display the proper size.
fdisk tells you what device node it is reading, does that correspond to
where you have the drive configured? I guess it is possible the Compaq
controller might have the size of the drive saved somewhere. I've seen
dumber...but not much.
except it happened to me only last weekend. I'm not theorizing.
fdisk only saw half the disk. the cause was an ill-thought-out use of
a certain backup program's "bare-metal" backup & restore (their
support told me that was how I'd have to do what I was trying to do:
copy a box onto new hardware). the original disk was 36 gigs, the new
disk was 73 gigs. backed up, restored, and fdisk itself only saw 36
gigs on the new disk even though the raid card bios clearly showed the
logical volume being 73 gigs, even when I used a boot floppy to delete
all fdisk partitions and re-create them. the biosgeom option, however,
caused fdisk to see the correct size (only while it was in effect:
reboot without it and fdisk showed the half-size again, until I
re-stamped the disk).
this was a scsi ami megaraid and the 73gig is a logical disk of 4
36gigs in a raid-10 config. ctar/airbag utilities option re-stamped it
with what I assume is a user-friendly front-end to dparam, and then I
was able to fdisk the whole drive and no longer need the biosgeom
option.
the backup/restore was analogous to ghost or dd from what I can make
of the docs. it wrote the geometry dparam stamp from the 36gig disk
onto the 73gig
the result was a fully working set of sco fdisk and divvy partitions,
it booted up the first time after the restore. it crashed as soon as
the kernel was loaded but that is to be expected because the hardware,
including the scsi card was all new and foreign to that kernel, but it
worked in so far as the boot program ran and loaded the kernel and if
you gave boot: options to load the unix.install with btld floppies,
the fs's all were viable and I even proceeded to rebuild the on-disk
kernel so that the box booted up and ran on its own. After all that
was done was when I noticed that the disk was the same size as it was
on the old box. Had to do it all over again from scratch. :)
the second time I just used an ordinary ctar tape & airbag floppy to
create fs's and restore the OS, and updated the data directly from the
old box via the network. Yes, I should have only done that in the
first place, it's a no-brainer, and that's all I ever wanted to do,
but that is another story probably best just left alone. :)
> He has SCSI, SCSI disks. The bios doesn't care or know about the
> geometry of scsi disks.
The machine bios doesn't, but the SCSI adaptor bios certainly does.
There's a bunch of stuff Bela Lubkin contributed to the FAQ starting at
http://aplawrence.com/SCOFAQ/scotec6.html#drivegeom1
--
Please note new phone number: (781) 784-7547
Tony Lawrence
Unix/Linux Support Tips, How-To's, Tests and more: http://aplawrence.com
Free Unix/Linux Consultants list: http://aplawrence.com/consultants.html
> |
> | You do not run mkdev hd unless you need to add a new drive device to
> | the kernel.
>
> Oh? That's not what SCO says (see, e.g., 'man mkdev'). You have to run
> mkdev hd twice. Anyway, it's a convenient way to run fdisk, badtrk (don't
> forget that one) and divvy, right? Especially if you're not so good on device
> names and nodes and stuff and don't want to overwrite the wrong disk. :-(
Yes, it is convenient. Tom's not completely wrong, but I would do just
what you did. It's a new disk: it needs fdisk and divvy, and even
though Tom is right about SCSI disks not needing badtrk, I run a quick
scan anyway because if this disk is a total piece of crap, I want to
know it NOW before I waste more time on it. So it sounds like you did
exactly what I would have done, though perhaps for different reasons.
Can we back up a second and go over what I think you had and what you
should have done?
I think you had a 9 gig drive mounted somewhere. If you did, for
example, "df -v" it would have shown up as one of your mounted
filesystems. The device node would have been listed there also.
Let's say it was /dev/u2.
Run "divvy /dev/u2" just to see if you left any space there or have any
non-fs's you forgot about.
Back up the directory it is mounted on.
Then shutdown, replace the drive with the 72gb, and reboot single user
(single user so /dev/u2 doesn't try to mount).
From what I read, it sounds like you followed this procedure or
something very close to it so far, am I right?
Now run "divvy /dev/u2" again. I think this is where you used "mkdev
hd", which is (as Tom said) perhaps unnecessary, but otoh it is also
completely harmless (assuming you answer things to make it so) and does
indeed lead you through the steps mentioned. As I said, running badtrk
may raise some eyebrows, but I still think it's worth the time.
However, when you got to the divvy part, it should have been blank. If
it was NOT blank (which seems to be the implication here) then you
either were NOT looking at the proper disk or your system is very very
confused.
Is the above pretty much what happened?
> But what if fdisk shows the wrong number of tracks, and divvy the wrong number
> of blocks? What does one do then?
For a new drive, there shouldn't have been an fdisk partition either,
unless it was something from another machine or had been previously
used. Was there?
I'm going to disagree.
I still run badtrk, because once in a great while you get something that
is defective out of the box. Running badtrk shows you that NOW, rather
than during your restore of data. It may be a waste of time 999 times
out of a thousand nowadays, but hey, I get paid by the hour :-)
Seriously, I do think it's better to find out earlier rather than later.
The quick scan isn't all that bad and I usually have something useful
to do while it's running.
>
> You already have the device files. If the man page said you HAD to run
> mkdev hd to change a hard drive, I would have bitched long ago and had
> it changed.
Nonetheless, it is a convenient way to step through fdisk and divvy. It
doesn't require consultation of "man HW hd" to remember SCO's silly disk
naming (Solaris makes perfect sense, SCO never has). It's as harmless
and foolproof as manual invocations and perhaps even more so if you at
least know the scsi id.
Nachman did exactly what I would have done, and there's nothing wrong
with what he did.
> Forget badtrk? I most certainly never use it. There is no logical reason
> to run badtrk on a SCSI drive unless you want to perform a minimal erase
> of everything on the drive. SCSI drives handle bad tracks internally. If
> a scsi drive is in a condition where you need to run badtrk, the drive should
> be thrown away.
And badtrk is a good way to find out NOW if that is the case.
Nachman Yaakov Ziskind wrote:
> (Apologize for the long post, but I think all the info is necessary.)
>
> ... so I wanted to add hd capacity to my Proliant 1600; since all the bays were
> full, I decided to pull one of the 9gig drives and pop in a 73 gig drive (see
> my prior post, "Adding a SCSI HD" from 10/28). Now, I first contemplated
> removing the old drive in software and then adding the new one via mkdev hd.
> Took a look in the TAs and found 105864; it looked really formidable and
> forbidding (why can't SCO, er, Caldera automate this stuff?) Mentioned my plan
> to one of the denizens of the NG, and he suggested skipping the sw incantations
> and simply swapping the two drives ("If the bus fits, you must send bits.").
> Sounded like a plan to me.
>
> So, I did that, and the problem was that the old drive seemed to remain in
> memory. By that I mean, there seemed no difference in how the system responded
> to hd commands targeted to the disk; mount, umount, df, ls, etc., all responded
> as if I'd done nothing. Divvy, invoked via mkdev hd, showed the old disk's
> filesystems; fdisk, the old partition table. Yes, I verified manually (by
> pulling other drives and watching the errors) that I was talking to the correct
> hd. (Parenthetically, these are SCA hot-swap drives, and therefore I had not
> yet rebooted). Presumably, a hd shipped from the factory does not contain any
> filesystems, let alone a HTFS filesystem with identical files as the old one!
> Therefore, I deduced that something is wrong. So, I did the Windows thing and
> rebooted. (On the SCSI bios screen, it correctly noted the vendor and model
> number).
Did you SHUTDOWN to replace this disk or do you have hot swap disks? If
the latter, I would expect this is what you WOULD see.
>
> And got a flurry of error messages from fsck (before going multiuser),
BEFORE going multiuser? That doesn't make sense - unless you mean
"after I hit ctrl-d but before I could log in"
> presumably invoked with the '-y' option: LOTS of things were wrong with my
> disk. Hmmmm. Lots of phantom files reconnected, lots of new lost+found entries,
> etc. And still the disk looked like the old one: showing, e.g. 9gig in capacity
> instead of 73gig. Divvy showed the old filesystem. What the hey?
Well, if this was done hotswap, parts of your old fs would have been
written to the new disk. Though honestly it's hard for me to imagine
that it would have cached enough to get anything that could be fsck'd at
all, but maybe.
My first thought (like others here) would have been a RAID array somehow
botched, but you say you don't have RAID.
>
> At this point, I decided I was getting myself in trouble. The remedy would be
> to go back to plan A and remove traces of the old disk from the OS.
> [Parenthetically, there was a phantom disk in software caused, I guess, by
> previously not following the removal instructions in the TA to the letter; it
> never bothered anyone, but now was as good a time as any to fix it.
Ahh- perhaps this was the source of your trouble. Though I don't see
how it got you into this mess.
> So,
> following the TA's instructions to the letter, I removed disks 4 and 3.5. :-)]
> And then rebooted. And then re-ran mkdev hd (this time, it acted as if I had
> run it for the first time, which seemed logical) and rebooted and re-ran it.
> This time, fdisk showed a 9gig partition table and divvy showed a filesystem of
> yet ANOTHER hd which I had removed from the system - months ago! Encouraging
> sign, though - badtrk (I think) in reporting the vendor ID, correctly noted
> that it was a Compaq. Not knowing what to do here, I took a deep breath and re-
> made the filesystems (taking steps to make sure it was talking to the right
> drive).
>
> And now, everything works (it fsck's and mounts beautifully and I can copy
> to/from it all day long), but it's still a 9gig drive, meaning I wasted $700
> and an hour and a half of my time.
Show us divvy on that drive.
[big snip of stuff we've all seen already. My problem boils down to this: after
replacing a small hd with a bigger one, the old partition tables and
filesystems were still visible, and the disk could even be written to/read
from].
I wrote:
| But what if fdisk shows the wrong number of tracks, and divvy the wrong
| number of blocks? What does one do then?
Tom Parsons wrote:
| If fdisk shows the wrong number of tracks (are you sure), then you should
| first determine if you are looking at the right hard disk. It is rather
| obvious that if fdisk doesn't display proper size, divvy isn't going to
| display the proper size.
|
| fdisk tells you what device node it is reading, does that correspond to
| where you have the drive configured? I guess it is possible the Compaq
| controller might have the size of the drive saved somewhere. I've seen
| dumber...but not much.
I absolutely was looking at the right hard disk. First of all, since I used
mkdev (for better or for worse) I only input the SCSI ID, which I could tell
(in this Compaq setup drive ID is determined by the slot the hd sits in) and
the mkdev script figured out the rest. Plus, I pulled the drive and watched the
errors pile up. So, I did have the right drive.
Anyway, for the record, the real reasons I run badtrk are 1) because it's
there, and 2) if there's an error, I want to know about it - another data point
for my investigation. Unix is sparse enough with its messages that I don't want
to throw any away!
Then Tony Lawrence wrote:
| Can we backup a second and go over what I think you had and what you
| should have done?
| I think you had a 9 gig drive mounted somewhere. If you did, for
| example, "df -v" it would have shown up as one of your mounted
| filesystems. The device node would have been listed there also.
Yup. Yup. Yup.
| Lets say it was /dev/u2
| Run "divvy /dev/u2" just to see if you left any space there or have any
| non-fs's you forgot about.
| Backup the directory it is mounted on.
Done all that previously - it was now empty of everything except lost+found.
| Then shutdown, replace the drive with the 72gb, and reboot single user
| (single user so /dev/u2 doesn't try to mount).
| From what I read, it sounds like you followed this procedure or
| something very close to it so far, am I right?
Except reboot into single user - I skipped this. I just umounted.
| Now run "divvy /dev/u2" again. I think this is where you used "mkdev
| hd", which is (as Tom said) perhaps unnecessary, but otoh it is also
| completely harmless (assuming you answer things to make it so) and does
| indeed lead you through the steps mentioned. And I said, running badtrk
| may raise some eyebrows, but I still think its worth the time.
| However, when you got to the divvy part, it should have been blank. If
| it was NOT blank (which seems to be the implication here) then you
| either were NOT looking at the proper disk or your system is very very
| confused.
THAT was my problem. Both fdisk and divvy were retaining the old tables - even
across a reboot.
| For a new drive, there shouldn't have been an fdisk partition either,
| unless it was something from another machine or had been previously
| used. Was there?
It showed precisely the same numbers as before the swap. Same tracks, same
filesystems, etc.
So, last night, while working from home and feeling dreary, I was pondering
Tom's admonition to make sure I was looking at the right drive (and darn it,
I WAS looking at the right drive), and then I saw it:
Current Hard Disk Drive: /dev/rdsk/3s0
+-------------+----------+-----------+---------+---------+---------+
| Partition | Status | Type | Start | End | Size |
+-------------+----------+-----------+---------+---------+---------+
| 1 | Active | UNIX | 1 | 281774 | 281774 |
+-------------+----------+-----------+---------+---------+---------+
Total disk size: 2258025 tracks (256 reserved for masterboot and diagnostics)
At which point, I noticed the "Total disk size:" DUH! Perhaps the 'tracks' it
is counting is not the same as the Start/End figures? I did some feverish
reading on the fdisk man page and decided that, no, tracks are tracks, and
fdisk is only looking at 12% of the drive. I selected "Use Entire
Disk for UNIX", and (after a scary warning) got:
| 1 | Active | UNIX | 1 | 2257769 | 2257769 |
Which looked MUCH better. So, letting the dice roll, I ran divvy, and got the
same division table as before, with one small difference:
71111722 1K blocks for divisions, 8001 1K blocks reserved for the system
Which I picked up on right away. :-) I thought I'd just expand the last block
of the first filesystem to 71111722, umount and mount, and away I'd go.
But no, the filesystem stayed at 9gig in df -vk. After more head scratching, I
decided to remove the first division entirely, and reinput the numbers.
Much happier! Divvy declared:
Making Filesystems
and mkfs sat there for a most satisfying five or ten minutes, and I was now up
to 73gig, Woo-hoo!
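
The numbers in the fdisk and divvy output above can be cross-checked with a little arithmetic. The Python sketch below uses only the figures quoted in this post. The variable names are illustrative, and the reading of the result as sectors per track assumes 512-byte sectors, which the output itself does not state.

```python
# Figures copied from the fdisk and divvy output quoted in this post.
total_tracks = 2258025        # "Total disk size" reported by fdisk
old_partition = 281774        # size of the stale UNIX partition, in tracks
new_partition = 2257769       # size after "Use Entire Disk for UNIX"
reserved_tracks = 256         # masterboot and diagnostics
divvy_blocks = 71111722       # 1K blocks available for divisions
divvy_reserved = 8001         # 1K blocks reserved for the system

# Share of the disk covered by the old partition table.
old_share = old_partition / total_tracks
print("old partition covers %.1f%% of the disk" % (old_share * 100))

# The new partition plus the reserved tracks should account for the whole disk.
print("tracks accounted for:", new_partition + reserved_tracks)

# 1K blocks per track implied by the divvy totals.
kb_per_track = (divvy_blocks + divvy_reserved) / new_partition
print("about %.1f KB per track" % kb_per_track)

# Capacity of the divisions in decimal gigabytes.
print("about %.1f GB for divisions" % (divvy_blocks * 1024 / 1e9))
```

The old partition comes out at roughly an eighth of the disk, in line with the 12% estimate, and the new partition plus the 256 reserved tracks adds up exactly to the reported total. The implied 31.5 KB per track would correspond to 63 sectors per track if the sectors are 512 bytes.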
So, where was OpenServer getting its copies of the tables from? I refuse to
believe the Compaq Proliant SCSI system is intelligent enough to store
stuff like that. Anyone?
NYZ
Well, I suspect that it has to be that, and my guess would be that the
controller has a big cache. I asked before if this was a hot swap- if
it was, then perhaps that, and the fact that you apparently never
powered off, caused the retention of this old data. It really could not
have been anywhere else.
After reading this over and over, I think this is what happened:
You unmounted your 9gig and, without shutting down, put in the 72gb drive.
The partition table and the divvy info were (quite naturally) in cache,
so when you ran "mkdev hd", the 9 GB data was picked up and then written
to the new disk at your request. You then rebooted, but of course that
was too late.
I suppose the controller cache could have added to this also- I don't
know if it is smart enough to flush when a drive is replaced.
Moral: hot swap should ONLY be used for RAID systems. For non-raid,
shut the darn thing off when replacing drives.
[mucho deletio]
>> So, where was OpenServer getting its copies of the tables
>> from? I refuse to believe the Compaq Proliant SCSI system
>> is intelligent enough to store stuff like that. Anyone?
>After reading this over and over, I think this is what happened:
>You unmounted your 9gig and, without shutting down, put in the
>72gb drive.
>The partition table and the divvy info were (quite naturally)
>in cache, so when you ran "mkdev hd", the 9 GB data was picked
>up and then written to the new disk at your request. You then
>rebooted, but of course that was too late.
>I suppose the controller cache could have added to this also- I don't
>know if it is smart enough to flush when a drive is replaced.
Sure sounds like that is the answer.
>Moral: hot swap should ONLY be used for RAID systems. For non-raid,
>shut the darn thing off when replacing drives.
I suspect that it would work on the Unixware and beyond systems
which do tend to notice changes in the SCSI bus, or on systems
where you can force a SCSI bus rescan. OSR5's roots are deep enough
that you do have to reboot anytime you change anything to make sure
it is all correct.
--
Bill Vermillion - bv @ wjv . com
> Nachman Yaakov Ziskind wrote:
Sounds about right...
I would blame the OpenServer buffer cache entirely, on the theory that
the cache attached to a hot-swap-capable controller should _know_ about
hot swapping. That is, if you pull one drive and stick in another, the
controller should already have invalidated every block it was holding in
cache which came from the old drive. But the kernel knows nothing about
this unless the host adapter driver tells it (and even if it tries to
tell the kernel, the kernel isn't very hip to hot-swap -- a host adapter
driver could probably get things mostly invalidated, but it would have
to try hard.)
From Nachman's description, it sounds like the disk parameter table,
fdisk table, divvy table, and at least a good part of the inode table
got flushed out of memory onto the new disk. Otherwise, after the
reboot, it wouldn't have been able to get as far as fsck being annoyed.
I'm surprised to hear that so many parts of the chain were written back
to disk. Some of those, especially the disk parameter, fdisk & divvy
tables, are quite static and wouldn't normally be written. Perhaps they
were written _because_ Nachman was fiddling around with `mkdev hd`.
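
The failure mode described here can be illustrated with a toy model. The Python sketch below is not the OpenServer buffer cache or any real driver interface. The class names, the slot number, and the block contents are all invented for illustration. It only shows how a write-back cache keyed by slot and block, with no notification of a swap, can serve stale tables and then flush them onto a new disk.

```python
class Disk:
    """A disk is modelled as a mapping of block name -> contents."""
    def __init__(self, label):
        self.label = label
        self.blocks = {}


class BufferCache:
    """Write-back cache keyed by (slot, block); it never hears about swaps."""
    def __init__(self, slots):
        self.slots = slots      # slot number -> Disk currently in that bay
        self.cache = {}         # (slot, block) -> cached contents
        self.dirty = set()      # cached blocks waiting to be written back

    def read(self, slot, block):
        key = (slot, block)
        if key not in self.cache:
            # Cache miss: fetch from whatever disk is in the bay right now.
            self.cache[key] = self.slots[slot].blocks.get(block)
        return self.cache[key]

    def write(self, slot, block, data):
        key = (slot, block)
        self.cache[key] = data
        self.dirty.add(key)

    def sync(self):
        # Flush dirty blocks to whichever disk currently occupies the slot.
        for slot, block in sorted(self.dirty):
            self.slots[slot].blocks[block] = self.cache[(slot, block)]
        self.dirty.clear()


old = Disk("9gig")
old.blocks["fdisk"] = "partition 1: 281774 tracks"
new = Disk("73gig")                      # factory-fresh, no tables at all

bays = {3: old}
kernel = BufferCache(bays)
seen_before = kernel.read(3, "fdisk")    # old table is now cached

bays[3] = new                            # hot swap; nobody tells the cache
seen_after = kernel.read(3, "fdisk")     # served from cache, not the disk

# Re-saving the table (as a setup utility might) marks the block dirty...
kernel.write(3, "fdisk", seen_after)
kernel.sync()                            # ...and it lands on the new disk
print(new.blocks["fdisk"])
```

In this model the new disk ends up carrying the old disk's partition entry, and a reboot afterwards would read it back from the disk itself. A cache that was told about the swap could drop every entry for that slot before the next read.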
Bottom line: do not expect to be able to hot-swap drives in OpenServer
unless you are using a host adapter driver which is documented as
positively supporting this operation under OpenServer. If it talks
about hot-swap under various OSes and doesn't specifically discuss OSR5,
be suspicious. In general, I would expect hardware RAID implementations
to fully support hot-swap; non-RAID setups to not support it unless the
doc really clearly says so.
>Bela<
> Brian K. White enscribed:
> | boot up, and at the boot: prompt enter "DEFBOOTSTR biosgeom"
> | enter the root passwd to go into single-user mode instead of booting
> | all the way up.
>
> And at this point, your method becomes superfluous.
>
> He has SCSI, SCSI disks. The bios doesn't care or know about the
> geometry of scsi disks.
You're both wrong... "biosgeom" is a dangerous little piece of fire
that OpenServer offers the unwary admin. It unfortunately overrides the
geometry of _EVERY_ hard disk in the system, even ones that were long
since successfully configured. Every hard disk refers back to some set
of BIOS parameters that the kernel digs up from somewhere that I
probably couldn't explain even if I spent the next 8 hours studying
kernel source.
This may be OK if you said "biosgeom" every time you added a disk, over
the system's history (Brian's statement implies that he does this). On
a system which has not been grown in this manner, typing "biosgeom" when
trying to add a new disk is likely to cause all the old disks to become
inaccessible (until rebooted without "biosgeom" -- which should be
seconds later, since it probably won't boot successfully).
What's wanted is a way to say "refer to BIOS for geometry of specific
disk #N", but it does not exist. And, I'll repeat, exactly what
"biosgeom" means for any particular drive is almost impossible to
understand, much less explain.
>Bela<
> So, last night, while working from home, feeling dreary, pondering Tom's
> admonition to make sure I was looking at the right drive (and darn it, I WAS
> looking at the right drive), and then I saw it:
>
> Current Hard Disk Drive: /dev/rdsk/3s0
>
> +-------------+----------+-----------+---------+---------+---------+
> | Partition | Status | Type | Start | End | Size |
> +-------------+----------+-----------+---------+---------+---------+
> | 1 | Active | UNIX | 1 | 281774 | 281774 |
> +-------------+----------+-----------+---------+---------+---------+
>
> Total disk size: 2258025 tracks (256 reserved for masterboot and diagnostics)
>
> At which point, I noticed the "Total disk size:" DUH! Perhaps the 'tracks' it
> is counting is not the same as the Start/End figures? I did some feverish
> reading on the fdisk man page and decided that, no, tracks are tracks, and
> fdisk is only looking at 12% of the drive. I selected "Use Entire
> Disk for UNIX", and (after a scary warning) got:
>
> | 1 | Active | UNIX | 1 | 2257769 | 2257769 |
During the hot-swap episode, you caused a copy of the old disk's fdisk
table to be written to the new disk. You did _not_ do anything to
overwrite the new disk's proper geometry. fdisk tables are expressed in
linear block numbers; in effect you laid down a table that gave
partition #1 exactly as much space as it had on the old disk (and thus
only a fraction of the new disk).
(Actually, for SCSI disks, OpenServer only cares about the
"sectors/track" and "heads" geometry parameters; it determines the
"tracks" parameter by dividing total disk size by those two. So you
could not have "damaged" the total disk size by writing a wrong
parameter table.)
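The figures in the quoted fdisk output can be checked with a little arithmetic. The sketch below uses only numbers from that output, plus one assumption that is not stated anywhere in the thread: 63 sectors per track of 512 bytes each. That value is inferred because it makes the track counts line up with the "9gig" and "73 gig" drive sizes.

```python
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 63        # assumption, inferred from the sizes; not stated in the thread

TOTAL_TRACKS = 2258025        # "Total disk size" from fdisk
RESERVED_TRACKS = 256         # reserved for masterboot and diagnostics
OLD_PARTITION_TRACKS = 281774 # partition 1 as copied from the old disk


def track_bytes(tracks, spt=SECTORS_PER_TRACK):
    """Size in bytes of a run of tracks, under the assumed geometry."""
    return tracks * spt * SECTOR_BYTES


usable_tracks = TOTAL_TRACKS - RESERVED_TRACKS
fraction = OLD_PARTITION_TRACKS / TOTAL_TRACKS

if __name__ == "__main__":
    print("usable tracks:", usable_tracks)
    print("old partition covers %.1f%% of the disk" % (fraction * 100))
    print("old partition: %.2f GB" % (track_bytes(OLD_PARTITION_TRACKS) / 1e9))
    print("whole disk:    %.2f GB" % (track_bytes(usable_tracks) / 1e9))
```

The usable track count comes out to 2257769, which is exactly the End value fdisk showed after "Use Entire Disk for UNIX", and the old partition covers roughly an eighth of the drive, in line with the "12%" estimate.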
> Which looked MUCH better. So, letting the dice roll, I ran divvy, and got the
> same division table as before, with one small difference:
>
> 71111722 1K blocks for divisions, 8001 1K blocks reserved for the system
>
> Which I picked up on right away. :-) I thought I'd just expand the last block
> of the first filesystem to 71111722, umount and mount, and away I'd go.
>
> But no, the filesystem stayed at 9gig in df -vk.
Changing the boundaries of an HTFS filesystem doesn't have any effect on
its internal structures, which tell it that it ends at the old boundary.
If you have a house and you buy the vacant lot next door, it doesn't
make your house any bigger -- you would have to build an addition or rip
it down and build a bigger one. John DuBois has a utility for building
that addition, at ftp://ftp.armory.com/pub/admin/chfssize (danger: this
is a utility for People Who Know What They Are Doing And Have Made Good
Backups). Otherwise you tear down and rebuild, as you did.
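The divvy figure quoted above also appears to follow from the enlarged partition. As a rough check, and again assuming 63 sectors of 512 bytes per track (an inference, not something stated in the thread), the sketch below converts the partition's track count to 1K blocks and subtracts the 8001 blocks divvy reports as reserved. The function name `divvy_blocks` is made up for this example.

```python
def divvy_blocks(partition_tracks, sectors_per_track=63, reserved_kb=8001):
    """1K blocks left for divisions, under the assumed 63 x 512-byte geometry.

    reserved_kb is the "reserved for the system" figure from the divvy
    output quoted above.
    """
    total_kb = partition_tracks * sectors_per_track * 512 // 1024
    return total_kb - reserved_kb


if __name__ == "__main__":
    print(divvy_blocks(2257769))
```

With the new partition's 2257769 tracks this gives 71111722, the same number divvy displayed. That is the size of the enlarged division, the "lot" in the analogy above; the filesystem inside it keeps its old size until it is rebuilt or resized.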
> After more head scratching, I
> decided to remove the first division entirely, and reinput the numbers.
What you wanted was divvy's "c[reate]     Create a new file system on
this division." choice. You probably ended up using it anyway, since
changing a division's boundaries does not by itself prompt divvy to
create a filesystem. It didn't occur to you to just use "c[reate]" on
the existing division, but once you had removed and re-entered the
boundaries it was obvious to you that you needed it.
> Much happier! Divvy declared:
>
> Making Filesystems
>
> and mkfs sat there for a most satisfying five or ten minutes, and I was now up
> to 73gig, Woo-hoo!
Yay!
>Bela<
By the by, does it matter that the Compaq driver has *some* intelligence? To
be specific, when a drive is pulled/inserted, I get messages:
NOTICE: cled: A ProLiant drive has been installed ht=cha ha=0 id=5 (cledN05)
NOTICE: cled: A ProLiant drive has been removed ht=cha ha=0 id=5 (cledN06)
This is what gave me the confidence to try and hot-swap in the first place.
Ok, a driver message does not mean that the kernel is cognizant of what's
going on, but why would they (Compaq) program such a message if hot-swapping
were dangerous? Plus, Compaq is rumored to have a special (technology)
relationship with SCO - hence the SmartStart stuff, etc.
NYZ
Because it ISN'T and wasn't dangerous. Nothing was damaged, your
machine didn't catch fire, crash or even hiccup. The hot swap worked-
but it's the WRONG thing to do when it isn't RAID.
> Nachman Yaakov Ziskind wrote:
> > By the by, does it matter that the Compaq driver has *some* intelligence? To
> > be specific, when a drive is pulled/inserted, I get messages:
> >
> > NOTICE: cled: A ProLiant drive has been installed ht=cha ha=0 id=5 (cledN05)
> > NOTICE: cled: A ProLiant drive has been removed ht=cha ha=0 id=5 (cledN06)
> >
> > This is what gave me the confidence to try and hot-swap in the first place.
> > Ok, a driver message does not mean that the kernel is cognizant of what's
> > going on, but why would they (Compaq) program such a message if hot-swapping
> > were dangerous?
>
> Because it ISN'T and wasn't dangerous. Nothing was damaged, your
> machine didn't catch fire, crash or even hiccup. The hot swap worked-
> but it's the WRONG thing to do when it isn't RAID.
Depends on the admin's assumptions. He could have inserted, not a new
blank drive, but a drive that was already full of precious data,
expecting to be able to mount and use its filesystems. Apparently the
driver knows about hot-inserts but still does not cause dirty buffers to
be invalidated -- pretty scary.
There is probably documentation for the controller that says what you're
supposed to do here -- things like "if you pull one drive and insert
another, all data on the new drive should be considered destroyed, and
you must run the operating system's drive partitioning software as if
starting from a new blank drive"...
>Bela<