I have a client with a SCO 5.0.5 box that is getting the error in
the subject of this post.
The drive having the problem is connected to a QLogic PCI SCSI host
bus adapter. I have installed an IDE drive with SCO and it boots up just
fine; my plan was to run fsck or whatever it takes to get the troubled
drive going again.
I've added the QLogic HBA BTLD to the kernel running on the IDE drive,
and added the messed-up drive with mkdev hd. I named the divisions with
divvy (I didn't change the partitions, filesystems, etc., so as not to
disturb the existing data) ... I couldn't tell what the original names
were, as nothing showed up in most of the slots but two or three, and I
didn't recognize what any of them meant (nothing like boot, u, root,
etc.), so I just named them part0 thru part7 in hopes that I could
mount each one individually and figure out what they were by looking
at them.
It kind of sounds like a filesystem is out of space, but now I'm having
trouble mounting the divisions to look at them.
I'm getting pretty rusty on SCO as I haven't used it since 2001.
When I did the mkdev hd portion, it did a non-destructive block scan
and found 3 bad blocks, but that didn't fix the problem (I think a
filesystem is full).
The scsi drive is hd1a. A few of the partitions divvy showed had HTFS
filesystems. What device name can I use to try to mount these to /mnt
so I can look at them and try to find root and free some space? What
are the dump files called so I can smoke a few of them to free up
space so I can hopefully find the reason they are being generated in
the first place?
At the Boot : prompt, can I specify dump to be a bigger size or not to
do dumps?
I'm running out of ideas.
Thanks!
Brian
> nothing like boot, u, root etc.
The defaults are d1050,51,52 and the last is d1057all when you add
drives. On an initial install they are boot, swap, root, and recover.
You won't be able to rename them directly since the names are already
in use with the new drive.
> It kind of sounds like a filesystem is out of space.
> I think a filesystem is full.
Yep.
> The scsi drive is hd1a. A few of the partitions divvy showed had HTFS
> filesystems.
Once you know what the names are on the hard drive (per the defaults
above) and have verified them by looking for the same names in /dev,
you can mount via the standard mount command, "mount /dev/name
/mountpoint". Or, if they won't mount, you should be able to run fsck.
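For example, a minimal sketch using the default added-drive names from
above (d1052 is an assumption here; substitute whatever names actually
show up in your /dev):
ls -l /dev/d105*
mount -r /dev/d1052 /mnt
umount /mnt
The -r mounts it read-only while you look around.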
I wish I'd known about this, or understood it better, when I first
encountered SCO OpenServer. In fact, I'd encourage OpenServer users to
name their drives by some consistent scheme when they first encounter
SCO. Couple that with the tendency of SCO packages, and of SCO itself,
to semi-randomly install software as symlinks to packages in /opt, and
you have a very confusing layout.
It really makes me miss the wonderful and helpful 'fdisk -l' command of the
Linux and other UNIX worlds for finding all your hard drives.
Here is what I would suggest: connect everything back the way it
was, and do this.
Turn it on and press CTRL-D to get into single-user mode. I have done
this before, so I know it should work. During this time you will get
"out of disk space" messages right on top of what you are typing; just
ignore them and keep going. Once you get a root (#) prompt, type
"cd /tmp" and press ENTER. Once you are in /tmp, double and triple
check by typing "pwd", then type "rm *" and press ENTER. Depending on
what is in your tmp directory it might take a while, but eventually you
will get the root prompt (#) back, and at that point you will notice
that the out-of-disk-space message no longer appears. Now look around
and remove unwanted files/directories. You may not be able to find
anything you can safely delete; if so, that means the disk is really
full and you need to add or upgrade your hard disk. If that didn't
work, email me at (ak...@att.net) and we will explore other options.
Abid
This will list your configured drives, with partitions, etc.:
ftp://ftp.armory.com/pub/admin/divisions
help page:
ftp://ftp.armory.com/pub/admin/help_pages/divisions
John
--
John DuBois spc...@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
First thing: what SCSI controller was originally used on the 5.0.5
system when the SCSI drive was the root drive? If it was the QLogic,
you should be okay. If the QLogic is a new SCSI controller and the
drive was on some other SCSI controller, be warned that controller
manufacturers have different ideas about managing SCSI drives' spare
sectors. And that idea varies from model to model for the same
manufacturer.
Case in point: I moved a drive from an old 3.2v4.2 UNIX system with an
Adaptec 1542CF controller to a new system with a 2940U2 controller (a
running system, so the moved drive was hd10, just as you are doing),
and some files copied from the drive to the new system's root disk were
corrupted. Sum -r results on the 3.2v4.2 disk running in the old system
on the 1542 were different for SOME files when the disk was connected
to the 2940 as hd10. The solution was to move the 1542CF with the disk
to the new system temporarily and then transfer the files to the system
disk on the 2940.
>
> I've added the QLogic HBA BTLD to the kernel running on the IDE drive,
> and added the messed-up drive with mkdev hd. I named the divisions with
> divvy (I didn't change the partitions, filesystems, etc., so as not to
> disturb the existing data) ... I couldn't tell what the original names
> were, as nothing showed up in most of the slots but two or three, and I
> didn't recognize what any of them meant (nothing like boot, u, root,
> etc.), so I just named them part0 thru part7 in hopes that I could
> mount each one individually and figure out what they were by looking
> at them.
Sounds okay so far. If the SCSI disk is from a 5.0.5 system, the active
fdisk partition will be the boot partition with the root file system.
Division 0 is boot, division 1 is swap, and division 2 is root. Past 2,
anything goes (name-wise). So you should be able to mount /dev/part0
on /mnt with the command "mount /dev/part0 /mnt". I suggest that you
add "-r" for read-only on the mount command until you know what you are
looking at (file-system-wise).
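A sketch for probing them one at a time, read-only (skipping part1,
which should be swap, and part7, the whole disk -- the list here is an
assumption based on your part0-part7 naming):
for d in /dev/part0 /dev/part2 /dev/part3 /dev/part4 /dev/part5 /dev/part6
do
    echo "=== $d ==="
    mount -r $d /mnt && ls /mnt && umount /mnt
done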
>
> It kind of sounds like a filesystem is out of space, but now I'm having
> trouble mounting the divisions to look at them.
>
> I'm getting pretty rusty on SCO as I haven't used it since 2001.
>
> When I did the mkdev hd portion, it did a non-destructive block scan
> and found 3 bad blocks, but that didn't fix the problem (I think a
> filesystem is full).
That would not be my first choice given the limited information you
presented above (original SCSI controller? etc.). But what's done is
done, so now we have to get beyond the problem.
The "cannot dump to dumpdev hd(1/41)..." is not the problem but a
symptom. What your subject line implies is that the system is
panicking, and when it tries to dump memory, it can't.
First: if you have bad tracks in the swap space hd(1/41), then the
first time the kernel gets an "unrecoverable SCSI read error" reading a
block there, the system will panic. This is a safety feature: when swap
is corrupted, the kernel can't guarantee that data swapped in or out is
uncorrupted, so it panics the system to shut it down and avoid
potentially writing corrupted data to file.
Unlike unreadable tracks in the root or other file systems, unreadable
tracks in the swap will not be logged to /usr/adm/syslog and you won't
be warned about them.
The only fix that works for unreadable tracks in the swap is to run the
SCSI controller's "verify" function to spare out bad tracks at the
controller level, before the OS sees them. All Adaptec controllers have
a verify function. Look for it on your QLogic (I don't use them, so I
can't tell you if it exists on QLogic).
Second: the error "cannot dump to dumpdev..." can be eliminated by
telling the kernel not to dump memory on a kernel panic. This is done
by adding "dumpdev=none" to the defbootstr, either by typing
"defbootstr dumpdev=none" at the Boot: prompt or by editing the boot
partition's /etc/default/boot file and inserting it in the DEFBOOTSTR
line:
vi /stand/etc/default/boot  <- example only! It gets overwritten from
/etc/default/boot on /etc/shutdown.
#ScoAdminInit BOOTMNT {RO RW NO} RO
#
DEFBOOTSTR=hd(40)unix swap=hd(41) dumpdev=none root=hd(42) hd=Sdsk
AUTOBOOT=YES
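For a one-time test you can type the override at the Boot: prompt
before committing it to the file; roughly (a sketch -- "defbootstr"
here means "use the default boot string plus what follows"):
Boot
: defbootstr dumpdev=none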
Back in the "old days," when swap was 1.5 * system RAM, it was okay to
dump to hd(41). But since migrating from 3.2v4.2 with 128-256M of
system RAM to SCO 5.0.5 with 1-2G of system RAM, where the goal is to
prevent any use of swap for moving parts of running jobs out of memory,
the swap partition is commonly set to some fraction of system memory
just for grins, and the DEFBOOTSTR dump=hd(41) is changed to
dumpdev=none.
>
> The scsi drive is hd1a. A few of the partitions divvy showed had HTFS
> filesystems. What device name can I use to try to mount these to /mnt
> so I can look at them and try to find root and free some space?
As you indicated that you named them part0, part1, part2, etc...:
mount /dev/part0 /mnt
mount /dev/part1 /mnt  <- Note that this is the swap division, and you
can't mount it or fsck it.
> What
> are the dump files called so I can smoke a few of them to free up
> space so I can hopefully find the reason they are being generated in
> the first place?
In answer to the above question: "dump files" don't exist in the
context of your question, as the memory dump is written to the swap
space on the active UNIX partition.
>
> At the Boot : prompt, can I specify dump to be a bigger size or not to
> do dumps?
No to a bigger size. And even if that were possible, it in itself would
not stop the system from trying to write the memory dump on a kernel
panic.
See also man bootstring:
> For example, to dump to a SCSI tape drive, you might use
> dump=Stp(0). To dump to the same drive without prompting,
> you would use dump=Stp(0,1). Such a drive should be
> considered a dedicated dump device, since a data tape in
> the drive would be overwritten in the event of a panic.
Maybe to "not do dumps": defbootstr dumpdev=none However, I don't know if
this will override the dump=hd(41) in the boot file system /etc/default/boot
file.
>
> I'm running out of ideas.
Why don't you post the divvy table you see when you run "divvy /dev/hd10"
so that we can see what you see.
Note that if you have more than one UNIX partition on the disk, then
you need to run divvy for each partition:
fdisk -f /dev/rhd00 <- Change this to fdisk -f /dev/rhd10 for your disk
Current Hard Disk Drive: /dev/rhd00
+-------------+----------+-----------+---------+---------+---------+
| Partition | Status | Type | Start | End | Size |
+-------------+----------+-----------+---------+---------+---------+
| 1 | Active | UNIX | 1020 | 285471 | 284452 |
| 2 | Inactive | UNIX | 288472 | 857376 | 568905 |
| 3 | Inactive | UNIX | 857377 | 1137553 | 280177 |
| 4 | Inactive | DOS (16) | 1 | 1019 | 1019 |
+-------------+----------+-----------+---------+---------+---------+
divvy /dev/hd01  <- partition 1 on disk 0 (/dev/hd11 for partition 1 on disk 1)
divvy /dev/hd02  <- partition 2 on disk 0 (/dev/hd12 for partition 2 on disk 1)
etc...
John DuBois wrote:
> This will list your configured drives, with partitions, etc.:
> ftp://ftp.armory.com/pub/admin/divisions
Thanks John! I tried it and it sure works slick!!
--
Steve Fabac
S.M. Fabac & Associates
816/765-1670
Since the swap space isn't a file system, resizing it wouldn't force
any other changes.
Just curious if anyone would consider it an option. The problem may
already be fixed so this is more an exercise in expanding
troubleshooting procedures.
I don't know that that is a valid assessment. How does the kernel decide
how much space is available and how many pages that space will hold?
What caused the kernel to compute "space for only 0 pages"?
> would be curious whether the system would come back up with the
> original names reinstalled and the swap space starting point moved up
> from where it was when it failed.
The original names need not be reinstalled, as they were never changed.
The OP gave names to the file systems on the hd10 disk in the /dev
directory of the hd00 (boot) disk. Once the hd10 disk's SCSI controller
was set as the boot device, the unchanged names in the /dev directory
on that disk would appear in divvy once again.
>
> Since the swap space isn't a file system, resizing it wouldn't force
> any other changes.
>
> Just curious if anyone would consider it an option. The problem may
> already be fixed so this is more an exercise in expanding
> troubleshooting procedures.
>
>
It's all academic, and hard to speculate on this particular case
without the OP posting the divvy table.
However, one big stumbling block is the OP's inability to accurately
identify the swap division. Changing the starting block of a valid file
system by mistake would likely be disastrous. Fsck -n /dev/part0,
...part1, ...part2, etc., should identify valid file systems. When he
hits the swap, it should be similar to running fsck -n /dev/swap:
# fsck /dev/swap
fsck: cannot determine filesystem type of /dev/swap.
If there is an unreadable track in the swap space, it should become
apparent just by using "dd if=/dev/part1 of=/dev/null bs=63b" from the
system booted on the IDE disk. Dd does not care what is in the file
system, and if a bad block is encountered, it will be displayed on the
system console and logged to /usr/adm/syslog. The 63b is taken from
hwconfig -h for the hard disk showing 255 hds, 63 sectors.
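To scan every division the same way (a sketch; the part0-part7 names
are the ones you assigned, and dd doesn't care which one is the swap):
for d in /dev/part0 /dev/part1 /dev/part2 /dev/part3 /dev/part4 /dev/part5 /dev/part6
do
    echo "scanning $d"
    dd if=$d of=/dev/null bs=63b
done
tail -20 /usr/adm/syslog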
Assuming that your opening statement is valid, and
just picking numbers at random: if the swap was
| swap | NON FS | no | 1 | 30722| 816753|
I'd change it to:
| swap | NON FS | no | 1 | 430722| 816753|
dropping approximately 50% of the swap space from the beginning of
its original range, then try to boot the disk.
We are not trying to get the system to run with only 50% of its
original swap space, just trying to prevent a kernel panic if there is
an unreadable track at or near the beginning of the swap space.
If that gets the disk to boot, I'd then back it up and use the backup
to move the system to a new hard disk. A low-level format of the
original disk by the SCSI controller, followed by restoring the backup,
might resolve the problem, but a new disk would be the best way to go.
> If there is an unreadable track in the swap space, it should become
> apparent just by using "dd if=/dev/part1 of=/dev/null bs=63b" from the
> system booted on the IDE disk. Dd does not care what is in the file
> system, and if a bad block is encountered, it will be displayed on the
> system console and logged to /usr/adm/syslog. The 63b is taken from
> hwconfig -h for the hard disk showing 255 hds, 63 sectors.
>
> Assuming that your opening statement is valid, and
> just picking numbers at random: if the swap was
>
> | swap | NON FS | no | 1 | 30722| 816753|
>
> I'd change it to:
>
> | swap | NON FS | no | 1 | 430722| 816753|
>
> dropping approximately 50% of the swap space from the beginning of
> its original range, then try to boot the disk.
>
> We are not trying to get the system to run with only 50% of its
> original swap space, just trying to prevent a kernel panic if there is
> an unreadable track at or near the beginning of the swap space.
I'm pretty sure that OSR5 neither writes nor reads a single sector
to/from the swap device until (a) it starts swapping or (b) it tries to
write a panic dump. If a panic reports 0 blocks of dump space, this
means that dumpdev was set to a nonsensical value (a major/minor number
that isn't even a disk); or, most likely, points to a partition or
division slot number which hasn't been created. That is, if the disk
has two partitions and dumpdev is pointing to the third (nonexistent)
partition.
The default value of swapdev & dumpdev is 1,41, i.e. "first drive unit
of 'hd' driver, active partition, division #1" (and 'hd' driver is
hooked to "wd" == IDE if there are any IDE hard disks in the system,
else SCSI. Or IDA or PS/2 ESDI or PS/2 ST506 or a few other even more
obscure drivers, none of which are in play...) Even if an active
partition exists on drive 0, if its division #1 slot is empty, you'll
get that message.
It's impossible for the dump device to be "full" of old dumps. Each
kernel dump starts at the beginning of the dump device and overwrites
what's already there.
If dumpdev is smaller than needed by the dump, the dump aborts before
writing anything (even if it's only 1K short). The message in that case
would be a pretty obvious "Cannot dump 65536 pages to dumpdev hd (1/41):
Space for only 65535 pages".
> However, one big stumbling block is the OP's inability to accurately
> identify the swap division. Changing the starting block of a valid file
> system by mistake would likely be disastrous. Fsck -n /dev/part0,
> ...part1, ...part2, etc., should identify valid file systems. When he
> hits the swap, it should be similar to running fsck -n /dev/swap:
>
> # fsck /dev/swap
> fsck: cannot determine filesystem type of /dev/swap.
Don't do that. Your text says "fsck -n" but your example shows just
"fsck". If it does detect /dev/swap as some sort of filesystem, who
knows what it's going to do.
But most of all, don't do that because:
# dtype /dev/swap
# fstyp /dev/swap
are better and safer ways of doing it.
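For example, to check all the OP's divisions in one shot (a sketch,
using the part0-part7 names he assigned):
for d in /dev/part0 /dev/part1 /dev/part2 /dev/part3 /dev/part4 /dev/part5 /dev/part6 /dev/part7
do
    echo "=== $d ==="
    dtype $d
done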
===
Nobody has addressed the subtext of the original poster's issue. If
it's saying "Cannot dump to dumpdev ...", it's doing that because the
system's panicking. Whether it can write a dump is an irrelevant
distraction. The important thing is, why is it panicking?
Before the "Cannot dump" message there should be some panic messages.
Post those. Try to duplicate every character exactly, precision is
important. Also describe how it gets there: does it panic every time at
bootup, randomly while the system's been up for a while, only when it's
under heavy load, only when printing to a printer, or what?
Bcc'd original poster in case he's not reading the news. Post response
to comp.unix.sco.misc.
>Bela<
What are these files? Shell scripts or what?
> Sounds okay so far. If the SCSI disk is from a 5.0.5 system, the active
> fdisk partition will be the boot partition with the root file system.
> Division 0 is boot, division 1 is swap, and division 2 is root. Past 2,
> anything goes (name-wise). So you should be able to mount /dev/part0
> on /mnt with the command "mount /dev/part0 /mnt". I suggest that you
> add "-r" for read-only on the mount command until you know what you are
> looking at (file-system-wise).
I was able to mount /dev/part0 to /mnt, and I did do it read-only.
It looks to me like my /dev/part0 division is the boot
partition/division.
The two HTFS divisions have messed-up filesystems ... these are
/dev/part4 & /dev/part5.
When I do a fsck -ofull on these, they both say something like
"WARNING: the filesystem is larger than its container," or words to
that effect (I'm at work and not at the machine in question). Both
filesystems give this warning, and the reported size of the filesystem
is about twice what it says the division is.
When asked to continue I said no.
> The "cannot dump to dumpdev hd(1/41)..." is not the problem but a symptom.
> What your subject line implies is that the system is panicking, and
> when it tries to dump memory, it can't.
Not sure what is causing it to panic. I think the hard disk may have
problems, as when I did the mkdev hd it complained about bad blocks
found, etc.
>
> First: if you have bad tracks in the swap space hd(1/41), then the
> first time the kernel gets an "unrecoverable SCSI read error" reading a
> block there, the system will panic. This is a safety feature: when swap
> is corrupted, the kernel can't guarantee that data swapped in or out is
> uncorrupted, so it panics the system to shut it down and avoid
> potentially writing corrupted data to file.
>
> Unlike unreadable tracks in the root or other file systems, unreadable
> tracks in the swap will not be logged to /usr/adm/syslog and you won't
> be warned about them.
>
> The only fix that works for unreadable tracks in the swap is to run the
> SCSI controller's "verify" function to spare out bad tracks at the
> controller level, before the OS sees them. All Adaptec controllers have
> a verify function. Look for it on your QLogic (I don't use them, so I
> can't tell you if it exists on QLogic).
There is an option to go into some BIOS type SCSI menu before the OS
boots. Is this what you are talking about?
>
> Second: the error "cannot dump to dumpdev..." can be eliminated by
> telling the kernel not to dump memory on a kernel panic. This is done
> by adding "dumpdev=none" to the defbootstr, either by typing
> "defbootstr dumpdev=none" at the Boot: prompt or by editing the boot
> partition's /etc/default/boot file and inserting it in the DEFBOOTSTR
> line:
>
> vi /stand/etc/default/boot  <- example only! It gets overwritten from
> /etc/default/boot on /etc/shutdown.
OK, that is what I wanted to try, but I didn't know the syntax or how
to do it from the Boot: prompt, since I don't get far enough to log in
before the panic.
>
> #ScoAdminInit BOOTMNT {RO RW NO} RO
> #
> DEFBOOTSTR=hd(40)unix swap=hd(41) dumpdev=none root=hd(42) hd=Sdsk
> AUTOBOOT=YES
>
> Back in the "old days" when swap was 1.5 * system RAM, it was okay to
> dump to hd(41). But since migrating from 3.2v4.2 with 128-256M
> system RAM to SCO 5.0.5 with 1-2G system RAM where the goal is to prevent
> any use of swap for moving parts of running jobs from memory to swap, the
> swap partition is commonly set to some fraction of system memory just for
> grins and the DEFBOOTSTR dump=hd(41) is changed to dumpdev=none.
>
>
>
> > The scsi drive is hd1a. A few of the partitions divvy showed had HTFS
> > filesystems. What device name can I use to try to mount these to /mnt
> > so I can look at them and try to find root and free some space?
>
> As you indicated that you named them part0, part1, part2, etc...:
I'm able to do that now. What was freaking me out is that I'd like to
find an inittab or something that shows what the layout of the
divisions/partitions originally was.
Shouldn't I be able to grep /dev/part4 or /dev/part5 for the existence
of this file, and maybe cat it, so I know what to properly name things
with divvy?
>
> Why don't you post the divvy table you see when you run "divvy /dev/hd10"
> so that we can see what you see.
>
> Note that if you have more than one UNIX partition on the disk, then
> you need to run divvy for each partition:
No, there is only one partition. I'll try to post the divvy output
sometime.
If I recall, I only see partition 4 which has all the things I named
part0-part7 mentioned earlier.
The guy that worked on it before me mirrored the drive to a new SCSI.
I suspect that it is going into panic due to the drive or filesystem
being corrupt.
Should I run fsck -ofull on the filesystems using the new SCSI drive
that contains the image of the bad one and see what happens? I figure
I'll get the same warning telling me the filesystem is larger than the
container etc.
I have some tapes that I have just started investigating. My problem
is trying to figure out what the tapes are, how they were made (dd,
cpio, tar, etc.) and what to do with them. This is where my thinking
gets messed up, because right now I have part0-part7 divvy names, and
I'm wondering, if I do a restore from tape, which fs do I tell it to
write the tape to?
> >Bela<
Thanks for all the responses! Sorry it took me a while to respond ...
been traveling.
Regards,
Brian
'divisions' is an awk program (which in turn invokes a few bits of shell code).
I figured out how the tapes I have were made by using dtype /dev/rStp0.
One was made with tar (compressed tar archives) and the other with cpio
(using the scoadmin util, I'm told).
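To list without extracting, something like this should work (a sketch,
assuming the drive is Stp0 as above):
tar tvf /dev/rStp0        <- for the tar tape
cpio -itv < /dev/rStp0    <- for the cpio tape
For the compressed tar archives, I may first need to read them off with
dd and uncompress them before tar can list them.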
The actual message displayed when I try to boot off the SCSI disk is:
PANIC: srmountfn Error 22 mounting rootdev hd (1/42)
cannot dump 32671 pages to dumpdev hd (1/41): space for only 0 pages
Dump not completed
This happens every time, and the information displayed just before it
is:
H iinit
... then it gives the "Safe to Power Off or Reboot" prompt.
Another question that I have is: what is in the recover division? Is
that something I can use to try to recover from?
Hope maybe that sheds more light on the subject.
Thanks for the suggestions! I appreciate it!
Regards,
Brian
If you have to ask that, then they aren't intended for you, so don't worry about it.
May sound rude, but it's about like a bus driver asking what those funny lever things are on the floor in front of his chair. The only possible answer is "get out of that chair".
--
Brian K. White br...@aljex.com http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
Tell us more about this "mirrored the drive to a new SCSI"
Is the old drive available? Did the old drive panic? How was the
mirror accomplished?
We can return to your other questions once you provide answers to the
above questions.
>
> Should I run fsck -ofull on the filesystems using the new SCSI drive
> that contains the image of the bad one and see what happens? I figure
> I'll get the same warning telling me the filesystem is larger than the
> container etc.
>
> I have some tapes that I have just started investigating. My problem
> is trying to figure out what the tapes are, how they were made (dd,
> cpio, tar, etc.) and what to do with them. This is where my thinking
> gets messed up, because right now I have part0-part7 divvy names, and
> I'm wondering, if I do a restore from tape, which fs do I tell it to
> write the tape to?
Stop. DO NOT ATTEMPT TO RESTORE FROM TAPE TO ANY FILE SYSTEM ON THE
SCSI DISK. If you feel you must restore from tape, get a new SCSI disk,
run mkdev hd and create file systems on the new disk then restore to those.
If you get lucky and one of the tapes has a full system backup of the
root file system from the original SCSI disk, restore it to the new
SCSI disk and boot that disk. Then investigate the remaining tapes.
The only reason to fool with a disk is if there is important data on the
disk that is not backed up to tape or other storage and has to be recovered.
In another post, you indicated that the only divisions fsck complains
about are part4 and part5. The root division is part2, and if it passes
fsck okay, then you can mount it and explore its contents.
Likely any SCSI disk you buy today will be two to four times larger
than the original SCSI disk. Set up, at minimum, boot, swap, and root
divisions on the new disk, with sizes estimated at twice the size of
the old disk's divisions. Mount both the disk you now have and the new
SCSI disk on suitable mount points:
mount /dev/part0 /mnt
mkdir /mnt1
mount /dev/(new disk part0) /mnt1
Then cd /mnt and execute "find . -depth -print | cpio -pmvd /mnt1" to
copy part0 on the old SCSI disk to part0 on the new disk. Do the same
for part2. At that point you will have boot and root file systems on
the new disk. You must make the disk bootable and write a suitable boot
track to it. DOS "fdisk /mbr" is one way, followed by:
dd if=/etc/hdboot0 of=/dev/hd2a
dd if=/etc/hdboot1 of=/dev/hd2a bs=1k seek=1
Note that hd2a above assumes the new SCSI disk is on the SCSI
controller. If any of the above is confusing, don't attempt it; get
professional help from someone in your area.
Not when they're still sitting in the cardboard box, not yet installed or even
ordered from the catalog! Be fair to bus drivers!
When I'm doing dangerous things to someone's machine that controls
high-dollar equipment, I tend to be cautious and ask questions. Sorry
my questions are stupid to you.
>>
>> The guy that worked on it before me mirrored the drive to a new SCSI.
>> I suspect that it is going into panic due to the drive or filesystem
>> being corrupt.
>
>
> Tell us more about this "mirrored the drive to a new SCSI"
>
> Is the old drive available? Did the old drive panic? How was the
> mirror accomplished?
>
> We can return to your other questions once you provide answers to the
> above questions.
>
I'm not sure what he used ... I'll ask and find out. The brand-new SCSI
disk does the exact same thing the old one did. Divvy looks the same,
the same 3 bad blocks were complained about when I added the disk with
mkdev hd the first time, and when I do fsck -ofull on /dev/part4 and
/dev/part5 they both give the same warning about the filesystem being
larger than its container, etc. (which I exit out of and don't
continue), so at this point it looks like a good copy of the original
disk.
I haven't pulled the trigger on doing anything destructive to this disk
until I've found all the pieces of the puzzle and know what my options
are.
/dev/part2 says there is no filesystem.
I think root was what I call /dev/part5, as when I mount it, all it has
is a lost+found directory with tons of files whose names are big
numbers. I'll have to try this again as I don't remember, but I think I
grepped /dev/part5 for something that made me think it was root.
>
>>
>
> Stop. DO NOT ATTEMPT TO RESTORE FROM TAPE TO ANY FILE SYSTEM ON THE
> SCSI DISK. If you feel you must restore from tape, get a new SCSI disk,
> run mkdev hd and create file systems on the new disk then restore to those.
> If you get lucky and one of the tapes has a full system backup of the root
> file system from the original SCSI disk, restore it to the new SCSI disk and
> boot that disk. Then investigate the remaining tapes.
At this point I'm just figuring out what I have to work with. This box
controls a huge mail-sorting machine, and apparently the SCO box went
down right after they bought it. It came with a few tapes and no media
... no boot floppies, no rescue CD, nothing. So I'm trying to figure
out what is on the tapes and what the filesystems on the disk are, so I
can try to match up what I see on the tape with where it should go on
the disk.
All I have done so far is look.
>
> The only reason to fool with a disk is if there is important data on the
> disk that is not backed up to tape or other storage and has to be
> recovered.
>
> In another post, you indicated that the only divisions fsck complains
> about are part4 and part5. The root division is part2, and if it passes
> fsck okay, then you can mount it and explore its contents.
I'm not so sure since divvy reports no fs on /dev/part2.
Does SCO not have a file like Linux's inittab that shows the devices and
where they are mounted? I look at the inittab of the IDE drive I have
that works and it doesn't help me in trying to figure out where to look
on the messed up SCSI drive to get a clue as to the original filesystem
layout. I thought I'd see entries for root, boot etc. but I don't.
I don't remember if I mentioned this or not but I used to administer a
SCO box back in 2001 but it has been too long ago and I've forgotten
most of what I knew about SCO ... now it is Linux and Solaris I work on.
Bela will likely chime in here, but I don't think it is possible to install
SCO 5.0.5 with root anywhere but in part2. By default 0 is boot, 1 is swap,
and 2 is root. Unless this is SCO UNIX 3.2v4.2 (pre SCO 5.0.0) in which
case 0 is root and 1 is still swap.
>> Stop. DO NOT ATTEMPT TO RESTORE FROM TAPE TO ANY FILE SYSTEM ON THE
>> SCSI DISK. If you feel you must restore from tape, get a new SCSI disk,
>> run mkdev hd and create file systems on the new disk then restore to those.
>> If you get lucky and one of the tapes has a full system backup of the root
>> file system from the original SCSI disk, restore it to the new SCSI disk
>> and boot that disk. Then investigate the remaining tapes.
> At this point I'm just figuring out what I have to work with. This box
> controls a huge mail-sorting machine, and apparently the SCO box went
> down right after they bought it.
That's a whole different ball game. I was called to an AMF bowling center
where they had a very customized SCO Xenix system running the pin setters.
It was totally unlike any SCO I'd worked on.
Bought new from the manufacturer? or used? If new, then you should call the
manufacturer for technical support.
Since it is dedicated to controlling a machine, I doubt that there is any
critical data on the SCO disk. Check with the Manufacturer, make sure you
have all disks, licenses, tapes?, etc. to re-install SCO and whatever software
was provided to control the mail sorter, then re-install on your IDE disk or
a new SCSI disk then have the client contract with the manufacturer to set
it up as needed.
> It came with a few tapes and no media
> ... no boot floppies, no rescue CD, nothing. So I'm trying to figure out
> what is on the tapes and what the filesystems on the disk are, so I can
> try to match up what I see on the tape with where it should go on the
> disk.
>
> All I have done so far is look.
>>
>> The only reason to fool with a disk is if there is important data on the
>> disk that is not backed up to tape or other storage and has to be
>> recovered.
>>
>> In another post, you indicated that the only divisions fsck complains
>> about are part4 and part5. The root division is part2, and if it passes
>> fsck okay, then you can mount it and explore its contents.
> I'm not so sure since divvy reports no fs on /dev/part2.
>
> Does SCO not have a file like Linux's inittab that shows the devices and
> where they are mounted? I look at the inittab of the IDE drive I have
> that works and it doesn't help me in trying to figure out where to look
> on the messed up SCSI drive to get a clue as to the original filesystem
> layout. I thought I'd see entries for root, boot etc. but I don't.
/etc/default/filesys is the mount table consulted at boot time to mount
file systems. All you will get from that is the /dev/name_of_part_X and
the mount point. There is no information on which disk or partition the
file system is located on; that's all encoded in the major/minor
numbers of the /dev/file_system_name entry. There is nothing there that
will tell you the correct size of the file system.
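From memory, an entry in /etc/default/filesys looks something like this
(illustrative only -- the exact set of keywords varies, so don't take
this as gospel):
bdev=/dev/u cdev=/dev/ru mountdir=/u rcmount=yes fsckflags=-y desc="The user filesystem"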
>
> I don't remember if I mentioned this or not but I used to administer a
> SCO box back in 2001 but it has been too long ago and I've forgotten
> most of what I knew about SCO ... now it is Linux and Solaris I work on.
>
>
In another post you wrote:
PANIC: srmountfn Error 22 mounting rootdev hd (1/42)
cannot dump 32671 pages to dumpdev hd (1/41): space for only 0 pages
This error is indicative of a corrupted super block on the root file system.
(Note that 1/42 *IS* part2 on your disk.)
See: http://unix.derkeiler.com/Newsgroups/comp.unix.sco.misc/2004-09/0273.html
Or search Google Groups with fsdb srmountfn
In that post, I detail how to recover from srmountfn on a Xenix file system.
Since I have not been successful in using fsdb on HTFS file systems,
I can't say if the information is the same or not.
The "0 pages" is evidently a standard part of the panic message and
doesn't mean anything.
And notice the date part. It may mean nothing.
What is the result of divvy /dev/part0? Or even /dev/hd10? It should
show the divvy table. My 0 is EAFS and 20K blocks, 1 is non-FS and 64K
blocks, 2 is HTFS and roughly 2M blocks, with the default assigned
names boot, swap, and root.
With the H iinit error, is it safe to do a fsck on the root, whatever
it is presently named?
That is good to know. I'll grep my /dev/part2 device and see if I can
smoke out anything that would point to root being there.
The IT guy at the printing place found a page in one of the manuals
with this breakdown, which supports what you said:
0 boot EAFS
1 swap NON FS
2 root HTFS
3 eng HTFS
4 eng2 HTFS
5 eng3 HTFS
6 recover NON FS
7 d1057all Whole Disk
When I did the mkdev hd, I remember seeing divvy show eng3 and the
whole-disk division, and maybe boot, but nothing else. The challenge
has been figuring out what the original divvy table looked like so I
can then either try to repair those filesystems or restore them from
tape (there's no documentation on what the tapes are, so it is like
working a puzzle).
>
> That's a whole different ball game. I was called to an AMF bowling center
> where they had a very customized SCO Xenix system running the pin setters.
> It was totally unlike any SCO I'd worked on.
>
> Bought new from the manufacturer? or used? If new, then you should call the
> manufacturer for technical support.
It is 1993/1994 Bell & Howell equipment that was acquired used ... the
original company no longer exists. The IT guy at the printing place
that bought it tried to work on it, as did a technician that services
these machines. They then posted to our local Linux Users Group to find
Unix gurus, and that is how I got involved. I've never been faced with
quite this kind of puzzle before with SCO (I've dealt with similar
things with Sun & Linux, though). It appears it worked for a while and
would get to the login prompt, but not now. There is a tape of what I
think is the root filesystem.
>
> > Does SCO not have a file like Linux's inittab that shows the devices and
> > where they are mounted? I look at the inittab of the IDE drive I have
> > that works and it doesn't help me in trying to figure out where to look
> > on the messed up SCSI drive to get a clue as to the original filesystem
> > layout. I thought I'd see entries for root, boot etc. but I don't.
>
> /etc/default/filesys is the mount table consulted at boot time to mount
> file systems. All you will get from that is the /dev/name_of_part_X and
> the mount point. There is no information on which disk or partition the
> file system is located on; that's all encoded in the major/minor
> numbers of the /dev/file_system_name entry. There is nothing there that
> will tell you the correct size of the file system.
>
>
>
> > I don't remember if I mentioned this or not but I used to administer a
> > SCO box back in 2001 but it has been too long ago and I've forgotten
> > most of what I knew about SCO ... now it is Linux and Solaris I work on.
>
> In another post you wrote:
> PANIC: srmountfn Error 22 mounting rootdev hd (1/42)
> cannot dump 32671 pages to dumpdev hd (1/41): space for only 0 pages
>
> This error is indicative of a corrupted super block on the root file system.
> (Note that 1/42 *IS* part2 on your disk.)
>
> See: http://unix.derkeiler.com/Newsgroups/comp.unix.sco.misc/2004-09/0273.html
>
> Or search Google Groups with fsdb srmountfn
>
> In that post, I detail how to recover from srmountfn on a Xenix file system.
> Since I have not been successful in using fsdb on HTFS file systems,
> I can't say if the information is the same or not.
Outstanding! Thanks. I've found online SCO Companion books too, so
I've started to hit those.
I wonder if any of the HTFS versioning stuff can be leveraged in
situations like this ... still reading.
I'm going to image the good SCSI to another disk (I'm sure I have one
around here somewhere) so I can start to try things, as I don't want to
mess with that one (it's the baseline), and I don't trust the original
SCSI, as it is probably failing, which is what led to all this.
>
> The 0 pages evidently is a standard part of the panic message and
> doesn't mean anything.
>
> And notice the date part. It may mean nothing.
>
> What is the result of divvy /dev/part0? Or even /dev/hd10. Should
> show the divvy table. My 0 is EAFS and 20K blocks, 1 is nonfs and 64K
> blocks, 2 is HTFS and roughly 2M blocks. Default assigned boot, swap,
> and root.
>
> With the H iinit error, is it safe to do a fsck on the root, whatever
> it is presently named?
I did a fsck -ofull on /dev/part0 (boot) and it was happy. I don't
remember trying it on /dev/part2, as I thought divvy reported NON FS.
I need to image the disk I'm using as my baseline to another drive so
I can try more things.
Thanks!
Brian
Did divvy show the start and end blocks for the unnamed file systems?
(Please post the divvy table you see. Second request!) If not, you have
a big problem, as trying to find the start of a division is not a
trivial matter.
In another post you said:
> I'm not sure what he used ... I'll ask and find out. The brand-new SCSI
> disk does the exact same thing the old one did. Divvy looks the same,
> the same 3 bad blocks were complained about when I added the disk with
> mkdev hd the first time, and when I do fsck -ofull on /dev/part4 and
> /dev/part5 they both give the same warning about the filesystem being
> larger than its container, etc. (which I exit out of and don't continue),
> so at this point it looks like a good copy of the original disk.
The warning that the file system is larger than the space allocated
should not be fatal. Fsck is warning you that you should take steps to
correct the problem. I've seen that before, when a client used
Microlite BackupEDGE to move his system from a 9G disk to an 18G disk
and answered "percentage" when asked by the RE2 software how he wanted
the partitions resized to fit the new disk: "size" keeps the fdisk
partition and divvy file systems the same size as on the original disk,
while "percentage" expands the fdisk partition and divvy file systems
to utilize the additional disk space on the new disk.
Go ahead and run "fsck -n -ofull /dev/part3 > /tmp/logfsck 2>&1". That
will not alter the file system, and it will log its results to
/tmp/logfsck. Check to see what fsck tells you about the file system.
Note that even if the file system is not "clean," you can mount it with
-r (read-only) and create a current backup to tape. If fsck throws a
lot of errors beyond
"UNREF FILE I=inode-number OWNER=UID MODE=file-mode SIZE=file-size
MTIME=modification-time (RECONNECT)"
you might not want to trust what you can read from the read-only file
system. If logfsck shows only minor problems, go ahead and run fsck
without the -n and let it fix what it can.
With all the information you have provided I'd suggest the following
sequence:
1) Restore the suspected root backup tape to a file system (not the
root) on the IDE disk you are using to try to mount the problem disk.
Use dtype /dev/rStp0 to check the format of the tape (cpio, tar, etc.).
Pray that it is not tar, as tar is inadequate for backing up the root
file system: it will not back up /dev nodes.
2) Figure out how old the root backup is. 1993-95? This century? You
decide if you can trust the backup to have all the information you need
to replicate the running system to a new disk.
3) Check the crontab on the restored root to see if you can identify
any scheduled backup that might have been set up. If you're lucky, they
included one of the supertar products to perform the backup. If it's a
home-brew backup, check the script and see if it logs any information
about the disk layout in the backup log.
(No help to you, but when I used to use my cpio script to back up
systems, I always included the output of dfspace in the backup log:
> Start CPIO tape write: Thu Oct 07 12:29:22 2002
>
> / : Disk space: 1538.45 MB of 3678.40 MB available (41.82%).
> /app1 : Disk space: 2987.78 MB of 6718.58 MB available (44.47%).
> /app2 : Disk space: 2455.14 MB of 6718.58 MB available (36.54%).
> /stand : Disk space: 27.77 MB of 39.99 MB available (69.43%).
>
> Total Disk Space: 7009.15 MB of 17155.58 MB available (40.86%).
>
> Root Disk Division:
> 0: boot 8033 48991
> 1: swap 48992 345950
> 2: root 345951 4112636
> 7: hd0a 0 4120671
The dfspace information would at least tell you the sizes of the
various mounted file systems, so that you can experiment with creating
file systems of the same size to get the block size of the file
systems. I have since moved all my clients to Microlite BackupEDGE,
which creates an RE2 ISO image with all that information recorded for
you.)
4) Failing to find anything useful in /usr/spool/cron/crontabs/root,
look around and see if you can find any usable information to help you.
5) If you can dope out the original size of the root file system (I'd
use whatever divvy indicates for the start and end block of division
2), create a new file system of the same size on the IDE disk and then
use:
"dd if=/dev/correct_file_system of=/tmp/sb bs=1k count=1"
to grab the superblock from the just-created file system, and then
write it to /dev/part2 (the damaged root file system):
"dd if=/tmp/sb of=/dev/part2 bs=1k"
(Keep a copy of the original /dev/part2 superblock before you do this,
so if you get it wrong you can go back and try again.) Then run fsck on
the /dev/part2 file system (use -n until you see what fsck says).
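Keeping that copy of the original superblock is the same kind of dd (a
sketch; /tmp/sb.orig is just an arbitrary name):
dd if=/dev/part2 of=/tmp/sb.orig bs=1k count=1    <- save it first
dd if=/tmp/sb.orig of=/dev/part2 bs=1k            <- put it back if needed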
NOTE: I have never done this on an HTFS file system. It may or may not work.
> Bela will likely chime in here, but I don't think it is possible to install
> SCO 5.0.5 with root anywhere but in part2. By default 0 is boot, 1 is swap,
> and 2 is root. Unless this is SCO UNIX 3.2v4.2 (pre SCO 5.0.0) in which
> case 0 is root and 1 is still swap.
I don't really enjoy being "invoked" like that...
It's possible to install OSR5 onto any division of any partition. To
install anywhere other than the default division-2-of-active-partition
requires hackery that few people know; the original poster's system
probably isn't that strange.
I haven't read the very latest posts on this thread yet, but so far it
looks like everyone is missing the probable cause here. It looks like
the drive's perceived geometry has changed. The OP should first look at
his live system, at the files /usr/adm/hwconfig and /usr/adm/syslog, to
get a feel for what these look like. For instance:
$ grep cyls /usr/adm/hwconfig
%disk 0x01F0-0x01F7 14 - type=W0/0 unit=0 cyls=60321 hds=255 secs=127
%Sdsk - - - cyls=4462 hds=255 secs=63 fts=stdb
%Sdsk - - - cyls=17849 hds=255 secs=63 fts=stdb
%disk 0x01F0-0x01F7 14 - type=W0/1 unit=1 cyls=60801 hds=255 secs=63
This system has two SCSI and two IDE drives, thus the two formats.
Next:
$ strings /usr/adm/hwconfig | grep cyls
14 - type=W0/0 unit=0 cyls=60321 hds=255 secs=127
- - cyls=4462 hds=255 secs=63 fts=stdb
- - cyls=17849 hds=255 secs=63 fts=stdb
14 - type=W0/1 unit=1 cyls=60801 hds=255 secs=63
Notice that `strings` cuts off the "%what" part -- this is because it's
separated by a TAB char, and `strings` thinks TAB isn't a printable
char. Now you can do:
# strings /dev/hd20 | grep cyls=
(where hd20 is the old drive's whole-disk device node), and you'll
likely get a _lot_ of output, some of which will be the original
geometry of the drive. Other parts will be various sorts of noise, so
it's up to you to separate the wheat from the chaff. If there were
multiple drives on the old system, you'll see several sets of geometry
to choose from. Multiply (cyls * hds * secs * 512) to get the size in
bytes; for the above 4, I get:
1,000,189,739,520 (~1TB)
36,701,199,360 (~36GB)
146,813,022,720 (~146GB)
500,105,249,280 (~500GB)
Calculate this and pick out which size correctly applies to the make &
model of the old drive.
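A quick way to grind out those products on the box itself (a sketch; bc
sidesteps any 32-bit overflow you might hit with expr):
$ echo '60321 * 255 * 127 * 512' | bc
1000189739520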
Now look at /usr/adm/hwconfig on the new system. What is it giving for
the old drive's geometry?
If it matches your discovered geometry from the old drive, I'm wrong,
this isn't a geometry problem. Report that and we have a new basis for
further discovery.
If it doesn't match, it's a geometry problem. Then we need to think
about how to fix it.
Two classes of geometry problem. One is a case where the "hds" and
"secs" values match, but the "cyls" is much too low. This happens when
you have a drive larger than is supported by the new HBA driver. For
instance, a previous version of the LSI Logic 53c1030 driver, "lsil",
couldn't handle drives larger than 64GiB. It saw drive sizes as (actual
size mod 64GiB), so a 100GiB drive showed up as 36GiB, etc. The fix for
this is (1) update to the newest version of the driver for that HBA
(that fixes the "lsil" case); (2) if no newer driver exists, throw out
the HBA, get one with a working driver.
The other class of geometry problem is where the numbers just don't
match at all. Look at the 36GB drive:
%Sdsk - - - cyls=4462 hds=255 secs=63 fts=stdb
Many HBAs like a geometry of 64 heads, 32 sectors/track. They would
show this same drive as:
%Sdsk - - - cyls=35000 hds=64 secs=32 fts=stdb
(probably 35001 cylinders). In such a case, what you have to do is
"stamp" the drive with the correct (original) geometry so OSR5 will
know how to find stuff on it. This used to be very easy; you would
just:
# dparam -w /dev/rhd20 # where /dev/rhd20 is the drive in question
# dparam /dev/rhd20 `dparam /dev/rhd20`
It's still that easy unless your new system is OSR507. 507 shipped with
a broken masterboot which makes this more difficult. I believe that's
been fixed along the way, so if it's 507, update the new system with
OSR507MP5 before proceeding. Then do the above commands.
After "stamping", reboot, then go back to `divvy` and see if the
filesystem sizes & types make any more sense. Run `dtype /dev/part1`
on each division's device node; do those filesystem types make sense?
Finally, run `fsck -n -o full /dev/part1` on each of the divisions that
looks like a mountable filesystem type. Do they still whine about wrong
sizes?
Ideally you should be doing all of this on a copy of the original data,
mirrored onto another drive of the same (or larger) size. You can do
the mirroring with something like Ghost or simply by:
# dd if=/dev/hd20 of=/dev/hd30 bs=64k
where hd20 & hd30 are the whole-disk device nodes for two drives. This
command is dangerous: you need to be 150% certain that /dev/hd30 is
really the trashable new empty drive!
BTW, nothing prevents you from using an IDE/SATA drive as the mirror.
The OSR5 "wd" driver tends to choose different geometries than many of
the SCSI HBA drivers, but this is irrelevant since you will be stamping
the new drive with the original drive's geometry.
What we refer to as "disk geometry" these days has nothing to do with
the actual number of cylinders, heads & sectors/track of the drive;
it's just an accounting trick to help the OS keep track of where things
are on the drive. For a new drive, the only requirements for the chosen
geometry are that it fit within constraints (<256 heads, <256 sectors,
<65536 cylinders) -- and that it multiply out to the actual size of the
drive.
That last requirement is actually almost never met, since the drive
is unlikely to have exactly [some number < 256] * [some number < 256]
* [some number < 65536] total sectors. The geometry must multiply
out to the drive's exact size _or less_, which is what almost always
happens. For a drive that already has partitions and divisions on it,
the requirement is: geometry Must Not Change even if you move the drive
from one HBA to another, or move the logical contents of the drive to
another drive using an image/mirroring technology like Ghost or `dd`.
OSR5 tries to enforce the Must Not Change clause by stamping SCSI
drives at install time. Three things go wrong with this: it doesn't
stamp IDE drives; stamping of SCSI drives was added relatively late
(506?); and the broken masterboot in 507 prevents the stamping from
working. And possibly a 4th problem (I'm not sure about this one): I'm
sure it tries to stamp the root drive at install time (subject to the
other 3 constraints), but I'm not too sure about drives added after
installation. Not sure if `mkdev hd` also tries to stamp new drives.
Starting with OSR507MP1 or MP2 or so, a new utility `geotab` stamps
drives with a new kind of geometry stamp which is supposed to be more
resilient to movement between HBAs and/or imaging to different drive
types. Again, I forget whether it includes a tweaked `mkdev hd` script
that does this newfangled stamping to newly installed drives. (I should
remember since I invented the new stamp, wrote the utility that
implements it, and wrote the script that stamps all existing drives
during the installation of the supplement that adds `geotab`. But it's
been long enough that I can't say for sure, even though I know it
_should_ have tweaked `mkdev hd`...)
(Ok, I downloaded the newest wd BTLD and I still don't know the answer.
It replaces one of the mkdev scripts (.scsi), but the new version
doesn't mention either `dparam` or `geotab`. But it's also possible
that the _kernel_ stamps a geotab onto new drives. If not, something
probably should -- probably `mkdev hd`)
Hmmm, I've written a tome, will bcc to Tony Lawrence...
>Bela<
>
> Hmmm, I've written a tome, will bcc to Tony Lawrence...
>
> >Bela<
Thanks. I posted that at http://aplawrence.com/Bofcusm/2662.html but
probably wouldn't have noticed it if you hadn't done the cc. My SCO
work has dwindled away to almost nothing and I seldom visit this
group.
--
Tony Lawrence
http://aplawrence.com