SCO 5.0.7 system lock-up on attempting to add USB hard drive

Steve M. Fabac, Jr.

Nov 4, 2009, 1:05:58 AM

(Resent without image attachment.)

The last Microlite newsletter mentioned using a USB hard drive as a backup
resource.

Since I have clients running 5.0.7 whose backups to a REV drive take 8 to 10 hrs,
I thought I'd try the USB route on my own 5.0.7 system before attempting it on the clients'.

My system is 5.0.7 with MP5 on a SuperMicro X5DL8-GG system with Adaptec 2010S ZRC
RAID in RAID-0.

||* BackupEDGE for OpenServer 5 (ver 02.03.01) ||
|| SCO OpenServer Enterprise System (ver 5.0.7Hw) ||
|| SCO OpenServer Release 5.0.7 Graphics and NIC Drivers (ver 5.0.7Aa) ||
|| SCO Skunkware 2000 (ver 2000.1) ||
|| SCO Symmetrical Multiprocessing (ver 1.1.1Hw) ||
|| SCO VisionFS (ver 3.10.905) ||
|| patchck - package management tool (ver 09102702) ||
|| OSS672A - OpenServer 5 Network Header File Update (ver 1.0.0) ||
|| OSS674A - SCO Symmetrical Multiprocessing Update (ver 1.0.0) ||
|| SCO OpenServer Release 5.0.7 Maintenance Pack 5 (ver 1.0.0Mc)

I purchased a 320G SimpleTech BlackCherry USB hard drive and connected it to the
USB ports on the system board. (The BlackCherry has a USB cable with two connectors,
possibly to spread the power draw across two USB ports.)

I followed the steps outlined in the BackupEdge manual and the Microlite Newsletter
and succeeded in creating a file system on the USB drive. When I ran fdisk, I had
to delete the NTFS file system pre-configured on the drive and I specified only
half of the drive for the UNIX partition.

Wow, everything took a long time to complete: Divvy/mkfs from 12:40 to 02:57,
2 hrs 17 minutes to complete.

fsck -ofull /dev/backups started at 06:23 and finished at 08:37, about 2 hrs 14 minutes.

I ran a full backup to the USB drive and it took over 12 hours to complete.
(I've lost the backup report as I have restored the root file system twice
from DVD media using a root-only backup created before attempting the addition
of the usb hard drive, and had not printed it out.)

While the backup was running, I noticed that logging into the system was slow and
very sluggish. The system is a dual Xeon with SCO MPX and hyper-threading enabled.

Long story short: The USB ports on the X5DL8-GG system board are USB 1.0 (12 Mbit/s),
not USB 2.0.
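
For a rough sense of scale, the raw signalling rate alone puts a floor under the backup time. The sketch below assumes 12 Mbit/s for the on-board ports and 480 Mbit/s for USB 2.0 high speed, and it ignores all protocol overhead, so real transfers will be slower. The helper name `min_hours` and the 40 GB figure are made up for illustration.

```shell
# min_hours GB MBIT: lower bound, in hours, to move GB gigabytes (2^30 bytes)
# over a link with a raw rate of MBIT megabits per second (10^6 bits).
# Hypothetical helper; USB protocol overhead is ignored entirely.
min_hours() {
    awk -v gb="$1" -v mbit="$2" 'BEGIN {
        bytes_per_sec = mbit * 1000000 / 8
        printf "%.2f\n", gb * 1073741824 / bytes_per_sec / 3600
    }'
}

min_hours 40 12     # 40 GB is an illustrative size, not a measured one
min_hours 40 480
```

Even at the theoretical limit, the 12 Mbit/s ports need most of a working day for a backup of that size, which is in line with the 12+ hour run described above.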

I purchased a StarTech PCI625USB2I 6-port USB 2.0 card and installed it in
my system.

I was able to see the USB hard drive plugged into the StarTech ports and able to
list and restore files from the full system backup created when the drive was
connected to the USB 1.0 ports.

However, when I ran a full system backup to the USB disk on the 2.0 ports, the
backup hung while writing at about 1G into the first backup segment, and
further logins to the system on the console or via the network just hung. Logged-in
screens appeared to run, but once the running job (sar -d 5 50) finished,
any typed command hung. The disk activity LED on the USB hard drive was solid
on.

Thus began two days of removing and re-installing the USB hard drive. But the
system hangs running mkfs (and before that, fsck -ofull) on the USB drive.

A hard reset is required to reboot after the lock-up. Running divvy /dev/hd10 and
"c" to create a new file system runs for a short while and then the box locks
up. sar -d 5 50 on another screen shows the time when the system locks, but
sar continues to run until stopped or allowed to reach its count:

22:34:29 Sdsk-1 100.00 1.00 333.27 10664.54 0.00 48.00

22:34:34 Sdsk-0 2.79 1.00 4.98 15.14 0.00 5.60
Sdsk-1 100.00 1.00 333.27 10664.54 0.00 47.98

22:34:39 Sdsk-1 100.00 1.00 333.47 10670.92 0.00 48.00

22:34:44 Sdsk-0 0.20 1.00 1.79 3.98 0.00 1.11
Sdsk-1 100.00 1.00 333.27 10664.54 0.00 47.99

22:34:49 Sdsk-1 100.00 1.00 333.27 10664.54 0.00 48.00

22:34:54 Sdsk-1 100.00 1.00 21.71 694.82 0.00 47.98

22:34:59
22:35:04
22:35:09
22:35:14
22:35:19
22:35:24
22:35:29
22:35:34
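
The point where the disk goes quiet can be pulled out of a saved sar log mechanically. This is a minimal sketch, assuming the layout shown above, where an interval with no completed disk I/O is printed as a bare timestamp. The function name `find_stalls` is hypothetical, and an idle disk produces the same bare lines, so a hit is only meaningful while mkfs or a backup is supposed to be writing.

```shell
# find_stalls: read `sar -d` text on stdin and print every timestamp line
# that carries no device statistics (nothing after the time itself).
find_stalls() {
    awk '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ && NF == 1 { print $1 }'
}

# Sample taken from the output above.
find_stalls <<'EOF'
22:34:49 Sdsk-1 100.00 1.00 333.27 10664.54 0.00 48.00

22:34:54 Sdsk-1 100.00 1.00 21.71 694.82 0.00 47.98

22:34:59
22:35:04
EOF
```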

mpstat in another logged-in telnet session continues to show minimal CPU activity
with counts changing once each second. If I quit mpstat, it won't restart.

I edited /etc/conf/sdevice.d/scodb and enabled scodb. The next time I tried
to run divvy on the USB drive and create a new file system, the system locked up
as usual. The USB hard drive LED was on solid. I then pulled the USB cable and
got a system panic, trap E, as shown in the attached GIF screen image.

I'm about ready to re-install 5.0.7 on the root and start over.

Anyone with experience with PCI USB 2.0 cards that can recommend one that might
be more compatible with SCO?

# usbprobe -A
Path - Address - Description
----------------------------
0 - 1 - Hub "EHCI Root Hub"
4 - 2 - Mass Storage "SimpleTech SimpleDrive PS"
0 - 1 - Hub "OHCI Root Hub"
0 - 1 - Hub "OHCI Root Hub"
0 - 1 - Hub "OHCI Root Hub"
0 - 1 - Hub "OHCI Root Hub"

# fdisk -f /dev/rhd10

1. Display Partition Table
2. Use Entire Disk for UNIX
3. Use Rest of Disk for UNIX
4. Create UNIX Partition
5. Activate Partition
6. Delete Partition
7. Create Partition

Current Hard Disk Drive: /dev/rhd10

+-------------+----------+-----------+---------+---------+---------+
| Partition | Status | Type | Start | End | Size |
+-------------+----------+-----------+---------+---------+---------+
| 1 | Active | UNIX | 4000001 | 9922559 | 5922559 |
| 4 | Inactive | OS/2 | 1 | 4000000 | 4000000 |
+-------------+----------+-----------+---------+---------+---------+

Total disk size: 9922815 tracks (256 reserved for masterboot and diagnostics)

Press <Return> to continue
# divvy /dev/hd10
+-------------------+------------+--------+---+-------------+------------+
| Name | Type | New FS | # | First Block | Last Block |
+-------------------+------------+--------+---+-------------+------------+
| backups | NON FS | no | 0 | 0| 186554811|
| | NOT USED | no | 1 | -| -|
| | NOT USED | no | 2 | -| -|
| | NOT USED | no | 3 | -| -|
| | NOT USED | no | 4 | -| -|
| | NOT USED | no | 5 | -| -|
| | NOT USED | no | 6 | -| -|
| hd1a | WHOLE DISK | no | 7 | 0| 186560607|
+-------------------+------------+--------+---+-------------+------------+
186554812 1K blocks for divisions, 5796 1K blocks reserved for the system
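
fdisk reports sizes in tracks while divvy reports 1K blocks, so the two listings are easier to compare after a conversion. The sketch below assumes 63 sectors per track and 512-byte sectors; neither is stated in the post, but the result for partition 1 matches the divvy whole-disk range of 0 to 186560607. The helper name `tracks_to_kb` is made up.

```shell
# tracks_to_kb TRACKS [SECTORS_PER_TRACK]: size in 1K blocks, assuming
# 512-byte sectors and, unless overridden, 63 sectors per track.
tracks_to_kb() {
    awk -v t="$1" -v spt="${2:-63}" 'BEGIN {
        printf "%.0f\n", int(t * spt * 512 / 1024)
    }'
}

tracks_to_kb 5922559    # UNIX partition (partition 1) from the fdisk table
tracks_to_kb 9922815    # total disk size from the fdisk table
```

By the same figures the UNIX partition covers roughly 60% of the disk (5922559 of 9922815 tracks).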

I've attached the BlackCherry USB hard drive to a Windows XP system, formatted the
non-UNIX partition as NTFS and successfully backed up the Windows XP system to the
USB hard drive.
--
Steve Fabac
S.M. Fabac & Associates
816/765-1670

Pat Welch

Nov 4, 2009, 9:01:09 PM

Wow.

Maybe the pain would be reduced by using Edge to FTP the backup to
another system (Linux or Windows) with an FTP server, and attach the USB
drive to that system?

And set the FTP login to point to the USB drive.

Would be pretty fast over a 1 Gb/s link and not bad at 100 Mb/s.

--
----------------------------------------------------
Pat Welch, UBB Computer Services, a WCS Affiliate
SCO Authorized Partner
Microlite BackupEdge Certified Reseller
Unix/Linux/Windows/Hardware Sales/Support
(209) 745-1401 Cell: (209) 251-9120
E-mail: pat...@inreach.com
----------------------------------------------------

Nico Kadel-Garcia

Nov 4, 2009, 9:25:59 PM

Or rsnapshot, which is vastly more efficient than merely duplicating
material and can preserve hard links and file ownership more
effectively. I've used precisely this sort of thing.

mbennett

Nov 5, 2009, 10:27:17 AM

Steve,

I recently obtained a Tandberg RDX drive to test as a backup device
which could be used on nearly any O/S. I communicated with Microlite
support about this, and they indicated that the devices would work
fine on Linux (true) but they couldn't recommend them for SCO because
of issues such as you described. You might contact Microlite directly
and see if they can point you in a better direction.

I like the RDX technology, and it does work reliably so far on either
a Linux or Windows system, with an external USB connection. But for
your specific situation I lean in the direction that Pat is
recommending, to just make an ftp backup with Edge. It works great, I
use it every day on my 5.0.7 system.

Mark

Steve M. Fabac, Jr.

Nov 5, 2009, 1:59:13 AM

>> card and installed it in my system.

>>
>> I was able to see the USB hard drive plugged
>> into the StarTech ports and able to list and
>> restore files from the full system backup created
>> when the card was installed on the USB 1.0 ports.
>>
>> However, when I ran a full system backup to the
>> USB disk on the 2.0 ports the backup hung while writing
>> at about 1G in to the first backup segment and
>> further login to the system on the console or via
>> network just hung. Logged in screens appeared to run but
>> when the job running finished (sar -d 5 50),
>> any typed command hung. The disk activity led on
>> the USB hard drive was solid on.
>>
>> Thus begin two days of removing and re-installing
>> the USB hard drive. But the system hangs running mkfs
>> (and before that, fsck -ofull) on the USB drive.
>>
>> hard reset is required to reboot after the lock-up.
>> Run divvy /dev/hd10 and "c" to create a new file
>> system runs for a short while and then the box
>> locks up. Sar -d 5 50 on another screen shows the
>> time when the system locks but sar continues to run
>> until stopped or if allowed to reach the count:
>>

>> mpstat in another logged in telnet session continues
>> to show minimum CPU activity with counts changing
>> once each second. If I quit mpstat, it won't
>> restart.
>>
>> I edited /etc/conf/sdevice.d/scodb and enabled scodb.
>> The next time I tried to run divvy on the USB and
>> create a new file system, the system locked up
>> as usual. The usb hard drive led is on solid.
>> I then pulled the usb cable and got a system panic
>> trap E as shown in the attached Gif screen image.

The image was deleted because I was unable to see it in
the first post. What it said was:

# s <-- typing s [enter] < mistyped ls >
/bin/ksh: s: not found <-- ksh still running
# s
/bin/ksh: s: not found
# w <-- typing w locks up
ss <-- typing "ss"
s <-- typing "s"
<-- [enter]
<-- [enter]
<-- [enter]
<pulled the USB cable at this point>

CPU2: Unexpected trap in kernel mode:
CPU2: cr0 0x8001003B cr2 0x00000000 cr3 0x1468B000 tlb 0x00000000
CPU2: ss 0x00000000 uesp 0x00000000 efl 0x00010202 ipl 0x00000000
CPU2: cs 0x00000158 eip 0xF00CCE6B err 0x00000000 trap 0x0000000E
CPU2: eax 0x00000000 ecx 0x00000202 edx 0xF0857B20 ebx 0xF2425A1C
CPU2: esp 0xE0000A98 ebp 0xE0000AB4 esi 0xF0857B20 edi 0xF1F9A80C
CPU2: ds 0x00000160 es 0x00000160 fs 0x00000000 gs 0x00000000
CPU2: cpu 0x00000002
CPU2:
PANIC: k_trap - Kernel mode trap type 0x0000000E

debug0:1> s
udi_channel_event_ind+F movl %eax,(%eax)_

>> I'm about ready to re-install 5.0.7 on the root and start over.
>>
>> Anyone with experience with PCI USB 2.0 cards that can recommend one
>> that might
>> be more compatible with SCO?
>>
>> # usbprobe -A
>> Path - Address - Description
>> ----------------------------
>> 0 - 1 - Hub "EHCI Root Hub"
>> 4 - 2 - Mass Storage "SimpleTech SimpleDrive PS"
>> 0 - 1 - Hub "OHCI Root Hub"
>> 0 - 1 - Hub "OHCI Root Hub"
>> 0 - 1 - Hub "OHCI Root Hub"
>> 0 - 1 - Hub "OHCI Root Hub"
>>
>> # fdisk -f /dev/rhd10
>>

Pat,

Good point on the level of pain involved in getting this
to work. It's obviously not a cake walk to add a USB hard
drive to a 3-4 year old system where the system board
only provides USB 1.0, and where installing 5.0.7 can be
made unreliable just by failing to remove the automatically
installed SCO licensing update prior to installing MP5.
(This system had the licensing patch and MP3 installed, and
I had to remove them to install MP5 for the latest
USB updates.)

The FTP route is the only solution for the clients
still running 5.0.6 as 5.0.6 does not have
USB 2.0 compatibility.

I have used FTP backup for a client who wanted
to run incremental backups at 10:00, 14:00, 17:00 and 23:00,
with a full backup to FTP at 01:00 and a full backup to
DVD-RAM at 03:30.

My ideal configuration would be seven 80G USB hard drives
that the user would swap out daily just like they previously
did with the REV media. That will allow the drives to
be rotated off-site for safe storage.

That sounds expensive, but at $80 for each USB hard drive,
they are more economical than $60 for a 35G REV cartridge
and $370 for a REV drive (with one cart).
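
Worked through for a seven-piece rotation, the comparison looks like this. The sketch assumes seven 80G USB drives on one side and, on the other, one REV drive (which includes a cartridge) plus six more cartridges. The prices are the ones quoted above; the helper name `cost_per_gb` is made up.

```shell
# cost_per_gb TOTAL_DOLLARS TOTAL_GB: dollars per gigabyte of media.
cost_per_gb() {
    awk -v d="$1" -v g="$2" 'BEGIN { printf "%.2f\n", d / g }'
}

usb_total=$((7 * 80))          # seven 80G USB drives at $80 each
rev_total=$((370 + 6 * 60))    # REV drive with one cartridge, plus six more

echo "USB: \$$usb_total total, \$$(cost_per_gb "$usb_total" $((7 * 80))) per G"
echo "REV: \$$rev_total total, \$$(cost_per_gb "$rev_total" $((7 * 35))) per G"
```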

I have seen REV drives go bad and REV cartridges after
100+ media uses start to give problems on verify.

I tried a simple experiment of changing the size of the
file system on the BlackCherry drive to 10G and it formatted
without locking up the system.

Steve M. Fabac, Jr.

Nov 5, 2009, 2:15:31 AM

Steve M. Fabac, Jr. wrote:

> (Resent without image attachment.)
>
> The last Microlite newsletter mentioned using USB hard drive as a backup resource.
>
>
> Since I have client's with 5.0.7 and backing up to REV drive taking 8 to 10 hrs, I
> thought I'd try the USB route on my 5.0.7 before attempting on the client's.
>
> My system is 5.0.7 with MP5 on a SuperMicro X5DL8-GG system with Adaptec 2010S ZRC
> RAID in RAID-0.
>
> ||* BackupEDGE for OpenServer 5 (ver 02.03.01) ||
> || SCO OpenServer Enterprise System (ver 5.0.7Hw) ||
> || SCO OpenServer Release 5.0.7 Graphics and NIC Drivers (ver 5.0.7Aa) ||
> || SCO Skunkware 2000 (ver 2000.1) ||
> || SCO Symmetrical Multiprocessing (ver 1.1.1Hw) ||
> || SCO VisionFS (ver 3.10.905) ||
> || patchck - package management tool (ver 09102702) ||
> || OSS672A - OpenServer 5 Network Header File Update (ver 1.0.0) ||
> || OSS674A - SCO Symmetrical Multiprocessing Update (ver 1.0.0) ||
> || SCO OpenServer Release 5.0.7 Maintenance Pack 5 (ver 1.0.0Mc)
>
> I purchased a 320G SimpleTech BlackCherry USB hard drive and connected it to the
> USB ports (The blackCherry has a usb cable with two connectors, possibly to spread
> the power draw across two USB ports) on the system board.
>

Update:

I purchased two more USB drives, a Seagate 160G FreeAgent Go
(ST901603FGA2E1-RK) and a Buffalo Ministation Cobalt 320G
(Model HD-PE320U2/BK), and two new USB 2.0 PCI cards: an Iogear GIC251U
and a Vantec UGT-PC210.

I connected the Seagate drive to the StarTech PCI625USB2I previously
used with the BlackCherry drive.

I was able to perform mkdev hd and delete the existing NTFS partition
and select "use the whole disk for UNIX." The configuration ran
smoothly and the file system was created within 5 minutes without
locking up the machine.

With Backup Edge set to use the disk, I ran a backup and verify and
the verify failed:

SUMMARY - BACKUP
Serial Number = TCBxxxxxxxx
Date = Wed Nov 04 05:14:57 2009
Files Encountered = 383590
Total Data = 40.57GB
Data Written = 16.92GB
Volume Left = 69.71MB
Segments Used = 17
SW Compression = 61%
> Elapsed Time = 02:59:17
Data Transfer Speed = 5.879 GB/hr
= 100.344 MB/min
= 1753646 bytes/sec
Relative Speed = 14.093 GB/hr
= 240.540 MB/min
= 4203745 bytes/sec
Exit Status = 0
--------------------------------------------------------------------------
./u/OLDTBL/hi_req.idx.Z, 76768 blocks <!Byte> <!File Unreadable> <!Bad Checksum
edge: Directory Not In Proper Format! (./u/OLDTBL/hi_req.idx.Z)
./u/OLDTBL/hi_req1.idx.Z, 76703 blocks <!Byte> <!File Unreadable> <!Bad Checksu
edge: Directory Not In Proper Format! (./u/OLDTBL/hi_req1.idx.Z)
./u/OLDTBL/hi_isd1.idx.Z, 42769 blocks <!Byte> <!File Unreadable> <!Bad Checksu
edge: Directory Not In Proper Format! (./u/OLDTBL/hi_isd1.idx.Z)
./usr/lib/edge/tmp/edgesco5.elf, 0 blocks (compressed) <!Byte> <!File Unreadabl
edge: Directory Not In Proper Format! (./usr/lib/edge/tmp/edgesco5.elf)
./usr/lib/drivers/visionfs/vision3.1/VOL.000.003, 0 blocks (compressed) <!Byte>
edge: Directory Not In Proper Format! (./usr/lib/drivers/visionfs/vision3.1/VOL.
(More Messages Follow)
SUMMARY - BYTE-BY-BYTE VERIFICATION
Serial Number = TCBxxxxxxxxx
Date = Wed Nov 04 06:43:21 2009
Segments Used = 17
Data Read = 16.93GB
> Elapsed Time = 01:28:22
Data Transfer Speed = 11.973 GB/hr
= 204.348 MB/min
= 3571249 bytes/sec
Files Encountered = 383590
Files Excluded = 3
Files Modified = 28
Files Not Checked = 2
Special Files = 60307
Verified Successfully = 323178
> FAILED Verification = 69
Change Log = /usr/lib/edge/lists/simple_job//changedfiles_master.log
WARNING = Verification FAILED
Exit Status = 14
Time Reading Volume 0 = 01:28:22
Total Verify Time = 01:28:22
Verification Failed - Bit-level Compare Error!
At: Listing / Verify Failed
Summary: BACKUP_PASS/VERIFY_FAIL (mpp:simple_job_master)
[End of Summary]

Backup completed Wed Nov 04 06:44:22 2009 5:29:19

The backup time is a little faster than the 35G REV I'm
replacing, and the verify time is much faster, except for the
errors.

I executed badtrk -f /dev/rdsk/1sa to see if there are bad tracks
on the disk. It has been running for 4+ hours and has reached only
29636629 blocks in the scan (9.4% of 312560639 blocks):

Scanning block 29714095, 4294967292 % complete, bad block count = 0
Scan interrupted ..

No bad blocks found.
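
The odd "4294967292 % complete" figure is what a 32-bit overflow in the percentage calculation would produce. This is only a guess at how badtrk computes the value, not something confirmed from its source: the sketch assumes the block number is multiplied by 100 in signed 32-bit arithmetic, divided by the partition size, and printed as an unsigned number. The helper name `pct_32bit` is made up.

```shell
# pct_32bit BLOCK TOTAL: emulate BLOCK * 100 / TOTAL carried out in signed
# 32-bit arithmetic and then printed as an unsigned 32-bit value.
pct_32bit() {
    awk -v b="$1" -v total="$2" 'BEGIN {
        two32 = 4294967296
        prod = b * 100
        prod = prod - two32 * int(prod / two32)   # wrap the product to 32 bits
        if (prod >= two32 / 2) prod -= two32      # reinterpret as signed
        pct = int(prod / total)                   # truncate toward zero, as C does
        if (pct < 0) pct += two32                 # view the result as unsigned
        printf "%.0f\n", pct
    }'
}

# Block number from the scan above; total is 312560639 - 63.
pct_32bit 29714095 312560576
```

Under those assumptions the display goes wrong once the block number passes about 21.5 million, which would make the percentage readout misleading on any disk of this size rather than indicating a fault in this particular drive.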

User Tty Login@ Idle JCPU PCPU What
root ttyp0 12:43am 4:25 19:51 3:43 badtrk -f /dev/rdsk/1sa

At the time I aborted badtrk sar -d 5 50 was showing:

16:17:59 device %busy avque r+w/s blks/s avwait avserv (-d)
16:18:04 Sdsk-1 99.60 1.00 332.60 332.60 0.00 2.99

Average Sdsk-1 99.92 1.00 333.31 333.31 0.00 3.00

Starting badtrk again and selecting "2. Scan a specified range of blocks"
I see:

scsi version = 4
vendor = Seagate
product = FreeAgent Go

1. Scan entire UNIX partition
2. Scan a specified range of blocks

Enter your choice or q to quit: 2

This device spans blocks 63 to 312560639

Enter start block number or q to quit:

Selecting 2 and specifying 65 to 500000 and then running
sar -d 5 5 shows:

SCO_SV mpp 3.2v5.0.7 Xeon 11/04/2009

17:34:06 device %busy avque r+w/s blks/s avwait avserv (-d)
17:34:11 Sdsk-1 99.80 1.00 249.50 15968.19 0.00 4.00

17:34:16 Sdsk-1 100.00 1.00 250.50 16031.94 0.00 4.00

17:34:21 Sdsk-1 99.80 1.00 249.50 15968.19 0.00 4.00

17:34:26 Sdsk-0 6.39 1.03 6.59 15.17 0.30 9.70
Sdsk-1 100.00 1.00 250.50 16031.94 0.00 4.00

17:34:31 Sdsk-1 99.80 1.00 249.50 15968.19 0.00 4.00

Average Sdsk-0 1.27 1.03 1.31 3.03 0.30 9.70
Sdsk-1 99.96 1.00 249.90 15993.63 0.00 4.00

Immediately upon running badtrk with start block and end block
set as shown:

Enter your choice or q to quit: 2

This device spans blocks 63 to 312560639

Enter start block number or q to quit: 312000000

Enter end block number or q to quit: 312560639

Select type of scan.

1. Thorough scan (6 Mbytes/min approx)
2. Quick scan (18 Mbytes/min approx)

Enter your choice or q to quit: 2

Do you want this to be a destructive scan ? (y/n) n

sar -d 5 5 shows:

17:55:45 device %busy avque r+w/s blks/s avwait avserv (-d)
17:55:50 Sdsk-1 100.00 1.00 333.27 333.27 0.00 3.00

17:55:55 Sdsk-1 100.00 1.00 333.27 333.27 0.00 3.00

17:56:00 Sdsk-1 100.00 1.00 333.47 333.47 0.00 3.00

17:56:05 Sdsk-1 99.80 1.00 332.60 332.60 0.00 3.00

17:56:10 Sdsk-1 100.00 1.00 333.93 333.93 0.00 3.00

Average Sdsk-1 100.00 1.00 333.31 333.31 0.00 3.00

At 48% complete in the above badtrk scan:

# w; sar -d 5 2
6:07pm up 17:31, 5 users, load average: 1.41, 1.22, 0.97
User Tty Login@ Idle JCPU PCPU What
root ttyp0 12:43am 12 19:59 8 badtrk -f /dev/rdsk/1sa
root ttyp1 12:44am - 2 - w
root ttyp2 1:20am 1:39 1 - /bin/ksh
root ttyp3 1:19am 16:48 - - mpstat
smf ttyp4 8:48am 9:19 - - -sh

SCO_SV mpp 3.2v5.0.7 Xeon 11/04/2009

18:07:40 device %busy avque r+w/s blks/s avwait avserv (-d)
18:07:45 Sdsk-1 100.00 1.00 332.27 332.27 0.00 3.01

18:07:50 Sdsk-1 100.00 1.00 333.47 333.47 0.00 3.00

Average Sdsk-1 100.00 1.00 332.87 332.87 0.00 3.00
#

So at 12 minutes badtrk had reached 48% of the
500,000+ block range.
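
To put the two rates seen in the sar output side by side, here is what each would mean for a scan of the whole device. The sketch assumes the blocks counted by sar and by badtrk are the same unit and that the rate stays constant; the helper name `scan_hours` is made up.

```shell
# scan_hours BLOCKS BLKS_PER_SEC: hours needed to scan BLOCKS at a steady rate.
scan_hours() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 3600 }'
}

blocks=$((312560639 - 63))     # span reported by badtrk for this device
scan_hours "$blocks" 16000     # the fast rate
scan_hours "$blocks" 333       # the slow rate
```

The slow rate works out to well over a week for one pass, against a few hours at the fast rate.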

Ok, so now a confusing result:

Enter your choice or q to quit: 2

This device spans blocks 63 to 312560639

Enter start block number or q to quit: 312000001

Enter end block number or q to quit: 312560639

Select type of scan.

1. Thorough scan (6 Mbytes/min approx)
2. Quick scan (18 Mbytes/min approx)

Enter your choice or q to quit: 2

Do you want this to be a destructive scan ? (y/n) n

# sar -d 5 6

SCO_SV mpp 3.2v5.0.7 Xeon 11/04/2009

21:48:29 device %busy avque r+w/s blks/s avwait avserv (-d)
21:48:34 Sdsk-1 45.51 1.00 100.60 6438.32 0.00 4.52

21:48:39 Sdsk-1 99.80 1.00 249.50 15968.19 0.00 4.00

21:48:44 Sdsk-1 100.00 1.00 250.50 16031.94 0.00 4.00

21:48:50 Sdsk-1 99.80 1.00 249.50 15968.19 0.00 4.00

21:48:55 Sdsk-1 100.00 1.00 250.10 16006.36 0.00 4.00

21:49:00 Sdsk-1 100.00 1.00 250.00 16000.00 0.00 4.00

Average Sdsk-1 90.91 1.00 225.09 14405.84 0.00 4.04

And badtrk results pressing enter every five seconds then
del before the scan reaches 100%:

Scanning block 312100545, 17 % complete, bad block count = 0
Scanning block 312205569, 36 % complete, bad block count = 0
Scanning block 312315841, 56 % complete, bad block count = 0
Scanning block 312427777, 76 % complete, bad block count = 0
Scanning block 312487297, 86 % complete, bad block count = 0
Scan interrupted ..

No bad blocks found.

So to recap: badtrk option 1 (scan the entire UNIX partition) dies (gets horribly
slow) at less than 9% of the total disk space scanned. Then a selective
rescan from 312000000 to 312560639 remains slow.
Then a rescan from 312000001 to 312560639 is back to 16000 blks/s.
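
Dividing blks/s by r+w/s in the sar lines gives the average transfer size per request, which shows what changes between the fast and slow runs. This is a small sketch using three pairs copied from the sar output in this thread; the function name `blocks_per_request` is made up, and it says nothing about why the transfer size changes.

```shell
# blocks_per_request: read "r+w/s blks/s" pairs on stdin and print the
# average number of blocks moved per request for each pair.
blocks_per_request() {
    awk 'NF == 2 && $1 > 0 { printf "%.0f\n", $2 / $1 }'
}

blocks_per_request <<'EOF'
333.27 333.27
249.50 15968.19
333.27 10664.54
EOF
```

The first pair is a slow badtrk run, the second a fast badtrk run, and the third the mkfs run just before the lock-up. The request rate is nearly the same in all three; what collapses in the slow case is the number of blocks moved per request.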

I'd say that SCO USB hard disk functionality is somewhat
in the crapper.

Will report on tonight's backup run.

Brian K. White

Nov 5, 2009, 3:01:33 PM

Careful with Seagate FreeAgent external drives (desktop or portable).

They go to sleep after 15 idle minutes, and then when the host tries to
access the drive while it's asleep, the drive returns the wrong sort of
response code while it spins back up and the kernel (at least linux)
thinks the drive has been pulled or failed.

On linux there are two possible fixes. One tells the kernel to wait for
the drive to spin up instead of assuming it's failed. The other is to
change the sleep timer setting saved in the drive itself so that it no
longer goes to sleep. In the latter case you can also optionally save
the new setting in the drive so that it survives resets and power
cycles.

I don't know if or how you could do the equivalent of either of those on
OSR5 and I don't know if the problem even exists on osr5 (maybe osr5
already waits for a usb drive to spin up without having to be told to)

Just something to be aware of and find out for sure by testing.
Mount the drive, Ensure it's idle for 20 minutes, try to write a file to
it and see if the os waits for it to spin up gracefully.
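
That test could be scripted roughly as follows. This is a sketch only: the mount point `/mnt/usb`, the probe file name, and the function name `spinup_test` are placeholders, and it uses nothing beyond sleep, date, dd and sync.

```shell
# spinup_test DIR IDLE_SECONDS: leave the drive mounted at DIR untouched for
# IDLE_SECONDS, then attempt a 1 MB write and report whether it succeeded.
# DIR and the probe file name are placeholders.
spinup_test() {
    dir=$1
    idle=$2
    probe="$dir/spinup_probe.$$"
    sleep "$idle"
    echo "write started:  $(date)"
    if dd if=/dev/zero of="$probe" bs=1024 count=1024 2>/dev/null && sync; then
        status=ok
    else
        status=FAILED
    fi
    echo "write finished: $(date) ($status)"
    rm -f "$probe"
    [ "$status" = ok ]
}

# Example, 20 minutes idle on a hypothetical mount point:
#   spinup_test /mnt/usb 1200
```

A long gap between the two timestamps followed by "ok" would suggest the OS waited for the spin-up; a failure, or a hang that never prints the second line, would point at the sleep problem.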

If nothing else, you can mount the drive on linux (use a knoppix live cd
if nothing else available) and use sdparm to change the sleep setting in
the drive.

sdparm --clear STANDBY --save --six /dev/sdX

Where X is whatever letter the usb drive got, /dev/sdb , /dev/sdc , etc.

With "--save" the setting survives power cycles and resets.
So you can move it back to sco and no more spindown problem.

--
bkw

Steve M. Fabac, Jr.

Nov 6, 2009, 6:02:35 PM

Thanks Brian!

Looks like good information.

Let's consider power down as a potential cause of the problems that
I am seeing.

How then do we square the fact that the device starts badtrk at
16000+ blks/s and then switches to 300/s within minutes?
Can't we expect that the drive is not going to sleep while it
is being scanned? If it is, what weird behavior!

Here's another test: The backup fails for verify errors and so
Backup Edge deletes the database it generated during the verify.
I have gone back and selected list -> index only to build an
index of the "failed" archive. The index runs to completion
(well the first one on 11/4 "/etc/ptest" is the last file
placed on the archive.) and does not time out. Restoring a
file not identified as "bad" during the verify works.
(Well, somewhat. Files after the verify ends/aborts
prematurely are not retrievable.)

> ./.lock, 0 blocks
> Symbolic link to ===> /opt/K/SCO/php4/4.4.2Ba/.lock <V>
> ., 0 blocks (Directory)
> ./etc/ptest, 0 blocks (compressed) <V>
>
> SUMMARY - BYTE-BY-BYTE VERIFICATION
> Serial Number = TCBxxxxxxxx
> Date = Wed Nov 04 10:33:48 2009

Restoring a file that is listed as failed does not work:

From changedfiles_master.log:

../u/OLDTBL/hi_req.idx.Z, 76767 blocks <!Byte> <!File Unreadable>
==> Bytes differ at byte 363719
../u/OLDTBL/hi_req1.idx.Z, 76702 blocks <!Byte> <!File Unreadable>
==> Bytes differ at byte 363719
../u/OLDTBL/hi_isd1.idx.Z, 42768 blocks <!Byte> <!File Unreadable>
==> Bytes differ at byte 7555335

I ran list/index to build an index on the backup archive and then
used edge.restore to get one of the files and test the restored
file vs the original file:

cd /u/OLDTBL
# mv hi_isd1.idx.Z ohi_isd1.idx.Z
# edge.restore hi_isd1.idx.Z
edge.restore: info: defaulting to Primary Resource mpp:fsp!fsp0'
edge.restore: notice: getting archive information
edge.restore: notice: beginning restore
edge.restore: notice: wd for restore is '/'
Data Transfer In Progress
x ./u/OLDTBL/hi_isd1.idx.Z
Operation Finishing

Directory Not In Proper Format!

edge.restore: error: exit status is 10
#
# ls -lt *hi_isd1.idx.Z
-rw-rw-rw- 1 smsd group 21884928 Oct 7 21:58 hi_isd1.idx.Z
-rw-rw-rw- 1 smsd group 21897481 Oct 7 21:58 ohi_isd1.idx.Z
#
# cmp -l ohi_isd1.idx.Z hi_isd1.idx.Z | less
cmp:
7555336 277 377
12892776 277 377
16345588 277 377
EOF on hi_isd1.idx.Z
(END)
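
The cmp -l columns are octal byte values, so it is worth checking how the two bytes differ. The sketch below XORs each pair; the function name `xor_cmp` is made up. For all three offsets the original file has 0277 and the restored copy has 0377, which is a difference of a single bit (0x40). Note also that cmp -l counts bytes from 1, so offset 7555336 lines up with "byte 7555335" in changedfiles_master.log if that log counts from 0 (an assumption).

```shell
# xor_cmp: read `cmp -l` lines (offset, octal byte, octal byte) on stdin and
# print each offset with the XOR of the two bytes in hex. Lines that do not
# have two octal fields are skipped.
xor_cmp() {
    while read -r off a b; do
        case "$a" in ''|*[!0-7]*) continue ;; esac
        case "$b" in ''|*[!0-7]*) continue ;; esac
        printf '%s 0x%02x\n' "$off" $(( 0$a ^ 0$b ))
    done
}

xor_cmp <<'EOF'
7555336 277 377
12892776 277 377
16345588 277 377
EOF
```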

As to time-out, the above tests were performed the
next morning after the backup failed. Backup Edge
automatically unmounts the USB drive after the
backup/verify then automatically mounts it again
when you restore a file using edge.restore file_name.

Certainly the Seagate drive would have time to go to
sleep while it is unmounted. And it seems to wake-up
when running edge.restore file_name.

a ./u/tmp/csvfiles.zip, 165111 blocks Will not compress...(already ends in ".zip)
a ./u/UNLOAD_pre_confirm/hi_requisition.unl, 409441 blocks...compressing==> 38127 blocks!! (91%)

# grep csvfiles.zip verify_master.log
./u/tmp/csvfiles.zip, 165111 blocks <V>

# grep hi_requisition.unl verify_master.log
./u/UNLOAD_pre_confirm/hi_requisition.unl, 0 blocks (compressed) <V>

Restore non-compressed file:

# cd /u/tmp
# ls -lt csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 csvfiles.zip
# mv csvfiles.zip ocsvfiles.zip

Create test.sh:

date > log
edge.restore -zSEG_NUM=18 csvfiles.zip &
while test -f mark
do
ls -lt *.zip
echo " "
sleep 3
done | tee -a log

Run sar -d 3 20 > /tmp/sar.out

./test.sh

cat log
Fri Nov 6 14:42:04 CST 2009
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 253952 Nov 6 14:42 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 15491072 Nov 6 14:42 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 30990336 Nov 6 14:42 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 46620672 Nov 6 14:42 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 60907520 Nov 6 14:42 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 76406784 Nov 6 14:42 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

-rw-r--r-- 1 root sys 84536389 May 26 00:07 csvfiles.zip
-rw-r--r-- 1 root sys 84536389 May 26 00:07 ocsvfiles.zip

# sum -r *.zip
52002 165111 csvfiles.zip
52002 165111 ocsvfiles.zip
#

cat /tmp/sar.out:

SCO_SV mpp 3.2v5.0.7 Xeon 11/06/2009

14:41:59 device %busy avque r+w/s blks/s avwait avserv (-d)
14:42:02
14:42:05 Sdsk-0 4.30 1.00 9.93 216.56 0.00 4.33

14:42:08 Sdsk-0 1.32 1.00 30.13 60.93 0.00 0.44
Sdsk-1 36.42 1.00 37.75 229.14 0.00 9.65

14:42:11 Sdsk-0 8.00 1.00 120.67 241.33 0.00 0.66
Sdsk-1 62.67 1.00 86.00 2602.00 0.00 7.29

14:42:14 Sdsk-1 100.00 1.00 332.12 10041.72 0.00 5.66

14:42:17 Sdsk-1 100.00 1.00 333.44 10074.17 0.00 5.65

14:42:20 Sdsk-1 100.00 1.00 332.13 10038.03 0.00 5.24

14:42:23 Sdsk-0 6.62 1.00 79.14 158.28 0.00 0.84
Sdsk-1 100.00 1.00 306.62 9265.56 0.00 4.62

14:42:26 Sdsk-1 100.00 1.00 333.44 10084.11 0.00 4.75

14:42:29 Sdsk-0 2.33 1.00 34.88 69.77 0.00 0.67
Sdsk-1 43.19 1.00 91.03 2713.62 0.00 4.74

14:42:32 Sdsk-0 100.00 1.00 313.20 9112.87 0.01 15.08

14:42:35 Sdsk-0 100.00 1.00 879.80 27831.13 0.00 12.29

14:42:38
14:42:41
14:42:45

Repeat for compressed file:

# pwd
/u/UNLOAD_pre_confirm
# ls -lt hi_requisition.unl
-rw-r--r-- 1 informix informix 209633627 Feb 20 2009 hi_requisition.unl
# mv hi_requisition.unl ohi_requisition.unl
#

Update test.sh:
date > log
edge.restore -zSEG_NUM=18 hi_requisition.unl &
while test -f mark
do
ls -lt *hi_requisition.unl
echo " "
sleep 3
done | tee -a log

Run sar and test.sh
# sar -d 3 10 > /tmp/sar.out & ./test.sh

cat log:
# cat log
Fri Nov 6 15:03:21 CST 2009
-rw-r--r-- 1 informix informix 209633627 Feb 20 2009 ohi_requisition.unl

-rw------- 1 root sys 86887173 Nov 6 15:03 hi_requisition.unl
-rw-r--r-- 1 informix informix 209633627 Feb 20 2009 ohi_requisition.unl

-rw-r--r-- 1 informix informix 209633627 Feb 20 2009 hi_requisition.unl
-rw-r--r-- 1 informix informix 209633627 Feb 20 2009 ohi_requisition.unl

# sum -r *hi_requisition.unl
62745 409441 hi_requisition.unl
62745 409441 ohi_requisition.unl
[1] + Done sar -d 3 10 > /tmp/sar.out & ./test.sh

# cat /tmp/sar.out

SCO_SV mpp 3.2v5.0.7 Xeon 11/06/2009

15:03:21 device %busy avque r+w/s blks/s avwait avserv (-d)
15:03:24 Sdsk-1 100.00 1.00 196.37 5031.68 0.00 5.43

15:03:27 Sdsk-1 100.00 1.00 263.88 7942.47 0.00 4.42

15:03:30 Sdsk-0 56.95 1.01 281.79 563.58 0.02 2.02

15:03:33
15:03:36
15:03:39
15:03:42
15:03:45
15:03:48
15:03:51 Sdsk-0 100.00 1.00 211.59 5968.21 0.05 11.49

Average Sdsk-0 30.02 1.01 49.37 653.61 0.03 6.08
Sdsk-1 22.27 1.00 45.86 1292.05 0.00 4.86

The above files are near the beginning of the archive:

# wc -l /usr/lib/edge/lists/simple_job/*_master.log
438112 /usr/lib/edge/lists/simple_job/backup_master.log
94 /usr/lib/edge/lists/simple_job/changedfiles_master.log
284833 /usr/lib/edge/lists/simple_job/verify_master.log

#
# grep -n pre_confirm.hi_requisition.unl /usr/lib/edge/lists/simple_job/backup_master.log
18188:a ./u/UNLOAD_pre_confirm/hi_requisition.unl, 409441 blocks...compressing==> 38127 blocks!! (91%)
#

That is line 18,188 of 438,112 in the backup log. When I try to
restore files beyond the point (and around the point) where the
verify gives up, the restore fails, as shown further down.

Looking at the files verified 11/5 when the verify
fails I see the following:

./util/backup/usr1/test1/dead.letter, 1 blocks <V>
./util/backup/usr1/test1/am, 4 blocks <V>
./util/backup/usr1/test1/.profile.bak.032408, 0 blocks (compressed) <V>
./util/backup/usr1/test1/.profile.bak.032508, 0 blocks (compressed) <V>
./util/backup/usr1/test1/.profile.bak.040208, 0 blocks (compressed) <V>
./util/backup/usr1/test1/.profile_original, 0 blocks (compressed) <V>
./util/backup/usr1/test1/.prwarn_time, 0 blocks <V>
./util/backup/usr1/test1/.sh_history, 0 blocks (compressed) <V>
./util/backup/usr1/test1/ICHSTDEL.LST, 0 blocks (compressed) <V>

SUMMARY - BYTE-BY-BYTE VERIFICATION
Serial Number = TCBxxxxxxxx

Date = Thu Nov 05 13:38:33 2009
Segments Used = 7
Data Read = 6.03GB

# grep -n dead.letter /usr/lib/edge/lists/simple_job/backup_master.log
286330:a ./util/backup/usr1/test1/dead.letter, 1 blocks
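
For a rough idea of where these files sit in the archive, the grep -n line numbers can be turned into percentages of the backup log. This assumes backup_master.log has about one line per archived file and is written in archive order, which is not confirmed here; the helper name `log_position` is made up.

```shell
# log_position LINE TOTAL: how far into the log a line sits, in percent.
log_position() {
    awk -v l="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * l / t }'
}

log_position 18188 438112     # hi_requisition.unl, which restored cleanly
log_position 286330 438112    # dead.letter, near where the verify stopped
```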

# pwd
/util/backup/usr1/test1
# edge.restore -zSEG_NUM=18 dead.letter
edge.restore: info: defaulting to Primary Resource mpp:fsp!fsp0'
edge.restore: notice: getting archive information
edge.restore: notice: beginning restore
edge.restore: notice: wd for restore is '/'
Data Transfer In Progress
Operation Finishing

Directory Not In Proper Format!

edge.restore: error: exit status is 10
#

Further information on hardware:

The working PCI USB 2.0 card is the $19.95 StarTech from RadioShack.
The Iogear GIC251U is not recognized by the X5DL8-GG except as
UHCI Class 310 (USB 1.1) and does not work at all. I've not
tried the Vantec yet.

I tried the Buffalo Ministation Cobalt 320G today, and it will not
even get through mkdev hd: the system locks up tight and
throws the same kernel panic (trap E) when the USB cable is
pulled.

I'm going to install another SCSI drive in the SCA cage and
test backing up to it per Tom Podnar's suggestion to verify
that BE is working when it is not hampered by the SCO USB
drivers and my hardware.

Nico Kadel-Garcia

Nov 7, 2009, 7:30:32 AM

On Nov 6, 6:02 pm, "Steve M. Fabac, Jr." <smfa...@att.net> wrote:

> Further information on hardware:
>
> The working PCI USB 2.0 card is the $19.95 StarTech from RadioShack.
> The Iogear GIC251U is not recognized by the X5DL8-GG except as
> UHCI Class 310 (USB 1.1) and does not work at all. I've not
> tried the Vantec yet.

OK, Steve? Hold up here, you're sending a lot of material.

In general, activating, re-attaching, and doing interesting things
with USB is very driver dependent. In particular, yanking any cable
to a hard drive that has its file system mounted ... nasty, and
begging for things to fail badly. But these features are not things
you can expect SCO to support well: it requires extensive testing
across a broad variety of consumer hardware, for an operating system
that was *not* sold for consumer use, it was sold for servers. One
could reasonably expect servers to use SCSI, not USB, or to have the
budget for an external NAS.

Rather than spending dozens of hours debugging this, have you
considered simply investing in an NFS capable NAS? Or running your
storage disk on another, more contemporary and flexible OS such as
Linux?

Steve M. Fabac, Jr.

Nov 7, 2009, 11:47:07 PM

Nico Kadel-Garcia wrote:
> On Nov 6, 6:02 pm, "Steve M. Fabac, Jr." <smfa...@att.net> wrote:
>
>> Further information on hardware:
>>
>> The working PCI USB 2.0 card is the $19.95 StarTech from RadioShack.
>> The Iogear GIC251U is not recognized by the X5DL8-GG except as
>> UHCI Class 310 (USB 1.1) and does not work at all. I've not
>> tried the Vantec yet.
>
> OK, Steve? Hold up here, you're sending a lot of material.
>
> In general, activating, re-attaching, and doing interesting things
> with USB is very driver dependent. In particular, yanking any cable
> to a hard drive that has its file system mounted ... nasty, and
> begging for things to fail badly.

Agreed. No one in their right mind would pull a USB cable from a
running system for no good reason. However, the system was locked up.
Commands already running (mpstat, sar, etc.) continued to run, but
stopping a running command (back to the root # prompt) and then trying
to run another command just hung. No panic, no nothing until I pulled
the USB cable. Then Trap E. The only alternative to pulling the cable
was a hard reset. Scary stuff when running RAID-0 over four disks.

> But these features are not things
> you can expect SCO to support well: it requires extensive testing
> across a broad variety of consumer hardware, for an operating system
> that was *not* sold for consumer use, it was sold for servers. One
> could reasonably expect servers to use SCSI, not USB, or to have the
> budget for an external NAS.
>
> Rather than spending dozens of hours debugging this, have you
> considered simply investing in an NFS capable NAS? Or running your
> storage disk on another, more contemporary and flexible OS such as
> Linux?

As I posted in response to Brian (sorry about the "Bryan"!!), I like FTP
with BackupEDGE; it works well. The problem is how to rotate backup
media off-site for protection against fire, flood, equipment theft, etc.

Microlite supports Amazon S3 backup, but I'm not a fan of placing
sensitive client data in their anonymous hands. Not to mention
the long time it would take to back up 40-60 GB over any
reasonable Internet connection.
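
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The uplink speeds and the 80% efficiency factor are assumptions picked for illustration, not measurements from any client site, and the sizes are treated as decimal gigabytes.

```python
def transfer_hours(size_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to upload size_gb (decimal GB) at uplink_mbps.

    efficiency is an assumed fraction of the nominal line rate that is
    actually usable after protocol overhead; 0.8 is a guess, not a measurement.
    """
    if uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("uplink_mbps must be > 0 and efficiency in (0, 1]")
    bits_to_send = size_gb * 8e9                      # 1 GB = 8e9 bits (decimal)
    usable_bits_per_sec = uplink_mbps * 1e6 * efficiency
    return bits_to_send / usable_bits_per_sec / 3600.0


if __name__ == "__main__":
    # Hypothetical uplink speeds in Mbit/s, chosen only for illustration.
    for size_gb in (40, 60):
        for uplink_mbps in (0.768, 1.5, 5.0):
            hours = transfer_hours(size_gb, uplink_mbps)
            print(f"{size_gb} GB at {uplink_mbps} Mbit/s: about {hours:.0f} hours")
```

Even with compression cutting the volume, a full backup at these rates would run for days rather than overnight.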

If I can make it work, it's just another tool in my belt.
If not, then I know not to try it on a client's system. And
the documentation on the problems I've encountered will lend
authority to my possible future recommendation to move to a
new OS or hardware.

Nico Kadel-Garcia

Nov 8, 2009, 7:16:14 AM

Got it. That's what I'd use "rsnapshot" for, with the server on a
contemporary and supportable OS. The clients need only rsync and SSH,
plus a little tool, "validate-rsync" (discoverable via Google), that
restricts the SSH keys to rsync use only. I mirror live operating
systems to separate systems with removable hard drives, with identical
files hard-linked together across the copies for efficiency.
Then I tape backup or duplicate *those* to a remote site, to keep the
backup load off my core servers. That way, if I need an off-site
recovery, I can recover the remote tape or drive for much faster
local access. If you need commercial grade security, "Iron Mountain"
provides exactly this sort of service.
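
For anyone unfamiliar with the hard-link trick, here is a minimal Python sketch of the idea. It is not rsnapshot's own code (rsnapshot is a Perl script that drives rsync); the function name `snapshot` and its parameters are made up for illustration, and it assumes the destination directory is new and sits on the same file system as the previous snapshot.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy the tree under source into dest (assumed not to exist yet).

    A file whose contents are identical to the same relative path under
    previous is hard-linked to that older copy instead of being copied,
    so unchanged files take no extra disk space. Symlinks are followed
    and special files are not handled; this is only a sketch.
    """
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = dest / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the older copy's data
            else:
                shutil.copy2(src, dst)   # new or changed: store a fresh copy
```

Calling `snapshot(Path("/data"), Path("/backup/day1"), previous=Path("/backup/day0"))` (paths hypothetical) would leave a complete-looking tree in `day1` while storing only the files that changed since `day0`. Each snapshot can then be deleted on its own, since a file's data is freed only when its last hard link goes away.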