Does anyone know what SCO is attempting to do when calling int13 with
AH=FA? Could it really be checking for PC Tools??? I'm wondering
primarily to see how critical programming this function of INT13
might be. (If it is, maybe I'll try my hand at it...)
Thanks for any info you might have...
Carl
Boggle!
Sounds more like a botch, than a bochs.
Would it not be simpler to just install OSR5 on a different machine...?
Or have I missed the point completely?
--
Richard Howlett
mailto:ric...@howie.org.uk
There's a thread regarding VMWare in this news group which says basically
the same thing. For the moment, the OS is trying to do some things which
the emulators can't catch, thus it fails.
Bela was kind enough to go into some of the details in that thread. Look
there first.
I've not seen or heard of 'bochs' before, however. How about providing a URL
so we can go have a look (if we're interested).
bkx
This is apparently trying to detect LAN boot PROMs supplied by either
LANWorks or Dirk Koeppen EDV-Beratungs-GmbH. The code expects these
calls to return with the carry flag set if the PROMs it's probing for
are not present.
I believe that is the defined behavior for any case where a supported
BIOS API interrupt is being called with an unsupported subfunction -- it
should return with general purpose registers unchanged, and carry set.
(Not 100% sure about the part about general purpose registers being left
unchanged.)
Bochs should automatically implement this for all unknown subfunctions.
You should only be having a problem if Bochs actually tries to supply
some other API that would respond to INT 13 function FA (and apparently
it doesn't, or it wouldn't give you an error saying that it doesn't know
that function...)
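
A minimal Python sketch of the fallback described above, assuming a toy register model. The names Regs, HANDLERS and int13 are hypothetical and are not bochs code. An unknown subfunction comes back with carry set and the general purpose registers untouched, which is what a caller probing for AH=FA would see when no such PROM is present. Many real BIOSes also put an "invalid function" status of 01h in AH; the sketch leaves AH alone to match the description in this thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Regs:
    """Toy model of the registers a BIOS call sees (hypothetical)."""
    ax: int = 0
    bx: int = 0
    cx: int = 0
    dx: int = 0
    carry: bool = False


def _reset_disk(regs):
    # AH=00h: pretend the reset worked, clear AH and the carry flag.
    return replace(regs, ax=regs.ax & 0x00FF, carry=False)


# Hypothetical table of the subfunctions this toy BIOS implements, keyed by AH.
HANDLERS = {0x00: _reset_disk}


def int13(regs):
    """Dispatch on AH; unknown subfunctions return with carry set."""
    ah = (regs.ax >> 8) & 0xFF
    handler = HANDLERS.get(ah)
    if handler is None:
        # Unsupported subfunction: registers unchanged, carry set.
        return replace(regs, carry=True)
    return handler(regs)


# The probe from this thread: AH=FAh on a BIOS that does not know it.
probe = int13(Regs(ax=0xFA00, dx=0x0080))
```

With this table, `probe.carry` is true and `probe.ax` is still 0xFA00, so the caller concludes that the PROM it was looking for is absent and carries on.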
>Bela<
> "Carl Sopchak" <carl.s...@cegis123.com> wrote in message
> news:c12f3009.02042...@posting.google.com...
> > I've been trying to install OSR5 on top of linux using 'bochs' (a
> > pentium emulator). During the boot of the install disk (and if I
> > recall correctly, any OSR5 bootable diskette), I get an error from
> > bochs stating that it doesn't know function AH=FA for interrupt 13
> > (I'm paraphrasing). I've been working with the bochs people to get
> > around this (and a hack that ignores this function succeeds to boot
> > the floppy). I've been trying to find information on what SCO is
> > trying to do with this call, but all I've run across is a mention of
> > PC Tools VSAFE and VWATCH functions.
> >
> > Does anyone know what SCO is attempting to do when calling int13 with
> > AH=FA? Could it really be checking for PC Tools??? I'm wondering
> > primarily to see how critical programming this function of INT13
> > might be. (If it is, maybe I'll try my hand at it...)
>
> There's a thread regarding VMWare in this news group which says basically
> the same thing. For the moment, the OS is trying to do some things which
> the emulators can't catch, thus it fails.
Actually the problems with VMWare are of a different nature. I haven't
heard of any BIOS incompatibilities with VMWare; the problems are in
hardware compatibility issues -- OpenServer's IDE and floppy drivers do
peculiar things at the hardware level, that aren't expected by VMWare.
> Bela was kind enough to go into some of the details in that thread. Look
> there first.
>
> I've not seen or heard of 'bochs' before, however. How about providing a URL
> so we can go have a look (if we're interested).
I'd also be interested in a summary of issues encountered between Bochs
and OpenServer...
>Bela<
*blink* ok. My memory is failing me :(
First, I'm not installing on another PC because I only have one client
(now fairly inactive) that runs SCO, and it is not cost justified.
Worst case is that I'll keep my old dual P200 PC to support them. I
just thought it'd be easier and more convenient if I could run both
linux & sco at the same time on my new p4 1.7GHz box...
Info on bochs: It's a Pentium emulator, similar to VMWare in that it
allows you to install guest OS's on linux. Its home page is
http://bochs.sourceforge.net. BTW, it's pronounced like the English
word "box". Plex86 is another VM, if anyone's interested
(http://www.plex86.org). Both are still "young", but there are people
using both. (There's a lot of "cross development" between them, too.)
And both are Open Source (free). Also, linux-abi is "native linux"
support for running SCO binaries. It works with some of the programs
that I *must* run, but not all (seg faults). That's why I'm trying
bochs (and plex86, and wine [for windoze pgms, www.winehq.org]; I gave
up on VMWare because they say they have no plans on supporting OSR5).
I'll check out the VMWare thread. Could provide useful info, even if
it is not directly related to Int13/FAh. Thanks for the pointer...
As far as other issues trying to run OSR5 on bochs, I've got a hacked
bios that gets around initial boot problems, but the kernel
initialization causes it to crash at F dptrinit (I once made it to F
fdi_init). I know I've run across the list describing these
initialization steps, but can't put my finger on it. Can anyone point
me to it?
Bela, thanks for the info on the LAN boot proms. Do you mind if I
copy your response (with credit, of course) into the bochs bug report
that I initiated on the bochs site? I'm sure it would be helpful to
their development.
If people are interested, I could keep this newsgroup updated on my
attempts to get OSR5 running on Linux. Let me know.
Thanks again,
Carl
> I'll check out the VMWare thread. Could provide useful info, even if
> it is not directly related to Int13/FAh. Thanks for the pointer...
It might. Also, your experiences with bochs might be helpful to the
people struggling with VMWare -- all indications are that VMWare has no
interest in trying to work with OpenServer, so maybe those people should
stop beating their heads against it and use something that tries to
work.
> As far as other issues trying to run OSR5 on bochs, I've got a hacked
> bios that gets around initial boot problems, but the kernel
What initial boot problems are you talking about? Just the INT13:FA
problem? That is, I believe, a bug in the bochs BIOS. It should accept
any unknown BIOS call, returning with carry set (and as I said,
registers unchanged, but I'm not so sure about that. Probably couldn't
hurt, though.)
Now, bochs probably has a really good reason not to do that: in order
to ferret out yet-unknown calls that they need to implement in order to
improve the emulation. But that should be a setting that you control --
you should be able to set it to behave like a system that doesn't have
the service; or to balk, like it currently does. You would only use the
latter setting if you were a bochs developer (or were trying to debug a
failure).
> initialization causes it to crash at F dptrinit (I once made it to F
> fdi_init). I know I've run across the list describing these
> initialization steps, but can't put my finger on it. Can anyone point
> me to it?
See http://osr5doc.ca.caldera.com:457/HANDBOOK/tshootD.kinilet.html for
documentation on the boot letter sequence. But it only briefly covers
the "F" series, which is where you're likely to have the most trouble.
When it says "F xxinit", it's initializing driver "xx". The ones that
are most likely to cause you trouble are hardware drivers. The two you
mention are the drivers for older DPT SCSI and RAID host adapters, and
for the Future Domain 18x0 chipset (Future Domain 16x0, 600, 700, and
Adaptec 292x host adapters).
bochs probably emulates a single variety of SCSI host adapter, something
that has simple hardware and is easy to emulate. As long as it isn't
one of those two, you can just disable those host adapter drivers. The
syntax is:
Boot
: defbootstr disable=dptr,fdi_
OpenServer drivers have two names, which are _usually_ the same, but in
the case of that Future Domain driver they are different. One is "fdhb"
and the other is "fdi_". I'm not sure which you must specify in the
"disable" statement. Actually it looks like there's no harm in
attempting to disable unrecognized drivers, so just give it:
Boot
: defbootstr disable=dptr,fdhb,fdi_
If it hangs at other "F xxinit" points, add those to your disable list.
Once you've successfully installed, the host adapter drivers that don't
apply to your system will be linked out of the kernel, so you won't have
to keep doing this.
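
A small Python sketch of the bookkeeping described above, assuming the boot letter lines look exactly like "F xxinit". The helper names are hypothetical. It strips the trailing "init" to recover the driver name and builds the string to type at the Boot prompt, with room for alternate names such as "fdhb" that do not appear in the boot letters.

```python
def driver_from_boot_line(line):
    """Return the driver name from an 'F xxinit' boot line, or None."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "F" or not parts[1].endswith("init"):
        return None
    return parts[1][:-len("init")]


def boot_string(hung_lines, extra=()):
    """Build a defbootstr line that disables every driver seen hanging."""
    names = []
    for line in hung_lines:
        name = driver_from_boot_line(line)
        if name and name not in names:
            names.append(name)
    # Alternate driver names (e.g. "fdhb") are passed in by hand.
    for name in extra:
        if name not in names:
            names.append(name)
    if not names:
        return "defbootstr"
    return "defbootstr disable=" + ",".join(names)


line_to_type = boot_string(["F dptrinit", "F fdi_init"], extra=["fdhb"])
```

Each new hang at another "F xxinit" point just adds one more entry to the list passed in.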
> Bela, thanks for the info on the LAN boot proms. Do you mind if I
> copy your response (with credit, of course) into the bochs bug report
> that I initiated on the bochs site? I'm sure it would be helpful to
> their development.
That's fine with me. Plus this message with the more details and
ranting. ;-}
> If people are interested, I could keep this newsgroup updated on my
> attempts to get OSR5 running on Linux. Let me know.
Yes, please do. A fair number of people have asked about this sort of
thing, and everything's archived forever (until google gets bored with
it) in groups.google.com...
>Bela<
Bela Lubkin <be...@caldera.com> wrote in message news:<2002042318...@mammoth.ca.caldera.com>...
> Carl Sopchak wrote:
>
> > I'll check out the VMWare thread. Could provide useful info, even if
> > it is not directly related to Int13/FAh. Thanks for the pointer...
>
> It might. Also, your experiences with bochs might be helpful to the
> people struggling with VMWare -- all indications are that VMWare has no
> interest in trying to work with OpenServer, so maybe those people should
> stop beating their heads against it and use something that tries to
> work.
VMWare's stated policy to not support OSR5 is why I stopped even
trying to get that to work. bochs and plex86, as far as I can tell,
both seem to be in active development (something you kinda have to
watch out for with Open Source projects, I've found...). I was really
quite pleased with bochs' responsiveness to the boot issues. There are
new issues that need resolving, though. We'll see if the
responsiveness is similar <grin>.
>
> > As far as other issues trying to run OSR5 on bochs, I've got a hacked
> > bios that gets around initial boot problems, but the kernel
>
> What initial boot problems are you talking about? Just the INT13:FA
> problem? That is, I believe, a bug in the bochs BIOS. It should accept
> any unknown BIOS call, returning with carry set (and as I said,
> registers unchanged, but I'm not so sure about that. Probably couldn't
> hurt, though.)
I couldn't boot from the floppy OR the cd-rom. The CD-Rom problem, if
I recall correctly, was that it was checking for a boot signature
(0x55AA, I believe) that wasn't found. It also had the INT13:FA
problem, since it ended up booting from a floppy image that was on the
CD. Both have been fixed with the hacks made by the bochs developer.
They should be committed to the bochs CVS in the near future (so I was
told), and I would imagine made part of the next release.
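
For reference, a Python sketch of the usual boot signature check, assuming the signature in question is the classic one in the last two bytes of a 512-byte boot sector (0x55 at offset 510, 0xAA at offset 511). The thread does not say which structure was actually being checked, so treat this as an illustration only; the function name is hypothetical.

```python
def has_boot_signature(sector):
    """True if a boot sector ends with the 0x55, 0xAA signature bytes."""
    return len(sector) >= 512 and sector[510] == 0x55 and sector[511] == 0xAA


# A blank sector fails the check; one with the signature appended passes.
blank = bytes(512)
signed = bytes(510) + b"\x55\xaa"
```

Read as a little-endian 16-bit word the same two bytes come out as 0xAA55, which is why both spellings turn up in discussions like this one.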
>
> Now, bochs probably has a really good reason not to do that: in order
> to ferret out yet-unknown calls that they need to implement in order to
> improve the emulation. But that should be a setting that you control --
> you should be able to set it to behave like a system that doesn't have
> the service; or to balk, like it currently does. You would only use the
> latter setting if you were a bochs developer (or were trying to debug a
> failure).
I agree that it should be configurable, and so did the bochs developer
when I suggested it to him. He said he added it to his to-do list...
I found the letter descriptions in the handbook. I ran across another
one of your posts (from quite a while back) that suggested the
disable=, which I used (although I used disable=fdi, not disable=fdi_)
and seemed to work. Now, however, OSR5 doesn't find a hard disk. I
have a post on the bochs discussion forums (and added a bug to their
tracking list) about this, trying to find out what controller they
emulate. It appears that it's a single-channel IDE controller,
though... (Limits you to one HD + one CD, or two HDs.)
Just had a thought... Are there any limits to the geometry of a drive
that OSR5 can use? bochs uses "drive images" (files) to emulate a
hard disk, and you tell bochs what the geometry is. Maybe that's
where the problem is? The image I'm using has a geometry of
C/H/S=1015/16/63 (512Mb). (I tried using biosgeom [from another of
your long-ago posts], to no avail...)
>
> > Bela, thanks for the info on the LAN boot proms. Do you mind if I
> > copy your response (with credit, of course) into the bochs bug report
> > that I initiated on the bochs site? I'm sure it would be helpful to
> > their development.
>
> That's fine with me. Plus this message with the more details and
> ranting. ;-}
I was going to post this just on the bug report. I'll point them to
this thread on google, in case they're interested. Thanks.
>
> > If people are interested, I could keep this newsgroup updated on my
> > attempts to get OSR5 running on Linux. Let me know.
>
> Yes, please do. A fair number of people have asked about this sort of
> thing, and everything's archived forever (until google gets bored with
> it) in groups.google.com...
Will do. I'll keep this thread alive with new issues that pop up, and
when I get it going (hopefully in the not too, too distant future!), I'll
post a summary of what I had to do.
>
> >Bela<
Carl
Just out of curiosity, why are you using bochs under linux instead of
installing OSR5 [presumably 5.0.6] onto its own partition on the disk? If
you need to use OSR5 only occasionally, would it really disrupt things to
reboot your system at those times?
>
> Thanks again,
>
> Carl
It might <grin>! It's not real uncommon for me to have several things
going at once, some long-running. Also getting out of everything
that's running (always have mail client, calendar, shell prompt, &
netscape going, often have several others going), boot to OSR5, shut
down OSR5 when I'm done, reboot linux, and then try to remember what I
was doing before being interrupted is a pain and time consuming. Just
thought it'd be easier if I could run OSR5 in an emulator. It'd be
cheaper for my client, as well...
Oh, yeah, I don't have 5.0.6 (just 5.0.5), so there would be an
upgrade cost. (From what I understand, 505 won't run on a P4. I'm
hoping there's no problem running it on an emulated Pentium, even if
that emulation is executing on a P4. What IS the issue with 505 and
P4's anyway?)
Besides, there are other reasons (primarily educational) for going
this route...
Carl
Just an aside here. If you do have a problem running OSR5 when in
the emulator mode you are still going to have to reboot into OSR5
and try to duplicate the problem in order to determine whether it
is an OSR5 problem or an emulator problem.
The 'not running' on a P4 is down to the nature of the P4: it can run
hot, and the timing loops that are aware of the P4 take care of this
in 5.0.6. As Jeffl pointed out the thermal control circuits in the
P4 cause it to just run slower and slower and he watched one go
from about 2GHz to under 600MHz with each reboot [that is from
memory]. Darn sight better than the AMD that will actually smoke,
and burn the chip up in about 1 second if the fan fails.
See the gory pictures at www.tomshardware.com. AMD has fixed that
with a fix that is simply not useable in a Unix system. If the
heat rises - as if the fan failed - the power supply will just shut
off. I'd hate to have to run fsck on a few hundred GB of drive
space if that happened. On the P4 it just runs slower.
With warranted low-end semi-brand name white box systems with NO
operating system [surprise surprise] going for $399 you can almost
afford to buy a system to do nothing but experiment with.
Brand name is Microtel. 1GHz Celeron, 128MB RAM, 40GB ultra-dma,
cdrom and 56K modem, no monitor $399.
A system with a 1.6GHz P4, 400MHz FSB, and 20GB ultra-dma HD is
$498. Add $100 more for 256MB ram and a 40GB HD.
All without the MS $100 [approx] SW penalty.
From the place you'd never expect. www.walmart.com.
Finding any system w/o Windows without having to build it yourself
is hard to do anymore.
>Besides, there are other reasons (primarily educational) for going
>this route...
That's one of the best reasons. Cheaper too than any other
educational method.
Bill
--
Bill Vermillion - bv @ wjv . com
I doubt it :-)
Considering that you could buy a separate box for this for a few hundred
dollars.. how much time have you spent on this?
>
> Oh, yeah, I don't have 5.0.6 (just 5.0.5), so there would be an
> upgrade cost. (From what I understand, 505 won't run on a P4. I'm
> hoping there's no problem running it on an emulated Pentium, even if
> that emulation is executing on a P4. What IS the issue with 505 and
> P4's anyway?)
>
> Besides, there are other reasons (primarily educational) for going
> this route...
Sure- THAT I agree with :-)
--
Tony Lawrence
SCO/Linux Support Tips, How-To's, Tests and more: http://pcunix.com
Free Unix/Linux Consultants list: http://pcunix.com/consultants.html
> I found the letter descriptions in the handbook. I ran across another
> one of your posts (from quite a while back) that suggested the
> disable=, which I used (although I used disable=fdi, not disable=fdi_)
> and seemed to work. Now, however, OSR5 doesn't find a hard disk. I
> have a post on the bochs discussion forums (and added a bug to their
> tracking list) about this, trying to find out what controller they
> emulate. It appears that it's a single-channel IDE controller,
> though... (Limits you to one HD + one CD, or two HDs.)
Well, that's pretty much the same area that trips up VMWare. The OSR5
IDE driver ("wd") has rather peculiar code to detect IDE controllers and
drives. The driver dates back to the WD1010 (full-length 16-bit ISA
ST506 controller), and possibly earlier than that. It "knows" about a
lot of ancient controller quirks and takes steps to avoid them. These
steps apparently still work with the latest in "real" IDE hardware, but
not with most emulation environments.
> Just had a thought... Are there any limits to the geometry of a drive
> that OSR5 can use? bochs uses "drive images" (files) to emulate a
> hard disk, and you tell bochs what the geometry is. Maybe that's
> where the problem is? The image I'm using has a geometry of
> C/H/S=1015/16/63 (512Mb). (I tried using biosgeom [from another of
> your long-ago posts], to no avail...)
Those numbers are certainly well within any limits. I think 5.0.5 balks
at IDE drives larger than 8GB, and 5.0.6 has a higher limit, but I'm not
really sure. But a 512MB image with the above geometry should be fine.
Might be a bit cramped for an install, but you're a long way from having
to worry about that yet.
>Bela<
Are these steps documented somewhere? If bochs knew what they were,
perhaps they can accommodate them...
> Might be a bit cramped for an install, but you're a long way from having
> to worry about that yet.
Hope I have to, at some point <grin>...
>
> >Bela<
Carl
Let me clarify: I wouldn't reboot out of linux into osr5; I'd turn on my
osr5 box (assuming dust hasn't destroyed it <grin>)...
Carl
I don't need a separate box - I got one that works fine with OSR5
already loaded up. Just trying to make life a little easier for me,
and a bit quicker response when (if?) my client calls. I don't charge
clients for time I spend doing this kind of stuff, because it has
little or no value to them. (So it IS cheaper for THEM <grin>.) I've
spent a few solid days on it so far, and I'm willing to spend several
more. If it doesn't work out, oh well... As I said...
> > Besides, there are other reasons (primarily educational) for going
> > this route...
>
>
> Sure- THAT I agree with :-)
Carl
>Carl Sopchak wrote:
>
>> "Bob Bailin" <72027...@compuserve.com> wrote in message
>> news:<aa6db2$r9i$1...@suaar1ac.prod.compuserve.com>...
>>>Just out of curiosity, why are you using bochs under linux
>>>instead of installing OSR5 [presumably 5.0.6] onto its
>>>own partition on the disk? If you need to use OSR5 only
>>>occasionally, would it really disrupt things to reboot your
>>>system at those times?
>> It might <grin>! It's not real uncommon for me to have several
>> things going at once, some long-running. Also getting out of
>> everything that's running (always have mail client, calendar,
>> shell prompt, & netscape going, often have several others
>> going), boot to OSR5, shut down OSR5 when I'm done, reboot
>> linux, and then try to remember what I was doing before being
>> interrupted is a pain and time consuming. Just thought it'd be
>> easier if I could run OSR5 in an emulator. It'd be cheaper for
>> my client, as well...
>I doubt it :-)
>Considering that you could buy a separate box for this for a few
>hundred dollars.. how much time have you spent on this?
And another alternative that I run on a test machine is to have
replaceable IDE drives. Dedicate a drive to an OS and don't worry
about multiple boot. Need to change, shutdown, swap tray, and
reboot. As you indicate, a separate box is an option; I use that too. The one with
the swappable drive can be ME, 98, UW7, OSR5, RH7.7, and a couple
of others. There are so many 4-8GB drives not being used by so
many that you can often just get them given to you and those are
large enough for test applications. And you really don't care if
you toast one or not playing around.
Let me clarify your clarification with my own clarification :-)
You had said you wanted to run OSR5 under the emulator.
My comment was that if you had OSR5 problems, then you would have
to boot into a native OSR5 and NOT run it under an emulator to
determine if the problem was in OSR5 or in the way the emulator
handled it.
Basic Trouble-Shooting 101. Start at the lowest level with the
least amount of components and trace each step one at a time and
never skip anything - even if in your heart you 'know' that
couldn't be the problem.
I've got the install CD booting, I've disabled the fdi and dptr
drivers, but now I get "WARNING: hd: no root disk controller found".
The emulated bios shows "IDE0-0: Generic 1234 ATA-2 Hard-Disk device"
for the disk image that it uses, so I know OSR should be using the wd
driver (right??).
Using biosgeom shows the disk as 1014/16/63; the program that I used
to create the disk image stated _1015_/16/63. I'm assuming that the
difference in cylinders is one's counting from 0, the other from 1.
(I'll try to confirm this...)
I've tried the following boot string options (appended to "defbootstr
disable=fdi,dptr"):
- Result is "WARNING: hd: no root disk controller found"
Sdsk=wd(0,0,0)
Sdsk=wdha(0,0,0) {later found this doesn't make sense}
pci.bios
phoenix486
scsi.noscan
wd.noscan {this doesn't seem to make much sense, either...}
wd.udma=off {never know...}
wd.geom=[x]
where [x] is one of "physical", "logical", "translate",
"1014,16,63",
"1015,16,63"
wd.debug[=x]
where [=x] is one of "" (no parm), "=udma", "=all"
- Result is "Test Open Failed "hd=wd", No Root Disk Controller"
hd=wd
hd=Sdsk Sdsk=wd(0,0,0) {except "hd=Sdsk" replaced "hd=wd"}
These last two might give me a clue. Upon looking at the log file
created by bochs (with all sorts of messaging options turned on), I
found "write cmd 0x70 (SEEK) not supported". Is this how OSR
determines if a disk is present??
If that's not the case, are there other boot parameters (in particular
for the wd driver) that might tell OSR exactly where to look?
Any help would be appreciated...
Thanks,
Carl
> I've got the install CD booting, I've disabled the fdi and dptr
> drivers, but now I get "WARNING: hd: no root disk controller found".
>
> The emulated bios shows "IDE0-0: Generic 1234 ATA-2 Hard-Disk device"
> for the disk image that it uses, so I know OSR should be using the wd
> driver (right??).
>
> Using biosgeom shows the disk as 1014/16/63; the program that I used
> to create the disk image stated _1015_/16/63. I'm assuming that the
> difference in cylinders is one's counting from 0, the other from 1.
> (I'll try to confirm this...)
That's probably the issue. Not a real problem anyway. (If they truly
disagreed and the OS was _higher_ then it would eventually -- when the
"disk" was >99% full -- attempt to write past the end of the image, get
some sort of failure, and behave weirdly since it wouldn't expect that
-- but since the OS is coming in lower, at worst it'll just miss out on
one cyl's worth of storage.)
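
The arithmetic behind this can be sketched in Python, assuming 512-byte sectors. The function names are hypothetical. It works out the size of the image from its C/H/S geometry, the size the OS sees with one cylinder fewer, and whether the OS view stays inside the image.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_bytes(cylinders, heads, sectors):
    """Capacity in bytes of a drive with the given C/H/S geometry."""
    return cylinders * heads * sectors * SECTOR_BYTES


def os_view_is_safe(os_geom, image_geom):
    """True if the OS can never address past the end of the image."""
    return chs_bytes(*os_geom) <= chs_bytes(*image_geom)


image_size = chs_bytes(1015, 16, 63)   # geometry reported by the image tool
os_size = chs_bytes(1014, 16, 63)      # geometry reported by biosgeom
unused = image_size - os_size          # one cylinder's worth of storage
```

Here `image_size` is 523,837,440 bytes (a little under 500 MiB), and `unused` is 516,096 bytes, i.e. 16 heads times 63 sectors of 512 bytes. Swapping the two geometries makes `os_view_is_safe` return false, which is the write-past-the-end case described above.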
> I've tried the following boot string options (appended to "defbootstr
> disable=fdi,dptr"):
>
> - Result is "WARNING: hd: no root disk controller found"
> Sdsk=wd(0,0,0)
> Sdsk=wdha(0,0,0) {later found this doesn't make sense}
Neither does "Sdsk=wd(0,0,0)". OSR5 does _not_ see IDE disks as any
sort of SCSI device. They are their own thing, unrelated to SCSI and
not using any part of the OSR5 SCSI driver framework. This confusion
comes about partly because OSR5 _does_ see other IDE devices as SCSI
(anything "ATAPI"). ATAPI is basically a slightly braindamaged version
of SCSI, hidden behind a translation layer. OSR5 retranslates that back
to SCSI so that you can use the OSR5 SCSI peripheral drivers with ATAPI
devices.
> pci.bios
I'm not familiar with this one.
> phoenix486
I was the cause of this -- had an old 486 machine with a buggy Phoenix
BIOS, and since I was in a position to influence the development
process, I was able to get a workaround in. I'm sure no current machine
or even an emulator BIOS could possibly duplicate the stupid bug that
BIOS had.
> scsi.noscan
> wd.noscan {this doesn't seem to make much sense, either...}
These control what sorts of SCSI or SCSI-like probing is done on the
buses. Shouldn't affect IDE hard disks, which aren't even vaguely SCSI.
> wd.udma=off {never know...}
That might help, or (apparently) not.
> wd.geom=[x]
> where [x] is one of "physical", "logical", "translate",
> "1014,16,63",
> "1015,16,63"
This doesn't seem like a geometry issue. (Might have those next, if you
can get past the recognition issue.)
> wd.debug[=x]
> where [=x] is one of "" (no parm), "=udma", "=all"
> - Result is "Test Open Failed "hd=wd", No Root Disk Controller"
That's too bad. "wd" really needs more debugging output in paths other
than UDMA setup.
> hd=wd
> hd=Sdsk Sdsk=wd(0,0,0) {except "hd=Sdsk" replaced "hd=wd"}
>
> These last two might give me a clue. Upon looking at the log file
> created by bochs (with all sorts of messaging options turned on), I
> found "write cmd 0x70 (SEEK) not supported". Is this how OSR
> determines if a disk is present??
"hd=wd" is what you want. It also should make no difference -- you are
only forcing what's supposed to be determined automatically by whether
the various hard disk drivers (primarily "wd" for IDE and "Sdsk" for
SCSI disks) actually found any disks. So if "wd" recognition succeeded,
you wouldn't need "hd=wd", and if it failed, "hd=wd" wouldn't help.
But, anyway, _yes_, this is part of how "wd" detects hard disks. It
does a "seek 0" and if it succeeds, goes on to some other tests. If it
fails then there is definitely no hard disk.
So, to get past this hurdle, bochs needs to support the IDE SEEK
command. Which should be quite simple. For OSR5's purposes, it should
return success if there is a(n emulated) _hard disk_ drive (not an
emulated ATAPI CD/tape/whatever) at the given controller/drive
coordinates, and the requested track is within its range. Besides
success/failure, it doesn't actually have to _do_ anything. IDE SEEK is
just supposed to position the head -- the OS is supposed to know what
it's doing, e.g. optimizing performance by moving the head near where
it's anticipated the next I/O will be done. If the actual location was
known then it would just issue a READ or WRITE. Since there's no "head"
with an emulated drive, any SEEK that's within range just "succeeds".
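
A rough Python sketch of the SEEK behaviour described above. This is not bochs code; the Drive class and seek function are hypothetical. The status and error bit values are the standard ATA ones (DRDY 0x40, DSC 0x10, ERR 0x01, ABRT 0x04, IDNF 0x10). Reporting a missing drive as an aborted command is a simplification, since a real empty slot would not answer at all.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STATUS_DRDY = 0x40  # drive ready
STATUS_DSC = 0x10   # drive seek complete
STATUS_ERR = 0x01   # error, details in the error register
ERROR_ABRT = 0x04   # command aborted
ERROR_IDNF = 0x10   # requested address not found


@dataclass
class Drive:
    kind: str        # "disk" for a hard disk, "atapi" for CD/tape/etc.
    cylinders: int
    heads: int
    sectors: int


def seek(drive: Optional[Drive], cylinder: int, head: int) -> Tuple[int, int]:
    """Emulate IDE SEEK (0x70); return (status register, error register)."""
    if drive is None or drive.kind != "disk":
        # Simplification: no hard disk at these coordinates, so abort.
        return STATUS_DRDY | STATUS_ERR, ERROR_ABRT
    if not (0 <= cylinder < drive.cylinders and 0 <= head < drive.heads):
        return STATUS_DRDY | STATUS_ERR, ERROR_IDNF
    # There is no head to move on an emulated drive: just report success.
    return STATUS_DRDY | STATUS_DSC, 0


disk = Drive("disk", cylinders=1015, heads=16, sectors=63)
status, error = seek(disk, cylinder=0, head=0)
```

For the "seek 0" probe this gives a status of 0x50 with a clear error register. Nothing else needs to be remembered, because a later READ or WRITE carries its own address.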
> If that's not the case, are there other boot parameters (in particular
> for the wd driver) that might tell OSR exactly where to look?
Unfortunately, no. You would like an override that tells it "look you
idiot, there _is_ an IDE drive at primary/master, don't do any of this
silly voodoo trying to detect it". That override doesn't exist.
See if you can get IDE SEEK support in bochs, then we'll bounce off the
next hurdle...
>Bela<
Carl
Bela Lubkin <be...@caldera.com> wrote in message news:<2002042603...@mammoth.ca.caldera.com>...
> Carl Sopchak wrote:
> > pci.bios
>
> I'm not familiar with this one.
From boot(HW):
pci.bios32
Uses PCI BIOS ROM 32-bit routines, if the BIOS supports them,
instead of accessing the hardware directly.
It was the "instead of accessing the hardware directly" that gave me a
glimmer of hope... (Alas, to no avail...)
>
> "hd=wd" is what you want. It also should make no difference -- you are
> only forcing what's supposed to be determined automatically by whether
> the various hard disk drivers (primarily "wd" for IDE and "Sdsk" for
> SCSI disks) actually found any disks. So if "wd" recognition succeeded,
> you wouldn't need "hd=wd", and if it failed, "hd=wd" wouldn't help.
I figured as much, but since I had nothing else to try...
>
> But, anyway, _yes_, this is part of how "wd" detects hard disks. It
> does a "seek 0" and if it succeeds, goes on to some other tests. If it
> fails then there is definitely no hard disk.
>
> So, to get past this hurdle, bochs needs to support the IDE SEEK
> command. Which should be quite simple. For OSR5's purposes, it should
> return success if there is a(n emulated) _hard disk_ drive (not an
> emulated ATAPI CD/tape/whatever) at the given controller/drive
> coordinates, and the requested track is within its range. Besides
> success/failure, it doesn't actually have to _do_ anything. IDE SEEK is
> just supposed to position the head -- the OS is supposed to know what
> it's doing, e.g. optimizing performance by moving the head near where
> it's anticipated the next I/O will be done. If the actual location was
> known then it would just issue a READ or WRITE. Since there's no "head"
> with an emulated drive, any SEEK that's within range just "succeeds".
I'll copy this to the bochs bug I initiated for this specific issue
(which, BTW, has had no reply from the bochs team :-( ). Perhaps it
will get them to act. Or maybe I'll try coding this. (If it's more
than a success-indicated return [e.g., keeping track of the seek for
subsequent reads], I might need some pointers...)
>
> > If that's not the case, are there other boot parameters (in particular
> > for the wd driver) that might tell OSR exactly where to look?
>
> Unfortunately, no. You would like an override that tells it "look you
> idiot, there _is_ an IDE drive at primary/master, don't do any of this
> silly voodoo trying to detect it". That override doesn't exist.
Is it me, or doesn't this capability make sense? It sure would be
nice to tell OSR "I have a stupid IDE drive on bus 0, master,
geometry=c/h/s. Use it!" Oh well...
>
> See if you can get IDE SEEK support in bochs,
I'll try...
> then we'll bounce off the
> next hurdle...
Great. Thanks again for the help.
>
> >Bela<
Now, I can't pretend that I have much clue as to if I coded the
function right, but the code was lifted from the seek portion of the
read command implementation for the hard drive, so I'm guessing it's
at least close. (I also found and briefly looked at the ATA-2 spec,
which is what is supposed to be implemented by bochs, and the spec and
code actually seem understandable [!] and consistent.)
Bela, what does OSR5 check for on return from the SEEK? What does it
do next to see if the drive exists? (Can you tell me the next several
steps without giving away company secrets?)
The bochs log file has (does this help?):
00290159500d[HD ] IO write to 01f3 = 01
00290159509d[HD ] IO write to 01f4 = 00
00290159518d[HD ] IO write to 01f5 = 00
00290159530d[HD ] IO write to 01f6 = a0
00290159833d[HD ] IO write to 01f7 = 70
00290159833d[HD ] write cmd 0x70 executed <== my debug stmt
00290159833d[HD ] concat_image_t.lseek(0)
00290159833d[HD ] SEEK completed <== my debug stmt
00292142586d[HD ] IO write to 03f6 = 04
00292142586d[HD ] hard drive: RESET
00292143282d[HD ] IO write to 03f6 = 00
00292143282d[HD ] Reset complete {DISK}
00293833375d[HD ] 8-bit read from 03f6 = 50 {DISK}
00293833390d[HD ] 8-bit read from 01f1 = 01 {DISK}
00378278000p[XGUI ] >>PANIC<< POWER button turned off.
(the last line is where I hit the virtual "Big Red Switch")
Thanks for any help you can give.
Carl
> Bela Lubkin <be...@caldera.com> wrote in message news:<2002042603...@mammoth.ca.caldera.com>...
> > So, to get past this hurdle, bochs needs to support the IDE SEEK
> > command. Which should be quite simple. For OSR5's purposes, it should
> > return success if there is a(n emulated) _hard disk_ drive (not an
> > emulated ATAPI CD/tape/whatever) at the given controller/drive
> > coordinates, and the requested track is within its range. Besides
> > success/failure, it doesn't actually have to _do_ anything. IDE SEEK is
> > just supposed to position the head -- the OS is supposed to know what
> > it's doing, e.g. optimizing performance by moving the head near where
> > it's anticipated the next I/O will be done. If the actual location was
> > known then it would just issue a READ or WRITE. Since there's no "head"
> > with an emulated drive, any SEEK that's within range just "succeeds".
>
> I'll copy this to the bochs bug I initiated for this specific issue
> (which, BTW, has had no reply from the bochs team :-( ). Perhaps it
> will get them to act. Or maybe I'll try coding this. (If it's more
> than a success-indicated return [e.g., keeping track of the seek for
> subsequent reads], I might need some pointers...)
As long as we're making progress without their help, it's probably
kinder not to ask them to jump. Even if they had immediately fixed
this, you would just have come back with the next issue. If, in the
end, you can go to them with a short list of precise changes to make,
they'll have a much easier time of it.
> > > If that's not the case, are there other boot parameters (in particular
> > > for the wd driver) that might tell OSR exactly where to look?
> >
> > Unfortunately, no. You would like an override that tells it "look you
> > idiot, there _is_ an IDE drive at primary/master, don't do any of this
> > silly voodoo trying to detect it". That override doesn't exist.
>
> Is it me, or doesn't this capability make sense? It sure would be
> nice to tell OSR "I have a stupid IDE drive on bus 0, master,
> geometry=c/h/s. Use it!" Oh well...
Right, that's what I was implying. But it doesn't currently exist and
would be a bit of a mess to add. It's on my mental list of things to do
to "wd" one of these years...
>Bela<
> Well, I tried coding the SEEK, and I know my code's being executed
> (via use of debug statements), but OSR5 still comes up with "no root
> disk"...
>
> Now, I can't pretend that I have much clue as to if I coded the
> function right, but the code was lifted from the seek portion of the
> read command implementation for the hard drive, so I'm guessing it's
> at least close. (I also found and briefly looked at the ATA-2 spec,
> which is what is supposed to be implemented by bochs, and the spec and
> code actually seem understandable [!] and consistent.)
I was trying to say that all your implementation needs to do is report
success if it should have been successful. i.e., in pseudocode:
    IDE_SEEK(drive, cyl):
        if (drive is not valid)
            return failure
        if (cyl is outside drive's range)
            return failure
        return success
An IDE SEEK positions the head. But an IDE READ or WRITE tells the
drive where to read or write -- there is no need to SEEK first. One
would only SEEK in a situation where you knew you were going to need
some data from "over there" in a short while, but not yet. You could
save a little time by moving the head before requesting the action.
Probably makes the most sense for writes -- you could easily have a
situation where you're going to write soon, but the data hasn't quite
finished being generated.
What I'm trying to say is, IDE SEEK has no real effect other than its
error checking (which is what "wd" is using it for) and performance.
But an emulated drive has no physical head to move, so the performance
effect is irrelevant. So your implementation should do nothing but the
error checks.
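For anyone who wants something to run, here is a minimal sketch of that check-only SEEK in Python. It is not the bochs code: `EmulatedDrive`, its fields and `ide_seek` are made-up names for illustration, and it assumes cylinders are numbered from 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmulatedDrive:
    """Hypothetical stand-in for an emulated device on an IDE channel."""
    is_hard_disk: bool  # False for an emulated ATAPI CD/tape/whatever
    cylinders: int      # number of cylinders in the emulated geometry


def ide_seek(drive: Optional[EmulatedDrive], cyl: int) -> bool:
    """Return True for success, False for failure. Nothing is moved."""
    # No device at these controller/drive coordinates, or not a hard disk.
    if drive is None or not drive.is_hard_disk:
        return False
    # Requested cylinder is outside the drive's range.
    if not 0 <= cyl < drive.cylinders:
        return False
    # There is no head to position, so an in-range SEEK just succeeds.
    return True
```

The only state consulted is the geometry; no file offset is touched, which is the whole point of the paragraph above.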
> Bela, what does OSR5 check for on return from the SEEK? What does it
> do next to see if the drive exists? (Can you tell me the next several
> steps without giving away company secrets?)
It's not the company secrets, it's the difficulty of understanding
exactly what the code does, with all its layers of function calls,
interrupts to be handled, etc.
> The bochs log file has (does this help?):
> 00290159500d[HD ] IO write to 01f3 = 01
> 00290159509d[HD ] IO write to 01f4 = 00
> 00290159518d[HD ] IO write to 01f5 = 00
> 00290159530d[HD ] IO write to 01f6 = a0
> 00290159833d[HD ] IO write to 01f7 = 70
> 00290159833d[HD ] write cmd 0x70 executed <== my debug stmt
> 00290159833d[HD ] concat_image_t.lseek(0)
> 00290159833d[HD ] SEEK completed <== my debug stmt
> 00292142586d[HD ] IO write to 03f6 = 04
> 00292142586d[HD ] hard drive: RESET
> 00292143282d[HD ] IO write to 03f6 = 00
> 00292143282d[HD ] Reset complete {DISK}
> 00293833375d[HD ] 8-bit read from 03f6 = 50 {DISK}
> 00293833390d[HD ] 8-bit read from 01f1 = 01 {DISK}
> 00378278000p[XGUI ] >>PANIC<< POWER button turned off.
> (the last line is where I hit the virtual "Big Red Switch")
Help me with the log format -- what are the digits and 'd' or 'p' before
each entry -- time in microseconds? What's the letter? And what does
the panic / "I hit the switch" stuff mean? Do you mean: nothing seemed
to be happening, so you gave up and hit reset?
It might help if the log included the program counter address (EIP) from
which each I/O was done.
... OK, if I assume that the numbers are 1uS timestamps, that gives us
just under 2 sec between the SEEK and the RESET. The driver sets a
watchdog timer, then issues the SEEK and expects a completion interrupt.
The timer is supposed to be 1 second, but I'm sure a bunch of factors
could cause it to be off by a factor of 2 in an emulated environment.
So it looks like it's hitting its timeout code because it never got the
completion interrupt. That causes it to see no disk.
You need to go back to the IDE READ implementation from which you
derived the SEEK; remove the parts that have anything to do with
actually "seeking" since they're irrelevant; add in whatever causes it
to generate a virtual completion interrupt.
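The timing arithmetic above can be redone mechanically. This is a small Python sketch that pulls the leading number off each log line and measures the gap between two events. It assumes the line layout seen in the excerpt (digits, one letter, a bracketed module name, then text) and, as I guessed above, that the digits are microseconds; the function names are my own.

```python
import re

# digits, one level letter, "[MODULE ]", then the message text
LOG_LINE = re.compile(r"^(\d+)([a-z])\[(\w+)\s*\]\s*(.*)$")


def parse_line(line):
    """Split one log line into (ticks, level, module, text), or None."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ticks, level, module, text = m.groups()
    return int(ticks), level, module, text


def ticks_between(lines, first_text, second_text):
    """Ticks from the first line containing first_text to the next
    later line containing second_text, or None if either is missing."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ticks, _level, _module, text = parsed
        if start is None:
            if first_text in text:
                start = ticks
        elif second_text in text:
            return ticks - start
    return None


SAMPLE = [
    "00290159833d[HD ] IO write to 01f7 = 70",
    "00290159833d[HD ] SEEK completed",
    "00292142586d[HD ] IO write to 03f6 = 04",
    "00292143282d[HD ] IO write to 03f6 = 00",
]

seek_to_reset = ticks_between(SAMPLE, "01f7 = 70", "03f6 = 04")
```

On the four sample lines, `seek_to_reset` comes out as 1982753, which is the "just under 2 sec" figure if the unit really is microseconds.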
>Bela<
Sometimes I'm a bit slow <grin> but I almost always get it!
Now that I know why the SEEK "action" is irrelevant, I can remove that
part. Is this the same for an ATAPI CD?
I believe the number is microseconds. The letter is (d)ebug, (i)nfo,
(e)rror, or (p)anic. After the kernel reports that the hd was not
found, it shows "press any key to continue". Since I know the install
will fail w/o a hd, I press the virtual power button, which causes the
last panic.
>
> It might help if the log included the program counter address (EIP) from
> which each I/O was done.
Being *very* new to programming bochs (about an hour under my belt), I
don't know how big of a deal this might be. I'll spend a little time
looking into that...
>
> ... OK, if I assume that the numbers are 1uS timestamps, that gives us
> just under 2 sec between the SEEK and the RESET. The driver sets a
> watchdog timer, then issues the SEEK and expects a completion interrupt.
> The timer is supposed to be 1 second, but I'm sure a bunch of factors
> could cause it to be off by a factor of 2 in an emulated environment.
>
> So it looks like it's hitting its timeout code because it never got the
> completion interrupt. That causes it to see no disk.
>
> You need to go back to the IDE READ implementation from which you
> derived the SEEK; remove the parts that have anything to do with
> actually "seeking" since they're irrelevant; add in whatever causes it
> to generate a virtual completion interrupt.
The "SEEK completed" message in the log file is written just before I
call the completion interrupt routine. Perhaps this is not the right
SEEK to be looking at?? (Working with other people's code,
particularly something as complex as this emulation is, can be
frustrating!)
>
> >Bela<
Thanks again for the help, Bela.
Carl
> Now that I know why the SEEK "action" is irrelevant, I can remove that
> part. Is this the same for an ATAPI CD?
I can take this question several ways, and I don't really know the
answer to any of the possible questions... If you're asking: does
OSR5's "wd" driver require an ATAPI CD to respond to IDE SEEK commands
in order to be recognized; I _think_ the answer is "no".
If you're asking whether any sort of SEEK command for an ATAPI CD would
be irrelevant, I would guess "yes". The answer to that boils down to
a question: does the device's READ/WRITE command interface specify the
coordinates in those commands? Both IDE and SCSI READ/WRITE commands
include the coordinates (cyl/head/sec or LBA); I would expect ATAPI to
be the same. So all a SEEK could do is test media size, and (possibly)
give some performance benefit. Oh, and possibly set the heads in the
"right" position for a shutdown.
Ok...
> > It might help if the log included the program counter address (EIP) from
> > which each I/O was done.
>
> Being *very* new to programming bochs (about an hour under my belt), I
> don't know how big of a deal this might be. I'll spend a little time
> looking into that...
Good. I wouldn't expect it to be too difficult -- it obviously has
control at the moment when an I/O is being done, or it wouldn't be able
to print those messages. (And it wouldn't be able to emulate
hardware...) All that's needed is to dredge up the EIP. There's a good
chance it'll already have it in some bochs-specific variable or
structure; if not, it'll be on the stack somewhere that'll be relatively
easy to access.
> > ... OK, if I assume that the numbers are 1uS timestamps, that gives us
> > just under 2 sec between the SEEK and the RESET. The driver sets a
> > watchdog timer, then issues the SEEK and expects a completion interrupt.
> > The timer is supposed to be 1 second, but I'm sure a bunch of factors
> > could cause it to be off by a factor of 2 in an emulated environment.
> >
> > So it looks like it's hitting its timeout code because it never got the
> > completion interrupt. That causes it to see no disk.
> >
> > You need to go back to the IDE READ implementation from which you
> > derived the SEEK; remove the parts that have anything to do with
> > actually "seeking" since they're irrelevant; add in whatever causes it
> > to generate a virtual completion interrupt.
>
> The "SEEK completed" message in the log file is written just before I
> call the completion interrupt routine. Perhaps this is not the right
> SEEK to be looking at?? (Working with other people's code,
> particularly something as complex as this emulation is, can be
> frustrating!)
I think we're making pretty good progress. It _is_ complex and
difficult, it _will_ take time, but as long as forward progress is made,
I am encouraged.
Can you add a message _after_ you've sent the completion interrupt?
"wd" only does two SEEKs (remember, it's mostly an irrelevant action) --
the one we're working around, to detect drives; and one in its shutdown
code, to "park" the heads. We can't be hitting the "park" instance
because it wouldn't be doing that unless it thought it had successfully
detected a hard disk...
Looking at wdintr(), the interrupt handler, it looks like you should
have seen several I/O reads in the interrupt routine. Since those
aren't present, I'm pretty sure it never saw the interrupt. The
recovery path for not seeing the interrupt involves timing out after 1
second, then sending a RESET to the controller/drive coordinates that
were under test; which is what we see (except it takes 2 seconds).
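To make that path concrete, here is a toy Python model of the detection sequence as I've described it: set a watchdog, issue the SEEK, and either handle the interrupt or time out and RESET. It is only an illustration of the control flow, not the "wd" source; the function name, the action strings and the 1-second default are stand-ins.

```python
def probe_drive(interrupt_delay_us, timeout_us=1_000_000):
    """Simulate one drive-detection attempt.

    interrupt_delay_us is how long the (emulated) drive takes to raise
    its completion interrupt, or None if it never raises one.
    Returns (found, actions) where actions lists what the driver did.
    """
    actions = ["set watchdog", "issue SEEK"]
    if interrupt_delay_us is not None and interrupt_delay_us <= timeout_us:
        # The interrupt handler runs and would read the drive's registers.
        actions.append("interrupt handled")
        return True, actions
    # No interrupt in time: recovery path, the drive is reported missing.
    actions.append("watchdog expired")
    actions.append("RESET controller")
    return False, actions
```

With `interrupt_delay_us=None` the model ends in "RESET controller", which matches the log: a SEEK, a long silence, then the write of 04 to 03f6.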
Also, in this reset code:
> > > 00292142586d[HD ] IO write to 03f6 = 04
> > > 00292142586d[HD ] hard drive: RESET
> > > 00292143282d[HD ] IO write to 03f6 = 00
The corresponding source looks like:
iooutb(wdcbase[ctlr] + WDSCTRL, 0x04); /* RESET */
suspend(300); /* 300us */
iooutb(wdcbase[ctlr] + WDSCTRL, 0x00); /* re-enable controller */
The time stamps show that the two happened 696us apart, which again
makes me think there's a factor of 2 happening somewhere in bochs'
timing. (Just a side issue...)
>Bela<
they have fast processors, but slow (100Mhz front side busses) and slow (133Mhz)
memory.
I suspect that the processor will spend a lot of time waiting for memory or
data.
What do you think ?
But, the price is great and I love getting a computer without the OS.
--
- bill -
bill at TechServSys dot com
(don't just reply - it goes to a spamtrap)
Yeah, but neither Bill nor I was suggesting these as main machines. This
is the kind of thing you use as a "beater box"- something that you can
test things out on, where nothing important gets stored so it can be
wiped out at any time.
I keep two beaters; one permanently attached to a kvm switch with my
usual system, the other just loose. There are also a few other
"slightly dead" boxes that could be resurrected if I needed them and
otherwise serve as a source of parts now and then or convenient stands
at others.
>> Brand name is Microtel. 1GH Celeron, 128MB RAM, 40GB ultra-dma,
>> cdrom and 56K modem, no monitor $399.
>>
>> A system with a 1.6Ghz P4, 400Mhz FSB, and 20GB ultra-dma HD is
>> $498. Add $100 more for 256MB ram and a 40GB HD.
>> All without the MS $100 [approx] SW penalty.
>> From the place you'd never expect. www.walmart.com.
>they have fast processors, but slow (100Mhz front side busses)
>and slow (133Mhz) memory. I suspect that the processor will
>spend a lot of time waiting for memory or data. What do you
>think ? But, the price is great and I love getting a computer
>without the OS.
As Tony mentioned in this reply we didn't recommend these as
servers.
But - let me copy that above paragraph again so you can re-read it.
>> A system with a 1.6Ghz P4, 400Mhz FSB, and 20GB ultra-dma HD is
>> $498. Add $100 more for 256MB ram and a 40GB HD.
Surely looks like I said 400MHz FSB doesn't it? Note the price
of $498. The $398 version is the one with the 100MHz FSB.
*mumble*mumble* you guys suck! *mumble*mumble*exchange rate*mumble*
bkx
I got past the "no root hd". It appears that OSR is writing 0x08 to
the Device Control Register (0x03f6), which should be enabling
interrupts on the hd. The emulated controller was not enabling them
on both devices, so the subsequent SEEK was not triggering the
interrupt, which caused OSR not to see the disk. Here's part of the
log:
00249399529d[HD ] IO write to 03f6 = 08
00249399529d[HD ] s[0].controller.control.disable_irq = 02 <== hd DISABLED
00249399529d[HD ] s[1].controller.control.disable_irq = 00 <== CD Enabled
00249712652d[HD ] IO write to 01f3 = 01
00249712661d[HD ] IO write to 01f4 = 00
00249712670d[HD ] IO write to 01f5 = 00
00249712682d[HD ] IO write to 01f6 = a0
00249712985d[HD ] IO write to 01f7 = 70
00249712985d[HD ] write cmd 0x70 executed, EIP=f00138d2
00249712985d[HD ] s[0].controller.control.disable_irq = 02
00249712985d[HD ] s[1].controller.control.disable_irq = 00
00249712985d[HD ] SEEK completed. error_register = 04
00249712985d[HD ] raise_interrupt called, disable_irq = 02
00249712985d[HD ] Not raising interrupt <== not seeing hd
00249712985d[HD ] SEEK interrupt completed
I changed the code to set both devices' disable_irq based on the value
written to 03f6. Bela, do you know if this is the correct behaviour?
(BTW, the prior hd command issued was looking at device 1 [cd], so the
program was only setting disable_irq for that one device...)
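In case it helps to see the change in isolation, here is a rough Python model of what I did. It is not the actual harddrv.cc code: `Channel` and its methods are names I made up. The one hard fact in it is that bit 1 (0x02) of the Device Control Register is nIEN, the interrupt-disable bit. Applying the write to both devices on the channel is my assumption, and it is the thing I'm asking about.

```python
NIEN = 0x02  # Device Control bit 1: set means interrupts are disabled
SRST = 0x04  # Device Control bit 2: software reset


class Channel:
    """Toy model of one IDE channel with two devices (0 = hd, 1 = CD)."""

    def __init__(self):
        self.disable_irq = [False, False]
        self.selected = 0     # device chosen by the last write to 01f6
        self.irq_raised = 0   # count of interrupts delivered to the guest

    def write_device_control(self, value):
        # Assumed behaviour: the write affects both devices, not only
        # the one that happens to be selected at the moment.
        for dev in (0, 1):
            self.disable_irq[dev] = bool(value & NIEN)

    def raise_interrupt(self):
        """Deliver a completion interrupt unless it is masked."""
        if self.disable_irq[self.selected]:
            return False      # the "Not raising interrupt" case in the log
        self.irq_raised += 1
        return True
```

Starting from the state in the log (hd masked, CD not), a write of 08 to 03f6 clears the mask on both devices and the next SEEK on device 0 gets its interrupt.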
Oh, yeah, you may have noticed that I found the value of EIP... (Does
it look reasonable?) I can throw it into any debugging statements
that you'd like it in in the future... I also figured out the timing
issue (the "factor of 2" thing): There's a bochs configuration
parameter for # instructions per second to emulate. It had to be
tweaked (from 2,000,000 to 700,000). With the new value, the logs
showed exactly 300us between the seek attempt and the reset (in a
version that failed to see the hd)...
As far as I know, that clears up ALL of the PRIOR issues!! Yippie!
Unfortunately, [I bet you guessed I'd say that <grin>] the install
only went on a few more screens before bochs panic'ed. After telling
the install program to install from CD, I got the bochs error "start
disc not implemented" (for ATAPI command 0x1b). I'll look into seeing
what needs to be done for this, but from the comment in the program,
it may be a bit more complex than prior issues. (But then again,
maybe not. I'm winging it here!) Bela, can you give me some insight
as to what the install program is trying to do here, what it expects
from the machine, etc.?
Thanks again, Bela, for all of the help.
Carl
> Unfortunately, [I bet you guessed I'd say that <grin>] the install
> only went on a few more screens before bochs panic'ed. After telling
> the install program to install from CD, I got the bochs error "start
> disc not implemented" (for ATAPI command 0x1b). I'll look into seeing
> what needs to be done for this, but from the comment in the program,
> it may be a bit more complex than prior issues. (But then again,
> maybe not. I'm winging it here!) Bela, can you give me some insight
> as to what the install program is trying to do here, what it expects
> from the machine, etc.?
I'll reply to the rest later, but let me address the stumbling block
now...
That command is being issued by the SCSI CD-ROM driver (ATAPI is
basically just a thinly disguised version of SCSI). It just wants to
make sure the drive is spinning the disc. bochs can just ignore that.
You'll probably next trip on a "lock unit" or something like that: the
driver telling the drive not to let you eject the disc. Again, bochs
can just ignore it.
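A sketch of what "just ignore it" could look like, in Python rather than bochs's own code. The two opcodes are the standard ones (0x1b is START STOP UNIT, 0x1e is PREVENT ALLOW MEDIUM REMOVAL); the table, the function name and the return strings are invented for the example.

```python
START_STOP_UNIT = 0x1B               # spin the disc up or down
PREVENT_ALLOW_MEDIUM_REMOVAL = 0x1E  # lock or unlock the tray

# Commands an emulated CD can acknowledge without doing anything:
# there is no motor to start and no tray to lock.
NOOP_COMMANDS = {
    START_STOP_UNIT: "start/stop unit",
    PREVENT_ALLOW_MEDIUM_REMOVAL: "prevent/allow medium removal",
}


def handle_packet(opcode):
    """Return 'ok' for commands that are safe to acknowledge as no-ops,
    'unsupported' for anything this sketch does not know about."""
    if opcode in NOOP_COMMANDS:
        return "ok"
    return "unsupported"
```

The emulator would still have to complete the command normally (status, interrupt) so the driver sees success rather than a hang.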
Regarding the EIP values: yes, it looked reasonable; now, can you
twiddle the log format so that every line includes the corresponding EIP
(hopefully in a constant column, like the timestamp, for ease of
scanning)?
>Bela<
> I got past the "no root hd". It appears that OSR is writing 0x08 to
> the Device Control Register (0x03f6), which should be enabling
> interrupts on the hd. The emulated controller was not enabling them
> on both devices, so the subsequent SEEK was not triggering the
> interrupt, which caused OSR not to see the disk. Here's part of the
> log:
>
> 00249399529d[HD ] IO write to 03f6 = 08
> 00249399529d[HD ] s[0].controller.control.disable_irq = 02 <== hd DISABLED
> 00249399529d[HD ] s[1].controller.control.disable_irq = 00 <== CD Enabled
> 00249712652d[HD ] IO write to 01f3 = 01
> 00249712661d[HD ] IO write to 01f4 = 00
> 00249712670d[HD ] IO write to 01f5 = 00
> 00249712682d[HD ] IO write to 01f6 = a0
> 00249712985d[HD ] IO write to 01f7 = 70
> 00249712985d[HD ] write cmd 0x70 executed, EIP=f00138d2
> 00249712985d[HD ] s[0].controller.control.disable_irq = 02
> 00249712985d[HD ] s[1].controller.control.disable_irq = 00
> 00249712985d[HD ] SEEK completed. error_register = 04
> 00249712985d[HD ] raise_interrupt called, disable_irq = 02
> 00249712985d[HD ] Not raising interrupt <== not seeing hd
> 00249712985d[HD ] SEEK interrupt completed
>
> I changed the code to set both devices' disable_irq based on the value
> written to 03f6. Bela, do you know if this is the correct behaviour?
No, and I'm having trouble matching the above actions up to code in the
driver. This is where I really need the EIP addresses for every action:
if I have all the addresses, I can figure out which parts of the code
correspond to some of the more unique, obvious actions; and from that I
can figure out where all of the actions are coming from.
It would also help to have seen some of the log above the "IO write to
03f6" part.
> (BTW, the prior hd command issued was looking at device 1 [cd], so the
> program was only setting disable_irq for that one device...)
>
> Oh, yeah, you may have noticed that I found the value of EIP... (Does
> it look reasonable?) I can throw it into any debugging statements
> that you'd like it in in the future... I also figured out the timing
> issue (the "factor of 2" thing): There's a bochs configuration
> parameter for # instructions per second to emulate. It had to be
> tweaked (from 2,000,000 to 700,000). With the new value, the logs
> showed exactly 300us between the seek attempt and the reset (in a
> version that failed to see the hd)...
Ok, good...
> As far as I know, that clears up ALL of the PRIOR issues!! Yippie!
>
> Unfortunately, [I bet you guessed I'd say that <grin>] the install
> only went on a few more screens before bochs panic'ed. After telling
> the install program to install from CD, I got the bochs error "start
> disc not implemented" (for ATAPI command 0x1b). I'll look into seeing
> what needs to be done for this, but from the comment in the program,
> it may be a bit more complex than prior issues. (But then again,
> maybe not. I'm winging it here!) Bela, can you give me some insight
> as to what the install program is trying to do here, what it expects
> from the machine, etc.?
I replied to this earlier.
I get the impression from the "wd" driver code that exactly what is
supposed to happen after a write to port 3F6 varies according to what
other I/O has been happening recently -- like that port is supposed to
access several different registers, or something like that. I'm not
looking at an IDE/ATAPI hardware spec while poking through the code, and
it's pretty bewildering...
>Bela<
First, some info to help based on your last two posts:
bochs implements ATA-2 for the hd, and ATAPI-4 for the CD. The specs
for these can be found at http://www.t13.org (page down some...). The
ATA-2 spec is actually a draft (never to be completed), so you can
just download it from that page. ATAPI-4 is completed, so it costs
$$, but you can download the ATAPI-6 draft... Also, you can get the
ATAPI CD-ROM Packet extension info at
http://akrip.sourceforge.net/8020r26.pdf. This describes the
"subcommands" of the ATA Packet command (basically SCSI commands
embedded in the ATA spec). As the ATA-2 spec states, writes to port
0x3f6 go to the Device Control Register, and reads from 0x3f6 return the
Alternate Status Register, so, yes, how it is used is *very much*
dependent on prior I/O to the drive. After looking at the specs
briefly, I can see why the code could be quite bewildering -
especially if it wasn't *very* well commented!
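Here's a little Python helper that shows the read/write split and decodes a status byte such as the 50 that was read back from 03f6 in the earlier log. The bit names are the ATA-2 status register bits; the function names are just mine.

```python
# ATA-2 status register bits, most significant first.
STATUS_BITS = [
    (0x80, "BSY"),   # busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def register_at_3f6(is_write):
    """Name the register reached through port 0x3f6 for this access."""
    return "Device Control" if is_write else "Alternate Status"


def decode_status(value):
    """List the names of the status bits that are set in value."""
    return [name for mask, name in STATUS_BITS if value & mask]
```

So `decode_status(0x50)` gives DRDY and DSC, i.e. the drive says it is ready and its last seek completed.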
Here's what I've done so far:
- I added EIP to the general log entry format. Future logs will show
this...
- I changed the panic for the start disk & read TOC command to a nop,
although I think, to "meet the spec", I should have implemented a read
of the TOC on the disk... However, SCO didn't seem to mind not
getting the TOC. Maybe I'll get to that some day (when necessary
<grin>)...
Now, where we're at:
During the install, I got the message "WARNING: wd0: timeout on fixed
disk dev 1/0" several times. Since it is referring to the second IDE
controller, which doesn't exist, I guess it's safe to ignore... (Why
does it keep trying to do anything with it? The kernel never
recognised a second controller at startup...)
Later in the install (just before and during the Installation Progress
screen [with progress bar]), I also got a whole lot of "WARNING: wd0:
timeout on fixed disk dev 1/42" messages. I'm hoping that SCO does a
retry on these... Got any clue as to what I should look for in order
to get rid of these?
The install also seems to be running VERY slow! It's been running for
almost 5 hours now, and still only at 5% installed (according to the
progress bar)! The messages (particularly "copying file" messages) do
seem to update fairly regularly on the Installation Status line, so I
know work is being done. I imagine that the timeouts ain't helping in
this regard.
Other than that, it is running, with no other issues!!! (Apparently,
the "lock unit" was already implemented...) Now, I just need to wait
for it to complete!
I am going to let it run overnight. I'll post back tomorrow as soon
as it finishes...
Carl
> First, some info to help based on your last two posts:
>
> bochs implements ATA-2 for the hd, and ATAPI-4 for the CD. The specs
> for these can be found at http://www.t13.org (page down some...). The
> ATA-2 spec is actually a draft (never to be completed), so you can
> just download it from that page. ATAPI-4 is completed, so it costs
> $$, but you can download the ATAPI-6 draft... Also, you can get the
> ATAPI CD-ROM Packet extension info at
> http://akrip.sourceforge.net/8020r26.pdf. This describes the
> "subcommands" of the ATA Packet command (basically SCSI commands
> embedded in the ATA spec). As the ATA-2 spec states, writes to port
> 0x3f6 go to the Device Control Register, and reads from 0x3f6 return the
> Alternate Status Register, so, yes, how it is used is *very much*
> dependent on prior I/O to the drive. After looking at the specs
> briefly, I can see why the code could be quite bewildering -
> especially if it wasn't *very* well commented!
The code is very old, patched hundreds of times for various reasons,
commented as to why, but the layering of comments, fixes, etc. makes it
hard to follow...
> Here's what I've done so far:
>
> - I added EIP to the general log entry format. Future logs will show
> this...
> - I changed the panic for the start disk & read TOC command to a nop,
> although I think, to "meet the spec", I should have implemented a read
> of the TOC on the disk... However, SCO didn't seem to mind not
> getting the TOC. Maybe I'll get to that some day (when necessary
> <grin>)...
>
>
> Now, where we're at:
>
> During the install, I got the message "WARNING: wd0: timeout on fixed
> disk dev 1/0" several times. Since it is referring to the second IDE
> controller, which doesn't exist, I guess it's safe to ignore... (Why
> does it keep trying to do anything with it? The kernel never
> recognised a second controller at startup...)
You're misunderstanding the message. "1/0" is the major and minor
number of the device that's having the problem. 1 is the major number
of the "hd" driver. Minor number 0 of that driver means "0th drive,
entire disk". So it's complaining about a timeout on the 0th drive,
which in your case is (virtual) IDE primary/master.
> Later in the install (just before and during the Installation Progress
> screen [with progress bar]), I also got a whole lot of "WARNING: wd0:
> timeout on fixed disk dev 1/42" messages. I'm hoping that SCO does a
> retry on these... Got any clue as to what I should look for in order
> to get rid of these?
It's got to be another case of virtual interrupts not being received.
Dev 1/42 is minor #42 of the hd driver; minor 42 is "0th drive, active
partition, division #2". By convention, that's where the install puts
your root filesystem.
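If you want to decode other minor numbers from these messages, here is a Python sketch. Treat the bit layout as an assumption: it takes the minor number to be packed as 2 bits of drive, 3 bits of partition and 3 bits of division, which fits both examples above (0 and 42) but is not spelled out anywhere in this thread. The function name is made up.

```python
def decode_hd_minor(minor):
    """Split an hd minor number into (drive, partition, division).

    Assumed layout, high bits to low: drive (2 bits), partition
    (3 bits), division (3 bits). Under that assumption partition 0
    would be the entire disk and partition 5 the active partition.
    """
    drive = (minor >> 6) & 0x3
    partition = (minor >> 3) & 0x7
    division = minor & 0x7
    return drive, partition, division
```

Under that layout minor 0 decodes to (0, 0, 0) and minor 42 to (0, 5, 2): 0th drive, active partition, division #2.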
> The install also seems to be running VERY slow! It's been running for
> almost 5 hours now, and still only at 5% installed (according to the
> progress bar)! The messages (particularly "copying file" messages) do
> seem to update fairly regularly on the Installation Status line, so I
> know work is being done. I imagine that the timeouts ain't helping in
> this regard.
No, they wouldn't be helping. Looks like there is a 60-second timeout
after it decides the disk is "dead" (and issues the timeout message),
before it will try the disk again.
It's normal for the install to peg at "5%" for a long time: the
percentages displayed are _really_ crude. It's always seemed to me that
it displays "5%" for about 70% of the install.
> Other than that, it is running, with no other issues!!! (Apparently,
> the "lock unit" was already implemented...) Now, I just need to wait
> for it to complete!
>
> I am going to let it run overnight. I'll post back tomorrow as soon
> as it finishes...
Ok, great.
I'm worried that there will be corruption due to all the timeouts. The
timeouts you're hitting are accompanied by a log comment which I will
include here (edited to fix context):
- [problem] when power was removed from a drive: the whole driver
would hang!
- When issuing a command, set a timeout of 'wdcmdtime' seconds (or
'wdprmtime' seconds for a SET_PARM command) during which time all
commands are expected to complete. If no interrupt occurs within
this time assume that either the disk has hung, or that there is no
disk to respond (it was never there, or it has gone away.)
- Reset the controller and try a SET_PARM command: if this succeeds
carry on as usual (assume a temporary disk hang); if it fails assume
that the disk has gone west. Fail all further commands to this disk
during the next 'wdchktime' seconds, then try further commands in
the same way to allow for the possibility of it springing into life
again.
- The value of 'wdcmdtime' must be carefully chosen to allow for disks
which spin-down to save power during periods of inactivity, or which
self-calibrate. Likewise 'wdprmtime' should be "long enough", but
as short as possible to prevent the driver sleeping too long when
probing (probably) dead disks.
- Timeout values are tunable in (new) space.c
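Read as a state machine, that comment comes out roughly like the Python sketch below. This is my paraphrase of the comment, not the driver source: the class, the method and the numbers used when calling it are placeholders, and the real code also distinguishes 'wdprmtime' for SET_PARM, which is left out here.

```python
class DiskTimeoutPolicy:
    """Toy model of the timeout/retry behaviour in the log comment."""

    def __init__(self, wdcmdtime, wdchktime):
        self.wdcmdtime = wdcmdtime  # seconds allowed per command
        self.wdchktime = wdchktime  # seconds to shun a "dead" disk
        self.dead_until = None      # time at which the disk is retried

    def submit(self, now, completion_time, set_parm_ok):
        """Issue one command at time `now` (seconds).

        completion_time: seconds until the interrupt, or None if never.
        set_parm_ok: whether SET_PARM succeeds after a controller reset.
        Returns 'ok', 'retry' or 'failed'.
        """
        if self.dead_until is not None:
            if now < self.dead_until:
                return "failed"     # still inside the wdchktime window
            self.dead_until = None  # window over: try the disk again
        if completion_time is not None and completion_time <= self.wdcmdtime:
            return "ok"
        # Timed out: reset the controller, then probe with SET_PARM.
        if set_parm_ok:
            return "retry"          # temporary hang, carry on as usual
        self.dead_until = now + self.wdchktime
        return "failed"
```

The costly part is the last branch: once the disk is marked dead, every command fails until the window has passed, however healthy the (virtual) disk is.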
Unfortunately you can't easily patch those timeout values since you
aren't linking a new kernel, but using a premade one from the boot CD.
After further code reading, I think it fully retries all of these
timeouts. So your install is incredibly slow, but may be successful.
This gives you an opportunity to play around with some of these
parameters once the install is done.
First, either at the start of your next install attempt or when booting
the new completed install from the "hard disk", try disabling UDMA.
Boot with:
Boot
: defbootstr wd.udma=off
I say this because I was trying to figure out how you might be getting
these timeouts, and one code path that looked suspicious had to do with
UDMA. So it would be interesting to see whether the timeouts go away if
you don't try to use it. (Unfortunately, I would expect the performance
benefit of emulated UDMA over emulated PIO to be much higher than the
benefit of real UDMA over real PIO... -- you'll take a bad performance
hit for this.)
Second, if you've installed successfully, edit
/etc/conf/pack.d/wd/space.c and reduce wdcmdtime, wdrsttime, wdprmtime,
wdchktime. Though that's a bit silly -- we should figure out what part
of the hardware emulation isn't meeting the driver's expectations, and
fix one or the other.
>Bela<
The install finally finished (I'm not sure how long it took, as I let
it run overnight)! I shut down bochs, rebooted from the hard drive,
SCO OSR 5.0.5 booted, and I was able to log in as root!!! So it looks
like, with the mods I and the rom bios developer (Christophe Bothamy)
made, SCO is an installable guest OS within bochs! I will post my
mods to the bochs site (in bochs-developers discussion board, and in
my bug entry) within the next day or two.
I'd like to thank Christophe for his quick, responsive help with the
bios. A VERY big thanks goes out to Bela for providing so much info
on what OSR is trying to do, and for sticking with me. Thanks guys!!
Now, we're not totally out of the woods yet, though :-(...
First, the clock is racing at about 17 times what it should be (it
went ahead 2:34:34 in 9 minutes)! This MAY be just a tweak of the
emulated Instructions Per Second parameter, I'll have to check. This
may also be the reason why I got sooo many disk timeouts during the
install. However, we tweaked this parameter so that the log file
matched coded waits during the kernel boot, or at least that's what we
thought we were doing. I've noticed that the log file number is
'clock ticks'. Is that microseconds, or some multiple (or fraction)
thereof??
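For the record, the "about 17 times" figure is just guest time divided by real time. A few lines of Python to redo it (the function name is mine, nothing from bochs):

```python
def clock_speedup(guest_elapsed, host_elapsed_minutes):
    """Ratio of guest clock time (given as 'h:mm:ss') to real time
    (given in minutes)."""
    hours, minutes, seconds = (int(part) for part in guest_elapsed.split(":"))
    guest_seconds = hours * 3600 + minutes * 60 + seconds
    return guest_seconds / (host_elapsed_minutes * 60)


ratio = clock_speedup("2:34:34", 9)
```

That is 9274 guest seconds over 540 real seconds, a ratio a little above 17. I'm not assuming the IPS setting scales linearly with it.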
Also, during the install, I did not configure the mouse or network
card. I'm still undecided about configuring the mouse, as I don't
have a real good reason to use it (for my specific purposes). Maybe
that will come later when I feel like "playing"... I do have a need
for the network card (mainly to get files to the emulated disk), so
that will be next. I'll report how it goes... (I believe that there
are other ways to get files onto the emulated hard disk, but the
network would be the easiest [if it works :-)].)
Wish me luck...
Carl
- I had to change the IPS from 700,000 to 51,000,000 (!) in order for
the date command within SCO to advance by 1 minute every real minute.
I'll let you know if there are side effects to this...
- I added three entries to the bochs bug reports, in case anyone else
trying to get OSR5 running in bochs needs them. (I guess they really
should have been put in the Patch section, but I was unaware that
there was one 'til after I posted them. A bochs admin might move them
to where they belong...) The first (bochs bug # 551109) adds the EIP
to the log file timestamp (per Bela's request). The second (# 551111)
adds a button to the gui that writes a marker into the log file.
(This simplifies finding the portion of the log that you're interested
in.) The last (#551115) is the patch to the hard drive emulation code
(harddrv.cc) to send responses back to OSR5 that it expects.
Now, I'll try getting the network going...
Carl
Well, getting the network "card" going wasn't all that bad! What I
did:
- I needed to re-compile bochs with networking implemented (ran
./configure --enable-ne2000)
- Added the "ne2k:" line to the bochs config file, as suggested in
that file's comments (for linux), except that I used IRQ10 instead of
9 (9 wasn't listed as being available in "scoadmin network"...)
- Used scoadmin to configure a Novell NE2000 card and TCP/IP
- bochs uses "raw" network card access, so I had to "setuid root" for
the bochs executable. (I don't like this, but until I can find out
how to get [as a regular user] CAP_NET_RAW capability, I guess I'm
stuck...)
It seemed to work as soon as I rebooted the new kernel! I was even
able to telnet to my old SCO box, as well as ftp some files from it.
I did have bochs panic once during an ftp transfer, but after
restarting bochs and trying the file transfer again, it went OK. I
guess I won't be able to debug that one <grin>...
Everything else seems to be running quite smoothly, so far, except for
the clock. Bochs seemed quite slow with the very high emulated IPS
value, so for the time being, I'm living with the fast clock
[periodically resetting it back to "real time"]. Bela, can you give
me any insights as to how OSR5 gets the time? The emulated clock
speed seems to be dependent on what is going on with OSR5, so even
setting an IPS to any one value is not going to totally solve this
issue. This may be another thing that I'll have to fix...
Oh, yeah, I get a very occasional wd0 timeout, but I think I can live
with that (at least for now). And I have not figured out how to
switch between multiscreens within bochs. Is this configurable within
OSR5?
Other than that, I'm pleasantly surprised with how well bochs is
handling OSR5.
Printing, I think, will be my next hurdle. (Bochs directs the output
to a file, so unless I can write a pipe to split it into print jobs,
this might be tough!)
I'll keep you informed on my progress.
Carl
P.S., The three patches that I posted on the bochs bug tracker are now
part of the patches directory within the CVS tree. Hopefully, they
will become a permanent part of bochs in the not-too-distant future!
>
> Printing, I think, will be my next hurdle. (Bochs directs the output
> to a file, so unless I can write a pipe to split it into print jobs,
> this might be tough!)
How about a named pipe? http://pcunix.com/SCOFAQ/scotec7.html#netdevice
Actually, that's what I was planning on doing. I'm just not sure if
bochs closes the emulated printer port's file in between spool jobs.
(If I had to guess, I'd say 'probably not', because it might overwrite
the prior spool job.) If not, the 'cat' will just hang until I exit
bochs - not my desired result. I have some thoughts as to possible
ways to get around this, though. (Both linux side and bochs side.
I'll play, and post what works [if anything <g>].)
Thanks for the suggestion, tho!
Carl
I decided to use a larger hard disk image (2GB), so I re-installed the
OS, applied release supplements, brought over users and files from my
old box, etc. It all went smoothly, except if you try to boot bochs
from the install CD with the bochs ne2000 emulator active it will
crash. This can be turned on/off via the bochs configuration file
(.bochsrc). So, you have to install the OS, THEN configure
networking. I also found that bochs panics intermittently in the
ne2000 emulation during ftp file transfers. This was not consistent,
though, and seemed to "go away" when I lowered the emulated
Instructions Per Second (ips: in .bochsrc). If I have nothing better
to do some day (or if this becomes an issue for me), I may try to
figure out why it's panicking. (Don't hold your breath :-). )
The issues that I still need to resolve are:
- The clock does not advance consistently. This is a bochs problem,
and from what I can tell is being worked on. I guess I'm just going
to wait for the fix to come through (even though this may take some
time). For the time being, I'm just starting the clock at midnight,
so at least the dates are correct (or close)...
- bochs emulates a parallel port, but sends the data to a file. I
need to get this to go to the print spooler. My plan of attack here
is to have that output file be a named pipe, then have a process in
the background read the pipe, split it into separate files (based on a
delay of data being sent to it), and send these new files to the
spooler. Shouldn't be a huge deal. I just need to get to it.
- Pressing F1, F2, ... is switching which multiscreen is active within
OSR5. The application that I need OSR5 for uses (unshifted) function
keys extensively for telling it what to do next. I therefore need
to change what keys are used for the multiscreen switch. I think I
saw that documented somewhere, back in '96 or so :-).
- The bochs emulated display is small on my screen. I have a 21"
monitor set at 1600x1200[-ish] resolution, so the screen is about
4"x6". I'd like to see it about twice that size, so my poor, old,
tired eyes get a break.
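The printing plan in the list above (a named pipe, a background reader, and a split on idle time) could be sketched roughly as follows. This is only an illustration under assumptions: the function names are invented, the idle threshold is a guess, and `lp` reading a job from standard input is assumed to be how the host spooler is fed. Whether bochs keeps the output file open between jobs is not known here, which is exactly why the split is based on a pause in the data rather than on end-of-file.

```python
import os
import select
import subprocess

def split_jobs(fd, idle_seconds, on_job, chunk_size=4096):
    """Read bytes from fd and call on_job(data) whenever the stream goes idle."""
    buf = bytearray()
    while True:
        # With nothing buffered, block until data arrives; otherwise wait
        # at most idle_seconds for more data belonging to the same job.
        timeout = idle_seconds if buf else None
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            # Idle gap: treat what has been buffered as one finished job.
            on_job(bytes(buf))
            buf.clear()
            continue
        chunk = os.read(fd, chunk_size)
        if not chunk:
            # End of file: the writer closed its end of the pipe.
            if buf:
                on_job(bytes(buf))
            return
        buf.extend(chunk)

def send_to_spooler(data):
    # Assumption: an `lp` command on the host accepts the job on stdin.
    subprocess.run(["lp"], input=data, check=True)
```

To try it, one would create a FIFO with mkfifo, point the emulated parallel port's output file at it, open the FIFO read-only with `os.open`, and call `split_jobs(fd, 10.0, send_to_spooler)`. The 10 second threshold is arbitrary; a slow emulator may need a longer pause before a job is considered complete.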
Other than these issues, it seems to be working pretty well.
(However, I can't say I've put it "through its paces", by any
stretch of the imagination!) The emulation speed is pretty slow, but
on my 1.7GHz P4, it's bearable.
Once again, thanks to all for the help. I'll report back once I get
the above issues solved...
Carl
Well, I've been working on the window size and multiscreen problems.
The solution that I am working on for the window size is to use a
bigger font for the text-mode VGA emulation. (That is, using a bigger
X-Windows font to represent the standard font emulated.) This is
going pretty well, but there are a few issues that I need to clean up.
It involves modifying the bochs code to translate some sizes (window,
cursor) between the emulated 8x16 VGA font and the 11x19 VGA font that
I want to use. (This 11x19 font is MUCH easier to read!)
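To give a feel for the size translation involved, here is a small sketch. It assumes the usual 80x25 text mode and a simple proportional mapping of cursor scanlines; both function names are hypothetical, and this is not the actual bochs code.

```python
def window_size(cols, rows, cell_w, cell_h):
    """Pixel size of a text-mode window for a given character cell size."""
    return cols * cell_w, rows * cell_h

def scale_cursor(start, end, emu_h=16, host_h=19):
    """Map cursor start/end scanlines from the emulated cell to the host font."""
    new_start = start * host_h // emu_h
    new_end = (end + 1) * host_h // emu_h - 1
    return new_start, new_end

print(window_size(80, 25, 8, 16))    # (640, 400) with the emulated 8x16 font
print(window_size(80, 25, 11, 19))   # (880, 475) with the 11x19 X font
print(scale_cursor(14, 15))          # (16, 18): underline cursor, bottom of cell
print(scale_cursor(0, 15))           # (0, 18): full block cursor
```

The window grows from 640x400 to 880x475 pixels, which falls short of the "twice that size" wished for earlier but is a noticeable improvement.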
I thought I had the multiscreen problem licked, by just changing the
key map file (/usr/lib/keyboard/keys) so that <ctrl>1 switches to
screen 1, <ctrl>2 to screen 2, etc. This apparently worked, and had
the apparent side effect of "undoing" the problem where Fn (unshifted)
was switching screens. I could think of no good reason for this side
effect, but since it was desired, I didn't give it much thought...
...Until today. Now, pressing the unshifted number keys is switching
the screen! (Pressing either 2 or <ctrl>2 both switch screens, so the
<ctrl> keypress is not toggling what OSR5 seems to think its state
is.) I believe this all has to do with a bug in bochs, where the
state of the shift keys is not being manipulated properly.
This leaves me with a few questions:
- How does the OSR5 console driver determine the state of the shift
keys before it looks up what it should do in the keys file?
- How does OSR5 reset the keyboard? (The bochs folks thought the way
OSR5 recognised the presence of a disk was esoteric; maybe they think
this keyboard reset method is as well [and have not implemented it
yet].)
- I noticed in the bochs log that the LED status seems to be cleared.
Does OSR5 expect that these shift states are cleared at the same time?
(I haven't determined [yet] if bochs clears them...)
Does anyone know what OSR5 does here?
Thanks for the help,
Carl
> Well, I've been working on the window size and multiscreen problems.
>
> The solution that I am working on for the window size is to use a
> bigger font for the text-mode VGA emulation. (That is, using a bigger
> X-Windows font to represent the standard font emulated.) This is
> going pretty well, but there are a few issues that I need to clean up.
> It involves modifying the bochs code to translate some sizes (window,
> cursor) between the emulated 8x16 VGA font and the 11x19 VGA font that
> I want to use. (This 11x19 font is MUCH easier to read!)
>
> I thought I had the multiscreen problem licked, by just changing the
> key map file (/usr/lib/keyboard/keys) so that <ctrl>1 switches to
> screen 1, <ctrl>2 to screen 2, etc. This apparently worked, and had
> the apparent side effect of "undoing" the problem where Fn (unshifted)
> was switching screens. I could think of no good reason for this side
> effect, but since it was desired, I didn't give it much thought...
> ...Until today. Now, pressing the unshifted number keys is switching
> the screen! (Pressing either 2 or <ctrl>2 both switch screens, so the
> <ctrl> keypress is not toggling what OSR5 seems to think its state
> is.) I believe this all has to do with a bug in bochs, where the
> state of the shift keys is not being manipulated properly.
That's plausible...
> This leaves me with a few questions:
>
> - How does the OSR5 console driver determine the state of the shift
> keys before it looks up what it should do in the keys file?
It tracks individual up/down state for all of the shift _keys_ (i.e. it
knows, or thinks it knows, whether Left-Ctrl is currently being held
down, independently of whether Right-Ctrl is currently being held down;
and so on). It tracks these by the keyboard make/break codes. Then,
obviously if any of the Ctrl keys are being held down at the moment,
Ctrl is in effect for the lookup in the keymap; etc.
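The per-key tracking described above can be illustrated with a short sketch. It is not the OSR5 driver code, just a model of the idea using standard XT (set 1) scancodes, where 0xE0 prefixes the right-hand Ctrl and Alt keys; the class and method names are invented for the example.

```python
class ModifierTracker:
    """Track up/down state of each modifier key from XT (set 1) scancodes."""

    # (extended prefix seen?, make code) -> key name
    KEYS = {
        (False, 0x1D): "left-ctrl",
        (True, 0x1D): "right-ctrl",
        (False, 0x38): "left-alt",
        (True, 0x38): "right-alt",
        (False, 0x2A): "left-shift",
        (False, 0x36): "right-shift",
    }

    def __init__(self):
        self.down = set()
        self._extended = False

    def feed(self, byte):
        """Process one byte from the keyboard."""
        if byte == 0xE0:
            self._extended = True          # prefix applies to the next byte
            return
        key = self.KEYS.get((self._extended, byte & 0x7F))
        self._extended = False
        if key is None:
            return                         # not a modifier key
        if byte & 0x80:
            self.down.discard(key)         # break code: key released
        else:
            self.down.add(key)             # make code: key pressed

    def active(self, name):
        """True if either the left or right key of this kind is held."""
        return any(k.endswith(name) for k in self.down)
```

Feeding it 0x1D, then 0xE0 0x1D, then 0x9D leaves `active("ctrl")` true, because right-ctrl is still held after left-ctrl is released. That is the "independently of whether Right-Ctrl is currently being held down" behaviour described above.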
The keyboard driver puts the keyboard in XT scancode mode (7-bit make
codes, break codes are make+0x80) by default. AT scancode mode (8-bit
make codes, with something like "0xF0 make" as break code) is supported
by the code, but has various limitations that make it better to use XT
mode; besides, it won't go into AT mode without your explicitly setting
it up. Linux probably defaults to AT mode and bochs probably captures
the make/break stream directly if it can, then translates it AT->XT for
OSR5, since OSR5 has programmed the virtual keyboard controller to XT
mode. This may be a less tested code path of bochs since I would guess
most other hosted OSes use AT scancode mode by default.
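For readers unfamiliar with the two scancode modes, the AT->XT translation mentioned above looks roughly like this. The table is a tiny excerpt of the standard set 2 to set 1 mapping, and the function is an illustration only; it makes no claim about how bochs implements the translation.

```python
# Excerpt of the AT (set 2) -> XT (set 1) make-code mapping.
AT_TO_XT = {
    0x14: 0x1D,  # Ctrl
    0x11: 0x38,  # Alt
    0x12: 0x2A,  # Left Shift
    0x59: 0x36,  # Right Shift
    0x0D: 0x0F,  # Tab
    0x1C: 0x1E,  # A
}

def at_to_xt(stream):
    """Translate a list of AT scancode bytes into XT scancode bytes."""
    out = []
    is_break = False
    for byte in stream:
        if byte == 0xF0:
            is_break = True      # AT break prefix: next code is a release
            continue
        if byte == 0xE0:
            out.append(0xE0)     # extended-key prefix passes through
            continue
        xt = AT_TO_XT[byte]
        # XT break codes are the make code with the high bit set.
        out.append(xt | 0x80 if is_break else xt)
        is_break = False
    return out

# Alt down, Tab down, Tab up, Alt up
print([hex(b) for b in at_to_xt([0x11, 0x0D, 0xF0, 0x0D, 0xF0, 0x11])])
# ['0x38', '0xf', '0x8f', '0xb8']
```

If a break sequence were dropped anywhere along this path, the guest would be left believing the key is still held.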
> - How does OSR5 reset the keyboard? (The bochs folks thought the way
> OSR5 recognised the presence of a disk was esoteric; maybe they think
> this keyboard reset method is as well [and have not implemented it
> yet].)
I'm not going to go look this up in the code right now, ask again if
other routes don't help...
> - I noticed in the bochs log that the LED status seems to be cleared.
> Does OSR5 expect that these shift states are cleared at the same time?
> (I haven't determined [yet] if bochs clears them...)
No, LEDs are programmed by OSR5 to reflect what it thinks the current
lock states are (which should generally be right since the lock states
it's going to apply are the ones it's tracking). Programming the LEDs
isn't supposed to affect any state besides the visible state of the
LEDs.
>Bela<
I think I figured out where the problem is. Bochs is pretty slow, and
it takes a while for OSR5 to boot up. So, while it's doing that, I
frequently switch to another X window and do other stuff. It so
happens that I almost always use <alt><tab> to switch windows. This
is caught by X, but is also passed on to OSR5. That is, at least the
MAKE codes are passed on, but the BREAK codes are NOT. Therefore,
OSR5 THINKS the <alt> key is pressed (since it got the make but not
the break), even though it is not (when the bochs X window regains
focus). If I just press and release the <alt> key (after returning to
the bochs window), things start working normally again. (That's also
why I had a hard time duplicating the problem on Friday: I wasn't
switching between X windows.)
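The stuck-Alt situation can be reproduced on paper with a toy model. It assumes XT scancodes (Alt make 0x38, break 0xB8) and that only the Alt make code reached the guest before focus was lost; the function name is made up for the illustration.

```python
ALT_MAKE = 0x38
ALT_BREAK = ALT_MAKE | 0x80   # 0xB8

def alt_is_down(scancodes):
    """Replay XT scancodes and report whether the guest believes Alt is held."""
    down = False
    for code in scancodes:
        if code == ALT_MAKE:
            down = True
        elif code == ALT_BREAK:
            down = False
    return down

# Alt-Tab away from the bochs window: the make arrives, the break is lost.
seen_by_guest = [ALT_MAKE]
print(alt_is_down(seen_by_guest))    # True: guest thinks Alt is still held

# Back in the bochs window, press and release Alt once.
seen_by_guest += [ALT_MAKE, ALT_BREAK]
print(alt_is_down(seen_by_guest))    # False: state is back in sync
```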
I'll have to think through how this could get fixed (or if it's
something that will have to be lived with). It's kinda related to the
<alt><ctrl><Fn> switching between multiscreens in OSR5 and
virtual consoles (same as multiscreens) in Linux. I would guess one
possible solution is to totally intercept all keyboard input before X
acts upon it (possible???) while bochs is the window in focus. Then,
you'd have to use the mouse to change X window focus. I'm not too
sure I like that... Hummmm...
I've also been working on the font size issue. Some changes to bochs
in this regard have been committed to the CVS tree, which I'll have to
look at. But this should be solved soon...
Thanks again,
Carl
> I think I figured out where the problem is. Bochs is pretty slow, and
> it takes a while for OSR5 to boot up. So, while it's doing that, I
> frequently switch to another X window and do other stuff. It so
> happens that I almost always use <alt><tab> to switch windows. This
> is caught by X, but is also passed on to OSR5. That is, at least the
> MAKE codes are passed on, but the BREAK codes are NOT. Therefore,
> OSR5 THINKS the <alt> key is pressed (since it got the make but not
> the break), even though it is not (when the bochs X window regains
> focus). If I just press and release the <alt> key (after returning to
> the bochs window), things start working normally again. (That's also
> why I had a hard time duplicating the problem on Friday: I wasn't
> switching between X windows.)
Oh, I should have suggested hitting and releasing Alt. Actually I have
a little brain macro that I use for such problems on all OSes
(frequently helpful on Windows 98, for instance): I hit and release, one
at a time: left-ctrl, left-alt, left-shift, right-ctrl, right-alt,
right-shift; put mouse cursor in a safe place, then left-mouse,
middle-mouse, right-mouse; then hit and release twice each: numlock,
capslock, scrolllock. This fixes a _lot_ of problems with a lot of
OSes.
In your specific case, I would think that OSR5 would see a BREAK code
for Alt when you Alt-Tab back to it. If Bochs sends it the MAKE code on
the way out and doesn't send it the BREAK on the way back in, that looks
like a Bochs bug.
Of course if you Alt-Tab _away_ from Bochs, then use a mouse click to
reactivate it, I'm not really sure what Bochs could do about it. In
that case you probably legitimately should have to hit and release Alt
to notify the hosted OS that you are no longer pressing Alt.
> I'll have to think through how this could get fixed (or if it's
> something that will have to be lived with). It's kinda related to the
> <alt><ctrl><Fn> switching between multiscreens in OSR5 and
> virtual consoles (same as multiscreens) in Linux. I would guess one
> possible solution is to totally intercept all keyboard input before X
> acts upon it (possible???) while bochs is the window in focus. Then,
> you'd have to use the mouse to change X window focus. I'm not too
> sure I like that... Hummmm...
You probably don't want to completely change Bochs' input architecture
to deal with this...
> I've also been working on the font size issue. Some changes to bochs
> in this regard have been committed to the CVS tree, which I'll have to
> look at. But this should be solved soon...
>
> Thanks again,
Sure,
>Bela<