Simh. How to triple the speed

Alan Greig

Aug 22, 2005, 7:19:50 PM

Anyone running the current distributed VAX simh binaries (3.4-0) with
ethernet support should note that these binaries are compiled with debug
code and no optimization. The problem is not present in the version
without ethernet support. The upshot of this is that your emulated VAX
will go nearly three times as fast if you recompile from source and set
the optimization level to -O2 in the makefile.

I've been in touch with Bob Supnik and he is aware of this so hopefully
updated binaries will appear soon on http://simh.trailing-edge.com/

I can supply a pre-built vax.exe if you trust Windows executables and
can't wait.

--
Alan Greig

Mark Hittinger

Aug 22, 2005, 10:20:18 PM
Alan Greig <grei...@netscape.net> writes:
> The upshot of this is that your emulated VAX
>will go nearly three times as fast if you recompile from source and set
>the optimization level to -O2 in the makefile.

Faster still if you use the Intel compiler and turn on all the optimizations.

Later

Mark Hittinger
bu...@pu.net

VAXman-

Aug 23, 2005, 8:19:31 AM

Bugger the Weendoze exes... where are the OS X exes???

--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)COM

"Well my son, life is like a beanstalk, isn't it?"

Alan Greig

Aug 23, 2005, 10:11:15 AM

VAXman- @SendSpamHere.ORG wrote:

>
> Bugger the Weendoze exes... where are the OS X exes???

The source is highly portable C with very little OS dependency. Should
compile under OS X easily. Possibly even with no changes to the
makefile. In fact a quick web search finds references to folks running
various simh emulators under OS X - although I couldn't find a VAX
specific reference.

I built the Windows version with gcc (MinGW, the Minimalist GNU for Windows
compiler) and you should be able to get 20 VUPS or more out of this with
the fastest home PC (which I don't have).

It will also build out of the box under VMS with DEC C on Alpha/Itanium.
Anyone want to build it on Itanium and post some VUPS feedback?

--
Alan Greig

Alan Greig

Aug 23, 2005, 10:29:08 AM

Mark Hittinger wrote:

> Alan Greig <grei...@netscape.net> writes:
>
>> The upshot of this is that your emulated VAX
>>will go nearly three times as fast if you recompile from source and set
>>the optimization level to -O2 in the makefile.
>
>
> Faster still if you use the Intel compiler and turn on all the optimizations.

Fancy posting any VUPS figures? A deja search for calculate_vups in
comp.os.vms will turn up at least one DCL approximation.

If anyone has a more accurate VUPS calculator could you let me know. I
used to have a set of C programs I used to calculate VUPS but I can't
currently read the TK70 they reside on.

--
Alan Greig

Bob Koehler

Aug 23, 2005, 1:13:39 PM
In article <00A48B43...@SendSpamHere.ORG>, VAXman- @SendSpamHere.ORG writes:
> In article <q4tOe.24756$Il.1...@fe2.news.blueyonder.co.uk>, Alan Greig <grei...@netscape.net> writes:
>>
>>Anyone running the current distributed VAX simh binaries (3.4-0) with
>>ethernet support should note that these binaries are compiled with debug
>>code and no optimization. The problem is not present in the version
>>without ethernet support. The upshot of this is that your emulated VAX
>>will go nearly three times as fast if you recompile from source and set
>>the optimization level to -O2 in the makefile.
>>
>>I've been in touch with Bob Supnik and he is aware of this so hopefully
>>updated binaries will appear soon on http://simh.trailing-edge.com/
>>
>>I can supply a pre-built vax.exe if you trust Windows executables and
>>can't wait.
>
> Bugger the Weendoze exes... where are the OS X exes???
>

I had no problem building SIMH under OS X. I also didn't bother
hanging on to it.


Chris Allen

Aug 23, 2005, 7:00:48 PM
Alan Greig wrote:
> Fancy posting any VUPS figures?

I have simh compiled on a FreeBSD PC with gcc option -O2 (and
networking). The computer is a 1.5 GHz P4 with 754 MB RAM and 128 MB
allocated to the simulator. I consistently get between 6.2 VUPS and
6.6 VUPS on it with *almost* nothing running in the background. To
contrast this with a real VAX my VAX 4000-108 gets 32 VUPS. I tested
this with some DCL scripts I found on comp.os.vms. I doubt it's that
accurate but I was very surprised to see the simh VAX perform so poorly.

Dan Foster

Aug 23, 2005, 7:21:18 PM

That's not bad. SIMH emulates the MicroVAX 3900, which has a VUPS rating
of 3.8.

Mr. Quayle, a CHARON reseller, mentioned it was ~95 VUPS on a 'high end
PC' (making me think 3 GHz Xeon or some such) for CHARON, last year.

Could SIMH do better? Probably. CHARON, I understand, blows it out of
the water, so it may be doing something special 'behind the scenes'. As
CHARON is not open source, don't think anyone will ever know exactly how
it manages to handle its performance aspect.

I don't see much in way of threading (it's there, but kind of minimal)
with SIMH, and the load average jumps to 1.0 when I run it, so I gather
it's not yielding the CPU even when 'idle' unlike with other emulators
like VMware. That indirectly suggests it's missing out on other
performance optimization tricks.

SIMH seems to be designed more to be functional with particular
attention paid to correctness than to be high performing per se.

Could you make the DCL scripts available or mention more details such
that I could locate them via a web search? I think I've heard of the
ones you used but can't remember enough useful details to find them again.

I'd be happy to try it on SIMH+VMS 7.3 setup running on a P4/3.0 GHz and
report back results.

-Dan

Alan Greig

Aug 23, 2005, 8:16:22 PM

Dan Foster wrote:

>
> That's not bad. SIMH emulates the MicroVAX 3900, which has a VUPS rating
> of 3.8.

> Mr. Quayle, a CHARON reseller, mentioned it was ~95 VUPS on a 'high end
> PC' (making me think 3 GHz Xeon or some such) for CHARON, last year.

I've just tried the demo of Charon VAX (I know there may be faster
Charon products, so take this as indicative of the demo only, as per the
agreement I clicked on) and it gets 11 VUPS on my PC. Simh gets me 7.8,
so there is obviously some scope for speeding up simh. But simh is a
free emulator which doesn't have full-time staff and commercially funded
development. I think simh does rather well. I'd be amazed if anyone is
getting 95 VUPS out of Charon VAX unless it's a cluster emulated on an
SMP system. And you can do that with simh as well. Simh also emulates a
huge range of processors other than VAX, of course.

>
> Could you make the DCL scripts available or mention more details such
> that I could locate them via a web search? I think I've heard of the
> ones you used but can't remember enough useful details to find them again.

Watch as it may wrap.

$! CALCULATE_VUPS:
$!
$ set noon
$ orig_privs = f$setprv("ALTPRI")
$ process_priority = f$getjpi(0,"PRIB")
$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
$ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9
$ cpu_round_divide = cpu_round_add + 1
$ init_counter = cpu_multiplier * 525
$ init_loop_maximum = 205
$ start_cputime = f$getjpi(0,"CPUTIM")
$ loop_index = 0
$ 10$:
$ loop_index = loop_index + 1
$ if loop_index .ne. init_loop_maximum then goto 10$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ init_vups = ((init_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ loop_maximum = (init_vups * init_loop_maximum) / 10
$ base_counter = (init_counter * init_vups) / 10
$ vups = 0
$ times_through_loop = 0
$ 20$:
$ start_cputime = f$getjpi(0,"CPUTIM")
$ loop_index = 0
$ 30$:
$ loop_index = loop_index + 1
$ if loop_index .ne. loop_maximum then goto 30$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ new_vups = ((base_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ if new_vups .eq. vups then goto 40$
$ vups = new_vups
$ times_through_loop = times_through_loop + 1
$ if times_through_loop .le. 5 then goto 20$
$ 40$:
$ new_privs = f$setprv(orig_privs)
$ set message /nofacility/noidentification/noseverity/notext
$! ASSIGN/SYSTEM/EXEC 'vups' MACHINE_VUPS_RATING
$ set message /facility/identification/severity/text
$ write sys$output "Approximate System VUPs Rating : ", -
vups / 10,".", vups - ((vups / 10) * 10)
$ exit
$
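
For anyone puzzling over the arithmetic in the procedure above, here is a minimal Python sketch of just the rounding expression and the final output formatting. It is not a port of the whole procedure. The function names vups_tenths and format_vups are made up for illustration, and it assumes that DCL integer division truncates (which matches Python's // for positive values) and that CPUTIM counts 10 ms ticks. The result is held in tenths of a VUP, which is why the script prints vups / 10, a dot, and the remainder.

```python
def vups_tenths(counter, cpu_ticks, round_add=1, round_divide=2):
    """Mirror the DCL expression
    ((counter / ticks + round_add) / round_divide) * round_divide
    using integer division. The result is in tenths of a VUP.
    """
    if cpu_ticks <= 0:
        # Guard added for this sketch only: no elapsed ticks means no rating.
        raise ValueError("elapsed CPU time must be at least one tick")
    return ((counter // cpu_ticks + round_add) // round_divide) * round_divide


def format_vups(tenths):
    """Mirror the final WRITE: vups / 10, ".", vups - ((vups / 10) * 10)."""
    return f"{tenths // 10}.{tenths % 10}"


# VAX constants: cpu_multiplier = 10, so init_counter = 10 * 525 = 5250.
init_counter = 10 * 525
# Hypothetical timings: 525 ticks (5.25 s of CPU) and 43 ticks (0.43 s).
print(format_vups(vups_tenths(init_counter, 525)))
print(format_vups(vups_tenths(init_counter, 43)))
```

With the VAX constants the result is always rounded to an even number of tenths, which fits the readings posted in this thread (all even in the last digit).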


> I'd be happy to try it on SIMH+VMS 7.3 setup running on a P4/3.0 GHz and
> report back results.
>
> -Dan

--
Alan Greig

Bill Gunshannon

Aug 23, 2005, 9:05:37 PM
In article <q%OOe.34046$Il.1...@fe2.news.blueyonder.co.uk>,

I couldn't resist. I ran it on the VAX here.

Approximate System VUPs Rating : 28.0

That should keep the students happy.

bill

--
Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves
bi...@cs.scranton.edu | and a sheep voting on what's for dinner.
University of Scranton |
Scranton, Pennsylvania | #include <std.disclaimer.h>

Dan Foster

Aug 23, 2005, 9:13:43 PM
Thanks for VUPS.COM.

My system:

3.0 GHz Pentium 4 + 800 MHz FSB + 1 MB L2 cache + Hyperthreading
("Northwood")
OS: Linux (2.6.12 kernel with SMP enabled for HT use)
Load avg of 0.60 before running SIMH
Load avg of 1.00 after starting up SIMH
Basic VMS 7.2/VAX installation -- no SSH server running, etc.
SIMH 3.4 compiled with gcc -O2, no debug options

12.2 VUPS reported for 3 runs and 11.8 for 1 run.

A little bit more than 3x the original VUPS of the MicroVAX 3900.

I doubt that running VMS 7.3 would significantly change this, given the
test is CPU-bound rather than I/O-bound, so I didn't bother to do a 7.3
installation for this quick test.

This, of course, assumes that VUPS.COM itself has reasonable ballpark
accuracy. I do not know this for sure. I wonder what the CHARON folks
used to measure the VUPS rating?

-Dan

Dan Foster

Aug 23, 2005, 9:26:24 PM
In article <slrndgnifc...@zappy.catbert.org>, Dan Foster <use...@evilphb.org> wrote:
>
> My system:
>
> 3.0 GHz Pentium 4 + 800 MHz FSB + 1 MB L2 cache + Hyperthreading
> ("Northwood")

Minor correction: 512 KB L2 cache.

My apologies.

(Though I doubt the cache size makes a real difference in this case.)

-Dan

Dave Froble

Aug 23, 2005, 9:39:35 PM

On a VAXstation 4000 model 90A:

Approximate System VUPs Rating : 26.2

I seem to remember that this system was rated at about 32 VUPs, and if
so, your procedure may need a bit of tuning.

--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. Fax: 724-529-0596
DFE Ultralights, Inc. E-Mail: da...@tsoft-inc.com
170 Grimplin Road
Vanderbilt, PA 15486

Alan Greig

Aug 23, 2005, 9:43:00 PM

Dave Froble wrote:

>
>
> On a VAXstation 4000 model 90A:
>
> Approximate System VUPs Rating : 26.2
>
> I seem to remember that this system was rated at about 32 VUPs, and if
> so, your procedure may need a bit of tuning.

It's not mine. It's just one I know has kicked around for years in
slightly different formats. A Google Groups search of comp.os.vms was
where I found my current copy. It may even have originated within DEC.

--
Alan Greig

Stanley F. Quayle

Aug 23, 2005, 10:01:10 PM
On 23 Aug 2005 at 23:21, Dan Foster wrote:
> Mr. Quayle, a CHARON reseller, mentioned it was ~95 VUPS on a 'high
> end PC' (making me think 3 GHz Xeon or some such) for CHARON, last
> year.

CHARON-VAX is now available in a 6640 version, which requires a 4-way
box. On a 2+ GHz Opteron, it's faster than any VAX ever built (> 200
VUPs). And Moore's Law keeps making it faster all the time.

> CHARON, I understand, blows it out of the water, so it may be doing
> something special 'behind the scenes'. As CHARON is not open source,
> don't think anyone will ever know exactly how it manages to handle its
> performance aspect.

Just ask. It's called "Accelerated CPU Emulation". It caches frequent
sequences of instructions and pre-compiles them. It's available in the
"Plus" versions of the product.

> the load average jumps to 1.0 when I run it, so I gather it's not yielding the
> CPU even when 'idle' unlike with other emulators like VMware.

CHARON-VAX comes with a little kernel module to detect the VMS idle
loop and drop the load.

> That indirectly suggests it's missing out on other performance optimization
> tricks.

The biggest issue is knowing where the idle loop is in an operating
system. Or even knowing what operating system is in use (we support
VAXeln and Digital Unix, too). Since CHARON-VAX is intended for server
applications, most customers don't care that 1 processor is running
100%.

I expect that you have to lose some performance coming out of that
"sleep" state. There's no free lunch, after all.

JF Mezei

Aug 23, 2005, 11:02:44 PM
Alan Greig wrote:
>What does the DCL VUPS calculator say for your MV-II out of curiosity?


$ write sys$output f$getsyi("HW_NAME")
MicroVAX II
$ @calculate_vups
Approximate System VUPs Rating : 0.6

$ set proc/prio=15
$ @calculate_vups
Approximate System VUPs Rating : 0.8


-------------------------------------------

$ write sys$output f$getsyi("HW_NAME")
VAXstation 3100/SPX
$ @calculate_vups
Approximate System VUPs Rating : 2.8

Setting priority to 15 on the 3100 didn't make a difference.


Now, DEC promised my almighty MicroVAX II would do 0.9. We paid big
money back in 1987. I want that money back along with interest :-) :-)
:-) :-)

Stanley F. Quayle

Aug 23, 2005, 11:27:17 PM
> I'd be amazed if anyone is getting 95 VUPS out of Charon VAX unless it's
> a cluster emulated on an SMP system.

Not sure what you mean. If you have an emulated 6630 VAX, it's a
single VAX with 3 processors, not a cluster of 3 VAX systems.

Alan Greig

Aug 24, 2005, 7:16:25 AM

Stanley F. Quayle wrote:

Ok, thanks. I didn't realise that Charon can now emulate an SMP VAX on
an SMP host.


--
Alan Greig

Alan Greig

Aug 24, 2005, 8:16:20 AM

Stanley F. Quayle wrote:

>
> CHARON-VAX is now available in a 6640 version, which requires a 4-way
> box. On a 2+ GHz Opteron, it's faster than any VAX ever built (> 200
> VUPs). And Moore's Law keeps making it faster all the time.

And just imagine how many VUPs it would get if it ran VMS native. I am
sure *well* over 10 times that figure. And goodness knows what we would
get from the 64-way SMP Horus Opteron - Anyone fancy a 50,000+ VUPs VMS
box?

>
> CHARON-VAX comes with a little kernel module to detect the VMS idle
> loop and drop the load.

I think it would be fairly easy to modify simh to detect the VMS idle
loop and, at least, drop the process priority. I leave simh running at
"below normal" Windows priority 6 (default 8) in any case most of the
time. It gets the full cpu when I'm actually working in it and doesn't
hinder normal use of XP so I can run it all the time. Actually it slows
my search for aliens (SETI at home) but I can live with that :-)

--
Alan Greig

Bill Gunshannon

Aug 24, 2005, 8:54:50 AM
In article <430BE34B...@teksavvy.com>,

Just out of curiosity, I have a 4 CPU box. Is the VUPS rating
returned by this DCL for one CPU or for all of them? My guess
is that it is for only one.

Alan Greig

Aug 24, 2005, 9:12:47 AM

Bill Gunshannon wrote:

>
> Just out of curiosity, I have a 4 CPU box. Is the VUPS rating
> returned by this DCL for one CPU or for all of them? My guess
> is that it is for only one.

It's per processor so multiply by 4.

> bill
>

--
Alan Greig

alexd...@themail.co.uk

Aug 24, 2005, 10:10:31 AM
Stanley F. Quayle wrote:
> CHARON-VAX is now available in a 6640 version, which requires a 4-way
> box. On a 2+ GHz Opteron, it's faster than any VAX ever built (> 200
> VUPs).

A VAX 7860 (6 CPU) clocks in at 306 VUPs.

With the Nemonix upgraded boards (which are fully supported by HP, and
can be put on your HP hardware support contract), it clocks in at 407
VUPs.

http://www.nemonixengineering.com/pdfs/VAX_7000_Series1-01.pdf

Alex

John Vottero

Aug 24, 2005, 10:59:32 AM
"Alan Greig" <grei...@netscape.net> wrote in message
news:oyZOe.34513$Il.2...@fe2.news.blueyonder.co.uk...

Why not run SETI on VMS too?


Chris Allen

Aug 24, 2005, 12:51:47 PM

Dan Foster wrote:

> Chris Allen wrote:
> >
> > I have simh compiled on a FreeBSD PC with gcc option -O2 (and
> > networking). The computer is a 1.5 GHz P4 with 754 MB RAM and 128 MB
> > allocated to the simulator. I consistently get between 6.2 VUPS and
> > 6.6 VUPS on it with *almost* nothing running in the background. To
> > contrast this with a real VAX my VAX 4000-108 gets 32 VUPS. I tested
> > this with some DCL scripts I found on comp.os.vms. I doubt it's that
> > accurate but I was very surprised to see the simh VAX perform so poorly.
>
> That's not bad. SIMH emulates the MicroVAX 3900, which has a VUPS rating
> of 3.8.

I see now (and after reading other posts) ~6.5 isn't bad at all.

Arne Vajhøj

Aug 24, 2005, 3:19:16 PM
Dan Foster wrote:
> Could SIMH do better? Probably. CHARON, I understand, blows it out of
> the water, so it may be doing something special 'behind the scenes'. As
> CHARON is not open source, don't think anyone will ever know exactly how
> it manages to handle its performance aspect.

Virtual machines and JIT compilation are well-known
technologies today (Java and .NET).

It is not difficult to imagine Charon using some
of these techniques.

Arne
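
To make the "cache frequent sequences and pre-compile them" idea concrete, here is a toy Python sketch of a translation cache. It is only an illustration of the general technique under made-up assumptions: the three-opcode instruction set ("add", "loop", "halt") and the names translate_block and run are invented for this example, and nothing here is known to match how Charon or any real VAX emulator works.

```python
def translate_block(program, start):
    """Turn the straight-line run of instructions at `start` into one function."""
    total = 0
    pc = start
    while program[pc][0] == "add":
        total += program[pc][1]          # fold consecutive adds into a single add
        pc += 1
    op, arg = program[pc]
    if op == "loop":                     # branch back while acc is below a limit
        limit, target = arg
        fallthrough = pc + 1

        def block(acc):
            acc += total
            return acc, (target if acc < limit else fallthrough)
    elif op == "halt":
        def block(acc):
            return acc + total, None     # None tells the dispatcher to stop
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return block


def run(program):
    """Execute `program`, translating each block once and reusing it afterwards."""
    cache = {}
    translations = 0
    acc, pc = 0, 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:                # first visit: translate and remember
            block = cache[pc] = translate_block(program, pc)
            translations += 1
        acc, pc = block(acc)
    return acc, translations


program = [("add", 1), ("add", 2), ("loop", (10, 0)), ("halt", None)]
print(run(program))
```

The loop body runs four times here but is translated only once. The saving comes from skipping the per-instruction decode on every later pass, which is where a plain interpreter spends much of its time.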

Stanley F. Quayle

Aug 24, 2005, 6:47:24 PM
On 24 Aug 2005 at 14:59, John Vottero wrote:
> Why not run SETI on VMS too?

I do, on Alpha. You can't do it on VAX because the VAX doesn't have
IEEE floating-point capability. (That's documented somewhere in the
SETI FAQ.)

hea...@aracnet.com

Aug 24, 2005, 6:40:24 PM
Dan Foster <use...@evilphb.org> wrote:
> I don't see much in way of threading (it's there, but kind of minimal)
> with SIMH, and the load average jumps to 1.0 when I run it, so I gather
> it's not yielding the CPU even when 'idle' unlike with other emulators
> like VMware. That indirectly suggests it's missing out on other
> performance optimization tricks.

SIMH expects 100% of the CPU. I believe part of the issue is knowing
when the emulated system is actually ideal. IIRC this has been solved under
KLH10 and possibly modified copies of SIMH by modifying TOPS-20 itself.

> SIMH seems to be designed more to be functional with particular
> attention paid to correctness than to be high performing per se.

Part of the "problem" is that SIMH is designed to be as portable as
possible. Of course for most people this could be considered a good thing,
as it means it runs most places. Another issue is that the SIMH package
emulates a *LOT* of architectures, rather than just one.

Zane

hea...@aracnet.com

Aug 24, 2005, 7:00:22 PM
Alan Greig <grei...@netscape.net> wrote:
> $! CALCULATE_VUPS:
> $!
> $ set noon
> $ orig_privs = f$setprv("ALTPRI")
> $ process_priority = f$getjpi(0,"PRIB")
> $ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
> $ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9

Just for the fun of it, I decided to run this on my PWS 433au.

$ @calculate_vups
Digital Personal WorkStation
Approximate System VUPs Rating : 439.0
$

Zane

Dave Froble

Aug 24, 2005, 9:43:37 PM

I ran that procedure on a VAXstation 4000 model 90A, got somewhere
around 26. Thought I'd try an Alpha:

This is an AlphaStation 200 4/233, hardware model type 1151
$ @calcvups
Approximate System VUPs Rating : 22.8

I'd stir the pot concerning VAX vs Alpha, but benchmarks I've run in the
past show the Alpha to be faster.

Me thinks the DCL procedure needs some tinkering.

Thierry Dussuet

Aug 25, 2005, 4:57:46 AM
On 2005-08-25, Dave Froble <da...@tsoft-inc.com> wrote:
> hea...@aracnet.com wrote:
>> Alan Greig <grei...@netscape.net> wrote:
>>
>>>$! CALCULATE_VUPS:
>>>$!
>>>$ set noon
>>>$ orig_privs = f$setprv("ALTPRI")
>>>$ process_priority = f$getjpi(0,"PRIB")
>>>$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
>>>$ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9
>
> This is an AlphaStation 200 4/233, hardware model type 1151
> $ @calcvups
> Approximate System VUPs Rating : 22.8
>
> I'd stir the pot concerning VAX vs Alpha, but benchmarks I've run in the
> past show the Alpha to be faster.
>
> Me thinks the DCL procedure needs some tinkering.

It only needs the two lines above changed to the Alpha values:
cpu_multiplier and cpu_round_add.

Thierry
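
As a rough illustration of why those two constants matter, here is a small Python sketch of only the first calibration pass of the procedure, run with both sets of constants from the script's comments (VAX = 10 and 1, Alpha/AXP = 40 and 9). The function name first_pass_tenths and the figure of 23 CPU ticks are assumptions picked so that the VAX constants reproduce the 22.8 reading. The real procedure goes on to repeat the measurement with a longer loop, so treat the output as a ballpark only.

```python
# (cpu_multiplier, cpu_round_add) as given in the script's comments.
CONSTANTS = {"vax": (10, 1), "alpha": (40, 9)}


def first_pass_tenths(cpu_ticks, arch):
    """First-pass rating, in tenths of a VUP, for a given elapsed tick count."""
    multiplier, round_add = CONSTANTS[arch]
    round_divide = round_add + 1           # cpu_round_divide in the script
    init_counter = multiplier * 525        # init_counter in the script
    return ((init_counter // cpu_ticks + round_add) // round_divide) * round_divide


ticks = 23  # hypothetical elapsed CPU ticks for the 205-iteration loop
for arch in ("vax", "alpha"):
    tenths = first_pass_tenths(ticks, arch)
    print(arch, f"{tenths // 10}.{tenths % 10}")
```

The same measured CPU time comes out about four times higher with the Alpha constants, and it is rounded to a whole number of VUPs instead of an even number of tenths.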

Galen

Aug 25, 2005, 7:31:20 AM

VAX...@SendSpamHere.ORG wrote:
>
> Bugger the Weendoze exes... where are the OS X exes???
>
Here's one!

I have a simh V3.4 Vax running VMS V7.3 on my PowerMac G4/433 under OS
X V10.3.9. No problems compiling, no major ones running simh itself,
and [of course] none whatsoever with VMS. I haven't tried any kind of
benchmarks but can say that performance is not breathtaking, though
subjectively I think it's faster than a speeding 11/780. (Haven't tried
it leaping over tall buildings in a single bound, and it's probably not
as powerful as a "loco Motif" :-)

Galen

P.S. simh uses pcap to hook into a Mac OS network interface. I did
discover and fix a minor glitch that caused simh not to see some of the
available interfaces. The fix hasn't shown up in the official
distribution yet.

Also, I've had a lot of trouble getting TCP/IP (Pathworks) to run. I
have dedicated a 2nd Ethernet port on my Mac to simh, hooked into the
same D-Link NAT firewall/router as the primary Ethernet. For some
reason pcap thinks the interface is down unless OS X has an IP address
assigned to the interface. And if you do so, THEN you get "duplicate IP
address" messages on the VMS side because TCPware sees responses to its
ARP requests coming from the OS X-assigned IP address. And the router
reports IP address spoofing!

Bob Koehler

Aug 25, 2005, 8:41:43 AM
In article <oX%Oe.1289$u_6....@newssvr17.news.prodigy.com>, "John Vottero" <Jo...@mvpsi.com> writes:
>
> Why not run SETI on VMS too?

SETI does not run on VAXen, real or simmed. We do run SETI on VMS
on our hobbyist Alphas, don't you?

Bob Koehler

Aug 25, 2005, 8:47:31 AM
In article <deit0...@enews3.newsguy.com>, hea...@aracnet.com writes:
> Dan Foster <use...@evilphb.org> wrote:
>> I don't see much in way of threading (it's there, but kind of minimal)
>> with SIMH, and the load average jumps to 1.0 when I run it, so I gather
>> it's not yielding the CPU even when 'idle' unlike with other emulators
>> like VMware. That indirectly suggests it's missing out on other
>> performance optimization tricks.
>
> SIMH expects 100% of the CPU, I believe part of the issue is the knowing
> when the emulated system is actually ideal. IIRC this has been solved under
> KLH10 and possibly modified copies of SIMH by modifying TOP20 itself.

You mean when it's idle? Some VAX simulators (both Charon and SIMH
IIRC) will detect the loop that VMS goes into at the end of shutting
down, and break out of the simulation. IMHO there should be a way
to detect entry into the idle loop and suspend the simulation until
the next interrupt (which is the only way to get out of the idle
loop). Maybe the simulations require host CPU to poll for incoming
interrupts (which may not be host interrupts).

Galen

Aug 25, 2005, 11:45:38 AM

A very preliminary idea along this line, in two parts.

First, maybe simh could implement a simple mechanism (say, a simple
minded I/O "device") by which a hosted system (not necessarily a VAX)
could specify something along either or both of these lines: A) a range
of physical addresses where the idle loop lives (for systems where this
is possible); B) a register bit that the hosted system can set to
indicate it is going idle. These would allow simh to detect an idle
condition, though I don't know how practical this would be to
implement.

Second, a small piece of code to execute on the hosted system. This
would take care of providing simh the information in A) or B) above.
How practical would either of these be on a hosted VMS system?

As you say, Bob, I think simh itself would periodically have to wake
itself in order to check if there are interrupts to be generated within
the hosted system. Otherwise the hosted VMS system clock would freeze,
and all other kinds of dire things would happen inside VMS as well.
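
A minimal Python sketch of option A) combined with the periodic wake-up might look like the following. Everything in it is hypothetical: the IdleGate class, its method names, and the 10 ms default tick are assumptions for illustration, not anything taken from the simh sources.

```python
import threading


class IdleGate:
    """Hypothetical helper that parks the emulator thread while the guest idles."""

    def __init__(self, idle_start, idle_end, tick_seconds=0.01):
        self.idle_start = idle_start      # first address of the guest idle loop
        self.idle_end = idle_end          # one past its last address
        self.tick_seconds = tick_seconds  # longest sleep, so the guest clock keeps running
        self._interrupt = threading.Event()

    def post_interrupt(self):
        """Called by device code when a guest interrupt becomes pending."""
        self._interrupt.set()

    def in_idle_range(self, pc):
        return self.idle_start <= pc < self.idle_end

    def wait_if_idle(self, pc):
        """Return "running", "interrupt" or "tick" depending on what happened."""
        if not self.in_idle_range(pc):
            return "running"              # guest is doing real work, do not sleep
        if self._interrupt.wait(self.tick_seconds):
            self._interrupt.clear()
            return "interrupt"            # a device woke us early
        return "tick"                     # timeout: time to deliver a clock tick


gate = IdleGate(0x1000, 0x1002, tick_seconds=0.001)
print(gate.wait_if_idle(0x2000))
print(gate.wait_if_idle(0x1000))
gate.post_interrupt()
print(gate.wait_if_idle(0x1000))
```

The timeout is what keeps the hosted clock from freezing: even with no device activity the emulator wakes once per tick, so the host CPU is only released between ticks.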

JF Mezei

Aug 25, 2005, 2:19:36 PM
Bob Koehler wrote:
> You mean when it's idle? Some VAX simulators (both Charon and SIMH
> IIRC) will detect the loop that VMS goes into at the end of shutting
> down, and break out of the simulation.


Have the writers of SimH and Charon VAX asked the VMS engineers if it
were possible to provide a user-written idle loop? (e.g. some sysgen
parameter or just replacing some shareable image).

This way, they could plug in an "idle loop" which would do the
equivalent of a LIB$WAITEF or whatever until resources are needed again.

Out of curiosity, what does the VMS idle loop look like? Is it just a
raw "goto myself" efficient infinite loop which relies on process
management to give it CPU and remove CPU access, or does the loop run in
a deeper mode and have some sort of check (flag, event flag etc.) inside
the loop to see if it should exit and return control to the rest of the
system?

Dave Froble

Aug 25, 2005, 4:14:40 PM

I wasn't paying attention when I scanned the procedure. Another 'ah
shit' for me.

Stanley F. Quayle

Aug 25, 2005, 5:13:24 PM
> Ok, thanks. I didn't realise that Charon can now emulate an SMP VAX on
> an SMP host.

This capability is essential for reaching the high end of the VAX
market.

Using the CHARON-VAX 6630 Plus product, SRI got 258 VUPs (on a 4-way
Winbox).

Funny, when you tackle SMP successfully, it's easy to add more emulated
processors. Expect a 6660 announcement from SRI next week. Should
produce 400+ VUPs.

Stanley F. Quayle

Aug 25, 2005, 5:22:53 PM
JF Mezei wrote:
> Have the writers of SimH and Charon VAX asked the VMS engineers if it
> were possible to provide a user-written idle loop? (e.g. some sysgen
> parameter or just replacing some shareable image).

Since none of the systems I'm replacing are V7.3 (over half are
V5.5-x), I'm sure this would be waaaaay down the priority list.

Actually, there's already a way to create a kernel-code module to hook
into the idle loop. CHARON-VAX comes with such a module, for V5.5 and
V7.3. It uses an out-of-band (not part of the emulated VAX) mechanism
to tell the emulator to sleep until a VAX interrupt comes along.

Galen

Aug 26, 2005, 5:54:36 AM

I wrote:

> I haven't tried any kind of benchmarks but can say that performance is not
> breathtaking, though subjectively I think it's faster than a speeding 11/780.

Well, there's one performance test I have run--the boot time benchmark.
Though I don't have actual numbers to provide, my simh Vax boots quite
a bit faster than a real 11/780. Partly, I suppose, because it doesn't
have to load microcode from an 8" floppy. :-)

Galen

Bob Koehler

Aug 26, 2005, 8:27:40 AM
In article <1124984738.6...@g43g2000cwa.googlegroups.com>, "Galen" <glta...@gmail.com> writes:
>
> Second, a small piece of code to execute on the hosted system. This
> would take care of providing simh the information in A) or B) above.
> How practical would either of these be on a hosted VMS system?

I thought simply patching the 10$: BRB 10$, replacing it with a HALT
followed by a NOP, would be a good first step. It then depends on how
the simulator reacts to a CPU halt (real VAXen often have settings to
control this).

> As you say, Bob, I think simh itself would periodically have to wake
> itself in order to check if there are interrupts to be generated within
> the hosted system. Otherwise the hosted VMS system clock would freeze,
> and all other kinds of dire things would happen inside VMS as well.

Is the VAX CPU clock simulation not dependent on a host timer? I
think on a real VAX all paths out of the idle loop start with a
hardware interrupt (since no other software is using the VAX CPU, it
can't start with a software interrupt).

Perhaps it's too complicated in SIMH to make a portable way to trap
host-initiated events that would map to simulated VAX hardware
interrupts, but a commercial product like Charon-VAX with a limited
set of hosts may do so, and provide the VMS patch as part of the
installation process.

Bob Koehler

Aug 26, 2005, 8:33:32 AM
In article <430E0BB7...@teksavvy.com>, JF Mezei <jfmezei...@teksavvy.com> writes:
>
> Out of curiosity, what does the VMS iddle loop look like ? Is it just a
> raw "goto myself" efficient infinite loop which relies on process
> management to give it CPU and remove CPU access, or does the loop run in
> a deeper mode and has some sort of check (flag, event flag etc) inside
> the loop to see if it should exit and return control to the rest of the
> system ?

The VMS idle loop is architecture dependent. The last time I looked
at the VAX listings it was simply the smallest possible infinite loop:

10$: BRB 10$

The Alpha (and I assume IA64) "idle loop" will actually do work for
you: maintaining a cache of zero filled pages to speed demand-zero
paging (VAXen simply use a MOVCx instruction to clear a page when
such a page fault occurs).

When the demand zero cache is full Alpha will actually go into a
tight loop, the code for which is similar but I never had an Alpha
listings kit to see it.
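
For what it's worth, a loop like 10$: BRB 10$ is easy for a simulator to spot by its encoding. The Python sketch below assumes the standard VAX encoding of BRB (opcode 11 hex followed by a signed byte displacement, so a branch to itself assembles as 11 FE) and a flat byte array standing in for guest memory. The function name is made up, and whether any particular VMS version still idles this way is something I have not checked.

```python
BRB_OPCODE = 0x11         # VAX BRB: branch with byte displacement
SELF_DISPLACEMENT = 0xFE  # -2 as a signed byte, i.e. back to the BRB itself


def is_branch_to_self(memory, pc):
    """Return True if the two bytes at `pc` encode a BRB that targets `pc`."""
    return (
        0 <= pc
        and pc + 1 < len(memory)
        and memory[pc] == BRB_OPCODE
        and memory[pc + 1] == SELF_DISPLACEMENT
    )


# Hypothetical guest memory: a NOP (01 hex) followed by the two-byte idle loop.
memory = bytes([0x01, 0x11, 0xFE])
print(is_branch_to_self(memory, 1))
print(is_branch_to_self(memory, 0))
```

A check like this only says that the CPU has reached a spin. Deciding what to do next (sleep, lower priority, wait for the next interrupt) is a separate question.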

Bob Koehler

Aug 26, 2005, 8:35:29 AM
In article <1125004973.6...@g14g2000cwa.googlegroups.com>, "Stanley F. Quayle" <st...@stanq.com> writes:

> Actually, there's already a way to create a kernel-code module to hook
> into the idle loop. CHARON-VAX comes with such a module, for V5.5 and
> V7.3. It uses an out-of-band (not part of the emulated VAX) mechanism
> to tell the emulator to sleep until a VAX interrupt comes along.

So it's already doing what we just discussed. In what release was
that added? I don't recall seeing it in the hobbyist Charon-VAX
(Pico-VAX) days.

Stanley F. Quayle

Aug 26, 2005, 11:00:19 AM
On 26 Aug 2005 at 7:35, Bob Koehler wrote:
> > Actually, there's already a way to create a kernel-code module to
> > hook into the idle loop. CHARON-VAX comes with such a module, for
> > V5.5 and V7.3. It uses an out-of-band (not part of the emulated
> > VAX) mechanism to tell the emulator to sleep until a VAX interrupt
> > comes along.
>
> So it's already doing what we just discussed. In what release was
> that added? I don't recall seeing it in the hobbyist Charon-VAX
> (Pico-VAX) days.

It's been in CHARON-VAX/XM, /XK, /XL since V3.0. The "hobbyist"
version of CHARON-VAX is CHARON-VAX/Industrial V0.0.

Installing the kernel patch is optional -- it's marked in the release
notes as not for critical production use...

--Stan Quayle
Quayle Consulting Inc.

----------
Stanley F. Quayle, P.E. N8SQ +1 614-868-1363
8572 North Spring Ct., Pickerington, OH 43147 USA
stan-at-stanq-dot-com http://www.stanq.com
"OpenVMS, when downtime is not an option"


Lee Witten

Aug 27, 2005, 12:07:07 AM
"Stanley F. Quayle" <st...@stanq.com> wrote in
news:1124848870.7...@z14g2000cwz.googlegroups.com:

> On 23 Aug 2005 at 23:21, Dan Foster wrote:
>> Just ask. It's called "Accelerated CPU Emulation". It caches
> frequent sequences of instructions and pre-compiles them. It's
> available in the "Plus" versions of the product.

Sounds like something I read about in the DTJ around 10 or so years ago...

> The biggest issue is knowing where the idle loop is in an operating
> system. Or even knowing what operating system is in use (we support
> VAXeln and Digital Unix, too).

One of my perverse hobbies was reading the source code for the idle loop
(called idle_thread()) in DU, from the very first drops from OSF/RI (back in
the MIPS PMAX days) through DU 5. Boy, did that code grow! At first it was
the simple spin loop that only broke when the cpu got an interrupt (either
due to I/O completion, timer expiry, or an inter-processor interrupt
triggered by another cpu). Then, various hackers added more and more garbage
to it. I imagine detecting what truly is the idle loop part of the idle
thread is very hard to do!

> Since CHARON-VAX is intended for
> server applications, most customers don't care that 1 processor is
> running 100%.

I agree.

--lw--

Lee Witten

unread,
Aug 27, 2005, 12:08:24 AM8/27/05
to
Dan Foster <use...@evilphb.org> wrote in
news:slrndgnifc...@zappy.catbert.org:

> Thanks for VUPS.COM.
>
> My system:
>
> 3.0 GHz Pentium 4 + 800 MHz FSB + 1 MB L2 cache + Hyperthreading
> ("Northwood")
> OS: Linux (2.6.12 kernel with SMP enabled for HT use)
> Load avg of 0.60 before running SIMH

This doesn't sound right. Maybe you should put the system into single user
mode to see what performance can be achieved on a more idle system?

Dan Foster

unread,
Aug 27, 2005, 12:39:00 AM8/27/05
to

I just did as you suggested. VUPS reported was 12.4 for all runs while
in single user mode with a host system load average of 0.02 or less
prior to starting SIMH.

-Dan

Dan Foster

unread,
Aug 27, 2005, 12:39:50 AM8/27/05
to
In article <Xns96BF13...@199.125.85.9>, Lee Witten <nos...@nospam.com> wrote:
> One of my perverse hobbies was reading the source code for the idle loop
> (called idle_thread()) in DU, from the very first drops from OSF/RI (back in
> the MIPS PMAX days) through DU 5. Boy, did that code grow! At first it was
> the simple spin loop that only broke when the cpu got an interrupt (either
> due to I/O completion, timer expiry, or an inter-processor interrupt
> triggered by another cpu). Then, various hackers added more and more garbage
> to it. I imagine detecting what truly is the idle loop part of the idle
> thread is very hard to do!

Just out of curiosity, what did they add to the idle loop for DU by v5?

-Dan

Dave Froble

unread,
Aug 27, 2005, 4:35:58 AM8/27/05
to

Actually, with some of the newer processors, with the capability of
cutting back on power consumption, heat, and such when there is no
demand for the processor, running the processor 100% isn't such a good idea.

Alan Greig

unread,
Aug 27, 2005, 6:45:07 AM8/27/05
to

Lee Witten wrote:

Why doesn't it sound right? 12.2 VUPs is in line with what I'd expect
from simh with a 3 GHz Pentium 4 - maybe a little more squeezed out
single user. What figures are you seeing? I get 7.8 (Windows XP gcc) out
of an AMD XP 1800+ compiling -O3 and -march=athlon. I get about 7.5 with
-O2 and no architecture tuning so let's use that as the base. Now 1800
+33% = 3000 (3 GHz Pentium equiv AMD rating) and 7.5 + 33% = 12.5. OK, a
little faster than 12.2 but in the same ballpark.

It's the gamers with their overclocking motherboards up at a 4000+ rating
who would be heading towards 20 VUPS. If you know a trick to increase
performance I'd love to know.

--
Alan Greig

Alan Greig

unread,
Aug 27, 2005, 6:49:06 AM8/27/05
to

Alan Greig wrote:

> -O2 and no architecture tuning so let's use that as the base. Now 1800
> +33% = 3000 (3 GHz Pentium equiv AMD rating) and 7.5 + 33% = 12.5. OK, a

Obviously I mean 66% above!!
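
The arithmetic with the corrected percentage can be checked with a few lines. This is only a sketch that assumes VUPS scale linearly with the AMD-style performance rating, which is the rough rule of thumb used in the estimate, not a measured relationship; `scale_vups` is a name invented here.

```python
def scale_vups(base_vups, base_rating, target_rating):
    """Estimate VUPS on another host by scaling linearly with its rating."""
    return base_vups * target_rating / base_rating


base = 7.5  # VUPS measured on the AMD XP 1800+ with -O2
estimate = scale_vups(base, 1800, 3000)
increase_pct = (3000 / 1800 - 1) * 100

print(f"rating increase: {increase_pct:.1f}%")  # about 66.7%
print(f"estimated VUPS: {estimate:.1f}")        # 12.5
```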


--
Alan Greig

Bill Gunshannon

unread,
Aug 27, 2005, 11:17:29 AM8/27/05
to
In article <11h0980...@corp.supernews.com>,

Dave Froble <da...@tsoft-inc.com> writes:
>
> Actually, with some of the newer processors, with the capability of
> cutting back on power consumption, heat, and such when there is no
> demand for the processor, running the processor 100% isn't such a good idea.

Why does this sound like a good emulation of the old Motorola instruction
HCF (Halt & Catch Fire)? :-)

bill

--
Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves
bi...@cs.scranton.edu | and a sheep voting on what's for dinner.
University of Scranton |
Scranton, Pennsylvania | #include <std.disclaimer.h>

Glenn Everhart

unread,
Aug 27, 2005, 11:29:55 AM8/27/05
to
Bill Gunshannon wrote:
> In article <11h0980...@corp.supernews.com>,
> Dave Froble <da...@tsoft-inc.com> writes:
>
>>Actually, with some of the newer processors, with the capability of
>>cutting back on power consumption, heat, and such when there is no
>>demand for the processor, running the processor 100% isn't such a good idea.
>
>
> Why does this sound like a good emulation of the old Motorola instruction
> HCF (Halt & Catch Fire)? :-)
>
> bill
>
How does SIMH do then on the PDP-11? The PDP-11 has the WAIT instruction, which at
least all the RSX versions (and I believe all DEC OSs) used for the idle loop.
You could tell on an 11/45 or 11/70 since the front panel would display R0 in the
lights during WAIT, and the idle loop used different rotating light patterns
for R0 so you could tell the machine had not lost its mind.

As I recall the VMS loop, it ends on a simple branch to self, which in principle
should be detectable regardless of anything else as an idle loop condition. (This
assumes my memory is right...) Even in a user mode program, if the processor is
executing a BRB . or BRW . it obviously isn't going to do much till it gets an
interrupt.
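
A rough sketch of what such a check could look like in an emulator follows. It assumes the usual VAX encodings (BRB is opcode 0x11 with a byte displacement, BRW is opcode 0x31 with a word displacement, both relative to the PC after the instruction has been fetched); the function name and the flat `bytes` memory model are made up for illustration and are not taken from SIMH.

```python
BRB = 0x11  # VAX opcode: branch with byte displacement
BRW = 0x31  # VAX opcode: branch with word displacement


def is_branch_to_self(mem, pc):
    """Return True if the instruction at pc is BRB . or BRW .

    The displacement is relative to the PC after the whole instruction
    has been fetched, so a branch to itself encodes -2 (BRB) or -3 (BRW).
    mem is assumed to be a bytes-like object holding guest memory.
    """
    opcode = mem[pc]
    if opcode == BRB:
        disp = int.from_bytes(mem[pc + 1:pc + 2], "little", signed=True)
        return disp == -2
    if opcode == BRW:
        disp = int.from_bytes(mem[pc + 1:pc + 3], "little", signed=True)
        return disp == -3
    return False
```

An emulator that sees this return True could stop executing and wait for the next interrupt. It only catches the literal branch-to-self, not a longer idle loop.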

Lee Witten

unread,
Aug 27, 2005, 11:00:59 PM8/27/05
to
Dan Foster <use...@evilphb.org> wrote in
news:slrndgvrkm...@zappy.catbert.org:
> Just out of curiosity, what did they add to the idle loop for DU by
> v5?

Lots of housekeeping things, like zeroing out pages that the system could
use in the future. This shouldn't bother me, but a lot of these things
should have been billed to the process needing the work, not the idle
thread. It also made the %idle statistics hard to interpret - was the
system really idle, or was it zeroing pages that the system really needed?
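
To make the accounting problem concrete, here is a small sketch with made-up tick counts (none of these numbers come from a real DU system, and `idle_breakdown` is a name invented here). It shows how the reported %idle overstates true idleness when housekeeping runs inside the idle thread.

```python
def idle_breakdown(idle_thread_ticks, housekeeping_ticks, total_ticks):
    """Return (reported %idle, truly idle %) for a sampling interval.

    housekeeping_ticks is the part of the idle thread's time spent on
    work such as zeroing pages rather than actually idling.
    """
    reported = 100 * idle_thread_ticks / total_ticks
    truly_idle = 100 * (idle_thread_ticks - housekeeping_ticks) / total_ticks
    return reported, truly_idle


# Hypothetical interval: 1000 ticks, 600 in the idle thread,
# of which 150 were spent zeroing pages.
print(idle_breakdown(600, 150, 1000))  # (60.0, 45.0)
```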

--lw--

Bob Koehler

unread,
Aug 29, 2005, 8:39:29 AM8/29/05
to
>
> One of my perverse hobbies was reading the source code for the idle loop
> (called idle_thread()) in DU, from the very first drops from OSF/RI (back in
> the MIPS PMAX days) through DU 5. Boy, did that code grow! At first it was
> the simple spin loop that only broke when the cpu got an interrupt (either
> due to I/O completion, timer expiry, or an inter-processor interrupt
> triggered by another cpu). Then, various hackers added more and more garbage
> to it. I imagine detecting what truly is the idle loop part of the idle
> thread is very hard to do!

   There was a published paper extending the known digits of pi,
   which was accomplished by patching the idle loop on a VAX
   11/780.  Steps had to be taken to snapshot the data so the
   calculation could survive reboots.

I think an early Alpha could reproduce the result in far less than
the mean time between boots. A pi calculator was included as an
example with the Macro-64 assembler.

Keith Parris

unread,
Aug 31, 2005, 10:03:42 AM8/31/05
to
Bob Koehler wrote:
> The VMS idle loop is architecture dependent. The last time I looked
> at the VAX listings it was simply the smallest possible infinite loop:
>
> 10$: BRB 10$

You must have looked prior to V5.0 with its introduction of SMP.

There is a branch-to-self line like this in EXCEPTION.EXE which is
executed at the end of shutdown right after VMS prints the "use console
to halt system" message. That instruction (branch-to-self) is probably
fairly easy for an emulator to detect.

With SMP, the idle loop on each CPU in a VAX essentially keeps checking
to try to find a process which that CPU can run.

There appear to be hooks for an architecture-specific EXE$PROC_IDLE
routine to be called, but I haven't been able to find anywhere in the
7.3 listings showing where that gets set or used.

S

unread,
Oct 4, 2005, 6:44:03 AM10/4/05
to
hea...@aracnet.com wrote:
> SIMH expects 100% of the CPU, I believe part of the issue is the knowing
> when the emulated system is actually idle. IIRC this has been solved under
> KLH10 and possibly modified copies of SIMH by modifying TOPS-20 itself.

It does not expect that much of the CPU: I just compiled the latest with
Visual C++ and optimizations, and on this machine (dual AthlonMP 2400+)
it eats only about 50% CPU time while giving about 10 VUPs (not
impressive at all). The load on the two CPUs is not the same (60 - 40%)
but I don't know how to check who's to blame. Maybe with gcc/mingw it
would go better, but I wasn't able yet to figure out how to get that
thing compiling anything.

S

Alan Greig

unread,
Oct 4, 2005, 3:25:50 PM10/4/05
to

S wrote:
>
>
> It does not expect that much of the CPU: I just compiled the latest with
> Visual C++ and optimizations, and on this machine (dual AthlonMP 2400+)
> it eats only about 50% CPU time while giving about 10 VUPs (not

That's about the right number of VUPS for an Athlon 2400+. The second
processor won't give you any speed boost unless it's competing with
something else for cpu.
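
One plausible reading of the 50% figure, sketched below: a single-threaded emulator can keep at most one CPU's worth of work busy, so on a dual-CPU box the overall number tops out near 50% even if the scheduler moves the thread between processors. The helper name and the sample figures are illustrative, not measurements.

```python
def overall_cpu_percent(per_cpu_busy):
    """Average the per-CPU busy percentages into the overall figure."""
    return sum(per_cpu_busy) / len(per_cpu_busy)


# A thread pinned to one CPU and a thread migrating between two CPUs
# both show up as the same overall load.
print(overall_cpu_percent([100, 0]))  # 50.0
print(overall_cpu_percent([60, 40]))  # 50.0
```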

> impressive at all). The load on the two CPUs is not the same (60 - 40%)
> but I don't know how to check who's to blame. Maybe with gcc/mingw it
> would go better, but I wasn't able yet to figure out how to get that
> thing compiling anything.

There are a couple of C header files in the wrong directories, which breaks
the supplied mingw build procedure. If you examine the make error output,
just look for the file it didn't find and move it to where the build was
looking for it. Repeat twice.

>
> S

--
Alan Greig
