
Some VMS/x86 perf test data from WASD maintainer.


Jan-Erik Söderholm

Apr 27, 2023, 5:24:46 AM
Hi.

Just saw this: https://wasd.vsm.com.au/info-WASD/2023/0077

> TL;DR obvious profile differences in at least KERNEL mode
> begs some questions about x86-64 VMS.
>
> This item is more about X86 VMS and less about WASD but
> still uses WASD as a part of the measurement process.


Volker Halle

Apr 27, 2023, 7:05:29 AM
Jan-Erik Söderholm wrote on Thursday, 27 April 2023 at 11:24:46 UTC+2:

After now also running a VSI OpenVMS x86-64 E9.2-1 system (using VMware Workstation Player 16) and installing the C compiler, I have tried to 'calibrate' the procedure to obtain the VUPS values for x86-64 using the PRIME_SIEVE.C tool (calculating prime numbers up to 10^10). The CPU_MULTIPLIER value for x86_64 should be set to 8. You can easily edit this procedure to add (at the right place):

$ IF F$GETSYI("ARCH_NAME").EQS."x86_64" THEN $ cpu_multiplier=8

The resulting VUPS values for x86-64 are now more realistic: 5900 VUPS on an i5-9600K CPU @3.7 GHz

On this OpenVMS x86_64 system, PRIME_SIEVE needs 19.34 seconds to calculate primes up to 10^10. On an ES45 Model 2B (EV68CB 1250 MHz) system it took 49.04 seconds; that system was measured at 2414 VUPS.

See previous discussion in: https://groups.google.com/g/comp.os.vms/c/ouhZ7C7bgoQ/m/_iEqNzm1AwAJ

PRIME_SIEVE was obtained from http://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html

Volker.

Volker Halle

Apr 27, 2023, 7:08:57 AM
Jan-Erik Söderholm wrote on Thursday, 27 April 2023 at 11:24:46 UTC+2:

> > TL;DR obvious profile differences in at least KERNEL mode
> > begs some questions about x86-64 VMS.

To start looking at the kernel mode usage - if this is already the time to do this - one would start with the PCS$SDA extension (PC sampling).

Volker.

Simon Clubley

Apr 27, 2023, 9:02:27 AM
I couldn't see in the report whether the underlying x86-64 CPU running
the hypervisor was Intel or AMD.

Running these tests under the hypervisor on both Intel and AMD hardware
would be interesting.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

Mark Daniel

Apr 27, 2023, 9:11:28 AM
On 27/4/2023 10:32 pm, Simon Clubley wrote:
> On 2023-04-27, Jan-Erik Söderholm <jan-erik....@telia.com> wrote:
>> Hi.
>>
>> Just saw this: https://wasd.vsm.com.au/info-WASD/2023/0077
>>
>>> TL;DR obvious profile differences in at least KERNEL mode
>>> begs some questions about x86-64 VMS.
>>>
>>> This item is more about X86 VMS and less about WASD but
>>> still uses WASD as a part of the measurement process.
>>
>
> I couldn't see in the report whether the underlying x86-64 CPU running
> the hypervisor was Intel or AMD.

Processor Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60 GHz
Installed RAM 16.0 GB

> Running these tests under the hypervisor on both Intel and AMD hardware
> would be interesting.
>
> Simon.
>

--
Anyone, who using social-media, forms an opinion regarding anything
other than the relative cuteness of this or that puppy-dog, needs
seriously to examine their critical thinking.

Mark Daniel

Apr 27, 2023, 9:31:26 AM
On 27/4/2023 8:35 pm, Volker Halle wrote:
> Jan-Erik Söderholm wrote on Thursday, 27 April 2023 at 11:24:46 UTC+2:
>
> After now also running a VSI OpenVMS x86-64 E9.2-1 system (using VMware Workstation Player 16) and installing the C compiler, I have tried to 'calibrate' the procedure to obtain the VUPS values for x86-64 using the PRIME_SIEVE.C tool (calculating prime numbers up to 10^10). The CPU_MULTIPLIER value for x86_64 should be set to 8 - you can easily edit this procedure to add (at the right place): $ IF F$GETSYI("ARCH_NAME").EQS."x86_64" THEN $ cpu_multiplier=8

With the VUPS.COM procedure modified to

|$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
|$ if f$getsyi("arch_name").eqs."x86_64" then $ cpu_multiplier=8

the resultant

|X86VMS$ @vups.com
|innotek GmbH VirtualBox with 2 CPU and 7574MB running VMS V9.2
|Approximate System VUPs Rating : 225.6 ( min: 224.4 max: 226.2 )

down from 272 "VUPs" reported previously, on a commodity DELL SFF with
Windows 10 Pro Version 22H2

Processor Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60 GHz
Installed RAM 16.0 GB

> The resulting VUPS values for x86-64 are now more realistic: 5900 VUPS on an i5-9600K CPU @3.7 GHz
>
> On this OpenVMS x86_64 system, PRIME_SIEVE needs 19.34 sec to calculate primes up to 10^10. On an ES45 Model 2B (EV68CB 1250 MHz) system it took 49.04 seconds, that system was measured at 2414 VUPS.
>
> See previous discussion in: https://groups.google.com/g/comp.os.vms/c/ouhZ7C7bgoQ/m/_iEqNzm1AwAJ
>
> PRIME_SIEVE was obtained from http://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html
>
> Volker.

fswo...@gmail.com

Apr 27, 2023, 10:15:06 AM
to comp.os.vms to email gateway
Here is the result for an i7-8700K processor (2 CPUs and 8 GB memory allocated on VirtualBox under Windows 11 Pro): VUPS 420.0

Tim
_______________________________________________
Info-vax mailing list
Info...@rbnsn.com
http://rbnsn.com/mailman/listinfo/info-vax_rbnsn.com


Volker Halle

Apr 27, 2023, 10:43:24 AM
Mark Daniel wrote on Thursday, 27 April 2023 at 15:31:26 UTC+2:

> |$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
> |$ if f$getsyi("arch_name").eqs."x86_64" then $ cpu_multiplier=8

Mark,

we seem to be using different versions of the 'VUPS.COM' procedure ;-(
You seem to be using the one from FreeAXP, I'm using the one from Stromasys.

And therefore we cannot compare the results !

How about trying the PRIME_SIEVE.C example?

Volker.

Simon Clubley

Apr 27, 2023, 1:31:05 PM
On 2023-04-27, Mark Daniel <mark....@wasd.vsm.com.au> wrote:
> On 27/4/2023 10:32 pm, Simon Clubley wrote:
>> On 2023-04-27, Jan-Erik Söderholm <jan-erik....@telia.com> wrote:
>>> Hi.
>>>
>>> Just saw this: https://wasd.vsm.com.au/info-WASD/2023/0077
>>>
>>>> TL;DR obvious profile differences in at least KERNEL mode
>>>> begs some questions about x86-64 VMS.
>>>>
>>>> This item is more about X86 VMS and less about WASD but
>>>> still uses WASD as a part of the measurement process.
>>>
>>
>> I couldn't see in the report whether the underlying x86-64 CPU running
>> the hypervisor was Intel or AMD.
>
> Processor Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60 GHz
> Installed RAM 16.0 GB
>

Thank you Mark.

If those are the kernel figures for an Intel box, I would love to see
what they are for an AMD box...

Simon Clubley

Apr 27, 2023, 1:37:58 PM
I've mentioned this before (and got some negative feedback for it
as everyone is focused on the CPU, not I/O), but I think some tests
that measure I/O performance using RMS (which will force a range of
mode transitions including through executive mode) may be interesting.

To make it a fair comparison, the I/O could be to a RAM disk so that
VMS overheads only are being compared.

Volker Halle

Apr 27, 2023, 1:49:13 PM
If you're wondering about kernel mode usage, try this:

$ ANA/SYS
SDA> READ/EXEC ! get better symbolization (routine names)
SDA> LOAD PCS ! PC sampling SDA Execlet
SDA> START TRACE
...
SDA> STOP TRACE
SDA> SHOW TRACE/STAT/MODE=KERNEL
...
SDA> UNLOAD PCS
SDA> EXIT

SHOW TRACE/STAT/MODE=KERNEL will show the PC values seen most often during the trace period (sorted by decreasing number of occurrences) and will symbolize the PC values as routine names or execlet names. This might give you an idea of which kernel mode code is running, and how often.

Volker.

Paul Gavin

Apr 27, 2023, 1:51:27 PM
If the system (servers, storage, connection infrastructure, etc.) cannot do I/O at a reasonable rate, what good is it?
Coming from a manufacturing background, I saw that a system's I/O rate was the key to how many widgets could be made.

Mark Daniel

Apr 27, 2023, 1:58:40 PM
Hi Volker.

I am completely unsure of the provenance of the (calculate_)VUPS.COM I use
except it seems to have been around forever. I have placed what I have
been using at

https://wasd.vsm.com.au/other/vups.com

I searched for a stromasys.com version but was unable to locate it
either through the internet engine I use or via the Stromasys support page

https://www.stromasys.com/tools-resources

(though didn't try too hard).

I really didn't want another round of V9.2 metrics as with late last
year and only quoted the "VUPs" as indicative. The point of this
exercise was the very obvious difference in KERNEL mode consumption with
a very similar and diverse workload enabled via the ZAP+WASD crawl.

I was intending to ask a question in the VSI fora, pointing to these
results but Jan-Erik beat me to the punch in c.o.v. Will still do so.

David Jones

Apr 27, 2023, 3:06:40 PM
I use the old Byte Magazine benchmark from the 1990s (bytemark, normalized to a 90 MHz Pentium). Running it on
my laptop under VirtualBox with an Intel i5-1235U gets an integer score 6-7 times higher than my DS10L/466 Alpha.
Interestingly, the floating point score is barely higher. Running the posted sieve code on both, the laptop is 11 times
faster.

At some point, I'll install Visual Studio on the laptop and see what difference the compiler makes. The current VSI
compiler is doing some optimization (unlike the cross compiler), as building with /opt=level=0 does slow it down.

Single Stage to Orbit

Apr 27, 2023, 5:29:04 PM
On Thu, 2023-04-27 at 17:31 +0000, Simon Clubley wrote:
> If those are the kernel figures for an Intel box, I would love to see
> what they are for an AMD box...

Never mind, I see someone else has already posted a link, thanks.
--
Tactical Nuclear Kittens

Single Stage to Orbit

Apr 27, 2023, 5:29:04 PM
On Thu, 2023-04-27 at 17:31 +0000, Simon Clubley wrote:
> If those are the kernel figures for an Intel box, I would love to see
> what they are for an AMD box...

Point me in the direction of a copy of VUPS.COM and I can run these
benchmarks.
--
Tactical Nuclear Kittens

Jan-Erik Söderholm

Apr 27, 2023, 5:30:54 PM
On 2023-04-27 at 19:58, Mark Daniel wrote:

> I really didn't want another round of V9.2 metrics as with late last year
> and only quoted the "VUPs" as indicative.  The point of this exercise was
> the very obvious difference in KERNEL mode consumption with a very similar
> and diverse workload enabled via the ZAP+WASD crawl.
>
> I was intending to ask a question in the VSI fora, pointing to these
> results but Jan-Erik beat me to the punch in c.o.v.  Will still do so.
>

Sorry. It was public on both the WASD site and on the WASD mailing list.
I did think about it for a while, but decided that it was interesting enough.


Mark Daniel

Apr 27, 2023, 6:02:23 PM
Apology absolutely, completely, unnecessary Jan-Erik.

And I certainly think it is interesting enough for the x86-64 VMS world.

Mark Daniel

Apr 27, 2023, 6:06:17 PM
On 28/4/2023 7:00 am, Jan-Erik Söderholm wrote:
Oh, and question asked:

https://forum.vmssoftware.com/viewtopic.php?f=21&t=8654

Single Stage to Orbit

Apr 27, 2023, 8:02:31 PM
On Thu, 2023-04-27 at 17:31 +0000, Simon Clubley wrote:
> If those are the kernel figures for an Intel box, I would love to see
> what they are for an AMD box...

Using the VUPS.COM off that WASD link given earlier today:

$ @vups.com
innotek GmbH VirtualBox with 8 CPU and 16128MB running VMS E9.2-1
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 482.0 ( min: 482.0 max: 482.0 )

HTH.
--
Tactical Nuclear Kittens

Arne Vajhøj

Apr 27, 2023, 8:09:57 PM
On 4/27/2023 3:06 PM, David Jones wrote:
> I use the old Byte Magazine benchmark from the 1990s (bytemark, normalized to a 90Mhz Pentium). Running in on
> my laptop under VirtualBox with an Intel i5-1235U gets an integer score 6-7 times higher than my DS10L/466 Alpha.
> Interestingly, the floating point score is barely higher. Running the posted sieve code on both, the laptop is 11 times
> faster.

I use the code below which I not only have in C but also in
other languages.

Arne

native_test.c
=============

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#include "high_res_timer.h"

void printres(TIMECOUNT_T t1, TIMECOUNT_T t2, int n1, int n2, char *ops)
{
    double xperf;
    xperf = (double)n1 * (double)n2 / ((t2 - t1) * 1.0 / UNITS_PER_SECOND);
    printf("%.2f million %s per second\n", xperf / 1000000, ops);
}

#define NINT 10000
#define NFP 1000
#define NSTR 100
#define N 1000000

void testint(int scale)
{
    int i, j;
    int nintscale, sum;
    TIMECOUNT_T t1, t2;
    nintscale = NINT / scale;
    t1 = GET_TIMECOUNT;
    for(i = 0; i < nintscale; i++)
    {
        sum = i;
        for(j = 0; j < N; j++)
        {
            sum = ((sum + 1) * 2 + 1) / 2;
        }
        if(sum != (i + N))
        {
            printf("Integer test error\n");
            exit(0);
        }
    }
    t2 = GET_TIMECOUNT;
    printres(t1, t2, nintscale, N, "integer operations");
}

void testfp(int scale)
{
    int i, j;
    int nfpscale;
    double sum;
    TIMECOUNT_T t1, t2;
    nfpscale = NFP / scale;
    t1 = GET_TIMECOUNT;
    for(i = 0; i < nfpscale; i++)
    {
        sum = i;
        for(j = 0; j < N; j++)
        {
            sum = ((sum + 1) * 2 + 1) / 2;
        }
        if(fabs(sum - (i + 1.5 * N)) > 1)
        {
            printf("Floating point test error\n");
            exit(0);
        }
    }
    t2 = GET_TIMECOUNT;
    printres(t1, t2, nfpscale, N, "floating point operations");
}

#define ALFA "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

void teststr(int scale)
{
    int i, j;
    int ix, ix1, ix2;
    int nstrscale;
    char s[1000], buf[N+1];
    TIMECOUNT_T t1, t2;
    nstrscale = NSTR / scale;
    t1 = GET_TIMECOUNT;
    for(i = 0; i < nstrscale; i++)
    {
        strcpy(buf, "");
        for(j = 0; j < N; j = j + 10)
        {
            strcpy(s, ALFA);
            strcat(s, ALFA);
            ix = (i + j) % strlen(ALFA);
            strncat(buf + j, s + ix, 1);
            strncat(buf + j, s + ix + 1, 2);
            strncat(buf + j, s + ix + 3, 3);
            strncat(buf + j, s + ix + 6, 4);
        }
        ix1 = N / 3;
        ix2 = 2 * N / 3;
        if(strlen(buf) != N || buf[ix1] != ALFA[(i + ix1) % strlen(ALFA)] ||
           buf[ix2] != ALFA[(i + ix2) % strlen(ALFA)])
        {
            printf("String test error\n");
            exit(0);
        }
    }
    t2 = GET_TIMECOUNT;
    printres(t1, t2, nstrscale, N / 10, "string operations");
}

#define REP 10

int main(int argc, char *argv[])
{
    int i;
    int scale;
    printf("%d bit\n", (int)(sizeof(char *) * 8));
    if(argc > 1)
    {
        scale = atoi(argv[1]);
    }
    else
    {
        scale = 1;
    }
    for(i = 0; i < REP; i++)
    {
        testint(scale);
    }
    for(i = 0; i < REP; i++)
    {
        testfp(scale);
    }
    for(i = 0; i < REP; i++)
    {
        teststr(scale);
    }
    return 0;
}

high_res_timer.h
================

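#if defined(__vms) && (defined(__alpha) || defined(__ia64))
/* Note: as written, this guard covers Alpha and Itanium only; an x86-64 VMS
   build needs that architecture's predefined macro added here. */
#include <starlet.h>

static long long int vms_get_timecount(void)
{
    long int res;
    long long int t;
    res = sys$gettim(&t);
    /* an odd condition value (low bit set) means success on VMS */
    return ((res & 1) == 1) ? t : 0;
}

#define TIMECOUNT_T long long int
#define GET_TIMECOUNT vms_get_timecount()
#define UNITS_PER_SECOND 10000000
#endif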

#ifdef __unix
#include <time.h>

static long long int unix_get_timecount(void)
{
    int res;
    struct timespec t;
    res = clock_gettime(CLOCK_MONOTONIC, &t);
    return (res == 0) ? (t.tv_sec * 1000000000LL + t.tv_nsec) : 0;
}

#define TIMECOUNT_T long long int
#define GET_TIMECOUNT unix_get_timecount()
#define UNITS_PER_SECOND 1000000000
#endif

#ifdef _WIN32
#include <windows.h>

static long long int win32_get_timecount(void)
{
    BOOL res;
    LARGE_INTEGER t;
    res = QueryPerformanceCounter(&t);
    return res ? t.QuadPart : 0;
}

static long long int win32_units_per_second(void)
{
    BOOL res;
    LARGE_INTEGER t;
    res = QueryPerformanceFrequency(&t);
    return res ? t.QuadPart : 0;
}

#define TIMECOUNT_T long long int
#define GET_TIMECOUNT win32_get_timecount()
#define UNITS_PER_SECOND win32_units_per_second()
#endif

#ifndef TIMECOUNT_T
#error "TIMECOUNT_T not defined"
#endif

#ifndef GET_TIMECOUNT
#error "GET_TIMECOUNT not defined"
#endif

#ifndef UNITS_PER_SECOND
#error "UNITS_PER_SECOND not defined"
#endif





fswo...@gmail.com

Apr 27, 2023, 8:25:05 PM
to alex....@munted.eu, comp.os.vms to email gateway
Ok, I got updated VUPS.COM from WASD.

Here are results for i7-8700K processor (12 logical cores):

$ @vups2
innotek GmbH VirtualBox with 2 CPU and 7936MB running VMS E9.2-1
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 414.8 ( min: 408.4 max: 418.0 )
$

-----Original Message-----
From: Info-vax <info-vax...@rbnsn.com> On Behalf Of Single Stage to
Orbit via Info-vax
Sent: Thursday, April 27, 2023 7:56 PM
To: info...@rbnsn.com
Cc: Single Stage to Orbit <alex....@munted.eu>
Subject: Re: [Info-vax] Some VMS/x86 perf test data from WASD maintainer.

John H Reinhardt

Apr 28, 2023, 12:26:19 AM
Also using the VUPS from WASD:

$ @wasd_vups
VMware, Inc. VMware7,1 with 4 CPU and 12028MB running VMS E9.2-1
INFO: Preventing endless loop (10$) on fast CPUs
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 693.1 ( min: 658.2 max: 705.6 )

The VM Host is an Intel NUC 9 Extreme NUC9i9QNX with 8 CPUs x Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz running ESXi V7.0u3 and 32GB memory

The OpenVMS VM is 4 vCPU with 12GB of memory.

John H. Reinhardt

Volker Halle

Apr 28, 2023, 2:41:44 AM
Mark,

there are at least 2 variants of the 'VUPS' procedure. And some of these variants have my name in them.

Stromasys variant: I have been using this variant for about 19 years in CHARON-VAX and CHARON-AXP migration projects. This procedure was supposed to report the 'correct VUPS' values as documented for the older real VAX and Alpha systems.

VUPS numbers for 'real systems' were published by DEC, but most of these documents have disappeared from the web.

Here are some examples: http://www.vaxmacro.de/vvcc.html

I had to modify the Stromasys VUPS procedure (called SRI_VUPS.COM at that time, SRI = Software Resources International, which later became Stromasys), because it would enter an endless loop on the faster Alphas (> 1 GHz). This modification found its way into the other variants below.

FreeAXP variant: http://www.migrationspecialties.com/Downloads/VUPS.txt
EmuVM variant: https://emuvm.com/download/vups-com-benchmark/

The above 2 are strongly related (as can be seen by this line: $ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40). And there is no code in these procedures to check the architecture they're running on and adapt the CPU_MULTIPLIER ! So the VUPS results produced can not be directly compared between the Stromasys variant and the other two - although there should be a linear relation between them. The DCL code in the 'inner loop' is the same for all these variants. Pure integer performance on one single CPU.

The idea of this procedure was the ability to compare 'VUPS' values between hardware systems and their emulated counterparts: a simple DCL procedure that could be mailed and run on the target system, with no compiling required. When OpenVMS on I64 came around, I was interested in also calculating the 'VUPS' values for those systems. I used the runtime of the PRIME_SIEVE program to 'calibrate' the CPU_MULTIPLIER value in the SRI_VUPS.COM variant, so that it would report VUPS values that can be compared across VAX, Alpha, I64 and now x86-64 CPUs under OpenVMS.

So whenever someone provides VUPS values, they also need to document which procedure was used to collect them, as John has done by calling your procedure WASD_VUPS.COM.

We can continue the VUPS discussion here, but should discuss the KERNEL mode question in the VSI forum.

Volker.

Simon Clubley

Apr 28, 2023, 8:08:03 AM
On 2023-04-27, Mark Daniel <mark....@wasd.vsm.com.au> wrote:
>
> And I certainly think it is interesting enough for the x86-64 VMS world.
>

It most certainly is Mark. I was expecting to see much higher overheads
for kernel mode because of the emulation overheads, but the kind of numbers
you have posted I was expecting to see on AMD boxes, not on Intel boxes.

That's why I am now interested in what the kernel mode overhead numbers
actually look like on AMD boxes.

Simon Clubley

Apr 28, 2023, 8:19:33 AM
On 2023-04-27, Single Stage to Orbit <alex....@munted.eu> wrote:
> On Thu, 2023-04-27 at 17:31 +0000, Simon Clubley wrote:
>> If those are the kernel figures for an Intel box, I would love to see
>> what they are for an AMD box...
>
> Point me in the direction of a copy of VUPS.COM and I can run these
> benchmarks.

VUPS.COM is interesting, but it isn't the thing that's showing up
the extra kernel mode overheads for real-world activities.

Mark's numbers shown here:

https://wasd.vsm.com.au/info-WASD/2023/0077

with real-world I/O activity, are showing major increases in the percentage
of kernel mode overhead time consumed.

I would be interested in seeing what VSI have to say about Mark's numbers.

We need to be focusing less on artificial tests such as VUPS.COM and more
on the real-world tests like the ones that Mark is running.

Simon Clubley

Apr 28, 2023, 8:22:29 AM
On 2023-04-28, Volker Halle <volker...@hotmail.com> wrote:
>
> We can continue the VUPS discussion here, but should discuss the KERNEL mode question in the VSI forum.
>

Why ? The Kernel mode overhead is the real issue that needs discussing,
not some artificial CPU speed test.

Mark Daniel

Apr 28, 2023, 10:33:49 AM
This is slightly off the topic of KERNEL mode consumption on X86 but ...

I have put all three into a ZIP archive at

https://wasd.vsm.com.au/other/vups_230428a.zip
https://wasd.vsm.com.au/other/vups_230428a.txt

for anyone who would like to experiment.

The comparative results across a number of platforms I have access to
are as follows (extraneous white-space removed).

|$ @wasd_vups.com
|HP rx2660 (1.40GHz/6.0MB) with 4 CPU and 14335MB running VMS V8.4-2L3
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 469.0 ( min: 465.8 max: 470.6 )
|$ @migs_vups.com
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 456.5 ( min: 453 max: 458 )
|$ @emuvm_vups.com
|Approximate System VUPs Rating : 458.0 ( min: 458.0 max: 458.0 )

|$ @wasd_vups.com
|Digital Personal WorkStation with 1 CPU and 1536MB running VMS V8.4-2L1
|Approximate System VUPs Rating : 151.4 ( min: 151.0 max: 151.8 )
|$ @migs_vups.com
|Approximate System VUPs Rating : 144.5 ( min: 144 max: 144 )
|$ @emuvm_vups.com
|Approximate System VUPs Rating : 145.0 ( min: 144.8 max: 145.4 )

|! EISNER
|$ @WASD_VUPS.COM
|AlphaServer DS20 500 MHz with 2 CPU and 1536MB running VMS V8.4-2L2
|Approximate System VUPs Rating : 243.5 ( min: 238.6 max: 247.6 )
|$ @migs_VUPS.COM
|Approximate System VUPs Rating : 236.5 ( min: 234 max: 239 )
|$ @emuvm_VUPS.COM
|Approximate System VUPs Rating : 239.5 ( min: 238.2 max: 241.8 )

|! Dell Optiplex 9020 4 core i7 3.4Ghz 16GB Win 10 Pro 22H2
|$ @wasd_vups.com
|innotek GmbH VirtualBox with 2 CPU and 7574MB running VMS V9.2
|Approximate System VUPs Rating : 282.2 ( min: 281.4 max: 282.6 )
|$ @migs_vups.com
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 275.3 ( min: 273 max: 276 )
|$ @emuvm_vups.com
|Approximate System VUPs Rating : 275.8 ( min: 275.8 max: 275.8 )

|! BXNUC10i7FNH4 6 core i7 1.10GHz 32GB
|VSMX86$ @wasd_vups.com
|innotek GmbH VirtualBox with 2 CPU and 7936MB running VMS E9.2-1
|Approximate System VUPs Rating : 645.4 ( min: 642.4 max: 647.0 )
|$ @migs_vups.com
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 641.2 ( min: 641 max: 641 )
|$ @emuvm_vups.com
|Approximate System VUPs Rating : 636.3 ( min: 635.4 max: 638.2 )

Observations ... the VUPS.COM numbers all seem comparable between
versions of the procedure. The WASD_VUPS.COM numbers are perhaps marginally
higher than the other two.

When the adjustment for X86 per Volker's post is uncommented

|$!!! if f$getsyi("arch_name").eqs."x86_64" then $ cpu_multiplier=8

the numbers drop by some 20%.

|! Dell Optiplex 9020 4 core i7 3.4Ghz 16GB Win 10 Pro 22H2
|$ @wasd_vups.com
|innotek GmbH VirtualBox with 2 CPU and 7574MB running VMS V9.2
|Approximate System VUPs Rating : 225.6 ( min: 225.6 max: 225.6 )

|! BXNUC10i7FNH4 6 core i7 1.10GHz 32GB
|$ @wasd_vups.com
|innotek GmbH VirtualBox with 2 CPU and 7936MB running VMS E9.2-1
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 517.8 ( min: 515.8 max: 518.8 )

And now back to the main show, KERNEL mode numbers

https://wasd.vsm.com.au/info-WASD/2023/0077

Mark Berryman

Apr 28, 2023, 1:41:23 PM
On 4/28/23 8:33 AM, Mark Daniel wrote:
> .
> .
> .
> This is slightly off the topic of KERNEL mode consumption on X86 but ...
>
> I have put all three into a ZIP archive at
>
>    https://wasd.vsm.com.au/other/vups_230428a.zip
>    https://wasd.vsm.com.au/other/vups_230428a.txt
>
> for anyone who would like to experiment.
>
> The comparative results across a number of platforms I have access to
> are as follows (extraneous white-space removed).
> .
> .
> .

Some results from my own systems:

|$ @wasd_vups
|AlphaServer DS15 with 1 CPU and 2048MB running VMS V8.4-2L2
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 468.7 ( min: 467.4 max: 470.6 )
|
|$ @migs_vups
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 451.2 ( min: 448 max: 452 )
|
|$ @emuvm_vups
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 452.0 ( min: 452.0 max: 452.0 )
|
|$ run bogomips_axp
|AlphaServer DS15 1 CPU 2048MB V8.4-2L2
|Calibrating delay loop.. ok - 994.20 BogoMips

|$ @wasd_vups
|HP rx2620 (1.40GHz/6.0MB) with 4 CPU and 6143MB running VMS V8.4-2L3
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 488.1 ( min: 487.2 max: 488.8 )
|
|$ @migs_vups
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 473.2 ( min: 472 max: 473 )
|
|$ @emuvm_vups
|INFO: Preventing endless loop (10$) on fast CPUs
|Approximate System VUPs Rating : 473.8 ( min: 473.8 max: 473.8 )
|
|$ run bogomips_i64
|HP rx2620 (1.40GHz/6.0MB) 4 CPUs 6143MB V8.4-2L3
|Calibrating delay loop.. ok - 692.73 BogoMips

This next system is a virtual guest of an ESXi host.
The host is a DL380 Gen9 with Intel E5-2660 v3 CPUs at 2.60GHz
|$ @wasd_vups
|VMware, Inc. VMware7,1 with 4 CPU and 16124MB running VMS E9.2-1
|Approximate System VUPs Rating : 509.6 ( min: 496.6 max: 516.2 )
|
|$ @migs_vups
|Approximate System VUPs Rating : 504.2 ( min: 492 max: 508 )
|
|$ @emuvm_vups
|Approximate System VUPs Rating : 503.8 ( min: 486.2 max: 510.4 )
|
|$ run bogomips_x86
|VMware, Inc. VMware7,1 4 CPUs 16124MB E9.2-1
|Calibrating delay loop.. ok - 909.95 BogoMips

It was necessary to compile bogomips /noopt on x86. Otherwise, the
delay loop was optimized away. (For fairness, /noopt was used on the
other platforms as well).

If I knew what arguments to give to the prime_sieve program, I would
have included that as well.

Mark Berryman

Single Stage to Orbit

Apr 28, 2023, 2:02:31 PM
On Thu, 2023-04-27 at 20:19 -0400, fswo...@gmail.com wrote:
> $ @vups2
> innotek GmbH VirtualBox with 2 CPU and 7936MB running VMS E9.2-1
> INFO: Preventing endless loop (10$) on fast CPUs
>  
> Approximate System VUPs Rating : 414.8 ( min: 408.4 max: 418.0 )

Where did you get your vups2 from?

I neglected to mention what my system is: AMD Threadripper 2970wx (48
logical cores) with 128GB ECC RAM. VM runs on a spinning rust RAID
array, although plans are afoot to replace that with NVME RAID.
--
Tactical Nuclear Kittens

Single Stage to Orbit

Apr 28, 2023, 2:02:31 PM
On Thu, 2023-04-27 at 23:26 -0500, John H Reinhardt wrote:
> Approximate System VUPs Rating : 693.1 ( min: 658.2 max: 705.6 )

That is _quick_!
--
Tactical Nuclear Kittens

Volker Halle

Apr 28, 2023, 2:03:06 PM
Mark Berryman wrote on Friday, 28 April 2023 at 19:41:23 UTC+2:

> If I knew what arguments to give to the prime_sieve program, I would
> have included that as well.

Mark,

I typically run PRIME_SIEVE to calculate prime numbers between 1 and 10^10, so just give it 10000000000

$ ps :==$dev:<dir>prime_sieve.exe
$ ps 10000000000
# {1 <= primes <= 10000000000} = 455052511

and report the F$GETJPI("","CPUTIM") difference for the execution.

Volker.

Mark Berryman

Apr 28, 2023, 4:20:21 PM
Ah. In that case:

Alpha: 7577
IA64: 6813
x86_64: 3306

Mark Berryman

Jan-Erik Söderholm

Apr 28, 2023, 6:14:41 PM
On 2023-04-28 at 14:22, Simon Clubley wrote:
> On 2023-04-28, Volker Halle <volker...@hotmail.com> wrote:
>>
>> We can continue the VUPS discussion here, but should discuss the KERNEL mode question in the VSI forum.
>>
>
> Why ? The Kernel mode overhead is the real issue that needs discussing,
> not some artificial CPU speed test.
>
> Simon.
>

Why? That should be very obvious.

There is already an ongoing discussion over at the Forum about this.
The posts and the code posted there are much easier to read.
Much better thread history.

If the kernel overhead is viewed as "the real problem", it is better
discussed on the Forum, not here...

terry-...@glaver.org

Apr 28, 2023, 11:07:46 PM
On Friday, April 28, 2023 at 10:33:49 AM UTC-4, Mark Daniel wrote:
> I have put all three into a ZIP archive at
>
> https://wasd.vsm.com.au/other/vups_230428a.zip
> https://wasd.vsm.com.au/other/vups_230428a.txt
>
> for anyone who would like to experiment.

This is on an EmuVM Alpha with a FreeBSD 12.4 host OS with dual E5-2643 v3 CPUs:

SERVER::$ unzip vups_230428a.zip
Archive: SYS$SYSDEVICE:[SYSPROG.TERRY]VUPS_230428A.ZIP;1
inflating: emuvm_vups.com
inflating: migs_vups.com
inflating: wasd_vups.com
SERVER::$ @emuvm_vups

Approximate System VUPs Rating : 825.4 ( min: 774.6 max: 845.8 )
SERVER::$ @migs_vups
INFO: Preventing endless loop (10$) on fast CPUs
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 832.5 ( min: 799 max: 850 )
SERVER::$ @wasd_vups
AlphaServer DS10 616 MHz with 1 CPU and 2048MB running VMS V8.4-2L1
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 889.2 ( min: 811.0 max: 925.2 )

John H Reinhardt

Apr 29, 2023, 12:07:16 AM
By comparison, your DS10 is a hot rod vs my two. I have a "stock" DS10 617MHz and another that is a DS10L 617MHz mainboard I transplanted into a regular DS10 chassis. Both VUP out below half of your figure.

Stock DS10:

$ @wasd_vups
AlphaServer DS10 617 MHz with 1 CPU and 2048MB running VMS V8.4-2L1
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 331.4 ( min: 329.4 max: 332.8 )

Upgraded DS10:
$ @wasd_vups
AlphaServer DS10L 617 MHz with 1 CPU and 2048MB running VMS V8.4-2L1
INFO: Preventing endless loop (10$) on fast CPUs

Approximate System VUPs Rating : 325.8 ( min: 324.6 max: 327.0 )


I wonder what the difference is.

John H. Reinhardt

Volker Halle

Apr 29, 2023, 12:15:30 AM
Volker Halle wrote on Thursday, 27 April 2023 at 19:49:13 UTC+2:

Here are the correct commands for the PCS$SDA PC sampling SDA extension - sorry about that ;-(

$ ANA/SYS
SDA> READ/EXEC ! get better symbolization (routine names)
SDA> PCS LOAD ! PC sampling SDA Execlet
SDA> PCS START TRACE
...
SDA> PCS STOP TRACE
SDA> PCS SHOW TRACE/STAT/MODE=KERNEL
...
SDA> PCS UNLOAD
SDA> EXIT

SDA> PCS SHOW TRACE/STAT/MODE=KERNEL will show the PC values seen most often during the trace period (sorted by decreasing number of occurrences) and will symbolize the PC values as routine names or execlet names. This might give you an idea of which kernel mode code is running, and how often.

Volker.

terry-...@glaver.org

Apr 29, 2023, 1:26:14 AM
On Saturday, April 29, 2023 at 12:07:16 AM UTC-4, John H Reinhardt wrote:
> By comparison, your DS10 is a hot rod vs my two. I have a "stock" DS10 617MHz and another that is a DS10L 617MHz mainboard I transplanted into a regular DS10 chassis. Both VUP out below half of your figure.
>
> I wonder what the difference is.

Mine is emulated via AlphaVM on a Dell Poweredge R730.

It would be interesting to run the same tests on native X86-64 VMS on the same hardware.

John H Reinhardt

Apr 29, 2023, 2:18:59 AM
On 4/29/2023 12:26 AM, terry-...@glaver.org wrote:
> On Saturday, April 29, 2023 at 12:07:16 AM UTC-4, John H Reinhardt wrote:
>> By comparison, your DS10 is a hot rod vs my two. I have a "stock" DS10 617MHz and another that is a DS10L 617MHz mainboard I transplanted into a regular DS10 chassis. Both VUP out below half of your figure.
>>
>> I wonder what the difference is.
>
> Mine is emulated via AlphaVM on a Dell Poweredge R730.
>
Doh! I missed that.

> It would be interesting to run the same tests on native X86-64 VMS on the same hardware.

Yes it would

Volker Halle

Apr 29, 2023, 12:07:51 PM
Running the various VUPS procedures on my VSI OpenVMS x86-64 E9.2-1 system on VMware Workstation Player 16 on an i5-9600K CPU @3.7 GHz:

$ @wasd_vups.com
VMware, Inc. VMware7,1 with 2 CPU and 3836MB running VMS E9.2-1
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 757.6 ( min: 757.6 max: 757.6 )

$ @migs_vups.com
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 747.9 ( min: 745 max: 753 )

$ @emuvm_vups.com
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 749.2 ( min: 745.2 max: 753.4 )

It should be noted that the following VUPS procedure from Stromasys has been 'calibrated' against relative PRIME_SIEVE integer CPU performance data (see link above), so that it delivers CPU performance figures comparable across all 4 OpenVMS architectures (VAX, Alpha, I64 and x86-64).

$ @SRI_VUPS_NEW.COM
Approximate System VUPs Rating : 3706.0 ( min: 3700 max: 3717 )

PRIME_SIEVE - calculating no. of primes in 10^10 ...
# {1 <= primes <= 10000000000} = 455052511
Calculated no. of primes in 10^10 - end=18:04:31.42 start=18:04:12.11
Delta time is 00:00:19.31 - Delta CPUTIM is 1924 ticks
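
As a quick sanity check on the last line of that output: OpenVMS reports process CPU time (CPUTIM) in 10-millisecond ticks, so 1924 ticks is 19.24 CPU seconds against 19.31 seconds elapsed, i.e. the run was CPU bound at about 99.6%. A minimal C sketch of that arithmetic follows; the function names are hypothetical and exist only for this example.

```c
#include <assert.h>

/* OpenVMS CPUTIM counts 10 ms ticks, i.e. 100 ticks per second. */
#define CPUTIM_TICKS_PER_SECOND 100

/* Convert a CPUTIM tick count to seconds of CPU time. */
double cputim_to_seconds(long ticks)
{
    return (double)ticks / CPUTIM_TICKS_PER_SECOND;
}

/* Convert a time of day (hh:mm:ss.cc) to seconds since midnight. */
double hms_to_seconds(int hours, int minutes, double seconds)
{
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* Fraction of the elapsed time that the process spent on the CPU. */
double cpu_busy_fraction(long ticks, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return cputim_to_seconds(ticks) / elapsed_seconds;
}
```

With the figures above, `hms_to_seconds(18, 4, 31.42) - hms_to_seconds(18, 4, 12.11)` gives the 19.31 s delta, and `cpu_busy_fraction(1924, 19.31)` comes out just under 1.0.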

Volker.

Volker Halle

Apr 29, 2023, 12:17:31 PM
terry- wrote on Saturday, April 29, 2023 at 07:26:14 UTC+2:

> It would be interesting to run the same tests on native X86-64 VMS on the same hardware.

Terry,

I would not expect much difference when running these VUPS tests natively on physical hardware. The tests are mostly CPU bound, and this is 'virtualization', not 'emulation': the instructions are executed unchanged on the physical processor. I/Os would need some intervention by the hypervisor, and that could make a difference.

Volker.

terry-...@glaver.org

Apr 29, 2023, 2:53:39 PM
But my emulator is providing Alpha hardware emulation (per the "DS10" line in my VUPS output). What I'm interested in seeing is how VMS x86-64 performs on this Dell server, compared to the additional overhead of providing a complete emulated DS10.

Robert A. Brooks

Apr 29, 2023, 3:07:34 PM
On 4/29/2023 2:53 PM, terry-...@glaver.org wrote:

> But my emulator is providing Alpha hardware emulation (per the "DS10" line in
> my VUPS output). What I'm interested in seeing is how VMS x86-64 performs on
> this Dell server, compared to the additional overhead of providing a complete
> emulated DS10.

Any performance numbers for VMS on X86_64 are not *that* interesting until it's
built with native optimizing compilers.

Native compilers are becoming available, but the build process will take some
time to convert to a native build.

I do not think (but am not sure) that V9.2-2 (scheduled for near the end of the
year) would be a native build. Early builds of that version are being done
with cross compilers.

--

--- Rob

David Jones

Apr 29, 2023, 3:56:27 PM
On Friday, April 28, 2023 at 8:22:29 AM UTC-4, Simon Clubley wrote:
> On 2023-04-28, Volker Halle <volker...@hotmail.com> wrote:
> >
> > We can continue the VUPS discussion here, but should discuss the KERNEL mode question in the VSI forum.
> >
> Why ? The Kernel mode overhead is the real issue that needs discussing,
> not some artificial CPU speed test.
> Simon.

I made a wrapper for the prime_sieve.c V2 program (making it a callable function and redirecting its calls to printf)
so I could experiment with making it multi-threaded. My DS10L/466 Alpha can do the 10^9 test in 20.1 seconds if
using a single worker that assumes only 1 worker. If you remove that assumption (create a 2nd thread), the time
goes up to 26.8 seconds due to the ~500,000 calls to pthread_getspecific. My emulated DS20, which is emulating
2 processors, goes from 51.7 seconds to 57.6 for the same experiment. A fair amount of overhead, but multiple
CPUs will improve the final run time.

On my X86 hardware (i5-1235U), the single worker case takes 3.4 seconds while adding in the pthread calls increases
the run time to 37.6 seconds! In the first case, monitor shows 100% user mode while the second case shows 50%
kernel mode.
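
For anyone who wants to measure the per-call cost in isolation, a minimal C sketch of a micro-benchmark is shown below. It is not the wrapper described above: `time_getspecific_calls` is a hypothetical helper that times repeated `pthread_getspecific` lookups of one key on the calling thread, using the standard `clock()` function (CPU time, not elapsed time).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/*
 * Performs `calls` lookups of a single thread-specific key on the calling
 * thread. Returns the CPU seconds consumed by the loop, or -1.0 if the key
 * could not be set up. *hits receives the number of lookups that returned
 * the pointer stored beforehand (it should equal `calls`).
 */
double time_getspecific_calls(long calls, long *hits)
{
    static int marker;      /* any address will do as the stored value */
    pthread_key_t key;
    long ok = 0;

    assert(hits != NULL);
    *hits = 0;

    if (pthread_key_create(&key, NULL) != 0)
        return -1.0;
    if (pthread_setspecific(key, &marker) != 0) {
        pthread_key_delete(key);
        return -1.0;
    }

    clock_t start = clock();
    for (long i = 0; i < calls; i++) {
        /* Checking the result keeps the call from being optimized away. */
        if (pthread_getspecific(key) == &marker)
            ok++;
    }
    clock_t stop = clock();

    pthread_key_delete(key);
    *hits = ok;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by the call count gives a per-call figure. For comparison, if the whole difference in the runs above is attributed to the ~500,000 calls, the DS10L numbers (20.1 s versus 26.8 s) work out to roughly 13 microseconds per call, and the x86 numbers (3.4 s versus 37.6 s) to roughly 68 microseconds per call, assuming the call count is the same there.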

Volker Halle

Apr 30, 2023, 12:26:41 AM
terry- wrote on Saturday, April 29, 2023 at 20:53:39 UTC+2:

> But my emulator is providing Alpha hardware emulation (per the "DS10" line in my VUPS output). What I'm interested in seeing is how VMS x86-64 performs on
> this Dell server, compared to the additional overhead of providing a complete emulated DS10.

Terry,

if you can, install a supported Hypervisor on this DELL server under FreeBSD 12.4 host OS, create a VM, install VSI OpenVMS x86-64 and see for yourself.

Volker.

Volker Halle

Apr 30, 2023, 12:42:06 AM
David Jones wrote on Saturday, April 29, 2023 at 21:56:27 UTC+2:

David,

thanks for this experiment. It shows the huge overhead currently incurred when kernel mode calls are introduced. Let's be fair: VSI is still concentrating on functionality over performance for OpenVMS x86-64. So we should wait for the native build of the OS - as indicated by Rob.

Volker.

terry-...@glaver.org

Apr 30, 2023, 3:53:25 AM
On Sunday, April 30, 2023 at 12:26:41 AM UTC-4, Volker Halle wrote:
> if you can, install a supported Hypervisor on this DELL server under FreeBSD 12.4 host OS, create a VM, install VSI OpenVMS x86-64 and see for yourself.

I haven't received access to the kits / licenses yet.

When I do, I'll do it on a different R730 (same CPUs / memory), as this box is hosting my AlphaVM instance, which I depend on for email and which is also serving various VMS freeware.

Jan-Erik Söderholm

Apr 30, 2023, 7:15:12 AM
Just to add to the other stats presented...

VirtualBox 7.0 on a HP ProBook 450 G8 laptop having:
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz
running Win10 Pro.

COMs are from the ZIP kit from Daniel.

Removed that "preventing" output and lined up the numbers...

$ @wasd_vups.com
innotek GmbH VirtualBox with 2 CPU and 3840MB running VMS E9.2-1
Approximate System VUPs Rating : 778.2 ( min: 770.2 max: 783.4 )

$ @migs_vups.com
Approximate System VUPs Rating : 766.0 ( min: 766 max: 766 )

$ @