Microsoft cheats on Java Benchmark (was Sun cheats on Java Benchmark)

Ray Whitmer

Nov 14, 1997

Stuart A Yeates wrote:

> Tucker Balch (tuc...@cc.gatech.edu) wrote:
> the developer of the CaffeineMark Java performance benchmark, today
> presented evidence that a Sun Microsystems Java compiler specifically
> identifies part of the CaffeineMark test and produces a misleading
> score.
>
> The full press release is available at:
> http://www.webfayre.com/pr1197-2.html
>
> i wonder why you didn't cross post to comp.compilers ?
>
> could it be because this is standard practice and the compiler writers
> there would have laughed ?

And apparently Sun is not the first to cheat on this particular
benchmark. I quote from a Netscape page which accuses Microsoft:

http://home.netscape.com/comprod/products/communicator/comparison/realstory.html

which says:

"Microsoft has "spoofed" the Pendragon Software's CaffeineMark Java
benchmark to produce some inaccurate results (the benchmark has since
been fixed by Pendragon Software). Communicator 4.0 is actually faster
than IE 4.0 Preview 2 in very important performance areas, including
imaging (over 30 percent faster) and AWT performance (over 130 percent
faster). In compute performance, IE 4.0 Preview 2 is only about 20
percent faster.

For most real-world applications, therefore, Communicator 4.0 will
generally be faster than IE 4.0 Preview 2. In addition, Communicator 4.0
is much faster than IE 3.0 in every category."

I wonder how many of the same people ranting about Sun, were previously
raving about how much better/faster the Microsoft VM was than
Netscape's.

The lesson to be taken from this all is not Sun versus Microsoft, but
simply if relative speeds of VMs are important to you, do your own
benchmarks, trying real applications where the performance makes a
difference to you, not relying on meaningless benchmarks.

Ray Whitmer

Jeffrey C. Dege

Nov 15, 1997

On Fri, 14 Nov 1997 15:18:46 -0700, Ray Whitmer <raywh...@itsnet.com> wrote:
>
>And apparently Sun is not the first to cheat on this particular
>benchmark.
>I quote from a Netscape page which accuses Microsoft:
>
>http://home.netscape.com/comprod/products/communicator/comparison/realstory.html
>
>which says:
>
>"Microsoft has "spoofed" the Pendragon Software's CaffeineMark Java benchmark
>to produce some inaccurate results (the benchmark has since been fixed by
>Pendragon Software). Communicator 4.0 is actually faster than IE 4.0 Preview 2
>in very important performance areas, including imaging (over 30 percent
>faster) and AWT performance (over 130 percent faster). In compute performance,
>IE 4.0 Preview 2 is only about 20 percent faster.

Pendragon's own web page discusses this, though they don't use the word
``spoof'' and they don't mention any vendor by name.
See: http://www.webfayre.com/pendragon/cm3/optimize.html

>The lesson to be taken from this all is not Sun versus Microsoft, but
>simply if relative speeds of VMs are important to you, do your own
>benchmarks, trying real applications where the performance makes a
>difference to you, not relying on meaningless benchmarks.

A benchmark that doesn't measure the aspects of a system that are
critical to your use is meaningless. The _only_ benchmarks that
measure the aspects of a system that are critical to your use
are the ones that _you_ write.
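
To make "the ones that _you_ write" concrete, here is a minimal sketch of
a home-grown timing harness in Java. It is an illustration only: the class
OwnBenchmark, the method medianNanos, and the parseAndSum workload are
hypothetical names invented for this example, and the code uses present-day
Java syntax rather than anything from a 1997 JDK. The idea is to swap
parseAndSum for a slice of your own application and compare VMs on that.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

public class OwnBenchmark {

    /**
     * Runs the workload `warmup` times untimed, then `runs` times timed,
     * and returns the median elapsed time in nanoseconds.
     */
    static long medianNanos(LongSupplier workload, int warmup, int runs) {
        long sink = 0; // accumulates results so the work cannot be discarded
        for (int i = 0; i < warmup; i++) {
            sink += workload.getAsLong();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += workload.getAsLong();
            samples[i] = System.nanoTime() - start;
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println("unlikely"); // consumes sink so it stays live
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /** Hypothetical stand-in for real work: format and re-parse 10,000 integers. */
    static long parseAndSum() {
        long total = 0;
        for (int i = 0; i < 10_000; i++) {
            total += Integer.parseInt(Integer.toString(i));
        }
        return total;
    }

    public static void main(String[] args) {
        long ns = medianNanos(OwnBenchmark::parseAndSum, 5, 21);
        System.out.println("median ns per run: " + ns);
    }
}
```

The median is used instead of the mean so that one slow run (a garbage
collection, say) does not dominate the figure. That is a design choice of
this sketch, not something prescribed by anyone in the thread.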

--
APL is a mistake, carried through to perfection. It is the language of the
future for the programming techniques of the past: it creates a new generation
of coding bums.
-- Edsger W. Dijkstra, SIGPLAN Notices, Volume 17, Number 5

Bob O

Nov 16, 1997

On Fri, 14 Nov 1997 22:18:46, Ray Whitmer <raywh...@itsnet.com> wrote:

> And apparently Sun is not the first to cheat on this particular
> benchmark.
> I quote from a Netscape page which accuses Microsoft:
>
> http://home.netscape.com/comprod/products/communicator/comparison/realstory.html
>
> which says:
>
> "Microsoft has "spoofed" the Pendragon Software's CaffeineMark Java
> benchmark to produce some inaccurate results (the benchmark has since
> been fixed by Pendragon Software). Communicator 4.0 is actually faster
> than IE 4.0 Preview 2 in very important performance areas, including
> imaging (over 30 percent faster) and AWT performance (over 130 percent
> faster). In compute performance, IE 4.0 Preview 2 is only about 20
> percent faster.
>
> For most real-world applications, therefore, Communicator 4.0 will
> generally be faster than IE 4.0 Preview 2. In addition, Communicator 4.0
> is much faster than IE 3.0 in every category."
>
> I wonder how many of the same people ranting about Sun, were previously
> raving about how much better/faster the Microsoft VM was than
> Netscape's.
>
> The lesson to be taken from this all is not Sun versus Microsoft, but
> simply if relative speeds of VMs are important to you, do your own
> benchmarks, trying real applications where the performance makes a
> difference to you, not relying on meaningless benchmarks.
>
> Ray Whitmer

Hmmmm, perhaps someone with some skill in the area should scan the
newsgroup for all those folks that said they would never do business
with someone who cheats on benchmarks and hold their toes to the
fire.<g>

IBM says: "Downloading ActiveX over the web is like bungee jumping from
the Empire State Building--without the cord."
http://www.ibm.com/java/community/commentary.html

Ray Whitmer

Nov 17, 1997

to Gary McGath

Gary McGath wrote:

> In article <346CCE46...@itsnet.com>, Ray Whitmer
> <raywh...@itsnet.com> wrote:
> >The lesson to be taken from this all is not Sun versus Microsoft, but
> >simply if relative speeds of VMs are important to you, do your own
> >benchmarks, trying real applications where the performance makes a
> >difference to you, not relying on meaningless benchmarks.
>
> This last is certainly true. But we should note that Netscape doesn't
> provide any specifics to back up its claim, and that it's not exactly a
> disinterested party. I would not convict a company solely on the basis of
> an accusation by its competitor.

I agree, but:

1. Modifying a compiler to bypass specific meaningless operations in specific
benchmarks is reportedly a common practice.

2. It isn't that much of a stretch when many independent simple unofficial
benchmarks (my own included) showed that the Microsoft performance claims had
little relationship to common real-world experience.

3. Netscape's statement seemed to say that the Pendragon benchmark had been
changed as a result. It should, then, be possible to confirm or deny this with
Pendragon, and also to discover why Pendragon was apparently so biased in the way
they publicize the problems of Sun while not raising a protest when Microsoft
apparently previously did a similar thing. Could it be that some companies have
a better relationship with Pendragon and others do not? Or that some will help
Pendragon get into the news and try to establish themselves while others will be
of little impact? How do you determine who is biased by the competitor?

4. If Netscape's claims are not true, then I think Pendragon has a much bigger
issue against Netscape for saying what they did than they have against Sun for
spoofing the benchmarks. If Netscape's claims are true, then Sun has a much
bigger issue against Pendragon than Pendragon has against Sun. In either case,
they are harping on the wrong issue, in their own self-interest, which throws
doubt on their credibility, if that credibility were not already destroyed by
the unrealistic results they have published in the past.

Back to the real lesson here... Benchmark it yourself.

Ray Whitmer

Richard M. Smith

Nov 17, 1997

Ray,

I spoke with the folks at Pendragon today here at Comdex. I asked
them about what Netscape had to say about Microsoft and the
CaffeineMark benchmarks. This was the first they had heard
of the Netscape charges, and they were not really sure what Netscape
was talking about. They did mention that version 2.5 of the
benchmarks could be "over-optimized", but these problems have
been fixed in version 3.0. Except for Sun, everyone appears
to be playing it straight up on version 3.0.

They did tell me one more amusing thing about what Sun did. In
the test that Sun threw away, Pendragon changed one statement
from "if(a == b)" to "if(b == a)". This change of course should
give the same results, but the Sun compiler suddenly got a lot
slower. That's because the byte codes no longer matched,
and the compiler had to really do some work.
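
A toy illustration of why swapping the operands would defeat a recogniser
that matches exact byte codes. This is a sketch under stated assumptions,
not a reconstruction of what Sun's compiler did: the class FingerprintDemo
and its method are hypothetical, and the instruction listings assume int
locals a and b living in slots 1 and 2, which is what a typical javac
would emit but is not guaranteed.

```java
import java.util.List;

public class FingerprintDemo {

    // Assumed listing for "if (a == b)" with a in slot 1 and b in slot 2.
    static final List<String> A_EQ_B = List.of("iload_1", "iload_2", "if_icmpne");

    // Assumed listing for "if (b == a)": same test, operands loaded in the other order.
    static final List<String> B_EQ_A = List.of("iload_2", "iload_1", "if_icmpne");

    // The hypothetical recogniser only knows the original sequence.
    static final List<String> FINGERPRINT = A_EQ_B;

    /** Returns true only when the code matches the stored sequence exactly. */
    static boolean recognised(List<String> code) {
        return code.equals(FINGERPRINT);
    }

    public static void main(String[] args) {
        System.out.println("a == b recognised: " + recognised(A_EQ_B)); // true
        System.out.println("b == a recognised: " + recognised(B_EQ_A)); // false
    }
}
```

The two statements mean the same thing, yet an exact-match comparison
treats them as different code, so the shortcut no longer fires and the
real work has to be done.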

Richard

Jeffrey C. Dege

Nov 18, 1997

On Mon, 17 Nov 1997 07:41:25 -0700, Ray Whitmer <raywh...@itsnet.com> wrote:
>Gary McGath wrote:
>
>3. Netscape's statement seemed to say that the Pendragon benchmark had been
>changed as a result. It should, then, be possible to confirm or deny this with
>Pendragon, [...]

http://www.webfayre.com/pendragon/cm3/optimize.html

--
First twenty-one new features that somehow we must add in.
Then thirty-seven changes show up much to our chagrin.
And this thing's just inadequate, and that one's just plain wrong.
And by the way your schedule is about three months too long.

Ray Whitmer

Nov 18, 1997

Richard M. Smith wrote:

> Ray,
>
> I spoke with the folks at Pendragon today here at Comdex. I asked
> them about what Netscape had to say about Microsoft and the
> CaffeineMark benchmarks. This was the first they had heard
> of the Netscape charges, and they were not really sure what Netscape
> was talking about. They did mention that version 2.5 of the
> benchmarks could be "over-optimized", but these problems have
> been fixed in version 3.0. Except for Sun, everyone appears
> to be playing it straight up on version 3.0.

And the Netscape page I quoted didn't claim Microsoft spoofed the latest
version. They said Pendragon had fixed the problem. Perhaps the benchmarks
have become more credible with version 3.0. Unfortunately, fixes are likely
to be as superficial as the benchmarks, so unless they have significantly
expanded them, their claims are probably overly optimistic. Anything that
doesn't exercise significant chunks of diverse code is suspect. I will
believe them a little more when I see results published that match my own
experience and that of others I work with. Unfortunately, the web pages
have been inaccessible from this part of the web (trying from two different
entry points).

Ray Whitmer

Roedy Green

Nov 18, 1997

>Back to the real lesson here... Benchmark it yourself.

Any ARTIFICIAL benchmark is going to be misleading either because:
1. a smart compiler optimised it out of existence.
2. a smart programmer optimised the system to run especially well with it.
3. somebody cheated or bent the rules.
4. hardware runs exceptionally fast on small tight loops that float into
cache or registers; this is not representative of real-world code.
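
Point 1 can be sketched in Java. This is an illustration only: the class
DeadLoopDemo and its methods are hypothetical, and whether a particular VM
actually removes the first loop depends on that VM. The first method
computes a value nobody reads, so an optimiser is entitled to delete the
loop entirely; the second returns its result, so the work has to happen.

```java
public class DeadLoopDemo {

    /** Result is discarded: a smart compiler may legally remove the whole loop. */
    static void spin(int n) {
        long x = 0;
        for (int i = 0; i < n; i++) {
            x += (long) i * i;
        }
        // x is never read after this point
    }

    /** Result is returned and used by the caller, so the loop cannot simply vanish. */
    static long checksum(int n) {
        long x = 0;
        for (int i = 0; i < n; i++) {
            x += (long) i * i;
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        spin(n);
        long t1 = System.nanoTime();
        long sum = checksum(n);
        long t2 = System.nanoTime();
        System.out.println("discarded result: " + (t1 - t0) + " ns");
        System.out.println("consumed result:  " + (t2 - t1) + " ns, checksum " + sum);
    }
}
```

If the first figure comes out far smaller than the second on some VM, that
says more about the benchmark than about the VM's speed on real code.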

The best sort of benchmark is some REAL WORLD work. Even if the code is
not identical on both machines, so what. That is what you will have to
contend with when you use the machines later. The hardware boys may scream
unfair. But that's all the end user cares about.

Roedy Green Roedy rhymes with Cody ro...@bix.com ICQ:5144242
Canadian Mind Products contract programming (250) 285-2954
POB 707 Quathiaski Cove Quadra Island BC Canada V0P 1N0
http://oberon.ark.com/~roedy for CMP utilities and the Java glossary
-30-

Jeffrey C. Dege

Nov 22, 1997

On Sat, 22 Nov 97 23:12:39 GMT, Gerry Quinn <ger...@indigo.ie> wrote:
>In article <34705795...@itsnet.com>, Ray Whitmer <raywh...@itsnet.com> wrote:
>
>>Back to the real lesson here... Benchmark it yourself.
>
>And don't trust Sun.

When it comes to determining whether a particular product will meet your
needs, don't trust _anybody_.

--
The customer proceeds to go through each change line-by-line.
Excruciating detail, which no logic can divine.
And when it ends there's only four not sitting there agog:
The customer, the manager, the pony and the dog.

Gerry Quinn

Nov 22, 1997

In article <34705795...@itsnet.com>, Ray Whitmer <raywh...@itsnet.com> wrote:

>Back to the real lesson here... Benchmark it yourself.

And don't trust Sun.

- Gerry

===========================================================
ger...@indigo.ie (Gerry Quinn)
http://indigo.ie/~gerryq
Original puzzlers for PC, Amiga, and Java
===========================================================

Dave Harris

Nov 23, 1997

gmc...@ultranet.com (Gary McGath) wrote:
> The fact that Sun lied goes far beyond "whether a particular product
> will meet your needs."
>
> I'm very disturbed by the number of messages in this thread suggesting
> that willful deception should just be regarded as "business as usual."
> The Java community should be outraged at what Sun pulled. Instead,
> the response looks more like a cynical yawn.

"Sun lied" is your interpretation of events. Sun say they made an honest
mistake, released a version they didn't mean to. They've apologised and
retracted the offending version.

I am giving them the benefit of the doubt. This is partly because of the
sheer stupidity of what they did. Had they cheated and meant to get away
with it, they'd (a) have hacked all the tests, not just one; and (b) put
in some delay loops so that it wasn't quite so obvious. I admit to being
influenced by their past reputation, and their willingness to apologise
when they finally realised what they'd done.

For what it's worth, I give Microsoft the benefit of the doubt in similar
circumstances. Cockup is more plausible than conspiracy.

Dave Harris, Swansea, UK | "Weave a circle round him thrice,
bran...@cix.co.uk | And close your eyes with holy dread,
| For he on honey dew hath fed
http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."

Roedy Green

Nov 23, 1997

Dave Harris wrote:
>I am giving them the benefit of the doubt. This is partly because of the
>sheer stupidity of what they did.

Never ascribe to malice that which can be explained by incompetence.

Napoleon said something similar.

The benchmark result requires the action of only one employee. He probably
did not even consider it cheating. It was just another pattern to add to
his peep-hole optimisation list.

The fault in my opinion lies in the benchmark. Benchmarks should not just
spin their wheels. They should do something complicated. Compilers (and
chips for that matter) have every right to optimise silly benchmarks out of
existence.

Soggy

Nov 24, 1997

Granting Sun a little slack is all well and good, but to consider it a simple
mistake or a single employee's malfeasance sounds a little lame when you
consider that Sun apparently rolled out national advertising quoting the
"cooked" test results. They also never voluntarily stepped forward, even
after the benchmark company brought it to their attention (privately), but
did so only after the benchmark company went public. Even at that point they
offered several different versions or excuses, trying to get the spin right.
Incidentally, the software wasn't really optimized to run a given function
better (which is more or less acceptable) but rather to recognize
the test sequence and serve up false results.... sounds "dirty" from
here.... Soggy

Dave Harris wrote in message ...

>gmc...@ultranet.com (Gary McGath) wrote:
>> The fact that Sun lied goes far beyond "whether a particular product
>> will meet your needs."
>>
>> I'm very disturbed by the number of messages in this thread suggesting
>> that willful deception should just be regarded as "business as usual."
>> The Java community should be outraged at what Sun pulled. Instead,
>> the response looks more like a cynical yawn.
>
>"Sun lied" is your interpretation of events. Sun say they made an honest
>mistake, released a version they didn't mean to. They've apologised and
>retracted the offending version.
>

>I am giving them the benefit of the doubt. This is partly because of the

Daniel Phillips

Nov 24, 1997

Gary McGath wrote:

> I'm very disturbed by the number of messages in this thread suggesting that
> willful deception should just be regarded as "business as usual." The Java
> community should be outraged at what Sun pulled. Instead, the response
> looks more like a cynical yawn.
>

> --
> Gary McGath

I'm a Java fan. I think it's wrong to cheat on benchmarks. I think everyone
does it, and that's even more wrong. I think it was wrong for Sun to cheat.
Sun paid the price in handing Microsoft and their hordes of Media$ucks a
great issue to try and attack Sun's integrity on. It wasn't enough - Sun
is still plenty credible with most folks. Give it a rest. You're just
reinforcing your reputation for throwing mud instead of addressing issues.

I'm pretty sure Sun isn't going to cheat on a Java benchmark again. But I
wouldn't go so far as to say that about Microsoft - the day that Microsoft
exhibits shame or remorse any time they get caught with their hand in the
cookie jar I'll have to duck to avoid the flying pigs.

--
Daniel Phillips
phillips at dowco.com

Dave Harris

Nov 24, 1997

so...@pacifier.com (Soggy) wrote:
> Granting Sun a little slack is all well and good but to consider it a
> simple mistake or a single employee malfeasance tends to sound a
> little lame after considering that Sun apparently rolled out national
> advertising quoting the "cooked" test results.

Have you read their account? Once the original mistake got made, they say
the information "escaped" around the company. It's something that got set
in motion. The events don't justify supposing a high-level attempt to
mislead.

> They also never voluntarily stepped forward even after the
> benchmark company brought it to their attention (privately) but
> rather only after the benchmark company went public. Even at that
> point with several different ver. or excuses trying to get the spin
> down right.

It apparently took them a while to realise what they had done.

In an ideal world, cock-ups wouldn't happen. I've worked in industry long
enough to know that in real life, they do. It's not so incredible.

> Incidentally the software wasn't really optimized to run a given
> function better (basically which is more or less acceptable) but
> rather to recognize the test sequence and serve up false results....
> sounds "dirty" from here....Soggy

I know what they did; nothing in my earlier post contradicts that account.
It still sounds like a plausible mistake to me.

Pohl Longsine

Nov 24, 1997

<gmc...@ultranet.com> wrote:
>
>I'm very disturbed by the number of messages in this thread
>suggesting that willful deception should just be regarded as
>"business as usual."

It is "business as usual" in the computer industry. Who here
doesn't understand that benchmarks are a tool that vendors use
to lie to their customers? This is really, really old news.
Video accelerator cards have had built-in benchmark-specific
"optimizations" for years.

I already don't trust the benchmark claims of any vendor for
any product. What more do you want? Outrage? I'll
save my outrage for product dumpers, thank you.

>The Java community should be outraged at what Sun pulled.
>Instead, the response looks more like a cynical yawn.

Which part of "caveat emptor" don't you understand?

Ray Whitmer

Nov 24, 1997

Gary McGath wrote:

> In article <slrn67e3em...@jdege.visi.com>, jd...@nospam.visi.com wrote:
>
> >On Sat, 22 Nov 97 23:12:39 GMT, Gerry Quinn <ger...@indigo.ie> wrote:
> >>In article <34705795...@itsnet.com>, Ray Whitmer <raywh...@itsnet.com> wrote:
> >>
> >>>Back to the real lesson here... Benchmark it yourself.
> >>
> >>And don't trust Sun.
> >
> >When it comes to determining whether a particular product will meet your
> >needs, don't trust _anybody_.
>
> The fact that Sun lied goes far beyond "whether a particular product will
> meet your needs."
>
> I'm very disturbed by the number of messages in this thread suggesting that
> willful deception should just be regarded as "business as usual." The Java
> community should be outraged at what Sun pulled. Instead, the response
> looks more like a cynical yawn.

I think you prefer to believe that they lied, and desire to continue believing
this so strongly that you call this a fact, as certain others have. While we have
become accustomed to lies coming from various prominent companies and
publications, Sun's explanation would have us believe this was a simple mistake.
Are you offering proof that it was not?

It certainly was openly corrected before it had any major impact that I am aware
of, unlike, IMO, all the little FUDsters running around raving about the supposed
orders of magnitude difference in the performance of the MS VM over the
competitors, which I never heard Microsoft state was an unrealistic result, but it
clearly was for my use and analysis of the VMs.

You certainly misunderstood my post if what you got from it was that I was being
cynical. You might as well call me cynical for being realistic about the
difficulty of other computing problems such as the factoring of large numbers used
to create encryption keys.

Typical benchmarks are inherently meaningless, and it is in the struggle to give
them artificial meaning that many mistakes and inaccurate representations are
made. If companies are tuning their work for the benchmark instead of for useful
applications, it is the fault of people who have such unbounded faith in
benchmarks. The inaccuracies at Sun and Microsoft would not have occurred if it
were not for this, and this is their bigger mistake, but they are not alone in
this.

I am the best one to determine whether a product meets my requirements. Programming
is complex and just because you can boil the performance down to a few simple
numbers doesn't mean that those numbers are universally meaningful. Are you
looking for some kind of father figure to tell you what is the best option for
you? Try Bill Gates, he'll be happy to oblige you. But that is a situation which
invites far more cynicism than recognizing that everyone's requirements are
different, and will be served differently, and there is no such thing as an
unbiased opinion, benchmark, or product review.

Ray Whitmer

flm...@ibm.net

Nov 25, 1997

In <gmcgath-ya0240800...@news.ma.ultranet.com>, gmc...@ultranet.com (Gary McGath) writes:
>In article <slrn67e3em...@jdege.visi.com>, jd...@nospam.visi.com wrote:
>
>>On Sat, 22 Nov 97 23:12:39 GMT, Gerry Quinn <ger...@indigo.ie> wrote:
>>>In article <34705795...@itsnet.com>, Ray Whitmer <raywh...@itsnet.com> wrote:
>>>
>>>>Back to the real lesson here... Benchmark it yourself.
>>>
>>>And don't trust Sun.
>>
>>When it comes to determining whether a particular product will meet your
>>needs, don't trust _anybody_.
>
>The fact that Sun lied goes far beyond "whether a particular product will
>meet your needs."
>
>I'm very disturbed by the number of messages in this thread suggesting that
>willful deception should just be regarded as "business as usual." The Java
>community should be outraged at what Sun pulled. Instead, the response
>looks more like a cynical yawn.

>--
You should be more disturbed that this noise is obscuring the real news
which is that:

OS/2 is Now the Best Performing Intel Platform for Java.

"As part of a commitment to deliver premiere Java platforms, IBM has
implemented the complete set of service fixes from JavaSoft and tuned
both the virtual machine and supporting operating system elements.
CaffeineMark 3.0 benchmark results indicate Java applications run 7% faster
on OS/2 Warp 4 than on Microsoft Internet Explorer 4.0 on Windows NT.
Performance comparisons between the OS/2 versions of Java 1.1.4 and
Java 1.1.1 showed a 50% overall improvement."

see http://www.software.ibm.com/os/warp/warpfm/warpex40/

IBM has patents that may keep OS/2 the fastest Intel platform for Java
for many years.

http://www.eskimo.com/~mighetto/lsbench.htm
is useful in reviewing the kinds of benchmarking