
Computational accuracy of SPSS


Andreas Völp

Dec 9, 2009, 5:28:44 AM
In trying to verify the computational accuracy of some SPSS procedures I
have come across the reference data sets provided by the National Institute
of Standards and Technology (http://www.itl.nist.gov/div898/strd/). During
my tests I was able to replicate most of the "certified" results, but SPSS
failed in case of ANOVA models with a "higher" level of difficulty (i.e.,
dependent variables with a large number of "significant" digits). An example
where SPSS produced totally different results can be found under
http://www.itl.nist.gov/div898/strd/anova/SmLs07.html.

I have tried SPSS Versions 15 and 17 on PC systems running under MS Windows
XP Prof., with both AMD and Intel processors. I have also tried to replicate
the results for this dataset with StatXact (version 8), and it failed as
well (different results than SPSS, but no less incorrect).

Could this be a matter of the PC architecture, or does our software produce
inexact results?

--
Andreas Völp

Psy Consult Scientific Services
Frankfurt, Germany

Brendan Halpin

Dec 9, 2009, 5:57:41 AM
How different are the results?

In Stata, the default is to read data as floats, which truncates the
response variable badly and gives completely wrong results. Reading the
data as double gives results that match quite well (R-sq to 7 digits, SS
to 5 or 6).

The difficulty the test is creating is that the response variable has a
very small range around a very large mean (1e12, range about 0.5), which
will stress the accuracy of floating point representations.

Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F2-025 x 3147
mailto:brendan...@ul.ie http://www.ul.ie/sociology/brendan.halpin.html
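
The float-versus-double point above can be illustrated outside any statistics package. The following is only a sketch in Python, not Stata's or SPSS's actual reading code: the helper `to_float32` and the four sample values are made up for illustration (they are shaped like the SmLs07 response, but they are not the NIST data). It rounds each value to a 4-byte float and counts how many distinct values survive.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an 8-byte double) to the nearest 4-byte float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Illustrative values only: mean about 1e12, range well under 1.
values = [1000000000000.2, 1000000000000.4, 1000000000000.6, 1000000000000.8]

as_double = values                            # Python floats are already doubles
as_single = [to_float32(v) for v in values]   # what a 4-byte float would hold

distinct_double = len(set(as_double))
distinct_single = len(set(as_single))

for d, s in zip(as_double, as_single):
    print(f"double: {d:.4f}   single: {s:.1f}")
print("distinct values as double:", distinct_double)
print("distinct values as single:", distinct_single)
```

With 4-byte floats all four values collapse to the same number, so no variance is left to analyse; with doubles they stay distinct, although each one is stored with a small rounding error.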

Andreas Völp

Dec 9, 2009, 7:23:15 AM
Brendan:

> How different are the results?

... much different. Here are some examples for the SmLs07 data set:

              "Certified"             My SPSS
SSQ between   1.68000000000000E+00    3.478801089119
SSQ within    1.80000000000000E+00    0.000000000000
R-Squared     4.82758620689655E-01    1.000000000000

... and this is the syntax that I use:

data list / group 1-12 dependent 13-32 (1).

begin data.
1 1000000000000.4
1 1000000000000.3
... snip ...
9 1000000000000.4
9 1000000000000.6
end data.

UNIANOVA dependent BY group
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = group .


> In Stata, the default is to read data as floats, which truncates the
> response variable badly and gives completely wrong results. Reading the
> data as double gives results that match quite well

I may be wrong, but I'm not aware of different representations of numerical
data in SPSS. To the best of my knowledge data are basically either numeric
or strings. Within numerical data there are different display formats, but I
do not think that this affects the precision of the internal representation.


Andreas

Art Kendall

Dec 9, 2009, 9:02:39 AM
From version 17.

from ONEWAY
ONEWAY response BY treatment
/MISSING ANALYSIS.

ANOVA
response
                 Sum of Squares   df    Mean Square   F        Sig.
Between Groups   1.683            8     .210          21.038   .000
Within Groups    1.800            180   .010
Total            3.484            188

From unianova.
UNIANOVA response BY treatment
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = treatment .

Tests of Between-Subjects Effects
Dependent Variable: response
Source            Type III Sum of Squares   df    Mean Square   F   Sig.
Corrected Model   3.479a                    8     .435          .   .
Intercept         1.890E26                  1     1.890E26      .   .
treatment         1.707                     8     .213          .   .
Error             .000                      180   .000
Total             1.890E26                  189
Corrected Total   3.479                     188
a. R Squared = 1.000 (Adjusted R Squared = 1.000)

Art Kendall
Social Research Consultants

Erkki.Ko...@helsinki.fi.invalid

Dec 9, 2009, 10:39:44 AM
Art Kendall <A...@drkendall.org> wrote:
: From version 17.

I second Art. ONEWAY gets it right; UNIANOVA fails, as does MEANS. Some
modules need a brush-up. Another point is that such a variable is
quite artificial. Cheers Erkki

Mobile +358 40 5024491 <http://www.helsinki.fi/~komulain/>

Bruce Weaver

Dec 9, 2009, 11:04:17 AM

MIXED has trouble with that data set too. Analyzing it with
REGRESSION (and 8 indicator variables for the 9 groups) gives results
that are pretty close, but not identical to those Art got with ONEWAY.

From ONEWAY:
                 SS      df    MS     F        Sig.
Between Groups   1.683   8     .210   21.038   .000
Within Groups    1.800   180   .010
Total            3.484   188

From REGRESSION:
Source       SS      df    MS     F
Regression   1.680   8     .210   21.000
Residual     1.800   180   .010
Total        3.480   188

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Andreas Völp

Dec 9, 2009, 1:59:06 PM
... interesting results - the oldest procedure (ONEWAY) appears to be the
most accurate ;-)

Bruce, Art, Erkki: Which SPSS versions and operating systems have you been
using?

Andreas

Bruce Weaver

Dec 9, 2009, 2:06:46 PM


Good point Andreas...I should have said. I've got SPSS/PASW v17.0.3
running under Windoze XP Professional (2002, SP3)

Art Kendall

Dec 9, 2009, 2:50:58 PM
I am using version 18.

Art

Brendan Halpin

Dec 9, 2009, 3:16:42 PM
On Wed, Dec 09 2009, Erkki.Ko...@Helsinki.Fi.INVALID wrote:

> Another point is that such a variable is quite artificial.

That is the point of the test. The response variable has a range of 0.5
but a mean of 1e12. The test is putting stress on the floating point
representation. As I said above, Stata gets it completely wrong with the
default 4-byte FP representation, but does reasonably well with the
"double" format (8-byte).

It's interesting that SPSS gives different results with different
procedures, which suggests that while the floating point representation
may be adequate, there are different levels of loss of accuracy in the
two algorithms.
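
One way to see how two algorithms can lose different amounts of accuracy on the same double-precision data is to compare two common formulas for a sum of squared deviations. This is a sketch in Python under stated assumptions: nothing in this thread says which formulas ONEWAY or UNIANOVA actually use, the function names `ss_textbook` and `ss_two_pass` are made up for illustration, and the five offsets are a hypothetical group, not the NIST data.

```python
def ss_textbook(ys):
    """One-pass 'calculator' formula: sum(y^2) - (sum(y))^2 / n."""
    n = len(ys)
    return sum(y * y for y in ys) - sum(ys) ** 2 / n


def ss_two_pass(ys):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


# Hypothetical group: small offsets sitting on top of a mean of about 1e12.
offsets = [0.4, 0.3, 0.5, 0.3, 0.5]
ys = [1e12 + d for d in offsets]

reference = ss_two_pass(offsets)   # same spread, computed without the 1e12 shift
textbook = ss_textbook(ys)
two_pass = ss_two_pass(ys)

print("reference (offsets only):", reference)
print("textbook formula on ys:  ", textbook)
print("two-pass formula on ys:  ", two_pass)
```

The textbook formula subtracts two numbers near 5e24, far beyond the roughly 16 significant digits a double carries, so the small difference is lost to cancellation. The two-pass formula only squares deviations of order 0.1 and stays close to the reference. Subtracting a constant such as 1e12 from the response before the analysis would sidestep the problem for either formula.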

Andreas Völp

Dec 9, 2009, 5:13:47 PM
> It's interesting that SPSS gives different results with different
> procedures

... yes, indeed - and there is yet another result using the ANOVA offered as
a statistics option in the Means procedure: Using this option SPSS produces
an empty ANOVA table and issues a message that an ANOVA cannot be computed
as there is "no variance within groups" (!) (tested with SPSS versions 15
and 17 under Win XP Prof.).

Is there anybody out there using SPSS under a different operating system
and/or processor architecture, who could contribute to this interesting
discussion?

Andreas

Erkki.Ko...@helsinki.fi.invalid

Dec 10, 2009, 2:53:02 AM
Andreas Völp <Andrea...@gmx.de> wrote:
:... interesting results - the oldest procedure (ONEWAY) appears to be the
:most accurate ;-)

:Andreas

I was running PASW 17. Erkki

<http://www.helsinki.fi/~komulain/>
