hello, I was hoping for some advice. I obtained a p value of 0.0e+00 for two variables when I ran a Cox regression. Is this statistically significant or not? Any advice is much appreciated, regards Bob
Regards,
Christian
Keramat Nouri wrote:
> These types of notations usually mean "very very small", indicating
> high significance.
>
> cheers,
> KN
>
> ------------------------------------------------------------------------
> *From:* Robert Green <bgreen_...@yahoo.com.au>
> *To:* MedS...@googlegroups.com
> *Sent:* Friday, March 27, 2009 5:05:09 PM
> *Subject:* {MEDSTATS} a p value of 0.0e+00
>
> hello,
>
> I was hoping for some advice.. I obtained a p value of 0.0e+00 for two
> variables when I ran a cox regression .
>
> Is this statistically significant or not?
>
> Any advice is much appreciated,
>
> regards
>
> Bob
>
>
You bet it doesn't! For illustration, here is a little simulation
(in R ... ) of two samples, each drawn from a Normal distribution
with SD = 1, each of size 100000. X is drawn from N(0,1), and
Y is drawn from N(0.05,1), so their difference of means is 0.05
(1/20 of the SD). The resulting P-value (< 2.2e-16) is effectively
the same as 0.0e+00 -- and it is 2-sided.
set.seed(54321)
X<-rnorm(100000,mean=0,sd=1)
Y<-rnorm(100000,mean=0.05,sd=1)
t.test(X,Y,var.equal=TRUE)
# Two Sample t-test
# data: X and Y
# t = -9.963, df = 199998, p-value < 2.2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -0.05338496 -0.03583336
# sample estimates:
# mean of x mean of y
Summary: with a large enough sample, any departure from the
Null Hypothesis, no matter how small, will yield a P-value
as small as you please.
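
A minimal sketch of that summary, computed from the Normal distribution rather than by simulation. It assumes two equal groups of size n, a true difference of 0.05 and SD = 1 as in the example above, and it evaluates the 2-sided P-value that a z-test would give if the observed difference happened to equal the true one. The helper name p_at_true_difference is hypothetical, introduced only for this illustration.

```r
# Two-sided P-value of a two-sample z-test when the observed difference
# equals the true difference `delta` (hypothetical helper, illustration only).
p_at_true_difference <- function(n, delta = 0.05, sigma = 1) {
  z <- delta / (sigma * sqrt(2 / n))  # difference divided by its standard error
  2 * pnorm(-abs(z))
}

n <- c(100, 1000, 10000, 100000, 1000000)
p <- p_at_true_difference(n)

# Same tiny effect throughout; only the sample size changes.
print(data.frame(n = n, p.value = signif(p, 3)))
```

The effect size never changes in this table, so the shrinking P-value reflects sample size only, not the practical importance of the difference.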
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Mar-09 Time: 22:06:45
------------------------------ XFMail ------------------------------
----- Original Message -----From: Robert Green
>It might help understand the curious presentation 0.0e+00, although I
>think my comments still stand. What is "+00" ? Is it different to "-00" ?
>Does the "+" imply there's been some rounding ?
Some software insists on giving a sign to the exponent - in which case +00
may well mean nothing more or less complicated than plain zero.
As for 0.0e+00, is this not probably nothing more than a very contorted
representation of the ubiquitous "<0.05" ? Given that the mantissa is
expressed to only one decimal place, anything less than 0.05 will have been
rounded to 0.0.
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
The notation [+/-]Xe[+/-]NN is a standard format for representing
floating-point numbers.
X is a signed decimal number with a single non-zero digit before
the decimal point (unless the number is zero) and at least one
digit after it. If the sign is positive, then it need not be shown.
Examples: 1.23456, -1.2345
The number of decimal places expressed in X is not prescribed,
and is at the whim of the software or the choice of the user,
except that it is always at least one.
NN is a 2- or 3-digit [but see ** below] representation of the power
of 10 to multiply X by to get the number in question (if the sign
is +) or to divide by (if the sign is -). If this power of 10
is less than 10, then the possibilities are 00,01,02,03,04,05,
06,07,08,09.
Thus, given a floating-point number Y (say Y = 123.4567898765),
find the power of 10 (10^n) such that Y/(10^n) is at least 1 and
less than 10, or (say Y = 0.00001234567898765) such that
Y*(10^n) is at least 1 and less than 10. Then round the result
(of Y/(10^n) or Y*(10^n)) to the desired number of decimal
places, and give n a "+" in the first case, or "-" in the
second case, and express it as 2 digits. If the power of 10
is zero (i.e. Y is already in the desired range) then NN = 00 and the
sign is "+". Hence (rounding to 4 decimal places):
Y = 123.4567898765 --[n=2, NN = +02]--> 1.2346e+02
Y = 0.00001234567898765 --[n=5, NN = -05]--> 1.2346e-05
Y = 0 --[n=0, NN = +00]--> 0.0000e+00
However: exact zeros (Y=0) are expressed as 0.0[0...]e+00,
and so, in some software, are values too small to be distinguished
from zero: values that underflow, or that fall below a threshold
such as the machine epsilon (2.220446e-16 in standard R, which is
why R reports very small P-values as "< 2.2e-16"), or that are
deemed to be zero to within some tolerance (e.g. the square root of
that in standard R, in certain tests of "equality").
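
A short sketch of these cases in R, using sprintf() to force the e-notation with a chosen number of decimal places. The value 2 * pnorm(-40) is just a convenient stand-in (an assumption for illustration, not the Cox regression output from the original question) for a tail probability too small to store in a 64-bit double.

```r
# The three worked values, with 4 decimal places in the mantissa.
y <- c(123.4567898765, 0.00001234567898765, 0)
sprintf("%.4e", y)
# "1.2346e+02" "1.2346e-05" "0.0000e+00"

# A tail probability too small for a 64-bit double underflows to exactly 0,
# and then prints as 0.0e+00 when one decimal place is requested.
p_tiny <- 2 * pnorm(-40)
p_tiny == 0                # TRUE
sprintf("%.1e", p_tiny)    # "0.0e+00"

# A small but representable value keeps a non-zero mantissa.
sprintf("%.1e", 0.04)      # "4.0e-02"

# Machine epsilon and its square root (the default tolerance of all.equal()).
eps <- .Machine$double.eps
c(eps, sqrt(eps))
```

On this reading, a printed 0.0e+00 in e-notation points to a value that is zero as far as the software can tell, rather than to any value that merely rounds below 0.05.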
To come to Martin's specific questions below:
> It might help understand the curious presentation 0.0e+00,
> although I think my comments still stand. What is "+00" ?
> Is it different to "-00"? Does the "+" imply there's been
> some rounding ?
"+00" is explained above. The "+" does not imply any rounding:
the "NN" is always an exact integer. Of course the 0.0 may be
the result of rounding (as explained above), but that's a different
issue and has nothing to do with the "+00" (or "+NN" in general).
[**]
In the majority of numerical software on computers, the largest
number that can be represented is just under 2^1024, about
1.797693e+308, so you will need "NNN" for this (indeed, once you
get to 10^100 you will) but no more. This is because the arithmetic
is done in 64-bit floating point (IEEE 754 double precision), which
has a certain maximum representable power of 2.
But software for more demanding numerical tasks may use larger
registers, so in principle you could need "NNNN", "NNNNN", ....
Some software (e.g. the classic Unix program 'bc') can work to
arbitrary precision (within the limits of numbers that can be stored
at all within the available storage resources of the computer).
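
These limits can be checked directly in R, as a quick sketch assuming the usual IEEE 754 double-precision arithmetic (where the largest finite value is just under 2^1024):

```r
# Largest finite double on this platform, about 1.797693e+308.
xmax <- .Machine$double.xmax
xmax

# 2^1023 is still finite; one more doubling overflows to Inf.
2^1023
2^1024

# From 10^100 upwards the exponent needs three digits.
format(1e100)   # "1e+100"
```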
Whether you get a number output in the format (e.g.)
123.456 or 1.23456e+02
will depend on a variety of things. You can set the output format
to be "fixed point" (for the first) or "floating point" for the
second. You may have set the precision to be (say) 7 digits
(total number of digits in the number), with "default" fixed point.
In that case 123.45678987654 will come out as 123.4568 (rounded),
but 1234567898.7654 will come out as 1.234568e+09, since you
can't do a number 10000000 or larger in 7 digits. And so on.
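
In R, for instance, the two output styles can be requested explicitly through format(). This is only a sketch of one package's behaviour; other software will have its own switches.

```r
x <- 123.456

# Explicitly request fixed or "e" notation.
fixed_out <- format(x, scientific = FALSE)   # "123.456"
sci_out   <- format(x, scientific = TRUE)    # "1.23456e+02"

# Seven significant digits, fixed notation.
seven_fixed <- format(123.45678987654, digits = 7)   # "123.4568"

# Seven significant digits, "e" notation forced.
seven_sci <- format(1234567898.7654, digits = 7, scientific = TRUE)   # "1.234568e+09"

c(fixed_out, sci_out, seven_fixed, seven_sci)
```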
Regarding the remark that
"Something that I didn't know was that 0.1 and 0.01 cannot be
represented in binary....":
Apart from integers (which always have an exact binary representation),
almost all fractions cannot be represented exactly in binary, and
(as in the vast majority of finite-word binary digital computation)
the binary representation will be truncated and therefore will not
be exact; i.e. there is an error, albeit small; but that can still
cause major problems in certain calculations.
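
A small R sketch of that error in action, assuming standard 64-bit doubles. The stored value of 0.1 is slightly off, and the discrepancy surfaces as soon as results are tested for exact equality.

```r
# 0.1 as actually stored, shown to 20 decimal places.
sprintf("%.20f", 0.1)    # "0.10000000000000000555"

# A fraction with a power-of-2 denominator is stored exactly.
sprintf("%.20f", 0.75)   # "0.75000000000000000000"

# Exact comparison fails because of the accumulated representation error ...
s <- 0.1 + 0.2
s == 0.3                     # FALSE

# ... while a comparison within a tolerance succeeds.
isTRUE(all.equal(s, 0.3))    # TRUE
```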
The only fractions which can be exactly represented in a finite
binary form are multiples of 1/(2^k) for any k, e.g. 3/4, 5/8,
47/128, ... ALL other fractions, e.g. 1/3, 1/5, 1/6, 1/7, 1/9,
1/10, 1/11, 1/12, 1/13, 1/14, 1/15, 1/17, .... cannot be so
represented.
ALL other fractions, even if finite in decimal notation, have
infinite binary representations.
e.g. 0.1[dec] = 1/10 = .00011001100110011001100110011....[bin]
i.e. a recurring binary fraction starting with 00011, followed
by infinitely many repetitions of 0011, i.e. it is (1/2) of
.0011001100110011001100110011....[bin]
Note that
0.0011[bin] = 0*(1/2) + 0*(1/4) + 1*(1/8) + 1*(1/16)[dec]
= 3/16[dec]
so, if you let that binary representation go to infinity, you get
(1/2)*(3/16)*(1 + 1/16 + 1/(16^2) + 1/(16^3) + ... )
= (1/2)*(3/16)*1/(1 - 1/16) [sum of geometric progression]
= (1/2)*(3/16)*(16/15)
= (1/2)*(1/5)
= 1/10 = 0.1[dec]
-- but only if you let the binary fraction go all the way!
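
The same geometric progression can be followed numerically. This sketch takes the first nine terms only (an arbitrary cut-off chosen for illustration) and shows the partial sums creeping up towards 0.1 without reaching it.

```r
# Partial sums of (1/2)*(3/16)*(1 + 1/16 + 1/16^2 + ...), first nine terms.
k <- 0:8
partial <- (1/2) * (3/16) * cumsum((1/16)^k)
print(partial, digits = 15)

# Remaining gap to 0.1 after each additional block of four binary digits.
gap <- 0.1 - partial
print(gap)

# The closed form of the infinite sum.
closed_form <- (1/2) * (3/16) / (1 - 1/16)
all.equal(closed_form, 0.1)
```

Each extra term shrinks the gap by a factor of 16, which is the "only if you let the binary fraction go all the way" point in numbers.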
SUMMARY: Digital computers do not do accurate arithmetic!
Ted.
PS: Nevertheless, you can get as close as normal people are likely
to need. I close with pi to 1000 decimal places, and a glimpse of
pi to 3325 binary digits (3323 after the point), thanks to 'bc':
scale=1000
obase=10
pi=4*a(1) # i.e. 4*atan(1)
pi
3.141592653589793238462643383279502884197169399375105820974944592307\
81640628620899862803482534211706798214808651328230664709384460955058\
22317253594081284811174502841027019385211055596446229489549303819644\
28810975665933446128475648233786783165271201909145648566923460348610\
45432664821339360726024914127372458700660631558817488152092096282925\
40917153643678925903600113305305488204665213841469519415116094330572\
70365759591953092186117381932611793105118548074462379962749567351885\
75272489122793818301194912983367336244065664308602139494639522473719\
07021798609437027705392171762931767523846748184676694051320005681271\
45263560827785771342757789609173637178721468440901224953430146549585\
37105079227968925892354201995611212902196086403441815981362977477130\
99605187072113499999983729780499510597317328160963185950244594553469\
08302642522308253344685035261931188171010003137838752886587533208381\
42061717766914730359825349042875546873115956286388235378759375195778\
18577805321712268066130019278766111959092164201988
obase=2
pi
11.00100100001111110110101010001000100001011010001100001000110100110\
00100110001100110001010001011100000001101110000011100110100010010100\
10000001001001110000010001000101001100111110011000111010000000010000\
....
11010001001011000001111101001110010100010101011010100010011001110100\
0110110111011101111000010110110110000010011011110100011101110
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Mar-09 Time: 13:42:07
------------------------------ XFMail ------------------------------
print(1234567890,digits=4)
# [1] 1.235e+09
print(1234567890,digits=5)
# [1] 1234567890
which led me to post a query to the R-help list, which drew a
couple of replies (to the same effect), one of which is below.
This is a nice example of "at the whim of the software" (though
a well-reasoned one).
Ted.
==========================================================
>>>>> "TH" == Ted Harding <Ted.H...@manchester.ac.uk>
>>>>> on Tue, 31 Mar 2009 13:59:41 +0100 (BST) writes:
TH> Hi Folks,
TH> Compare
TH> print(1234567890,digits=4)
TH> # [1] 1.235e+09
TH> print(1234567890,digits=5)
TH> # [1] 1234567890
TH> Granted that
TH> digits: a non-null value for 'digits' specifies the minimum
TH> number of significant digits to be printed in values.
TH> how does R decide to switch from the "1.235e+09" (rounded to
TH> 4 digits, i.e. the minimum, in "e" notation) to "1234567890"
TH> (the complete raw notation, 10 digits) when 'digits' goes
TH> from 4 to 5?
that's easy (well, as I'm one of the co-implementors ...) :
One of the design ideas has been to use "e"-notation only when
it's shorter (under the constraints given by 'digits'), i.e.,
1.2346e+09
is not shorter (but has less information) than
1234567890
hence the latter is chosen.
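
A rough sketch of that width comparison, using sprintf() to build the candidate strings by hand. It only mimics the idea described above and is not R's actual formatting code, which handles more cases.

```r
x <- 1234567890

fixed <- sprintf("%.0f", x)   # "1234567890"
sci4  <- sprintf("%.3e", x)   # 4 significant digits: "1.235e+09"
sci5  <- sprintf("%.4e", x)   # 5 significant digits: "1.2346e+09"

# Widths of the three candidates: 10, 9 and 10 characters.
nchar(c(fixed, sci4, sci5))

# With digits = 4 the "e" form is shorter, so it is used;
# with digits = 5 it is no shorter, so the fixed form is used.
out4 <- format(x, digits = 4)   # "1.235e+09"
out5 <- format(x, digits = 5)   # "1234567890"
c(out4, out5)
```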
There are quite a few cases, and constraints (*) that apply
simultaneously, such that sometimes the default numeric
formatting may seem peculiar, but I hope that in the mean
time we have squished all real bugs here.
*) such as platform (in)dependency; S - back-compatibility, ..
Best regards,
Martin Maechler, ETH Zurich
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Mar-09 Time: 14:54:16
------------------------------ XFMail ------------------------------