# Q interpretation of statistically negative values


### Cosine

Sep 24, 2022, 3:29:05 AM
Hi:

When doing statistical analysis, we often compute the mean and standard error (SE) of the sample. Then we check the cumulative probability of an interval centered at the mean and extending some number of SEs in each direction. However, such an interval sometimes includes negative values. How do we interpret this kind of result when the variable, by definition, must always be positive, e.g., age, weight, height, and salary?
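
A minimal Python sketch of the computation described above, using a small made-up, right-skewed sample of strictly positive values (the numbers are hypothetical, chosen only to illustrate the problem): the interval mean ± 2 SE dips below zero even though no observation does.

```python
import math
import statistics

# Hypothetical right-skewed sample (e.g. salaries in some unit):
# every value is positive, but one is far larger than the rest.
data = [1, 1, 2, 2, 3, 50]

mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean

# Interval of mean +/- 2 SE, as described in the question
lower, upper = mean - 2 * se, mean + 2 * se
print(f"mean={mean:.2f}, SE={se:.2f}, interval=({lower:.2f}, {upper:.2f})")
```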

### Rich Ulrich

Sep 24, 2022, 1:47:12 PM
On Sat, 24 Sep 2022 00:29:03 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:
Q: What does it mean when the computed confidence interval
extends beyond the range of the variable?

A: The assumptions for constructing and using the CI as an accurate
indicator have not been met. And if one tail is obviously too long,
the other tail is often too short, which might be a concern.

In my experience, people have been confused by CIs on proportions
that extended below 0 or above 100%. The statistical literature
contains several alternatives for those CIs, which vary the
assumptions about the underlying distribution ("logistic"?) and
construct intervals that are more precise and legitimate. (Note:
approximations can be easier to compute than exact answers.)
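
To illustrate the proportion case, here is a minimal Python sketch comparing the textbook normal-approximation ("Wald") interval with the Wilson score interval, one well-known alternative from that literature. The post does not name a specific method, so Wilson is only an example; the counts (1 success in 20 trials) are hypothetical.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation ("Wald") interval: p_hat +/- z * SE."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval; its limits always stay inside [0, 1]."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical example: 1 "success" out of 20 trials
print("Wald:  ", wald_ci(1, 20))
print("Wilson:", wilson_ci(1, 20))
```

With these counts the Wald lower limit falls below 0 (about -0.05), while both Wilson limits stay between 0 and 1.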

For natural measures which have a large range and are never zero,
starting with the log transformation is often appropriate: Transform;
get the average; back-transform if you prefer the original units.
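
A minimal sketch of "transform; get the average; back-transform", assuming strictly positive data and using z = 1.96 on the log scale (with a sample this small a t quantile would be more defensible; z just keeps the sketch dependency-free). Back-transforming the mean of the logs gives the geometric mean, not the arithmetic mean, and because exp() is always positive the interval cannot include negative values. The function name and data are hypothetical.

```python
import math
import statistics

def log_scale_ci(values, z=1.96):
    """Mean +/- z*SE computed on log(values), then back-transformed.

    Returns (geometric mean, lower, upper) in the original units.
    Requires every value to be strictly positive.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    logs = [math.log(v) for v in values]
    m = statistics.mean(logs)
    se = statistics.stdev(logs) / math.sqrt(len(logs))
    # exp() of any real number is positive, so the limits stay above zero
    return math.exp(m), math.exp(m - z * se), math.exp(m + z * se)

data = [1, 1, 2, 2, 3, 50]  # hypothetical right-skewed sample
gm, lo, hi = log_scale_ci(data)
print(f"geometric mean={gm:.2f}, interval=({lo:.2f}, {hi:.2f})")
```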

For well-behaved distributions, transformations to achieve "equal
interval" (in the measurement space of whatever matters) will
usually give good CIs.

For distributions on hand that are not well-behaved, you might be
better off using some version of ranges instead of the standard
deviation/error. Bootstrap methods are used in some problems to
overcome the "oddness" of distributions.

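The simplest version of the bootstrap, the percentile interval, can be sketched in a few lines of Python. The function name, the 10,000 resamples and the fixed seed are arbitrary choices for illustration, and more refined variants (e.g. BCa) exist; with only six observations the result is crude. Because every resampled mean lies between the smallest and largest observation, the interval for positive data stays positive.

```python
import random
import statistics

def bootstrap_percentile_ci(values, stat=statistics.mean, n_boot=10_000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat`: resample with replacement,
    recompute the statistic, and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(values)
    boots = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boots[lo_idx], boots[hi_idx]

data = [1, 1, 2, 2, 3, 50]  # hypothetical right-skewed sample
lo, hi = bootstrap_percentile_ci(data)
print(f"bootstrap 95% interval for the mean: ({lo:.2f}, {hi:.2f})")
```
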
--
Rich Ulrich

### Cosine

Sep 24, 2022, 7:11:11 PM
Hi:

However, different transformations would distort the original numeric line in different manners.

For example, while using the log function transforms the original non-negative numeric line [0, inf] to the full numeric line [-inf, inf], it "expands" the part of [0, 1] to [-inf, 0]. If we use another nonlinear transformation, we will get a different distortion. After all, we only restrict the transformation to one-to-one.

Since the width of the confidence interval represents the cumulative proportions, would the type of transformation affect the determination of statistical significance?

### Rich Ulrich

Sep 24, 2022, 11:00:09 PM
On Sat, 24 Sep 2022 16:11:09 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:

>Hi:
>
>
> However, different transformations would distort the original numeric line in different manners.

That does not deserve a "However,"....

Yes, you will compute different values when formulas use
different assumptions. As I wrote,

* * For well-behaved distributions, transformations to achieve "equal
interval" (in the measurement space of whatever matters) will
usually give good CIs. * *

>
> For example, while using the log function transforms the original
> non-negative numeric line [0, inf] to the full numeric line [-inf,
> inf], it "expands" the part of [0, 1] to [-inf, 0]. If we use another
> nonlinear transformation, we will get a different distortion. After
> all, we only restrict the transformation to one-to-one.

I don't take the log of zero. Undefined, not -inf.

Also note: some people misconstrue "equal intervals." Wealth is
measured in dollars, and it is (erroneously) assumed that the factor
is linear and equal-interval simply because it is /measured/ in
dollars. But adding a million dollars is a grossly different
contribution to 'wealth' depending on the starting point; the
intervals are unequal at the extremes. Think of the variables as
'latent factors' for what you are interested in, and imagine what
makes equal intervals for that factor. Like 'wealth' or whatever,
the available units are ...
>
> Since the width of the confidence interval represents the
> cumulative proportions, would the type of transformation affect the
> determination of statistical significance?

If you want a statement about cumulative proportions, the
safe way is to use rank-order. The range from the 40th to
the 60th percentile (for instance) will be a 95% CI for the
median, for some easily computed N.
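
The rank-order interval for the median follows from the binomial distribution: for a continuous variable, each observation falls below the population median with probability 1/2, so the chance that the j-th smallest and j-th largest observations bracket the median can be computed exactly. A minimal Python sketch (function names are hypothetical; ties are assumed away). For N = 100 it picks the 40th and 61st order statistics, close to the 40th-to-60th-percentile example.

```python
import math

def median_ci_ranks(n, alpha=0.05):
    """Return 1-based ranks (j, n + 1 - j) and the exact coverage of the
    order-statistic interval [x_(j), x_(n+1-j)] for the population median.

    Assumes a continuous distribution (no ties). Coverage is
    1 - 2 * P(Binomial(n, 1/2) <= j - 1), kept at or above 1 - alpha.
    """
    def binom_cdf(k):
        # P(Binomial(n, 1/2) <= k)
        return sum(math.comb(n, i) for i in range(k + 1)) / 2**n

    j = 0
    # Move the ranks inward while the coverage stays >= 1 - alpha
    while j + 1 <= n // 2 and 2 * binom_cdf(j) <= alpha:
        j += 1
    if j == 0:
        raise ValueError(f"n={n} is too small for a {1 - alpha:.0%} interval")
    return j, n + 1 - j, 1 - 2 * binom_cdf(j - 1)


def median_ci(values, alpha=0.05):
    """Distribution-free CI for the median, read off the sorted sample."""
    xs = sorted(values)
    lo_rank, hi_rank, coverage = median_ci_ranks(len(xs), alpha)
    return xs[lo_rank - 1], xs[hi_rank - 1], coverage


print(median_ci_ranks(100))
```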

"Statistical significance" (to me) implies testing, rather than
presenting CIs. If you don't have 'equal intervals' in the
sense I describe above, your testing will be deficient to some
extent.
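
To see that the choice of scale can change a test, here is a minimal Python sketch computing Welch's t statistic for two hypothetical samples on the raw scale and on the log scale. Only the statistic is computed (the standard library has no t distribution for p-values), but with these made-up numbers even its sign flips: the single large value in the first group dominates the raw-scale comparison, while on the log scale the second group comes out higher.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's t statistic for the difference in means of x and y."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)

# Hypothetical samples: group a has one far outlier
a = [1.2, 1.5, 2.0, 2.4, 30.0]
b = [3.0, 3.5, 4.1, 4.8, 5.5]

t_raw = welch_t(a, b)
t_log = welch_t([math.log(v) for v in a], [math.log(v) for v in b])
print(f"raw scale t = {t_raw:.2f}, log scale t = {t_log:.2f}")
```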

Does it matter? The usual tests are pretty robust against
moderate distortion of scaling, when you use the usual 5% test
size (actual size remains in the range 4-6%). ANOVA tests at
0.001 on moderately skewed distributions are often wrong
by five-fold or more.

Extremely fat tails or far outliers mess up p-values even at the
5% size. This is why cleaning your data takes at least 90% of
the time of a competent data analyst hired for a job: We
want to know for ourselves that the means will be meaningful,
et cetera. That usually means fixing stuff, or writing cautions
at the end.

--
Rich Ulrich