Sep 24, 2022, 3:29:05 AM

Hi:

When doing statistical analysis, we often compute the mean and standard error (SE) of a sample. We then form an interval centered at the mean that extends some multiple of the SE in each direction, and check the cumulative probability it covers. However, such an interval sometimes includes negative values. How do we interpret this kind of result if the variable, by definition, must always be positive, e.g., age, weight, height, or salary?

Sep 24, 2022, 1:47:12 PM

On Sat, 24 Sep 2022 00:29:03 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

Q: What does it mean when the computed confidence interval

extends beyond the range of the variable?

A: The assumptions for constructing and using the CI as an accurate

indicator have not been met. And if one tail is obviously too long,

the other tail is often too short, which might be a concern.

In my experience, I have seen people confused by CIs on proportions

when they went beyond 0 or 100%. The statistical literature

contains several alternatives for those CIs, which vary the

assumptions about the underlying distribution ("logistic"?) and

construct intervals that are more precise and legitimate. (Note:

approximations can be easier to compute than exact answers.)
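
A minimal sketch of the proportion case, assuming the naive interval is the usual "Wald" form (p plus or minus z times SE) and picking the Wilson score interval as one of the alternatives from the literature. It is not necessarily the alternative meant above, and the counts (1 success in 20 trials) are made up for illustration.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Naive interval: p +/- z * sqrt(p(1-p)/n). Can leave [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval: always stays inside [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical data: 1 success out of 20 trials.
wald = wald_interval(1, 20)
wilson = wilson_interval(1, 20)
print("Wald:  ", wald)    # lower limit falls below 0
print("Wilson:", wilson)  # both limits inside (0, 1)
```

Note that the Wilson interval is not symmetric around the observed proportion: one tail is shortened and the other lengthened, which is the point made above about tails.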

For natural measures which have a large range and are never zero,

starting with the log transformation is often appropriate: Transform;

get the average; back-transform if you prefer the original units.

For well-behaved distributions, transformations to achieve "equal

interval" (in the measurement space of whatever matters) will

usually give good CIs.
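
The transform, average, back-transform recipe can be sketched as follows. The salary figures are invented, and a normal quantile is used in place of a t quantile purely to keep the example in the standard library; for a sample this small a t quantile would be the usual choice. The back-transformed interval is an interval for the geometric mean, not for the arithmetic mean.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Symmetric interval: mean +/- z * SE (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    return m - z * se, m + z * se


def log_scale_ci(values, level=0.95):
    """Build the interval on the log scale, then back-transform with exp.

    Requires strictly positive values; the result is an interval for the
    geometric mean and can never include zero or negative numbers.
    """
    lo, hi = mean_ci([math.log(v) for v in values], level)
    return math.exp(lo), math.exp(hi)


# Hypothetical, strongly skewed positive data (arbitrary units).
salaries = [1, 1, 2, 2, 3, 50]

raw_ci = mean_ci(salaries)
log_ci = log_scale_ci(salaries)
print("raw scale:", raw_ci)  # lower limit is negative
print("log scale:", log_ci)  # both limits positive
```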

For distributions on hand that are not well-behaved, you might be

well-advised to switch from Mean to Median as your central measure,

and use some version of ranges instead of Standard Deviation/Error.

Bootstrap methods are used in some problems, to overcome the

"oddness" of distributions.
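
As an illustration of the median-plus-bootstrap route, here is a minimal percentile-bootstrap sketch. The percentile method is only one of several bootstrap intervals, and the function name, resample count, seed, and data are assumptions made for this example.

```python
import random
from statistics import median


def bootstrap_percentile_ci(values, stat=median, level=0.95,
                            n_resamples=2000, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic each time, and read the interval off the sorted results."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = (1 - level) / 2
    lo_idx = int(alpha * n_resamples)
    hi_idx = min(int((1 - alpha) * n_resamples), n_resamples - 1)
    return stats[lo_idx], stats[hi_idx]


# Hypothetical positive, skewed sample.
sample = [1, 1, 2, 2, 3, 3, 4, 5, 8, 50]
lo, hi = bootstrap_percentile_ci(sample)
print("bootstrap CI for the median:", (lo, hi))
```

Because every resampled median lies between the smallest and largest observed values, this interval cannot extend below the observed minimum, so it stays positive for positive data.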

--

Rich Ulrich

Sep 24, 2022, 7:11:11 PM

Hi:

Thank you for replying.

However, different transformations would distort the original numeric line in different manners.

For example, while using the log function transforms the original non-negative numeric line [0, inf] to the full numeric line [-inf, inf], it "expands" the part of [0, 1] to [-inf, 0]. If we use another nonlinear transformation, we will get a different distortion. After all, we only restrict the transformation to one-to-one.

Since the width of the confidence interval represents the cumulative proportions, would the type of transformation affect the determination of statistical significance?

Sep 24, 2022, 11:00:09 PM

On Sat, 24 Sep 2022 16:11:09 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>Hi:

>

> Thank you for replying.

>

> However, different transformations would distort the original numeric line in different manners.

That does not deserve a "However,"....

Yes, you will compute different values when formulas use

different assumptions. As I wrote,

* * For well-behaved distributions, transformations to achieve "equal

interval" (in the measurement space of whatever matters) will

usually give good CIs. * *

>

> For example, while using the log function transforms the original

> non-negative numeric line [0, inf] to the full numeric line [-inf,

> inf], it "expands" the part of [0, 1] to [-inf, 0]. If we use another

> nonlinear transformation, we will get a different distortion. After

> all, we only restrict the transformation to one-to-one.

I don't take the log of zero. Undefined, not -inf.

Also note: Some people misconstrue "equal intervals." Wealth is

measured in dollars; 'dollars' are seen (erroneously) to make the

factor linear and equal-interval when /measured/ in dollars. But

adding a million dollars is a grossly different contribution to

'wealth' depending on the start -- there are unequal intervals

at the extremes. Think of the variables as 'latent factors' for

what you are interested in, and imagine what makes equal intervals

for that factor. Like 'wealth' or whatever, the available units are

often misleading.

>

> Since the width of the confidence interval represents the

> cumulative proportions, would the type of transformation affect the

> determination of statistical significance?

If you want a statement about cumulative proportions, the

safe way is to use rank-order. The range from the 40th to

the 60th percentile (for instance) will be a 95% CI for the

median, for some easily computed N.
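
A sketch of how those ranks can be computed, assuming independent observations from a continuous distribution, so that the number of observations below the true median is Binomial(n, 1/2). The function name is hypothetical.

```python
from math import comb


def median_ci_ranks(n, level=0.95):
    """Return (lo, hi, coverage) where lo and hi are 1-based ranks of the
    sorted sample and [x_(lo), x_(hi)] covers the median with probability
    at least `level`. Returns None if n is too small for that level.

    With K ~ Binomial(n, 1/2) observations below the true median,
    coverage of [x_(k), x_(n+1-k)] is P(k <= K <= n-k) = 1 - 2*P(K <= k-1).
    """
    cdf = 0.0
    best = None
    for k in range(1, n // 2 + 1):
        cdf += comb(n, k - 1) / 2**n  # now equals P(K <= k-1)
        coverage = 1 - 2 * cdf
        if coverage >= level:
            best = (k, n + 1 - k, coverage)  # narrowest interval so far
        else:
            break
    return best


result = median_ci_ranks(100)
print(result)
```

For n = 100 this gives ranks 40 and 61, that is, roughly the 40th to 60th percentiles of the sample. The interval depends only on rank order, so any monotone increasing transformation of the data leaves it unchanged apart from relabeling the endpoints.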

"Statistical significance" (to me) implies testing, rather than

presenting CIs. If you don't have 'equal intervals' in the

sense I describe above, your testing will be deficient to some

extent.

Does it matter? The usual tests are pretty robust against

moderate distortion of scaling, when you use the usual 5% test

size (actual size remains in the range 4-6%). ANOVA tests at

0.001 on moderately skewed distributions are often wrong

by five-fold or more.

Extremely fat tails or far outliers mess up p-values even at the

5% size. This is why cleaning your data takes at least 90% of

the time of a competent data analyst hired for a job: We

want to know for ourselves that the means will be meaningful,

et cetera. That usually means fixing stuff, or writing cautions

at the end.

--

Rich Ulrich
