On Sat, 24 Sep 2022 16:11:09 -0700 (PDT), Cosine <ase...@gmail.com> wrote:
> Thank you for replying.
> However, different transformations would distort the original numeric line in different manners.
That does not deserve a "However,"....
Yes, you will compute different values when formulas use
different assumptions. As I wrote,
* * For well-behaved distributions, transformations to achieve "equal
interval" (in the measurement space of whatever matters) will
usually give good CIs. * *
> For example, while using the log function transforms the original
> non-negative numeric line [0, inf] to the full numeric line [-inf,
> inf], it "expands" the part of [0, 1] to [-inf, 0]. If we use another
> nonlinear transformation, we will get a different distortion. After
> all, we only restrict the transformation to one-to-one.
I don't take the log of zero. Undefined, not -inf.
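To make the log case concrete, here is a minimal sketch of a CI computed on the log scale and transformed back. It assumes strictly positive data and uses a large-sample normal critical value rather than Student's t. The function name `log_scale_ci` and the data are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def log_scale_ci(values, level=0.95):
    """Approximate CI for the geometric mean, computed on the log scale.

    Assumption: a normal (z) critical value is used, which is only
    reasonable for moderately large samples.
    """
    if any(v <= 0 for v in values):
        # log(0) is undefined, so zeros are rejected rather than mapped to -inf.
        raise ValueError("log transform needs strictly positive values")
    logs = [math.log(v) for v in values]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(logs) / math.sqrt(len(logs))
    centre = mean(logs)
    # Symmetric in log units, asymmetric once transformed back to raw units.
    return math.exp(centre - half_width), math.exp(centre + half_width)

incomes = [12, 15, 18, 22, 25, 31, 40, 55, 90, 400]  # made-up data
low, high = log_scale_ci(incomes)
geo_mean = math.exp(mean([math.log(v) for v in incomes]))
print(low, geo_mean, high)
```

The limits sit at equal ratios, not equal differences, around the geometric mean. That is the "distortion" in question, and whether it is the right one depends on whether ratios are the equal intervals that matter.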
Also note: Some people misconstrue "equal intervals." Wealth is
measured in dollars; 'dollars' are seen (erroneously) to make the
factor linear and equal-interval when /measured/ in dollars. But
adding a million dollars is a grossly different contribution to
'wealth' depending on the start -- there are unequal intervals
at the extremes. Think of the variables as 'latent factors' for
what you are interested in, and imagine what makes equal intervals
for that factor. Like 'wealth' or whatever, the available units are
> Since the width of the confidence interval represents the
> cumulative proportions, would the type of transformation affect the
> determination of statistical significance?
If you want a statement about cumulative proportions, the
safe way is to use rank-order. The range from the 40th to
the 60th percentile (for instance) will be a 95% CI for the
median, for some easily computed N.
"Statistical significance" (to me) implies testing, rather than
presenting CIs. If you don't have 'equal intervals' in the
sense I describe above, your testing will be deficient to some extent.
Does it matter? The usual tests are pretty robust against
moderate distortion of scaling, when you use the usual 5% test
size (actual size remains in the range 4-6%). ANOVA tests at the
0.001 level on moderately skewed distributions often have an
actual size that is off by five-fold or more.
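One way to check actual test size for your own situation is a small simulation. The sketch below draws two groups of 10 from the same distribution and counts how often a pooled two-sample t test (equivalent to a two-group ANOVA) rejects. The lognormal is chosen purely for illustration, the critical values are the tabulated two-sided ones for 18 degrees of freedom, and all names are hypothetical.

```python
import math
import random

N_PER_GROUP = 10
# Two-sided critical values of Student's t with 18 df, from a standard t table.
T_CRIT_18 = {0.05: 2.101, 0.001: 3.922}

def pooled_t(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def rejection_rates(draw, reps=20000, seed=1):
    """Estimate actual test size when both groups come from the same `draw`."""
    rng = random.Random(seed)
    hits = {alpha: 0 for alpha in T_CRIT_18}
    for _ in range(reps):
        a = [draw(rng) for _ in range(N_PER_GROUP)]
        b = [draw(rng) for _ in range(N_PER_GROUP)]
        t = abs(pooled_t(a, b))
        for alpha, crit in T_CRIT_18.items():
            hits[alpha] += t > crit
    return {alpha: count / reps for alpha, count in hits.items()}

normal_rates = rejection_rates(lambda rng: rng.gauss(0, 1))
skewed_rates = rejection_rates(lambda rng: rng.lognormvariate(0, 1))
print("normal:", normal_rates)
print("skewed:", skewed_rates)
```

Compare each estimated rate with its nominal alpha. The 0.001 estimate rests on very few rejections at this number of repetitions, so raise `reps` before reading much into it.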
Extremely fat tails or far outliers mess up p-values even at the
5% size. This is why cleaning your data takes at least 90% of
the time of a competent data analyst hired for a job: We
want to know for ourselves that the means will be meaningful,
et cetera. That usually means fixing stuff, or writing cautions
at the end.