[I have put the prior posts into temporal order.
My post is at the end. -- RFK]
On Sep 30, 11:42 pm, lina <lina....@googlemail.com> wrote:
> On 29
> Thanks for your answer. But the formula you presented should be the
> formula for the weighted sample variance and the corresponding weighted
> sample standard deviation (as given in Wikipedia:
>
> http://en.wikipedia.org/wiki/Weighted_mean). I computed the
> corresponding value on my data set and the result had the expected
> size for a sample variance.
>
> My problem is that I need the standard error of the mean. Without
> weighting, I would take the sample standard deviation divided by the
> square root of the number of observations. But what do I do in case
> of weighting? I am not sure what the denominator should be.
For a weighted mean and variance computed as in the Wikipedia
article, the standard error of the weighted mean is s/sqrt(n'),
where s is the weighted s.d., n' = V1^2/V2, and V1 & V2 are as in
the article (V1 = sum of the weights, V2 = sum of the squared
weights). The df would be n' - 1. n' is the "effective sample
size" and is conceptually the same as Simpson's Reciprocal Index
of Diversity.
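A minimal Python sketch of that recipe, assuming the "weighted s.d." is the unbiased reliability-weights form from the Wikipedia article, s^2 = V1/(V1^2 - V2) * sum w_i (x_i - m)^2, with m the weighted mean. The function name `weighted_mean_se` is hypothetical. With equal weights, n' = n and everything reduces to the ordinary s/sqrt(n) with df = n - 1, which makes a handy check. It needs at least two positive weights, otherwise V1^2 = V2 and the variance is undefined.

```python
import math

def weighted_mean_se(x, w):
    """Weighted mean, weighted s.d., effective n', SE and df under the
    i.i.d. model of the Wikipedia 'Weighted sample variance' section.
    (Hypothetical helper; assumes reliability-type weights.)"""
    v1 = sum(w)                          # V1: sum of the weights
    v2 = sum(wi * wi for wi in w)        # V2: sum of the squared weights
    mean = sum(wi * xi for wi, xi in zip(w, x)) / v1
    # Unbiased weighted variance for reliability weights
    ss = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    s = math.sqrt(ss * v1 / (v1 * v1 - v2))
    n_eff = v1 * v1 / v2                 # effective sample size n'
    se = s / math.sqrt(n_eff)            # standard error s/sqrt(n')
    return mean, s, n_eff, se, n_eff - 1 # last value is the df

mean, s, n_eff, se, df = weighted_mean_se([1, 2, 3, 4], [1, 1, 1, 1])
print(mean, s, n_eff, se, df)
```

With unequal weights n' falls below the raw count, e.g. weights (2, 1, 1) give n' = 16/6.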
However, the variance described in the Wikipedia article (and the
standard error I just gave) may not be what you should be using. The
precision of your mean, weighted or not, depends on how precisely
each of the values that are averaged estimates its own age-sex
subpopulation mean. The sizes of the differences between the
subpopulation means do not matter. In Ted's expression for the weighted
variance, each term Si^2 should be understood as being (an estimate
of) the true variance of subpopulation i, divided by the number of
scores that enter into your sample mean for that subpopulation.
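For independently estimated subpopulation means m_i, the variance of the weighted mean sum(w_i * m_i)/sum(w_i) is sum(w_i^2 * sigma_i^2/n_i)/(sum w_i)^2, i.e. each squared relative weight times the variance of that subpopulation mean. Whether this matches Ted's expression term for term is an assumption, since his post is not reproduced here. The sketch below (the function name `stratified_se` and its arguments are hypothetical) uses the sample s.d. within each subpopulation as an estimate of sigma_i, and shows that the subpopulation means themselves never enter the standard error.

```python
import math

def stratified_se(w, means, sd, n):
    """Weighted mean of subpopulation means and its standard error.

    w     -- weight of each subpopulation (e.g. its population share)
    means -- sample mean of each subpopulation
    sd    -- sample s.d. within each subpopulation (estimates sigma_i)
    n     -- number of scores behind each subpopulation mean
    """
    total = sum(w)
    est = sum(wi * mi for wi, mi in zip(w, means)) / total
    # Si^2 = sigma_i^2 / n_i is the variance of the i-th subpopulation mean;
    # the spread of the means across subpopulations does not appear here.
    var = sum((wi / total) ** 2 * si ** 2 / ni
              for wi, si, ni in zip(w, sd, n))
    return est, math.sqrt(var)

est, se = stratified_se([1, 1], [10.0, 30.0], [2.0, 2.0], [4, 4])
print(est, se)
```

Replacing the means (10, 30) by (10, 10) leaves the standard error unchanged, which is the point made above.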
Why? Because the model that underlies the Wikipedia "Weighted sample
variance" section is not the same as the model used in previous
sections. The model that underlies the weighted variance section is
that X1,...,Xn are independent identically distributed observations
from a population with mean mu and variance sigma^2, whereas the
model that is appropriate for your data is that the Xi are
independent but not identically distributed: each Xi comes from
a population with its own mean mu_i and variance sigma_i^2.
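The consequence of using the wrong model can be checked by simulation. This sketch assumes normal subpopulations and one observation X_i per subpopulation; the values of mu, sigma and the weights are made up for illustration, and `simulate` is a hypothetical helper. On each replicate it recomputes the i.i.d.-model SE s/sqrt(n') and compares its average with the actual spread of the weighted mean and with the model-based value sqrt(sum w_i^2 sigma_i^2)/sum w_i. When the mu_i are far apart relative to the sigma_i, the i.i.d.-model SE is driven by the between-subpopulation differences and comes out far larger than the true sampling s.d.

```python
import math
import random

def simulate(mu, sigma, w, reps=20000, seed=1):
    """Draw one X_i ~ N(mu_i, sigma_i^2) per subpopulation, `reps` times.
    Returns (empirical s.d. of weighted mean, model-based s.d.,
    average i.i.d.-model SE)."""
    rng = random.Random(seed)
    v1 = sum(w)
    v2 = sum(wi * wi for wi in w)
    n_eff = v1 * v1 / v2
    means, iid_ses = [], []
    for _ in range(reps):
        x = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        mean = sum(wi * xi for wi, xi in zip(w, x)) / v1
        # i.i.d.-model SE: weighted s.d. about the weighted mean over sqrt(n')
        ss = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
        s_w = math.sqrt(ss * v1 / (v1 * v1 - v2))
        means.append(mean)
        iid_ses.append(s_w / math.sqrt(n_eff))
    grand = sum(means) / reps
    empirical_sd = math.sqrt(sum((m - grand) ** 2 for m in means) / (reps - 1))
    # Non-identical model: only the sigma_i enter the true s.d.
    model_sd = math.sqrt(sum(wi * wi * s * s for wi, s in zip(w, sigma))) / v1
    return empirical_sd, model_sd, sum(iid_ses) / reps

emp, model_sd, iid_se = simulate(mu=[0, 10, 20, 30], sigma=[1, 1, 1, 1],
                                 w=[1, 1, 1, 1])
print(f"empirical s.d. of weighted mean: {emp:.3f}")
print(f"model-based s.d. (sigma_i only): {model_sd:.3f}")
print(f"average i.i.d.-model SE:         {iid_se:.3f}")
```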