I agree there is a problem in one of the early chapters, which asserts that, as the sample size increases, we should see something *systematic* happen to the sample standard deviation S. It would be more appropriate to say that “the sample standard deviation will become more precise as sample size increases”, and not something like “the standard deviation will decrease, on average, as sample size increases”.
Properties of Estimate of Variance, S^2
------------------------------------------------------------
Expected value of S^2
When the population mean mu is known, the estimate of variance S^2 = sum((Xi - mu)^2)/N is unbiased. The expected value of the variance computed this way equals the true population variance: E(S^2) = Sigma^2, where Sigma^2 is the true, unknown population variance. Note that as N increases, the expectation E(S^2) does not change.
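A quick simulation makes this concrete. This is my own sketch, not from the text: I assume a normal population with mu = 0 and Sigma^2 = 4, and NumPy for the sampling; the specific seed and sample sizes are arbitrary choices.

```python
import numpy as np

# Sketch (assumed setup): normal population with known mean mu = 0
# and true variance sigma2 = 4.
rng = np.random.default_rng(0)
mu, sigma2 = 0.0, 4.0
N, reps = 10, 100_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, N))
# Known-mean estimator: deviations from the true mu, divided by N.
s2_known = ((samples - mu) ** 2).mean(axis=1)

print(s2_known.mean())  # ~ 4.0: unbiased, matching E(S^2) = Sigma^2
```

Averaged over many repetitions, the known-mean estimator lands on the true Sigma^2 regardless of N.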
When the population mean is unknown and is estimated by the sample mean Xbar, the estimate S^2 = sum((Xi - Xbar)^2)/N is *biased*. The expected value of the variance computed this way is smaller than the true population variance: E(S^2) = Sigma^2*(N-1)/N. Note that (N-1)/N is less than 1 but approaches 1 as N increases, so this estimator is a little too small on average but rises to match the true Sigma^2 as N grows.
When the population mean is unknown, the estimate S^2 = sum((Xi - Xbar)^2)/(N-1) is unbiased. The expected value of the variance computed this way equals the true population variance, E(S^2) = Sigma^2, so we say it is unbiased. Note that as N increases, the expectation E(S^2) again does not change. That is why we use N-1 in the denominator of the S^2 calculation in practice.
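The two unknown-mean estimators can be compared side by side in a short simulation (again my sketch; I assume a normal population with Sigma^2 = 4 and a deliberately small N = 5 so the bias is visible):

```python
import numpy as np

# Sketch (assumed setup): compare the /N and /(N-1) estimators when
# the mean must be estimated from the sample itself.
rng = np.random.default_rng(1)
sigma2 = 4.0
N, reps = 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)  # sum of squared deviations from Xbar

biased = ss.mean() / N          # ~ Sigma^2 * (N-1)/N = 3.2: too small
unbiased = ss.mean() / (N - 1)  # ~ Sigma^2 = 4.0: Bessel's correction
print(biased, unbiased)
```

Dividing by N underestimates Sigma^2 by exactly the factor (N-1)/N on average; dividing by N-1 removes that bias.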
Precision of S^2
How variable is our estimate S^2 as a function of N? For a normally distributed population, the variance of the sampling distribution of S^2 is Var(S^2) = 2*Sigma^4/(N-1). So as N increases, 1/(N-1) decreases, and the sample estimates of the variance become less variable; they are increasingly precise. Hence we can say that “the sample standard deviation will become more precise, or less variable, as sample size increases”.
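This shrinking variability can be checked empirically against the 2*Sigma^4/(N-1) formula (my sketch; normal data with Sigma = 2 is an assumption, as are the particular sample sizes):

```python
import numpy as np

# Sketch: empirical variance of S^2 (with the N-1 denominator) for
# growing N, compared with the normal-theory value 2*sigma^4/(N-1).
rng = np.random.default_rng(2)
sigma = 2.0
reps = 100_000

results = {}
for N in (5, 20, 80):
    samples = rng.normal(0.0, sigma, size=(reps, N))
    s2 = samples.var(axis=1, ddof=1)       # unbiased S^2 for each sample
    results[N] = s2.var()                  # spread of the S^2 estimates
    print(N, results[N], 2 * sigma**4 / (N - 1))
```

The empirical spread of S^2 tracks 2*Sigma^4/(N-1) closely and falls as N grows, which is the precision statement in the quote above.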
Footnote: There is another fine point, of smaller consequence and not the main point here. By taking the square root to find S = sqrt(S^2), or generally a = sqrt(b) where b is an unbiased estimator of a true population quantity, a small downward bias is introduced in S as an estimate of Sigma: E(S) < Sigma by Jensen's inequality, because the square root is a concave function. The bias in the variance S^2 is easily corrected (by the N-1 denominator above), but the bias from taking the square root is harder to correct and depends on the distribution in question. Luckily, it is small: for normal data the bias of S is roughly Sigma/(4N), and most of it can be removed by using N-1.5 instead of N-1 in the denominator. The error of that corrected estimator decays quadratically (as 1/N^2), so it is suited for all but the smallest samples or the highest-precision work: for N = 3 the residual bias is about 1.3%, and for N = 9 it is already less than 0.1%.
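The Jensen's-inequality effect is easy to see in simulation (my sketch; I assume a normal population with Sigma = 1, so E(S) should sit slightly below 1 and creep upward as N grows):

```python
import numpy as np

# Sketch: the small downward bias of S = sqrt(S^2) as an estimate of
# sigma = 1, for normal data. S^2 is unbiased, but sqrt is concave,
# so E(S) < sigma (Jensen's inequality); the gap shrinks with N.
rng = np.random.default_rng(3)
reps = 200_000

means = {}
for N in (3, 9, 30):
    samples = rng.normal(0.0, 1.0, size=(reps, N))
    s = samples.std(axis=1, ddof=1)  # usual S with the N-1 denominator
    means[N] = s.mean()
    print(N, means[N])               # slightly below 1, rising toward 1
```

Each average sits below the true Sigma = 1, with the shortfall shrinking as N increases, consistent with a bias of roughly Sigma/(4N).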