This seems to me a very simple question, so I'm frustrated that I
haven't been able to find a clear answer so far.
Introduction:
I work in the field of Quality Engineering. As you may know, every
measurement has some uncertainty associated with it, usually described
as a standard deviation (u) that produces a "confidence interval"
around the value measured.
On the other hand, the concept of Process Capability refers to the
total variability of a certain process, calculated from a group of
measurements, and to how this variability relates to the tolerance of
the product. Usually this relationship is expressed as a Quality Index
called Cp. The formula for Cp is:
Cp = (Tolerance) / (6 * sigma)
The denominator is six times the estimated total standard deviation,
representing the "process width".
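As a concrete sketch of that formula (all the numbers below are invented
purely for illustration):

```python
# Toy Cp calculation: Cp = Tolerance / (6 * sigma).
# Specification limits and sigma are invented for illustration only.
usl = 10.3   # hypothetical upper specification limit
lsl = 9.7    # hypothetical lower specification limit
sigma = 0.1  # estimated total process standard deviation

tolerance = usl - lsl
cp = tolerance / (6 * sigma)
print(round(cp, 3))  # 1.0 -> the 6-sigma process width exactly fills the tolerance
```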
Question:
It is argued that the "sigma" in the denominator contains the
uncertainty of the measurements, which is exogenous to the process, and
therefore "inflates" the process variability with a variance component
u^2. Thus, in order to find the true "process variability", without the
influence of the uncertainty in the measurements, we should "deflate"
the sigma in the denominator, using sqrt(sigma^2 - u^2).
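A quick Monte-Carlo sketch of this argument (sigma, u and n are all
hypothetical values I picked for illustration): if each observed value is
the true process value plus independent measurement noise with sd u, the
variances add, so the observed sd is sqrt(sigma^2 + u^2) and deflating it
recovers the process sd.

```python
import math
import random

random.seed(1)
sigma = 0.10   # true process sd (hypothetical)
u = 0.04       # measurement uncertainty sd (hypothetical)
n = 200_000

# Observed value = true process value + independent measurement error.
obs = [random.gauss(0.0, sigma) + random.gauss(0.0, u) for _ in range(n)]

mean = sum(obs) / n
sd_obs = math.sqrt(sum((x - mean) ** 2 for x in obs) / (n - 1))

# Deflating the observed sd recovers the process sd.
sd_deflated = math.sqrt(sd_obs**2 - u**2)
print(round(sd_obs, 3))       # close to sqrt(0.10^2 + 0.04^2) ~ 0.108
print(round(sd_deflated, 3))  # close to the true process sd of 0.10
```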
1) So according to this, the <process width> should be considered
"narrower" due to the effect of uncertainty in the measurements.
but...
If I look at the histogram built from the individual results of the
data set, each of the extreme values should really be read as V +/- 3u,
so the histogram representing the process will actually be wider (by 3u
on each side).
Widening the histogram this way could intuitively represent the
confidence interval of what I can consider the process variability.
2) So according to this, the <process width> should be considered
"wider" due to the effect of uncertainty.
(1) and (2) are contradictory, but if I look at the arguments, I can't
find the flaw that makes one of them the valid answer. Can you help me?
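For concreteness, here is a quick numerical check of what each argument
predicts for the observed width, again with hypothetical sigma and u,
and independent measurement noise added to each true value:

```python
import math
import random

random.seed(2)
sigma = 0.10   # true process sd (hypothetical)
u = 0.04       # measurement uncertainty sd (hypothetical)
n = 200_000

# Observed value = true process value + independent measurement error.
obs = [random.gauss(0.0, sigma) + random.gauss(0.0, u) for _ in range(n)]
mean = sum(obs) / n
sd_obs = math.sqrt(sum((x - mean) ** 2 for x in obs) / (n - 1))

print(round(6 * sd_obs, 2))                      # observed "process width"
print(round(6 * math.sqrt(sigma**2 + u**2), 2))  # width if variances add
print(round(6 * (sigma + u), 2))                 # width if sd's add linearly
```

In this sketch the observed width lands on the quadrature value,
noticeably short of 6*(sigma + u).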
Thanks
Daniel
What I find interesting is that the probable value of the extreme point
will depend on my knowledge of how many points I already have in the
data set.
question: So should the probable width of the histogram be reduced as
the dataset size increases, and could this "width reduction" be related
to the formula in (1), sqrt(sigma^2 - u^2)?
Until now, I have always regarded regression in the usual "linear
models" arena, and this is the first time I have found it used in a way
related to the original meaning with which Galton defined it.
Can you point to a book with a good explanation of this concept and its
implications for the analysis of groups of data?
Thanks again !
Or even more: the actual width could be considered
I didn't quite understand your question about the probable/actual width
of histograms.
Regression towards the mean gets a mention in Lindgren, and I'm sure it
is covered in many books, but you might as well Google for it. A nice
interactive simulation is here:
http://www.ruf.rice.edu/~lane/stat_sim/reg_to_mean/
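The idea behind that simulation can be sketched in a few lines (my own
toy version, not the applet itself; rho, the cutoff and n are arbitrary
choices): with test-retest correlation rho, subjects selected for an
extreme first score have second scores closer to the mean on average.

```python
import math
import random

random.seed(3)
rho = 0.5       # test-retest correlation (toy value)
n = 100_000

first, second = [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    # Second score correlated rho with the first, same marginal sd.
    y = rho * x + math.sqrt(1 - rho**2) * random.gauss(0.0, 1.0)
    first.append(x)
    second.append(y)

# Among subjects whose first score was above +2, the mean second score
# "regresses" towards the overall mean of 0.
top_second = [y for x, y in zip(first, second) if x > 2.0]
mean_first_top = sum(x for x in first if x > 2.0) / len(top_second)
mean_second_top = sum(top_second) / len(top_second)
print(round(mean_first_top, 2))   # about 2.37 (mean of the truncated tail)
print(round(mean_second_top, 2))  # roughly rho times that: regression to the mean
```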