unemployment stats

RichD

Oct 3, 2021, 7:35:50 PM

Given a population of unemployed persons, i.e. names
and phone numbers, you wish to construct a histogram
of # of persons vs. time (# of days) out of work.

Stats 101, a student homework assignment, right?
Call some random subset of the list, ask them: when
were you laid off? Assuming the sample is unbiased,
it will satisfy the conditions.

No, this method is flawed, because a person out of
work a long time has a greater chance of receiving
multiple calls (or at least one call) than one who is
shortly re-employed. This biases the sample and skews
the numbers toward the long side.
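
A minimal simulation sketch of this effect, under assumptions that are
not part of the problem statement: layoffs arrive uniformly over a long
window, total spell lengths are exponential with a 30-day mean, and the
survey is a single-day snapshot. All names and numbers here are
hypothetical. It compares the mean total spell length over all spells
with the mean over the spells still in progress on the survey day.

```python
import random

def simulate_snapshot(n_spells=200_000, mean_days=30.0, window=10_000.0, seed=0):
    """Return (mean length of all spells, mean length of spells open at the snapshot)."""
    rng = random.Random(seed)
    snapshot = window / 2          # survey day, far from the window edges
    all_lengths = []
    in_progress = []
    for _ in range(n_spells):
        start = rng.uniform(0.0, window)            # layoff day
        length = rng.expovariate(1.0 / mean_days)   # total days out of work
        all_lengths.append(length)
        if start <= snapshot < start + length:      # still unemployed on survey day
            in_progress.append(length)
    mean_all = sum(all_lengths) / len(all_lengths)
    mean_stock = sum(in_progress) / len(in_progress)
    return mean_all, mean_stock

mean_all, mean_stock = simulate_snapshot()
print(f"mean spell length, all spells:        {mean_all:.1f} days")
print(f"mean spell length, on the list today: {mean_stock:.1f} days")
```

Under these assumptions the people on the list on any given day have
noticeably longer spells than the typical laid-off person, which is the
skew described above. Note that the sketch compares total spell lengths;
a phone survey of the still-unemployed would only observe elapsed days.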

Therefore, officially published statistics are unreliable.

--
Rich

Rich Ulrich

Oct 4, 2021, 2:35:24 PM

On Sun, 3 Oct 2021 16:35:48 -0700 (PDT), RichD
<r_dela...@yahoo.com> wrote:

>Given a population of unemployed persons, i.e. names
>and phone numbers. You wish to construct a histogram
>of # of persons vs. time (# of days) out of work.
>
>Stats 101, a student homework assignment, right?
>Call some random subset of the list, ask them: when
>were you laid off? Assuming the sample is unbiased,
>it will satisfy the conditions.
>
>No, this method is flawed. Because the person out of
>work a long time, has a greater chance of receiving
>multiple calls (or at least one call) than one who is
>shortly re-employed. This biases the sample, skews
>the numbers on the long side.

Well, the number represents what it represents.
It is only a mis-report if you mis-report it.

It is always proper to warn readers of ways that
they might misinterpret what is being reported.

>
>Therefore, officially published statistics are unreliable.

I think you mean "invalid". And you are wrong, mainly.

Technically, in statistics, we have both "reliability" and
"validity". Good reliability says that the number is
reproducible, whereas good validity says that it measures
what it purports to measure. You should complain about
validity: the statistics imply something untrue.

I do know that official "unemployment statistics" have
nuances -- like, you don't get counted in the popular
number if you have given up looking for a job. Yes,
amateurs are apt to be misled by raw numbers. I suppose
that the emphasis on "changes" makes use of the
underlying "reliability" -- and the /changes/ lead to
inferences that are generally meaningful and valid.

--
Rich Ulrich

RichD

Oct 4, 2021, 10:37:16 PM

On October 4, Rich Ulrich wrote:
>> Given a population of unemployed persons, i.e. names
>> and phone numbers. You wish to construct a histogram
>> of # of persons vs. time (# of days) out of work.

>> Stats 101, right?

>> Call some random subset of the list, ask them: when
>> were you laid off? Assuming the sample is unbiased,
>> it will satisfy the conditions.
>> No, this method is flawed. Because the person out of
>> work a long time, has a greater chance of receiving
>> multiple calls (or at least one call) than one who is
>> shortly re-employed. This biases the sample, skews
>> the numbers on the long side.
>
> Well, the number represents what it represents.
> It is only a mis-report if you mis-report it.
> It is always proper to warn readers of ways that
> they might misinterpret what is being reported.
>
>> Therefore, officially published statistics are unreliable.
>
> I think you mean "invalid". And you are wrong, mainly.
>
> Technically, in statistics, we have both "reliability" and
> "validity". Good reliability says that the number is
> reproducible, whereas good validity says that it measures
> what it purports to measure. You should complain about
> validity: the statistics imply something untrue.

Given the goal of the study, is the objection mentioned above justified?
I.e., is the methodology flawed?

--
Rich

Rich Ulrich

Oct 5, 2021, 8:40:44 PM

My position is that you can collect and report information for
any numbers that might be interesting.

The initial problem is, "Where do these data come from?" - That
might put hard limits on what you can infer. Does a person
have to be unemployed for two weeks before they get in that
list? If you call and the person is now employed, were they
asked the two questions, "How long were you unemployed?"
and "How long ago did you get the new job?"

You are jumping ahead to "bad inference." Showing a
histogram of a cross-section of a stated population (sample)
is not "drawing an inference."

Assuming a simplified, instantaneous cross-sectional sample from
that population, you might use your observations above,
about the implicit weighting, to compute a weighted mean --
Each person would be weighted by their TIME (as the
"probability of being sampled") and you compute that weighted
mean... as an estimate of ... hmm. It estimates something that
might be fairly robust, but it can't be labeled, I think, without
knowing something about what else gets a person off the list,
OTHER than getting employed. It is a weighted "observed time
of unemployment for the newly unemployed" with some cutoff
being enforced. I think I would get the average for only those
under 6 months, and give some additional comment on the others.
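
One way to make the weighting idea concrete is the sketch below. It
rests on assumptions that go beyond the post: each person's chance of
being sampled is taken to be proportional to their total spell length,
and that total length is treated as known (a survey of people still
unemployed would only give elapsed time). The gamma-distributed spell
lengths and all variable names are hypothetical. Weighting each
observation by 1/length then undoes the length-proportional sampling.

```python
import random

def length_biased_sample(lengths, k, rng):
    """Draw k spell lengths with probability proportional to length."""
    return rng.choices(lengths, weights=lengths, k=k)

rng = random.Random(1)
# Hypothetical population of total spell lengths, mean 2 * 15 = 30 days.
population = [rng.gammavariate(2.0, 15.0) for _ in range(100_000)]
sample = length_biased_sample(population, 5_000, rng)

# Unweighted mean of the biased sample: pulled toward long spells.
naive_mean = sum(sample) / len(sample)

# Weight each observation by 1/length to cancel the sampling bias.
weights = [1.0 / x for x in sample]
corrected_mean = sum(w * x for w, x in zip(weights, sample)) / sum(weights)

population_mean = sum(population) / len(population)
print(f"population mean: {population_mean:.1f}")
print(f"naive mean:      {naive_mean:.1f}")
print(f"corrected mean:  {corrected_mean:.1f}")
```

The corrected figure is just the harmonic mean of the sample. It can be
unstable when very short spells appear in the sample, which is one more
reason to treat this as an illustration rather than a recommended
estimator.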

--
Rich Ulrich

RichD

Oct 8, 2021, 3:04:20 PM

On October 5, Rich Ulrich wrote:
>>>> Given a population of unemployed persons, i.e. names
>>>> and phone numbers. You wish to construct a histogram
>>>> of # of persons vs. time (# of days) out of work.
>>>> Stats 101, right?
>>>> Call some random subset of the list, ask them: when
>>>> were you laid off? Assuming the sample is unbiased,
>>>> it will satisfy the conditions.
>>>> No, this method is flawed. Because the person out of
>>>> work a long time, has a greater chance of receiving
>>>> multiple calls (or at least one call) than one who is
>>>> shortly re-employed. This biases the sample, skews
>>>> the numbers on the long side.
>
>>> Well, the number represents what it represents.
>>> It is only a mis-report if you mis-report it.
>

>>>> Therefore, officially published statistics are unreliable.
>
>>> I think you mean "invalid". And you are wrong, mainly.
>>> Technically, in statistics, we have both "reliability" and
>>> "validity"... good validity says that it measures
>>> what it purports to measure. You should complain about
>>> validity: the statistics imply something untrue.
>
>>Given the goal of the study, is the objection mentioned above, justified?
>>i.e. is the methodology flawed?
>
> My position is that you can collect and report information for
> any numbers that might be interesting.
> The initial problem is, "Where do these data come from?" - That
> might put hard limits on what you can infer.

> You are jumping ahead to "bad inference." Showing a
> histogram of a cross-section of a stated population (sample)
> is not "drawing an inference."
> Assuming a simplified, instantaneous cross-sectional sample from
> that population, you might use your observations above,
> about the implicit weighting, to compute a weighted mean --
> Each person would be weighted by their TIME (as the
> "probability of being sampled") and you compute that weighted
> mean... as an estimate of ... hmm.

The goal isn't to estimate the chance a person might receive a call.
The goal is to estimate the distribution of population vs. time unemployed,
given a histogram of samples of the unemployed. Then, perhaps, one might
predict, probabilistically, how much time a newly unemployed will require to
find new work.

Intuitively, the distribution should match the sample histogram. That's the
desired inference. Very simple.

Given all that, review the objection mentioned above: those longer unemployed,
will have a greater chance of getting a call. Therefore, the methodology is flawed;
the sample isn't unbiased.

I have an ulterior motive for posting this -

--
Rich

RichD

Oct 8, 2021, 3:10:05 PM

On October 8, RichD wrote:
> Given all that, review the objection mentioned above: those longer unemployed,
> will have a greater chance of getting a call. Therefore, the methodology is flawed;
> the sample isn't unbiased.

To be more precise: not that the long time unemployed is more likely to be
sampled on a particular day, but more likely during his lifetime, so to speak.

--
Rich

Rich Ulrich

Oct 9, 2021, 7:10:40 PM

>Intuitively, the distribution should match the sample histogram.

Yes, that is apt to be the naive intuition of someone who has
never considered "sampling." Any good course on sampling is
going to replace that bad idea, early on.

> That's the
>desired inference. Very simple.
>
>Given all that, review the objection mentioned above: those longer unemployed,
>will have a greater chance of getting a call. Therefore, the methodology is flawed;
>the sample isn't unbiased.

I will repeat: What you do with the numbers, what you say
about them, is what matters. I think I would say that you may
label this methodology as "problematic" because of the bias.

A whole lot of sampling schemes are biased. If all in a set share
the same bias, you might even compare the results fairly without
ever estimating and correcting the bias. But you always do
want to let your audience know that you recognize the bias
(so the wiser ones don't think you are an ignoramus).

>
>I have an ulterior motive for posting this -

--

Rich Ulrich

RichD

Oct 23, 2021, 8:18:52 PM

Forgot about this one -

On October 9, Rich Ulrich wrote:
>>>>>> Given a population of unemployed persons, i.e. names
>>>>>> and phone numbers. You wish to construct a histogram
>>>>>> of # of persons vs. time (# of days) out of work.
>>>>>> Stats 101, right?
>>>>>> Call some random subset of the list, ask them: when
>>>>>> were you laid off? Assuming the sample is unbiased,
>>>>>> it will satisfy the conditions.
>

>>>>> Well, the number represents what it represents.
>>>>> It is only a mis-report if you mis-report it.

>>> My position is that you can collect and report information for
>>> any numbers that might be interesting.

That's essentially the philosophy of science.

Every experiment is correct, in the sense that it is what it is. Start
with initial conditions, observe the results. Ask a question of nature,
she answers. She doesn't care about your confusion.

First, one must specify a hypothesis to be tested, and desired inference
to be drawn. One assesses experimental design correctness according
to whether the experiment meets these goals.

Let's recap: we want to learn the distribution of unemployed persons vs.
days out of work.

We are given a list of unemployed persons, i.e. names
and phone numbers. Presumably, the list is complete. We call
a sample, ask: how many days since you were laid off?
Couldn't be simpler.

Later, perhaps, one might predict, probabilistically, how much time a
newly unemployed will require to find new work.

A reviewer objects: those longer unemployed will have a greater chance of
getting a call (or repeat calls). Therefore, the methodology is flawed; the
sample isn't unbiased. Hence the desired inference is invalid.

I find this objection spurious. Of course, the longer one is unemployed, the
greater chance of being sampled! That's inherent to the experiment, not a
defect. If Joe is out of work 100 days, the only question is whether he gets a
call, and whether 100 goes into the data. It doesn't matter if he was also
sampled 50 days ago.

The goal isn't to estimate the chance a person might receive a call, during
his lifetime, so to speak. That would be another hypothesis, another experiment.

Correct?

--
Rich

Rich Ulrich

Oct 24, 2021, 1:39:57 PM

On Sat, 23 Oct 2021 17:18:50 -0700 (PDT), RichD
<r_dela...@yahoo.com> wrote:

>
>
>Every experiment is correct, in the sense that it is what it is. Start
>with initial conditions, observe the results. Ask a question of nature,
>she answers. She doesn't care about your confusion.
>
>First, one must specify a hypothesis to be tested, and desired inference
>to be drawn. One assesses experimental design correctness according
>to whether the experiment meets these goals.
>
>Let's recap: we want to learn the distribution of unemployed persons vs.
>days out of work.

Ahem. What is your "hypothesis to be tested" or "desired
inference to be drawn"? A "distribution" is mum on that.

>
>We are given a list of unemployed persons, i.e. names
>and phone numbers. Presumably, the list is complete. We call
>a sample, ask: how many days since you were laid off?
>Couldn't be simpler.
>
>Later, perhaps, one might predict, probabilistically, how much time a
>newly unemployed will require to find new work.

Ay, there's the rub.

"When someone is fired or quits a job, how long do they
stay unemployed?" That's neither hypothesis nor inference.
It asks for a description.

But it is an "interesting" question -- one that an ordinary
person might assume was being answered by that "distribution"
mentioned earlier, but it is not. That is why there was
a post.
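
A small worked example of why the two differ, using made-up numbers and
assuming a steady state (constant layoff rate, fixed spell lengths per
type). The two spell types, their shares, and their lengths are all
hypothetical. A type's share of the people on the list on a given day is
proportional to (share of layoffs) x (days spent unemployed), not to its
share of layoffs alone.

```python
# Hypothetical two-type labour market; every number is illustrative only.
spell_types = {
    "short": {"share_of_layoffs": 0.9, "days": 10},
    "long": {"share_of_layoffs": 0.1, "days": 200},
}

def share_of_list(types):
    """Share of each type among people unemployed on a given day (steady state)."""
    person_days = {
        name: t["share_of_layoffs"] * t["days"] for name, t in types.items()
    }
    total = sum(person_days.values())
    return {name: days / total for name, days in person_days.items()}

stock = share_of_list(spell_types)
for name, t in spell_types.items():
    print(f"{name:5s}: {t['share_of_layoffs']:.0%} of layoffs, "
          f"{stock[name]:.0%} of the list")
```

With these numbers most newly unemployed people have short spells, yet
most of the people on the list at any moment are in long ones. A
histogram of the list describes the second group, not the first.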

IN THE REAL WORLD -- A better starting point is the
list of people with the time they register as "unemployed."
That suggests limitations: Not everyone registers; and
no one registers (US) if they expect a new job quickly.

And in the US, you can drop off the rolls of "unemployed"
after some time or lack of effort to find a job.

Otherwise, you could survey and ask EVERYONE if they
have ever been unemployed, and for how long, for some
previous time period. That suffers from errors of memory,
among other problems, but it is a direct attack on the
question that most people assume is being answered.

--
Rich Ulrich