
Oct 3, 2021, 7:35:50 PM

Given a population of unemployed persons, i.e. names

and phone numbers. You wish to construct a histogram

of # of persons vs. time (# of days) out of work.

Stats 101, a student homework assignment, right?

Call some random subset of the list, ask them: when

were you laid off? Assuming the sample is unbiased,

it will satisfy the conditions.

No, this method is flawed, because the person out of

work a long time has a greater chance of receiving

multiple calls (or at least one call) than one who is

shortly re-employed. This biases the sample, skews

the numbers on the long side.
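That skew can be checked with a toy simulation (all numbers are made up: layoff dates uniform over the calendar, spell lengths exponential with a 30-day mean, one snapshot phone survey; the sketch tabulates each reached person's eventual total spell length, the quantity the histogram is after):

```python
import random

random.seed(42)

HORIZON = 2000.0  # days of simulated calendar time (illustrative)
MEAN = 30.0       # true mean length of a completed spell, in days (illustrative)

# Each person is laid off on a uniformly random date; their full spell
# length is drawn as Exponential(mean=30). Both choices are made up.
starts = [random.uniform(0.0, HORIZON) for _ in range(200_000)]
lengths = [random.expovariate(1.0 / MEAN) for _ in starts]

# Average over ALL completed spells: the thing we actually want.
true_mean = sum(lengths) / len(lengths)

# A phone survey on day 1000 can only reach people unemployed on that
# day, so a spell is "on the list" in proportion to its length.
snapshot = 1000.0
caught = [L for s, L in zip(starts, lengths) if s <= snapshot < s + L]
survey_mean = sum(caught) / len(caught)

# For exponential spells the surveyed mean comes out near double the
# true mean -- the classic length-bias (inspection paradox) effect.
print(round(true_mean, 1), round(survey_mean, 1))
```

Note that each person appears at most once in the survey above, so deduplicating repeat calls would not remove the skew; it comes purely from who is on the list on survey day.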

Therefore, officially published statistics are unreliable.

--

Rich


Oct 4, 2021, 2:35:24 PM

On Sun, 3 Oct 2021 16:35:48 -0700 (PDT), RichD

<r_dela...@yahoo.com> wrote:

>Given a population of unemployed persons, i.e. names

>and phone numbers. You wish to construct a histogram

>of # of persons vs. time (# of days) out of work.

>

>Stats 101, a student homework assignment, right?

>Call some random subset of the list, ask them: when

>were you laid off? Assuming the sample is unbiased,

>it will satisfy the conditions.

>

>No, this method is flawed. Because the person out of

>work a long time, has a greater chance of receiving

>multiple calls (or at least one call) than one who is

>shortly re-employed. This biases the sample, skews

>the numbers on the long side.

Well, the number represents what it represents.

It is only a mis-report if you mis-report it.

It is always proper to warn readers of ways that

they might misinterpret what is being reported.

>

>Therefore, officially published statistics are unreliable.

I think you mean "invalid". And you are wrong, mainly.

Technically, in statistics, we have both "reliability" and

"validity". Good reliability says that the number is

reproducible, whereas good validity says that it measures

what it purports to measure. You should complain about

validity: the statistics imply something untrue.

I do know that official "unemployment statistics" have

nuances -- like, you don't get counted in the popular

number if you have given up looking for a job. Yes,

amateurs are apt to be misled by raw numbers. I suppose

that the emphasis on "changes" makes use of the

underlying "reliability" -- and the /changes/ lead to

inferences that are generally meaningful and valid.

--

Rich Ulrich

Oct 4, 2021, 10:37:16 PM

On October 4, Rich Ulrich wrote:

>> Given a population of unemployed persons, i.e. names

>> and phone numbers. You wish to construct a histogram

>> of # of persons vs. time (# of days) out of work.

>> Stats 101, right?

>> Call some random subset of the list, ask them: when

>> were you laid off? Assuming the sample is unbiased,

>> it will satisfy the conditions.

>> No, this method is flawed. Because the person out of

>> work a long time, has a greater chance of receiving

>> multiple calls (or at least one call) than one who is

>> shortly re-employed. This biases the sample, skews

>> the numbers on the long side.

>

> Well, the number represents what it represents.

> It is only a mis-report if you mis-report it.

> It is always proper to warn readers of ways that

> they might misinterpret what is being reported.

>

>> Therefore, officially published statistics are unreliable.

>

> I think you mean "invalid". And you are wrong, mainly.

>

> Technically, in statistics, we have both "reliability" and

> "validity". Good reliability says that the number is

> reproducible, whereas good validity says that it measures

> what it purports to measure. You should complain about

> validity: the statistics imply something untrue.

Given the goal of the study, is the objection mentioned above justified?

i.e., is the methodology flawed?

--

Rich

Oct 5, 2021, 8:40:44 PM

My position is that you can collect and report information for

any numbers that might be interesting.

The initial problem is, "Where do these data come from?" - That

might put hard limits on what you can infer. Does a person

have to be unemployed for two weeks before they get in that

list? If you call and the person is now employed, were they

asked the two questions, "How long were you unemployed?"

and "How long ago did you get the new job?"

You are jumping ahead to "bad inference." Showing a

histogram of a cross-section of a stated population (sample)

is not "drawing an inference."

Assuming a simplified, instantaneous cross-sectional sample from

that population, you might use your observations above,

about the implicit weighting, to compute a weighted mean --

Each person would be weighted by their TIME (as the

"probability of being sampled") and you compute that weighted

mean... as an estimate of ... hmm. It estimates something that

might be fairly robust, but it can't be labeled, I think, without

knowing something about what else gets a person off the list,

OTHER than getting employed. It is a weighted "observed time

of unemployment for the newly unemployed" with some cutoff

being enforced. I think I would get the average for only those

under 6 months, and give some additional comment on the others.
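Reading "weighted by their TIME" as inverse-probability weighting -- each observation weighted by 1/t, since t is proportional to its chance of being drawn -- gives a sketch like the following. The spell distribution and sample sizes are made up, and the 1/t correction is the standard recipe for undoing length bias, offered as one interpretation of the idea above:

```python
import random

random.seed(7)

MEAN = 30.0   # illustrative true mean spell length, in whole days
N = 100_000

# Completed spells in whole days (>= 1), roughly Exponential(mean=30).
spells = [max(1, round(random.expovariate(1.0 / MEAN))) for _ in range(N)]

# Length-biased draw: chance of being sampled proportional to spell length.
biased = random.choices(spells, weights=spells, k=N)

# The naive mean of the biased sample overstates the true mean (~2x here).
naive = sum(biased) / len(biased)

# Inverse-probability weighting: weight each observation by w = 1/t.
# The weighted mean sum(w*t)/sum(w) collapses to n / sum(1/t).
corrected = len(biased) / sum(1.0 / t for t in biased)

print(round(naive, 1), round(corrected, 1))
```

The corrected figure only estimates the mean for spells that can appear on the list at all, so the caveats above about what else removes a person from the list still apply.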

--

Rich Ulrich

Oct 8, 2021, 3:04:20 PM

On October 5, Rich Ulrich wrote:

>>>> Given a population of unemployed persons, i.e. names

>>>> and phone numbers. You wish to construct a histogram

>>>> of # of persons vs. time (# of days) out of work.

>>>> Stats 101, right?

>>>> Call some random subset of the list, ask them: when

>>>> were you laid off? Assuming the sample is unbiased,

>>>> it will satisfy the conditions.

>>>> No, this method is flawed. Because the person out of

>>>> work a long time, has a greater chance of receiving

>>>> multiple calls (or at least one call) than one who is

>>>> shortly re-employed. This biases the sample, skews

>>>> the numbers on the long side.

>

>>> Well, the number represents what it represents.

>>> It is only a mis-report if you mis-report it.

>

>>>> Therefore, officially published statistics are unreliable.

>

>>> I think you mean "invalid". And you are wrong, mainly.

>>> Technically, in statistics, we have both "reliability" and

>>> "validity"... good validity says that it measures

>>> what it purports to measure. You should complain about

>>> validity: the statistics imply something untrue.

>

>>Given the goal of the study, is the objection mentioned above, justified?

>>i.e. is the methodology flawed?

>

> My position is that you can collect and report information for

> any numbers that might be interesting.

> The initial problem is, "Where do these data come from?" - That

> might put hard limits on what you can infer.

> You are jumping ahead to "bad inference." Showing a

> histogram of a cross-section of a stated population (sample)

> is not "drawing an inference."

> Assuming a simplified, instantaneous cross-sectional sample from

> that population, you might use your observations above,

> about the implicit weighting, to compute a weighted mean --

> Each person would be weighted by their TIME (as the

> "probability of being sampled") and you compute that weighted

> mean... as an estimate of ... hmm.

The goal isn't to estimate the chance a person might receive a call.

The goal is to estimate the distribution of population vs. time unemployed,

given a histogram of samples of the unemployed. Then, perhaps, one might

predict, probabilistically, how much time a newly unemployed will require to

find new work.

Intuitively, the distribution should match the sample histogram. That's the

desired inference. Very simple.

Given all that, review the objection mentioned above: those longer unemployed

will have a greater chance of getting a call. Therefore, the methodology is flawed;

the sample isn't unbiased.

I have an ulterior motive for posting this -

--

Rich

Oct 8, 2021, 3:10:05 PM

On October 8, RichD wrote:

> Given all that, review the objection mentioned above: those longer unemployed,

> will have a greater chance of getting a call. Therefore, the methodology is flawed;

> the sample isn't unbiased.

To be more precise: not that the long-time unemployed is more likely to be

sampled on a particular day, but more likely during his lifetime, so to speak.
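That lifetime effect can be made concrete with hypothetical survey parameters (one survey wave per week, each wave reaching 1% of the current list; both numbers are invented for illustration, not taken from the thread):

```python
# Hypothetical parameters: one survey wave per week, each wave reaching
# a fraction p of everyone currently on the unemployment list.
def ever_called(days_unemployed, p=0.01):
    """Chance of getting at least one call over the whole spell."""
    waves = max(1, round(days_unemployed / 7))
    return 1.0 - (1.0 - p) ** waves

# A 10-day spell sits through ~1 wave; a 100-day spell through ~14.
short_spell = ever_called(10)
long_spell = ever_called(100)
print(round(short_spell, 3), round(long_spell, 3))
```

For small p the lifetime chance grows roughly in proportion to spell length, which is the per-day bias restated over a lifetime.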

--

Rich

Oct 9, 2021, 7:10:40 PM

never considered "sampling." Any good course on sampling is

going to replace that bad idea, early on.

> That's the

>desired inference. Very simple.

>

>Given all that, review the objection mentioned above: those longer unemployed,

>will have a greater chance of getting a call. Therefore, the methodology is flawed;

>the sample isn't unbiased.

about them, is what matters. I think I would say that you may

label this methodology as "problematic" because of the bias.

A whole lot of sampling schemes are biased. If all in a set share

the same bias, you might even compare the results fairly without

ever estimating and correcting the bias. But you always do

want to let your audience know that you recognize the bias

(so the wiser ones don't think you are an ignoramus).

>

>I have an ulterior motive for posting this -

--

Rich Ulrich

Oct 23, 2021, 8:18:52 PM

Forgot about this one -

On October 9, Rich Ulrich wrote:

>>>>>> Given a population of unemployed persons, i.e. names

>>>>>> and phone numbers. You wish to construct a histogram

>>>>>> of # of persons vs. time (# of days) out of work.

>>>>>> Stats 101, right?

>>>>>> Call some random subset of the list, ask them: when

>>>>>> were you laid off? Assuming the sample is unbiased,

>>>>>> it will satisfy the conditions.

>

>>>>> Well, the number represents what it represents.

>>>>> It is only a mis-report if you mis-report it.

>>> My position is that you can collect and report information for

>>> any numbers that might be interesting.

That's essentially the philosophy of science.

Every experiment is correct, in the sense that it is what it is. Start

with initial conditions, observe the results. Ask a question of nature,

she answers. She doesn't care about your confusion.

First, one must specify a hypothesis to be tested, and desired inference

to be drawn. One assesses experimental design correctness according

to whether the experiment meets these goals.

Let's recap: we want to learn the distribution of unemployed persons vs.

days out of work.

We are given a list of unemployed persons, i.e. names

and phone numbers. Presumably, the list is complete. We call

a sample, ask: how many days since you were laid off?

Couldn't be simpler.

Later, perhaps, one might predict, probabilistically, how much time a

newly unemployed will require to find new work.

A reviewer objects. Those longer unemployed will have a greater chance of

getting a call (or repeat calls). Therefore, the methodology is flawed; the

sample isn't unbiased. Hence the desired inference is invalid.

I find this objection spurious. Of course, the longer one is unemployed, the

greater chance of being sampled! That's inherent to the experiment, not a

defect. If Joe is out of work 100 days, the only question is whether he gets a

call, and whether 100 goes into the data. It doesn't matter if he was also

sampled 50 days ago.

The goal isn't to estimate the chance a person might receive a call during

his lifetime, so to speak. That would be another hypothesis, another experiment.

Correct?

--

Rich


Oct 24, 2021, 1:39:57 PM

On Sat, 23 Oct 2021 17:18:50 -0700 (PDT), RichD

<r_dela...@yahoo.com> wrote:

>

>

>Every experiment is correct, in the sense that it is what it is. Start

>with initial conditions, observe the results. Ask a question of nature,

>she answers. She doesn't care about your confusion.

>

>First, one must specify a hypothesis to be tested, and desired inference

>to be drawn. One assesses experimental design correctness according

>to whether the experiment meets these goals.

>

>Let's recap: we want to learn the distribution of unemployed persons vs.

>days out of work.

Ahem. What is your "hypothesis to be tested" or "desired

inference to be drawn"? A "distribution" is mum on that.

>

>We are given a list of unemployed persons, i.e. names

>and phone numbers. Presumably, the list is complete. We call

>a sample, ask: how many days since you were you laid off?

>Couldn't be simpler.

>

>Later, perhaps, one might predict, probabilistically, how much time a

>newly unemployed will require to find new work.

"When someone is fired or quits a job, how long do they

stay unemployed?" That's neither hypothesis nor inference.

It asks for a description.

But it is an "interesting" question -- An ordinary person

might assume it was being answered by that "distribution"

mentioned earlier, but it is not. That is why there was

a post.

IN THE REAL WORLD -- A better starting point is the

list of people with the time they register as "unemployed."

That suggests limitations: Not everyone registers; and

no one registers (US) if they expect a new job quickly.

And in the US, you can drop off the rolls of "unemployed"

after some time or lack of effort to find a job.

Otherwise, you could survey and ask EVERYONE if they

have ever been unemployed, and for how long, for some

previous time period. That suffers from errors of memory,

among other problems, but it is a direct attack on the

question that most people assume is being answered.

--

Rich Ulrich
