
Q: sources of, identification of, and remedies for insignificance


Cosine

Jan 27, 2023, 1:02:01 PM
Hi:

What are the potential sources of statistical insignificance?

What are the potential means to remedy it?

Yes, it might simply reflect the truth, i.e., the new drug is no better than the traditional one. However, it might also be due to inaccurate measurement, or even to the design of the experiment, e.g., not using a paired comparison.

How do we identify the factors actually contributing to the insignificance we obtained?
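
To make the paired-comparison point concrete, here is a minimal
Python sketch (simulated data; the sample size, effect size, and
variances are all hypothetical) of how an unpaired test can miss an
effect that a paired test on the same measurements detects, because
pairing removes the between-subject variance:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)

  # Hypothetical setup: 20 subjects, large between-subject
  # variability, small true improvement of the new drug.
  n = 20
  subject_baseline = rng.normal(100, 15, n)  # each subject's own level
  true_effect = 3.0                          # new drug lowers score by 3
  noise = 2.0                                # measurement noise

  traditional = subject_baseline + rng.normal(0, noise, n)
  new_drug = subject_baseline - true_effect + rng.normal(0, noise, n)

  # Unpaired test: between-subject variance swamps the small effect.
  t_u, p_u = stats.ttest_ind(traditional, new_drug)

  # Paired test: differencing removes each subject's baseline.
  t_p, p_p = stats.ttest_rel(traditional, new_drug)

  print(f"unpaired: t={t_u:.2f}, p={p_u:.3f}")
  print(f"paired:   t={t_p:.2f}, p={p_p:.3f}")

With these made-up numbers the unpaired p-value is far from
significance while the paired one is tiny, even though both tests
see exactly the same measurements.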

Rich Ulrich

Jan 27, 2023, 3:26:48 PM
On Fri, 27 Jan 2023 10:01:58 -0800 (PST), Cosine <ase...@gmail.com>
wrote:
I can get an interesting question out of this, beyond agreeing
that, Yes, you should use the right test.

Why do the data look wrong? -- Who cleaned them?

Fifty years ago, with info provided on 80-column cards,
there were more sources of foul-up than there are today.

In the 1960s, a friend worked on data for the US magazine Consumer
Reports, on auto repair expenses for used cars. The first check
the investigator ran was to see whether Corvettes showed their
notoriously high cost; they did not. SO, the data were bad. It
turned out that the two-card format was not properly sorted:
half the time a number did not come from the proper card and
column, and a car's ID was seldom matched with its own data.

Cards also allowed the error of describing the wrong columns,
and of errors that shifted the variables by one column (or more).
Fortran-style formats leave the analysis open to programmer errors
that were often detected only by seeing bad values, or too many
'Missings'.

Back before there was on-line data entry that checked for errors,
data cleaning was easily 90% of the time needed for "data
analysis." Even with relatively clean entry assured, I always
started analyses by looking at univariate distributions of
EVERYTHING, to check for invalid values or outliers that would
screw up assumptions ('equal intervals') and tests.
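
As a minimal sketch of that first-pass screen, assuming a pandas
DataFrame loaded from a hypothetical file "study.csv" with made-up
columns and valid ranges:

  import pandas as pd

  # Hypothetical data file and codebook ranges -- adjust to your study.
  df = pd.read_csv("study.csv")
  valid_range = {"age": (18, 90), "chol": (50, 500)}

  # Univariate look at EVERYTHING before any modeling.
  for col in df.columns:
      print(f"--- {col} ---")
      if df[col].dtype.kind in "if":           # numeric: look at the spread
          print(df[col].describe())
      else:                                    # categorical: look at every code
          print(df[col].value_counts(dropna=False))

  # Flag values outside the ranges the codebook says are possible.
  for col, (lo, hi) in valid_range.items():
      bad = df[(df[col] < lo) | (df[col] > hi)]
      if not bad.empty:
          print(f"{col}: {len(bad)} out-of-range values")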

Given real values? Outliers can be interesting. If they screw up
your testing, they need special handling.

OUTLIERS.
Ordinary 'high cholesterol' is in the hundreds. Do not include in
your study, as an ordinary case, the subject whose CHO is 4000 and
who is expected to die before age 45 if untreated.
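
One standard robust screen for such cases (my own illustration, not
from the study above) flags points far from the median in units of
the median absolute deviation, so that a CHO of 4000 cannot inflate
the scale it is judged against:

  import numpy as np

  def mad_outliers(x, cutoff=3.5):
      """Flag points whose robust z-score exceeds `cutoff`.

      Uses the median and MAD so the outliers themselves do not
      inflate the scale estimate, as they would with the mean
      and standard deviation.
      """
      x = np.asarray(x, dtype=float)
      med = np.median(x)
      mad = np.median(np.abs(x - med))
      robust_z = 0.6745 * (x - med) / mad   # 0.6745 scales MAD to sigma
      return np.abs(robust_z) > cutoff

  chol = np.array([180, 210, 195, 240, 4000])  # hypothetical CHO values
  print(mad_outliers(chol))                    # only the 4000 is flagged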

Detecting the hole in the ozone layer, southern hemisphere, was
slowed by the computer-guided deletion of 'extreme values' from
the regular reports from the satellites. Then someone looked at the
raw values and took them seriously.

I consulted for a PI who collected data from heart beat monitors
that were strapped on over the chest, and which recorded while
the phobic subjects went shopping. A few recordings had stretches
of values that were HIGH, sometimes over 200 and likely to be
wrong. I eventually produced correlations with 'everything' and
that revealed an association with weight: It turned out that the
straps in size-regular did not tighten up enough for subjects who
were size-small, and COULD produce counts that were doubled.
(Their manual did not mention that.) The PI had saved money
by not buying a harness in the small size.

DATA DREDGING?
Last year, I read about a study that leaned towards an undesirable
degree of 'data-dredging' -- not finding what they expected in their
large sample, the PIs pursued detailed analyses of subsamples based
on not-quite-significant interactions. If I recall correctly, the
first subsample, selected by age, also failed to produce
'significance' and they chased another minor interaction. They
eventually achieved 'significance'.

What justified these steps in the eyes of the reviewer is that the
PIs had, from the start, a particular biological hypothesis which
corresponded to those interactions.

Hope this was interesting.

--
Rich Ulrich