Are these outliers

Onion Knight

Jun 27, 2012, 11:22:40 PM

Looking at http://tmp.gallopinginsanity.com/outliers.pdf, there has
been a debate as to whether the bottom graph shows outliers, and if so
how many. One person says there are four outliers. Period. The other
person says there is no good definition and that, since the claimed
outliers form a very strong pattern, they do not seem to be outliers.
http://tmp.gallopinginsanity.com/LinuxTrend2011-2ndhalf.png

Here is the full data set http://tmp.gallopinginsanity.com/LinuxTrendMar2012Snit-vs-cc.png

Would anyone with a clue say that those points in the upward trend are
outliers or not?

Rich Ulrich

Jun 28, 2012, 3:26:15 PM

It is hard to say what may be an outlier if you don't have any
theory about what you are measuring.

However - I don't get "4" by any count.
If you just look at the collection of *numbers*, then you
could use a Tukey-type box-plot. In that display, there
would be, at most, two outliers. Maybe one or none.
The meaning of such an outlier is, generally, "wonder
about it."

If you assume that the series is autocorrelated, then
the big drop after the peak is the outlier. And maybe
the point after that.

When I see that the measurement is of "desktop OS use",
I would not expect a meaningful increase to be followed
by a rapid decrease to the level before the increase. So
I start to wonder how flaky the measurements are.

- Is this a tabulation over a small number of shops?
(Would it be more informative to look at "shop" behavior
than behavior counted by desktop?)
- Has the standard of measurement changed a couple
of times?

If you mean "probably wrong" by "outlier", I would
certainly not mean that. If you mean "maybe just
flukishly off", then that might be the case -- But I would
not say that there is yet an apparent reason to *discard*
any of the values.

Hope this helps.

--
Rich Ulrich

Onion Knight

Jun 30, 2012, 12:35:54 AM

On Jun 28, 12:26 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Wed, 27 Jun 2012 20:22:40 -0700 (PDT), Onion Knight
>

> <onionknight...@gmail.com> wrote:
> >Looking here http://tmp.gallopinginsanity.com/outliers.pdf there has

> >been a debate as to if the bottom graph shows outliers, and if so how
> >many? One person says there are four outliers. Period. The other
> >person says there is no good definition and being how the claimed
> >outliers form a very strong pattern they seem to not be such.
> >http://tmp.gallopinginsanity.com/LinuxTrend2011-2ndhalf.png
>
> >Here is the full data set http://tmp.gallopinginsanity.com/LinuxTrendMar2012Snit-vs-cc.png
>
> >Would anyone with a clue say that those points in the upward trend are
> >outliers or not?
>
> It is hard to say what may be an outlier if you don't have any
> theory about what you are measuring.

That makes sense to me. There has been more information on this
though I admit some of it goes over my head

https://groups.google.com/group/comp.os.linux.advocacy/msg/a27cb2f20bfed416

This shows a more complete set of data going back to 2007 instead of
24 months. As for a theory about the data, I am not sure I follow
that. What type of theory do you need to be able to find the outliers?
In other words, in this case of looking for a trend in Linux usage,
what type of theories would be needed?

> However - I don't get "4" by any count.

The person doing the work on the different outlier methods got all
sorts of different numbers of outliers. Seems the different methods
find different answers and there are no solid "facts" about which are
outliers. From the links above

http://tmp.gallopinginsanity.com/Grubbs-Quartile_Nov2007-May2012.pdf

Shows just one outlier with the Grubbs method and three with the
Quartile method.

http://tmp.gallopinginsanity.com/median_and_array.pdf

That shows anywhere from none to one to finally four on the last
page. So there does seem to be a method to get four outliers, though
if I understand correctly to do that the guy who created that page
(Snit) had to use a pretty extreme criterion to get it to show that. It
is not what one would do in an ordinary circumstance. He does have a
link to the Excel file he used to create that
http://tmp.gallopinginsanity.com/median_and_array.xls. It is outside
my expertise as to if his math and calculations are correct but
looking at it for a bit it seems right to me.

> If you just look at the collection of *numbers*, then you
> could use a Tukey-type box-plot. In that display, there
> would be, at most, two outliers. Maybe one or none.
> The meaning of such an outlier is, generally, "wonder
> about it."

If I understand you correctly it is wrong to call determinations of
outliers "facts". It depends on your method and there are many.
With the two outliers you come up with using the Tukey-type box-plot
which numbers are they?

> If you assume that the series is autocorrelated, then
> the big drop after the peak is the outlier. And maybe
> the point after that.

What does it mean to be autocorrelated? There has been some discussion
that the peak was from an anomaly that took place just in California
and thus that would be the outliers. Those points apparently do not
show up as outliers based on the math just on the knowledge of where
the data came from? Is that a reasonable way to determine outliers? To
just call points outliers even though the math does not show it
because you have tracked down how they are not correctly
representative?

> When I see that the measurement is of "desktop OS use",
> I would not expect a meaningful increase to be followed
> by a rapid decrease to the level before the increase. So
> I start to wonder how flaky the measurements are.

It seems that the increase came from an anomaly that does not show up
mathematically but from looking at the source of the data. This seems
more reasonable to me in that you discount data you know is bad based
on your knowledge of the data and not just on different mathematical
models that disagree and might not even be right to use with the type
of data in question.

> - Is this a tabulation over a small number of shops?
> (Would it be more informative to look at "shop" behavior
> than behavior counted by desktop?)
> - Has the standard of measurement changed a couple
> of times?

Based on the above links the data comes from http://marketshare.hitslink.com
which seems like a reasonable source. In this link
https://groups.google.com/group/comp.os.linux.advocacy/msg/a27cb2f20bfed416
there is a discussion that the increase in non-desktop systems might
be skewing the data some.

> If you mean "probably wrong" by "outlier", I would
> certainly not mean that. If you mean "maybe just
> flukishly off", then that might be the case -- But I would
> not say that there is yet an apparent reason to *discard*
> any of the values.
>
> Hope this helps.

It helps some. I feel like I need to take a class on the topic to
really understand it. Your comments and the comments of some others
such as the links I pointed to have helped me to understand some.

> --
> Rich Ulrich

Rich Ulrich

Jun 30, 2012, 2:33:09 AM

I'm going to delete sections generously, to answer a few
points.

On Fri, 29 Jun 2012 21:35:54 -0700 (PDT), Onion Knight
<onionkn...@gmail.com> wrote:

>On Jun 28, 12:26 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
>> On Wed, 27 Jun 2012 20:22:40 -0700 (PDT), Onion Knight

[... ]
[me ... need a theory to say what makes an "outlier"]

>In other words, in this case of looking for a trend in Linux usage,
>what type of theories would be needed?

Here's something halfway similar: US job market. This
also addresses the later question of What is autocorrelation?

US Labor Department surveys give estimates of 2.5 million
jobs (plus or minus) ended and started each month.
The difference is the Job Loss or Job Gain. During the 2008-
2009 deregulation crash, there was a fairly steep curve down
for the differences for several months -- greater and greater
excess of job-losses every month -- which then became a
fairly steep curve up. Presently, new jobs exceed lost jobs
by one or two hundred thousand a month, which barely
matches population growth.

Theory says:
Every month is going to *tend* to be similar to the month
before -- on each of the separate numbers. Further, when
a month differs from the previous month by much, it is apt
to be part of a continuing dip or a climb. Any number that
does not follow these expectations will be, in the simple
sense, an "outlier" -- which means, as I said before, it
is something to wonder about. Are the numbers accurate?
Is it due to a change in New Jobs or a change in Lost Jobs?
(The initial survey estimate is "corrected" in following months
by using more complete data.) Is there something new
happening? Is some single sector having a strong effect?
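
To make the "similar to the month before" idea concrete, here is a
minimal Python sketch. The monthly values are invented for illustration
(they are not the job-market or Linux figures), and the function names
and the limit of 0.25 are assumptions chosen for the example. It
estimates the lag-1 autocorrelation and lists the steps that break the
"follows the previous month" expectation.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Correlation between each value and the value just before it."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[:-1], series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den

def surprising_steps(series, limit):
    """Indices whose change from the previous value is larger than `limit`."""
    return [i for i in range(1, len(series))
            if abs(series[i] - series[i - 1]) > limit]

# Hypothetical monthly values: a gradual climb, then a sudden fall.
monthly = [1.00, 1.02, 1.05, 1.10, 1.20, 1.30, 1.40, 1.55, 1.05, 1.03]

print(lag1_autocorrelation(monthly))          # positive: neighbours resemble each other
print(surprising_steps(monthly, limit=0.25))  # [8], the drop right after the peak
```

Under these assumptions the climb itself is not flagged, because each
step is small; only the abrupt drop is.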

[...]

>
>> If you just look at the collection of *numbers*, then you
>> could use a Tukey-type box-plot. In that display, there
>> would be, at most, two outliers. Maybe one or none.
>> The meaning of such an outlier is, generally, "wonder
>> about it."
>
>If I understand you correctly it is wrong to call the determination of
>outliers as "facts". It depends on your method and there are many.
>With the two outliers you come up with using the Tukey-type box-plot
>which numbers are they?

A Tukey box-plot considers the interquartile range as
a robust measure of spread, and takes a multiple of that.

For instance, if the IQR of some numbers is (210, 220)
-- which means that the center half of the data is in that
range -- then the size of the IQR is 10 points. Then you
add some multiplier of that size, like 1.5, and extend both
ends of the range by that much. For (210,220) +/- 15 points,
the outliers would be anything below 195 or above 235.
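
The arithmetic above can be sketched in Python as follows. This is only
an illustration: the function names and the sample list are made up,
and the quantiles function in the standard library is just one of
several conventions for estimating quartiles, so other tools may give
slightly different fences for the same data.

```python
from statistics import quantiles

def tukey_fences(q1, q3, k=1.5):
    """Return (low, high) cutoffs: the quartiles pushed out by k times the IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outside(values, low, high):
    """Values that fall strictly outside the fences."""
    return [v for v in values if v < low or v > high]

def fences_from_data(values, k=1.5):
    """Estimate the quartiles from the data, then build the fences."""
    q1, _median, q3 = quantiles(values, n=4)
    return tukey_fences(q1, q3, k)

# The worked example from the text: quartiles at 210 and 220, multiplier 1.5.
low, high = tukey_fences(210, 220)
print(low, high)  # 195.0 235.0

# Hypothetical values checked against those fences.
sample = [190, 205, 212, 215, 218, 221, 240]
print(flag_outside(sample, low, high))  # [190, 240]
```

Raising k (for instance to 3.0) widens the fences and flags fewer
points, which is one reason different write-ups of the same method can
report different counts.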

The 6 low numbers in your example of 24 numbers were
all pretty close to the mean. Only two of the high numbers
were distant from the mean by any notable amount at all,
from my eyeball estimate of the graph.

>
>> If you assume that the series is autocorrelated, then
>> the big drop after the peak is the outlier. And maybe
>> the point after that.
>
>What does it mean to be autocorrelated? There has been some discussion
>that the peak was from an anomaly that took place just in California
>and thus that would be the outliers. Those points apparently do not
>show up as outliers based on the math just on the knowledge of where
>the data came from? Is that a reasonable way to determine outliers? To
>just call points outliers even though the math does not show it
>because you have tracked down how they are not correctly
>representative?

The big concern about outliers is "bad data". If you track
the anomaly to a source, you apparently know the numbers
are good.

Once you know that the numbers are good, you still have
the question (sometimes), Do you want to use them?
Does the anomaly deserve to be disqualified from the
main series because it is weird?
...

>
>It seems that the increase came from an anomaly that does not show up
>mathematically but from looking at the source of the data. This seems
>more reasonable to me in that you discount data you know is bad based
>on your knowledge of the data and not just on different mathematical
>models that disagree and might not even be right to use with the type
>of data in question.

Yes. Still -
Do you discount these numbers or use them?

The fact that the number went UP and then went all the
way back down is something important. It makes me
think that an "artifact" of some sort, like a couple of odd users,
or sites in the network, could indeed make these numbers
worth excluding. That would further lead me to wonder
if there is something that I could design in my analyses
that should make the result resistant to sudden impacts.

For instance: If a couple of sites show mostly the use
of one OS or browser or whatever, it could be because
it is specially serving those users... and it should be
discounted or de-weighted, or something.

If they have a wide network, they could/should be
testing for good consistency across systematic subsamples,
if they want to preserve good generality for those
conclusions about OS use, etc.

...

--
Rich Ulrich

Onion Knight

Jun 30, 2012, 4:47:35 PM

On Jun 29, 11:33 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> I'm going to delete sections generously, to answer a few
> points.
>
> On Fri, 29 Jun 2012 21:35:54 -0700 (PDT), Onion Knight
>

With the data in question there was a trend for the latter half of
2011. Each number followed the pattern of the ones before it. None of
the mathematical models I have been shown indicated that these were
outliers. Is this roughly what you mean?

It did turn out that the data was from a localized region and even if
it was not an outlier based on the mathematical models it made sense to
discount the significance of that apparent rise based on the knowledge
of where the data came from. The determination of those data points
came not from any math process but from looking at the data and seeing
that something looked odd. When that is seen it makes sense to look
into the data and see what is going on. In this case it was found that
there was an anomaly, even though it was not one you would find with
an automated process. It made sense to theorize that the data was
'wrong' and look into it and with this process the data was found to
not be representative of an overall worldwide trend.

> Are the numbers accurate?
> Is it due to a change in New Jobs or a change in Lost Jobs?
> (The initial survey estimate is "corrected" in following months
> by using more complete data.) Is there something new
> happening? Is some single sector having a strong effect?

The single sector is similar to the single region. CA was skewing the
data significantly.

> [...]
>
>
>
> >> If you just look at the collection of *numbers*, then you
> >> could use a Tukey-type box-plot. In that display, there
> >> would be, at most, two outliers. Maybe one or none.
> >> The meaning of such an outlier is, generally, "wonder
> >> about it."
>
> >If I understand you correctly it is wrong to call the determination of
> >outliers as "facts". It depends on your method and there are many.
> >With the two outliers you come up with using the Tukey-type box-plot
> >which numbers are they?
>
> A Tukey box-plot considers the interquartile range as
> a robust measure of spread, and takes a multiple of that.

Is there an automated online tool for doing this? Is this the same as
the quartile method? The quartile method found 3 outliers and they
were in the unexpected upswing.

> For instance, if the IQR of some numbers is (210, 220)
> -- which means that the center half of the data is in that
> range -- then the size of the IQR is 10 points. Then you
> add some multiplier of that size, like 1.5, and extend both
> ends of the range by that much. For (210,220) +/- 15 points,
> the outliers would be anything below 195 and above 235.
>
> The 6 low numbers in your example of 24 numbers were
> all pretty close to the mean. Only two of the high numbers
> were distant from the mean by any notable amount at all,
> from my eyeball estimate of the graph.
>
>
>
> >> If you assume that the series is autocorrelated, then
> >> the big drop after the peak is the outlier. And maybe
> >> the point after that.
>
> >What does it mean to be autocorrelated? There has been some discussion
> >that the peak was from an anomaly that took place just in California
> >and thus that would be the outliers. Those points apparently do not
> >show up as outliers based on the math just on the knowledge of where
> >the data came from? Is that a reasonable way to determine outliers? To
> >just call points outliers even though the math does not show it
> >because you have tracked down how they are not correctly
> >representative?
>
> The big concern about outliers is "bad data". If you track
> the anomaly to a source, you apparently know the numbers
> are good.

Or in this case bad if the goal is to look at global usage and not
just usage in CA.

> Once you know that the numbers are good, you still have
> the question (sometimes), Do you want to use them?
> Does the anomaly deserve to be disqualified from the
> main series because it is weird?
> ...

In this case it seems reasonable to remove the numbers because they
are not representative of the global usage.

> >It seems that the increase came from an anomaly that does not show up
> >mathematically but from looking at the source of the data. This seems
> >more reasonable to me in that you discount data you know is bad based
> >on your knowledge of the data and not just on different mathematical
> >models that disagree and might not even be right to use with the type
> >of data in question.
>
> Yes. Still -
> Do you discount these numbers or use them?

In this case they were discounted.

> The fact that the number went UP and then went all the
> way back down is something important. It makes me
> think that an "artifact" of some sort, like a couple of odd users,
> or sites in the network, could indeed make these numbers
> worth excluding. That would further lead me to wonder
> if there is something that I could design in my analyses
> that should make the result resistant to sudden impacts.

The analysis would have to look for regional anomalies.

> For instance: If a couple of sites show mostly the use
> of one OS or browser or whatever, it could be because
> it is specially serving those users... and it should be
> discounted or de-weighted, or something.
>
> If they have a wide network, they could/should be
> testing for good consistency across systematic subsamples,
> if they want to preserve good generality for those
> conclusions about OS use, etc.
>
> ...
>
> --
> Rich Ulrich

Thank you. Still much of this goes over my head but it gives me food
for thought.

Onion Knight

Jul 1, 2012, 12:31:28 AM

In another forum someone went through and did the math for me
https://groups.google.com/group/comp.os.linux.advocacy/msg/87e225af5eb7d69a

Any flaws with that? I do not see any but so much of this is going
over my head I could be wrong.

Rich Ulrich

Jul 1, 2012, 2:01:58 AM

On Sat, 30 Jun 2012 13:47:35 -0700 (PDT), Onion Knight
<onionkn...@gmail.com> wrote:

[snip, most everything]

You seem to have the idea, pretty much.

>
>With the data in question there was a trend for the latter half of
>2011. Each number followed the pattern of the ones before it. None of
>the mathematical models I have been shown indicated that these were
>outliers. Is this roughly what you mean?

Of course, even with tiny changes, someone can always
point to the *biggest* of them as possible outliers....

Oh - Whatever was mentioned as detection based on
quartiles is very likely to be a version of what I described,
using various cutoff ranges.

>
>It did turn out that the data was from a localized region and even if
>it was not an outlier based on the mathematical models it made sense to
>discount the significance of that apparent rise based on the knowledge
>of where the data came from. The determination of those data points
>came not from any math process but from looking at the data and seeing
>that something looked odd. When that is seen it makes sense to look
>into the data and see what is going on. In this case it was found that
>there was an anomaly, even though it was not one you would find with
>an automated process. It made sense to theorize that the data was
>'wrong' and look into it and with this process the data was found to
>not be representative of an overall worldwide trend.

>[snip, rest]

Right, the local change does not represent an overall trend.

A couple of points, though: A local change *is* one part of
the overall trend. So an informative summary might say (for
instance) what the trend looks like, both with and without the
odd region. And it might point out that the blip seems to have
ended.
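
As a rough illustration of reporting the trend "both with and without
the odd region", here is a small Python sketch. All counts are invented,
and the layout (hits per region per month) is an assumption about how
such data might be organized, not a description of how the stats site
actually reports it.

```python
def overall_share(counts, exclude=()):
    """Percent share across regions, skipping any region named in `exclude`."""
    kept = [c for region, c in counts.items() if region not in exclude]
    linux_hits = sum(linux for linux, _ in kept)
    all_hits = sum(total for _, total in kept)
    return 100.0 * linux_hits / all_hits

# Hypothetical data: month -> {region: (linux_hits, all_hits)}
months = {
    "2011-10": {"CA": (300, 10000), "rest": (900, 90000)},
    "2011-11": {"CA": (600, 10000), "rest": (910, 90000)},
    "2011-12": {"CA": (900, 10000), "rest": (905, 90000)},
}

for month, counts in months.items():
    with_region = overall_share(counts)
    without_region = overall_share(counts, exclude={"CA"})
    print(month, round(with_region, 2), round(without_region, 2))
```

In this made-up example the overall figure climbs from 1.2 to about 1.8
while the figure without the one region stays near 1.0, which is the
kind of side-by-side summary described above.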

Also: once a particular anomaly has been seen and diagnosed,
good technique (if there is something to be said or done about
the problem) would *try* to automate the detection of this
sort of problem.

Since there was a jump in the overall data that can be blamed
on effects for one region, it seems that the local effects at
that region must have been even more noticeable.

Before, I suggested examining consistency between subsets;
and you refined that to "regions." That's fine, for one sort
of discrepancy. However, for the time-trend spurt, an automated
detector might more simply be constructed to flag the biggest
weekly/monthly/annual changes that arise by region. Or website.
Or flag the biggest ones that exceed some limit.
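
A detector of that kind could be sketched in Python roughly as below.
The region names, the shares, and the limit of 1.0 are all hypothetical;
a real limit would have to come from what is known about normal
month-to-month movement in the data.

```python
def flag_jumps(series_by_region, limit):
    """Return (region, period_index, change) for every step larger than
    `limit` in size, with the biggest changes listed first."""
    flagged = []
    for region, series in series_by_region.items():
        for i in range(1, len(series)):
            change = series[i] - series[i - 1]
            if abs(change) > limit:
                flagged.append((region, i, round(change, 2)))
    return sorted(flagged, key=lambda item: abs(item[2]), reverse=True)

# Hypothetical monthly shares (percent) for two regions.
shares = {
    "region_a": [1.0, 1.1, 1.0, 1.1, 1.0],
    "region_b": [1.0, 3.0, 6.0, 9.0, 1.2],
}

for region, period, change in flag_jumps(shares, limit=1.0):
    print(region, period, change)
```

Here only region_b is flagged, and the fall back to the old level shows
up as the single biggest change.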

--
Rich Ulrich

alina

Jul 1, 2012, 9:05:52 AM

Hi, my intention is not to answer your question but to ask one of my own. I have been fighting with logistic regression for some time, trying to understand how to calculate adjusted odds ratios (aOR). I asked local statisticians, but instead of an answer they gave me a question: what is the formula for these aOR? That is what motivated me to learn SPSS. Now I understand how to compute and interpret binomial logistic regression and what the differences are between the model selection methods (enter, forward, backward), but still nothing about aOR. Why am I still fighting? Just to answer one question. The use of multimineral supplements in pregnancy is not widely recommended, yet a large percentage of pregnant women use them. I found that these women seem to be more educated (high school or university degree), have more visits to the general practitioner and obstetrician, and have higher nutritional knowledge (assessed by a questionnaire). However, these confounders are interrelated. I just want to build some models: (1) sociodemographic data; (2) model 1 plus prenatal care; and so on. What is your practical advice on how to handle this using SPSS?

Onion Knight

Jul 1, 2012, 8:01:07 PM

On Jun 30, 11:01 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Sat, 30 Jun 2012 13:47:35 -0700 (PDT), Onion Knight
>

> <onionknight...@gmail.com> wrote:
>
> [snip, most everything]
>
> You seem to have the idea, pretty much.

Good to hear. :-)

> >With the data in question there was a trend for the latter half of
> >2011. Each number followed the pattern of the ones before it. None of
> >the mathematical models I have been shown indicated that these were
> >outliers. Is this roughly what you mean?
>
> Of course, even with tiny changes, someone can always
> point to the *biggest* of them as possible outliers....
>
> Oh - Whatever was mentioned as detection based on
> quartiles is very likely to be a version of what I described,
> using various cutoff ranges.

The guy who has been helping me has posted about multiple methods now.
I pointed to his post where he talked about it but this is the summary

<http://tmp.gallopinginsanity.com/Grubbs-Quartile_Nov2007-May2012.pdf>
<http://tmp.gallopinginsanity.com/median_and_array.pdf>
<http://tmp.gallopinginsanity.com/BoxPlot.pdf>

He came up with rather different results based on the Quartile method
and the Box Plot method so apparently they are different. Actually
amazing how different the methods' results are

Grubbs: 1 (1.56)
Quartile: 3 (1.31, 1.41, 1.56)
Median and Array: depends on Z-Score Tolerance, but at the extreme: 4
(0.66, 1.31, 1.41, 1.56)
Box Plot: 7 (0.66 0.69 0.72 0.73 1.56 1.41 1.31)

The only one agreed upon by all is the 1.56, though 3 of the 4 also
agree on 1.41 and 1.31. Each in the order listed seems to be more
"sensitive" than the one above it, though the Median and Array is only
that sensitive when values used are extreme. Even then the Box Plot
seems more sensitive. Even using the Box Plot, and even removing data
that not even it finds but that was determined to be outliers based on
region, it seems desktop Linux usage is slowly moving upward. I think
between all the idiotic fights this really was the question that was
being looked at here, though there also was the question of whether
the rate of increase in usage itself was increasing. Just eyeballing
it, that does not seem to be the case, though there was pretty extreme
removal of higher stats for recent times and there are other devices
which are eating away at the stats. In 2007 there were very few mobile
devices compared to now.

> >It did turn out that the data was from a localized region and even if
> >it was not an outlier based on the mathematical models it made sense to
> >discount the significance of that apparent rise based on the knowledge
> >of where the data came from. The determination of those data points
> >came not from any math process but from looking at the data and seeing
> >that something looked odd. When that is seen it makes sense to look
> >into the data and see what is going on. In this case it was found that
> >there was an anomaly, even though it was not one you would find with
> >an automated process. It made sense to theorize that the data was
> >'wrong' and look into it and with this process the data was found to
> >not be representative of an overall worldwide trend.
> >[snip, rest]
>
> Right, the local change does not represent an overall trend.
>
> A couple of points, though: A local change *is* one part of
> the overall trend. So an informative summary might say (for
> instance) what the trend looks like, both with and without the
> odd region. And it might point out that the blip seems to have
> ended.

One part of the discussion has been about how someone predicted an
increase and then the increase was seen, but only if you include the CA
data. Did the prediction correlate with the data? It depends on whether
you include that data or not. What is clear though is that the reasons
for the increase, which would have applied globally or at least mostly
so, did not pan out. The person who made the prediction has noted this. I
think debating if there was or was not a correlation is a rather silly
nit in the face of the fact that all involved agree the overall cause
and effect used to make the prediction was not the cause of the
increase. Such is the way of usenet debates. Overall points are
ignored for the sake of minutia and nits.

> Also: once a particular anomaly has been seen and diagnosed,
> good technique (if there is something to be said or done about
> the problem) would *try* to automate the detection of this
> sort of problem.

I will ask the person who made the above PDFs if he can somehow
automate the finding of the regional data. Would be interesting if
there was a way to do so for past data and future.

> Since there was a jump in the overall data that can be blamed
> on effects for one region, it seems that the local effects at
> that region must have been even more noticeable.

The site where the stats have been coming from allows for digging down
to specifics of region. I think it would be challenging to automate
looking for that for each month of data but there may be a way.

> Before, I suggested examining consistency between subsets;
> and you refined that to "regions." That's fine, for one sort
> of discrepancy. However, for the time-trend spurt, an automated
> detector might more simply be constructed to flag the biggest
> weekly/monthly/annual changes that arise by region. Or website.
> Or flag the biggest ones that exceed some limit.

How would one make such an automated detector if the outlier formulas
are missing those points?

> --
> Rich Ulrich

Rich Ulrich

Jul 1, 2012, 11:34:17 PM

On Sun, 1 Jul 2012 17:08:07 -0700 (PDT), Onion Knight
<onionkn...@gmail.com> wrote:

>Sorry for the repost. Google Groups seems to have eaten the first
>posting.

>
>On Jun 30, 11:01 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
>> On Sat, 30 Jun 2012 13:47:35 -0700 (PDT), Onion Knight

[snip]

>
>The guy who has been helping me has posted about multiple methods now.
>I pointed to his post where he talked about it but this is the summary
>
><http://tmp.gallopinginsanity.com/Grubbs-Quartile_Nov2007-May2012.pdf>
><http://tmp.gallopinginsanity.com/median_and_array.pdf>
><http://tmp.gallopinginsanity.com/BoxPlot.pdf>
>
>He came up with rather different results based on the Quartile method
>and the Box Plot method so apparently they are different. Actually
>amazing how different the methods' results are
>
>Grubbs: 1 (1.56)
>Quartile: 3 (1.31, 1.41, 1.56)
>Median and Array: depends on Z-Score Tolerance, but at the extreme: 4
>(0.66, 1.31, 1.41, 1.56)
>Box Plot: 7 (0.66 0.69 0.72 0.73 1.56 1.41 1.31)

As a statistician, I will disagree with your comment.

"Actually - it is impressive that the results are 100%
consistent, despite some differences in method."

Grubbs uses the mean and SD. Median (I take it)
uses the median. There are several ways that the
Tukey method has been used, as described in Wikipedia,
http://en.wikipedia.org/wiki/Box_plot

There is not a single difference shown that contradicts
any other finding. In other words, the only apparent
differences among them are explained by the number
of points: and any method could be adapted with a
different cutoff, if you just want to look at more or
fewer points.
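
One way to check the "no contradiction" reading is to test whether each
method's flagged set is contained in the next, more sensitive one. The
short Python sketch below uses the values reported in the quoted
summary; treating "consistent" as "nested subsets" is an interpretation
made for this illustration, and the function name is made up.

```python
# Flagged values as reported in the quoted summary, least sensitive first.
reported = [
    ("Grubbs", {1.56}),
    ("Quartile", {1.31, 1.41, 1.56}),
    ("Median and Array (extreme tolerance)", {0.66, 1.31, 1.41, 1.56}),
    ("Box Plot", {0.66, 0.69, 0.72, 0.73, 1.31, 1.41, 1.56}),
]

def is_nested(findings):
    """True if every flagged set is a subset of the set that follows it."""
    return all(a <= b for (_, a), (_, b) in zip(findings, findings[1:]))

print(is_nested(reported))  # True
```

Each list only adds points to the one before it; none of the methods
drops a point that a stricter method flagged.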

[snip]

me>> Before, I suggested examining consistency between subsets;

>> and you refined that to "regions." That's fine, for one sort
>> of discrepancy. However, for the time-trend spurt, an automated
>> detector might more simply be constructed to flag the biggest
>> weekly/monthly/annual changes that arise by region. Or website.
>> Or flag the biggest ones that exceed some limit.
>

>How would one make such an automated detector if the outlier formulas
>are missing those points?

Huh? I don't follow the question.

Outlier detection based on history does have to
use (say) a table of historical information or limits;
so it is not as simple as scanning for outliers based
solely on the data in the present set.
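
As a minimal sketch of what "limits from history" could look like in
Python: the limits are built from earlier values only, and each new
value is checked against them. The past values, the choice of mean and
SD, and the multiplier of 3 are all assumptions for the example; a
robust alternative (median and IQR) would work the same way.

```python
from statistics import mean, stdev

def history_limits(history, k=3.0):
    """Limits built from past values only: mean plus or minus k standard deviations."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s

def is_unusual(value, history, k=3.0):
    """True if a new value falls outside the limits set by the history."""
    low, high = history_limits(history, k)
    return not (low <= value <= high)

# Hypothetical past monthly shares, recorded before the new values arrived.
past = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00]

print(is_unusual(1.04, past))  # False: within the historical limits
print(is_unusual(1.56, past))  # True: far above anything seen before
```

The point of keeping the history separate is that a run of unusual
values cannot inflate the spread used to judge them.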

--
Rich Ulrich

Onion Knight

Jul 2, 2012, 2:10:44 AM

On Jul 1, 8:34 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Sun, 1 Jul 2012 17:08:07 -0700 (PDT), Onion Knight
>

> <onionknight...@gmail.com> wrote:
> >Sorry for the repost. Google Groups seems to have eaten the first
> >posting.
>
> >On Jun 30, 11:01 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> >> On Sat, 30 Jun 2012 13:47:35 -0700 (PDT), Onion Knight
> [snip]
>
> >The guy who has been helping me has posted about multiple methods now.
> >I pointed to his post where he talked about it but this is the summary
>
> ><http://tmp.gallopinginsanity.com/Grubbs-Quartile_Nov2007-May2012.pdf>
> ><http://tmp.gallopinginsanity.com/median_and_array.pdf>
> ><http://tmp.gallopinginsanity.com/BoxPlot.pdf>
>
> >He came up with rather different results based on the Quartile method
> >and the Box Plot method so apparently they are different. Actually
> >amazing how different the methods' results are
>
> >Grubbs: 1 (1.56)
> >Quartile: 3 (1.31, 1.41, 1.56)
> >Median and Array: depends on Z-Score Tolerance, but at the extreme: 4
> >(0.66, 1.31, 1.41, 1.56)
> >Box Plot: 7 (0.66 0.69 0.72 0.73 1.56 1.41 1.31)
>
> As a statistician, I will disagree with your comment.

As someone who is new to most of this I will accept your disagreement
and try to learn. :-)

> "Actually - it is impressive that the results are 100%
> consistent, despite some differences in method."
>
> Grubbs uses the mean and SD. Median (I take it)
> uses the median. There are several ways that the

> Tukey method has been used, as described in Wikip,
> http://en.wikipedia.org/wiki/Box_plot

>
> There is not a single difference shown that contradicts
> any other finding. In other words, the only apparent
> differences among them are explained by the number
> of points: and any method could be adapted with a
> different cutoff, if you just want to look at more or
> fewer points.
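
To make the consistency point concrete, here is a minimal Python sketch that checks whether the flagged sets are nested, meaning each longer list contains every value on the shorter ones. The values are the ones reported in the summary quoted above; the function name `is_nested` is only an illustrative choice.

```python
# Outlier values as reported in the quoted summary, one set per method.
reported = {
    "Grubbs": {1.56},
    "Quartile": {1.31, 1.41, 1.56},
    "Median and Array": {0.66, 1.31, 1.41, 1.56},
    "Box Plot": {0.66, 0.69, 0.72, 0.73, 1.31, 1.41, 1.56},
}


def is_nested(sets_by_method):
    """Return True if, ordered by size, each set contains the previous one."""
    ordered = sorted(sets_by_method.values(), key=len)
    return all(small <= large for small, large in zip(ordered, ordered[1:]))


print(is_nested(reported))
```

If the check passes, the methods differ only in how many points they flag, not in which points they consider most extreme.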

I had not looked at it that way. Doing so sheds some light on the
claimed findings of others who have looked at the same data. These
others were claiming they used multiple methods and consistently found
several outliers in 2007 and 2008 and four in the latter half of 2011.
None of the above methods are showing this. It seems amazingly
unlikely that they just happened to use several different methods that
were consistent with each other but inconsistent with the findings
shown in the above links. Even more odd is that they specifically
mentioned the Grubbs method but said they found results
inconsistent with what is shown above.

For them to be accurate there would have to be two different sets of
outliers, each internally consistent but not consistent with the other.
It seems to me very unlikely that they happened to use 3 or 4 methods
that found one set of results with great consistency but that differed
from the above methods. That, together with their not showing their
work, strongly suggests they were being dishonest or at best ignorant.

If not I am missing something. Is it likely that there could be a
different set of inconsistent findings which would be found by a
different set of methods? If so is it likely that the two groups would
just happen to use methods they each found were internally consistent
but not consistent with each other? I am not a statistician but to me
this seems to be statistically almost impossible to believe.

It is no wonder they have not been willing to show how they came up
with their claimed results. :-)

> [snip]
>
> me>> Before, I suggested examining consistency between subsets;
> >> and you refined that to "regions." That's fine, for one sort
> >> of discrepancy. However, for the time-trend spurt, an automated
> >> detector might more simply be constructed to flag the biggest
> >> weekly/monthly/annual changes that arise by region. Or website.
> >> Or flag the biggest ones that exceed some limit.
>
> >How would one make such an automated detector if the outlier formulas
> >are missing those points?
>
> Huh? I don't follow the question.

You suggested an automated detector but it seems it would be hard to
construct one that focused on region unless you could somehow
automate getting that regional data. Doing the spreadsheet and math
work is beyond what I could do but I could ask the person who made the
above PDFs. I would love to see the result and see if it explains some
of the other dips and valleys.

> Outlier detection based on history does have to
> use (say) a table of historical information or limits;
> so it is not as simple as scanning for outliers based
> solely on the data in the present set.

If I understand you correctly then it is not good enough just to use
these mathematical methods. One has to look at the line or scatter
plot and try to understand unexplained or surprising results. This is
what was done with the localized data from CA and while it was not
found with the math it was appropriate to do so.

> --
> Rich Ulrich

Rich Ulrich

Jul 2, 2012, 1:15:08 PM

? I don't follow. "The above methods" are showing the high
values in 2011 and the low values in 2007 and 2008.

[My own earlier comments about only seeing high values
were specific about the 24 data points for the final two
years. The smallest of those was 0.87%. The outlier
ranges for those points were (low, 0.74%), (1.34%, high).]
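
Outlier ranges of that form are what Tukey-style fences produce. The sketch below shows one way to compute them with Python's standard library. The data are HYPOTHETICAL placeholders, not the NetMarketShare figures discussed in this thread, and the multiplier `k=1.5` and the quartile method are conventional assumptions rather than anything stated above.

```python
import statistics


def tukey_fences(values, k=1.5, method="exclusive"):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# HYPOTHETICAL monthly percentages, for illustration only.
share = [0.90, 0.95, 1.00, 1.00, 1.05, 1.10, 1.10, 1.15, 1.60]

low, high = tukey_fences(share)
flagged = [x for x in share if x < low or x > high]
print(low, high, flagged)
```

Quartiles can be computed under more than one convention (here `method="exclusive"` versus `"inclusive"`), and `k` can be raised or lowered, so two box-plot style tools can reasonably flag different numbers of points from the same data.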

>unlikely that they just happened to use several different methods that
>were consistent with each other but inconsistent with the findings
>shown in the above links. Even more odd is that they specifically
>mentioned the Grubbs method but said they found the results
>inconsistent with what is shown above.

Using a different time period? The Grubbs method seems
pretty explicit -- although, I would not be surprised if some
people ignored the lookup table for "significant" and tried
to use conventional t-test values.

>
>For them to be accurate it would mean that there are two sets of
>possible sets of outliers to be found with each being internally
>consistent within themselves but not consistent with each other.
>Seems to me that the chances that they happened to use 3 or 4 methods
>that found one set of results with great consistency that was
>different from the above methods but then did not show their work
>strongly suggests they were being dishonest or at best ignorant.

Well, I don't see the inconsistency of methods.
If someone was not looking at the first year of the data,
the low scores would not be present. And the earliest
numbers are, indeed, the lowest scores in the sample.

>
>If not I am missing something. Is it likely that there could be a
>different set of inconsistent findings which would be found by a
>different set of methods? If so is it likely that the two groups would
>just happen to use methods they each found were internally consistent
>but not consistent with each other. I am not a statistician but to me
>this seems to be statistically almost impossible to believe.

Starting with the mean or median can give results that
differ from what you get from the IQR. Or, if someone is
being even more sophisticated in applying a "model", there
are other opportunities to differ. Back near the start, I
mentioned that, for a time series, one might choose to
start by computing all the lagged differences, and label
your Outliers from the biggest of those.

Why is one picking outliers? No harm is done if the only
purpose is, "Let's look closer right here." Personally, I
have never dropped (or "set-aside") any values automatically,
or been involved in any way with a system that did.

I guess that the point here is that the data analysis is used
to construct a narrative. How much can you say, how
accurately? Are there people who care? To the extent that
people care, you look at alternate explanations, to the depth
that is available at a reasonable cost. Finding the California
exception was the result of nicely-directed detective work,
since it seems to be an alternative that other folks had been
ignoring.

In the end, you do try to point to the limits of the data,
and the hazards that you can imagine (or demonstrate)
for extrapolation or inferences.

--
Rich Ulrich

Onion Knight

Jul 2, 2012, 2:52:55 PM

On Jul 2, 10:15 am, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Sun, 1 Jul 2012 23:10:44 -0700 (PDT), Onion Knight
>

They do, but the specific claim was 'There are four outliers in the
latter half of 2011.'

And that does not seem to be the case with any of the methods tried.
This was very strongly claimed and when others could not replicate it
they were told they were stupid yada yada internet BS trollfest. :)

The original person who made this claim never showed his work nor even
showed he was able to get the full set of data even though he was
asked multiple times.

This is getting off the topic though. I think it boils down to the
fact that the claim of there being 'four outliers in the latter half
of 2011' is incorrect based on the mathematical models. Once you take
into account the known 'bad' data from CA a case can be made for this
though it should still include some early data from 2012.

> [My own earlier comments about only seeing high values
> were specific about the 24 data points for the final two
> years. The smallest of those was 0.87%. The outlier
> ranges for those points were (low, 0.74%), (1.34%, high).]

Right. The person who made the claim about 2011 changed his date range
from 24 months to all available data. This has led to confusion as to
what is being looked at. Makes sense to me though to look at the full
set if you can.

> >unlikely that they just happened to use several different methods that
> >were consistent with each other but inconsistent with the findings
> >shown in the above links. Even more odd is that they specifically
> >mentioned the Grubbs method but said they found the results
> >inconsistent with what is shown above.
>
> Using a different time period? The Grubbs method seems
> pretty explicit -- although, I would not be surprised if some
> people ignored the lookup table for "significant" and tried
> to use conventional t-test values.

Given that, it seems likely that the person who made the 'four outliers'
claim and said he used the Grubbs method was at best incorrect.

> >For them to be accurate it would mean that there are two sets of
> >possible sets of outliers to be found with each being internally
> >consistent within themselves but not consistent with each other.
> >Seems to me that the chances that they happened to use 3 or 4 methods
> >that found one set of results with great consistency that was
> >different from the above methods but then did not show their work
> >strongly suggests they were being dishonest or at best ignorant.
>
> Well, I don't see the inconsistency of methods.
> If someone was not looking at the first year of the data,
> the low scores would not be present. And the earliest
> numbers are, indeed, the lowest scores in the sample.

The inconsistency is the four outliers in 2011. With the math
calculations those do not seem to exist.

> >If not I am missing something. Is it likely that there could be a
> >different set of inconsistent findings which would be found by a
> >different set of methods? If so is it likely that the two groups would
> >just happen to use methods they each found were internally consistent
> >but not consistent with each other. I am not a statistician but to me
> >this seems to be statistically almost impossible to believe.
>
> Starting with the mean or median can give results that
> differ from what you get from the IQR. Or, if someone is
> being even more sophisticated in applying a "model", there
> are other opportunities to differ. Back near the start, I
> mentioned that, for a time series, one might choose to
> start by computing all the lagged differences, and label
> your Outliers from the biggest of those.

What are lagged differences? Is there a mathematical model to
calculate these that might find the 'four outliers' or is it based on
just seeing the sudden unexplained increase? That increase has since
been explained.

> Why is one picking outliers? No harm is done if the only
> purpose is, "Let's look closer right here." Personally, I
> have never dropped (or "set-aside") any values automatically,
> or been involved in any way with a system that did.

The original question tied to a prediction of increased usage based on
improvements to desktop Linux. While there seems to be some increase I
have not seen anything that shows the rate of increase growing. I
believe everyone has accepted this at this point.

Right. It did not show up with the mathematical models in the way that
was claimed. There were no 'four outliers' in the latter half of 2011
based on the math.

> In the end, you do try to point to the limits of the data,
> and the hazards that you can imagine (or demonstrate)
> for extrapolation or inferences.

In addition to doing a reasonable math process you eyeball it and
look for reasons for anomalies.

> --
> Rich Ulrich

Rich Ulrich

Jul 3, 2012, 1:36:24 AM

I'm going to reply to one statement, but this also addresses
another issue or two elsewhere in the post.

On Mon, 2 Jul 2012 11:52:55 -0700 (PDT), Onion Knight
<onionkn...@gmail.com> wrote:

>On Jul 2, 10:15 am, Rich Ulrich <rich.ulr...@comcast.net> wrote:
>> On Sun, 1 Jul 2012 23:10:44 -0700 (PDT), Onion Knight
>>

[snip, much]

>
>What are lagged differences? Is there a mathematical model to
>calculate these that might find the 'four outliers' or is it based on
>just seeing the sudden unexplained increase. That increase has since
>been explained.
>

The data for every month *is* presumed to be similar to the
previous month, as a simple modeling of these data across
time. The set of "lagged differences" is thus the set of
X(t) minus X(t-1).
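
A minimal Python sketch of that definition follows. The series is a HYPOTHETICAL placeholder, not the data discussed in this thread, and the function name is an illustrative choice.

```python
def lagged_differences(series):
    """Return the list of X(t) - X(t-1) for t = 1 .. len(series) - 1."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# HYPOTHETICAL monthly percentages, for illustration only.
share = [1.00, 1.02, 1.01, 1.30, 1.40, 1.05]
print(lagged_differences(share))
```

The result has one fewer entry than the input, because the first month has no previous month to compare against.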

Looking at the previous month is the simplest way to
judge whether a single value is unlikely, an "outlier",
since it only needs a single estimate of an error term...
It might be possible to produce a pretty good estimate
of the error from the reported number of visits.

If the "visits" themselves were tabulating distinct
persons in the course of a month, rather than "visits",
the numbers might even be Poisson. But I expect that
the observed variation would be more than that.

Looking at the ratio of X(t) / X(t-1) is another statistic
that would have a certain predictable stability across X's
of different magnitude.
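
The ratio version can be sketched the same way. This assumes every value is positive (true for usage percentages above zero); the function name is again only illustrative.

```python
def lagged_ratios(series):
    """Return the list of X(t) / X(t-1); assumes all values are positive."""
    return [curr / prev for prev, curr in zip(series, series[1:])]


# The same relative pattern at two different magnitudes gives the same ratios.
small = [1.0, 2.0, 1.0]
large = [10.0, 20.0, 10.0]
print(lagged_ratios(small), lagged_ratios(large))
```

That scale-independence is the "predictable stability" referred to above: a doubling looks the same whether the level is 1% or 10%.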

In any case: You did not notice that your correspondent
was using differences, but I wonder if that is because
you were not alert to the possibility. "Outliers" based
on those lagged differences could (IIRC) find 4 points
in 2011, because a big *drop* into a normal range is
still a big change.
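
As a sketch of how that can happen, the code below flags every month whose change from the previous month exceeds a fixed limit. Both the series and the limit of 0.10 are HYPOTHETICAL and chosen only to illustrate the idea; this is not a reconstruction of anyone's actual calculation.

```python
def flag_big_changes(series, limit):
    """Return (t, change) pairs where |X(t) - X(t-1)| exceeds the limit."""
    flagged = []
    for t in range(1, len(series)):
        change = series[t] - series[t - 1]
        if abs(change) > limit:
            flagged.append((t, change))
    return flagged


# HYPOTHETICAL series: flat, then a temporary rise, then a drop back.
share = [1.00, 1.02, 0.98, 1.01, 1.00, 1.03,
         1.30, 1.42, 1.55, 1.04, 1.02, 1.00]

for t, change in flag_big_changes(share, limit=0.10):
    print(t, round(change, 2))
```

In this made-up series the last flagged point is the drop: its value is back in the ordinary range, yet it is flagged because the change to reach it is large.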

--
Rich Ulrich

Onion Knight

Jul 3, 2012, 4:13:52 AM

He has yet to produce the tool he used to find these outliers. At one
point he did mention the Grubbs method which does not produce them. If
there is a method that results in his findings I would love to see
it.

Generally, though, when someone claims they have a finding but refuses
to share their methodology and tools and makes claims that others who
do not agree with them are "idiots" and the like it is safe to assume
they are just trolling for attention. This is especially true for the
person in question who refused to acknowledge such simple things as
sigma lines being depicted incorrectly on normal curves and other very
obvious lies.

Rich Ulrich

Jul 3, 2012, 3:18:37 PM

On Tue, 3 Jul 2012 01:13:52 -0700 (PDT), Onion Knight
<onionkn...@gmail.com> wrote:

[snip, previous...]

>
>He has yet to produce the tool he used to find these outliers. At one
>point he did mention the Grubbs method which does not produce them. If
>there is a method that results in his findings I would love to see
>it.

Did you apply the Grubbs method to the Differences?
Did anyone?
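
For anyone who wants to try it, here is a rough sketch of computing the Grubbs statistic on the lagged differences rather than on the raw values. The series is HYPOTHETICAL. The critical value is deliberately not computed: the statistic would still have to be compared against a tabulated Grubbs critical value for the sample size and significance level, and the test assumes the differences are roughly normal.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) where G = max |x - mean| / sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / sd, index


# HYPOTHETICAL monthly percentages, for illustration only.
share = [1.00, 1.02, 0.98, 1.01, 1.00, 1.03,
         1.30, 1.42, 1.55, 1.04, 1.02, 1.00]
diffs = [curr - prev for prev, curr in zip(share, share[1:])]

g, i = grubbs_statistic(diffs)
# diffs[i] is the change from month i to month i + 1.
print(round(g, 3), i + 1)
```

Grubbs tests one most-extreme value at a time, so finding several points this way would mean removing the flagged value and repeating the test.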

>
>Generally, though, when someone claims they have a finding but refuses
>to share their methodology and tools and makes claims that others who
>do not agree with them are "idiots" and the like it is safe to assume
>they are just trolling for attention. This is especially true for the
>person in question who refused to acknowledge such simple things as
>sigma lines being depicted incorrectly on normal curves and other very
>obvious lies.

When words show up like "lies" and "idiot" (and "stupid"),
I learned a few years ago to assume that one side or the
other side is undiagnosed autistic. (ref: RF Bob)

The autistic himself says "you are lying" when he thinks
you are wrong, and "you are stupid" when he does not
understand what you are saying. Meanwhile, the rest of us
think that the autistic -- who may be amazingly incapable of
parsing and absorbing an argument that seems to oppose
him -- is stupid for refusing to deal with some arguments;
or is lying when he still claims that they are irrelevant or wrong.

And it is tempting to call the autistic a troll, but that does
not really fit.

--
Rich Ulrich

Onion Knight

Jul 3, 2012, 8:37:56 PM

On Jul 3, 12:18 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
> On Tue, 3 Jul 2012 01:13:52 -0700 (PDT), Onion Knight
> <onionknight...@gmail.com> wrote:
>
> [snip, previous...]
>
>
>
> >He has yet to produce the tool he used to find these outliers. At one
> >point he did mention the Grubbs method which does not produce them. If
> >there is a method that results in his findings I would love to see
> >it.
>
> Did you apply the Grubbs method to the Differences?
> Did anyone?

I do not believe anyone did. The person who made the claim about the
four outliers in 2011 has refused to show his work. He claims to have
an Excel file where he did his calculations but he has not produced
it. He did not even produce the full set of data going back to 2007
though he did point to the site which had the data. To get the data
you had to look at multiple pages and I have no reason to believe he
even figured that out. Here was Snit showing how he did it.

http://tmp.gallopinginsanity.com/NetMarketShareStats.mov

That is from the guy who shows all of his work and is very detailed in
how he comes to his conclusions. The one he has been debating with is
either unable or unwilling to state how he does his work and when he
is wrong he refuses to admit to it. At one time they were debating
how to visualize the standard deviation line. CC denied that there
was a way to tell based on the distance from the mean. Snit said
there was a way if you looked at the distance from the mean to the
place where the curve changes. CC denied this and said there was no
way and that the examples Snit showed were all fine. Here is his
video he made to explain his view

http://www.youtube.com/watch?v=MoW3hMq-eIc

And a PDF where he shows an incorrect depiction http://tmp.gallopinginsanity.com/sd.png

CC was not willing to ever admit the depiction was wrong. There are
many people on the web like him who will just never admit they do not
know things or that they were wrong. I think it is better to admit
when you do not know things or when you realized someone knows more
than you do. You clearly know statistics better than I do. Snit
clearly knows it better than CC and myself.

> >Generally, though, when someone claims they have a finding but refuses
> >to share their methodology and tools and makes claims that others who
> >do not agree with them are "idiots" and the like it is safe to assume
> >they are just trolling for attention. This is especially true for the
> >person in question who refused to acknowledge such simple things as
> >sigma lines being depicted incorrectly on normal curves and other very
> >obvious lies.
>
> When words show up like "lies" and "idiot" (and "stupid"),
> I learned a few years ago to assume that one side or the
> other side is undiagnosed autistic. (ref: RF Bob)

I had not considered that CC might be autistic. I admit I have called
CC and his sock all sorts of names and I am not autistic. I am
reacting to their inappropriate behavior and calling them out on their
'autistic' behavior.

> The autistic himself says "you are lying" when he thinks
> you are wrong, and "you are stupid" when he does not
> understand what you are saying.

That described CC very well.

> Meanwhile, the rest of us
> think that the autistic -- who may be amazingly incapable of
> parsing and absorbing an argument that seems to oppose
> him -- is stupid for refusing to deal with some arguments;
> or is lying when he still claims that they are irrelevant or wrong.

In the case of this debate CC is clearly not willing to admit when he
is shown to be wrong and refuses to show his Excel files and other
material he claims to have. I far more respect people such as yourself
and Snit who explain yourself well and show how you come to your
conclusions.

> And it is tempting to call the autistic a troll, but that does
> not really fit.

In this case I think it is fair to call CC a troll and a liar. He
repeatedly wants to be seen as being intelligent and knowing things
about topics he demonstrates deep ignorance about. This is usenet and
there are many such folks. This is especially true for the 'advocacy'
groups such as comp.os.mac.advocacy and comp.os.linux.advocacy.

Things there can get nasty. When CC and his friends feel they have
lost a debate they work to track people down and contact those
people's family and the people they work for. It is truly out of
bounds behavior as far as I am concerned.

> --
> Rich Ulrich

Snit

Jul 4, 2012, 8:18:07 PM

On 7/4/12 3:37 PM, in article
4702a322-a644-4f35...@m2g2000pbv.googlegroups.com, "Onion
Knight" <onionkn...@gmail.com> wrote:

> On Jul 3, 12:18 pm, Rich Ulrich <rich.ulr...@comcast.net> wrote:
>> On Tue, 3 Jul 2012 01:13:52 -0700 (PDT), Onion Knight
>> <onionknight...@gmail.com> wrote:
>> [snip, previous...]
>>
>>> He has yet to produce the tool he used to find these outliers. At one point
>>> he did mention the Grubbs method which does not produce them. If there is a
>>> method that results in his findings I would love to see it.
>>>
>> Did you apply the Grubbs method to the Differences? Did anyone?
>>

> I do not believe anyone did. The person who made the claim about the four
> outliers in 2011 has refused to show his work.

He did show his "work" - his "work" being to troll Usenet and make claims he
could not back up. As for his views on the stats and trends, they kept
changing. He first asked me to look at a set of data from 24 months... but
that did not back his claims. So then he wanted me to look at data
going back to 2007 but he could not produce the data. I did and it did not
back his claims. He also changed his claims, sometimes saying:

cc:
-----
It will be 1%. Same as it ever was.
-----

But other times saying it has not always been the same and speaking of the
downward trend seen in the data:

cc:
-----
Linux has been on a significant downward trend since then.
-----
And if you look at January and then look at now, then there
is a downward trend.
-----

It cannot always have been the same *and* had a "significant downward trend"
over part of the discussed time period. They are simply contradictory
claims.

But cc cannot admit to this ... so he trolls. That is the work he does: he
looks for fights and makes claims he knows he cannot back up. He has
*nothing* to back up his claims. Not a thing:

1) cc was wrong to call it a "fact" that some data points
were outliers: the reality is that there is no clear
definition nor single standard way to determine such, and thus
such determinations are subjective and *not* "facts".

2) cc was wrong to claim that multiple common methods of
finding outliers would show four for the latter half of 2011
and a few from late 2007 and early 2008. This was shown to
*not* be the case using the Median() and Array, Grubbs,
Quartile, and Box Plot methods and cc never showed *any*
method where it was true.

3) cc was wrong to deny that the poorly done depictions I
showed him were, in fact, poorly done. They were
unambiguously wrong. This was tied to his ignorant denial
that sigma lines can be correctly drawn based on the distance
from the mean to some feature of the curve (specifically the
inflection points).

4) cc was wrong to say I missed steps in the creation of a
linear trend line in Excel. I did no such thing (and cc
was never able to list any steps I missed in creating a
linear trend line, nor explain why the MS site and others
would also miss these "steps" he alleged existed.)

5) cc cannot support his claim that "June brought another round
of UI improvements". He simply made that up, as he makes up
so many other things. When called on this one he tried to
turn things around and say he was commenting on my views,
but I never said there was a single UI improvement in June.

6) cc repeatedly snips and ignores comments which are contrary to
his claimed views. As such, it is very clear that cc is not only
mistaken but knowingly lying.

> He claims to have an Excel file where he did his calculations but he has not
> produced it.

Nor will cc ever because it does not exist. He lied.

> He did not even produce the full set of data going back to 2007 though he did
> point to the site which had the data. To get the data you had to look at
> multiple pages and I have no reason to believe he even figured that out. Here
> was Snit showing how he did it.
>
> http://tmp.gallopinginsanity.com/NetMarketShareStats.mov

You are correct: cc never showed he had done this or was able to get the
full set of data.

> That is from the guy who shows all of his work and is very detailed in how he
> comes to his conclusions. The one he has been debating with is either unable
> or unwilling to state how he does his work and when he is wrong he refuses to
> admit to it.

This has been the pattern with cc and me for some time - he picks absurd
fights and when he is proved wrong with detailed information he just changes
the topic or goes into massive denial mode. He has even gone so far as to
speak of experts in fields we have discussed to try to back his claims - but
when those experts are shown to *disagree* with his view he lied and claimed
he emailed them and they rescinded their public views and now agree with
him. He did the same with me - repeatedly - just made up "quotes" and
attributed them to me even though I never said anything like what he
attributed to me. At one point he even admitted to this - though defended
his actions by saying his out and out lying was merely based on him giving
false "paraphrases" of my views... paraphrases he presented as quotes and
did not represent my views (and were generally contrary to my views). He
never even admitted this was poor behavior on his part.

> At one time they were debating how to visualize the standard
> deviation line. CC denied that there was a way to tell based on the distance
> from the mean. Snit said there was a way if you looked at the distance from
> the mean to the place where the curve changes. CC denied this and said there
> was no way and that the examples Snit showed were all fine. Here is his video
> he made to explain his view
>
> http://www.youtube.com/watch?v=MoW3hMq-eIc
>
> And a PDF where he shows an incorrect depiction
>
> http://tmp.gallopinginsanity.com/sd.png

Some of cc's specific claims on that issue:

cc:
-----
There'se nothing wrong with the image, other than some weird
axis labeling.
-----
Snit's so fucking stupid he thinks the sigma lines are drawn
based on distance from the mean, not area under the curve.
-----
| The sigma lines are drawn based on the area of the curve -
| which is easy to see when the images screw it up, esp. when
| they do so really badly, like in some of the ones I showed
| you.
They are not wrong.
------
LOL!!!! All of those links are fine. The first sigma lines
cover 68% of the area UNDER THE CURVE.
-----
If you would like to prove, on any single one of the links
you call incorrect, that the first sigma lines do not bound
an area that is 68.2% of the area UNDER THE CURVE, then I
would like to see it.
-----
Hahahaha your "approximate inflection points" are hilarious.
Please, post more on this subject.
------
I know exactly what an inflection point is. It's where the
second derivative changes sign, and it's exactly where the
sigma lines are in your supposed incorrect examples. Funny
how you're now questioning the applications used to generate
those graphs! Face it, you're wrong.
------

> CC was not willing to ever admit the depiction was wrong.

Correct. He repeatedly lashed out and attacked me for letting him know a
simple fact about sigma lines on normal curves - not a "hot" topic one would
expect a reasonable person to get worked up over. Not religion or politics
or any other "big" issue. Simply my letting cc know that the first sigma
lines on such a curve are at the distance from the mean to the inflection
points (and then each of the next sigmas are at the same distance from the
previous line).
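
For a normal curve with mean mu and standard deviation sigma, the second derivative of the density is f(x) * ((x - mu)^2 - sigma^2) / sigma^4, which is zero exactly at x = mu - sigma and x = mu + sigma. A small Python sketch that checks this numerically, and also checks the area between those two lines, is below. The function names are illustrative choices only.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def second_derivative(x, mu=0.0, sigma=1.0):
    """Analytic second derivative of the normal density at x."""
    return normal_pdf(x, mu, sigma) * ((x - mu) ** 2 - sigma ** 2) / sigma ** 4


def area_within(k):
    """Probability that a normal value lies within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


# The curvature changes sign as x crosses mu + sigma (here mu=5, sigma=2).
print(second_derivative(6.9, 5, 2) < 0, second_derivative(7.1, 5, 2) > 0)
print(round(area_within(1), 4))
```

On a correctly drawn curve both properties hold together: the first sigma lines sit at the inflection points and they enclose about 68.27% of the area. A drawing where the lines are visibly away from the inflection points therefore cannot also bound 68% of the area under that curve.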

Such behavior on his part is not rational. His above quotes prove he was
wrong on this topic. So be it. We are all wrong on different topics... but
he is just not able to admit to his areas of being wrong.

> There are many people on the web like him who will just never admit they do
> not know things or that they were wrong. I think it is better to admit when
> you do not know things or when you realized someone knows more than you do.
> You clearly know statistics better than I do. Snit clearly knows it better
> than CC and myself.

Though I am hardly an expert on the subject, I do not hide when I am
ignorant on things or when I turn out to be wrong. I have been very clear
as to when I have been wrong. An example of this is the current usage data
goes *against* my belief that the improvements in desktop Linux were leading
to an increased gain in usage. There was a time when the data seemed to
back that up (though the increase was more than I expected *and*, as I
repeatedly noted, it correlated with my predictions but did not show the
cause-and-effect relationship I used to make my prediction was correct).
Later data went *contrary* to my predictions. Ok. So my prediction, at
least for now, has not come true. That is OK - again, we all make mistakes
and, really, my prediction was quite vague.

I also have repeatedly made predictions that Apple's massive increase in
income will cease and they will run into some bad quarters - or at least
ones where they are far less successful than they have been. I have been
repeatedly wrong on this. At *some* point I will be right... but given how
I have repeatedly made the prediction and it has not come true, the idea
that it almost surely *someday* will come true is hardly supportive of my
ability to predict such things with any decent level of accuracy.

>>> Generally, though, when someone claims they have a finding but refuses
>>> to share their methodology and tools and makes claims that others who
>>> do not agree with them are "idiots" and the like it is safe to assume
>>> they are just trolling for attention. This is especially true for the
>>> person in question who refused to acknowledge such simple things as
>>> sigma lines being depicted incorrectly on normal curves and other very
>>> obvious lies.
>>
>> When words show up like "lies" and "idiot" (and "stupid"),
>> I learned a few years ago to assume that one side or the
>> other side is undiagnosed autistic. (ref: RF Bob)
>

> I had not considered that CC might be autistic. I admit I have called CC and
> his sock all sorts of names and I am not autistic. I am reacting to their
> inappropriate behavior and calling them out on their 'autistic' behavior.

I do not think he is autistic... I think he is simply unwilling to admit
when he is wrong. When people are wrong and admit this online it is often
used against them - I know this for a fact given how cc and others have
repeatedly attacked me for my errors - even when the only way they can show
I had made an error was my noting it!

>> The autistic himself says "you are lying" when he thinks you are wrong, and
>> "you are stupid" when he does not understand what you are saying.
>

> That described CC very well.

He repeatedly accuses me of lying without being able to show quotes. As
noted above, he fabricates claims which are not representative of my views
at all, presents them as quotes from me anyway, and then when called on it
either ignores it or makes up stories about how his dishonest
representations of my views are a form of "paraphrasing". Just completely
dishonest on his part.

>> Meanwhile, the rest of us think that the autistic -- who may be amazingly
>> incapable of parsing and absorbing an argument that seems to oppose him -- is
>> stupid for refusing to deal with some arguments; or is lying when he still
>> claims that they are irrelevant or wrong.
>>

> In the case of this debate CC is clearly not willing to admit when he is shown
> to be wrong and refuses to show his Excel files and other material he claims
> to have. I far more respect people such as yourself and Snit who explain
> yourself well and show how you come to your conclusions.

cc is not a respectable person - nor is the person who was backing him.
Both have threatened to contact my employer with lies... cc claiming to have
done so and even received a response and the other publicly made threats to
contact my employer and lie about me impersonating the person his
ex-girlfriend bragged about tracking me down to being... all with the stated
purpose of having me be fired.

All because he lost Usenet debates. Just over-the-top insane on their
parts.

>> And it is tempting to call the autistic a troll, but that does
>> not really fit.
>

> In this case I think it is fair to call CC a troll and a liar. He repeatedly
> wants to be seen as being intelligent and knowing things about topics he
> demonstrates deep ignorance about. This is usenet and there are many such
> folks.

There are - though he is an extreme in this area.

> This is especially true for the 'advocacy' groups such as comp.os.mac.advocacy
> and comp.os.linux.advocacy. Things there can get nasty. When CC and his
> friends feel they have lost a debate they work to track people down and
> contact those people's family and the people they work for. It is truly out of
> bounds behavior as far as I am concerned.

Absolutely out of bounds. With the case of cc's friend's girlfriend her
local police took care of it and she no longer posts or threatens me (has
not for years). With cc and the stalker, though, they have not stopped.
Perhaps at some point they will cross a line that will get them into
significant legal trouble. It is possible I could push such now - publicly
lying about me sexually harassing people, contacting my family and employer,
forging my online username, etc.

>> --
>> Rich Ulrich

--
Summary of cc's statistical BS: <http://tinyurl.com/7rwazxw>
Details on cc's "outliers" BS: <http://tinyurl.com/84r3ypq>
More on cc's ignorance about outliers: <http://tinyurl.com/7vyhttc>
Four methods compared to cc's absurd claims: <http://tinyurl.com/7efkuzm>
Details on cc's sigma and R^2 BS: <http://tinyurl.com/7vambev>

Onion Knight

Jul 5, 2012, 9:21:20 PM

On Jul 4, 5:18 pm, Snit <use...@gallopinginsanity.com> wrote:
> On 7/4/12 3:37 PM, in article

> 4702a322-a644-4f35-8955-dddb9328b...@m2g2000pbv.googlegroups.com, "Onion

Exactly. CC could not decide what data he wanted to use. He kept
hoping to find data that would back his claims but he failed to ever
find any. As you note he also kept changing his claims. When this was
pointed out he resorted to name calling, making up claims about what
others said but failing to show any quotes to back his claims, and
either using socks or working with the moron who has been trolling you
for 10 years. I am still not convinced they are not the same person.

> But cc cannot admit to this ... so he trolls. That is the work he does: he
> looks for fights and makes claims he knows he cannot back up. He has
> *nothing* to back up his claims. Not a thing:
>
> 1) cc was wrong to call it a "fact" that some data points
> were outliers: the reality is that there is no clear
> definition nor single standard way to determine such, and thus
> such determinations are subjective and *not* "facts".

It is now clear that this by itself showed he is ignorant on the topic
of outliers and statistics in general. The fact he never did produce
the Excel work he claimed to have used just backs this up more. CC
lied.

> 2) cc was wrong to claim that multiple common methods of
> finding outliers would show four for the latter half of 2011
> and a few from late 2007 and early 2008. This was shown to
> *not* be the case using the Median() and Array, Grubbs,
> Quartile, and Box Plot methods and he never showed *any*
> method where it was true.

You were able to find data that fit the vague description he gave for
2007 and 2008 much as you showed data that fit your vague predictions
for an increase in Linux usage. Nobody could find any method that fit
his claims about 2011. The fact he acted as if such findings were a
given shows again how he was pretending to be intelligent on this
subject when he is not. He lied.
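
For reference, the quartile / box plot method named in point 2 can be sketched roughly as below. This is only a minimal Python sketch, assuming the usual Tukey fences at 1.5 times the interquartile range. The function name `tukey_outliers` and the sample values are made up for illustration and are not the NetMarketShare figures discussed in this thread.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions can move the fences slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical monthly usage-share percentages (NOT the thread's data).
share = [0.95, 1.00, 1.02, 0.98, 1.01, 0.97, 1.03, 0.99, 1.00, 1.60]
print(tukey_outliers(share))
```

With these made-up numbers only the 1.60 value falls outside the fences. A different quartile convention or a different `k` can change the count, which is part of why the number of "outliers" is a judgment call and not a fact.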

> 3) cc was wrong to deny that the poorly done depictions I
> showed him were, in fact, poorly done. They were
> unambiguously wrong. This was tied to his ignorant denial
> that sigma lines can be correctly drawn based on the distance
> from the mean to some feature of the curve (specifically the
> inflection points).

He has completely run away from this point which is funny because if
he had not been wrong it would be simple to show which of the
depictions you claimed were wrong were not. He did not understand the
facts about sigma lines that you explained. Nor did I. I admit to this
and am honest. CC is a lying ass who will never admit to it.
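
For anyone else who did not know the sigma line fact in point 3, it can be checked with a short derivation. This is a sketch for the standard normal density with mean \mu and standard deviation \sigma, nothing specific to the depictions argued about here.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)

f''(x) = \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x)

f''(x) = 0 \iff (x-\mu)^2 = \sigma^2 \iff x = \mu \pm \sigma
```

Since f(x) is never zero, the second derivative changes sign only at x = \mu \pm \sigma, so the inflection points sit exactly one sigma from the mean. A depiction that draws the first sigma lines anywhere else on the curve is drawn wrong.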

> 4) cc was wrong to say I missed steps in the creation of a
> linear trend line in Excel. I did no such thing (and cc
> never was able to list any steps I missed in creating a
> linear trend line, nor explain why the MS site and others
> would also miss these "steps" he alleged existed.)

He shows no steps because you missed none. The process is trivial and
you did it without error even showing CC a video of how to do it. When
he saw this he stopped trying to say what specific steps you missed
but just kept insisting you missed some. He then changed his story to
say you missed steps in doing an analysis of the data and not in your
creation of a trend line as described by Microsoft. He again altered
his story. With all of that though he never was able to produce the
Excel work he claimed to have done. He never did any. He lied.
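
To show how little there is to a linear trend line, here is a minimal Python sketch of an ordinary least-squares fit, which is the kind of fit a linear trendline is generally understood to be. The function name `linear_trend` and the sample values are hypothetical and are not taken from anyone's Excel workbook.

```python
def linear_trend(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical monthly percentages (NOT the thread's data).
monthly = [1.0, 1.1, 1.2, 1.3, 1.4]
slope, intercept = linear_trend(monthly)
print(f"slope per month: {slope:.3f}, intercept: {intercept:.3f}")
```

There are no hidden steps beyond this: pick the points, fit the line, read off the slope.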

> 5) cc cannot support his claim that "June brought another round
> of UI improvements". He simply made that up, as he makes up
> so many other things. When called on this one he tried to
> turn things around and say he was commenting on my views,
> but I never said there was a single UI improvement in June.

CC thought he was being clever and was going to get buy-in from you
about the UI improvements in June. You challenged him to be specific
on what improvements he meant and he even tried to turn it around on
you! He wanted you to help him defend his own claims! He went into
bizarro land and made screwed up claims that since there has been
improvement overall in the last year or more that this must mean that
there was another round of improvements in June. It is like he thinks
the improvements come out daily and not when the big projects have
their updates. Again he showed how lacking his knowledge was in
another area. He does not get the concept of how UIs are improved or
what it takes.

> 6) cc repeatedly snips and ignores comments which are contrary to
> his claimed views. As such, it is very clear he not only
> is mistaken, cc is knowingly lying.

Yes. He knows he is lying. He knows he can't give reasoned responses
to the above. He knows you and others have shown proof of the claims
you make. He knows he has been busted lying. This led Rich Ulrich to
conclude that CC was acting like someone with autism. I think CC is
just acting like a fucking liar who is grossly humiliated by his own
ignorance and too afraid to admit when he has been proved wrong.
Nothing to do with autism.

> > He claims to have an Excel file where he did his calculations but he has not
> > produced it.
>
> Nor will cc ever because it does not exist. He lied.

Of course CC lied.

> > He did not even produce the full set of data going back to 2007 though he did
> > point to the site which had the data. To get the data you had to look at
> > multiple pages and I have no reason to believe he even figured that out. Here
> > was Snit showing how he did it.
>
> >http://tmp.gallopinginsanity.com/NetMarketShareStats.mov
>
> You are correct: cc never showed he had done this or was able to get the
> full set of data.
>
> > That is from the guy who shows all of his work and is very detailed in how he
> > comes to his conclusions. The one he has been debating with is either unable
> > or unwilling to state how he does his work and when he is wrong he refuses to
> > admit to it.
>
> This has been the pattern with cc and me for some time - he picks absurd
> fights and when he is proved wrong with detailed information he just changes
> the topic or goes into massive denial mode. He has even gone so far as to
> speak of experts in fields we have discussed to try to back his claims - but
> when those experts are shown to *disagree* with his view he lied and claimed
> he emailed them and they rescinded their public views and now agree with
> him. He did the same with me - repeatedly - just made up "quotes" and
> attributed them to me even though I never said anything like what he
> attributed to me. At one point he even admitted to this - though defended
> his actions by saying his out and out lying was merely based on him giving
> false "paraphrases" of my views... paraphrases he presented as quotes and
> did not represent my views (and were generally contrary to my views). He
> never even admitted this was poor behavior on his part.

He lies to yank your chain and Steve Carroll jumps in to give him blow
jobs. They both know they are full of shit.

The amazing thing is you show not just one quote but a whole set of
quotes where he is clearly an idiot on this. There is no way he can be
so delusional as to think he was not wrong or to even think others
will not see how fucked up his claims were. He is just trolling for
attention now.

> Correct. He repeatedly lashed out and attacked me for letting him know a
> simple fact about sigma lines on normal curves - not a "hot" topic one would
> expect a reasonable person to get worked up over. Not religion or politics
> or any other "big" issue. Simply my letting cc know that the first sigma
> lines on such a curve are at the distance from the mean to the inflection
> points (and then each of the next sigmas are at the same distance from the
> previous line).
>
> Such behavior on his part is not rational. His above quotes prove he was
> wrong on this topic. So be it. We are all wrong on different topics... but
> he is just not able to admit to his areas of being wrong.

It takes a special kind of fucked up to get as worked up as he does. He
was ignorant about standard deviations. Big fucking deal. If he was a
man he would have admitted to it and apologized for his poor treatment
of you. He is no man. He is an asshole who is afraid to admit to his
own ignorance.

> > There are many people on the web like him who will just never admit they do
> > not know things or that they were wrong. I think it is better to admit when
> > you do not know things or when you realized someone knows more than you do.
> > You clearly know statistics better than I do. Snit clearly knows it better
> > than CC and myself.
>
> Though I am hardly an expert on the subject, I do not hide when I am
> ignorant on things or when I turn out to be wrong. I have been very clear
> as to when I have been wrong. An example of this is the current usage data
> goes *against* my belief that the improvements in desktop Linux were leading
> to an increased gain in usage. There was a time when the data seemed to
> back that up (though the increase was more than I expected *and*, as I
> repeatedly noted, it correlated with my predictions but did not show the
> cause-and-effect relationship I used to make my prediction was correct).
> Later data went *contrary* to my predictions. Ok. So my prediction, at
> least for now, has not come true. That is OK - again, we all make mistakes
> and, really, my prediction was quite vague.
>
> I also have repeatedly made predictions that Apple's massive increase in
> income will cease and they will run into some bad quarters - or at least
> ones where they are far less successful than they have been. I have been
> repeatedly wrong on this. At *some* point I will be right... but given how
> I have repeatedly made the prediction and it has not come true, the idea
> that it almost surely *someday* will come true is hardly supportive of my
> ability to predict such things with any decent level of accuracy.

You are admitting where you are not all knowing. CC will use this
against you. He and Steve Carroll use your honesty as a part of their
trolling. Steve has been doing this for 10 years or so.

> >>> Generally, though, when someone claims they have a finding but refuses
> >>> to share their methodology and tools and makes claims that others who
> >>> do not agree with them are "idiots" and the like it is safe to assume
> >>> they are just trolling for attention. This is especially true for the
> >>> person in question who refused to acknowledge such simple things as
> >>> sigma lines being depicted incorrectly on normal curves and other very
> >>> obvious lies.
>
> >> When words show up like "lies" and "idiot" (and "stupid"),
> >> I learned a few years ago to assume that one side or the
> >> other side is undiagnosed autistic. (ref: RF Bob)
>
> > I had not considered that CC might be autistic. I admit I have called CC and
> > his sock all sorts of names and I am not autistic. I am reacting to their
> > inappropriate behavior and calling them out on their 'autistic' behavior.
>
> I do not think he is autistic... I think he is simply unwilling to admit
> when he is wrong. When people are wrong and admit this online it is often
> used against them - I know this for a fact given how cc and others have
> repeatedly attacked me for my errors - even when the only way they can show
> I had made an error was my noting it!

I lean your way. I would not call someone autistic when the evidence
shows they are just asshole lying trolling pieces of shit.

> >> The autistic himself says "you are lying" when he thinks you are wrong, and
> >> "you are stupid" when he does not understand what you are saying.
>
> > That described CC very well.
>
> He repeatedly accuses me of lying without being able to show quotes. As
> noted above, he fabricates claims which are not representative of my views
> at all, presents them as quotes from me anyway, and then when called on it
> either ignores it or makes up stories about how his dishonest
> representations of my views are a form of "paraphrasing". Just completely
> dishonest on his part.

Everything CC does is dishonest. It is a part of who he is.

> >> Meanwhile, the rest of us think that the autistic -- who may be amazingly
> >> incapable of parsing and absorbing an argument that seems to oppose him -- is
> >> stupid for refusing to deal with some arguments; or is lying when he still
> >> claims that they are irrelevant or wrong.
>
> > In the case of this debate CC is clearly not willing to admit when he is shown
> > to be wrong and refuses to show his Excel files and other material he claims
> > to have. I far more respect people such as yourself and Snit who explain
> > yourself well and show how you come to your conclusions.
>
> cc is not a respectable person - nor is the person who was backing him.
> Both have threatened to contact my employer with lies... cc claiming to have
> done so and even received a response and the other publicly made threats to
> contact my employer and lie about me impersonating the person his
> ex-girlfriend bragged about tracking me down to being... all with the stated
> purpose of having me be fired.
>
> All because he lost Usenet debates. Just over-the-top insane on their
> parts.

I can't imagine being as pathetic as they are.

> Absolutely out of bounds. With the case of cc's friend's girlfriend her
> local police took care of it and she no longer posts or threatens me (has
> not for years). With cc and the stalker, though, they have not stopped.
> Perhaps at some point they will cross a line that will get them into
> significant legal trouble. It is possible I could push such now - publicly
> lying about me sexually harassing people, contacting my family and employer,
> forging my online username, etc.

Steve's girlfriend was a piece of shit who I am happy to see no longer
posts. You did a very good job of documenting her obsession
https://docs.google.com/open?id=0B2Sg8JtoQQgndXNGTTRNejZGejA

Snit

Jul 6, 2012, 12:40:52 PM

On 7/5/12 6:21 PM, in article
79b854c6-33f8-4b09...@re8g2000pbc.googlegroups.com, "Onion

They are not (or I would be shocked if they were). But, yes, they are cut
from the same cloth. Neither will ever admit to any of the facts discussed
here. They will seek to nit-pick some wording of mine or just lie and
pretend I was wrong to call them out on their lies.

>> But cc cannot admit to this ... so he trolls. That is the work he does: he
>> looks for fights and makes claims he knows he cannot back up. He has
>> *nothing* to back up his claims. Not a thing:
>>
>> 1) cc was wrong to call it a "fact" that some data points
>> were outliers: the reality is that there is no clear
>> definition nor single standard way to determine such, and thus
>> such determinations are subjective and *not* "facts".
>
> It is now clear that this by itself showed he is ignorant on the topic
> of outliers and statistics in general. The fact he never did produce
> the Excel work he claimed to have used just backs this up more. CC
> lied.

>> 2) cc was wrong to claim that multiple common methods of
>> finding outliers would show four for the latter half of 2011
>> and a few from late 2007 and early 2008. This was shown to
>> *not* be the case using the Median() and Array, Grubbs,
>> Quartile, and Box Plot methods and he never showed *any*
>> method where it was true.
>
> You were able to find data that fit the vague description he gave for
> 2007 and 2008 much as you showed data that fit your vague predictions
> for an increase in Linux usage. Nobody could find any method that fit
> his claims about 2011. The fact he acted as if such findings were a
> given shows again how he was pretending to be intelligent on this
> subject when he is not. He lied.

If cc was not lying he would have posted his Excel Workbook long ago.

Both cc and my stalker are unable to be honest. It is that simple.

>> 3) cc was wrong to deny that the poorly done depictions I
>> showed him were, in fact, poorly done. They were
>> unambiguously wrong. This was tied to his ignorant denial
>> that sigma lines can be correctly drawn based on the distance
>> from the mean to some feature of the curve (specifically the
>> inflection points).
>
> He has completely run away from this point which is funny because if
> he had not been wrong it would be simple to show which of the
> depictions you claimed were wrong were not. He did not understand the
> facts about sigma lines that you explained. Nor did I. I admit to this
> and am honest. CC is a lying ass who will never admit to it.

He is a coward who will never admit he is wrong.

>> 4) cc was wrong to say I missed steps in the creation of a
>> linear trend line in Excel. I did no such thing (and cc
>> never was able to list any steps I missed in creating a
>> linear trend line, nor explain why the MS site and others
>> would also miss these "steps" he alleged existed.)
>
> He shows no steps because you missed none. The process is trivial and
> you did it without error even showing CC a video of how to do it. When
> he saw this he stopped trying to say what specific steps you missed
> but just kept insisting you missed some. He then changed his story to
> say you missed steps in doing an analysis of the data and not in your
> creation of a trend line as described by Microsoft. He again altered
> his story. With all of that though he never was able to produce the
> Excel work he claimed to have done. He never did any. He lied.

Exactly correct.

>> 5) cc cannot support his claim that "June brought another round
>> of UI improvements". He simply made that up, as he makes up
>> so many other things. When called on this one he tried to
>> turn things around and say he was commenting on my views,
>> but I never said there was a single UI improvement in June.
>
> CC thought he was being clever and was going to get buy-in from you
> about the UI improvements in June. You challenged him to be specific
> on what improvements he meant and he even tried to turn it around on
> you! He wanted you to help him defend his own claims! He went into
> bizarro land and made screwed up claims that since there has been
> improvement overall in the last year or more that this must mean that
> there was another round of improvements in June. It is like he thinks
> the improvements come out daily and not when the big projects have
> their updates. Again he showed how lacking his knowledge was in
> another area. He does not get the concept of how UIs are improved or
> what it takes.

Exactly correct.

>> 6) cc repeatedly snips and ignores comments which are contrary to
>> his claimed views. As such, it is very clear he not only
>> is mistaken, cc is knowingly lying.
>
> Yes. He knows he is lying. He knows he can't give reasoned responses
> to the above. He knows you and others have shown proof of the claims
> you make. He knows he has been busted lying. This led Rich Ulrich to
> conclude that CC was acting like someone with autism. I think CC is
> just acting like a fucking liar who is grossly humiliated by his own
> ignorance and too afraid to admit when he has been proved wrong.
> Nothing to do with autism.

I agree with you. No sign of autism with cc... just of someone lying.

Yes, they do. And while I disagree with your comments toward Carroll, I
will admit they seem to be working. He has slunk off again. Good.

Exactly correct.

>> Correct. He repeatedly lashed out and attacked me for letting him know a
>> simple fact about sigma lines on normal curves - not a "hot" topic one would
>> expect a reasonable person to get worked up over. Not religion or politics
>> or any other "big" issue. Simply my letting cc know that the first sigma
>> lines on such a curve are at the distance from the mean to the inflection
>> points (and then each of the next sigmas are at the same distance from the
>> previous line).
>>
>> Such behavior on his part is not rational. His above quotes prove he was
>> wrong on this topic. So be it. We are all wrong on different topics... but
>> he is just not able to admit to his areas of being wrong.
>
> It takes a special kind of fucked up to get as worked up as he does. He
> was ignorant about standard deviations. Big fucking deal. If he was a
> man he would have admitted to it and apologized for his poor treatment
> of you. He is no man. He is an asshole who is afraid to admit to his
> own ignorance.

Exactly correct.

And Tattoo Vampire then trolls me claiming I am saying I have been 100%
correct.

He is just trolling me, too... but at least he is just being a goofball.
...

Steve Carroll

Jul 7, 2012, 10:47:09 AM

Snit... or Onion Knight... or whatever you're calling yourself this
week... you need to get your meds adjusted.

Onion Knight

Jul 8, 2012, 12:18:51 AM

I call myself Onion Knight. I post anonymously because of your
harassment and threats. Even now you do not stop but you can only
harass me by email because you are so stoned you have no idea who I
am. Does not help that you have a history of harassing people. Knowing
I am one of the many does not help you to narrow things down much.

Back to the topic. Explain how Linux can always be at the same level,
1%, but also have a significant downtrend. This is what cc claimed. He
is likely your sock but even if not you defend the idiot.

Steve Carroll

Jul 8, 2012, 12:43:52 AM

Yes, Snit... I know.

Onion Knight

Jul 8, 2012, 12:58:44 AM

The above message is being sent to

7510 W 57th Ave, Arvada, CO 80002
10190 W 59th Ave, Arvada, CO 80004
