
Oct 15, 2004, 6:54:01 AM

w...@bas.ac.uk wrote in news:416f...@news.nwl.ac.uk:

> Steve Schulin <steve....@nuclear.com> wrote:

>>http://www.technologyreview.com/articles/04/10/wo_muller101504.asp

>>, and to

>>suppress all data that do not. To demonstrate this effect, McIntyre and

>>McKitrick created some meaningless test data that had, on average, no

>>trends. This method of generating random data is called 'Monte Carlo'

>>analysis, after the famous casino, and it is widely used in statistical

>>analysis to test procedures. When McIntyre and McKitrick fed these

>>random data into the Mann procedure, out popped a hockey stick shape!

>

> I haven't seen that. Can you provide a ref? I can provide you with:

It is on page 4-5 of the article they submitted to Nature:

http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.final.pdf

Whether they are correct I leave to someone willing to double-check their

results; McKitrick has at other times proven to be quite ingenious

in getting the result he wants regardless of the underlying data.

Oct 15, 2004, 6:37:26 AM

In article <416f...@news.nwl.ac.uk>, w...@bas.ac.uk wrote:

> Steve Schulin <steve....@nuclear.com> wrote:

> >http://www.technologyreview.com/articles/04/10/wo_muller101504.asp

>

> >But it wasn't so. McIntyre and McKitrick obtained part of the program

> >that Mann used, and they found serious problems. Not only does the

> >program not do conventional PCA, but it handles data normalization in a

> >way that can only be described as mistaken.

>

> This is untrue.

>

> >Now comes the real shocker. This improper normalization procedure tends

> >to emphasize any data that do have the hockey stick shape...

>

> Not at all obvious.

>

> >, and to

> >suppress all data that do not. To demonstrate this effect, McIntyre and

> >McKitrick created some meaningless test data that had, on average, no

> >trends. This method of generating random data is called 'Monte Carlo'

> >analysis, after the famous casino, and it is widely used in statistical

> >analysis to test procedures. When McIntyre and McKitrick fed these

> >random data into the Mann procedure, out popped a hockey stick shape!

>

> I haven't seen that. Can you provide a ref? ...

http://www.uoguelph.ca/~rmckitri/research/fallupdate04/MM.short.pdf

This link is to the final of three submittals to Nature which discuss

the matter about which you ask. The final submittal, which represented

cutting the article down to the 800-word length targeted by Nature,

describes the particulars on pp. 4-5, and illustrates the effect in Fig. 2.

http://www.uoguelph.ca/~rmckitri/research/fallupdate04/update.fall04.html

provides discussion about this and links to all three submittals to

Nature which describe it.

> ... I can provide you with:

> http://cgi.cse.unsw.edu.au/~lambert/cgi-bin/blog/2004/05#mckitrick3

> to demonstrate one of M&M's frequent errors.

>

> >In PCA and similar techniques, each of the (in this case, typically 70)

> >different data sets have their averages subtracted (so they have a mean

> >of zero), and then are multiplied by a number to make their average

> >around that mean to be equal to one; in technical jargon, we say that

> >each data set is normalized to zero mean and unit variance. In standard

> >PCA, each data set is normalized over its complete data period; for the

> >global climate data that Mann used to create his hockey stick graph,

> >this was the interval 1400-1980. But the computer program Mann used did

> >not do that. Instead, it forced each data set to have zero mean for the

> >time period 1902-1980,

>

> Yes, this was exactly what they intended to do. It wasn't an error.

>

> >and to match the historical records for this

> >interval. This is the time when the historical temperature is well

> >known, so this procedure does guarantee the most accurate temperature

> >scale. But it completely screws up PCA.

>

> I can see no justification for that assertion.

>

> >PCA is mostly concerned with the

> >data sets that have high variance, and the Mann normalization procedure

> >tends to give very high variance to any data set with a hockey stick

> >shape. (Such data sets have zero mean only over the 1902-1980 period,

> >not over the longer 1400-1980 period.)

>

> >The net result: the 'principal component' will have a hockey stick shape

> >even if most of the data do not.

>

> No, I don't understand this (or believe it). It will need some more detail

> before it's believable.

>

> >I emphasize the bug in

> >their PCA program simply because it is so blatant and so easy to

> >understand.

>

> This is stupid. It wasn't a bug. The words above aren't comprehensible.

>

> -W.

Oct 15, 2004, 8:38:18 AM

Groan, not another ruddy bombshell....

"Steve Schulin" <steve....@nuclear.com> wrote in message

news:steve.schulin-F82...@comcast.dca.giganews.com...

> http://www.technologyreview.com/articles/04/10/wo_muller101504.asp

>

> Technology Review, October 15, 2004

>

> Global Warming Bombshell:

> A prime piece of evidence linking human activity to climate change turns

> out to be an artifact of poor mathematics.

>

> by Richard Muller (prof physics - University of California, Berkeley)

>

> Progress in science is sometimes made by great discoveries. But science

> also advances when we learn that something we believed to be true isn't.

> When solving a jigsaw puzzle, the solution can sometimes be stymied by

> the fact that a wrong piece has been wedged in a key place.

>

> In the scientific and political debate over global warming, the latest

> wrong piece may be the 'hockey stick,' the famous plot (shown below),

> published by University of Massachusetts geoscientist Michael Mann and

> colleagues. This plot purports to show that we are now experiencing the

> warmest climate in a millennium, and that the earth, after remaining

> cool for centuries during the medieval era, suddenly began to heat up

> about 100 years ago--just at the time that the burning of coal and oil

> led to an increase in atmospheric levels of carbon dioxide.

>

> I talked about this at length in my December 2003 column. Unfortunately,

> discussion of this plot has been so polluted by political and activist

> frenzy that it is hard to dig into it to reach the science. My earlier

> column was largely a plea to let science proceed unmolested.

> Unfortunately, the very importance of the issue has made careful science

> difficult to pursue.

>

> But now a shock: independent Canadian scientists Stephen McIntyre and

> Ross McKitrick have uncovered a fundamental mathematical flaw in the

> computer program that was used to produce the hockey stick. In his

> original publications of the stick, Mann purported to use a standard

> method known as principal component analysis, or PCA, to find the

> dominant features in a set of more than 70 different climate records.

>

> But it wasn't so. McIntyre and McKitrick obtained part of the program

> that Mann used, and they found serious problems. Not only does the

> program not do conventional PCA, but it handles data normalization in a

> way that can only be described as mistaken.

>

> Now comes the real shocker. This improper normalization procedure tends

> to emphasize any data that do have the hockey stick shape, and to

> suppress all data that do not. To demonstrate this effect, McIntyre and

> McKitrick created some meaningless test data that had, on average, no

> trends. This method of generating random data is called 'Monte Carlo'

> analysis, after the famous casino, and it is widely used in statistical

> analysis to test procedures. When McIntyre and McKitrick fed these

> random data into the Mann procedure, out popped a hockey stick shape!
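[The "Monte Carlo" check described in the quoted paragraph is easy to reproduce in outline. Below is a toy sketch in Python/NumPy, not MBH98's actual procedure: trendless AR(1) noise stands in for the "meaningless test data", short-segment centering stands in for the disputed normalization, and the series length, persistence, and seed are arbitrary choices.]

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_series = 581, 70        # 1400-1980, ~70 proxy records
cal = slice(-79, None)             # 1902-1980 calibration window

def red_noise(n, phi=0.9):
    """Trendless AR(1) series: persistent, but no long-term trend."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

data = np.column_stack([red_noise(n_years) for _ in range(n_series)])

# Short-segment centering: subtract only the 1902-1980 mean, so any
# series whose late-period mean happens to differ from its long-term
# mean keeps a large offset over the rest of its length.
centered = data - data[cal].mean(axis=0)

# PCA via singular value decomposition; PC1 is the leading pattern.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]
```

[Plotting `pc1` for many seeds is the experiment M&M describe; the sketch only sets up the machinery.]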

>

> That discovery hit me like a bombshell, and I suspect it is having the

> same effect on many others. Suddenly the hockey stick, the poster-child

> of the global warming community, turns out to be an artifact of poor

> mathematics. How could it happen? What is going on? Let me digress into

> a short technical discussion of how this incredible error took place.

>

> In PCA and similar techniques, each of the (in this case, typically 70)

> different data sets have their averages subtracted (so they have a mean

> of zero), and then are multiplied by a number to make their average

> around that mean to be equal to one; in technical jargon, we say that

> each data set is normalized to zero mean and unit variance. In standard

> PCA, each data set is normalized over its complete data period; for the

> global climate data that Mann used to create his hockey stick graph,

> this was the interval 1400-1980. But the computer program Mann used did

> not do that. Instead, it forced each data set to have zero mean for the

> time period 1902-1980, and to match the historical records for this

> interval. This is the time when the historical temperature is well

> known, so this procedure does guarantee the most accurate temperature

> scale. But it completely screws up PCA. PCA is mostly concerned with the

> data sets that have high variance, and the Mann normalization procedure

> tends to give very high variance to any data set with a hockey stick

> shape. (Such data sets have zero mean only over the 1902-1980 period,

> not over the longer 1400-1980 period.)
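[The normalization Muller describes in the paragraph above can be written out directly. A minimal Python/NumPy sketch, with an arbitrary sine series standing in for a proxy record; the contrast is between standardizing over the full 1400-1980 period and over 1902-1980 only.]

```python
import numpy as np

def standardize(series, base=slice(None)):
    """Zero mean and unit variance, measured over the base period."""
    return (series - series[base].mean()) / series[base].std()

years = np.arange(1400, 1981)
proxy = np.sin(years / 50.0)          # arbitrary stand-in record

full  = standardize(proxy)                      # standard PCA convention
short = standardize(proxy, slice(-79, None))    # 1902-1980 only
```

[Both results have zero mean and unit variance over their own base period, but `short` can sit far from zero mean over the full 581 years, which is the property the dispute turns on.]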

>

> The net result: the 'principal component' will have a hockey stick shape

> even if most of the data do not.

>

> McIntyre and McKitrick sent their detailed analysis to Nature magazine

> for publication, and it was extensively refereed. But their paper was

> finally rejected. In frustration, McIntyre and McKitrick put the entire

> record of their submission and the referee reports on a Web page for all

> to see. If you look, you'll see that McIntyre and McKitrick have found

> numerous other problems with the Mann analysis. I emphasize the bug in

> their PCA program simply because it is so blatant and so easy to

> understand. Apparently, Mann and his colleagues never tested their

> program with the standard Monte Carlo approach, or they would have

> discovered the error themselves. Other and different criticisms of the

> hockey stick are emerging (see, for example, the paper by Hans von

> Storch and colleagues in the September 30 issue of Science).

>

> Some people may complain that McIntyre and McKitrick did not publish

> their results in a refereed journal. That is true--but not for lack of

> trying. Moreover, the paper was refereed--and even better, the referee

> reports are there for us to read. McIntyre and McKitrick's only failure

> was in not convincing Nature that the paper was important enough to

> publish.

>

> How does this bombshell affect what we think about global warming?

>

> It certainly does not negate the threat of a long-term global

> temperature increase. In fact, McIntyre and McKitrick are careful to

> point out that it is hard to draw conclusions from these data, even with

> their corrections. Did medieval global warming take place? Last month

> the consensus was that it did not; now the correct answer is that nobody

> really knows. Uncovering errors in the Mann analysis doesn't settle the

> debate; it just reopens it. We now know less about the history of

> climate, and its natural fluctuations over century-scale time frames,

> than we thought we knew.

>

> If you are concerned about global warming (as I am) and think that

> human-created carbon dioxide may contribute (as I do), then you still

> should agree that we are much better off having broken the hockey stick.

> Misinformation can do real harm, because it distorts predictions.

> Suppose, for example, that future measurements in the years 2005-2015

> show a clear and distinct global cooling trend. (It could happen.) If we

> mistakenly took the hockey stick seriously--that is, if we believed that

> natural fluctuations in climate are small--then we might conclude

> (mistakenly) that the cooling could not be a natural occurrence. And

> that might lead in turn to the mistaken conclusion that global warming

> predictions are a lot of hooey. If, on the other hand, we reject the

> hockey stick, and recognize that natural fluctuations can be large, then

> we will not be misled by a few years of random cooling.

>

> A phony hockey stick is more dangerous than a broken one -- if we know

> it is broken. It is our responsibility as scientists to look at the data

> in an unbiased way, and draw whatever conclusions follow. When we

> discover a mistake, we admit it, learn from it, and perhaps discover

> once again the value of caution.

>

> Richard A. Muller, a 1982 MacArthur Fellow, is a physics professor at

> the University of California, Berkeley, where he teaches a course called

> 'Physics for Future Presidents.' Since 1972, he has been a Jason

> consultant on U.S. national security.

Oct 15, 2004, 2:00:29 PM

On Fri, 15 Oct 2004 13:38:18 +0100, "Peter Hearnden"

<pe...@wronghearndenbit.freeserve.co.uk> wrote:

>Groan, not another ruddy bombshell....

>

C'mon, Peter. You need to put this into its proper context.

It's a bombshell for those researchers who don't know how to calculate

a mean value, who can't convert from degrees to radians, who claim to

audit another author's work by not following their methodology, ...

I'm particularly amused by the section where Muller claims that

something nefarious was done when they normalized the data using the

1902-1980 mean. It's standard practice to compare climate data from

different eras to some benchmark. That way you can meaningfully

compare the numbers. Since MBH were using real data from that period,

it makes sense to normalize the input data to that period.

Oct 15, 2004, 5:32:11 PM

Thomas Palm <Thoma...@chello.removethis.se> wrote:

>w...@bas.ac.uk wrote in news:416f...@news.nwl.ac.uk:

>> Steve Schulin <steve....@nuclear.com> wrote:

>>>http://www.technologyreview.com/articles/04/10/wo_muller101504.asp

>> I haven't seen that. Can you provide a ref? I can provide you with:

>It is on page 4-5 of the article they submitted to Nature:

>http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.final.pdf

>Whether they are correct I leave to someone willing to double-check their

>results; McKitrick has at other times proven to be quite ingenious

>in getting the result he wants regardless of the underlying data.

OK, I agree with you there. So we have to be cautious of their results.

But (having read their paper) I now think I understand what they think the

problem is (aside: they complain about data issues with some series but

I think this is beside the point: the *main* point they are talking about

is below), and I think that they are probably wrong, based on reading

MBH's fortran (aside: fortran is a terrible language for doing this stuff,

they should use a vector language like IDL). But anyway:

Let's assume, for the moment and for simplicity, that these series run from

1000 (AD) to 1980. MBH want to calibrate them against the instrumental

record so they standardise them to 1902-1980. 1902-1980 is the "training

period".

What M&M are saying (and Muller is repeating) is (and I quote): the data

"were first scaled to the 1902-1980 mean and standard deviation,

then the PCs were computed using singular value decomposition

(SVD) on the transformed data..."

they complain that this means that:

"For stationary series in which the 1902-1980 mean is the same as the

1400-1980 mean, the MBH98 method approximately zero-centers the series.

But for those series where the 1902-1980 mean shifts (up or down)

away from the 1400-1980 mean, the variance of the shifted series will be

inflated."

This is a plausible idea: if you take 2 series, statistically identical,

except that one trends up at the end where the other happens to be flat, and

you compute the SD of just the end bit, and then *scale the series to

this SD*, then you would indeed inflate the variance of the uptrending

series artificially. But hold on a minute... this is odd... why would you

scale the series *to* the SD? You would expect to scale the series

*by* the SD. Which would, in fact, *reduce* the variance of upwards

trending series. And also, you might well think, shouldn't you take out a

linear trend over 1902-1980 before computing the SD?
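The distinction between scaling *to* and *by* the SD is easy to see numerically. A quick sketch (a mostly flat series given an artificial uptrend over its last 79 points; the amplitudes and window length are arbitrary assumptions):

```python
import numpy as np

series = 0.1 * np.sin(np.arange(581.0))    # low-variance, roughly stationary
series[-79:] += np.linspace(0.0, 4.0, 79)  # uptrend in the calibration window

sd_end = series[-79:].std()                # SD of the trending end segment

scaled_by = series / sd_end   # usual convention: variance is *reduced*
scaled_to = series * sd_end   # the other reading: variance is *inflated*
```

Because the trend makes `sd_end` greater than one, dividing by it shrinks the uptrending series relative to a flat one, while multiplying inflates it; the two readings push in opposite directions.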

So we need to look at MBH's software, not M&M's description of it.

MBH's software is at:

ftp://holocene.evsc.virginia.edu/pub/MBH98/TREE/ITRDB/NOAMER/pca-noamer.f

and you can of course read it yourself... fortran is so easy to read...

what they do is (search down over the reading in data till you get to

9999 continue):

1 - remove the 1902-1980 mean

2 - calc the SD over this period

3 - divide the whole series by this SD, point by point

At this point, the new data are in the situation I described above:

datasets that trend upwards at the end have had their variance *reduced*

not increased. But there is more...

4 - remove the linear trend from the new 1902-1980 series

5 - compute the SD again for 1902-1980 of the detrended data

6 - divide the whole series by this SD.

This was exactly what I was expecting to see: remove the linear trend

before computing the SD.

Then the SVD type stuff begins. So... what does that all mean? It certainly

looks a bit odd, because steps 1-3 appear redundant. The scaling done in 4-6

is all you need. Is the scaling of 1-3 *harmful*? Not obviously.
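For anyone who would rather not read the Fortran, here is a direct transcription of the six steps above into Python/NumPy. It follows the steps as listed, not the original code, and the 79-point calibration window standing in for 1902-1980 is an assumption:

```python
import numpy as np

def mbh_standardize(series, cal=slice(-79, None)):
    """Apply steps 1-6 as listed to a single proxy series."""
    x = np.asarray(series, dtype=float).copy()
    t = np.arange(len(x))

    # 1-3: remove the calibration-period mean, then divide the whole
    # series by the calibration-period SD
    x -= x[cal].mean()
    x /= x[cal].std()

    # 4-5: fit a linear trend over the calibration period and compute
    # the SD of the detrended calibration-period residuals
    slope, intercept = np.polyfit(t[cal], x[cal], 1)
    resid_sd = (x[cal] - (slope * t[cal] + intercept)).std()

    # 6: divide the whole series by that detrended SD
    x /= resid_sd
    return x
```

On this reading, steps 1-3 rescale the series by the raw calibration SD and steps 4-6 rescale it again by the detrended SD, which is why 1-3 look redundant.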

Perhaps someone would care to go through and check this. If I haven't made

a mistake then I think M&M's complaints are unjustified and Nature

correct to reject their article.

-W.

--

William M Connolley | w...@bas.ac.uk | http://www.antarctica.ac.uk/met/wmc/

Climate Modeller, British Antarctic Survey | Disclaimer: I speak for myself

I'm a .signature virus! copy me into your .signature file & help me spread!

Oct 15, 2004, 2:01:26 PM

On Fri, 15 Oct 2004 11:02:45 -0400, "Bill Corden"

<Cor...@somwhere.com> wrote:

>"Roger Coppock" <rcop...@adnc.com> wrote in message

>news:25516292.04101...@posting.google.com...

>> w...@bas.ac.uk wrote in message news:<416f...@news.nwl.ac.uk>...

>> From the reference above:

>> -- "Essex and McKitrick claim that physics provides

>> no basis for defining average temperature . . . "

>>

>> This must be true if you use a mathematics where

>> a degree equals a radian.

>>

>> What is next for CEE-OH-TOO (pseudo) science?

>

>I assume you are dismissing Richard Muller too? Or are you just unwilling

>to have an open mind? McKitrick did not write this piece in Technology

>Review, Muller did. Is he a dupe?

>

Actually, it's McKitrick who's being dismissed. He's a clown.

When the man can't even calculate a mean value properly it's awfully

hard to take him seriously on matters more complex.

Oct 15, 2004, 6:51:12 PM

In article <steve.schulin-F82...@comcast.dca.giganews.com>,

<steve....@nuclear.com> Steve Schulin wrote:

>

> http://www.technologyreview.com/articles/04/10/wo_muller101504.asp

>[...]

>

I notice Quark Soup has a commentary on Richard Muller's work.

2004/10/15: Quark Soup: Muller

<http://davidappell.com/archives/00000427.htm>

<fwiw>

-het

--

"It is impossible to wake up someone pretending to sleep." -Tamil saying

Global Warming: http://www.autobahn.mb.ca/~het/globalwarming.html

H.E. Taylor http://www.autobahn.mb.ca/~het/

Oct 15, 2004, 12:11:02 PM

"Bill Corden" <Cor...@somwhere.com> wrote in

news:%FRbd.530$Ob....@fe61.usenetserver.com:

> "Roger Coppock" <rcop...@adnc.com> wrote in message

> news:25516292.04101...@posting.google.com...

>> w...@bas.ac.uk wrote in message news:<416f...@news.nwl.ac.uk>...

>> From the reference above:

>> -- "Essex and McKitrick claim that physics provides

>> no basis for defining average temperature . . . "

>>

>> This must be true if you use a mathematics where

>> a degree equals a radian.

>>

>> What is next for CEE-OH-TOO (pseudo) science?

>

> I assume you are dismissing Richard Muller too? Or are you just

> unwilling to have an open mind? McKitrick did not write this piece in

> Technology Review, Muller did. Is he a dupe?

There are signs of it in what he has written. Scientists tend to be

relatively honest and have trouble dealing with people who deliberately lie

for propaganda purposes. When the US government claimed Saddam must have

had WMDs, Muller believed it and tried to come up with ideas of where they

could be hidden, when Soon&Baliunas wrote their article about climate

reconstructions Muller seems to assume it was attacked just because it

criticized the "hockey stick", and when M&M wrote another critical article

Muller seems to assume it is correct, or at least I see no sign he repeated

the analysis himself.

P.S. Check the comments to the article, they show very clearly the

polarization with half the people attacking Muller for ignoring the problem

of global warming (which he doesn't) and the other half thanking him for

doing the same. Very few are able to understand what he actually writes: that one

research paper may possibly be wrong. (The headline doesn't exactly help

either)

Oct 16, 2004, 1:42:35 AM

w...@bas.ac.uk wrote in news:4170...@news.nwl.ac.uk:

> Thomas Palm <Thoma...@chello.removethis.se> wrote:

>>w...@bas.ac.uk wrote in news:416f...@news.nwl.ac.uk:

>>> Steve Schulin <steve....@nuclear.com> wrote:

>>>>http://www.technologyreview.com/articles/04/10/wo_muller101504.asp

>

>>> I haven't seen that. Can you provide a ref? I can provide you with:

>

>>It is on page 4-5 of the article they submitted to Nature:

>>http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.fin

>>al.pdf

>

>>Whether they are correct I leave to someone willing to double-check

>>their results; McKitrick has at other times proven to be quite

>>ingenious in getting the result he wants regardless of the underlying

>>data.

>

> OK, I agree with you there. So we have to be cautious of their

> results.

Thanks for taking the time to look at the results. One of these days I've

just got to learn something about PCA.

> So we need to look at MBH's software, not M&M's description of it.

> MBH's software is at:

>

> ftp://holocene.evsc.virginia.edu/pub/MBH98/TREE/ITRDB/NOAMER/pca-noamer

> .f

I hope Bill Corden notes this. He seems upset that he hasn't found MBH's

algorithm, and takes the word of M&M that it isn't available.

Oct 16, 2004, 5:25:42 AM

Thomas Palm <Thoma...@chello.removethis.se> wrote:

>w...@bas.ac.uk wrote in news:4170...@news.nwl.ac.uk:

>Thanks for taking the time to look at the results. One of these days I've

>just got to learn something about PCA.

For the purposes of this, you don't need to know about PCA I think.

As I read M&M's submission, they are complaining about the variance

weighting *before* the PCA is done. So there is no reason not to look

at the code yourself and confirm that M&M are wrong...

>> So we need to look at MBH's software, not M&M's description of it.

>> MBH's software is at:

>>

>> ftp://holocene.evsc.virginia.edu/pub/MBH98/TREE/ITRDB/NOAMER/pca-noamer

>> .f

>I hope Bill Corden notes this. He seems upset that he hasn't found MBH's

>algorithm, and takes the word of M&M that it isn't available.

Err, something wrong there, because M&M themselves provided the link to

the program.

Oct 16, 2004, 8:34:18 AM

w...@bas.ac.uk wrote in message news:<4170...@news.nwl.ac.uk>...

> (aside: fortran is a terrible language for doing this stuff,

> they should use a vector language like IDL).

I think your knowledge of Fortran is outdated -- maybe you are

referring to the old Fortran 77 standard. Since the Fortran 90,

Fortran has had array operations, so that one can write A = B + C,

where A, B, and C are conformant arrays. One can also use array

slices, writing A(:,1) = B(:,1) + C(:,1) if A, B, and C are all

matrices. There is a free Fortran 95 compiler called g95 at

http://www.g95.org . Using an ISO standardized language, with at least

one freely available implementation, makes collaboration easier than

using an expensive, proprietary tool like IDL. Links to Fortran

compilers are at http://www.dmoz.org/Computers/Programming/Languages/Fortran/Compilers/ .

Oct 16, 2004, 2:14:15 PM

Good. When I last looked (a few years ago) for a gnu-F90 it was not available.

I shall try this one.

Oct 16, 2004, 3:03:48 PM

w...@bas.ac.uk wrote:

>beli...@aol.com wrote:

>>There is a free Fortran 95 compiler called g95 at http://www.g95.org

>

>Good. When I last looked (a few years ago) for a gnu-F90 it was not available.

>I shall try this one.

Good. The official Gnu Fortran 95 compiler is called gfortran, with web site

http://gcc.gnu.org/fortran/ . It forked from g95, for reasons described in

the recent thread "gfortran vs. g95" in the newsgroup comp.lang.fortran .

It is also worth trying. On Linux, Intel Fortran is free for non-commercial

use.


Oct 16, 2004, 5:01:33 PM

> So we need to look at MBH's software, not M&M's description of it.

I've looked at the code and it seems to me that M&M's description of

the software is quite accurate. M&M's objections about code

unavailability pertain to the rest of MBH.

> MBH's software is at:

>

> ftp://holocene.evsc.virginia.edu/pub/MBH98/TREE/ITRDB/NOAMER/pca-noamer.f

>

> and you can of course read it yourself... fortran is so easy to read...

>

> what they do is (search down over the reading in data till you get to

> 9999 continue):

>

> 1 - remove the 1902-1980 mean

> 2 - calc the SD over this period

> 3 - divide the whole series by this SD, point by point

>

> At this point, the new data are in the situation I described above:

> datasets that trend upwards at the end have had their variance *reduced*

> not increased. But there is more...

>

> 4 - remove the linear trend from the new 1902-1980 series

> 5 - compute the SD again for 1902-1980 of the detrended data

> 6 - divide the whole series by this SD.

>

> This was exactly what I was expecting to see: remove the linear trend

> before computing the SD.

>

> Then the SVD type stuff begins. So... what does that all mean? It certainly

> looks a bit odd, because steps 1-3 appear redundant. The scaling done in 4-6

> is all you need. Is the scaling of 1-3 *harmful*? Not obviously.

>

> Perhaps someone would care to go through and check this. If I haven't made

> a mistake then I think M&M's complaints are unjustified and Nature

> correct to reject their article.

I think that you have made a mistake. The problem is not the apparent

redundancy. The problem comes from the subtraction of the mean; you've

gotten distracted by the standard deviation. Muller's right about

this, it is a problem.

Only one of the Nature referees said that he was an expert in

principal components and this referee agreed with M&M's concerns about

short-segment standardization. He also said that he was unimpressed

by Mann's "shouting longer and louder".

So of the peer review calibre people who have looked at this, the

scorecard to date seems to read: one Nature reviewer who was expert in

principal components seems to agree with M&M, Muller (who is peer

review calibre) agrees with M&M, one Nature peer reviewer doesn't seem

to have thought about the problem and thought that the matter was too

"difficult" for a Nature article (which seems weird), and one Nature

peer reviewer thought that Mann had answered the questions (but didn't

claim to have checked the calculations.)

It looks like a Florida election with a slight edge to M&M right now.

Oct 16, 2004, 6:04:24 PM

Nigel Persaud <pers...@yahoo.com> wrote:

>wmc:

>> ftp://holocene.evsc.virginia.edu/pub/MBH98/TREE/ITRDB/NOAMER/pca-noamer.f

>>

>> and you can of course read it yourself... fortran is so easy to read...

>>

>> what they do is (search down over the reading in data till you get to

>> 9999 continue):

>>

>> 1 - remove the 1902-1980 mean

>> 2 - calc the SD over this period

>> 3 - divide the whole series by this SD, point by point

>>

>> At this point, the new data are in the situation I described above:

>> datasets that trend upwards at the end have had their variance *reduced*

>> not increased. But there is more...

>>

>> 4 - remove the linear trend from the new 1902-1980 series

>> 5 - compute the SD again for 1902-1980 of the detrended data

>> 6 - divide the whole series by this SD.

>>

>> This was exactly what I was expecting to see: remove the linear trend

>> before computing the SD.

>>

>> Then the SVD type stuff begins. So... what does that all mean? It certainly

>> looks a bit odd, because steps 1-3 appear redundant. The scaling done in 4-6

>> is all you need. Is the scaling of 1-3 *harmful*? Not obviously.

>>

>> Perhaps someone would care to go through and check this. If I haven't made

>> a mistake then I think M&M's complaints are unjustified and Nature

>> correct to reject their article.

>I think that you have made a mistake. The problem is not the apparent

>redundancy. The problem comes from the subtraction of the mean...

Why? Simply asserting it isn't convincing.

Oct 16, 2004, 6:30:09 PM

I went through this nearly a year ago, William, and I found

the same odd thing. Here's what I put down in early December of last

year...

I'm continuing to wade through some of Mann's archives, trying

to see what he and his colleagues were doing in the 1998 paper. I have

to admit not having had a lot of time to look at it. Still, I'm making

a little progress. I have to say right off the bat that their pca

software is a bloody mess, but that likely stems from having

everything written in Fortran (yuck!).

The process of doing a proper PCA analysis requires the

construction of either a covariance or correlation matrix. I usually

use the former as most of the data I work with is satellite data and I

can stretch the contrast of the images I'm working prior to running

them through the algorithm. The latter is usually used if the data

have different scales or variances. My initial runs through the SWM

datasets were giving me odd results, because I had blissfully assumed

that they were using a covariance matrix. Not!
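The covariance-vs-correlation distinction can be made concrete: a correlation-matrix PCA is just a covariance-matrix PCA on columns that have each been standardized first. A minimal pure-Python sketch (function names are mine, population statistics throughout):

```python
from statistics import mean, pstdev

def covariance(a, b):
    # Population covariance of two equal-length series
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def correlation(a, b):
    # Pearson correlation = covariance scaled by both SDs
    return covariance(a, b) / (pstdev(a) * pstdev(b))

def standardize(a):
    # Remove the mean, divide by the SD
    m, s = mean(a), pstdev(a)
    return [(x - m) / s for x in a]

# correlation(a, b) == covariance(standardize(a), standardize(b)),
# which is why the choice of pre-scaling decides which matrix the PCA sees.
```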

Here's what they appear to be doing - feel free to tell me

where I screw up...

1. Determine the mean and standard deviation of the data from

the 1902-present period, subtracting that mean from all of the data

and normalizing it with the calculated standard deviation.

2. Remove any linear trend from the 1902-present data.

3. Re-calculate the standard deviation for the period and

normalize all of the data with that value.

4. Create a complex matrix from which to use SVD to determine

the relevant eigenvectors and eigenvalues.

A couple of notes...

a. Removing the linear trend from the 1902-present data is a

good idea as you don't want it to mask the underlying structure that

should extend from pre-industrial times to the present. I was a little

bothered by the assumption of a linear trend in said data, but I did

some trend analysis tests on the data, selectively checking for both

polynomial and Fourier series up to order 20 using both regular and

robust techniques and everything pointed to a simple linear trend.

b. It appears that they are using a modified version of the

correlation coefficient, determining the standard deviations of the

1902-present data, before and after the trend analysis. This seems to

make a significant difference. I tried detrending the data first, then

simply calculating the standard deviations and the results are not the

same as they get. I can't fault their methodology, though I'm not

entirely sure why they do the before/after trick - any suggestions? -

but I'll test that process next and see what comes out....

Basically, I found the same thing you did. Simply taking your

steps 4 to 6 does give you different results than MBH. I'm simply not

sure why they massaged the data this way. There doesn't seem to be a

physical reason for doing so.

Oct 16, 2004, 6:32:32 PM

On 16 Oct 2004 14:01:33 -0700, pers...@yahoo.com (Nigel Persaud)

wrote:


Nigel rears his ugly head again, I see. Are you a colleague of

McKitrick's by any chance, Nigel? The only time you make an

appearance here is to defend him and you do so through so many flights

of fancy that it's scary.

Oct 17, 2004, 5:33:32 AM

David Ball <wra...@mb.sympatico.ca> wrote:

> I went through this nearly a year ago, William, and I found

>the same odd thing. Here's what I put down in early December of last

>year...

Hmmm. So you did. A year ago it was getting all too tedious: wading through

datasets and so on. I gave up on the details and waited for an answer to

emerge, but so far none clearly has, though M&M's rejection is a partial

answer. Now that it's down to a few simple steps preparing the data for PCA/EOF/SVD,

I am interested again. My main problem with the original M&M was that they

didn't provide a simple explanation of what was wrong. Now they have

provided a simple explanation (I do wish they had started with this rather than

leaving it to the end (or is this the end?)) but at the moment I don't believe

it.

> Basically, I found the same thing you did. Simply taking your

>steps 4 to 6 does give you different results that MBH. I'm simply not

>sure why they massaged the data this way. There doesn't seem to be a

>physical reason for doing so.

Ah, well thats interesting. How different?

Oct 17, 2004, 7:38:11 AM

On 17 Oct 2004 10:33:32 +0100, w...@bas.ac.uk wrote:

>David Ball <wra...@mb.sympatico.ca> wrote:

>

>> I went through this nearly a year ago, William, and I found

>>the same odd thing. Here's what I put down in early December of last

>>year...

>

>Hmmm. So you did. A year ago it was getting all to tedious: wading through

>datasets and so on. I gave up on the details and waited for an answer to

>emerge, but so far it hasn't clearly, though M&M's rejection is a partial

>answer. Now its down to a few simple steps preparing the data for PCA/EOF/SVD

>I am interested again. My main problem with the original M&M was that they

>didn't provide a simple explanation of what was wrong. Now they have

>provided a simple explanation (I do wish they had started with this rather than

>leaving it to the end (or is this the end?)) but at the moment I don't believe

>it.

My problem is that if they wanted to understand what MBH did

why are they only now getting around to looking at their code? The

other problem I have is that in their updated note to Nature they

argue that the characteristic shape of the MBH results stems from two

main problems, problems that are completely different than those in

their original E&E article. It's almost as if they tried a whole bunch

of things, didn't get anywhere with them, so are trying again with a

couple more things.

>

>> Basically, I found the same thing you did. Simply taking your

>>steps 4 to 6 does give you different results that MBH. I'm simply not

>>sure why they massaged the data this way. There doesn't seem to be a

>>physical reason for doing so.

>

>Ah, well thats interesting. How different?

If I remember correctly, the difference was significant,

certainly enough to make note of.

Oct 17, 2004, 9:02:43 PM

David Ball wrote:

> On 17 Oct 2004 10:33:32 +0100, w...@bas.ac.uk wrote:

>>David Ball <wra...@mb.sympatico.ca> wrote:

SNIP....

>>> Basically, I found the same thing you did. Simply taking your

>>>steps 4 to 6 does give you different results that MBH. I'm simply not

>>>sure why they massaged the data this way. There doesn't seem to be a

>>>physical reason for doing so.

>>

>>Ah, well thats interesting. How different?

> If I remember correctly, the difference was significant,

> certainly enough to make note of.

Since the blade (1900 - 1980) is the best determined part of the curve,

I doubt it would change much, but how about the stick? Did it move

down more towards the other reconstructions (my guess) or move up?

josh halpern

Oct 17, 2004, 10:05:54 PM

You know, Josh, I honestly don't remember. It's been 10

months. I'll see if I can find some time tomorrow to redo some of the

calculations. I don't know if I'll manage it, though; winter is

making an early and most unwelcome appearance.

Oct 17, 2004, 11:57:04 PM

> >> Perhaps someone would care to go through and check this. If I haven't made

> >> a mistake then I think M&M's complaints are unjustified and Nature

> >> correct to reject their article.

>

> >I think that you have made a mistake. The problem is not the apparent

> >redundancy. The problem comes from the subtraction of the mean...

>

> Why? Simply asserting it isn't convincing.

>

Simply denying it isn't convincing.

Oct 19, 2004, 10:59:08 PM

w...@bas.ac.uk wrote:

> But (having read their paper) I now think I understand what they think the

> problem is (aside: they complain about data issues with some series but

> I think this is beside the point: the *main* point they are talking about

> is below), and I think that they are probably wrong, based on reading

> MBH's fortran (aside: fortran is a terrible language for doing this stuff,

> they should use a vector language like IDL). But anyway:

Having had a quick glance at this and their papers, I think I agree with

you. In fact it appears that we can add not knowing the difference

between multiplication and division, to the already impressive list of

blunders that M&M have made. They even seem to talk about _adding_ the

mean to the time series rather than _subtracting_ it too. I might check

this more carefully over the next few days if no-one else beats me to it.

James

--

If I have seen further than others, it is

by treading on the toes of giants.

http://www.ne.jp/asahi/julesandjames/home/

Oct 20, 2004, 1:28:05 PM

James Annan <still_th...@hotmail.com> wrote:

>Having had a quick glance at this and their papers, I think I agree with

>you. In fact it appears that we can add not knowing the difference

>between multiplication and division, to the already impressive list of

>blunders that M&M have made. They even seem to talk about _adding_ the

>mean to the time series rather than _subtracting_ it too. I might check

>this more carefully over the next few days if no-one else beats me to it.

If our current power cut doesn't end, you'll certainly beat me... :-(

Back to Victorian days for us. Fortunately we have lots of candles. And

a portable with a bit of power left in its battery...

Oct 21, 2004, 10:45:46 AM

I had a few minutes last night and I had a quick look at M&M's

submission to Nature from January at:

http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.final.pdf

I was particularly interested in their red noise experiments outlined

at the bottom of page 4 and the top of page 5, so I repeated part of

it step by step. I didn't get the MBH part finished. That will have to

wait until next week, but I did notice a couple of interesting

things...

1. In generating their AR1 random vectors they had to

calculate random normal deviates. In most algorithms I'm aware of,

some type of random number generator is employed and those require a

seed value. Guess what happens if you run the experiment and don't

change the seed? You get back exactly the same information as you did

before. M&M clearly state: "Ten simulations were carried out and a

hockey stick shape was observed in every simulation." That's a pretty

meaningless statement if they've been getting back the same

information every time they run their software. Normally, I would

assume that this is something they'd check on, but this is the same

guy who didn't bother to convert degrees to radians...

2. What's so special about 1081? They create their random

vectors of this length but why not 1000? Or 2000? I did some playing

around and you, not surprisingly, get different results if you vary

the length of the start vectors. Whether this has an impact on the

results, I don't know.

I'll take a look at what they claim MBH did, by merely

removing the 1902-1980 mean (in my case it will be 1902-1971 since I

only used complete records and didn't use any fills) and normalizing

to unit standard deviation. I'll also do exactly what MBH did, since

what M&M did in this submission is clearly not what MBH did. That was

spelled out pretty clearly by William. I do wish M&M would take the

time to read the original MBH98 study. It would be so much easier if

they didn't go changing MBH's methodology on a whim.
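The seed point is easy to demonstrate. Below is a sketch of an AR(1) ("red noise") generator; the rho, burn-in length, and seed values are illustrative, not taken from M&M's code.

```python
import random

def ar1_series(n, rho, seed, burn_in=100):
    # AR(1) red noise: x[t] = rho * x[t-1] + N(0, 1).
    # The first `burn_in` values are discarded so the series forgets x[0];
    # rho and burn_in are illustrative, not M&M's values.
    rng = random.Random(seed)
    x, out = 0.0, []
    for i in range(n + burn_in):
        x = rho * x + rng.gauss(0.0, 1.0)
        if i >= burn_in:
            out.append(x)
    return out

# Re-using one seed reproduces the draw exactly: ten "simulations" run
# without changing the seed are ten copies of the same series.
a = ar1_series(500, 0.9, seed=42)
b = ar1_series(500, 0.9, seed=42)
c = ar1_series(500, 0.9, seed=43)
assert a == b and a != c
```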

Oct 21, 2004, 8:41:25 PM

David Ball <wra...@mb.sympatico.ca> wrote in message news:<e2hfn0dm3r7577ius...@4ax.com>...

> I had a few minutes last night and I had a quick look at M&M's

> submission to Nature from January at:

>

> http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.final.pdf

>

> I was particularly interested in their red noise experiments outlined

> at the bottom of page 4 and the top of page 5, so I repeated part of

> it step by step. I didn't get the MBH part finished. That will have to

> wait until next week, but I did notice a couple of interesting

> things...

>

> 1. In generating their AR1 random vectors they had to

> calculate random normal deviates. In most algorithms I'm aware of,

> some type of random number generator is employed and those require a

> seed value. Guess what happens if you run the experiment and don't

> change the seed? You get back exactly the same information as you did

> before. M&M clearly state: "Ten simulations were carried out and a

> hockey stick shape was observed in every simulation." That's a pretty

> meaningless statement if they've been getting back the same

> information every time they run their software. Normally, I would

> assume that this is something they'd check on, but this is the same

> guy who didn't bother to convert degrees to radians...


Actually, I believe that McIntyre was not linked to McKitrick's

blunders (I made this same mistake recently).

> 2. What's so special about 1081? They create their random

> vectors of this length but why not 1000? Or 2000? I did some playing

> around and you, not surprisingly, get different results if you vary

> the length of the start vectors.

Rubbish. The only point of the initial vector is to forget the initial

conditions. With the autocorrelation coefficients used, probably

anything more than about 10 discarded points would be ok, beyond that

the results are statistically indistinguishable.

> I'll take a look at what they claim MBH did, by merely

> removing the 1902-1980 mean (in my case it will be 1902-1971 since I

> only used complete records and didn't use any fills) and normalizing

> to unit standard deviation. I'll also do exactly what MBH did, since

> what M&M did in this submission is clearly not what MBH did. That was

> spelled out pretty clearly by William. I do wish M&M would take the

> time to read the original MBH98 study. It would be so much easier if

> they didn't go changing MBH's methodology on a whim.

The wording in M&M's paper seems rather poor, since it does say they

added the mean rather than subtracting. However, in fact there is an

effect somewhat similar to their claims when the late-sample mean is

subtracted. I do not believe that their conclusions are justified

however - the effect is an offset at the end of the 1st PC, rather

than a strong slope as in MBH98. In fact, the method itself does not

generate any significant trends. Also, the offset is only about 1sd

(late sample mean is about 1sd from the whole mean), whereas their

nature submission looks like about 3sd (certainly well over 2). I

could not get anything close to 2sd with many replications.

This is only a preliminary look and I might have something more to say

later. But I'm a bit busy for the next week or so...

James

Oct 21, 2004, 10:13:24 PM

On 21 Oct 2004 17:41:25 -0700, still_th...@hotmail.com (James

Annan) wrote:


>David Ball <wra...@mb.sympatico.ca> wrote in message news:<e2hfn0dm3r7577ius...@4ax.com>...

>> I had a few minutes last night and I had a quick look at M&M's

>> submission to Nature from January at:

>>

>> http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.final.pdf

>>

>> I was particularly interested in their red noise experiments outlined

>> at the bottom of page 4 and the top of page 5, so I repeated part of

>> it step by step. I didn't get the MBH part finished. That will have to

>> wait until next week, but I did notice a couple of interesting

>> things...

>>

>> 1. In generating their AR1 random vectors they had to

>> calculate random normal deviates. In most algorithms I'm aware of,

>> some type of random number generator is employed and those require a

>> seed value. Guess what happens if you run the experiment and don't

>> change the seed? You get back exactly the same information as you did

>> before. M&M clearly state: "Ten simulations were carried out and a

>> hockey stick shape was observed in every simulation." That's a pretty

>> meaningless statement if they've been getting back the same

>> information every time they run their software. Normally, I would

>> assume that this is something they'd check on, but this is the same

>> guy who didn't bother to convert degrees to radians...

>

>Actually, I believe that McIntyre was not linked to McKitrick's

>blunders (I made this same mistake recently).

True enough. I tend not to consider McIntyre too much,

frankly. McKitrick is the professional scientist.

>

>

>> 2. What's so special about 1081? They create their random

>> vectors of this length but why not 1000? Or 2000? I did some playing

>> around and you, not surprisingly, get different results if you vary

>> the length of the start vectors.

>

>Rubbish. The only point of the initial vector is to forget the initial

>conditions. With the autocorrellation coefficients used, probably

>anything more than about 10 discarded points would be ok, beyond that

>the results are statistically indistinguishable.

That's what I thought. If you retain the last 581 points from

the vector of length 1081, keeping the seed value from the random

number generator the same, you get back the same values for PC1 over

and over again after having run the moving average filter over the

data. The variance explained is very low, about 3%.

If you create a vector of length 881, retaining the last 581

values, keeping the seed value the same as in the past example, the

plotted value of PC1 is different. Keep in mind I was tinkering with

this at 7AM at the end of a 12 hour night shift, so all my neurons

might not have been firing properly.

>

>> I'll take a look at what they claim MBH did, by merely

>> removing the 1902-1980 mean (in my case it will be 1902-1971 since I

>> only used complete records and didn't use any fills) and normalizing

>> to unit standard deviation. I'll also do exactly what MBH did, since

>> what M&M did in this submission is clearly not what MBH did. That was

>> spelled out pretty clearly by William. I do wish M&M would take the

>> time to read the original MBH98 study. It would be so much easier if

>> they didn't go changing MBH's methodology on a whim.

>

>The wording in M&M's paper seems rather poor, since it does say they

>added the mean rather than subracting. However, in fact there is an

>effect somewhat similar to their claims when the late-sample mean is

>subtracted. I do not believe that their conclusions are justified

>however - the effect is an offset at the end of the 1st PC, rather

>than a strong slope as in MBH98. In fact, the method itself does not

>generate any significant trends. Also, the offset is only about 1sd

>(late sample mean is about 1sd from the whole mean), whereas their

>nature submission looks like about 3sd (certainly well over 2). I

>could not get anything close to 2sd with many replications.

>

>This is only a preliminary look and I might have something more to say

>later. But I'm a bit busy for the next week or so...

>

I hope to have something by Tuesday or Wednesday next week.

I'll post it here when I get something and will post some graphics to

my website.

Oct 28, 2004, 10:53:07 PM

On Wed, 20 Oct 2004 11:59:08 +0900, James Annan

<still_th...@hotmail.com> wrote:


>w...@bas.ac.uk wrote:

>

>> But (having read their paper) I now think I understand what they think the

>> problem is (aside: they complain about data issues with some series but

>> I think this is beside the point: the *main* point they are talking about

>> is below), and I think that they are probably wrong, based on reading

>> MBH's fortran (aside: fortran is a terrible language for doing this stuff,

>> they should use a vector language like IDL). But anyway:

>

>Having had a quick glance at this and their papers, I think I agree with

>you. In fact it appears that we can add not knowing the difference

>between multiplication and division, to the already impressive list of

>blunders that M&M have made. They even seem to talk about _adding_ the

>mean to the time series rather than _subtracting_ it too. I might check

>this more carefully over the next few days if no-one else beats me to it.

>

I went through the M&M process of generating random vectors

using an AR(1) model with added Gaussian noise and determined the PCs

for that data. I then calculated the mean and standard deviation as

per M&M at:

http://www.uoguelph.ca/~rmckitri/research/fallupdate04/submission.1.final.pdf

pages 4 and 5, standardized the data, then recomputed the PCs. I

don't know what M&M are doing, but as William suspected, the impact of

removing the mean is to reduce the variance in the PC. It doesn't

change the overall shape of the PC. I also added the mean, since as

you pointed out, they kind of hinted that that was what they were

going to do and surprise, surprise, it increased the variance in the

data. At least on that point, M&M are quite wrong.

Nov 3, 2004, 11:38:28 AM

OK, I have done a tiny bit more on this. Previous posts refer.

w...@bas.ac.uk wrote:

>MBH's software is at:

>ftp://holocene.evsc.virginia.edu/pub/MBH98/TREE/ITRDB/NOAMER/pca-noamer.f

(btw, McI says I imply that they don't ref this: of course they do: I got

the location of this from M&M's stuff: I'm assuming that it is therefore

the correct file but I could be wrong...)

I'm using this, and answering "1000" to when-should-the-series-date-to. I

don't think the exact date is crucial. So, I said (and no-one seems to dispute

points 1-6) that the above prog:

>1 - remove the 1902-1980 mean

>2 - calc the SD over this period

>3 - divide the whole series by this SD, point by point

>4 - remove the linear trend from the new 1902-1980 series

>5 - compute the SD again for 1902-1980 of the detrended data

>6 - divide the whole series by this SD.

>Then the SVD type stuff begins. So... what does that all mean? It certainly

>looks a bit odd, because steps 1-3 appear redundant. The scaling done in 4-6

>is all you need. Is the scaling of 1-3 *harmful*? Not obviously.

I was wrong to say that step 1 is redundant. It's step 3 (and therefore 2) that

looked redundant, but possibly harmful. I've now done the obvious, and re-run

the code with the appropriate line commented out (it's:

C aprox(iyear,j)=aprox(iyear,j)/sdprox(j)

if you're reading the code). When you do this, the output appears identical:

i.e., step 3 is redundant but *harmless*. I think (on reflection) that this is

obvious. But it wasn't obvious before. (Removing step 1 as well makes the first

eigenvalue far more dominant).
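The harmlessness of step 3 follows algebraically: any constant scaling applied before steps 4-6 cancels when the series is divided by the detrended SD. A quick numeric check (a sketch only, using a simplified full-span window rather than 1902-1980):

```python
from statistics import pstdev

def detrend_standardize(x, years):
    # Steps 4-6 above: divide the whole series by the SD of its linearly
    # detrended values (computed over the full span here, for brevity).
    n = len(x)
    yb = sum(years) / n
    xb = sum(x) / n
    slope = (sum((y - yb) * (v - xb) for y, v in zip(years, x))
             / sum((y - yb) ** 2 for y in years))
    resid = [v - xb - slope * (y - yb) for y, v in zip(years, x)]
    return [v / pstdev(resid) for v in x]

years = list(range(100))
x = [0.3 * y + (y * y) % 7 for y in years]   # arbitrary wiggly series
scaled = [v / 2.5 for v in x]                # a stand-in for the step-3 division

# Scaling first changes nothing: the constant divides out of the final SD.
a = detrend_standardize(x, years)
b = detrend_standardize(scaled, years)
assert all(abs(u - v) < 1e-9 for u, v in zip(a, b))
```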

If you didn't bother to follow all that, then the bottom line is that a small piece

of confusion is cleared away and it's a bit more likely that M&M are wrong. But

re-doing the Monte-Carlo stuff is still a useful idea and remains to be done.

And I hope someone else will check this.

ps: M&M have (in the past) made a great virtue of having their software available

for others to check. It's how Tim L found the degrees/radians bug after all.

Does anyone know if their current monte-carlo stuff is online?

Nov 3, 2004, 4:36:18 PM

I've now done some more. It's grown enough that posting it is

silly, so it's now on my web site:

http://www.wmconnolley.org.uk/sci/mbh/

If what I have put there is right, then M&M are wrong. But is it right?

Well, I invite comment.

Nov 3, 2004, 4:40:24 PM

Err, following myself, how dull.

Take a look at:

http://www.uoguelph.ca/~rmckitri/research/fallupdate04/update.fall04.html

I thought they were sailing a bit close to the wind by posting the

referees' comments: breach of privacy and all that. And now they've

been dismasted.

Anyone take a copy of the original?
