Notes on Hansen & Lebedeff 1987

Gareth Rees

Dec 13, 2009, 4:03:40 PM
to ccc-giste...@googlegroups.com
I’ve been reading Hansen and Lebedeff 1987, “Global Trends of
Measured Surface Air Temperature”, and making notes on things that
stood out to me and that we might want to consider for later stages
of the project. I’m sure you’re all ahead of me on this.


1. Configurable parameters

Hansen and Lebedeff mention that they tried a number of different
values for the parameters of their algorithm, and argue that the ones
they used are either (a) the best in some sense that they define, or
(b) arbitrary, with other reasonable choices making little or no
difference. One part of replicating their work should involve
checking these claims.

An example of (a): the figure of 20 years for the minimum overlap to
use when combining station records.

“For the results we present, we used only station records which had
an overlap of 20 years or more with the combination of other stations
within 1200 km. We tested other choices for this overlap period and
found little effect on the global and zonal results. Some effect could
be seen on global maps of derived temperature change; a limit of 5
years or less caused several unrealistic local hot spots or cold spots
to appear, while a limit greater than 20 years caused a significant
reduction in the global area with station coverage.”

An example of (b): the order in which stations are considered for
combining.

“However, we have tested the effect of other choices for station
ordering, for example, by beginning with the station closest to the
subbox center, rather than the station of longest record. The
differences between the results for alternative choices were found to
be very small, about 2 orders of magnitude less than the typical
long-term temperature trend.”

So we should present the algorithm with a configuration table of
parameters that it would be sensible to adjust, and test the effects
of these parameters.

Here’s a partial list of things we should consider as parameterizable
(a sketch of such a configuration table follows the list):

* The number of sub-boxes per box. (Currently 100. This appears in
several places in the code: notably the bare constants ‘10’ and
‘0.1’ in eqarea.py; and the variable ‘nsubbox’ in step3.py and
step5.py.)
* The distance at which a station is no longer considered for
contribution to the temperature of a sub-box. (Currently 1200 km;
variable ‘radius’ in step5.py.)
* The minimum overlap to use when combining station records.
(Currently, this is the variable ‘novrlp’ in code/step5.py, I
think.)
* The order of combination of stations. This isn’t a numeric
parameter: instead there’s a choice of algorithms. (Currently it’s
the ‘combine’ function in step3.py and step5.py.) See my
discussion below for more about this.
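
Here’s a rough sketch of what I have in mind for the configuration
table. All the names below are made up for illustration; they don’t
correspond to anything in the current code, which uses the bare
constants and variables noted above.

    # Illustrative only: one table gathering the tunable parameters
    # currently scattered through eqarea.py, step3.py and step5.py.
    parameters = dict(
        subboxes_per_box=100,      # the constants 10/0.1 in eqarea.py,
                                   # 'nsubbox' in step3.py and step5.py
        station_radius_km=1200,    # 'radius' in step5.py
        min_overlap_years=20,      # 'novrlp' in step5.py (I think)
        combine_order='longest_record_first',
        # alternative: 'closest_to_subbox_centre', as discussed below
    )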


2. Order of combination

The algorithm described by Hansen and Lebedeff (sort the stations by
number of records, then combine them in order) looks as though it runs
the risk of throwing out data from some stations that could have been
combined had the combination occurred in another order.

Suppose, for example, that we have three stations: station A has
records from 1880 to 1940, B from 1920 to 1960, and C from 1940 to
1990. We sort them into order by number of records, getting the order
A;C;B. We then try to combine A and C, but they overlap in at most a
single year, far short of the 20-year minimum, so we can’t combine
them, and we ignore the data from C. However, if only we had combined
them in the order A;B;C instead, we would have been able to include
the data from all three stations.

Perhaps this loss of data rarely or never happens in practice, but
that’s something we could check.

In any case, there’s a way to combine the stations that guarantees
that as much data will be included as possible. Consider all pairs of
stations: pick the pair of stations with the greatest overlap and
combine them. Repeat until no stations or combinations can be combined
with any others.

This has a risk that you’ll end up with multiple combinations of
stations, none of which overlap enough with any of the others for it
to be possible to combine them. But if that happens, it’s a real
feature of the data and perhaps ought to be dealt with in some better
way than throwing out all but one of the combinations.
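
Here’s a sketch of the sort of thing I mean. This isn’t the ‘combine’
function from step3.py or step5.py, and the combination step is much
simplified (annual data, a plain offset-and-average, equal weights);
records here are just dicts mapping year to anomaly.

    def overlap(a, b):
        return set(a) & set(b)

    def combine_pair(a, b):
        # Offset b so that it agrees with a on average over the
        # overlap, then average the two wherever both have data.
        common = overlap(a, b)
        bias = (sum(a[y] for y in common) -
                sum(b[y] for y in common)) / len(common)
        merged = dict(a)
        for year, value in b.items():
            value += bias
            merged[year] = (a[year] + value) / 2.0 if year in a else value
        return merged

    def combine_all(records, min_overlap=20):
        # Repeatedly combine the pair (of stations or of existing
        # combinations) with the greatest overlap, until nothing can
        # be combined with anything else.
        records = list(records)
        while len(records) > 1:
            n, i, j = max((len(overlap(records[i], records[j])), i, j)
                          for i in range(len(records))
                          for j in range(i + 1, len(records)))
            if n < min_overlap:
                break
            merged = combine_pair(records[i], records[j])
            records = [r for k, r in enumerate(records) if k not in (i, j)]
            records.append(merged)
        # More than one record may remain, if some combinations never
        # overlap each other by min_overlap years.
        return records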


3. Error analysis

Some of Hansen and Lebedeff’s choices are justified by reference to
their error analysis.

“We have also tested alternatives to these procedures and compared
the error estimates for the alternatives, the error estimates being
obtained as described in Section 5. For example, we tried weighting
each box by the box area and each zone by the zone area, rather than
weighting by the area with a defined temperature change. Overall
temperature changes were similar with the different procedures, but
the procedure as we defined it previously was found to yield the
smallest errors of the alternatives which were tested.”

I have to say I don’t yet fully understand the section on error
analysis, but it seems to involve running a Global Climate Model and
seeing how accurately a set of temperature stations can predict global
and regional anomalies within the model.
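
In outline I imagine the test looks something like the sketch below.
This is only my guess at the idea, and none of it comes from their
code or from the GISS model: I just sample the model field at the
grid cells containing stations and compare an area-weighted mean of
the samples with the true area-weighted mean of the whole field.
Presumably the real test runs the whole anomaly analysis on the
sampled model data.

    import numpy as np

    def sampling_error(model_field, lats, station_cells):
        # model_field: model anomalies, shape (nyears, nlat, nlon).
        # lats: latitude in degrees of each of the nlat grid rows.
        # station_cells: list of (ilat, ilon) cells containing stations.
        nyears, nlat, nlon = model_field.shape
        w = np.tile(np.cos(np.radians(lats))[:, None], (1, nlon))
        true_mean = (model_field * w).sum(axis=(1, 2)) / w.sum()
        ii = [i for i, j in station_cells]
        jj = [j for i, j in station_cells]
        ws = w[ii, jj]
        est = (model_field[:, ii, jj] * ws).sum(axis=1) / ws.sum()
        # RMS difference, over the run, between the estimate from the
        # station cells and the true global mean.
        return np.sqrt(np.mean((true_mean - est) ** 2))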

If I understand rightly, they used a single run from the model to do
their error analysis. I guess morally speaking they ought to run the
model many times with different parameters, but in 1987 I doubt they
had enough computer time to do that. (They’ve undoubtedly done many
more runs since 1987, with updated and improved GCMs.)

Is the error analysis within the scope of our project?

--
Gareth Rees

Richard Hendricks

Dec 14, 2009, 11:56:46 AM
to ccc-giste...@googlegroups.com
On Sun, Dec 13, 2009 at 3:03 PM, Gareth Rees <garet...@pobox.com> wrote:
> “For the results we present, we used only station records which had
> an overlap of 20 years or more with the combination of other stations
> within 1200 km. We tested other choices for this overlap period and
> found little effect on the global and zonal results. Some effect could
> be seen on global maps of derived temperature change; a limit of 5
> years or less caused several unrealistic local hot spots or cold spots
> to appear, while a limit greater than 20 years caused a significant
> reduction in the global area with station coverage.”

I don't understand why shorter overlaps cause these hot and cold spots.
Did they explain this further?

> If I understand rightly, they used a single run from the model to do
> their error analysis. I guess morally speaking they ought to run the
> model many times with different parameters, but in 1987 I doubt they
> had enough computer time to do that. (They’ve undoubtedly done many
> more runs since 1987, with updated and improved GCMs.)

I didn't get that from what you quoted. Was there additional text that
implied they only did one run? I would be very surprised if they did just
one run of a GCM; even then they knew that minor input effects could change
the results of a single run (hence all GCM results are "ensemble" type
outputs, not just single runs, unless specifically discussing the
effects in a single run).

> Is the error analysis within the scope of our project?

I think it should be. "Verifying the code works as intended" implies
checking "code works" and "intended" IMO. :)


--
"Thus, from the war of nature, from famine and death, the most exalted
object of which we are capable of conceiving, namely, the production
of the higher animals, directly follows. There is grandeur in this
view of life, with its several powers, having been originally breathed
into a few forms or into one; and that, whilst this planet has gone
cycling on according to the fixed law of gravity, from so simple a
beginning endless forms most beautiful and most wonderful have been,
and are being, evolved." --Charles Darwin

William Connolley

Dec 14, 2009, 12:24:40 PM
to ccc-giste...@googlegroups.com
>> If I understand rightly, they used a single run from the model to do
>> their error analysis. I guess morally speaking they ought to run the
>> model many times with different parameters, but in 1987 I doubt they
>> had enough computer time to do that. (They’ve undoubtedly done many
>> more runs since 1987, with updated and improved GCMs.)
>
> I didn't get that from what you quoted.  Was there additional text that
> implied they only did one run?  I would be very surprised if they did just
> one run of a GCM; even then they knew that minor input effects could change
> the results of a single run (hence all GCM results are "ensemble" type
> outputs, not
> just single runs, unless specifically discussing the effects in a single run).

Back in '87 computers were too tiny for ensembles.

However, the error analysis doesn't depend particularly heavily on
which run you use - it just has to be any old vaguely plausible
temperature field. Or you could use successive years/months/seasons
from one run.

-William

--
William M. Connolley | www.wmconnolley.org.uk | 07985 935400

Gareth Rees

Dec 14, 2009, 1:34:43 PM
to ccc-giste...@googlegroups.com
It's worth reading the entire paper — http://pubs.giss.nasa.gov/docs/1987/1987_Hansen_Lebedeff.pdf
— it's a pretty clear description of the anomaly computation (as it
stood in 1987) and doesn't demand much (if any) knowledge of climate
science.

Richard Hendricks wrote:
>> “For the results we present, we used only station records which had
>> an overlap of 20 years or more with the combination of other stations
>> within 1200 km. We tested other choices for this overlap period and
>> found little effect on the global and zonal results. Some effect could
>> be seen on global maps of derived temperature change; a limit of 5
>> years or less caused several unrealistic local hot spots or cold spots
>> to appear, while a limit greater than 20 years caused a significant
>> reduction in the global area with station coverage.”
>
> I don't understand why shorter overlaps cause temperature spots.
> Did they explain this further?

There's no explanation beyond what I quoted above (see page 13,350,
right column). However, it seems clear how it could happen: allowing
shorter overlaps will lead to station records occasionally being
combined on the basis of unrepresentative (outlier) years. Imagine the
extreme case in which we permit temperature records to be combined on
the basis of a single year of overlap, and imagine two stations being
combined on the basis of a single year in which station A was
unusually hot and station B was unusually cold. We'd end up with a
local anomaly that was misleadingly warm (if A was combined into B) or
cold (if B was combined into A). This will happen only rarely (when
there are stations which only overlap with their neighbours for
outlier years), leading to local hot- and cold-spots.
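
To make that concrete, here's a toy calculation (the numbers are made
up, and the adjustment below is just the simple offset-over-the-overlap
step, not the exact weighted combination in the code):

    # Two records that overlap only in 1940, which happened to be
    # unusually hot at A and unusually cold at B.
    a = {1938: 0.1, 1939: 0.0, 1940: 1.5}
    b = {1940: -1.5, 1941: 0.0, 1942: 0.1}

    # Offsetting B to agree with A over the one-year overlap shifts
    # all of B up by a[1940] - b[1940] = 3.0 degrees, so B's later
    # years come out about 3 degrees too warm in the combined record.
    bias = a[1940] - b[1940]
    combined = dict(a)
    combined.update((year, value + bias) for year, value in b.items())
    combined[1940] = (a[1940] + b[1940] + bias) / 2.0
    print(combined)
    # {1938: 0.1, 1939: 0.0, 1940: 1.5, 1941: 3.0, 1942: 3.1}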

>> If I understand rightly, they used a single run from the model to do
>> their error analysis. I guess morally speaking they ought to run the
>> model many times with different parameters, but in 1987 I doubt they
>> had enough computer time to do that. (They’ve undoubtedly done many
>> more runs since 1987, with updated and improved GCMs.)
>
> I didn't get that from what you quoted. Was there additional text that
> implied they only did one run?

It was a general impression I got from section 5. Here's part of the
introduction to the section (pages 13,360–13,362):

“We obtain a quantitative estimate of the error due to imperfect
spatial and temporal coverage with the help of a 100-year run of a
general circulation model (GCM). The GCM is model II, described by
Hansen et al. [1983]. In the 100-year run the ocean temperature was
computed, but hor/zontal ocean heat transports were fixed (varying
geographically and seasonally, but identical from year to year) as
described by Hansen et al. [1984]. The ocean mixed layer depth also
varied geographically and seasonally, and no heat exchange occurred
between the mixed layer and the deeper ocean. This 100-year run will
be described in more detail elsewhere, since it serves as the control
run for several transient CO2/trace gas climate experiments.”

> I would be very surprised if they did just one run of a GCM; even
> then they knew that minor input effects could change the results of
> a single run (hence all GCM results are "ensemble" type outputs, not
> just single runs, unless specifically discussing the effects in a
> single run).

I don't doubt that they have done multiple runs of their GCMs over the
years, but section 5 of the paper implies to me that they used a
single run of the model to do their error estimation for their 1987
results. Do read the paper and see if you agree with me.

--
Gareth Rees

Gareth Rees

Dec 14, 2009, 1:50:47 PM
to ccc-giste...@googlegroups.com
William Connolley wrote:
> However, the error anlysis doesn't depend particularly heavily on
> which run you use - it just has to be any old vaguely plausible
> temperature field. Or you could use successive years/months/seasons
> from one run.

This isn't obvious to me. Couldn't it be the case that the
distribution of stations we've got is better at detecting certain
kinds of climate change than others? For example, since most stations
are on land, it seems possible that the GISTEMP algorithm is more
accurate when the land and oceanic temperature trends are strongly
correlated, and less accurate when they are weakly correlated. So you
might want to have several different model runs with different
features in order to get a good idea of the possible range of GISTEMP
performance.

--
Gareth Rees

David Jones

Dec 14, 2009, 4:23:39 PM
to ccc-giste...@googlegroups.com
This seems like Science to me. And whilst it is interesting, I think
it's well out of scope for our project.

Is what the code does clear?

Cheers,
drj