Questions Regarding Station Identifiers

Clearscience
Jan 2, 2011, 10:18:35 AM
to CCC GISTEMP discussion
Hello all,

I was wondering how one could alter the code to give information about
the stations used in the CCC-Gistemp analysis. In particular, I'd like
the number of stations used in each year (for yearly confidence
intervals), and whether there is a way to identify the exact name or
geographic location of each station used.

Any ideas?

David Jones
Jan 10, 2011, 10:08:55 AM
to ccc-giste...@googlegroups.com

Yes (but to do exactly what you want will require a bit more coding).

Some of this information is already collected in log files, but some is not.

step3.py produces a log file, log/step3.log.

For each cell processed, the list of stations and their weights is logged:

-36.5-010.5C stations [["141689060008", 0.64142958169039], ["141689020000", 0.0], ["141689020001", 0.0]]

"-36.5-010.5C" identifies the cell (by the location of its centre);
"stations" is just a keyword used for logging; and the remainder of
the line is a list of (station, weight) pairs. This is a deliberately
short example; typically there are hundreds of stations contributing
to a cell.
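
If you want to process the log yourself, something like the following
would split one such line into the cell identifier and its (station,
weight) pairs. It's only a sketch: it assumes each record sits on a
single line in the form shown above, with the pairs written as a JSON
list, and the function names parse_stations_line and
read_station_records are made up for illustration.

```python
import json

def parse_stations_line(line):
    """Return (cell_id, [(station_id, weight), ...]) or None.

    Assumes a step3.log record looks like
    '<cell> stations <JSON list of [station_id, weight] pairs>'.
    Lines with any other keyword are ignored (None is returned).
    """
    parts = line.strip().split(None, 2)
    if len(parts) != 3 or parts[1] != "stations":
        return None
    cell, _keyword, payload = parts
    pairs = [(str(sid), float(w)) for sid, w in json.loads(payload)]
    return cell, pairs

def read_station_records(path):
    """Yield every 'stations' record found in a log file such as log/step3.log."""
    with open(path) as f:
        for line in f:
            record = parse_stations_line(line)
            if record is not None:
                yield record

example = ('-36.5-010.5C stations [["141689060008", 0.64142958169039], '
           '["141689020000", 0.0], ["141689020001", 0.0]]')
cell, pairs = parse_stations_line(example)
print(cell, len(pairs))  # -36.5-010.5C 3
```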

Note that, as in this example, some stations have weight 0.0. That
happens when the station is close enough to the cell to be considered
for combining (within 1200 km for the usual setting of
parameters.gridding_radius) but does not combine with the reference
series because of insufficient overlap.
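
To illustrate the "sufficiently close" test, here is one way to check
whether a station lies within the gridding radius of a cell centre,
using the haversine great-circle formula. This is an assumption on my
part: ccc-gistemp's own distance calculation and Earth radius constant
may differ. A station can pass this test and still end up with weight
0.0 if it doesn't overlap enough with the reference series.

```python
import math

# Assumed mean Earth radius in km; ccc-gistemp's own constant may differ.
EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_candidate(cell_centre, station, radius_km=1200.0):
    """True if the station (lat, lon) is within radius_km of the cell centre.

    radius_km mirrors the usual 1200 km gridding radius.
    """
    return great_circle_km(*cell_centre, *station) <= radius_km

# A station 10 degrees of latitude from the centre is about 1112 km away.
print(is_candidate((-36.5, -10.5), (-26.5, -10.5)))  # True
```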

Different cells use different sets of stations, and some of these sets
overlap. I have just written a tool to extract a list of stations from
the log file. You need to specify a set of cells with a mask file:

python tool/multi.py whatstations --mask maskfile

It outputs counts followed by a list of stations, like this:

$ tool/multi.py whatstations --mask labradormask
Cells: 15/15
Stations: 284
403710810005 0.00778946286308
403710900006 0.063875092712
403710910000 0.129755186619
403710920000 0.20633374139
...

The weight reported is the maximum weight used over all the cells in
which that station appears.
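
For anyone curious how that summary could be computed, here is a rough
sketch of the aggregation (not the actual multi.py code). It assumes the
log records have already been parsed into (cell, [(station, weight),
...]) pairs and that the mask is simply a set of cell identifiers; the
function names are hypothetical, and reading "Cells: 15/15" as "cells
found in the log / cells in the mask" is my guess.

```python
def what_stations(records, mask):
    """Collect the stations feeding a set of cells.

    records: iterable of (cell_id, [(station_id, weight), ...]).
    mask: set of cell_ids to include (stand-in for the mask file).
    Returns (cells_found, best), where best maps each station_id to
    the maximum weight it received over all masked cells.
    """
    best = {}
    cells_found = 0
    for cell, pairs in records:
        if cell not in mask:
            continue
        cells_found += 1
        for station, weight in pairs:
            best[station] = max(best.get(station, weight), weight)
    return cells_found, best

def report(records, mask):
    """Print a summary in the same shape as the output shown above."""
    cells_found, best = what_stations(records, mask)
    print("Cells: %d/%d" % (cells_found, len(mask)))
    print("Stations: %d" % len(best))
    for station in sorted(best):
        print(station, best[station])
```

Taking the maximum means a station that matters a lot in one cell is
reported with that weight, even if it contributes little elsewhere.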

Some caveats:

[minor] When considering the anomaly series for a large area (say, the
Arctic zone from 64N to 90N), it's possible that some cells will not
combine, so it would be incorrect to count the stations that
contribute to those cells. I don't know if this actually happens.

When combining stations, each station is effectively treated as 12
separate monthly series (one for each month of the year). The station
identifier is written to the log (above) if any month combines, but it
is not (yet) possible to tell which months did or did not combine.
Some stations could contribute all their months, while others might
only contribute March through August (just an example, but this might
be typical for a station occupied only during the boreal summer that
had only very sporadic winter data).
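
To make the "12 separate series" idea concrete, here is a toy version of
a per-month overlap check. It only illustrates the bookkeeping: the
data layout (year mapped to a list of 12 values, None for missing), the
rule "a month combines if it overlaps the reference in at least
min_overlap years", and the default of 20 are all assumptions, and the
real test in step3.py may well differ.

```python
def combining_months(station, reference, min_overlap=20):
    """Return the calendar months (1-12) in which a station could combine.

    station, reference: dicts mapping year -> list of 12 monthly values,
    with None for missing data. min_overlap is a hypothetical threshold,
    not necessarily ccc-gistemp's setting.
    """
    months = []
    for m in range(12):
        # Count years where both the station and the reference have data.
        overlap = sum(
            1 for year, values in station.items()
            if values[m] is not None
            and year in reference
            and reference[year][m] is not None
        )
        if overlap >= min_overlap:
            months.append(m + 1)
    return months

# A summer-only station: data for March (index 2) to August (index 7).
summer_station = {y: [1.0 if 2 <= m <= 7 else None for m in range(12)]
                  for y in range(1980, 2005)}
reference = {y: [0.0] * 12 for y in range(1970, 2010)}
print(combining_months(summer_station, reference))  # [3, 4, 5, 6, 7, 8]
```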

So even given a list of stations, it's not possible to count up the
number of stations contributing to each year (or month).

I'm planning to fix that.
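
Once per-month information is available, the per-year count you asked
about is a straightforward tally. The sketch below assumes a
hypothetical input, a mapping from station identifier to the set of
(year, month) pairs in which that station actually contributed; the log
does not record this today.

```python
from collections import defaultdict

def stations_per_year(contributions):
    """Count distinct contributing stations for each year.

    contributions: dict mapping station_id -> set of (year, month)
    pairs in which that station contributed (hypothetical input).
    Returns a dict mapping year -> number of stations, ordered by year.
    """
    by_year = defaultdict(set)
    for station, year_months in contributions.items():
        for year, _month in year_months:
            by_year[year].add(station)
    return {year: len(stations) for year, stations in sorted(by_year.items())}

example = {
    "a": {(1990, 1), (1990, 2), (1991, 5)},
    "b": {(1990, 7)},
    "c": set(),
}
print(stations_per_year(example))  # {1990: 2, 1991: 1}
```

Counting distinct stations (rather than station-months) means a station
that contributes only a few summer months still counts once for that
year; a per-month tally would need the month kept in the key.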

To use "multi.py whatstations", you'll need the latest version of the
sources from Subversion (SVN). They're here:
http://code.google.com/p/ccc-gistemp/source/checkout . You'll need a
Subversion client.

Cheers,
drj