The actual data is pretty sparse, and gets sparser (and less accurate)
the further back in time one looks.
There are a number of difficulties inherent in producing a global
average temperature from noisy data. The type of instrument, how it
was read, who recorded the reading, how often and at what time of
day it was taken, and the location of the instrument all affect the
accuracy of the data. Much of the pre-1960 data was collected by
amateur observers and collated by NOAA et al.
One must also consider that the land-based surface temperature
data (which is the bulk of the historic record) is not uniformly
distributed across the globe, but rather concentrated in North
America and Western Europe. The data is often augmented with
sea-surface temperature measurements, which are subject to similar
levels of uncertainty (bucket measurements, engine-intake
measurements, type of instrument, and so on).
Most of the data-sets being used for analysis (and re-analysis) are
adjusted with various methods that attempt to account for the
factors above. The quality of the adjustments (which seem, more
often than not, to cool the past) is subject to some concern from
various folks, albeit mostly on the sceptical side.
The average for the day at any given measurement site is produced
by summing the recorded daily low and the recorded daily high and
dividing by two. Note that the warming signal produced is
influenced as much by a rise in the daily low as by a rise in the
daily high, and an analysis of the raw data shows that the bulk of
the 20th-century warming came from higher overnight lows rather
than from higher daytime high temperatures.
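As a quick illustration of that arithmetic (a minimal sketch, not
drawn from any particular data-set), the daily mean responds
identically to a one-degree rise in either extreme:

```python
def daily_mean(t_min: float, t_max: float) -> float:
    """Daily mean as used in most historical records: (low + high) / 2."""
    return (t_min + t_max) / 2.0

# A one-degree rise in the overnight low...
print(daily_mean(10.0, 25.0))  # 17.5
print(daily_mean(11.0, 25.0))  # 18.0
# ...moves the daily mean exactly as much as a one-degree rise in the high.
print(daily_mean(10.0, 26.0))  # 18.0
```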
When data (e.g. a daily measurement at a given site) is missing,
the algorithms will attempt to 'infill' the value using the
_trends_ from surrounding stations. Many of the louder sceptics
misunderstand this adjustment: they assume that the daily values
from surrounding stations (up to 1,000 km away) are used directly,
rather than the trends, which indeed would be foolish. It's not
clear that the trends should necessarily match between two stations
1,000 km apart, but that's a different issue, driven primarily by
land-use changes (i.e. urbanization) around the stations.
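To make the distinction concrete, here is a simplified, hypothetical
sketch of anomaly-based infilling. It is not any agency's actual
code; the `Station` structure, the `distance_km` helper, and the
1,000 km cutoff are illustrative assumptions. The point is that the
neighbours contribute their _departures_ from their own long-term
means, not their absolute readings:

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Station:
    lat: float
    lon: float
    climatology: dict[int, float]   # day-of-year -> long-term mean temperature
    readings: dict[int, float]      # day-of-year -> observed temperature (may have gaps)

def distance_km(a: Station, b: Station) -> float:
    """Great-circle distance between two stations (haversine formula)."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def infill(target: Station, neighbours: list[Station], day: int,
           max_km: float = 1000.0) -> float | None:
    """Estimate a missing reading at `target` from neighbour anomalies, not raw values."""
    anomalies = [
        n.readings[day] - n.climatology[day]      # neighbour's departure from its own mean
        for n in neighbours
        if day in n.readings and day in n.climatology
        and distance_km(target, n) <= max_km
    ]
    if not anomalies or day not in target.climatology:
        return None
    # Target's own climatology plus the average departure seen at its neighbours.
    return target.climatology[day] + sum(anomalies) / len(anomalies)
```

Because each neighbour's own climatology is subtracted out first, a
station that sits in a systematically warmer or cooler spot does not
drag the estimate toward its absolute readings; only its day-to-day
departure matters.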
One may argue all day about each of the various adjustments
(TOBS - Time of Observation, type of thermometer, surrounding
elements - e.g. asphalt, building A/C compressors, etc.), but
deriving a global temperature accurate to a tenth of a degree
seems an aggressive goal given the quality of the data.