In this data set we have four variables: lat and long specify the latitude and longitude of a vertex (i.e. a corner of the polygon), id specifies the name of a region, and group provides a unique identifier for contiguous areas within a region (e.g. if a region consisted of multiple islands). To get a better sense of what the data contains, we can plot mi_counties using geom_point(), as shown in the left panel below. In this plot, each row in the data frame is plotted as a single point, producing a scatterplot that shows the corners of every county. To turn this scatterplot into a map, we use geom_polygon() instead, which draws each county as a distinct polygon. This is illustrated in the right panel below.
To introduce these functions, we rely on the ozmaps package by Michael Sumner which provides maps for Australian state boundaries, local government areas, electoral boundaries, and so on (Sumner 2019). To illustrate what an sf data set looks like, we import a data set depicting the borders of Australian states and territories:
This output shows some of the metadata associated with the data (discussed momentarily), and tells us that the data is essentially a tibble with 9 rows and 2 columns. One advantage to sf data is immediately apparent, we can easily see the overall structure of the data: Australia is comprised of six states and some territories. There are 9 distinct geographical units, so there are 9 rows in this tibble (cf. mi_counties data where there is one row per polygon vertex).
The most important column is geometry, which specifies the spatial geometry for each of the states and territories. Each element in the geometry column is a multipolygon object which, as the name suggests, contains data specifying the vertices of one or more polygons that demark the border of a region. Given data in this format, we can use geom_sf() and coord_sf() to draw a serviceable map without specifying any parameters or even explicitly declaring any aesthetics:
It is worth noting that the first layer to this plot maps the fill aesthetic in onto a variable in the data. In this instance the NAME variable is a categorical variable and does not convey any additional information, but the same approach can be used to visualise other kinds of area metadata. For example, if oz_states had an additional column specifying the unemployment level in each state, we could map the fill aesthetic to that variable.
Adding labels to maps is an example of annotating plots (Chapter 8) and is supported by geom_sf_label() and geom_sf_text(). For example, while an Australian audience might be reasonably expected to know the names of the Australian states (and are left unlabelled in the plot above) few Australians would know the names of different electorates in the Sydney metropolitan region. In order to draw an electoral map of Sydney, then, we would first need to extract the map data for the relevant electorates, and then add the label. The plot below zooms in on the Sydney region by specifying xlim and ylim in coord_sf() and then uses geom_sf_label() to overlay each electorate with a label:
The warning message is worth noting. Internally geom_sf_label() uses the function st_point_on_surface() from the sf package to place labels, and the warning message occurs because most algorithms used by sf to compute geometric quantities (e.g., centroids, interior points) are based on an assumption that the points lie in on a flat two dimensional surface and parameterised with Cartesian co-ordinates. This assumption is not strictly warranted, and in some cases (e.g., regions near the poles) calculations that treat longitude and latitude in this way will give erroneous answers. For this reason, the sf package produces warning messages when it relies on this approximation.
Though geom_sf() is special in some ways, it nevertheless behaves in much the same fashion as any other geom, allowing additional data to be plotted on a map with standard geoms. For example, we may wish to plot the locations of the Australian capital cities on the map using geom_point(). The code below illustrates how this is done:
In this example geom_point is used only to specify the locations of the capital cities, but the basic idea can be extended to handle point metadata more generally. For example, if the oz_capitals data were to include an additional variable specifying the number of electorates within each metropolitan area, we could encode that data using the size aesthetic.
The second issue is the shape of your map. The Earth is approximately ellipsoidal, but in most instances your spatial data need to be drawn on a two-dimensional plane. It is not possible to map the surface of an ellipsoid to a plane without some distortion or cutting, and you will have to make choices about what distortions you are prepared to accept when drawing a map. This is the job of the map projection.
Taken together, the geodetic datum (e.g, WGS84), the type of map projection (e.g., Mercator) and the parameters of the projection (e.g., location of the origin) specify a coordinate reference system, or CRS, a complete set of assumptions used to translate the latitude and longitude information into a two-dimensional map. An sf object often includes a default CRS, as illustrated below:
Most of this output corresponds to a well-known text (WKT) string that unambiguously describes the CRS. This verbose WKT representation is used by sf internally, but there are several ways to provide user input that sf understands. One such method is to provide numeric input in the form of an EPSG code (see ). The default CRS in the oz_votes data corresponds to EPSG code 4283:
In ggplot2, the CRS is controlled by coord_sf(), which ensures that every layer in the plot uses the same projection. By default, coord_sf() uses the CRS associated with the geometry column of the data1. Because sf data typically supply a sensible choice of CRS, this process usually unfolds invisibly, requiring no intervention from the user. However, should you need to set the CRS yourself, you can specify the crs parameter by passing valid user input to st_crs(). The example below illustrates how to switch from the default CRS to EPSG code 3112:
As noted earlier, maps created using geom_sf() and coord_sf() rely heavily on tools provided by the sf package, and indeed the sf package contains many more useful tools for manipulating simple features data. In this section we provide an introduction to a few such tools; more detailed coverage can be found on the sf package website -spatial.github.io/sf/.
To get started, recall that one advantage to simple features over other representations of spatial data is that geographical units can have complicated structure. A good example of this in the Australian maps data is the electoral district of Eden-Monaro, plotted below:
Normally when we print the edenmonaro object the output would display all the additional information (dimension, bounding box, geodetic datum etc) but for the remainder of this section we will show only the relevant lines of the output. In this case edenmonaro is defined by a MULTIPOLYGON geometry containing one feature:
Suppose, however, our interest is only in mapping the islands. If so, we can first use the st_cast() function to break the Dawson electorate into the constituent polygons. After doing so, we can use st_area() to calculate the area of each polygon and which.max() to find the polygon with maximum area:
A second way to supply geospatial information for mapping is to rely on raster data. Unlike the simple features format, in which geographical entities are specified in terms of a set of lines, points and polygons, rasters take the form of images. In the simplest case raster data might be nothing more than a bitmap file, but there are many different image formats out there. In the geospatial context specifically, there are image formats that include metadata (e.g., geodetic datum, coordinate reference system) that can be used to map the image information to the surface of the Earth. For example, one common format is GeoTIFF, which is a regular TIFF file with additional metadata supplied. Happily, most formats can be easily read into R with the assistance of GDAL (the Geospatial Data Abstraction Library, ). For example the sf package contains a function sf::gdal_read() that provides access to the GDAL raster drivers from R. However, you rarely need to call this function directly, as there are other high level functions that take care of this for you.
In the code above, the first argument specifies the path to the raster file, and the RasterIO argument is used to pass a list of low-level parameters to GDAL. In this case, we have used nBufXSize and nBufYSize to ensure that R reads the data at low resolution (as a 600x600 pixel image). To see what information R has imported, we can inspect the sat_vis object:
One limitation to displaying only the raw image is that it is not easy to work out where the relevant landmasses are, and we may wish to overlay the satellite data with the oz_states vector map to show the outlines of Australian political entities. However, some care is required in doing so because the two data sources are associated with different coordinate reference systems. To project the oz_states data correctly, the data should be transformed using the st_transform() function from the sf package. In the code below, we extract the CRS from the sat_vis raster object, and transform the oz_states data to use the same system.
Having done so, we can now draw the vector map over the top of the raster image to make the image more interpretable to the reader. It is now clear from inspection that the satellite image was taken during the Australian sunrise:
What if we wanted to plot more conventional data over the top? A simple example of this would be to plot the locations of the Australian capital cities per the oz_capitals data frame that contains latitude and longitude data. However, because these data are not associated with a CRS and are not on the same scale as the raster data in sat_vis, these will also need to be transformed. To do so, we first need to create an sf object from the oz_capitals data using st_as_sf():
c80f0f1006