Bob's Original Discussion of Potential LandMapR upgrades and improvements

37 views
Skip to first unread message

bobmacm

unread,
Jun 8, 2012, 3:03:11 PM6/8/12
to open-geomorph...@googlegroups.com

LandMapR Program Update - Jan 2012


The purpose of this document is to provide a forum within which all interested participants can contribute ideas and opinions about what improvements or additions the LandMapR programs need to become more useful and how to go about producing those extensions or improvements.


1.0 Overall considerations

Some decisions apply to all components of the LandMapR toolkit. Here is my initial list of things we should discuss and agree upon to arrive at a preferred strategy. Please add your own.


1.1 Data model

Do we retain the current tabular database structure for all data or do we move to a more compact raster structure?

A raster structure is more compact and easier to use directly in most GIS software. A database table format has some advantages for storing and managing huge data sets. The TerraFlow project concluded that a database table approach was actually a faster and more efficient way to structure and process massive terrain data sets for computing flow attributes.

See: http://www.cs.duke.edu/geo*/terraflow/

So, it may not be such a bad idea to retain the original LandMapR data table structure and just provide tools to facilitate conversion into and out of the database format (as we currently do with GridReadWrite).  We could alternately define a R-object with the attributes of a data table.

Another advantage of a data table approach is that we can retain explicit x,y coordinates for every point and so support and facilitate analysis of vector point data (e.g. geographic coordinates) as well as regular raster data.

If we instead adopt a raster data model, which format should we use? Presumably we should choose an existing widely used GRID format that can be read and converted using publicly available tools.

1.2. Raster vs vector

Do we retain the current assumption that all data are formatted in a regular raster structure or do we extend the current program by providing an ability to process all data in non-projected geographic coordinates (lat/long) as essentially vector point objects?

Some existing DEM and GIS programs do support calculations of terrain derivatives from un-projected data in geographic coordinates but most do not. The ones that do permit calculation of terrain derivatives from un-projected data in geographic coordinates that I know of are MicroDEM and TNT-Mips.

MicroDEM see http://www.usna.edu/Users/oceano/pguth/website/microdem/microdem.htm

I like the idea of being able to do all calculations on non-projected data in geographic coordinates, as well as on a regular raster grid. This extends the functionality and applicability and it is really not that great a challenge or computational overhead. This additional functionality would greatly enhance the ability of a revised LandMapR toolkit to contribute covariates for the GlobalSoilMap project. This is something I hope to do.


1.3 Core platform

Do we select and adopt a single existing GIS software environment under which all development will take place or do we write a library of generic, stand-alone functions that can then be called from, or used under, any number of existing platforms?

Here are some of the GIS platforms or software environments that various people have suggested we consider using as a basis for the redevelopment of the LandMapR toolkit. Feel free to add your own additional suggestions and to comment on the existing suggestions.
  • SAGA - GIS - Just develop all new code as an extension of SAGA to make use of the already large amount of open source functionality available through SAGA. Also make use of the SAGA GUI and the existing interface between SAGA and R.
  • R - Structure the redevelopment as an R-package or a set of R packages. Use R as the integrating framework and call all utilities and procedures from R. A package called “raster” (created and maintained by Robert Hijmans) now contains a large number of functions for processing raster data including focal and proximity functions. Raster package already contains a generic function called “terrain” that can be used to derive some 7-8 standard land surface parameters (see documentation). Various products from LandMapR can be visualized in Google Earth via the plotKML package, hence there is no need to develop a new GUI to browse geographical maps.
  • Python - Develop all program modules using Python or a combination of Python and C++ and develop separate wrappers to call the utilities from a number of common open-source platforms such as SAGA, R or others.
  • MapWindows with HydroDesktop - This open source project already prvides both a GIS GUI and a suite of flow modelling tools. http://www.mapwindow.org/
  • TAUDEM - The TAUDEM software is well established and provides many of the same functions as LandMapR. The major difference is the approach to pit removing (pit flooding is used in TAUDEM). http://hydrology.usu.edu/taudem/taudem5.0/index.html
  • Quantum GIS - Quantum GIS (QGIS) is a user friendly Open Source Geographic Information System (GIS) licensed under the GNU General Public License. Quantum might be an ideal candidate to provide the GUI and GIS functionality for a revised LandMapR toolkit.
  • Whitebox GIS - The Whitebox GAT project is an exciting new open-source GIS project that is build upon the earlier Terrain Analysis System (TAS) of John Lindsay. John is at Guelph and is Canadian so it does make some sense to cooperate with him and use what he has produced to date to avoid re-inventing the wheel. http://www.uoguelph.ca/~hydrogeo/Whitebox/
  • ESRI, ArcGIS - Develop the LandMapR toolkit as an extension for ArcGIS. This is not a preferred option as it does not meet the objective of supporting free and open access.

I am sure there are many more. But this list is already long enough to give us some pause to think about why we need another complete stand alone version or a terrain analysis and flow modeling toolkit when so many already exist.

Let the discussions begin.


2.0 FlowMapR redevelopment

FlowMapR is the key module that needs to be improved and redeveloped. The only really major difference between LandMapR and all the other available flow modeling programs is the LandMapR approach to identifying, characterizing and removing pits. All the other benefits of the LandMapR tool kit are based on this unique pit removing strategy. But the current pit removing is very slow and is very limited in the size of data set it can process. So, we need to fix that.

Here is my attempt to list the main improvements that I think we need to make for FlowMapR.


2.1 Process non-projected vector x,y geographic coordinates

I think it would be very useful to extend FlowMapR so that it could proecss non-projected x,y coordinate data, in addition to regular raster data.


2.2 Improve identification of flat areas (both normal and inverted DEMs)

Flat areas are areas of contiguous cells of all the same elevation. Right now, the program does not recognize that all contiguous cells that touch one another are part of the same flat area. If we could label all calls of the same elevation that touch any other cell of the same elevation as belonging to the same unique flat area, we could improve the processing of flat areas, particularly flat areas within depressions. We should then locate a single cell at the centre of each unique flat area that does not have outflow (e.g. is in a depression) and nominate that single cell as a pit centre cell. Maybe change the elevation of that pit centre cell by a very small amount (e.g. 0.0001 m) so as to force it to be the one and only pit centre for that flat area. This will eliminate the present situation where there can be multiple adjacent pits all with the same pit centre elevation and zero volume. This situation increases the number of pits that have to be removed and is unnecessary.

2.3 Add additional flow direction codes to permit flow in multiple directions
Presently, FlowMapR using a single flow direction flow algorithm that forces flow to enter one and only one downslope cell (the cell with the greatest vertical drop from the centre cell). The LandMapR flow direction codes use the numbers 1-9 with 5 being used to denote no flow direction (e.g. a pit centre cell).

I want to retain a single flow direction model for modeling initial flow but I want to suggest adding new codes that will permit recognition of divergent or branching flow along linear channel networks. Presently, I know of no raster flow models that permit recognition of branching of flow in the down slope flow direction. This happens in real life, such as when channels split to flow around islands and also when rivers diverge in deltas (forming a special type of island). I am confident that we can identify these types of special locations and assign them codes that will indicate how many down stream cells reveive flow from a given cell and which ones these are. This would give FlowMapR a unique capability not present in other raster flow modeling software.

2.4 Make the pit removing algorithm much faster and more efficient
This is the key improvement that this redevelopment has to achieve. If we do nothing else, we must improve the pit removing function, while maintaining the ability to identify and characterize depressions.

I have a few ideas about how to make the pit removing more efficient or useful. Here they are.


2.4.1 Try a top down algorithm to identify initial global catchments.

I once saw a suggestion for an algorithm for pit removing that worked from the top down instead of bottom up. The idea was that the entire DEM was “flooded” with water and the water level was incrementally reduced 1 unit of elevation at a time. As the water “fell” it was possible to identify points at which global watersheds emerged that were separated by a continuous divide and contributed flow to different outflow points at the edge of the grid.

At present, FlowMapR runs through a complete pit removing exercise, removing pits from bottom (lowest) up, just to identify the high level global watersheds. It this could be accomplished more efficiently using a top down approach, that would be very useful.

The key point about identifying global watersheds, it that a full DEM can be broken apart into rectangles that contain each unique global watershed and these global watershed can be processed independently, either in parallel or series. If they can be processed independently, then the total size of the raster file that needs to be processed at one time can be reduced, sometimes dramatically. Smaller files process much faster than larger ones, so many small files process a lot faster than one large one.


2.4.2 Extract pit cell data to a separate file when removing each pit.

In the present version of FlowMapR, the entire DEM array is held in memory and the entire DEM is sorted from top to bottom by elevation every time a pit is removed. With some DEMs having millions of pits, this sorting really adds a lot of time and processing to pit removing. The reason for sorting is to be able to compute attributes of each new, joined pit, including, its volume, area and depth. Attributes of mm2flood, vol2flood and pond area are also computed for each grid cell within each depression, with attributes computed for each new grid cell that is flooded with each coalescence of two or more pits.

It is really not necessary to process and sort the entire DEM array every time a pit is removed. I did it this way in the beginning because I developed FlowMapR initially just to process rather small DEMs for individual farm fields. As such, the program usually did not have to act on very large rasters (seldom larger than 1,500 rows by 1,500 columns and usually much less). So, it could afford to be inefficient and sometimes slow. There is another way to remove pits that is every bit as correct but that works on much smaller sub-sets of the whole DEM and is therefore much faster.

In 2005-2006 I wrote some FoxPro code outside the official FlowMapR program to remove pits sequentially in order of their implied order of filling (based on the value of mm2flood). I needed to be able to compute 1 additional piece of information for each grid cell that lay within a pit (the ID number of the nested pit to which it belonged). I also wanted to be able to produce maps of extent and depth of flooding at specified amounts of mm2flood. This FoxPro code worked well and I will make it available for use in this recoding exercise.

The important thing about this 2005-2006 custom code is that it demonstrated conclusively that pit removing could be done just as well, and much faster, by extracting to a separate data file information for only those grid cells that belonged to the two local catchments that were currently being merged and, more particularly, to just those cells whose elevation was less than, or equal to, the elevation of the pour point at which the two adjacent catchments overspilled. In short, you only need to know information for those cells that are part of the current merged pit and that lie below the elevation of the pour point at which the two catchments being merged over spill. This extracted file is many times smaller than the entire DEM array and so it is much faster to sort and to process to compute attributes for the pit cells within the current pit. Extracted pit files can be maintained and can “grow” each time an adjacent pit in an adjacent catchment links to the pit with the extracted data. Eventually, there will be as many separate extracted files as their are unique global catchments, but these files will never be larger than the entire DEM and will usually be orders of magnitude smaller, and therefore faster to process.

So, at present, I have just these two suggestions for how we can make pit removing faster and able to process much larger files.

First, define global catchments via a top down draining exercise and split any DEM up into tiles that completely enclose each unique catchment. These tiles should always be no larger than the entire DEM and should usually be much smaller (depending on how many global catchments exist and what their shape is relative to the DEM rectangle).

Second, extract only the relevant data for each pair of pits being coalesced to a separate pit file and process this pit file to compute the attributes of the joined pit and of all the grid cells that ocur within the joined pit (e.g. below the elevation of the pour point).


2.4.3 Add one more attribute of initial (nested) pit ID to each pit cell.

In the 2005-2006 FoxPro program I wrote, one of the objectives was to be able to assign ID labels to each nested pit, so that I could produce a visual display of which pits (by initial ID number) nested within larger pits that subsumed them (Figure 1). In the current FlowMapR program the field that contains information about the pit ID (PitNo) is overwritten each time two pits coalesce to form a new and larger pit. So the final pit ID for each pit is the ID of only the final, and largest, coalesced pit that any grid cell belongs to.


Figure 1. Illustration of nested pits with each unique nested pit assigned a different color

This ability to assign unique integer ID labels to each nested pit becomes even more useful when applied to the nested peaks computed from an inverted DEM. The sequence of nested peaks identifies a hierarchy of nested upland projected above the surrounding plains that coalesce to form ranges or hills and mountains. To my knowledge, no other DEM analysis program is presently able to identify, characterize and label nested pits and nested peaks. This unique capability should be retained and enhanced in FlowMapR.


2.4.4 Make pit removing identical for both normal and inverted DEMs.

The present FlowMapR program only applies stage 1 pit removing to the inverted DEM. It has been suggested (by Thomas Mayr) that it would be useful to be able to remove all pits, within thresholds set by the user, just as is currently done for the normal, non-inverted DEM.


2.4.5 Fix a problem caused by negative elevations (below sea level).

The present FlowMapR programs did not envisage being used in areas below sea level that had negative elevations. The simple fix is to scan every input file and determine if any valid negative values (e.g. not missing values of -9999) occur within the file. Then simply add the amount of the maximum negative value to all grid cells to ensure that the smallest possible value in the DEM is zero (0).


2.4.6 Reinstate an ability to stop pit removing at a user specified threshold.

The present FlowMapR program continues to remove pits until all pits are removed into catchments that drain to the outside world. There are cases where it is not desirable to remove all pits, regardless of their size or depth. The best example would be the Dead Sea, which is a closed depression that does not drain to the outside world. Other examples are Lake Bakal and Death Valley.  Users should be able to set a threshold by volume, depth or mm2flood at which a pit will not be further removed, even if it does not yet drain to the “outside world”.


2.5 Add ability to “burn-in” specific blue line networks

There are many instances where the location of stream and river channels is already known and mapped and it is desirable to be able to force the DEM to conform to these known networks. It would be useful to have a new pre-processing function (probably with a GIS GUI) that would support the import or drawing of linear channel networks that would then be used to lower elevations for all cells along the linear network to force flow to follow those known networks.

There is one specific case where the ability to burn in linear networks would be really useful. This is the case of large flat areas that may be real filled lakes or may just be large areas of flat terrain. If we add an ability to define a single linear route that crosses the lake from one inlet to outlet, and if we burn this notional channel into the source DEM, then we can force creation of a more realistic and useful flow network into the lake, across it into the defined channel and then out of the lake at the outlet point. If we burn in a linear network across all flat areas, and pick one point to be lower than all others on this linear network ( probably the point just at the outlet), then we will have a much easier and faster job of filling the lake and removing it into the next down slope channel location in a correct fashion.


3.0 FormMapR redevelopment

FormMapR is mostly a collection of other people’s algorithms for computing a fairly standard set of terrain derivatives in a 3x3 raster window. The only thing close to unique and special in FormMapR is the suite of measures of absolute and relative slope position that make use of the pit-removed flow directions to flow up and down from each grid cell to specified target cells. These measures of relief are only unique because the make use of the flow directions and associated channels and ridges defined using the FlowMapR module. These are unique because FlowMapR removes pits in a way that retains all local terrain shape and does not flood pits to fill them. So, I believe that this produces a more realistic and useful set of flow directions, particularly across flat areas and lakes, that would otherwise be filled by flooding and would not receive realistic flow directions.

So, the main point of retaining and improving FormMapR is to retain the ability to compute the original FormMapR measures of absolute and relative landform position.

At present, these measures include vertical distance from each cell to the closest channel (Z2Str), ridge (Z2Div), pit (Z2Pit) and peak (Z2Peak) cell, horizontal distance to the same features (L2Str, L2Div, L2Pit, L2Peak) and distance along the defined flow path from each grid cell to the same features (N2Str, N2Div, N2Pit, N2Peak). It is certainly possible to define other target features to which these distances could be computed and stored.

These absolute distances are converted into measures of relative slope position by adding two complimentary distances together and then dividing this total into one of the values totalled. Thus, slope position relative to the nearest channel and divide is computes as PctZ2Str = Z2Str/ (Z2Str+Z2Pit) * 100. A cell close to the defined stream channel will have a low value for relative slope position and a cell located on a ridge will have a value of 100% for PctZ2Str.

Other than these unique measures of slope position, we could elect to just use existing programs and software to compute all other terrain derivatives for use in subsequent predictive mapping and landform classification programs. This is something to discuss and decide upon.


3.1 Add ability to compute terrain attributes at multiple resolutions

Many new terrain analysis programs now offer an ability to compute a broad suite of window based terrain attributes using windows of user-specified dimensions. When FormMapR was written, this multi-scale terrain analysis was not common and FormMapR only implemented calculations of a few common window-based derivatives within a fixed 3x3 window. We should seriously consider either using someone else’s existing programs for computing terrain derivatives at multiple window sizes or else add this capability to an expanded FormMapR.

The easiest way to add a capability to compute terrain derivatives at multiple window sizes would be to use an approach in which a smooth surface is fitted to all data points located within any window of specified size, then compute the terrain derivatives relative to this fitted surface.


3.2 Add new terrain attributes not computed by current FormMapR

There are many new terrain derivatives that have emerged since FormMapR was written. I will start a list of potential new derivatives to compute below and others can add to it or revise it.


3.2.1 Peter Shary’s complete set of curvatures

Peter has published the algorithms for computing a complete set of terrain curvatures and has implemented these in his GISEco software. These represent the best and most scientifically sound measures of a wide range of curvatures. I recommend that we simply implement Peter’s entire suite of calculations as an extension to FormMapR.


3.2.2 John Gallant’s MRVBFI and MRRTFI

The multi-resolution valley bottom flatness index (MRVBFI) and multi-resolution ridge top flatness index (MRRTFI) of John Gallant is now widely used and accepted. We should consider adding these calculations to an expanded FormMapR.


3.2.3 SAGA’s multiple flow direction relative slope position algorithm

SAGA now provides a smoothed measure of relative slope position that averages out the distances to channel and ridge across multiple flow pathways. This smoothed representation of relative slope position seemed to be quite useful


3.2.4 Any of several existing window-based measures of relative slope position

There are quite a few different algorithms for computing relative slope position as a function of the elevation of a cell relative to the elevations of the maximum and minimum elevation cells within a window of fixed dimensions. Others use difference in elevation of the central cell from the mean value of elevation within the window or some other measure. We should research these other measures of relative slope position and add the most common ones to FormMapR.


3.2.5 A new measure of relative slope position based on Quinn Upslope Area

I have experimented in the past with using the output for Log of Diffuse Upslope Area by the Quinn algorithm (LN_Quinn_Up) and Diffuse Down slope area (Ln_Quinn_Down) to compute a new, and smoother, measure of relative slope position. The calculation is simple; it is just Rel_Quinn = Ln_Quinn_Down/(Ln_Quinn_Up+Ln_Quinn_Down) * 100. A maximum value can be specified for Ln_Quinn_Up and Ln_Quinn_Down and all values greater than this can be truncated to this maximum value. This has the effect of defining grid cells with a value greater than the specified maximum as belonging to a channel or divide. So relative slope position becomes relative to defined channels and divides. This measure does not have the abrupt breaks in value along the contour that occur in the current PctZ2St and PctZ2Pit measures.


3.2.6 Other suggestions....?


4.0 FacetMapR redevelopment

FacetMapR is a pretty minimalist implementation of a Fuzzy Logic computation engine. It currently only supports calculation of a single form of Fuzzy Logic estimate, that being a fuzzy weighted average. We should probably look at modernizing and extending the present FacetMapR engine. We also need to remove a couple of bits of hard code that force the input data sets to contain particular input variables, regardless of whether they are used or needed.


4.1 Replace current Fuzzy Calculation with Xun Shi Algorithm.

Xun Shi has published an equation that permits calculation of a wide variety of different measures of Fuzzy Likelihood, including weighted average, Fuzzy Minimum and others. His equations also support definition of a wide variety of forms for the Fuzzy Likelihood graphs, and not just the one tailed and symmetric graphs presently used by FacetMapR. If we were to implement Xun Shi’s equations we could still run an original FacetMapR calculation by selecting the appropriate variables and term values.But we would also extend the present functionality to permit other forms of Fuzzy Logic to be selected and applied.


4.2 Locate and remove bits of hard coding that force inputs

A revised FacetMapR has to remove those bits of code that require specific input variables to be present in the input data sets. This requirement causes FacetMapR to crash if the required variables are not present in the input data set and if they do not have appropriate values. This needs to be fixed.


5.0 WeppMapR redevelopment

WeppMapR is hardly used by anyone these days but the basic idea of extracting defined hydrological entities continues to be useful and is being used in other ways. For example, Thomas Mayr is using some custom FoxPro code to extract and characterize pit sheds, peak sheds and hill sheds.  These hydrological spatial entities can be very useful for delineating and classifying landform patterns. Similarly, several users have asked to have an update whereby flow networks of linked channel cells can be assigned to Strahler and Shrieve channel orders.

I would like to propose that we completely redo WeppMapR to produce a program that extracts and characterizes a variety of hydrological spatial entities. I would call this module HydroMapR.


5.1 Start by extracting linear boundaries of pit sheds & peak sheds.

I have come to believe that the most effective and correct way to delineate a full, single line, complimentary stream and divide network is to define these lines as being the boundaries of an intersection of initially defined pit sheds and peak sheds.

Boundaries of peak sheds identify the most correct locations of all possible stream channel cells. By using this approach, there is no need to be concerned with finding ways to automatically thin drainage lines that are several cells in width, because of parallel flow and groups of cells that all exceed the minimum specified threshold for being a stream channel. In fact, there is no need to even try to identify a threshold value for recognizing stream initiation.
All cells along a peak shed boundary are potential stream line cells. Because islands are individual, stand alone, peaks, their boundaries identify the location of both stream channels that encircle the island. So, unlike with the threshold method where it is only possible to identify one channel where a river flows around an island, with this method we identify both. It would still be possible to recognize two classes of channel cells, those being cells that exceed some minimum threshold value for upslope area count, and are therefore likely to be able to support continuous surface flow and those that do not exceed this threshold and are therefore not likely to support continuous surface flow.

By definition, the boundaries of local pit sheds must define a linear network of divide cells. If we use initial local pit sheds, we identify all possible divide cells. There is no need to rely on local surface shape (convexity) to define divide networks. Divides are defined on solely hydrological attributes as being locations where surface flow goes in opposite directions and the flow paths do not eventually arrive at a common point.


5.2 Use the linear boundaries of pit sheds & peak sheds to define channels and ridges.

As per the above discussion, create a vector network that defines the boundaries of peak sheds and pit sheds. Rasterize this linear vector network to produce raster cell networks one cell in width (no line thinning required). Follow flow directions to link each up slope cell to its closest and lowest down slope neighbor. This creates a directed single line channel network.

Repeat this procedure for the pit shed boundaries to define a linked linear divide network.

Identify cells at the junction of channels and divides as junction cells. These junctions will always occur where peak shed and pit shed boundaries intersect. Label and number junction cells.

Process the single line linear channel networks to identify stream segments (lengths of channel between any two junctions). Process the linked stream channel segments to assign each segment a Strahler and Shrieve order number.


5.3 Use the linear boundaries of pit sheds & peak sheds to define hill sheds.

Hill sheds are defined by the intersection of peak sheds and pit sheds. Each hill shed drains into a single length of linear stream segment. Each stream segment can have hill sheds that drain into it from the left, right or top (initial source hill shed). As in the existing WeppMapR program, we can label each and every hill shed and record its hydrological connectivity (e.g. which stream segment it drains into). Once done, this information can be reprocessed to produce hydrological spatial entities, and heir attributes, as required for widely used hydrological models such as WEPP, SWAT, and so on. We can write utilities to output the data for these hydrological spatial entities in the specific formats required by selected hydrological models.


5.4 Include custom programs for partitioning hill slopes.

Thomas Mayr has been using a custom Fox Pro program that processes each unique hill slope sequentially to detect and label upper and lower slope breaks and so divide each hill slope into upper, mid and lower segments based on recognition of breaks in slope along flow paths from divide to channel. This little utility has proven useful for classifying landform types. It can easily be added to the proposed HydroMapR utility as an extension or option.

This little utility flows up slope from every grid cell to a ridge and then flows down slope until a channel cell is reached. It measures the change in slope between every cell and its up slope and down slope neighbors to identify the locations of maximum change in slope along the flow path. It uses this data to identify one cell in an upper slope position with the maximum convexity and one call in the down slope position with maximum concavity. These become the upper and lower slope breaks and are used to partition each hill slope into upper, mid and lower components. It is possible to identify more slope breaks but we tried this and it got messy.


6.0 Closing thoughts

This has just been my initial list of things that I think would improve the LandMapR toolkit. I guess it is up to others now to review this list and add to it or delete from it as they feel is warranted.  I can help with some of the coding but I am really hoping that I do not have to do any actual code writing. If necessary., I can demonstrate an algorithm using FoxPro code and then let someone smarter than me figure out how to program it efficiently in our programming language of choice.

I have probably left out some things that have previously been identified as useful of necessary. I leave it to you others now to try to identify what I have forgotten to include and to include your own wish list of new or improved features.
 
Reply all
Reply to author
Forward
0 new messages