Solutions to many issues using SegOptim

135 views
Skip to first unread message

ACM

unread,
Oct 19, 2020, 12:14:05 PM10/19/20
to SegOptim user group
Hi everybody,

Here I share the e-mail conversation I had with João. Hopefully you'll find it useful for your researches as it was for mine.

 

Dear João Gonçalves,

I am trying to use the package SegOptim to segment and classify an orthoimage (not multispectral). When I run segmentation_OTB_LSMS, no files were created. I have checked all the paths, includind the OTB bin folder. It seems that they were OK and I do not know what could be wrong. Below is the code I ran.


The code:
 library(raster)
  library(randomForest)
  library(SegOptim)
  setwd("C:/XXXXXX")
  inputimage.path <- "./01 inputs/Ortofotos/1051/2019/2019_1m.tif"
  trainData.path <- "./01 inputs/raster/TTA_MASK_arbol_matorral_suelo_Polygon11.tif"
  classificationFeatures.path <- "./01 inputs/Ortofotos/1051/2019/2019_1m.tif"
  otbPath <- "C:/OTB/bin"
  # Check if the files and folders exist!
  if(!file.exists(inputimage.path))
    print("Could not find data for image segmentation!")
  if(!file.exists(trainData.path))
    print("Could not find training data!")
  if(!(file.exists(classificationFeatures.path)))
    print("Could not find data for classification!")
  if(!dir.exists(otbPath))
    print("Could not find Orfeo Toolbox binaries!")
  ## Output file from OTB image segmentation     
  outSegmRst.path <- "./02 outputs/R-analysis/segmRaster.tif"
  # The final output file containing the distribution of the target species
  outClassRst.path <- "./02 outputs/R-analysis/segmRaster2.tif"
  ## Run the segmentation
  outSegmRst <- segmentation_OTB_LSMS(
    # Input raster with features/bands to segment
    inputRstPath = inputimage.path,
    # Output
    outputSegmRst = outSegmRst.path,
    verbose = TRUE,
    otbBinPath = otbPath,
    lsms_maxiter = 50,
     # Algorithm params
    SpectralRange = 3.1,
    SpatialRange = 4.5,
    MinSize = 21)
  # Check the file paths with outputs
  print(outSegmRst)
  # Load the segmented raster and plot it
  segmRst <- raster(outSegmRst$segm)
  plot(segmRst)

And here is the error:


print(outSegmRst) $FilteredRange
[1] "./02 outputs/R-analysis/otb_filt_range_7d4lq4.tif"
$FilteredSpatial
[1] "./02 outputs/R-analysis/otb_filt_spatial_7d4lq4.tif"
$Segmentation
[1] "./02 outputs/R-analysis/otb_segm_init_7d4lq4.tif"
$SegmentationFinal
[1] "./02 outputs/R-analysis/"
$segm
[1] "./02 outputs/R-analysis/segmRaster.tif"
attr(,"class")
[1] "SOptim.SegmentationResult" # Load the segmented raster and plot it
segmRst <- raster(outSegmRst$segm) Error in .local(.Object, ...) :
Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer",  :
  Cannot create a RasterLayer object from this file. (file does not exist) plot(segmRst) Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'plot': object 'segmRst' not found

Looking forward to hearing from you soon,

 

ACM.

---------------------

Dear ACM,

Thanks for using SegOptim! I am really glad that you find it useful :-)

Seems that you were able to correctly run the segmentation with OTB_LSMS. The output object that you called "outSegmRst" only holds metadata from the segmentation stage and not the data itself. So when you read the segmented raster in segmRst <- raster(outSegmRst$segm) it seems that is nothing there...

In this scenario, please check if there is a file called segmRaster.tif with your segmented image in the output folder: "./02 outputs/R-analysis/" which is a subfolder of: "C:/xxxxxxx/xxxxx". If the file is there, then the error is simply in the output path that the raster() function is unable to use and correctly load the file. Correct it and try again to load the image using segmRst <- raster("path_to_file/segm.tif"). If the file is not there then the issue is other which I am not sure... but please let me know if you manage to find the segmented image file.

Also, usually it is better to use smaller paths without latin  and avoid to use blank spaces (also in the previous example).  Or "01 inputs" to "01_inputs". Try it and let me know how this works out.

Cheers

-------------
Dear João,

There was no file in the folder, it was empty. Fortunately I followed your suggestion and I changed the working path. It worked!

Thanks a lot for your answer! I go on with the process.

Best regards,

ACM.

-----------------

Hi João,

I was able to run the segmentation, eventually. But now I need to take a segmentated raster (Black and white orthoimage) resulted from an Ecognition software process.

When I run prepareCalData I used the segmentated image for both rstSegm and rstFeatures (otherwise the error "Different rasters defined in rstFeatures and rstSegm!" appears).

I was able to run prepareCalData with the three steps (Loading train data, Calculating feature statistics, Merging train and feature data).

The problem was when I tried to run Random Forest. Is the segmented image the problem? what can I do from Ecognition to make the result suitable for a RF classification with SegOptim? I am wondering what I am doing wrong.

 

Thank you in advance.

 

Below is the code and after the error messages (pasted):

 

library(raster)
library(randomForest)
library(SegOptim)
setwd("D:/XXXXX")
inputimage.path <- "./input/2019_1m.tif"
trainData.path <- "./input/trainplots.tif"
classificationFeatures.path <- "./input/2019_1m.tif"
otbPath <- "C:/OTB/bin"
segimage.path <- "./output/segmentation.tif"
# Check if the files and folders exist!
if(!file.exists(inputimage.path))
print("Could not find data for image segmentation!")
if(!file.exists(trainData.path))
print("Could not find training data!")
if(!(file.exists(classificationFeatures.path)))
print("Could not find data for classification!")
if(!dir.exists(otbPath))
print("Could not find Orfeo Toolbox binaries!")

# Convert to raster
segimage <- raster(segimage.path)
# Train data
trainDataRst <- raster(trainData.path)
# Classification features
classificationFeatures <- raster(classificationFeatures.path)
#In this step we will assemble the training data required to the supervised classification. This will import training data into each image segment (via a threshold rule). In this case, segments that are covered in 50% or more by the training pixels will be considered as valid cases either for 0's (non-invaded) or 1's (invaded).
CalData <- prepareCalData(rstSegm = segimage,
                          trainData = trainDataRst,
                          rstFeatures = segimage,
                          thresh = 0.5,
                          funs = "mean",
                          minImgSegm = 30,
                          verbose = TRUE)

# Calibrate/evaluate the RF classifier
classifObj <- calibrateClassifier( calData                    = CalData,
                                   classificationMethod       = "RF",
                                   balanceTrainData           = FALSE,
                                   balanceMethod              = "ubOver",
                                   evalMethod                 = "10FCV",
                                   evalMetric                 = "Kappa",
                                   minTrainCases              = 30,
                                   minCasesByClassTrain       = 10,
                                   minCasesByClassTest        = 5,
                                   runFullCalibration         = TRUE)

# Get more evaluation measures
evalMatrix <- evalPerformanceClassifier(classifObj)

print(round(evalMatrix,2))

# Finally, predict the class label for the entire image (i.e., outside the training set)
# and also save the classified image
rstPredSegmRF <- predictSegments(classifierObj = classifObj,
                                 calData = calData,
                                 rstSegm = segmRst,
                                 predictFor = "all",
                                 filename = outClassRst.path)
print(rstPredSegmRF)

 

And the output is:

 

## --- TRAINING ROUND 1 --- ## .. Frequency table by class for train data: 1 2 3 704 250 615 .. Frequency table by class for test data: 1 2 3 77 29 68 Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero Warning message: In calibrateClassifier(calData = CalData, classificationMethod = "RF", : An error occured while performing RF classifier training!

--------------------

Hi ACM,

I am not completely sure what happened but, when looking to the prepareCalData() function I noticed that you are not providing a suitable input in the rstFeatures parameter (instead you just inserted again the segmented image: rstFeatures = segimage).

Usually this parameter uses a multiband raster or raster stack containing features that will be used to calculate segment statistics and then used for training the classifier. In general, multispectral bands, spectral/biophysical indices, texture and/or or elevation features (among other) are used for this purpose. This will allow the classifier algorithm to learn spectral (or thematic) differences between your classes (using function: calibrateClassifier) and then predict them across the whole image (function: predictSegments).

According to your code (if spectral/thematic features for training are indeed in the "./input/2019_1m.tif" image) it should be something like:

rstFeatures = classificationFeatures.path  OR

rstFeatures = stack(classificationFeatures.path)

In the predictSegments there are also a couple of typos:

rstPredSegmRF <- predictSegments(classifierObj = classifObj,
                                 calData = CalData, # <-- correct this
                                 rstSegm = segimage, # <-- correct this
                                 predictFor = "all",
                                 filename = outClassRst.path) # <-- correct this

Also, correct the outClassRst.path variable because it does not exist. You may define it as (example):

outClassRst.path <- "./output/segments_classified.tif"

Hope this helps!

Cheers

João

--------------------

Hi João,

I made all the changes you pointed and prepareCalData worked fine (I just needed to make the raster layers to have the same extention and pixel size). But when I tried to calibrate, the error persists.

library(raster)
library(randomForest)
library(SegOptim)
setwd("D:/XXXXX")
inputimage.path <- "./input/2019_1m.tif"
trainData.path <- "./input/trainplots.tif"
classificationFeatures.path <- "./input/2019_1m.tif"
otbPath <- "C:/OTB/bin"
segimage.path <- "./output/segmentation2.tif"
# Check if the files and folders exist!
if(!file.exists(inputimage.path))
  print("Could not find data for image segmentation!")
if(!file.exists(trainData.path))
  print("Could not find training data!")
if(!(file.exists(classificationFeatures.path)))
  print("Could not find data for classification!")
if(!dir.exists(otbPath))
  print("Could not find Orfeo Toolbox binaries!")
# The final output file containing the distribution of the target species
outClassRst.path <- "./output/classified.tif"

# Convert to raster
segimage <- raster(segimage.path)
# Train data
trainDataRst <- raster(trainData.path)
# Classification features
classificationFeatures <- raster(classificationFeatures.path)
#In this step we will assemble the training data required to the supervised classification. This will import training data into each image segment (via a threshold rule). In this case, segments that are covered in 50% or more by the training pixels will be considered as valid cases either for 0's (non-invaded) or 1's (invaded).
calData <- prepareCalData(rstSegm = segimage,
                          trainData = trainDataRst,
                          rstFeatures = stack(classificationFeatures.path),
                          thresh = 0.5,
                          funs = "mean",
                          minImgSegm = 30,
                          verbose = TRUE)

# Calibrate/evaluate the RF classifier
classifObj <- calibrateClassifier( calData                    = calData,
                                   classificationMethod       = "RF",
                                   balanceTrainData           = FALSE,
                                   balanceMethod              = "ubOver",
                                   evalMethod                 = "10FCV",
                                   evalMetric                 = "Kappa",
                                   minTrainCases              = 30,
                                   minCasesByClassTrain       = 10,
                                   minCasesByClassTest        = 5,
                                   runFullCalibration         = TRUE)

# Get more evaluation measures
evalMatrix <- evalPerformanceClassifier(classifObj)

print(round(evalMatrix,2))

# Finally, predict the class label for the entire image (i.e., outside the training set)
# and also save the classified image
rstPredSegmRF <- predictSegments(classifierObj = classifObj,
                                 calData = calData,
                                 rstSegm = segimage,
                                 predictFor = "all",
                                 filename = outClassRst.path)
print(rstPredSegmRF)

 

 

And the output is:

 


## --- TRAINING ROUND 1 --- ## .. Frequency table by class for train data: 1 2 3 703 251 615 .. Frequency table by class for test data: 1 2 3 78 28 68 Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero Warning message: In calibrateClassifier(calData = calData, classificationMethod = "RF", : An error occured while performing RF classifier training!
----------------------

Hi ACM,

After checking your data I noticed that you are using a single variable/feature for classification because the 2019_1m.tif file has a single band which reflects the amount of brightness in the image (or the green channel/band)? Is that really what you want?

If yes, doing supervised classification with a single feature is maybe too much (?)... It would be better/simpler to segregate classes based on an univariate partitioning algorithm such as Jenks or similar. In that case you would be able to create three classes based on the amount of soil-to-shrubs ratio. Check the results in attachment using Jenks for creating three discrete classes based on the average value of 2019_1m.tif per segment.

Also, I suppose that because you are using a single variable/feature in supervised classification that is the reason why some algorithms are not working properly in SegOptim. In this case you could add other features such as other bands, ratios between bands or spectral indices(?). I don't think per-pixel texture operators will improve much because segmentation and segment statistics are already working in that way.

Still, if you want to use a supervised methodology you can expand on the type of segment statistics that you are using to create more and better features. For example you can add the standard-deviation and/or add custom functions to calculate the 25 and 75% quartiles per segment by changing the line below. This will be similar to calculate local statistics as texture operators but we will do that by segment and not per pixel as usually done (all changes in the new script attached). If you still wnat to use textures check Orfeo Toolbox for that: here for Local stats, here for SFS texture operators and here for Haralick textures.


q25 <- function(x,...) as.numeric(quantile(x,probs=0.25,na.rm=TRUE,...))
q75 <- function(x,...) as.numeric(quantile(x,probs=0.75,na.rm=TRUE,...))

calData <- prepareCalData(rstSegm = segimage,
                          trainData = trainDataRst,
                          rstFeatures = stack(classificationFeatures.path),
                          thresh = 0.5,
                          funs = c("mean","sd","q25","q75"),
                          minImgSegm = 30,
                          verbose = TRUE)

Another advice is to use the segment IDs layer formatted as integer values. Will save space and problems! :-) Check the attachment.

Meanwhile I also noticed that there are differences between your images which results in errors, namely the segmented image has different pixel size:

-> [1/3] Loading train data into image segments...
Error in getTrainData_(x = x, rstSegm = rstSegm, useThresh = useThresh,  :
  Differences between train and segment rasters in getTrainData method!

> trainDataRst
class      : RasterLayer
dimensions : 2968, 3880, 11515840  (nrow, ncol, ncell)
resolution : 1.000064, 1.000064  (x, y)
extent     : 322486.2, 326366.5, 4063442, 4066410  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=30 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
source     : C:/Users/JG/Desktop/input/trainplots.tif
names      : trainplots

> segimage
class      : RasterLayer
dimensions : 2968, 3880, 11515840  (nrow, ncol, ncell)
resolution : 1.000064, 0.9999158  (x, y)
extent     : 322486.2, 326366.5, 4063442, 4066410  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=30 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
source     : C:/Users/JG/Desktop/input/segmentation2.tif
names      : segmentation2

The stack(classificationFeatures.path) has the same problem as well:

class      : RasterStack
dimensions : 2968, 3880, 11515840, 1  (nrow, ncol, ncell, nlayers)
resolution : 1.000064, 0.9999158  (x, y)
extent     : 322486.2, 326366.5, 4063442, 4066410  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=30 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
names      : layer
min values :     0
max values :   255

 

You can correct this by reprojecting the input image with:

segimage_reproj <- projectRaster(from = segimage,
                                 to = trainDataRst)

classificationFeatures_reproj <- projectRaster(from = classificationFeatures,
                                               to = trainDataRst)

And then use it later for calculations prepareCalData() - also check new script in attachment (don't forget to correct the setwd() line to your own working directory in the script!!).

Using these modifications I was able to run the random forest algorithm with good accuracy

After comparing the results from Random Forest classification and Jenks univariate algorithm these were quite similar!! :-D But anyway, overall the RF seems quite OK! :-)

 

About your questions:

"1- Inspecting the SVM confusion matrix I found this: (...)"

This is not the confusion matrix! This is a simply a table containing the evaluation metrics (kappa, accuracy,...) by test set and a final round including all train data without a test/train split (i.e. "FULL"). You can check the final confusion matrix with the following line of code:

print(classifObj$ConfMat$FULL)

         Observed
Predicted      1   2   3
        1       630  69 129
        2         61 204   8
        3         95  9 549

This 10 testsets are the validation areas from the training layer? I guess the rest of the areas in the layer are chosen as training.

Yes, in 10-fold cross-validation 90% of the data are used for training the random forest classifier and 10% are used for testing. Then you have 10 test/train split datasets. Each set that you see in the evalMatrix object is a 10% fraction used to evaluate the classifier.

How I can now how accurrate is the algorithm predicting class 1, 2 and 3?

You can calculate the average (and standard-deviation) of the Kappa, accuracy, GSS and PSS metrics in the evalMetrics to obtain the overall performance of the classification as:

> print(apply(evalMatrix[-11,],2,mean)) Kappa PSS GSS Accuracy 0.6527280 0.6511529 0.6454318 0.7856494 > print(apply(evalMatrix[-11,],2,sd)) Kappa PSS GSS Accuracy 0.04167467 0.04221714 0.04580271 0.02834638

What are the Peirce and Gerrity skill scores?

You can read about these indices of performance/verification based on the confusion matrix here: https://www.cawcr.gov.au/projects/verification/

2- When I tried to extract textures from the orthoimage with Ecognition, the software crashed. I read in some forums that working with textures is a long time process. Can you suggest any tutorial to extract textures with R? I would like just explore this option.

Don't advise to use complex textures per-pixel as explained before. It is better to use local statistics per-segment as the ones above including the standard deviation and the quantiles. Otherwise check OTB.

Will wait for our feedback to check if I am getting this right. Also, if you found this useful, please post the solution to google groups. Thanks in advance for that! :-)

Cheers

João

- - -

Reply all
Reply to author
Forward
0 new messages