preparing CalData


Ricardo Martínez Prentice

unread,
Dec 9, 2020, 4:30:43 AM12/9/20
to SegOptim user group
Dear João,

I am using the preCalc process: 

Cal <- prepareCalData(rstSegm = RSegmentation,
                      trainData =trainingSamp, 
                      rstFeatures = rstClassifFeatures,
                      thresh = 0.4, 
                      funs = c("mean","sd"), 
                      minImgSegm = 150, 
                      verbose = TRUE, 
                      bylayer = TRUE)

where RSegmentation is the variable holding the .tif file with the segmentation, trainingSamp is the variable with the rasterized training sample polygons, and rstClassifFeatures is the variable with the raster variables for classification.

All of them have the same size (columns and rows: 4263, 3916) at 0.12011 x 0.12011 m resolution, except RSegmentation, which has 4261 columns and 3915 rows but the same resolution.

I get the following error:
Error in getTrainData_(x, rstSegm, useThresh, thresh, na.rm, dup.rm, minImgSegm,  : 
  Different size between train and segment rasters for getTrain method!
Warning message:
In prepareCalData(rstSegm = RSegmentation, trainData = trainingSamp,  :
  An error occurred while generating train data! Check segmentation parameter ranges? Perhaps input train data?

I think that RSegmentation is the problem, but I do not know how to fix it because it is the product of the first step, the SegOptim TerraLib segmentation. I am attaching a screenshot where the yellow square is one of the training data samples, the black line is the boundary of one segment, and the background is the RGB raster. All the pixels match the resolution. Could it be another problem?

Thank you very much,

Regards,


Cap_Segmentation.PNG

João Gonçalves

unread,
Dec 9, 2020, 5:48:55 PM12/9/20
to SegOptim user group
Hi Ricardo,

Thanks for your post.

All the raster inputs needed for running SegOptim must have exactly the same coordinate reference system, number of rows and columns, resolution, and spatial extent. This is needed to overlay the labelled examples for each class with the segments and create the training dataset, so even a small difference in the number of rows is enough to generate this error.

To check if you have these parameters equal across your data you can use the compareRaster() function from the raster package.

You can also use the resample() function (from the same package) to reshape/modify your raster data and "force" them to have the same features above. Use writeRaster() function to save the resampled data.
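For instance, a minimal sketch of these checks and fixes (variable names taken from the original post; the output filename is illustrative):

```r
library(raster)

# Check CRS, extent, rows/columns and resolution across all inputs;
# with stopiffalse = FALSE it returns FALSE (with a warning) instead of an error
compareRaster(RSegmentation, trainingSamp, rstClassifFeatures,
              extent = TRUE, rowcol = TRUE, crs = TRUE, res = TRUE,
              stopiffalse = FALSE, showwarning = TRUE)

# Force the segmentation raster onto the grid of the training raster;
# nearest-neighbour keeps segment IDs intact (no interpolation of labels)
RSegmentation_fixed <- resample(RSegmentation, trainingSamp, method = "ngb")

# Save the resampled segmentation for later use
writeRaster(RSegmentation_fixed, "segmentation_resampled.tif", overwrite = TRUE)
```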

Hope this helps.

Cheers
João

Ricardo Martínez Prentice

unread,
Dec 14, 2020, 10:42:26 AM12/14/20
to SegOptim user group
Thank you for your help. It worked, and I am already using prepareCalData() and calibrateClassifier().
I have one question. In order to classify using the Random Forest algorithm, I chose the evaluation method "OOB", but then I get the following error:
Error in unique(testDF$train) : object 'testDF' not found

I am not really sure what I am doing wrong. I have a training dataset (transects made on site, converted to polygons and then rasterized). This gives me the pure segments with training categories (multi-class). If I choose Out-Of-Bag in the evalMethod argument, does the model use the ~36.8% of training samples left out of each bootstrap for validation, so that I don't need to set minCasesByClassTest?

Thank you very much for your help. 

Regards, 

João Gonçalves

unread,
Dec 14, 2020, 11:39:39 AM12/14/20
to SegOptim user group
Hello,

Thanks for your feedback. It seems that the reported error is on the package side :-( I will try to correct it ASAP.

Meanwhile, please make sure you are using the latest version of the package. For that, please reinstall it using the following line of code:
remotes::install_github("joaofgoncalves/SegOptim", ref="experimental")

The "experimental" branch (in ref) includes recent modifications to the package, bug corrections and new/experimental features.

As for running Random Forest cross-validation, I do not advise relying on OOB. You will do much better using other options, such as:
evalMethod = "10FCV", which applies 10-fold cross-validation (i.e., 10 sequential data partitions with 90% for training and 10% for testing).
Evaluation metrics are calculated on the test fraction.
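As a rough sketch of this option (parameter values are illustrative; calData stands for the object returned by prepareCalData()):

```r
library(SegOptim)

# Calibrate a Random Forest classifier evaluated by 10-fold cross-validation
classifObj <- calibrateClassifier(calData = calData,
                                  classificationMethod = "RF",
                                  evalMethod = "10FCV",
                                  evalMetric = "Kappa",
                                  runFullCalibration = TRUE,
                                  verbose = TRUE)
```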

As for the minCasesByClassTest parameter, this is defined as the: "Minimum number of cases by class for each test data split so that the classifier is able to run".
This is a baseline minimum number of test cases per class to guarantee that the confusion matrix calculated for the test data is valid and meaningful across the different classes. You may reduce this value slightly if needed (default: 10), but you cannot ignore it when running the classification stage. Even better is to increase the number of training samples for the problematic classes with lower sizes ;-)
Also, slightly modifying the segmented image to produce smaller segments will increase your training dataset size, but (sometimes) at the expense of decreasing segment usefulness/relevance.
It's clearly a trade-off situation that deserves some reflection depending on your specific case.
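One quick way to see whether any class is at risk of falling below minCasesByClassTest is to tabulate the training labels in the calibration data (a sketch; calData stands for the object returned by prepareCalData(), whose calData element holds a train column, as used elsewhere in this thread):

```r
# Count calibration cases per class; classes with few cases may fail
# the minCasesByClassTest threshold in one of the test splits
table(calData$calData$train)
```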

Please try these steps and let me know how it goes. Hope it helps!! :-)

Cheers
João
- - -

Ricardo Martínez Prentice

unread,
Dec 15, 2020, 3:20:51 AM12/15/20
to SegOptim user group
Thank you very much!

I am calling the experimental branch and using "10FCV" in the evalMethod argument, and it is working. Nevertheless, I now get an error which I did not have before, coming from evalPerformanceClassifier() and predictSegments().

Error in evalPerformanceClassifier(classifObj) : 
  The input in obj must be an object of class SOptim.Classifier generated 
         by calibrateClassifier with option runFullCalibration = TRUE! 

Error in predictSegments(classifierObj = classifObj, calData = calData,  : 
  classifierObj must be an object of class SOptim.Classifier generated by 
         calibrateClassifier function with option runFullCalibration = TRUE

The value of the runFullCalibration argument is TRUE in the calibrateClassifier() function.

Could it be due to my input data?

Regards,

gülnihal kurt

unread,
Oct 17, 2023, 6:16:33 AM10/17/23
to SegOptim user group
Dear  Ricardo  and João,
It has been a long time since this conversation happened, but I am now starting to work with this useful tool to perform OBIA-based crop classification. I double-checked my input data, and nevertheless I am still getting this error.

"> evalMatrix <- evalPerformanceClassifier(classifObj)
Error in evalPerformanceClassifier(classifObj) : The input in obj must be an object of class SOptim.Classifier generated  
by calibrateClassifier with option runFullCalibration = TRUE!"


And here is my full code:



#--------------------------------------------------------------------------------------------------------------

#1. Use image3 as referenceRaster
referenceRaster <- raster("C:/Users/gulni/img3.tif")

#2. Load or create other raster objects
rstTrainData <- raster("C:/Users/gulni/train8.tif")
rstSeg <- raster("C:/Users/gulni/00_segmentasyon/aso_1.tif")
image3 <- raster("C:/Users/gulni/img3.tif")
image4 <- raster("C:/Users/gulni/S2_06_NDVI.img")
image5 <- raster("C:/Users/gulni/S2_07_NDVI.img")
image6 <- raster("C:/Users/gulni/S2_08_NDVI.img")
image7 <- raster("C:/Users/gulni/S2_09_NDVI.img")

#3. Project other raster objects
rstTrainData_proj <- projectRaster(rstTrainData, referenceRaster)
rstSeg_proj <- projectRaster(rstSeg, referenceRaster)
image3_proj <- projectRaster(image3, referenceRaster)
image4_proj <- projectRaster(image4, referenceRaster)
image5_proj <- projectRaster(image5, referenceRaster)
image6_proj <- projectRaster(image6, referenceRaster)
image7_proj <- projectRaster(image7, referenceRaster)

classfeat <- stack(image7_proj,image6_proj,image5_proj,image4_proj,image3_proj)


plot(rstTrainData_proj)


# Prepare data before classification
# This will populate each segment with statistics of each one of the layers in rstClassifFeatures ------
#
?prepareCalData

calDataPrep <- prepareCalData( rstSegm = rstSeg_proj,
                                trainData = rstTrainData_proj,
                                rstFeatures = classfeat,
                                thresh = 0.5,
                                funs = "mean",
                                minImgSegm = 1,
                                bylayer = TRUE,
                                tiles = NULL,
                                verbose = TRUE,
                                progressBar = FALSE
                                )

# Check the generated datasets
head(calDataPrep$calData)
head(calDataPrep$classifFeatData)
unique(calDataPrep$calData$train)



evalPerf <- evalPerformanceClassifier(calDataPrep)
print(evalPerf)



# Run the final classification --------------------------------------------- ----------------------------


?calibrateClassifier

classifObj <- calibrateClassifier(calDataPrep,
   classificationMethod = "RF",
   classificationMethodParams = NULL,
   balanceTrainData = FALSE,
   balanceMethod = "ubOver",
   evalMethod = "5FCV",
   evalMetric = "Kappa",
   minTrainCases = 30,
   minCasesByClassTrain = 10,
   minCasesByClassTest = 5,
   runFullCalibration = TRUE,
   verbose = TRUE
)


warnings()



# Get more evaluation measures
evalMatrix <- evalPerformanceClassifier(classifObj)
print(round(evalMatrix,2))
# Finally, predict the class label for the entire image (i.e., outside the training set)
# and also save the classified image
rstPredSegmRF <- predictSegments(classifierObj = RFclassif,
                                  calData = calData,
                                  rstSegm = segmRst,
                                  predictFor = "all",
                                  filename = outClassRst.path)
print(rstPredSegmRF)
#--------------------------------------------------------------------------------------------------------------


I wonder if you can find a solution or different approach for this problem.
Thank you in advance,
Cheers,
Gulnihal KURT
PhD Researcher
Cukurova University
Adana, Turkey






On Tuesday, December 15, 2020, at 11:20:51 AM UTC+3, martinez...@gmail.com wrote:

João F Gonçalves

unread,
Oct 17, 2023, 6:54:43 AM10/17/23
to segoptim-...@googlegroups.com

Hi Gülnihal,

Thanks for the feedback. Regarding your code, the performance evaluation will not work until you have trained the Random Forest classifier, so it should throw an error here:

evalPerf <- evalPerformanceClassifier(calDataPrep)
print(evalPerf)


However, performance evaluation should work after using the calibrateClassifier() function with the classifObj object, here:

# Get more evaluation measures
evalMatrix <- evalPerformanceClassifier(classifObj)


To sort out this issue, two things are required:

1) Confirm that calibrateClassifier ran correctly and provide a print of the classifObj object;

2) Provide a reproducible example with code and data so I can check what is happening and attempt to reproduce the error on my computer.
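For the first point, a quick sketch of the check (classifObj stands for the object returned by calibrateClassifier(); the expected class name is the one reported in the error message):

```r
# Confirm that calibration succeeded and the object has the expected class
print(class(classifObj))
inherits(classifObj, "SOptim.Classifier")  # should be TRUE if calibration worked
print(classifObj)
```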


Cheers,

João

- - -

gülnihal kurt

unread,
Oct 25, 2023, 3:42:52 PM10/25/23
to SegOptim user group
Dear João,
I have identified the underlying cause of the error mentioned above, and I wanted to share it to help everyone. I had previously saved the segmentation data (the SegOptim output that I wanted to classify) as a .tif file in a local folder, and I was reading it from there during the classification process. When I used the segmentation output directly in the classification function within the same R session, my issue was resolved. Thank you very much!
Cheers,
Gülnihal

On Tuesday, October 17, 2023, at 1:54:43 PM UTC+3, SegOptim user group wrote: