Error with calibrateClassifier

ACM

Jun 8, 2021, 12:55:24 PM
to SegOptim user group

Hi João!

I am writing again because we have resumed the land cover classification for our project, this time with masked rasters (RGB orthoimages from aerial flights). I have spent several days trying to work out what is wrong, and I wonder if you could have a quick look at the code to see what is failing. I always get this error:

Error in evalPerformanceClassifier(classifObj) : The input in obj must be an object of class SOptim.Classifier generated by calibrateClassifier with option runFullCalibration = TRUE!

Before that there is a warning message:

In calibrateClassifier(calData = calData, classificationMethod = "RF", ... : The number of train cases by class is below minCasesByClassTest! Unable to do performance evaluation.

The training data is an integer raster with the same extent as the segmented raster. I have at least 16 training areas for each class, and the error is always the same no matter what I change.

This is our last step, so I hope the bug is easy to fix. Please find the code attached.

Thank you very much in advance!

Álvaro.

error_classification.txt

João F Gonçalves

Jun 8, 2021, 8:47:07 PM
to segoptim-...@googlegroups.com

Hi Álvaro,

Thanks for your feedback. Just to put things into context, here goes a short explanation on what is happening:

SegOptim uses cross-validation to evaluate classifier performance, which can be done in different ways, such as 10-fold or 5-fold CV. This procedure splits the data into train and test sets using 10 or 5 partitions, respectively.

It is also important to note that having 16 training areas does not mean you have that many objects for training. The number of objects depends heavily on how the segmentation is parametrized to produce large or small segments. It also depends on the spectral and spatial characteristics of each class (which will have different average segment sizes).

In your case, you have many objects but when 10-fold CV is used (with 90% for training and 10% for testing) some testing partitions have a number of objects below the minCasesByClassTest parameter (which was set to 5). See the test partition example below for your data which has only 4 objects for class 3:

## --- TRAINING ROUND 2 --- ##

.. Frequency table by class for train data:
  1   2   3
106 207  39

.. Frequency table by class for test data:
 1  2  3
10 24  4

This issue produces a warning but still lets you finish the training step. One quick fix is to use 5-fold CV, which increases the test set from 10% to 20% of the data. This can be done by replacing the following lines of code:

classifObj <- calibrateClassifier(calData              = calData,
                                  classificationMethod = "RF",
                                  balanceTrainData     = FALSE,
                                  balanceMethod        = "ubOver",
                                  evalMethod           = "5FCV", # <<<---- Changed from 10-fold CV to 5-fold CV
                                  evalMetric           = "Kappa",
                                  minTrainCases        = 30,
                                  minCasesByClassTrain = 10,
                                  minCasesByClassTest  = 5,
                                  runFullCalibration   = TRUE)
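
To see why moving from 10-fold to 5-fold CV helps, here is a rough check in base R using the class counts from the console output above. It assumes each fold receives roughly 1/k of every class, which is a simplification and not taken from the SegOptim source; the variable and function names are only illustrative.

```r
# Total objects per class = train + test counts from the round shown above
train_counts <- c(`1` = 106, `2` = 207, `3` = 39)
test_counts  <- c(`1` = 10,  `2` = 24,  `3` = 4)
total_counts <- train_counts + test_counts

# Same value as minCasesByClassTest in the calibrateClassifier() call
min_cases_by_class_test <- 5

# Approximate number of test objects per class in one fold of a k-fold CV
# (assumption: each fold gets about 1/k of every class)
approx_test_per_fold <- function(totals, k) floor(totals / k)

test_10fold <- approx_test_per_fold(total_counts, 10)
test_5fold  <- approx_test_per_fold(total_counts, 5)

print(test_10fold)
print(test_5fold)

# Classes that would fall below the threshold under each scheme
below_10fold <- names(test_10fold)[test_10fold < min_cases_by_class_test]
below_5fold  <- names(test_5fold)[test_5fold < min_cases_by_class_test]
print(below_10fold)
print(below_5fold)
```

Under that assumption, class 3 gets about 4 test objects per fold with 10-fold CV (below the threshold of 5) and about 8 with 5-fold CV.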

With this change, the evalPerformanceClassifier() function will run correctly. To calculate overall statistics across all five partitions, the following lines also have to be changed:

# Calculate the average and standard deviation of performance measures:
print(apply(evalMatrix[-6,],2,mean))
print(apply(evalMatrix[-6,],2,sd))

This removes the sixth row of the evaluation matrix, which holds the performance stats for the full training round (this round uses the entire dataset for training without partitioning it).
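
As a small illustration of that row exclusion, here is a toy example in which a made-up matrix stands in for evalMatrix (the values and column names are invented; the real matrix is built in the attached script). It assumes the full training round is always the last row, so nrow() is used instead of a hard-coded 6, which keeps the lines valid if the number of folds changes.

```r
# Hypothetical stand-in for evalMatrix: 5 CV rounds + 1 full-data round
evalMatrix <- matrix(
  c(0.60, 0.80,
    0.70, 0.82,
    0.65, 0.81,
    0.75, 0.85,
    0.80, 0.87,
    0.95, 0.98),  # last row: full training round (no partitioning)
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Kappa", "Accuracy"))
)

# Keep only the cross-validation rounds (drop the last row)
cv_rows <- evalMatrix[-nrow(evalMatrix), , drop = FALSE]

# Average and standard deviation of each performance measure
cv_mean <- apply(cv_rows, 2, mean)
cv_sd   <- apply(cv_rows, 2, sd)

print(cv_mean)
print(cv_sd)
```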

Check out the attached modified script (don't forget to change the file paths).


Hope it helps! 🙂 Let me know if this works for you.

Cheers

João

- - -

--
You received this message because you are subscribed to the Google Groups "SegOptim user group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to segoptim-user-g...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/segoptim-user-group/f95e8363-dda2-48ca-a6b8-2afcfddeb271n%40googlegroups.com.

error_classification.R

ACM

Jun 9, 2021, 1:46:37 PM
to SegOptim user group
Hi João

That works! But would you suggest adding more training plots, or making them larger, for better statistical robustness?

Álvaro.

João Gonçalves

Jun 11, 2021, 7:06:21 AM
to ACM, SegOptim user group
Hi Álvaro, 

Yes, increasing the number of training areas can definitely help to improve the calibration process. Besides that, keep in mind the representativeness of the different classes in terms of spectral and spatial features. Training areas should also approximate, as closely as possible, the "natural" shape and size of the target objects.

 Best regards, 
João 


ACM

Jun 13, 2021, 9:38:59 AM
to SegOptim user group
Hi João, I have the same problem with the same code but different images (of the same study area), even if I make the training plots bigger or create more and more of them for each class. What could be missing?

João Gonçalves

Jun 14, 2021, 7:59:15 AM
to SegOptim user group
Hi Álvaro,

Can you please send the console output you get when you run the calibrateClassifier function? I need to check how many training and testing cases you have there.

This issue may be resolved by changing the segmentation step to produce (slightly) smaller image segments and/or by increasing the number of training samples.

ACM

Jun 14, 2021, 9:25:38 AM
to SegOptim user group
Hi João,

I run the segmentation step in eCognition and use the segmented images for classification with SegOptim. It works for 8 of the 10 images I have (from the same study site, a multitemporal analysis of RGB images); the remaining 2 have this issue.

I enclose the error file.

Thanks!
error.docx

ACM

Jun 14, 2021, 1:31:46 PM
to SegOptim user group

I also made more training plots, and larger ones, for each class, so I don't know what is wrong.

João F Gonçalves

Jun 14, 2021, 1:41:16 PM
to segoptim-...@googlegroups.com

Hi Álvaro,

It seems you have too few training segments, in this case fewer than the minimum number defined by the minTrainCases value (set to 30).

If increasing the number of samples in the training dataset is not enough, you may try lowering the minimum to minTrainCases = 20 (note: fewer than 20 cases is too few to train the classifier).

Modifying segmentation parameters may help also to create smaller segments and have a higher number of training cases for the classification step.
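
Before re-running calibrateClassifier(), a quick base-R check like the one below can show whether the training set clears the thresholds. It is only a sketch: train_labels is a hypothetical vector with one class label per training segment (the values are invented), and how you extract it depends on how calData is structured in your script.

```r
# Hypothetical class labels, one per training segment (invented values)
train_labels <- c(rep(1, 14), rep(2, 9), rep(3, 4))

min_train_cases          <- 30  # mirrors minTrainCases
min_cases_by_class_train <- 10  # mirrors minCasesByClassTrain

# Number of training segments per class
class_freq <- table(train_labels)
print(class_freq)

# Is the overall number of training cases large enough?
total_ok <- length(train_labels) >= min_train_cases

# Which classes have fewer cases than the per-class minimum?
classes_below <- names(class_freq)[as.vector(class_freq) < min_cases_by_class_train]

cat("Total training cases:", length(train_labels), "- enough:", total_ok, "\n")
cat("Classes below the per-class minimum:",
    paste(classes_below, collapse = ", "), "\n")
```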


Try it out and give some feedback

Cheers

- - -

Alvaro CM

Jun 16, 2021, 1:43:46 PM
to João F Gonçalves, segoptim-...@googlegroups.com
Hi João,
Even with minTrainCases = 20 it still fails. The segments are really small and I have more than 100 training plots. I will send you the data by email in case you can spot something that is missing!
Thanks a lot for your patience!



João Gonçalves

Jun 16, 2021, 7:49:12 PM
to SegOptim user group
Hi,
Álvaro, it seems the problem is on the data side, more specifically in the input segmented raster, which is not properly formatted. I corrected it and the classification now runs well, with an average test kappa of ~0.7.

- First, the segmented raster should be stored as an integer type (floating point also works but takes more space);

- Second, each image segment must have its own unique integer identifier. In your case, clip_segmentation_2016.tif contains 'only' 200 unique values, when it is clear that the segmented image has many more segments than that. In this situation, you need to run a region grouping tool (with four/rook-rule or eight/queen-rule connections) to assign each segment a unique integer ID - check this or this tool to do that.
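
To illustrate what region grouping does, here is a minimal base-R sketch on a tiny matrix standing in for the segmented raster. It labels connected regions of equal value using the four/rook rule. The function name label_regions and the example values are hypothetical, NA cells are not handled, and for real rasters a GIS region grouping tool will be far faster than this loop.

```r
# Assign a unique integer ID to each connected region of equal value
# (rook / 4-neighbour rule). Assumes the matrix has no NA cells.
label_regions <- function(m) {
  ids <- matrix(0L, nrow(m), ncol(m))
  next_id <- 0L
  # Row/column offsets of the four rook neighbours
  offsets <- rbind(c(-1L, 0L), c(1L, 0L), c(0L, -1L), c(0L, 1L))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (ids[i, j] != 0L) next        # cell already labelled
      next_id <- next_id + 1L
      ids[i, j] <- next_id
      queue <- list(c(i, j))           # flood fill from this seed cell
      while (length(queue) > 0) {
        cell <- queue[[1]]
        queue <- queue[-1]
        for (k in seq_len(nrow(offsets))) {
          rr <- cell[1] + offsets[k, 1]
          cc <- cell[2] + offsets[k, 2]
          if (rr >= 1 && rr <= nrow(m) && cc >= 1 && cc <= ncol(m) &&
              ids[rr, cc] == 0L && m[rr, cc] == m[i, j]) {
            ids[rr, cc] <- next_id
            queue[[length(queue) + 1]] <- c(rr, cc)
          }
        }
      }
    }
  }
  ids
}

# Toy "segmented raster": value 1 appears in two disconnected regions
seg <- matrix(c(1, 1, 2,
                2, 2, 2,
                1, 1, 2), nrow = 3, byrow = TRUE)

ids <- label_regions(seg)
print(ids)
cat("Unique values before:", length(unique(as.vector(seg))), "\n")
cat("Unique IDs after:    ", length(unique(as.vector(ids))), "\n")
```

In this toy case the input has two unique values but three spatially separate regions, so the grouped output carries three IDs, stored as integers.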

Also, check your image segmentation workflow to verify whether some step is missing in your pipeline. Because you are running multiple images, verify that all segmented rasters are properly formatted, with a unique ID for each segment.
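
When checking several segmented rasters, a small helper along these lines can flag the suspicious ones. This is a sketch under assumptions: check_segment_ids is a hypothetical helper, the two matrices are invented stand-ins for raster cell values (img_a and img_b are placeholder names), and reading the actual files is left to whichever raster package you already use.

```r
# Summarise the cell values of one segmented raster (given as a matrix/vector)
check_segment_ids <- function(values) {
  v <- values[!is.na(values)]
  data.frame(
    integer_valued = all(v == round(v)),  # TRUE if all values are whole numbers
    n_cells        = length(v),
    n_unique_ids   = length(unique(v))    # compare with the expected segment count
  )
}

# Invented stand-ins for the cell values of two segmented rasters
rasters <- list(
  img_a = matrix(c(1, 1, 2, 3, 3, 4), nrow = 2),
  img_b = matrix(c(1.5, 1.5, 2, 2, 2, 2), nrow = 2)
)

# One summary row per raster
report <- do.call(rbind, lapply(rasters, check_segment_ids))
print(report)
```

A raster with non-integer values, or with far fewer unique IDs than the segments you can see in the image, is a candidate for the region grouping step described above.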

Will send you an email with the raster grouped data with unique IDs so you can check it.

ACM

Jun 18, 2021, 4:07:18 PM
to SegOptim user group
Hi João,
I was focused on the training data and didn't realize that the segmented image was float and not integer! I have checked the rest of the images, and they have a proper number of objects.
Thanks a lot! I was finally able to classify all the images. I hope the support didn't take too much of your time.
We will cite the package in our next paper :D
Best regards,
Álvaro.