Error: Long vectors not supported yet


Lesley Bulluck

Mar 30, 2020, 2:03:19 PM
to SegOptim user group
Hello,

I am excited to find your package for doing object-based image classification in R. I have been doing pixel-based classification using the randomForest package, but I think this object-based approach will increase my accuracy.

I am having trouble with the prepareCalData step that I think may be related to my large raster size. Here are the details for my training data raster:

class      : RasterLayer 
dimensions : 42537, 51180, 2177043660  (nrow, ncol, ncell)
resolution : 1, 1  (x, y)
extent     : 425471, 476651, 4054485, 4097022  (xmin, xmax, ymin, ymax)


When I run the following code:

calData <- prepareCalData(rstSegm = segRst, 
                          trainData = trainDataRst, 
                          rstFeatures = classificationFeatures, 
                          thresh = 0.5, 
                          funs = "mean", 
                          minImgSegm = 30, 
                          verbose = TRUE)

I get the following error message that I am unsure how to deal with:

-> [1/3] Loading train data into image segments...
Error in rgdal::getRasterData(con, offset = offs, region.dim = reg, band = object@data@band) : 
  long vectors not supported yet: memory.c:3717
Warning message:
In prepareCalData(rstSegm = segRst, trainData = trainDataRst, rstFeatures = classificationFeatures,  :
  An error occurred while generating train data! Check segmentation parameter ranges? Perhaps input train data?

Any recommendations you have are appreciated.


Thank you.


João Gonçalves

Mar 31, 2020, 12:57:40 PM
to SegOptim user group
Hi Lesley,

Thanks for using SegOptim 👍 It seems you have bumped into a limitation of R (and the package) that we haven't yet managed to solve. What is happening relates to the maximum size of a vector in your R build, which is 2^31 - 1 = 2,147,483,647 values, while your raster is much larger, with 2,177,043,660 pixels.
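
For reference, you can confirm this limit and your raster's cell count directly in R (this assumes the same object name, trainDataRst, used in your call):

.Machine$integer.max    # 2147483647 = 2^31 - 1
ncell(trainDataRst)     # 2177043660 cells, above that limit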

Two possible solutions (which may not be feasible depending on your specific case):
(i) resample your raster data from one meter to a (slightly) coarser resolution (see the aggregation sketch after this list);
(ii) slice your data into tiles and process them separately (this also means processing them independently...).
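
For option (i), a minimal sketch of how the resampling could be done with the raster package (the file names here are placeholders; aggregating by a factor of 2 turns 1 m cells into 2 m cells, and fun = mean suits continuous layers, whereas a categorical layer such as your training raster would need fun = modal):

library(raster)

r1m <- raster("C:/MyFiles/feature_1m.tif")   # <-- your 1 m input layer
r2m <- aggregate(r1m, fact = 2, fun = mean, na.rm = TRUE)
writeRaster(r2m, "C:/MyFiles/feature_2m.tif")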

Also, even if you managed to allocate larger vectors, you may not have enough memory to load the entire images into R and then calculate segment statistics, but that depends on your system. If you do have a lot of RAM in your machine, you can also try the following:

1) Install the latest version of the raster package from source (see it here: https://github.com/rspatial/raster, with install instructions), because it seems that later builds may have this problem solved - check it here: https://github.com/rspatial/raster/issues/33. Updating base R may also help to sort out this issue;

2) In SegOptim, use the option bylayer = TRUE in the function prepareCalData(), which will load each image layer separately to calculate segment statistics and use much less memory (see the example call below);
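
As an illustration, the two suggestions above could look like this (a sketch only, reusing the object names from your earlier call; remotes::install_github is just one common way to install from GitHub):

# 1) Install the development version of the raster package from GitHub
# install.packages("remotes")
remotes::install_github("rspatial/raster")

# 2) Re-run prepareCalData(), loading one layer at a time
calData <- prepareCalData(rstSegm = segRst,
                          trainData = trainDataRst,
                          rstFeatures = classificationFeatures,
                          thresh = 0.5,
                          funs = "mean",
                          minImgSegm = 30,
                          bylayer = TRUE,
                          verbose = TRUE)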

Meanwhile I will see if it is possible to implement an 'internal' tiling system for prepareCalData() but that will take a while to develop...

Hope this helps. Let me know how it works out.
Cheers
João
- - -

Lesley Bulluck

Mar 31, 2020, 6:13:36 PM
to SegOptim user group
Thank you for these suggestions.  I installed the latest version of R, updated the raster package and added the bylayer = TRUE option in the prepareCalData() function, but still got the same error.

I can resample to a 2 or 3 meter resolution, but I really hate to do that unless absolutely necessary. It is interesting that I have been able to successfully carry out pixel-based randomForest classification with these same rasters and never got this error.

Thanks again.  I will continue to let you know if I figure out a solution.

Lesley


João Gonçalves

Apr 1, 2020, 1:39:03 PM
to SegOptim user group
Hi,
Thanks for testing out these solutions. It's rather unfortunate, but this is an inherent limitation of R that is difficult to overcome.

In the case of pixel-based classifiers it's rather easy, because we can directly extract the values of labelled pixels and use those to train the classification algorithm. However, in OBIA we have to calculate segment statistics first. The problem with this step is that we don't know a priori which pixels compose a given segment, and the simplest way to handle it is to load all pixels and then perform a segment-wise aggregation using the mean, standard deviation (or any other statistics). Even if we do the operations by tiles, we end up having the same border segments in multiple tiles, which makes some statistics difficult to quantify/estimate and can potentially bias the final result... although an approximation could be attempted. In contrast, for pixel-based classifiers it is very easy to perform per-block/tile operations.
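
To illustrate what that aggregation step boils down to, here is a simplified sketch (not the package's actual code; rstSegm is the segment raster and rstFeatureLayer is a placeholder name for one feature layer, both assumed to fit in memory):

library(raster)
library(data.table)

# One row per pixel: its segment ID and its feature value
DT <- data.table(segID = values(rstSegm),
                 value = values(rstFeatureLayer))

# Segment-wise statistics: this is the step that needs every pixel of a segment
segStats <- DT[, .(mean = mean(value, na.rm = TRUE),
                   sd   = sd(value, na.rm = TRUE)),
               by = segID]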

Let me know if you manage to solve this. I will try to implement a solution but (to be completely honest) that can take a couple of weeks before being released.

Cheers
João

Lesley Bulluck

May 18, 2020, 9:16:45 AM
to SegOptim user group
I wanted to post an update on this, and get your feedback.

I resampled my rasters to a 2 m cell size (up from 1 m). Here is the summary information for my Region Group raster, where each value is a unique ID for an image segment.

> segRst
class      : RasterLayer
dimensions : 21269, 25590, 544273710  (nrow, ncol, ncell)
resolution : 2, 2  (x, y)
extent     : 425471, 476651, 4054484, 4097022  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
source     : /home/lpbulluck/SmythObject/RegionGroup_VBMP2m.tif
names      : RegionGroup_VBMP2m
values     : 1, 5006705  (min, max)

I am now able to get through the first step in the prepareCalData() function but get the following error on step 2:

-> [2/3] Calculating feature statistics for all image segments...
Error in calculateSegmentStats(rstFeatures = rstFeatures, rstSegm = rstSegm,  :
  Not enough memory to process the input raster files when using option bylayer!
Warning message:
In prepareCalData(rstSegm = segRst, trainData = trainDataRst, rstFeatures = classificationFeatures,  :
  An error occurred while calculating segmentation statistics for feature data!

I have even tried running this on a Linux server through my university and am getting the same error.

Is there any way you can estimate how much RAM this should take so that I can tell the Linux server host what I might need?  Or am I still running into a ceiling in what R can handle?

Thanks.
Lesley

João Gonçalves

May 19, 2020, 7:30:09 PM
to SegOptim user group
Hi Lesley,

Thanks for the feedback. By resampling the imagery, it seems you managed to overcome the previous R issue related to the maximum vector size, since the roughly 4-fold decrease in the number of pixels brought the count below the limit. What is happening now is different and relates to the amount of RAM needed to calculate segment-wise statistics.
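
Just to make the arithmetic explicit:

42537 * 51180   # 2,177,043,660 cells at 1 m resolution -> above the 2^31 - 1 limit
21269 * 25590   #   544,273,710 cells at 2 m resolution -> below it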

To answer your question, the amount of RAM needed really depends on how your data are formatted (especially in terms of bit depth; check the function dataType for details). Put simply, if your data are stored as integers you will need much less space than if they are decimal numbers. One solution to minimize memory use is to check/convert all your inputs (segmented raster and raster features) to integers in a meaningful manner (e.g. applying a multiplication conversion factor and then converting to integer by rounding; this is common in satellite imagery, where reflectance is often distributed as integer rasters with a 0-10000 range).

Meanwhile, I tried to simulate some data in R to assess the amount of RAM needed - check the code below:

# Integer matrix (values in a 24-bit range) -------
x <- matrix(sample(1:16777216, size = 21269*25590, replace = TRUE),
            nrow = 21269, ncol = 25590)
print(object.size(x), units = "Mb")
# 2076.2 Mb
rm(x)

# Float/decimal matrix -------
x <- matrix(rnorm(21269*25590), nrow = 21269, ncol = 25590)
print(object.size(x), units = "Mb")
# 4152.5 Mb
rm(x)

It turns out that for a single raster layer like yours, with ~544 million pixel values, you would need roughly 2.1 Gb (if integer) or 4.2 Gb (if decimal). When calculating segment statistics by layer you may need, in the best-case scenario, roughly 8 Gb. Although the R data.table package works by reference (to minimize object copies), R sometimes makes copies behind the scenes, so it would be better/safer if you could work on a machine with 16 or even 32 Gb of allocatable memory. Recent workstations or high-performance computers currently provide this amount of memory, so I hope you can get access to one to process your data.
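
The same estimate can be written as a simple rule of thumb: in-memory size ≈ number of cells × bytes per value (4 for integers, 8 for doubles). The helper ramGb() below is just a hypothetical illustration, not a SegOptim function:

# Rough per-layer memory footprint, in GiB
ramGb <- function(ncells, bytesPerValue) ncells * bytesPerValue / 1024^3

ramGb(544273710, 4)   # ~2.0 GiB for an integer layer
ramGb(544273710, 8)   # ~4.1 GiB for a double/float layer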

Hope this helps.

Lesley Bulluck

May 20, 2020, 5:20:41 PM
to SegOptim user group
Thanks so much for your reply.  

I do have access to a high performance computing cluster so hopefully I will be able to run these here soon!

I checked my dataType for each input raster and found that most are integers (INT4S, INT1U, INT2U) but my NDVI raster is decimal (FLT4S).  
It is good to know that I can convert this to an integer to use less RAM.

Is it ok that the integer rasters are of different types or should I convert them to be the same?  

Thanks again,
Lesley

João Gonçalves

May 20, 2020, 7:43:51 PM
to SegOptim user group
Hi Lesley,

OK, great to know that you have more computational resources available.

You don't need to use exactly the same integer data type (that may not even be possible). Simply converting from decimal/float to integer will greatly decrease the size of the data in memory.

To convert your NDVI layer, you can use the following example code in R (it needs a couple of modifications to work with your own data; check the arrows):


library(raster)

ndvi <- raster("C:/MyFiles/my_ndvi.tif") # <-- Point the raster function to your own NDVI layer

print(object.size(values(ndvi)), units = "Mb") # Check data size / comment this line if it throws an out-of-memory error

# Convert the data to integer: INT2S = [-32767, 32767]
ndviInt <- round(ndvi * 10000)

# Write a new NDVI raster file with an integer data type
writeRaster(ndviInt, "C:/MyFiles/ndviInt.tif", datatype = "INT2S") # <-- Change the output path here

# Reload the data, now as integer
ndviInt <- raster("C:/MyFiles/ndviInt.tif") # <-- Change the input path here to the new NDVI raster file

print(object.size(values(ndviInt)), units = "Mb") # Check the new size / comment this line if it throws an out-of-memory error

All the best.

Lesley Bulluck

May 22, 2020, 7:34:53 AM
to SegOptim user group
Hello again,

I ran the SegOptim code with only 4 classification rasters on a cluster computer, by itself, with 64 cpu cores and 128GB of ram.

It consumed about 40GB of ram before it errored on step 2 of the prepareCalData function: Not enough memory to process the input raster files when using option bylayer!

I am thinking there may be something wrong with my input data?  

I did the mean shift segmentation in ArcGIS Pro, followed by the Region Group function in ArcGIS Pro, to give each segment a unique ID. This output is my segRst raster, and it is similar to the one used in the tutorial.

I am putting the summary and dataType information for each raster below to show that they all have the same extent, CRS, and dimensions. I have not yet converted the one floating-point raster to integer (I will do that today), but based on your response, I should have enough computing power to handle that.

Any thoughts on why this might be happening?  Thank you.

> segRst
class      : RasterLayer 
dimensions : 21269, 25590, 544273710  (nrow, ncol, ncell)
resolution : 2, 2  (x, y)
extent     : 425471, 476651, 4054484, 4097022  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 
source     : D:/5CountyClass/SmythObject/RegionGroup_VBMP2m.tif 
names      : RegionGroup_VBMP2m 
values     : 1, 5006705  (min, max)

> trainDataRst
class      : RasterLayer 
dimensions : 21269, 25590, 544273710  (nrow, ncol, ncell)
resolution : 2, 2  (x, y)
extent     : 425471, 476651, 4054484, 4097022  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 
source     : D:/5CountyClass/SmythObject/PolygonPixels_2mres.tif 
names      : PolygonPixels_2mres 
values     : 1, 8  (min, max)

> dataType(trainDataRst)
[1] "INT1U"

> classificationFeatures
class      : RasterStack 
dimensions : 21269, 25590, 544273710, 5  (nrow, ncol, ncell, nlayers)
resolution : 2, 2  (x, y)
extent     : 425471, 476651, 4054484, 4097022  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 
names      : Elev_2mres, NDVI_SM_2mres, VB_SM_2mres, VG_SM_2mres, VR_SM_2mres 
min values :          0,            -1,           0,           0,           0 
max values :       1746,             1,         254,         254,         254 

> dataType(classificationFeatures)
[1] "INT2U" "FLT4S" "INT1U" "INT1U" "INT1U"

João Gonçalves

May 25, 2020, 1:39:16 PM
to SegOptim user group
Hi Lesley,

I am really sorry that SegOptim was not up to the task of processing your data, even on a better and bigger machine. You are doing everything correctly. Unfortunately, I have to conclude that the current version of the package is not suited to process image data of this magnitude (> 500M pixels). In previous tests, the largest images I used were ~5 to 10 times smaller, which is quite different in terms of memory usage.

Overall, this means that some of the package methods need to be completely re-designed to make them more "memory safe". This mainly happens because this 'first' version of the package targets optimization with "small" image sets (~1-10M pixels), where these memory issues are not so important.

In any case, I am now starting to lay out plans for a major update of the package to make it more memory-safe and faster. It will probably take a couple of weeks to conclude.

Will keep you posted on relevant news regarding this.

Lesley Bulluck

May 26, 2020, 7:58:31 AM
to SegOptim user group
Hello,

Thank you for being willing to work on this issue.  I look forward to seeing the new and improved version of SegOptim.

In the meantime, I recently obtained LiDAR data for my region of interest and will see if the addition of a canopy height model improves the cell-based classification I had been using prior to learning about SegOptim.

Thank you,
Lesley