Reposting a question here.
"When we have categorical layers (soils, veg type) and few presence points (a rare plant, in this case ~100 points), we hit the ‘too few data points in a category’ problem. I thought we should make our template cell size bigger (30 m to 90 m) to try to push the few data points out of soil types that only have a couple of points to begin with. That didn’t seem to help.
Then I thought I should look at the parameters under MARS, Random Forest, GLM… and use something like a threshold to help. I’m just not sure what to change to allow for few points, or to let the model throw out and disregard categories with few points so we can still use these layers. Or, if we will always have 0-9 points in a category, will these layers never work?"
Answer:
Hello Michelle,
Categorical layers can be problematic. When you say few points, how few? Keep in mind that you should have at least 10 observation points per covariate layer, and under this rule of thumb each category in each categorical layer counts as a separate covariate.
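The rule of thumb is easy to check before fitting anything. Below is a minimal Python sketch, assuming every category of every categorical layer counts as one covariate and that you have already extracted the soil (or veg) class at each presence point into a list. The function names, layer names, class codes and class counts are hypothetical, for illustration only.

```python
from collections import Counter

# Rule of thumb from the answer: ~10 observation points per covariate,
# where each category of a categorical layer counts as one covariate.
POINTS_PER_COVARIATE = 10


def covariate_count(n_continuous, categorical_levels):
    """Continuous layers count once; categorical layers count once per category.

    categorical_levels maps a (hypothetical) layer name to its number of categories.
    """
    return n_continuous + sum(categorical_levels.values())


def minimum_points(n_continuous, categorical_levels, per_covariate=POINTS_PER_COVARIATE):
    """Rough minimum number of observation points suggested by the rule of thumb."""
    return per_covariate * covariate_count(n_continuous, categorical_levels)


def sparse_categories(point_values, all_categories, threshold=POINTS_PER_COVARIATE):
    """Return {category: n_points} for categories with fewer than `threshold` points.

    point_values: the category sampled at each presence point.
    all_categories: every category in the layer, so empty categories show up as 0.
    """
    counts = Counter(point_values)
    return {c: counts.get(c, 0) for c in all_categories if counts.get(c, 0) < threshold}


if __name__ == "__main__":
    # Hypothetical example: 3 continuous layers, 12 soil classes, 8 veg classes.
    print(minimum_points(3, {"soils": 12, "veg": 8}))  # → 230
    # Hypothetical soil codes sampled at presence points.
    soils_at_points = ["A"] * 40 + ["B"] * 3 + ["C"] * 12
    print(sparse_categories(soils_at_points, ["A", "B", "C", "D"]))  # → {'B': 3, 'D': 0}
```

With those hypothetical layers the rule asks for about 230 points, well above the ~100 in the question, and the sparse-category list points to the classes that are the first candidates for lumping.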
Changing your template cell size can sometimes help. It can also help to create a new categorical raster with fewer categories, i.e. lump your categories into more general groups. If your categorical raster has a finer cell size than your template, you can also convert it into a set of continuous rasters, one per category, each giving the percentage of a template cell covered by that category. In other words, you can create a set of rasters for %forest, %grass, %ice, etc. instead of a single categorical raster. This is how we generally handle NLCD.
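Both ideas, lumping categories and turning a fine categorical raster into percent-cover layers, can be sketched with NumPy. This is a minimal sketch under stated assumptions, not a full GIS workflow: the fine raster is already loaded as a 2-D integer array, it is aligned with the template grid, and the template cell size is an exact multiple of the fine cell size (e.g. 30 m cells aggregated by a factor of 3 to a 90 m template). The lookup table and class codes are hypothetical.

```python
import numpy as np


def lump_categories(raster, lookup, unmapped=-1):
    """Reclassify detailed category codes into broader groups.

    lookup maps old code -> new group code (hypothetical codes below).
    Cells whose code is not in lookup get the value `unmapped`.
    """
    out = np.full(raster.shape, unmapped, dtype=np.int32)
    for old, new in lookup.items():
        out[raster == old] = new
    return out


def percent_cover(raster, categories, factor):
    """Turn one fine categorical raster into one continuous raster per category.

    Each output cell is the percentage of the factor x factor block of fine
    cells beneath it that belong to the category. Assumes the fine grid is
    aligned with the coarse template and its dimensions are exact multiples
    of `factor`. Nodata/unmapped cells still count toward each block's total.
    """
    rows, cols = raster.shape
    if rows % factor or cols % factor:
        raise ValueError("raster dimensions must be multiples of factor")
    layers = {}
    for cat in categories:
        mask = (raster == cat).astype(float)
        # Split into blocks: (coarse row, row within block, coarse col, col within block).
        blocks = mask.reshape(rows // factor, factor, cols // factor, factor)
        layers[cat] = 100.0 * blocks.mean(axis=(1, 3))
    return layers


if __name__ == "__main__":
    # Hypothetical 30 m codes: 1 = deciduous, 3 = evergreen (both forest), 2 = grass.
    fine = np.array([
        [1, 1, 1, 2, 2, 2],
        [1, 1, 1, 2, 2, 2],
        [1, 1, 3, 2, 2, 2],
    ])
    lumped = lump_categories(fine, {1: 10, 3: 10, 2: 20})  # 10 = forest, 20 = grass
    cover = percent_cover(lumped, [10, 20], factor=3)  # 30 m -> 90 m template
    print(cover[10].tolist())  # → [[100.0, 0.0]]
    print(cover[20].tolist())  # → [[0.0, 100.0]]
```

In practice you would read and write the rasters with your GIS tools and handle nodata explicitly; here unmapped or nodata cells simply count toward the denominator of each block.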
Good luck,
Colin