Hello Marco,
Good to hear you worked it out.
The issue with removing groups is that you can reduce the range and
abundance estimates for some of your labels, which could then be an
issue for endemism and rarity calculations.
The approach in that case is to run a spatial analysis to calculate
richness, and then use a definition query in the cluster analysis so
that it only considers cells (groups) whose richness exceeds your
threshold.
The steps are:
1. Run a spatial analysis with a single neighbour set of
   sp_self_only(). This is the default neighbour set in v3, so you
   should not need to do anything except calculate the richness
   indices. The next step assumes this is called "sp1".
2. Run the cluster analysis, specifying the definition query to be like
   the one below. Edit the "> 3" at the end to change the
   threshold.

    sp_get_spatial_output_list_value (
        output => 'sp1',              # using spatial output called sp1
        list   => 'SPATIAL_RESULTS',  # from the SPATIAL_RESULTS list
        index  => 'RICHNESS_SET1',    # get index value for RICHNESS_SET1
    ) > 3                             # and return true if the value is >3

That condition is probably more than most people would want to type
(or copy and edit), so I have opened an issue to implement a simpler
approach in version 4.
https://github.com/shawnlaffan/biodiverse/issues/783
The approach above does have some advantages, though. The main one
is that one can apply any amount of complexity in the neighbour
sets, and threshold using any index value. For example, one could
calculate phylogenetic diversity (PD) using bioregions as the
neighbour sets, and then threshold on those values. Then one would
exclude any cell in a bioregion with PD less than the desired
threshold.
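As a rough sketch of that PD example, the definition query would have
the same shape as the richness one. This assumes the spatial output
with the bioregion neighbour sets is called "sp_pd" and that the PD
value is stored under the index name 'PD'. Both names are assumptions
to check against your own outputs, and the 0.1 is only a placeholder
threshold.

    sp_get_spatial_output_list_value (
        output => 'sp_pd',            # assumed name of the PD spatial output
        list   => 'SPATIAL_RESULTS',  # from the SPATIAL_RESULTS list
        index  => 'PD',               # assumed index name for PD
    ) >= 0.1                          # placeholder threshold, edit to suit
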
All that said, if your cells with few labels are likely to be errors,
then it is better to use the Run Exclusions approach. And if you
are not interested in rarity or endemism type indices, then the Run
Exclusions approach will work perfectly well.
Regards,
Shawn.