Convolution terminology is a bit confusing. I find it easiest to think of the dimensionality as the number of dimensions the filter slides along (that is, the dimensions it is repeated along).
A typical 2D convolution applied to an RGB image would have a filter shape of (3, filter_height, filter_width). The filter spans all three channels at once and slides only along height and width, so it combines information from all channels into a single 2D output.
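To make the shapes concrete, here is a minimal pure-Python sketch. It assumes "valid" padding, stride 1, a single filter and no bias, and it computes cross-correlation (no kernel flip), which is what most deep learning libraries actually do for their convolution layers. The function name `conv2d` is just an illustrative name, not any library's API.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation with a single filter (illustrative sketch).

    image:  nested lists of shape (channels, height, width)
    kernel: nested lists of shape (channels, filter_height, filter_width)
    Returns a 2D list of shape (height - filter_height + 1, width - filter_width + 1).
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    k_channels, fh, fw = len(kernel), len(kernel[0]), len(kernel[0][0])
    assert channels == k_channels, "the filter must span every input channel"
    out = []
    for i in range(height - fh + 1):        # slide along height
        row = []
        for j in range(width - fw + 1):     # slide along width
            total = 0.0
            for c in range(channels):       # sum over ALL channels at once
                for di in range(fh):
                    for dj in range(fw):
                        total += image[c][i + di][j + dj] * kernel[c][di][dj]
            row.append(total)
        out.append(row)
    return out


# A 3-channel 4x4 "image" of ones and a (3, 2, 2) filter of ones.
image = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(3)]
result = conv2d(image, kernel)
print(len(result), len(result[0]))  # 3 3  -> one 2D map, channels are gone
print(result[0][0])                 # 12.0 (3 channels * 2 * 2 ones)
```

The channel axis does not appear in the output: each output value already mixes all three colors.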
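If you wanted to process each color channel separately, applying the same weights to every channel, you would use a 3D convolution with filter shape (1, filter_height, filter_width). The filter then also slides along the channel axis, producing one 2D output per channel.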
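A similar sketch for the 3D case treats the channel axis as the depth axis. Same assumptions as before ("valid" padding, stride 1, one filter, no bias, cross-correlation), and `conv3d` is again only an illustrative name. With a depth-1 filter, each color plane is filtered on its own, but with shared weights.

```python
def conv3d(volume, kernel):
    """'Valid' 3D cross-correlation: the filter slides along depth, height and width.

    volume: nested lists of shape (depth, height, width)
    kernel: nested lists of shape (filter_depth, filter_height, filter_width)
    Returns shape (depth - fd + 1, height - fh + 1, width - fw + 1).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    fd, fh, fw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(depth - fd + 1):         # slide along depth (here: the color channels)
        plane = []
        for i in range(height - fh + 1):    # slide along height
            row = []
            for j in range(width - fw + 1): # slide along width
                total = 0.0
                for dd in range(fd):
                    for di in range(fh):
                        for dj in range(fw):
                            total += volume[d + dd][i + di][j + dj] * kernel[dd][di][dj]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# 4x4 image whose R, G, B planes are filled with 1, 2 and 3 respectively.
image = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(3)]
kernel = [[[1.0, 1.0], [1.0, 1.0]]]  # shape (1, 2, 2): one shared 2x2 filter
result = conv3d(image, kernel)
print(len(result), len(result[0]), len(result[0][0]))  # 3 3 3
print([plane[0][0] for plane in result])               # [4.0, 8.0, 12.0]
```

Unlike the 2D case, the channel axis survives: there is one output plane per color, and no information is mixed between colors.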
1D convolution, where the filter slides along a single axis, is useful for data with local structure in one dimension, like audio or other time series.
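The 1D case follows the same pattern: the filter spans all channels but slides only along time. This is another illustrative sketch (hypothetical name `conv1d`, "valid" padding, stride 1, one filter, no bias, cross-correlation), here used as a 3-tap moving sum over a single-channel signal.

```python
def conv1d(signal, kernel):
    """'Valid' 1D cross-correlation: the filter slides along the time axis only.

    signal: nested lists of shape (channels, length)
    kernel: nested lists of shape (channels, filter_length)
    Returns a list of length (length - filter_length + 1).
    """
    channels, length = len(signal), len(signal[0])
    k = len(kernel[0])
    assert len(kernel) == channels, "the filter must span every input channel"
    out = []
    for t in range(length - k + 1):  # slide along time
        total = 0.0
        for c in range(channels):    # combine all channels, as in the 2D case
            for dt in range(k):
                total += signal[c][t + dt] * kernel[c][dt]
        out.append(total)
    return out


# Single-channel signal filtered with a 3-tap moving sum.
signal = [[0.0, 3.0, 6.0, 3.0, 0.0, 3.0]]
kernel = [[1.0, 1.0, 1.0]]
result = conv1d(signal, kernel)
print(result)  # [9.0, 12.0, 9.0, 6.0]
```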