Your intuition is right: we "apply" the i-th channel of a filter to the i-th channel of the input blob. What matters, however, is understanding what "applying" a filter really means.
The procedure can be imagined as cutting out a part of the input blob (the receptive field of a neuron). To go with your example, that would be a 3x11x11 piece of the image, containing (for example) pixels 0 through 10 along the width and height axes. Now we flatten that piece into a vector, do the same with the 3x11x11 filter, and compute a dot product between the two. Because the dot product multiplies matching elements, each filter channel only ever meets the corresponding input channel, and the per-channel results are summed together. This way we obtain a single value per filter for each spatial location in the image.
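To make the flatten-and-dot-product picture concrete, here is a minimal NumPy sketch. The 3x32x32 input size, the random values, and the helper name `apply_filter_at` are assumptions made purely for illustration; only the 3x11x11 filter shape comes from your example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: a 3-channel 32x32 input and one 3x11x11 filter.
image = rng.standard_normal((3, 32, 32))
filt = rng.standard_normal((3, 11, 11))

def apply_filter_at(image, filt, row, col):
    """Response of one filter at one location (row, col = top-left corner)."""
    _, kh, kw = filt.shape
    # Cut out the receptive field: all channels, an 11x11 spatial window.
    patch = image[:, row:row + kh, col:col + kw]
    # Flatten both to vectors and take the dot product.
    return patch.reshape(-1) @ filt.reshape(-1)

# Pixels 0 through 10 on both spatial axes (slice 0:11 in Python).
value = apply_filter_at(image, filt, 0, 0)

# Same result, seen channel by channel: filter channel i only meets
# input channel i, and the three partial sums are added together.
per_channel = [(image[i, 0:11, 0:11] * filt[i]).sum() for i in range(3)]
```

Here `value` is a single scalar, and it matches `sum(per_channel)` up to floating-point rounding. Sliding the window over every valid `(row, col)` would give one 2D output map per filter.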
For more detailed reading, I encourage you to check this article (the link goes to the section on conv layers).