Abstract is slower than concrete. You are asking the CPU to repeatedly discover at run-time what you were unwilling or unable to specify at compile time. Maybe you're being lazy or maybe there is a good reason, but the key notion is crucial: the actual instructions that the computer performs are type-specific so any coding that is type-unspecific forces an analysis and branch table somewhere (preprocessor expansion, compiler concretization, run-time dispatch among equals, or in hardware.)
Differences are often insignificant. Common CPUs are fast compared to common data sizes (so the difference is not important to the user) and most real programs do more in their inner-loops than the exchanges in these sort functions (so the amortized extra time may barely be measurable).
Point of view is important. Programmers coming from the dynamic typing world expect abstraction and find system design without it difficult. They've always been paying the overhead, even for "i = i + 1." From this perspective, it is not that interfaces are slow, but that not using interfaces is unusually fast. (The same argument applies to writing critical sections in assembler vs a higher-level language, making use of special SIMD or AES-assist instructions vs assembler, GPU/MIC coprocessors vs the main CPU, or BitCoin and security ASICs.)
Usage trumps mechanism. Examine the sort package's source code (go/src/pkg/sort/sort.go) and look at Ints() in line 270. What happens is that the API that knows you want to sort an int slice in non-decreasing order uses the abstract sorting engine by specifying the use of default int-specific Len(), Less(), and Swap() functions. That's easy, but it forces the default sorting engine to be abstract "all the way down." Instead, it could have said, "Oh, I have special code for this" and simply been the int-specific version of sorting in to non-decreasing order with custom code for this case (and similarly for other built-in types). If it were coded this way, then the abstract-to-concrete mechanism would happen once per sort call and be immeasurable in terms of performance penalty. That's an example of style of usage being more important than cost of usage.
Solutions are well known. The situation of wanting to perform a fixed algorithm on various types of data is common. The desire to do this in a "factored" way, where the logic is written once with data types as parameters is as old as algorithms. I can easily imagine a student asking Euclid, "but, master, what if we need the GCD of a 64-bit number? You only showed us what to do for 16-bit numbers. Is the algorithm for the larger type related in any way to the smaller?" Both Euclid and I agree that the algorithms are the same and the difference in types is trivial. I use m4 or mcpp (
http://mcpp.sourceforge.net/) to make such trivialities, well, trivial. I do this because Go does not have a native macro, parameterized, or other abstract-to-specific type management facility. Maybe this could be a good research project for Go 1.3 interval -- not to adopt an answer but to explore alternatives with vigor in search of consensus.
Michael