Hi Robert,
I think it's fine for lists to be faster than binaries when comparing pure Elixir/Erlang implementations. But it is not fine for a NIF to be slower than either of them.
I've written a basic implementation in C that compares naive copying with a more efficient algorithm:
https://gist.github.com/alco/7d75b87b77bb7c113499#file-results
slow Input size = 100000, time = 1918.361000 µs
fast Input size = 100000, time = 254.149000 µs
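
To make the difference concrete, here is a minimal sketch of the two strategies. It is not a copy of the gist: the names dup_naive and dup_doubling are made up for illustration, and I am assuming that "naive" means one copy per repetition and "more efficient" means doubling the already-filled prefix. The caller is assumed to provide a destination buffer of at least len * n bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive strategy: one memcpy call per repetition, so n calls in total.
 * Hypothetical helper; dst must hold at least len * n bytes
 * (overflow of len * n is not checked in this sketch). */
void dup_naive(char *dst, const char *src, size_t len, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        memcpy(dst + i * len, src, len);
}

/* Doubling strategy: copy the pattern once, then keep copying the
 * already-filled prefix onto the unfilled tail. The filled region
 * doubles on each pass, so only about log2(n) memcpy calls are needed. */
void dup_doubling(char *dst, const char *src, size_t len, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t total = len * n;
    if (total == 0)
        return;

    memcpy(dst, src, len);
    size_t filled = len;
    while (filled < total) {
        /* Never copy more than what is already filled, so the source
         * and destination ranges cannot overlap. */
        size_t remaining = total - filled;
        size_t chunk = filled < remaining ? filled : remaining;
        memcpy(dst + filled, dst, chunk);
        filled += chunk;
    }
}
```

With a short pattern and a large n, the second version spends its time in a handful of large memcpy calls instead of many tiny ones, which is the kind of gap the numbers above show.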
Compare these C timings to the Elixir vs :binary.copy measurements:
StringDuplicateBench.strdup 100000: 20000 88.69 µs/op
StringDuplicateBench.binary copy 100000: 5000 546.27 µs/op
Of course, the real binary_copy implementation has to take a lot of things into consideration, but compiled BEAM code almost always has more run-time overhead than a native implementation. At the native level you have more room to optimize, so these results can't be called satisfactory. It may simply be that the C implementation is too complex to get right.