tl;dr – That PyPy performance post is very misleading. PyPy only has better printf performance than C in the particular case where you print the same expression twice. Specifically, their JIT figures out that it can decode the value once instead of twice. C cannot do this because libc's printf does all of its work at run time. Since decoding is almost all of the work, PyPy ends up about twice as fast. If you change the test to print different expressions, PyPy is no faster than C. It seems to me that this post was either intentionally misleading or they didn't understand why their compiler was doing so suspiciously well on this benchmark (handily beating C is always cause for suspicion). Moreover, as soon as PyPy has to print to an output stream instead of just creating a string in memory, the performance tanks, becoming 7x slower than C.
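To make the "decode once instead of twice" point concrete, here is a rough sketch in Julia. The two helper names are made up for illustration and this is not what PyPy's JIT or libc actually runs; it only shows the shape of the saving when the same value appears twice in the output.

```julia
# Hypothetical helpers for illustration only; not PyPy's or libc's code.

# Decode twice: each argument is converted to decimal separately,
# which is what a run-time printf has to do for "%d %d" with (i, i).
format_twice(i::Integer) = string(string(i), " ", string(i))

# Decode once: convert the repeated value a single time and reuse the text,
# which is the shortcut a JIT can take when it sees both arguments are the same.
function format_once(i::Integer)
    decimal = string(i)
    return string(decimal, " ", decimal)
end

# Both variants build the same string; only the amount of decoding differs.
format_twice(42) == format_once(42)
```

The shortcut disappears as soon as the two arguments differ (say i and i+1), which is why the timings change so much when the benchmark prints different expressions.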
Currently Julia's printfd benchmark time is around 55 ms – about 2x slower than C. I'm sure we can get closer with more work on I/O, but there are bigger fish to fry. Your benchmark code isn't actually doing a printf test but rather a string interpolation equivalent to strcat(string(i)," ",string(i)), which is what it expands to at compile time. Since I don't know what your PyPy benchmark code is doing, it's hard to say how PyPy could be 20x faster. I get the following relative timings in Julia:
julia> @elapsed begin
           for i = 1:100000
               "$i $(i+1)"
           end
       end
0.30051112174987793

julia> @elapsed begin
           for i = 1:100000
               @sprintf("%d %d\n",i,i+1)
           end
       end
0.18344807624816895

julia> @elapsed begin
           open("/dev/null","w") do io
               for i = 1:100000
                   @printf(io,"%d %d\n",i,i+1)
               end
           end
       end
0.10406017303466797
Since we're 2x slower than C for printf, these timings imply that sprintf is about 3.5x slower than C, which is not great, and interpolation is almost 6x slower than C, which is kind of terrible. If PyPy is really 20x faster than our interpolation, though, that still implies that it's 3.33x faster than C (20/6), which seems pretty implausible.
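The ratios above can be re-derived from the three measured times. This is only a back-of-the-envelope check: the C time is not measured here but assumed to be half the printf time (the "2x slower" figure), and the variable names are invented for the example.

```julia
# Measured times in seconds, copied from the transcript above.
t_interp  = 0.30051112174987793
t_sprintf = 0.18344807624816895
t_printf  = 0.10406017303466797

# Assumption: Julia's printf is about 2x slower than C, so estimate C's time as half.
t_c = t_printf / 2

sprintf_vs_c = t_sprintf / t_c   # about 3.5x slower than C
interp_vs_c  = t_interp / t_c    # about 5.8x slower than C, i.e. "almost 6x"

# If PyPy really were 20x faster than the interpolation loop:
pypy_vs_c = 20 / interp_vs_c     # about 3.5x faster than C
```

Using the unrounded interpolation ratio gives roughly 3.5x instead of 3.33x; the 3.33x figure comes from rounding that ratio up to 6 first. Either way the conclusion is the same.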