You need to be cautious when you are using averages. It would be far better to run each test 10k-100k times and then run a Student's t-test on the two distributions (assuming the variances look similar and the data looks roughly normal; further tests might be needed to check that).
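As a rough sketch of what that comparison could look like (assuming the per-run timings for both programs have already been collected into files, one number per line; the file names and run counts here are made up):

    import numpy as np
    from scipy import stats

    # Hypothetical per-run latencies in microseconds, one value per line,
    # collected from e.g. 10k runs of each program.
    c_times = np.loadtxt("c_times.txt")
    go_times = np.loadtxt("go_times.txt")

    # Check the assumptions first: roughly normal shape, similar variances.
    print(stats.shapiro(c_times[:5000]))    # normality check (p-value unreliable for huge n)
    print(stats.shapiro(go_times[:5000]))
    print(stats.levene(c_times, go_times))  # equal-variance check

    # Student's t-test assumes equal variances; Welch's (equal_var=False) does not.
    t, p = stats.ttest_ind(c_times, go_times, equal_var=True)
    print(f"t = {t:.3f}, p = {p:.4g}")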
The problem is that an average can hide the fact that the program is multi-modal: it often answers slightly faster than the measured average (say 150us), but once in a while it falls apart completely and takes 400us or more. In those cases it is usually better to work on the full data set rather than the average. If you don't see drops like these, I'd still be cautious, since a real-world system is likely to get into situations where your program doesn't get to run at all. Bootstrap methods can generally be used for outlier detection in the data set; a sketch follows below.
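One rough way to see whether the distribution hides a slow mode (a sketch, not the only approach; the file name is again hypothetical) is to look at the raw quantiles and bootstrap a statistic instead of trusting the mean:

    import numpy as np

    rng = np.random.default_rng(0)
    times = np.loadtxt("go_times.txt")  # hypothetical file of per-run timings

    # Tail quantiles expose a bimodal latency profile that the mean hides:
    # most runs near 150us, with occasional 400us+ runs showing up at p99/p99.9.
    for q in (0.5, 0.9, 0.99, 0.999):
        print(f"p{q*100:g}: {np.quantile(times, q):.1f} us")

    # Simple bootstrap: resample with replacement and see how much the mean
    # (or any other statistic) moves around; outliers widen this interval a lot.
    boot_means = np.array([
        rng.choice(times, size=times.size, replace=True).mean()
        for _ in range(10_000)
    ])
    lo, hi = np.quantile(boot_means, [0.025, 0.975])
    print(f"bootstrap 95% CI for the mean: [{lo:.1f}, {hi:.1f}] us")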
A good start is to store the data in files and run Poul-Henning Kamp's 'ministat' tool on the data set (it ships with FreeBSD, and there are ports for other systems). You can also use R, in which case you also get nice plots.
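A minimal sketch of getting the data into that shape, one measurement per line, which is what ministat (and R's read.table) can consume directly; the benchmarked function and file name are placeholders:

    import time

    def benchmark_once():
        # placeholder for whatever you are actually measuring
        sum(range(10_000))

    # Write one measurement per line; you can then compare two such files with
    # e.g.: ministat c_times.txt go_times.txt
    with open("run_times.txt", "w") as f:
        for _ in range(10_000):
            t0 = time.perf_counter_ns()
            benchmark_once()
            f.write(f"{(time.perf_counter_ns() - t0) / 1000:.1f}\n")  # microseconds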
If I were to guess at a reason, the C program is doing less work than the Go program. OTOH, that also means the C program has limits if you have multiple of these routines doing work like this. So you are paying a higher initial constant cost in exchange for some flexibility in the long run.