Several areas for improvement in sync.Map are already known: see this search.
I notice that your LockCache implementation uses sync.RWMutex, whose current implementation is also known not to scale well (#17973).
Finally, the ChannelCache implementation does twice as many goroutine switches as necessary: you're using a goroutine as a mutex, when it would be more efficient (and arguably clearer) to use the channel itself; a rough sketch follows below. (For details, see the slides from my talk on Rethinking Classical Concurrency Patterns from GopherCon last week. The relevant pattern appears in slides 60-68.)
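To illustrate, here's a minimal sketch of the channel-as-mutex pattern. I'm guessing at your ChannelCache's interface (a Get that takes a fill function), so the names and signatures below are placeholders rather than a drop-in replacement:

```go
package cache

// chanCache stores its map in a 1-buffered channel instead of handing
// requests to a serving goroutine: receiving from the channel acquires
// the state, and sending the state back releases it.
type chanCache struct {
	state chan map[string]string // always holds exactly one map
}

func newChanCache() *chanCache {
	c := &chanCache{state: make(chan map[string]string, 1)}
	c.state <- make(map[string]string) // put the initial (empty) state into the channel
	return c
}

// Get returns the cached value for key, computing and storing it via
// fill on a miss.
func (c *chanCache) Get(key string, fill func(string) string) string {
	m := <-c.state // "lock": take ownership of the map
	v, ok := m[key]
	if !ok {
		v = fill(key)
		m[key] = v
	}
	c.state <- m // "unlock": hand the map back
	return v
}
```

Each Get is then one receive and one send on a buffered channel, with no second goroutine to switch to and back from.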
That said, I don't think your benchmarking methodology is measuring the properties you actually care about. You say that you're implementing a “shared cache”, but the benchmarks access each key only once per goroutine or thread, and access those keys in order (so cache misses for each key will tend to correlate in time). That access pattern is atypical of every cache I've worked with. I would recommend benchmarking a real program against a real set of requests before you jump to any firm conclusions; a sketch of a more representative synthetic benchmark follows below.
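If a trace of real requests isn't available, even a skewed synthetic workload with repeated keys would be more representative than a single in-order pass. Here's a rough sketch using a Zipf distribution over the key space; the cache constructor and Get signature are assumed to match the sketch above, so adjust them to fit your actual API:

```go
package cache

import (
	"fmt"
	"math/rand"
	"testing"
)

// BenchmarkGetZipf drives the cache with a Zipf-skewed, repeating key
// stream, so hot keys are hit many times and misses are spread out in
// time instead of marching through the key space in order.
func BenchmarkGetZipf(b *testing.B) {
	c := newChanCache() // placeholder: substitute the cache under test
	fill := func(key string) string { return "value for " + key }

	b.RunParallel(func(pb *testing.PB) {
		// Give each goroutine its own source so they don't contend on
		// the global rand lock.
		r := rand.New(rand.NewSource(rand.Int63()))
		zipf := rand.NewZipf(r, 1.2, 1, 1<<16) // skewed draw over 2^16 keys
		for pb.Next() {
			key := fmt.Sprintf("key-%d", zipf.Uint64())
			_ = c.Get(key, fill)
		}
	})
}
```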
Moreover, you seem to be benchmarking only two threads/goroutines despite apparently having 8 logical cores on the test machine. If you're in a situation where performance matters, you should test with realistic resources too: is this really only a two-core problem in a production setting?
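For what it's worth, go test can sweep GOMAXPROCS for you in a single run via the -cpu flag (the benchmark name here is just a placeholder):

```
go test -bench=Cache -cpu=1,2,4,8
```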