The basic building block would be inserting and removing elements lock freely in a single array, then you could work your way to circular buffers and perhaps automatically scaling buffers.
You need to maintain 2 indexes: one for the latest read (past tense) position, one for the latest written position. When you want to add a new element to the queue, you atomically increment the writer index, and you got your position where to insert the new thing. To read an element, you atomically increment the reader, and read whatever there is (if you passed the writer index (load read index first, then write index), then you decrement the reader and retry until you succeed).
This is a spin-lock solution, so it will probably kill your CPU if the producers/consumers are not of the same speed (also Go will hog your thread). To prevent resource exhaustion, you'll need to add some mechanism to back off after some failed read attempts, and either sleep a bit, or desirably fall back to a lock.
Of course, for every added feature, you code will get significantly more complicated :) But it's a nice concurrency exercise :D
Cheers,
Peter