if we assume, that the number of elements N is roughly stable (so pops and pushes are roughly balanced):
• If we append and reslice, we need to reallocate every N pops, because when space runs out, append will allocate 2N elements, so it has space for N new ones. After N pop/push sequences, it will run out of space and the capacity is again N, so it will allocate a new array with 2N elements, copy over N and continue. So every N pops, we need to allocate N elements and copy N elements, but don't incur any other costs.
• If we shift down, then we ~never need to reallocate, but need to copy N elements on each pop, so over N pops we need to shift N² elements
So, really, the question is how (N•copy(N)) compares (alloc(N) + copy(N)). Which, as it seems to me, heavily depends on N, the memory you have, the GC pressure you have and your requirements.
The answer, as always with these kinds of question is: Benchmark both on your specific machine and your specific workload. If there is a clear winner, pick that one. Otherwise, choose either.
In general, I'd indeed assume that for somewhat filled fifo-queues the append-approach is faster. But I'd be willing to be surprised.