I found that applying amortized cost analysis to locking to be a very elegant approach towards making an LRU cache concurrent (see
ConcurrentLinkedHashMap). This allowed avoiding the lock contention of reordering the recency dequeue immediately to instead amortizing the penalty of being the thread chosen to apply the batched work. I'm now exploring further ideas based on that structure, such as sharding into multiple recency queues and recombining them via merge sort back into a strict LRU order. I suspect that the technique could be applied to other scenarios, but I haven't seen any existing literature leveraging it.