I would try a dedicated array of boolean flags, or embed such a flag in each buffer element, instead of separate read / pending-write / actual-write cursors.
Something like:
    struct Item
    {
        bool consumed;
        Data data;
    };
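This consumed flag helps prevent data races on data, provided the flag itself is read and written atomically (a plain bool shared between threads would be a data race of its own).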
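To make the role of the flag concrete, here is a minimal sketch of one slot. It assumes C++11 `std::atomic` and an `int` payload in place of `Data`; the names `Slot`, `publish` and `try_take` are made up for illustration and are not part of the original proposal.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical slot type: same idea as Item, but with an atomic flag.
struct Slot {
    std::atomic<bool> consumed{true};  // true: slot is free to be (over)written
    int data{0};                       // stand-in for Data (assumption)
};

// Writer side: fill the payload first, then publish it with a release store.
inline void publish(Slot& s, int value) {
    assert(s.consumed.load(std::memory_order_relaxed) && "slot must be free");
    s.data = value;
    s.consumed.store(false, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store in publish(),
// so the payload read is ordered after the writer's payload write.
inline bool try_take(Slot& s, int& out) {
    if (s.consumed.load(std::memory_order_acquire)) {
        return false;  // nothing published in this slot
    }
    out = s.data;
    s.consumed.store(true, std::memory_order_release);  // hand the slot back
    return true;
}
```

The release/acquire pairing on the flag is what orders the accesses to `data`; with a non-atomic `bool` the same code would have undefined behavior when used from two threads.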
Producer algorithm:
1. If there is empty space, do a regular insert and set consumed = false.
2. Otherwise (the buffer is full):
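2.1. Remember the current read cursor (curCursor) and atomically advance the reader cursor (readCursor) by the desired distance. Note that atomic_fetch_add itself never fails, so detecting that the consumer moved the cursor in the meantime needs a compare-and-swap from curCursor (e.g. atomic_compare_exchange_strong). If that fails, go to step 1.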
2.2. Set element.consumed = true for each element between curCursor and the new readCursor (both values are cached in step 2.1).
2.3. Loop over each element from the write cursor up to the cached readCursor:
2.3.1. While element.consumed == false, busy-wait.
2.3.2. Write element.data.
2.3.3. Set element.consumed = false.
2.4. Advance the write cursor.
Consumer algorithm:
1. Advance readCursor using atomic_fetch_add.
2. Consume the data and set element.consumed = true.
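The producer and consumer steps above could be sketched roughly as follows. This is only an illustration under several assumptions that the steps do not state: one producer thread and one consumer thread, an `int` payload, an overwrite distance of one element, a compare-and-swap for step 2.1, an explicit emptiness check in the consumer, and the busy wait of step 2.3.1 applied on the regular insert path as well. The class name `OverwriteRing` and its members are hypothetical, and the code has not been stress-tested under real contention.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer / single-consumer ring that overwrites the
// oldest element when full.
template <std::size_t N>
class OverwriteRing {
    static_assert(N >= 2, "sketch assumes at least two slots");

    struct Item {
        std::atomic<bool> consumed{true};  // true: slot may be written
        int data{0};                       // stand-in for Data (assumption)
    };

    Item items_[N];
    std::atomic<std::size_t> readCursor_{0};   // next index to consume
    std::atomic<std::size_t> writeCursor_{0};  // next index to produce

    // Steps 2.3.1-2.3.3: wait until the slot is released, then fill it.
    void fill(std::size_t index, int value) {
        Item& it = items_[index % N];
        while (!it.consumed.load(std::memory_order_acquire)) {
            // busy wait: the consumer claimed this slot but is still reading
        }
        it.data = value;
        it.consumed.store(false, std::memory_order_release);
    }

public:
    // Producer side (call from one thread only).
    void produce(int value) {
        const std::size_t w = writeCursor_.load(std::memory_order_relaxed);
        for (;;) {
            std::size_t cur = readCursor_.load(std::memory_order_acquire);
            if (w - cur < N) break;  // step 1: there is free space
            // Step 2.1: buffer is full, claim the oldest element with a CAS.
            if (readCursor_.compare_exchange_strong(
                    cur, cur + 1, std::memory_order_acq_rel)) {
                // Step 2.2: the claimed element is discarded.
                items_[cur % N].consumed.store(true, std::memory_order_release);
                break;
            }
            // CAS failed: the consumer moved the cursor, retry from step 1.
        }
        fill(w, value);                                        // step 2.3
        writeCursor_.store(w + 1, std::memory_order_release);  // step 2.4
    }

    // Consumer side (call from one thread only).
    // Returns false when nothing is available.
    bool consume(int& out) {
        if (readCursor_.load(std::memory_order_acquire) ==
            writeCursor_.load(std::memory_order_acquire)) {
            return false;  // empty (check added for this sketch)
        }
        // Step 1: claim the next element.
        const std::size_t idx =
            readCursor_.fetch_add(1, std::memory_order_acq_rel);
        assert(idx < writeCursor_.load(std::memory_order_acquire));
        Item& it = items_[idx % N];
        out = it.data;                                       // step 2: consume
        it.consumed.store(true, std::memory_order_release);  // release the slot
        return true;
    }
};
```

For example, with `OverwriteRing<4>`, producing the values 1 through 6 and then draining the ring yields 3, 4, 5, 6: the two oldest values are dropped by the overwrite path. The only place a thread spins is `fill`, which matches the idea that the producer may wait on a slow consumer while the consumer never waits.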
If this algorithm works, the consumer is wait-free but the producer is not: the producer may have to wait for a consumer that has acquired an element but has not yet finished retrieving its data.