Showcase: Work-stealing deque in C#

76 views

Skip to first unread message

Dmitriy V'jukov

unread,

Aug 23, 2008, 2:48:58 PM8/23/08

to Relacy Race Detector

As proof-of-concept of C#/CLI support, I apply Relacy Race Detector to
work-stealing deque algorithm implemented in C#. You can see original
implementation in Joe Duffy's weblog:
http://www.bluebytesoftware.com/blog/CommentView,guid,1665653b-b5f3-49b4-8144-cfbc5e8c632b.aspx
(there is complete source in the end of the post)

Here is algorithm translation for Relacy.
Note that I use latest version (1.1.1) of Relacy. You can download it
here:
http://groups.google.com/group/relacy/files
To enable C#/CLI mode you must define RL_CLI_MODE. volatile variables
you must replace with rl::nvolatile<>, all other plain variables you
must replace with rl::nvar<>. Atomic operations available in
rl::Interlocked namespace, for example rl::Interlocked::Exchange().

#define RL_CLI_MODE
#include "../relacy/relacy.hpp"

using rl::nvar;
using rl::nvolatile;
using rl::mutex;

template<typename T>
class ws_deque
{
public:
ws_deque()
{
m_mask($) = initial_size - 1;
m_headIndex($) = 0;
m_tailIndex($) = 0;
m_array($) = RL_NEW_ARR(nvar<T>, initial_size);
m_arraySize($) = initial_size;
}

bool IsEmpty()
{
return m_headIndex($) >= m_tailIndex($);
}

size_t Count()
{
return m_tailIndex($) - m_headIndex($);
}

void push(T item)
{
size_t tail = m_tailIndex($);
// original version:
//if (tail < m_headIndex($) + m_mask($))
// corrected version:
if (tail <= m_headIndex($) + m_mask($))
{
m_array($)[tail & m_mask($)]($) = item;
m_tailIndex($) = tail + 1;
}
else
{
m_foreignLock.lock($);
size_t head = m_headIndex($);
size_t count = Count();
if (count >= m_mask($))
{
size_t arraySize = m_arraySize($);
size_t mask = m_mask($);
nvar<T>* newArray = RL_NEW_ARR(nvar<T>, arraySize *
2);
nvar<T>* arr = m_array($);
// original version:
//for (size_t i = 0; i != arraySize; ++i)
// corrected version:
for (size_t i = 0; i != count; ++i)
newArray[i]($) = arr[(i + head) & mask]($);
m_array($) = newArray;
m_arraySize($) = arraySize * 2;
m_headIndex($) = 0;
m_tailIndex($) = count;
tail = count;
m_mask($) = (mask * 2) | 1;
}
m_array($)[tail & m_mask($)]($) = item;
m_tailIndex($) = tail + 1;
m_foreignLock.unlock($);
}
}

bool pop(T& item)
{
size_t tail = m_tailIndex($);
// original version:
//if (m_headIndex($) >= tail)
// corrected version:
if (tail == 0)
return false;
tail -= 1;
rl::Interlocked::Exchange(m_tailIndex, tail, $);
if (m_headIndex($) <= tail)
{
item = m_array($)[tail & m_mask($)]($);
return true;
}
else
{
m_foreignLock.lock($);
if (m_headIndex($) <= tail)
{
item = m_array($)[tail & m_mask($)]($);
m_foreignLock.unlock($);
return true;
}
else
{
m_tailIndex($) = tail + 1;
m_foreignLock.unlock($);
return false;
}
}
}

bool steal(T& item)
{
if (false == m_foreignLock.try_lock($))
return false;
size_t head = m_headIndex($);
rl::Interlocked::Exchange(m_headIndex, head + 1, $);
if (head < m_tailIndex($))
{
item = m_array($)[head & m_mask($)]($);
m_foreignLock.unlock($);
return true;
}
else
{
m_headIndex($) = head;
m_foreignLock.unlock($);
return false;
}
}

private:
static size_t const initial_size = 2;
nvar<nvar<T>*> m_array;
nvar<size_t> m_mask;
nvar<size_t> m_arraySize;
nvolatile<size_t> m_headIndex;
nvolatile<size_t> m_tailIndex;
mutex m_foreignLock;
};

Note that implementation is really very close to original. The
translation is mostly 'mechanical'.
Here is simple test suite that I apply to implementation:

struct ws_deque_test : rl::test_suite<ws_deque_test, 2>
{
ws_deque<int> q;
bool state [2];

void before()
{
state[0] = true;
state[1] = true;
}

void after()
{
RL_ASSERT(state[0] == false);
RL_ASSERT(state[1] == false);
}

void thread(unsigned index)
{
if (0 == index)
{
q.push(1);
q.push(2);

int item = 0;
bool res = q.pop(item);
RL_ASSERT(res && item == 2);
RL_ASSERT(state[1]);
state[1] = false;

item = 0;
res = q.pop(item);
if (res)
{
RL_ASSERT(state[0]);
state[0] = false;
}

item = 0;
res = q.pop(item);
RL_ASSERT(res == false);
}
else
{
int item = 0;
bool res = q.steal(item);
if (res)
{
RL_ASSERT(item == 1);
RL_ASSERT(state[0]);
state[0] = false;
}
}
}
};

Test have revealed 2 minor moments.
First. In push() function, when array is copied to new, incorrect
range is used:
// original version:
//for (size_t i = 0; i != arraySize; ++i)
// corrected version:
for (size_t i = 0; i != count; ++i)
newArray[i]($) = arr[(i + head) & mask]($);
It's detected by Relacy as access to uninitialized variable.

Second.In push() function incorrect condition is used to determine
when array is full:
// original version:
//if (tail < m_headIndex($) + m_mask($))
// corrected version:
if (tail <= m_headIndex($) + m_mask($))

Also Relacy have revealed more subtle moment. When there is only one
element in the queue, and owner and foreign thread both can decide
that deque is empty. I think that with some particular thread blocking/
signaling logic this can lead basically to deadlock.
To fix this I change pop() function as follows:
// original version:
//if (m_headIndex($) >= tail)
// corrected version:
if (tail == 0)
return false;
Basically this means that in order to correctly determine that deque
is empty owner thread must acquire m_foreignLock mutex, otherwise
owner thread can falsely decide that deque is empty.

Dmitriy V'jukov

Reply all

Reply to author

Forward

0 new messages