RL_VERIFY(n.count_ == limit) failed


Ronald Landheer-Cieslak

Aug 2, 2011, 3:39:08 PM
to rel...@googlegroups.com
Hello,

I haven't found a better place to post this, so if this isn't the right place, please point me where I should go...

I'm using Relacy 2.3 to test one of my algorithms, and I'm running into a problem with the fair_full_search_scheduler_type on one of my tests - the default scheduler does not show the same problem. Inside rand_impl, the assertion RL_VERIFY(n.count_ == limit); fails: apparently, n.count_ is one higher than limit. Going up the stack, it looks like running_threads_count is one below the number of threads in the test: the test should run with 64 threads but only 63 are running. (I get the same thing with 6 threads, by the way.) I'm a bit confused as to how that can happen...

I've posted the stack trace below. As you can see, I'm in a WaitForSingleObject with an infinite time-out (some other thread will eventually wake me up). As far as I can tell, Relacy is trying to decide which thread (or fiber, rather) to run next and attempts to pick one randomly. It picks an entry from stree_, which at this time has size 17662 (stree_depth is 2). The tree is filled with values that look like: { 64, 0, sched_type_sched, 0 }.

The 63 comes from the running_threads_count variable, which is at 63 at the time rand_impl is called. That's because it was decremented in the block_thread function on behalf of the WaitForSingleObject. And that's what I don't understand: why is there an assertion that basically says "all possible threads must be running" when it is obviously possible that not all threads are running? Should that assertion say RL_VERIFY(n.count_ >= limit) (i.e. "no more than all possible threads are running") and, if index_ > limit, just loop so it can try again?

If that looks like the right solution to you, I'll be happy to provide a patch (against the current SVN, if preferred) that does that.

Thanks,

rlc

Stack trace follows:
RelacyTests.exe!rl::tree_search_scheduler<rl::full_search_scheduler<64>,rl::tree_search_scheduler_thread_info<64>,64>::rand_impl(unsigned int limit=63, rl::sched_type t=sched_type_sched)  Line 275 + 0xc bytes C++
  RelacyTests.exe!rl::scheduler<rl::full_search_scheduler<64>,rl::tree_search_scheduler_thread_info<64>,64>::rand(unsigned int limit=63, rl::sched_type t=sched_type_sched)  Line 134 C++
> RelacyTests.exe!rl::tree_search_scheduler<rl::full_search_scheduler<64>,rl::tree_search_scheduler_thread_info<64>,64>::schedule_impl(rl::unpark_reason & reason=unpark_reason_normal, unsigned int yield=1)  Line 219 + 0x10 bytes C++
  RelacyTests.exe!rl::scheduler<rl::full_search_scheduler<64>,rl::tree_search_scheduler_thread_info<64>,64>::schedule(rl::unpark_reason & reason=unpark_reason_normal, unsigned int yield=1)  Line 121 + 0x17 bytes C++
  RelacyTests.exe!rl::context_impl<ClockTest,rl::full_search_scheduler<64> >::schedule(unsigned int yield=1)  Line 545 + 0x16 bytes C++
  RelacyTests.exe!rl::context_impl<ClockTest,rl::full_search_scheduler<64> >::park_current_thread(bool is_timed=false, bool allow_spurious_wakeup=false, const rl::debug_info & info={...})  Line 370 C++
  RelacyTests.exe!rl::waitset<64>::park_current(rl::context & c={...}, bool is_timed=false, bool allow_spurious_wakeup=false, const rl::debug_info & info={...})  Line 41 + 0x1d bytes C++
  RelacyTests.exe!rl::event_data_impl<64>::wait(bool try_wait=false, bool is_timed=false, const rl::debug_info & info={...})  Line 240 + 0x1a bytes C++
  RelacyTests.exe!rl::generic_event::wait(bool try_wait=false, bool is_timed=false, const rl::debug_info & info={...})  Line 349 + 0x25 bytes C++
  RelacyTests.exe!rl::rl_WaitForSingleObjectEx(rl::win_object * obj=0x00b1d868, unsigned long timeout=4294967295, int alertable=0, const rl::debug_info & info={...})  Line 56 + 0x1d bytes C++
  RelacyTests.exe!rl::rl_WaitForSingleObject(rl::win_object * obj=0x00b1d868, unsigned long timeout=4294967295, const rl::debug_info & info={...})  Line 69 + 0x13 bytes C++


Dmitriy V'jukov

Aug 2, 2011, 5:43:26 PM
to rel...@googlegroups.com
Hi Ronald,

This is the right place for it.
The most likely reason is the situation described in the comment above the assert:

            // If you hit assert here, then probably your test is non-deterministic
            // Check whether you are using functions like ::rand()
            // or static variables or values of object addresses (for hashing) in your test
            // Replace ::rand() with rl::rand(), eliminate static variables in the test
Relacy requires all tests to be fully deterministic. That is, if Relacy itself behaves exactly the same way, the test must behave exactly the same way too. This allows Relacy to deterministically replay tests and do systematic state exploration, and it enables some optimizations as well: namely, Relacy does not collect execution history during normal execution; only if a test fails does Relacy replay it and collect the history.

The path taken in rand_impl() suggests that the assert is violated exactly during replay. The assert does not say "all possible threads must be running", but rather "the previous time at this point in the execution 64 threads were running, while this time only 63 are, so I am confused and can't collect the execution history of the failing execution".

Things that cause non-deterministic execution are the use of functions like ::rand(), hashing of pointers (pointer values are not preserved across replays), accumulation of state in global variables across test executions, etc.


Ronald Landheer-Cieslak

Aug 3, 2011, 1:58:15 PM
to rel...@googlegroups.com
Hi Dmitriy,

Thanks for the explanation. I didn't know the test was run twice when an error is detected: while the code being tested doesn't have any static variables or other non-deterministic code, my test suite apparently did.

Thanks,

Ronald