On Thu, May 10, 2018 at 10:49 PM, Доронин Олег <
doro...@yandex.ru> wrote:
> Motivation:
> At the moment TSan is designed in such a way that if there are errors in the multithreaded code, they will be found only if the OS scheduler leads to the event of their occurrence. Such an upproach is quit good for lock-based algorithms because "lock" operations take significant part of the time. Concerning atomic operations and lock-free algorithms, potential contention has lower probability and the OS scheduler may not always reproduce the problems. We propose to implement scheduler in TSan that carry out manual modelling of different executions as it is done in RRD (
https://github.com/dvyukov/relacy), and also simulate the order of variables visibility. Partial discussions can be viewed
https://github.com/dvyukov/relacy/issues/7,
https://github.com/dvyukov/relacy/pull/6,
https://github.com/dorooleg/compiler-rt/issues/1
Hi Олег,
TSan must not lose it's ability to work with arbitrary programs
out-of-the box, it's one its main advantages. Fibers are quite
problematic in this context:
> There are several approaches:
>
> 1. Write a scheduler based on fibers (Linux has makecontext, swapcontext). It is made in RRD.
> Advantages:
> 1. It is easy to implement
This looks questionable to me. As compared to threads, fibers will
require more work to do manual scheduling, context switches, turning
threads into fibers, etc. So this looks like strictly more work as
compared to threads.
> 2. An ability to make focus on specified parts of source code (If we have a server application, it is likely to have a "while (true)", hence, it is difficult to simulate different executions. Also, if we are interested in a certain area in the code, we may highlight it using annotations, and simulate different executions only between these annotations.
This part is the same for threads, right? We can use annotations for
threads too.
> Moreover, we can use fork mechanism to highlight code fragments because of existing only one real thread in the application, as a rule, before such fragments, and fork will create a copy of the entire application)
This is different.
> 3. Easy application debug (Since the code is actually linear)
> Disadvantages:
> 1. TLS (thread local storage) problem. (Because "swapcontext" does not consider TLS. One way to solve this problem is to hold manual TLS for each thread and copy it. However, it influences performance strongly. The other way is to substitute the TLS address for each thread, for example, on x86_64 via FS:0. By the way, it is not clear for me why the calculation of TLS in TSan goes through asm("mov %%fs:%c1,%0" : "=r"(descr_addr) : "i"(kThreadSelfOffset));
> // Perhaps, is it right to calculate as described here
https://www.uclibc.org/docs/tls.pdf ? Could anyone explain?
Note that besides TLS there are also signal masks, gettid and most
likely a very long tail of ways to observe difference between fibers
and threads.
> 2. Low performance (This approach can work slowly. For example, if we have a 20-core processor and 20 threads are running in the application, using fiber simulteniously, the application will run 20 times slower. On the other hand, we can run 20 different applications or write a planning algorithm that will be able to simulate several paths in parallel)
> 3. Looping (Fiber can go into an endless loop, for instance, on spinlock or other reasons. To fix the hang problem, we can use a timer after which the fiber will be removed from the execution forcefully).
This is a problem. There are sure waits that we will not intercept and
tsan must not hang in such cases. ote that asynchronous switch is very
problematic, I am not even sure how to do it. Relacy is all
synchronous, one the thread itself can decide to switch.
> 4. Partly natural thread environment (Unfortunately, fiber is not a natural environment for threads that can strongly affect the operation of the program).
>
> 2. We can write a scheduler based on POSIX threads. In order to manage threads, we will use the wait-notify mechanism based on spinlock and create dedicated thread that will manage others. Only one thread will work at a time. Probably, it will be desirable to run several threads simultaneously, but it is not clear yet how to do it with benefit.
> Advantages:
> 1. This approach can work faster than with fiber since we have "hot threads" waiting on spinlock.
> 2. No need to monitor TLS (this is already done).
> 3. More natural environment in terms of OS multithreading
> Disadvantages:
> 1. We need to create an additional thread to control the rest which affects performance.
Not necessary. I would even say adding an additional thread would be
wrong. pthread_mutex_unlock interceptor simply decided if it wants to
continue running the current thread, or switch to the newly unblocked
thread.
> 2. It will not be possible to make fork as it is proposed with fiber because after the fork we will have only one thread
True. But note that fork is problematic for multiple other reasons,
for example, if a program has created a socket, after a fork, both
processes will refer to the same socket and chaos will ensue.
Resources can't generally be forked.
> 3. No understanding how to run multiple threads at the same time.
There is some research on this:
https://dl.acm.org/citation.cfm?id=2254128
https://www.cs.rutgers.edu/~santosh.nagarakatte/papers/pldi12_needlepoint.pdf
and probably more.
> 4. Need to monitor thread hangs.
The same for fibers. But for threads it's much simpler. If we suspect
hangs, we can unblock other threads and let more threads run in
parallel.
> The problem of global memory should be taken into account in both approaches: after the first iteration of the application global memory will be changed. In order to keep the initial state of the program we will use the fork mechanism.
>
> Now, there are several questions to community:
> 1. Does it make sense to add to TSan? Can it be useful?
> 2. Which approach of planning is better to choose (or implement both of them)?
> 3. What pitfalls are not taken into account?
> 4. Are there any other options how to do it?
> 5. What can be done with TLS for fiber?
> 6. How to activate planning mechanism in TSan in a better way ("#include" with additional "DEFINE", immediately embed by default, planning types include through TSAN_OPTIONS)?
It would definitely be useful.
Since this is a large task, we need to figure out what would be an MVP
(minimal viable product). I think we need to start with adding
required annotations to synchronization primitives and thread control
functions (a-la what was done for TSan v1:
https://github.com/dvyukov/data-race-test-1/blob/master/earthquake/earthquake_wrap.h).
Then this already can be turned into a useful thing by just adding
random sleeps in the annotations (a-la
https://github.com/dvyukov/data-race-test-1/blob/master/earthquake/earthquake_core.c).
And this is commit-able by itself, so this is a good starting point.
After that I would add a systematic scheduling with threads as it's
simpler and will work with any programs. But note that some programs
will always use only the random mode, because they are too huge for
any systematic analysis.
After that I would look if we still want fibers for some cases or not.
Yes, the runtime behavior should be controlled with flags passed in
TSAN_OPTIONS.
FTR, some research in this area:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2007-149.pdf
https://www.semanticscholar.org/paper/Delay-bounded-scheduling-Emmi-Qadeer/6e69ad3daf1d4dfe98b83a4ed448cfa0ff016102
https://research.cs.wisc.edu/wpis/papers/cav08.pdf
https://dl.acm.org/citation.cfm?id=1375625
https://link.springer.com/chapter/10.1007/978-3-642-00768-2_11
https://dl.acm.org/citation.cfm?id=1736040
https://dl.acm.org/citation.cfm?id=1453121
https://dl.acm.org/citation.cfm?id=1321679