And writing such tests is often harder than writing the code being tested in
the first place, since the tests are concurrent programs too. For example, if
the journaling you describe entails synchronization, it introduces timing
biases that could well hide the very bugs you are looking for. Journaling
without synchronization (such as by keeping thread-local logs and merging them
at certain intervals) is tricky too.
Further, testing concurrent code properly means running the tests on a variety
of architectures: some CPU architectures (e.g., IA32) have stronger memory
models than others, so certain reorderings will not occur (at least not due to
the hardware) on such systems, and a bug that depends on one of those
reorderings can stay hidden there.
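
To make the thread-local journaling idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ThreadLocalJournal` class, its `record` and `merged` methods, and the worker setup are hypothetical names, not anything from your code. Each thread appends to its own list, so the hot path takes no lock, and the logs are merged by timestamp only after the threads have been joined (a simplification of merging "at certain intervals").

```python
import heapq
import threading
import time


class ThreadLocalJournal:
    """Hypothetical journal: one private buffer per thread, merged later."""

    def __init__(self):
        self._local = threading.local()
        self._buffers = []                       # all per-thread buffers
        self._register_lock = threading.Lock()   # taken once per thread only

    def record(self, event):
        buf = getattr(self._local, "buf", None)
        if buf is None:
            # First record from this thread: create and register its buffer.
            buf = self._local.buf = []
            with self._register_lock:
                self._buffers.append(buf)
        # No lock here; only the owning thread ever appends to buf.
        name = threading.current_thread().name
        buf.append((time.monotonic_ns(), name, event))

    def merged(self):
        # Assumes all writer threads have finished. Each buffer is already
        # ordered by timestamp, so a k-way merge on the timestamp suffices.
        return list(heapq.merge(*self._buffers, key=lambda rec: rec[0]))


def worker(journal, count):
    for i in range(count):
        journal.record(i)


journal = ThreadLocalJournal()
threads = [
    threading.Thread(target=worker, args=(journal, 1000), name=f"w{k}")
    for k in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

log = journal.merged()
print(len(log), "records from", len(threads), "threads")
```

Even this is not free of the problem: reading the clock perturbs timing a little, and the merged order of two records from different threads with nearly equal timestamps says nothing reliable about which one really happened first. Only the order within a single thread is trustworthy.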