Assembled friends, enemies, indifferent associates:

Jepsen 0.2.7 is now available on Clojars and GitHub. My sincere thanks
to everyone who contributed!

https://github.com/jepsen-io/jepsen/releases/tag/v0.2.7

This release introduces improved performance for `control/exec` by
default. It adds new features for testing filesystem failures:
`nemesis/bitflip`, which flips random bits in files, and `lazyfs`
(experimental, known bugs) which loses un-fsynced writes to files. It
fixes several minor bugs--for instance, failing to thread state
correctly through the nemesis setup lifecycle--and catches up to new
APIs and file locations in recent versions of Debian.

As an aside: getting Jepsen to run in Docker has been an ongoing
tirefire for years, and the docs now recommend using plain old LXC or
AWS instead.

## API Changes
- The default remote for SSH is now `control.sshj`, not
`control.clj-ssh`. This has been available for a few releases now, and is
significantly faster than clj-ssh. This should be a basically
transparent change, but some error messages thrown during e.g. unstable
connections might change, and you might encounter different behavior
around how it handles host and identity keys, agents, etc.
- The `db/Process` protocol is now called `db/Kill`; the
metaprogramming hacks we had to do to call it `Process` were fragile
under some AOT scenarios. `Process` remains as an alias.
- `util/await-fn` now catches all `Exception`s, rather than just
`RuntimeException`s. It turns out some things you'd really like to
retry, like SQL connection exceptions from JDBC, aren't
`RuntimeException`s.
- `control.util/grepkill` now uses `pgrep` to kill processes, rather
than `grep`. With `grep`, some tests were killing unexpected processes.
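
To make the `util/await-fn` change concrete, here is a toy retry loop. It is a Python analogue, not Jepsen's code: `await_fn`, `retry_on`, and `flaky_connect` are illustrative names, and Python's `RuntimeError`/`Exception` split only loosely stands in for Java's `RuntimeException`/`Exception`. The point it sketches is that a retry loop catching only the narrower class lets connection-style errors escape on the first attempt.

```python
import time


def await_fn(f, retry_on=(Exception,), retry_interval=0.01, timeout=1.0):
    """Call f until it returns without raising one of retry_on, or time out.

    Hypothetical helper for illustration; not Jepsen's signature.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return f()
        except retry_on:
            # Errors outside retry_on are not caught here and propagate at once.
            if time.monotonic() >= deadline:
                raise TimeoutError("gave up waiting for f to succeed")
            time.sleep(retry_interval)


def flaky_connect(failures):
    """Build a function that raises ConnectionError `failures` times, then succeeds."""
    remaining = [failures]

    def connect():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionError("database not ready yet")
        return "connected"

    return connect
```

With the broad default, `await_fn(flaky_connect(3))` keeps retrying until the connection succeeds. Passing `retry_on=(RuntimeError,)` mimics the old, narrower behavior: the first `ConnectionError` is not a `RuntimeError`, so it is thrown straight to the caller.
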
## Bugfixes
- `db/Process` could, under some AOT scenarios, get compiled multiple
times and fail to register as the same protocol. This meant that tests
could quietly fail to actually kill a process because they thought the
DB didn't support the `Process` protocol. This should hopefully be
fixed now by Even More Metaprogramming Hacks, but we recommend moving
to `db/Kill` just in case there are more bugs along these lines.
- `lein run serve` no longer trusts the local clock when listing local
tests. This should fix issues with copying files from a machine in the
future to one in the past, and those tests not showing up until the
second node's clock catches up.
- The `control.sshj` remote now respects `{:dummy? true}`.
- `nemesis.time`'s programs for bumping and strobing the clock had
stopped running properly on newer platforms, thanks to a change which made
it illegal to pass a time *and* a timezone to `settimeofday`. We didn't
change the timezone, but the call still failed. This is now fixed.
- `core/run!` discarded the return value of `nemesis/setup!` and used
the original nemesis throughout the test. Now it correctly uses the
returned nemesis.
- `nemesis/Validate` returned invocation, not completion ops, and also
did nothing after the initial `setup!` call. Both of these bugs are now
fixed, which should provide better error guidance to users who make
mistakes writing nemeses.
- `tcpdump` is located in `/usr/bin` on more recent versions of Debian;
we now use the newer path.
- `control/exec` no longer incorrectly reports a command's STDIN as
`nil` when throwing exceptions.
- `docker/bin/up` works on OS X again.
## New Features
- `jepsen.lazyfs`, an experimental project for simulating the loss of
un-fsynced writes, is now available. It does not work correctly--lazyfs
has both crash and safety bugs in this version--but it still might help
you find bugs.
- `nemesis/bitflip` is a new nemesis which can flip a random fraction
of bits in a file. Helpful for fuzzing DBs' ability to handle
filesystem corruption.
- `store/fressian` now serializes exceptions as data. A recurring
problem in Jepsen tests is having a `Throwable` get into the history
somewhere, and then exploding the serializer when it comes time to
write the test. This is especially frustrating when *nothing* in the
test itself logs that exception--you have no idea where it's coming
from. Jepsen now serializes exceptions to data; this will not
round-trip properly, but it *does* help you figure out the exception and
operation that went wrong. These exceptions are also logged at level
WARN during serialization. At the REPL, you can load the test and use a
new utility function, `jepsen.util/deepfind`, to find the offending
object.
- `util/rand-exp` generates random, exponentially-distributed values
around a given mean.
- `tests.cycle.wr` now has a test constructor and docstring aligned
with `tests.cycle.list-append`, as well as updated docs.
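
Two of the features above, bit flipping and exponential sampling, are simple enough to sketch in a few lines. This is a minimal Python illustration under stated assumptions, not Jepsen's implementation: `flip_bits` and `rand_exp` are hypothetical names, `flip_bits` works on an in-memory byte string rather than a file on a DB node, and it reads "a random fraction of bits" as "each bit flips independently with the given probability". `rand_exp` assumes the parameter is the mean, as described above, and uses inverse transform sampling.

```python
import math
import random


def flip_bits(data: bytes, fraction: float, rng: random.Random) -> bytes:
    """Return a copy of data with each bit flipped independently
    with probability `fraction` (an assumed reading of the feature)."""
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < fraction:
                out[i] ^= 1 << bit  # toggle this bit
    return bytes(out)


def rand_exp(mean: float, rng: random.Random) -> float:
    """Draw an exponentially distributed value with the given mean."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
    return -mean * math.log(1.0 - rng.random())
```

For example, `flip_bits(open(path, "rb").read(), 0.001, random.Random())` would corrupt roughly one bit in a thousand of a file's contents, and `rand_exp(5.0, random.Random())` yields non-negative values that are mostly small, occasionally much larger, and average out to about 5.
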
## Small Changes
- We used to round off milliseconds in tests' `:start-time`, but this
causes collisions when you run multiple tests in the same second. We
now use millisecond resolution again.
- `reconnect` now passes through `InterruptedIOException` in the same
way as `InterruptedException`, which should speed up/clarify the abort
procedure when something goes wrong in e.g. DB setup using the
`control.sshj` remote.
- `util/stop-daemon!` now throws a timeout when the `kill` operation
hangs.
- `nemesis.time` now throws more informative errors when compilation
fails.
- New tests for nemeses
- Tests are a little quieter about logging now
- Clojure 1.11.1
- Unilog 0.7.30
- SSHJ 0.33.0
- Fipp 0.6.26
- Elle 0.1.5
- HTTP-kit 2.6.0
--Kyle