Jepsen 0.2.4

Kyle Kingsbury

Jun 3, 2021, 5:07:06 PM
to anno...@jepsen.io
Ladies & Gentlethems,

I am pleased to announce Jepsen 0.2.4 is now available!

This release is all about automation. It introduces a new SSH backend based on
SSHJ which is significantly faster than the current clj-ssh. This release also
shells out to `scp` for uploads and downloads, which is much, *much* faster than
using clj-ssh or SSHJ. SSH errors are less frequent, and don't clog the logs
with stacktraces.

For databases with expensive setup processes (especially those which need to be
compiled from source), this release introduces `jepsen.fs-cache`: a lightweight,
concurrency-controlled, filesystem-backed cache for strings, Clojure data, and
entire files. This cache is persistent across Jepsen invocations, so you can
build a binary or perform initial datafile allocation once, cache it, and skip
that process on subsequent test runs.
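
To make the build-once, reuse-later idea concrete, here is a minimal sketch in plain Clojure. It is not the `jepsen.fs-cache` API: `cached-edn!` is a hypothetical helper, and it leaves out the locking that the real namespace provides. It only shows the pattern of checking the disk, running the expensive step if nothing is there, and saving the result for the next run.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn cached-edn!
  "Returns the EDN value stored in cache-file if that file exists. Otherwise
  calls (build), writes the result to cache-file, and returns it.
  Hypothetical helper for illustration; not part of jepsen.fs-cache."
  [cache-file build]
  (let [f (io/file cache-file)]
    (if (.exists f)
      ;; Cache hit: skip the expensive step entirely.
      (edn/read-string (slurp f))
      ;; Cache miss: do the work once and persist the result.
      (let [value (build)
            tmp   (io/file (str cache-file ".tmp"))]
        (io/make-parents f)
        ;; Write to a temp file first, then rename it into place, so a
        ;; reader is unlikely to see a half-written entry.
        (spit tmp (pr-str value))
        (.renameTo tmp f)
        value))))
```

Calling `(cached-edn! "cache/foodb/build.edn" build-foodb!)` twice would run the hypothetical `build-foodb!` only on the first call, provided the cache file survives between runs.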

There's also a new checker which looks for patterns in downloaded log files.
This is particularly helpful for catching stacktraces, panics, segfaults, etc.
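
As a rough illustration, the checker can be composed with others in a test map. This fragment assumes the checker takes a pattern followed by a file name; the file name `db.log` and the pattern itself are placeholders, and the file has to be one that your DB actually downloads from each node.

```clojure
(require '[jepsen.checker :as checker])

;; Fragment of a test map. "db.log" and the regex are placeholders; use the
;; log file name and crash signatures that apply to your database.
{:checker (checker/compose
            {:stats (checker/stats)
             :crash (checker/log-file-pattern #"panic|Segmentation fault"
                                              "db.log")})}
```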

# API Changes

- In test SSH options, `:password` is no longer used for sudo by default. To
set a sudo password, set `:sudo-password`. This fixes a (likely rare) issue
where `sudo` would skip a password prompt, sending that password to the stdin of
whatever command was being invoked instead.
- `control/upload` and `download` no longer take rest args, which used to be
passed directly to `clj-ssh`. These were unused in Jepsen itself, but you may
have relied on this behavior. If so, you should call into `clj-ssh` directly.
- `control.remote` has been moved to `control.core`, and has been restructured
to take option maps instead of relying on dynamically bound variables. This
should only affect you if you wrote a custom Remote implementation.
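
For the sudo password change, a test's SSH options might look like the fragment below. The username and both passwords are placeholders, and the fragment assumes the options live under the test map's `:ssh` key.

```clojure
;; Fragment of a test map's SSH options. All values are placeholders.
{:ssh {:username      "admin"
       :password      "login-secret"    ; authenticates the SSH session
       :sudo-password "sudo-secret"}}   ; supplied to sudo; no longer taken from :password
```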

# New Features

- `control.sshj`: a new `Remote` backend for the control system. This is orders
of magnitude faster than `clj-ssh`. Unfortunately, like `clj-ssh`, it *also*
exhibits weird race conditions.
- `control.scp` allows Jepsen to upload and download files by shelling out to
SCP, which is dramatically faster for large files. This is the default for both
`sshj` and `clj-ssh` remotes.
- `fs-cache`: a lightweight, local-filesystem-backed cache for Jepsen's control
node. Well-suited for DBs that require an expensive build or setup process. Can
cache strings, EDN structures, and remote files alike, and includes a basic
locking mechanism.
- A new checker, `log-file-pattern`, scans downloaded log files for given
regular expressions. Handy for finding server crashes!
- `cli/test-all-cmd` now merges opt specs like `test-cmd` does, allowing you to
override default options.
- `util/sh`: a wrapper for invoking local shell commands on the control node.

# Bugfixes

- `control.util/tmp-file!` now creates `/tmp/jepsen` if it doesn't already exist.
- `control.clj-ssh` (and the new `sshj` backend) now include a
concurrency-limiting semaphore, which prevents at least some (but not all) of
the weird, nondeterministic bugs we've seen with session initiation.
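
The general technique behind a concurrency-limiting semaphore can be sketched in a few lines. This is not Jepsen's implementation: `limit-concurrency` is a hypothetical helper, and the limit of 2 in the usage note is arbitrary. It only shows how a semaphore caps the number of callers inside a section at once.

```clojure
(import '(java.util.concurrent Semaphore))

(defn limit-concurrency
  "Wraps f so that at most n calls run at the same time. Extra callers block
  until a permit is released. Illustrative helper, not Jepsen's code."
  [n f]
  (let [sem (Semaphore. n true)] ; true = fair, so waiters proceed in FIFO order
    (fn [& args]
      (.acquire sem)
      (try
        (apply f args)
        (finally
          ;; Always give the permit back, even if f throws.
          (.release sem))))))
```

Wrapping a hypothetical `open-session` function as `(limit-concurrency 2 open-session)` would let any number of threads call it while only two session setups are in flight at a time.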

# Minor Changes

- `checker.timeline` is dramatically faster now: it uses a custom pretty-printer
for events.
- Large parts of `control` have been refactored into `control.core`,
`control.retry`, etc. to improve readability and composability.
- Docker and AWS environments now also set up ed25519 keys by default.
- Lots of new tests for `jepsen.control`.
- When `test-all` tests crash, we now display their full paths, not just test names.
- Removed tea-time, a now-unused dependency.
- Removed `:active-histories`, a now-unused part of test maps.
- `j.u.c.TimeoutException` is now considered an "uninteresting" exception when
choosing which exception to throw from a concurrent failure; this should result
in more helpful stacktraces.
- Control no longer logs a full stacktrace when it encounters a recoverable
exception. Users consistently complained about these kinds of errors: they
happen constantly but unpredictably, I can't eliminate them, and they don't
really require user action. We log a one-line message instead.
- `os/debian` no longer tries to install the old `libzip2` package for Debian Jessie.
- `nemesis.time` uses fewer samples for ntpdate and is generally faster to set up.
- `control.util/await-tcp-port` can now take separate intervals for retry and
logging: shorter latency, less log spam!

Happy testing!


--Kyle
