Hello everyone,

I am using Jepsen to test certain properties of Java applications on Cassandra. I was able to set up a local cluster of Cassandra nodes, each inside a Docker container (using Jepsen's Docker Compose setup). I then wanted to move away from Docker and scale my test setup to Cassandra nodes running on EC2 (possibly tens of nodes) and thousands of clients.

I was able to do so, and the tests seem to run smoothly (tested with up to 4 Cassandra nodes in the same availability zone and 3000 clients) until the very end. Right before the tests finish, Jepsen breaks and throws:

ERROR [2019-01-06 18:54:00,263] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302) ~[na:1.8.0_191]
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) ~[na:1.8.0_191]
at jepsen.core$do_worker_BANG_.invokeStatic(core.clj:194) ~[na:na]
at jepsen.core$do_worker_BANG_.invoke(core.clj:171) ~[na:na]
at jepsen.core$run_workers_BANG_$fn__2655$fn__2656.invoke(core.clj:256) ~[na:na]
at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.9.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.9.0.jar:na]
at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.9.0.jar:na]
at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.9.0.jar:na]
at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.9.0.jar:na]
at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.9.0.jar:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
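For context on the top frames of this trace: `CountDownLatch.await()` throws `InterruptedException` when the thread blocked in it is interrupted by another thread. The minimal standalone sketch below reproduces only that mechanism. The two-party latch and the interrupting thread are hypothetical stand-ins, not Jepsen's actual worker or teardown code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Standalone illustration, NOT Jepsen code: a worker blocks in CountDownLatch.await()
// on a latch that still expects another party, and a second thread interrupts it.
// The await() call then throws InterruptedException, as in the top frames of the trace.
class LatchInterruptDemo {

    /** Returns true if the blocked worker observed an InterruptedException. */
    static boolean workerSeesInterrupt() {
        // Hypothetical stand-in for a teardown latch shared by two workers.
        CountDownLatch teardown = new CountDownLatch(2);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            teardown.countDown(); // this worker is done; the other party never arrives
            try {
                teardown.await(); // blocks because the count is still 1
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
            }
        });

        try {
            worker.start();
            // Wait until the worker is parked inside await() before interrupting it.
            while (worker.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            worker.interrupt(); // e.g. a coordinator aborting the run
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve our own interrupt status
            return false;
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        // Prints "worker saw InterruptedException: true"
        System.out.println("worker saw InterruptedException: " + workerSeesInterrupt());
    }
}
```

In other words, the exception in the trace is raised in the thread that was waiting, so the interesting question is usually which other thread (or failure) caused the interrupt.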
I suspect that some of my Java clients (which Jepsen calls through Clojure when invoking its operations) crash while awaiting the teardown latch (core.clj, line 210), which throws the InterruptedException. However, I am not sure why Jepsen itself also fails at the end; shouldn't it tolerate client failures? Can someone explain how to fix this problem, or at least where to look for its root cause?

Thanks!
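One way to check the suspicion that a Java client is crashing is to wrap each client call so that exceptions are caught and recorded instead of escaping the calling thread. The sketch below assumes each client operation can be expressed as a `java.util.concurrent.Callable`; `GuardedCall`, `Status` and `Outcome` are hypothetical names, not part of Jepsen or any Cassandra driver API.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, NOT part of Jepsen or the Cassandra driver: runs one client
// call and turns any exception into a recorded outcome, so a failing call is
// reported back to the caller instead of escaping and killing the worker thread.
final class GuardedCall {

    enum Status { OK, ERROR }

    static final class Outcome {
        final Status status;
        final Object value;    // the call's result when status == OK
        final Exception error; // the cause when status == ERROR; the operation's
                               // effect on the database may be unknown (e.g. a timeout)

        Outcome(Status status, Object value, Exception error) {
            this.status = status;
            this.value = value;
            this.error = error;
        }
    }

    static Outcome run(Callable<?> call) {
        try {
            return new Outcome(Status.OK, call.call(), null);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the caller can still see the interruption.
            Thread.currentThread().interrupt();
            return new Outcome(Status.ERROR, null, e);
        } catch (Exception e) {
            return new Outcome(Status.ERROR, null, e);
        }
    }

    public static void main(String[] args) {
        Outcome ok = run(() -> 42);
        Outcome failed = run(() -> { throw new IllegalStateException("node unreachable"); });
        System.out.println(ok.status + " " + ok.value);                      // OK 42
        System.out.println(failed.status + " " + failed.error.getMessage()); // ERROR node unreachable
    }
}
```

Logging `Outcome.error` for every `ERROR` result would show which client calls fail, and with which exception, before the run reaches teardown. Catching `Exception` rather than `Throwable` is deliberate, so that JVM errors such as `OutOfMemoryError` still surface.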
You received this message because you are subscribed to the Google Groups "Jepsen Talk" group.
To view this discussion on the web visit https://groups.google.com/a/jepsen.io/d/msgid/talk/0e7fa0c6-d44a-4807-ac94-191bfc9bff2f%40jepsen.io.