Re: [jepsen-talk] Error in Jepsen when running on EC2 with a large number of clients


Kyle Kingsbury

Jan 6, 2019, 7:28:47 PM
to ta...@jepsen.io
This is fixed in the latest version of Jepsen. :)

On Sun, Jan 6, 2019, 20:10 Kiarash Rahmani <kiara...@gmail.com> wrote:
Hello everyone,

I am using Jepsen to test certain properties of Java applications on Cassandra. I was able to set up a local cluster of Cassandra nodes, each inside a Docker container (using Jepsen's Docker Compose feature).
Then I wanted to move away from the Docker setup and scale my tests to Cassandra nodes running on EC2 (possibly tens of nodes) and thousands of clients.

I was able to do so, and the tests seem to run smoothly (tested with up to 4 Cassandra nodes in the same availability zone and 3000 clients) until the very end. Right before the tests finish, Jepsen breaks, throwing:

ERROR [2019-01-06 18:54:00,263] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302) ~[na:1.8.0_191]
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) ~[na:1.8.0_191]
    at jepsen.core$do_worker_BANG_.invokeStatic(core.clj:194) ~[na:na]
    at jepsen.core$do_worker_BANG_.invoke(core.clj:171) ~[na:na]
    at jepsen.core$run_workers_BANG_$fn__2655$fn__2656.invoke(core.clj:256) ~[na:na]
    at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.9.0.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
    at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
    at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.9.0.jar:na]
    at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.9.0.jar:na]
    at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.9.0.jar:na]
    at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
    at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
    at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.9.0.jar:na]
    at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.9.0.jar:na]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]


I suspect that some of my Java clients (which Jepsen calls through Clojure when invoking operations) crash while waiting on the teardown-latch (@core.clj:L210), which throws the InterruptedException. However, I am not sure why Jepsen itself also fails at the end (it should tolerate client failures, right?). Can someone please explain how to fix this problem, or at least where to look for its root cause?
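A minimal, self-contained Java sketch (not Jepsen's code) of how the top two frames of the trace arise: a thread blocked in `CountDownLatch.await()` on a latch that is never counted down throws `InterruptedException` as soon as another thread interrupts it. The class name `LatchInterruptDemo` and the `teardownLatch` variable are hypothetical, and whether Jepsen's workers are actually interrupted this way during teardown is an assumption the trace alone does not confirm.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class LatchInterruptDemo {
    // Hypothetical stand-in for a worker waiting on a teardown latch.
    public static String run() {
        // Count of 1 that nobody ever counts down, so await() can only end via interrupt.
        CountDownLatch teardownLatch = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("not finished");

        Thread worker = new Thread(() -> {
            try {
                // Blocks until the count reaches zero or this thread is interrupted.
                teardownLatch.await();
                outcome.set("released");
            } catch (InterruptedException e) {
                // Same exception type as the top of the stack trace above.
                outcome.set("interrupted");
            }
        });

        worker.start();
        // Simulates some coordinating thread aborting the worker before the latch opens.
        // If the worker has not reached await() yet, the interrupt flag is still set,
        // and await() checks it first and throws immediately.
        worker.interrupt();

        try {
            worker.join();
        } catch (InterruptedException e) {
            // Only reached if the calling thread itself is interrupted while joining.
            Thread.currentThread().interrupt();
            return "caller interrupted";
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "interrupted"
    }
}
```

Because `await()` is declared to throw `InterruptedException`, an interrupted waiter surfaces the exception instead of returning normally; the timed `await(long, TimeUnit)` overload behaves the same way when interrupted.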

Thanks!


--
You received this message because you are subscribed to the Google Groups "Jepsen Talk" group.
To view this discussion on the web visit https://groups.google.com/a/jepsen.io/d/msgid/talk/0e7fa0c6-d44a-4807-ac94-191bfc9bff2f%40jepsen.io.

Kiarash Rahmani

Jan 7, 2019, 9:49:03 AM
to Jepsen Talk
Thanks, Kyle!

Yes! I noticed that after I posted the question, which is why I deleted it. Sorry for the inconvenience, and thanks for the prompt response.