Performance issues


Robbie Strickland

Nov 6, 2012, 10:49:40 AM
to twitter...@googlegroups.com
Hello all,

We'd like to use Cassie since the API is much nicer than Hector, but we've run into issues where the write performance for similar queries is significantly worse with Cassie (latencies are about double).  Has anyone else run into this?  As a point of reference, I am not a Cassandra newbie--been using it in production since 0.5 with a variety of clients, including a homegrown one I wrote early on.  I really like Cassie and want to make it work, so any insight would be welcome!

Thanks,
Robbie

Robbie Strickland

Nov 6, 2012, 2:37:57 PM
to twitter...@googlegroups.com
BTW, here's the test:

https://github.com/rstrickland/cassie_hector

Running with 4 threads shows Hector performing twice as fast.

Ryan King

Nov 6, 2012, 4:02:13 PM
to twitter...@googlegroups.com
On Tue, Nov 6, 2012 at 11:37 AM, Robbie Strickland
<rostri...@gmail.com> wrote:
> BTW, here's the test:
>
> https://github.com/rstrickland/cassie_hector
>
> Running with 4 threads shows Hector performing twice as fast.

My guess is that you'll get much better results from Cassie if you use
it asynchronously, rather than blocking like you are right now.

Cassie is built on top of Finagle, which is designed for scalability
(lots of concurrent requests) rather than raw speed. The async
implementation adds some overhead, which can make using it in a
blocking manner slower.

-ryan
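The blocking-versus-async distinction above can be sketched with standard-library futures. Note this is an analogy only: Cassie returns Twitter `com.twitter.util.Future`s rather than `scala.concurrent` ones, and `request()` below is a hypothetical stand-in for a Cassie write.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for one write request; in the real test this would be a
// Cassie batch insert returning a Future.
def request(): Future[Int] = Future { 1 }

// Blocking style: the calling thread parks on every request, so each
// thread drives at most one request at a time.
def blockingRun(n: Int): Int =
  (1 to n).map(_ => Await.result(request(), 1.second)).sum

// Async style: requests are chained with flatMap, so a single logical
// loop keeps a request in flight without parking a thread in between.
def asyncRun(n: Int): Future[Int] = {
  def loop(acc: Int): Future[Int] =
    if (acc >= n) Future.successful(acc)
    else request().flatMap(r => loop(acc + r))
  loop(0)
}
```

Scaling the async style means firing many such loops at once, which is what the loop-per-`concurrency` sketch later in the thread does.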

Stu Hood

Nov 6, 2012, 4:33:38 PM
to twitter...@googlegroups.com
To make your individual "workers" async, you could remove your calls to `par` and do tail recursive loops like:

val loops =
  (1 to concurrency).map { _ =>
    def loop(): Future[Unit] = {
      val bi = LexicalUUID(MicrosecondEpochClock).toString()
      batchInfo.batch()
        .insert(bi, Column("test1", IntCodec.encode(5)).ttl(ttl))
        .insert(bi, Column("test2", LongCodec.encode(MicrosecondEpochClock.timestamp)).ttl(ttl))
        .execute()
        .flatMap { _ =>
          if (completedRequests.getAndIncrement < totalRequests) {
            // fire another request
            loop()
          } else {
            // we're done
            Future.Unit
          }
        }
    }

    // fire an async loop per `concurrency`
    loop()
  }
Future.join(loops)

Interested in the results.

Thanks,
Stu 

Robbie Strickland

Nov 7, 2012, 12:28:03 PM
to twitter...@googlegroups.com
Using this model with 1 million requests and concurrency set to 4, we got a 5% improvement in throughput.  Hector used in a blocking fashion (its only built-in mode) still performs right at 15% better.  I updated the GitHub code with the new test if you're interested.

Ryan King

Nov 7, 2012, 1:27:22 PM
to twitter...@googlegroups.com
Why only a concurrency of 4? That seems really low for a load test.

-ryan

--
@rk / theryanking.com

Stu Hood

Nov 7, 2012, 2:56:06 PM
to twitter...@googlegroups.com
Just FYI: without the collect() call, you're not blocking for the loops to finish. What I missed in my example is that in order to block you'd want to call:
Future.collect(loops).get
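To see why the `.get` matters: combining futures only builds another future, and nothing blocks until you explicitly wait on it. A standard-library sketch, where `Future.sequence` plus `Await.result` play the role of Twitter util's `Future.collect(loops).get` (analogy only, not the Cassie/Finagle API):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Fire several independent async loops (trivial computations here).
val loops: Seq[Future[Int]] = (1 to 4).map(i => Future(i * i))

// Combining them yields another future; nothing has blocked yet.
val all: Future[Seq[Int]] = Future.sequence(loops)

// Block exactly once, for everything, at the end; this is the analog
// of Twitter util's Future.collect(loops).get.
val results: Seq[Int] = Await.result(all, 5.seconds)
// results == Seq(1, 4, 9, 16)
```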

Agreed with Ryan that more parallelism would be good, if only for testing the clients in overload situations: would be good to saturate all CPUs with both clients. Also, note that the default connection pool size for Cassie is 5, so you'll want to increase that as you add parallelism.

Robbie Strickland

Nov 7, 2012, 3:14:07 PM
to twitter...@googlegroups.com
Ok, so increasing the parallelism to 20 makes a big difference, making Cassie perform about 10% better than Hector.  This is good news, and gives me exactly the kind of data I was looking for!  Thanks for helping me through this...

On another note, is there anyone reviewing pull requests?  I put in a request about a month ago with a codec to support composites, and it doesn't look like anyone has looked at it.

Thanks again!

Robbie

Ryan King

Nov 7, 2012, 6:18:36 PM
to twitter...@googlegroups.com
On Wed, Nov 7, 2012 at 12:14 PM, Robbie Strickland
<rostri...@gmail.com> wrote:
> Ok, so increasing the parallelism to 20 makes a big difference, making
> Cassie perform about 10% better than Hector. This is good news, and gives
> me exactly the kind of data I was looking for! Thanks for helping me
> through this...
>
> On another note, is there anyone reviewing pull requests? I put in a
> request about a month ago with a codec to support composites, and it doesn't
> look like anyone has looked at it.

No, we haven't really been looking at it. I'll make sure someone does.

-ryan
--
@rk / theryanking.com