Cassandra Stress Testing

Mohit Anchlia

Mar 21, 2011, 1:00:24 PM
to hector-users
One of the use cases I need to performance test is writing and reading
blobs (avg 64 KB, max 30 MB). Is there a Hector tool I can use, or
should I just write simple Hector clients of my own?

Patricio Echagüe

Mar 21, 2011, 1:02:36 PM
to hector...@googlegroups.com, Mohit Anchlia

Mohit Anchlia

Mar 21, 2011, 2:18:42 PM
to Patricio Echagüe, hector-users
Thanks!

Can I also specify the size of the blobs I want to test with? Is
stress-schema configurable for what I need to test? Can I specify min,
max and average blob sizes for the test?

2011/3/21 Patricio Echagüe <patr...@gmail.com>:

Patricio Echagüe

Mar 21, 2011, 2:26:43 PM
to Mohit Anchlia, hector-users
options.addOption("w", "colwidth", true, "The width of the column in bytes. Default is 16");

-w <column_size> sets the column size in bytes. But for better blob handling, I think you will have to extend the framework a bit.

Nate, any light on this?

Nate McCall

Mar 21, 2011, 2:32:52 PM
to hector...@googlegroups.com
Never tried with large blobs. The code generates a random string of
length 'w' (it's all binary on the wire at the end of the day, so string
vs. bytes should not matter too much), so give it a try and let me
know how it goes :-)
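To make the description above concrete, here is a minimal sketch, assuming a value generator that builds a random ASCII string of width w and encodes it as UTF-8. The class and method names (`ColumnValues`, `randomValue`, `toWireBytes`) are hypothetical and are not taken from the cassandra-stress source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical helper, not the actual cassandra-stress source: it only
// illustrates using "a random string of 'w' length" as a column value.
final class ColumnValues {
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private ColumnValues() {}

    /** Returns a random ASCII string of exactly {@code width} characters. */
    static String randomValue(int width, Random rnd) {
        if (width < 0) {
            throw new IllegalArgumentException("width must be >= 0");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = 0; i < width; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Bytes that would go on the wire: UTF-8 encodes each ASCII char as one byte. */
    static byte[] toWireBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // 64 KB, the average blob size from the first message in the thread.
        String value = randomValue(64 * 1024, rnd);
        System.out.println(toWireBytes(value).length); // prints 65536
    }
}
```

Because the alphabet is ASCII, w characters encode to exactly w bytes, which is why string vs. bytes makes little difference for sizing. For values near the 30 MB maximum, filling a byte[] directly with `Random.nextBytes` would avoid holding both the String and its encoded copy in memory.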

-N

2011/3/21 Patricio Echagüe <patr...@gmail.com>:

Mohit Anchlia

Mar 21, 2011, 2:43:31 PM
to hector...@googlegroups.com
I'll take a look at the code and see. But first I need to set it up and compile it :)

Mohit Anchlia

Mar 21, 2011, 3:03:31 PM
to hector...@googlegroups.com
Are RF (replication factor) and CL (consistency level) also configurable, or do I need to make code changes for that?

Nate McCall

Mar 21, 2011, 3:24:00 PM
to hector...@googlegroups.com, Mohit Anchlia
RF is set at the cluster configuration level. CL is not yet
configurable via the CLI, but I have something close to that locally
which I can add shortly.

Nate McCall

Mar 21, 2011, 4:09:05 PM
to hector...@googlegroups.com
(Including hector-users - not sure why/where this was crossposted?).

On Mon, Mar 21, 2011 at 3:05 PM, Nate McCall <na...@datastax.com> wrote:
> Requires maven install to be run on the source initially. I just
> updated the readme with instructions for such. In short
>
> - download source
> - mvn install
> - cd target/appassembler
> - sh bin/stress ... options....
>
> I'll do some fancier packaging at some point, but the above should
> work ok for now.
>
> On Mon, Mar 21, 2011 at 2:49 PM, Mohit Anchlia <mohita...@gmail.com> wrote:
>> Thanks!
>>
>> I ran mvn compile install and produced
>> /drives/c/proj/zznate-cassandra-stress-90ff926/target/cassandra-stress-0.7_25.jar.
>>
>> When I try to run:
>> C:\proj\zznate-cassandra-stress-90ff926>java -jar target\cassandra-stress-0.7_25.jar
>> Failed to load Main-Class manifest attribute from
>> target\cassandra-stress-0.7_25.jar
>>
>> Am I doing something wrong?

Mohit Anchlia

Mar 21, 2011, 4:11:06 PM
to Nate McCall, hector-users
Thanks! I didn't realize that the script needs to be executed instead
of running the jar file directly.

Now, if I want to compile on Windows and then move just the jar files
to Linux, should I simply zip the "target" directory and install it
that way?

Nate McCall

Mar 21, 2011, 4:13:36 PM
to Mohit Anchlia, hector-users
Sounds like it should work (ymmv on Windows :-)

Mohit Anchlia

Mar 21, 2011, 4:16:32 PM
to Nate McCall, hector-users
What I meant to ask was: are all the jar files required to run it in
the "target" directory? So if I just zip the target directory and unzip
it on another host, it should work as is?

Nate McCall

Mar 21, 2011, 4:18:16 PM
to Mohit Anchlia, hector-users
Yes - all those files are required, iirc.

Mohit Anchlia

Mar 21, 2011, 5:09:23 PM
to Nate McCall, hector-users
I just started looking at the code. From a quick look, I understand
that "threads" are the client threads doing inserts/reads against the
cluster. But I was also expecting to see
"cassandraHostConfigurator.setMaxActive", in case the threads are
really waiting on Hector to finish the current execution even though
Cassandra might have capacity to take more load, or the opposite case,
where we don't want to test with Hector's default connection pool size
of 50.

Nate McCall

Mar 21, 2011, 5:16:16 PM
to Mohit Anchlia, hector-users
Threads (-t option) is the number of threads dealing with the
workload. Clients (-C option) is the number of connections passed to
CHC#setMaxActive (CassandraHostConfigurator).

Given the overhead in process management, if you really want to test
throughput of a cluster, you should be running the stress tool on
multiple client machines.
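A small stdlib-only model of the threads vs. clients distinction above, assuming the connection pool behaves like a semaphore with one permit per connection. `PoolModel` and `peakConcurrency` are hypothetical names; no Hector classes are used, and a real pool differs in details such as timeouts and exhaustion policies.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Hector or stress-tool code. "threads" (-t) are the
// workers; a semaphore with "clients" (-C) permits stands in for a connection
// pool capped by CassandraHostConfigurator#setMaxActive.
final class PoolModel {
    private PoolModel() {}

    /**
     * Runs {@code ops} simulated operations on {@code threads} workers that share
     * {@code maxActive} "connections", and returns the peak number of operations
     * that held a connection at the same moment.
     */
    static int peakConcurrency(int threads, int maxActive, int ops) {
        Semaphore connections = new Semaphore(maxActive, true);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < ops; i++) {
            workers.execute(() -> {
                try {
                    connections.acquire(); // block until a "connection" is free
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    Thread.sleep(2); // stand-in for one insert or read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    connections.release();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 50 worker threads but only 10 "connections": never more than 10 in flight.
        System.out.println(peakConcurrency(50, 10, 500));
    }
}
```

When threads far outnumber connections, the extra workers spend their time queued on the pool rather than on Cassandra, so raising -t alone does not raise the load on the cluster.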

Mohit Anchlia

Mar 21, 2011, 5:27:52 PM
to Nate McCall, hector-users
Thanks!

I am getting two 12-core machines with 96 GB RAM to use as test clients.

It looks like latencies are printed only after the batch completes. Is
it possible to print latencies and other statistics periodically during
the run? I can change the code if that makes sense for my purpose.

How do I run a test for "n" minutes or hours? I am trying to test
concurrent inserts and reads, all coming from different end clients
(not batched). Also, every row in this case has only one column unless
the file (data) is too big (I'm still working on the chunking logic).
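For the chunking logic mentioned above, one possible approach is to split a blob into fixed-size pieces and store each under a zero-padded column name in the same row. The sketch below assumes a 1 MB chunk size and a `chunk-NNNNNN` naming scheme; both, like the `BlobChunker` class itself, are illustrative assumptions rather than anything the stress tool provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical chunking helper: splits a blob into fixed-size pieces that could
// each be stored as one column of the same row, then reassembles them.
final class BlobChunker {
    private BlobChunker() {}

    /** Splits {@code blob} into chunks of at most {@code chunkSize} bytes, in order. */
    static List<byte[]> split(byte[] blob, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < blob.length; off += chunkSize) {
            int end = Math.min(blob.length, off + chunkSize);
            chunks.add(Arrays.copyOfRange(blob, off, end));
        }
        return chunks;
    }

    /** Zero-padded names sort in chunk order under a lexicographic comparator. */
    static String chunkColumnName(int index) {
        return String.format(Locale.ROOT, "chunk-%06d", index);
    }

    /** Concatenates chunks back into the original blob. */
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) {
            total += c.length;
        }
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] blob = new byte[30 * 1024 * 1024];        // 30 MB, the maximum size in the thread
        List<byte[]> chunks = split(blob, 1024 * 1024);  // assumed 1 MB chunk size
        System.out.println(chunks.size() + " " + chunkColumnName(chunks.size() - 1));
        // prints "30 chunk-000029"
    }
}
```

Small blobs (such as the 64 KB average) stay a single column, and only large files spread across several columns of the same row.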

Is connecting to Hector via JMX the best way to monitor how each thread
is doing? Is there anything in particular I should pay attention to
when connecting to Hector's or Cassandra's JMX?

Nate McCall

Mar 21, 2011, 5:36:44 PM
to Mohit Anchlia, hector-users
I've been running this all through a profiler lately, so I'm not really
looking at the counter output; I think I may even have broken it
recently for reads.

Let me know what would be handy feature-wise and I will add it in as
time permits.

See the "Optional Performance Counters" section of the user guide for
how to track performance:
https://github.com/rantav/hector/wiki/User-Guide

Mohit Anchlia

Mar 21, 2011, 5:51:30 PM
to Nate McCall, hector-users
Does that mean I can't use this tool in its current shape? What I
really need is to run a stress test for 'n' minutes or hours with 'x'
threads doing inserts and reads concurrently. Each insert and read is
not a batch transaction but an operation on one row and one column.
This is my first test.

How are reads done? Does this tool cache the keys to do lookups?

Also, is there a way to do only lookups? For example, if I insert 1M
rows today, can I just read those 1M rows in random order tomorrow?

Nate McCall

Mar 21, 2011, 6:18:56 PM
to Mohit Anchlia, hector-users
Keys are all created from integer ranges, and the keys for reads are
chosen by calling random() within the range of valid keys. I would like
to add a Gaussian distribution to key selection at some point for
better control over modelling "misses", but that is as much time as I
have put in there so far. You can start stress with "-o read ..." at
any time to send only read operations, with keys selected as described
above.
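The key selection described above can be sketched roughly as follows, assuming keys are the integers 0 to numKeys - 1. `KeySelector` is a hypothetical class: `uniformKey` mirrors the random() behaviour Nate describes, while `gaussianKey` only illustrates the Gaussian idea he mentions and is not existing stress-tool code.

```java
import java.util.Random;

// Hypothetical key selector modelled on the description above (integer keys,
// random() over the valid range). The Gaussian method only illustrates the idea
// of modelling "misses"; it is not existing stress-tool code.
final class KeySelector {
    private final int numKeys;
    private final Random rnd;

    KeySelector(int numKeys, Random rnd) {
        if (numKeys <= 0) {
            throw new IllegalArgumentException("numKeys must be > 0");
        }
        this.numKeys = numKeys;
        this.rnd = rnd;
    }

    /** Uniform choice over [0, numKeys): every inserted key is equally likely. */
    int uniformKey() {
        return rnd.nextInt(numKeys);
    }

    /**
     * Gaussian choice centred on the middle of the range, with standard deviation
     * {@code stdDevFraction * numKeys}. Results outside [0, numKeys) are kept on
     * purpose: they name keys that were never written, i.e. read "misses".
     */
    long gaussianKey(double stdDevFraction) {
        double centre = numKeys / 2.0;
        return Math.round(centre + rnd.nextGaussian() * stdDevFraction * numKeys);
    }

    /** True if {@code key} falls inside the range of written keys. */
    boolean isHit(long key) {
        return key >= 0 && key < numKeys;
    }

    public static void main(String[] args) {
        // 1M rows inserted earlier; a later read-only run draws keys from the same range.
        KeySelector keys = new KeySelector(1_000_000, new Random());
        int reads = 100_000;
        int misses = 0;
        for (int i = 0; i < reads; i++) {
            if (!keys.isHit(keys.gaussianKey(0.5))) {
                misses++;
            }
        }
        System.out.println("uniform key: " + keys.uniformKey());
        System.out.println("gaussian miss rate: " + (double) misses / reads);
    }
}
```

With a standard deviation of half the key range, a draw misses roughly when the standard normal value has magnitude of at least 1, so the printed miss rate should come out near 0.32; shrinking `stdDevFraction` lowers it.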

There is no concept of "run for time T" yet. It seems like a useful
enough feature, though, so I will add it as time permits (or gladly
accept a contribution).
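A time-bounded run with periodic statistics could look something like this minimal sketch, assuming each operation is a plain Runnable (in a real test, one single-column Hector insert or read). `TimedRun` is a hypothetical name, and the per-interval figures are approximate because the two counters are read separately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical "run for time T" loop, not part of the stress tool: worker
// threads repeat an operation until a deadline, while a scheduler prints
// cumulative ops, ops in the last interval, and mean latency for that interval.
final class TimedRun {
    private TimedRun() {}

    static long run(int threads, long durationMillis, long reportEveryMillis, Runnable op) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        LongAdder ops = new LongAdder();
        LongAdder latencyNanos = new LongAdder();
        AtomicLong lastOps = new AtomicLong();
        AtomicLong lastLatency = new AtomicLong();

        ScheduledExecutorService reporter = Executors.newSingleThreadScheduledExecutor();
        reporter.scheduleAtFixedRate(() -> {
            long total = ops.sum();
            long totalLat = latencyNanos.sum();
            long delta = total - lastOps.getAndSet(total);
            long deltaLat = totalLat - lastLatency.getAndSet(totalLat);
            double meanMicros = delta == 0 ? 0.0 : deltaLat / 1000.0 / delta;
            System.out.printf("ops=%d interval_ops=%d mean_latency_us=%.1f%n",
                    total, delta, meanMicros);
        }, reportEveryMillis, reportEveryMillis, TimeUnit.MILLISECONDS);

        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.execute(() -> {
                // Overflow-safe deadline check, as recommended for System.nanoTime().
                while (System.nanoTime() - deadline < 0) {
                    long start = System.nanoTime();
                    op.run(); // e.g. one single-column insert or read
                    latencyNanos.add(System.nanoTime() - start);
                    ops.increment();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(durationMillis + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            reporter.shutdownNow();
        }
        return ops.sum();
    }

    public static void main(String[] args) {
        // Simulated 1 ms operation on 4 threads for 2 seconds, reporting every 500 ms.
        long total = run(4, 2_000, 500, () -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("total ops: " + total);
    }
}
```

Running one such process per client machine, with the duration in minutes or hours, would match the "n minutes with x threads" test described earlier in the thread.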

Mohit Anchlia

Mar 22, 2011, 4:29:46 PM
to Nate McCall, hector-users
Is there a way to test updates to the column too?

Also, it looks like the column width is a fixed number of bytes. Could
it be made random, between 0 and n bytes? I will probably make this
change if it's not currently possible.
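A sketch of variable column widths, assuming the width is drawn independently for each write. `VariableWidth` and its methods are hypothetical. A uniform draw between min and max has mean (min + max) / 2, so it cannot match a 64 KB average with a 30 MB maximum; the capped exponential helper is one skewed alternative, chosen here only for illustration.

```java
import java.util.Random;

// Hypothetical variable-width values: the stress tool's -w option is a fixed
// width; these helpers sketch two ways of varying it per write.
final class VariableWidth {
    private VariableWidth() {}

    /** Uniform width in [min, max], both ends inclusive. Mean is (min + max) / 2. */
    static int nextWidth(int min, int max, Random rnd) {
        if (min < 0 || max < min || max == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("need 0 <= min <= max < Integer.MAX_VALUE");
        }
        return min + rnd.nextInt(max - min + 1);
    }

    /**
     * Exponentially distributed width with the given mean, capped at {@code max}.
     * Skewed towards small values, so a 64 KB mean can coexist with a 30 MB cap.
     */
    static int exponentialWidth(double meanBytes, int max, Random rnd) {
        // Inverse-CDF sampling: -mean * ln(1 - u), with u in [0, 1).
        long w = Math.round(-meanBytes * Math.log(1.0 - rnd.nextDouble()));
        return (int) Math.min(w, (long) max);
    }

    /** Random payload whose length is drawn uniformly from [min, max]. */
    static byte[] randomBytes(int min, int max, Random rnd) {
        byte[] value = new byte[nextWidth(min, max, rnd)];
        rnd.nextBytes(value);
        return value;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(randomBytes(0, 1024, rnd).length);
        System.out.println(exponentialWidth(64 * 1024, 30 * 1024 * 1024, rnd));
    }
}
```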

Also, what's the best way to look at Hector and Cassandra and monitor
throughput while the test is progressing?

Mohit Anchlia

Mar 23, 2011, 1:04:59 PM
to Nate McCall, hector-users
Can someone please respond to my mail?

Ajay

Apr 4, 2012, 7:39:57 PM
to hector...@googlegroups.com
Nate McCall <nate@...> writes:
>
> RF is at the config level of the cluster. CL is not yet configurable
> via the CLI. I have something close to that locally which I can add
> shortly here though.
>

Is it possible to use another placement strategy (e.g.
NetworkTopologyStrategy) instead of the default SimpleStrategy with the
stress tool? Ideally, could we specify our own stress-schema instead of
the default one? That would let users define their own keyspace.

--Ajay
