Reading data from Cassandra with the Pelops client and benchmarking it


compte...@gmail.com

Apr 24, 2013, 2:25:06 AM
to sca...@googlegroups.com
I am reading data from a `Cassandra database` using the `Pelops client`, and that part works fine.

Now I have started `benchmarking` it, i.e. measuring how long a read from the Cassandra database takes through the Pelops client, so I have added my `benchmarking code` to the method below.

I am not sure whether I have placed the `benchmarking code` in the correct spot to measure the read latency of the `Cassandra database` through the `Pelops client`.

Below is my code:

    // Retrieves the given attribute names for a rowKey from the Cassandra database
    public Map<String, String> getAttributes(final String rowKey, final Collection<String> attributeNames, final String columnFamily) {

        final Map<String, String> attributes = new ConcurrentHashMap<String, String>();

        try {
            // Build a slice predicate that selects only the requested column names
            final SlicePredicate myPredicate = Selector.newColumnsPredicate(attributeNames.toArray(new String[attributeNames.size()]));

            final Selector selector = Pelops.createSelector(CassandraPelopsConnection.getInstance().getPoolName());

            // Is this the right place to start the timer?
            final CassandraTimer timer = CassandraTimer.getInstance();

            final List<Column> columnList = selector.getColumnsFromRow(columnFamily, rowKey, myPredicate, ConsistencyLevel.ONE);

            // And is this the right place to stop the timer in the case of the Pelops client?
            timer.getDuration();

            // Copy the returned columns into a plain name/value map
            for (Column column : columnList) {
                attributes.put(new String(column.getName()), new String(column.getValue()));
            }
        } catch (Exception e) {
            // TODO: at least log the exception instead of swallowing it silently
        }

        return attributes;
    }

Can anyone take a look and let me know whether I am doing this right? I am asking because, in all the benchmarking I have done so far, the 95th percentile always comes out as 1 millisecond.
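To make the question more concrete, this is the kind of measurement I mean, shown with System.nanoTime directly instead of my CassandraTimer helper (this sampler class is only an illustration, not my actual benchmark code):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Callable;

    // Illustrative only: wraps a single read in System.nanoTime and reports the
    // 95th percentile (nearest-rank) over all recorded samples.
    public class ReadLatencySampler {

        private final List<Long> samplesMicros = Collections.synchronizedList(new ArrayList<Long>());

        public <T> T time(final Callable<T> readOperation) throws Exception {
            final long start = System.nanoTime();
            try {
                return readOperation.call(); // e.g. selector.getColumnsFromRow(...)
            } finally {
                samplesMicros.add((System.nanoTime() - start) / 1000L);
            }
        }

        public long percentile95Micros() {
            final List<Long> sorted;
            synchronized (samplesMicros) {
                sorted = new ArrayList<Long>(samplesMicros);
            }
            if (sorted.isEmpty()) {
                return 0L;
            }
            Collections.sort(sorted);
            final int index = (int) Math.ceil(0.95 * sorted.size()) - 1;
            return sorted.get(Math.max(index, 0));
        }
    }

The important point for my question is that only the getColumnsFromRow call should sit between the start of the timer and the recorded sample, nothing else.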

Dan Washusen

Apr 24, 2013, 2:38:12 AM
to sca...@googlegroups.com
If you're just interested in timing how long it takes to read columns from a row then your timing code looks correct...

You haven't mentioned how many nodes you're testing against, how much data, how much of that data is cached, how many concurrent clients are accessing that data etc. If your test is running on localhost with a small dataset then 1ms could be about right...

If you just want to test your Cassandra cluster performance then take a look at: http://www.datastax.com/docs/1.1/references/stress_java




--
Dan Washusen
Make big files fly!

compte...@gmail.com

Apr 24, 2013, 3:12:37 AM
to sca...@googlegroups.com
Thanks Daniel for your email. I have around one million unique rows in the Cassandra database.

I have a single cluster with four nodes. I created the keyspace like this:

            create keyspace profilekeyspace
            with placement_strategy = 'NetworkTopologyStrategy'
            and strategy_options = {DC2 : 1, DC1 : 1}
            and durable_writes = true;

And my column family is named `profile_columnfamily`.

These are my four nodes:

          lp-host01.vip.slc.qa.host.com:9160
          lp-host02.vip.slc.qa.host.com:9160
          lp-host03.vip.phx.qa.host.com:9160
          lp-host04.vip.phx.qa.host.com:9160

Key caching is enabled, and the column family uses the SizeTieredCompactionStrategy as well.

I am trying to measure the read latency of the Cassandra database through the Pelops client. Also, I am not running this test locally; we have Cassandra installed in an LnP environment and I am running my client program from comparable machines.

I started my client program with 2, 3, 4, 5, 6, 7, 8, 9, and 10 threads, and every time the 95th percentile came out as 1 millisecond.

Let me know if this still looks right to you.

Dan Washusen

Apr 24, 2013, 4:11:15 AM
to sca...@googlegroups.com
Hmm, yep, you're using ConsistencyLevel.ONE so only one node is coming into play per read and I'd assume the network is capable of sub-ms pings so those numbers still seem reasonable. 

How large are the rows in bytes? I'd guess that all your data is in memory (1 million * 1KB is less than a GB of data) so reads will be super quick... Are you doing writes at the same time as the reads? Over time your data could become fragmented and reads will slow down until a compaction is performed.

Pelops is a very thin wrapper around the Thrift client. It's not going to add much to your operations...
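To give you an idea of how thin it is, the same read against the raw Thrift client looks roughly like this (connection handling simplified and error handling omitted; Pelops does the pooling of those connections for you):

    import java.nio.ByteBuffer;
    import java.util.List;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class RawThriftRead {
        public static List<ColumnOrSuperColumn> read(final String host, final String keyspace,
                final String columnFamily, final String rowKey, final SlicePredicate predicate) throws Exception {
            // One connection per call, just for illustration; a pool (as in Pelops) would reuse it
            final TFramedTransport transport = new TFramedTransport(new TSocket(host, 9160));
            transport.open();
            try {
                final Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
                client.set_keyspace(keyspace);
                return client.get_slice(ByteBuffer.wrap(rowKey.getBytes("UTF-8")),
                        new ColumnParent(columnFamily), predicate, ConsistencyLevel.ONE);
            } finally {
                transport.close();
            }
        }
    }

The Selector is essentially doing that get_slice call for you, plus the pooling and a bit of housekeeping around it.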

compte...@gmail.com

Apr 24, 2013, 1:30:07 PM
to sca...@googlegroups.com
Thanks Daniel.

1) Yup, the network is capable of sub-ms pings.
2) I am storing a JSON string in each column, so the average row size is approximately 300 bytes.
3) You mentioned the data is in memory. Do you mean it is cached? I am not sure how to verify that, so please let me know if there is a way to check it; I have access to the OpsCenter portal as well. I populated the one million rows yesterday, and I only started reading data from the Cassandra database after the insert was done.
4) No, I am doing read operations only after the writes are done.

One important thing I want to add: I used the same program that reads data from Cassandra with the Pelops client against the Astyanax client as well. With the Astyanax client, performance keeps degrading as I keep increasing the thread count, so I am wondering whether anyone has ever benchmarked Astyanax against Pelops. Below are the results for the Astyanax client:


    Number of threads    Read latency (95th percentile)    Test duration (minutes)    Throughput (requests/second)
    2                    1 ms                              5                          1566
    3                    1 ms                              5                          2388
    4                    2 ms                              5                          2929
    5                    3 ms                              5                          3105
    6                    4 ms                              5                          3116
    7                    5 ms                              5                          3090
    8                    6 ms                              5                          3194
    9                    8 ms                              5                          3128
    10                   9 ms                              5                          3130

Dan Washusen

Apr 24, 2013, 5:23:19 PM
to sca...@googlegroups.com
Yeah, so you 'only' have ~286 MB of data in total. I'd bet the OS-level file buffer would reduce the need for any disk reads (http://linux.about.com/od/lsa_guide/a/gdelsa44.htm).

I'm not familiar with the Astyanax client so I can't really comment, but the first thing that jumps to mind is that there might be contention over connections to Cassandra (does it use a pool, and if so, how many connections are maintained?). By default Pelops will hold up to 20 connections per node...
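If you want to play with that number, the per-node limit is set on the pool policy when you register the pool. Something along these lines, although I'm quoting the class and setter names from memory, so double-check them against the Pelops javadoc:

    import org.scale7.cassandra.pelops.Cluster;
    import org.scale7.cassandra.pelops.OperandPolicy;
    import org.scale7.cassandra.pelops.Pelops;
    import org.scale7.cassandra.pelops.pool.CommonsBackedPool;

    public class PelopsPoolSetup {
        public static void registerPool() {
            final Cluster cluster = new Cluster("lp-host01.vip.slc.qa.host.com", 9160);

            // Assumption: Policy#setMaxActivePerNode is the per-node connection limit
            // (20 by default); verify against the Pelops javadoc before relying on it
            final CommonsBackedPool.Policy poolPolicy = new CommonsBackedPool.Policy();
            poolPolicy.setMaxActivePerNode(20);

            Pelops.addPool("profile_pool", cluster, "profilekeyspace", poolPolicy, new OperandPolicy());
        }
    }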

compte...@gmail.com

Apr 24, 2013, 8:35:50 PM
to sca...@googlegroups.com
Thanks a lot Daniel for pointing out the connection pooling. I did some research: by default the Astyanax client maintains only one connection per node.

So with Pelops there will be 20 connections per node, which means that if I have made connections to 4 nodes, there will be 80 open connections across those four nodes in total? Am I right?
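For anyone else who runs into this: as far as I can tell the Astyanax pool size is raised through its connection pool configuration, roughly like below (based on my reading of the Astyanax documentation, so treat the exact names as assumptions):

    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;

    public class AstyanaxPoolConfig {
        public static ConnectionPoolConfigurationImpl buildPoolConfig() {
            // Assumption: setMaxConnsPerHost is the per-node connection limit
            // (reportedly 1 by default); verify against the Astyanax documentation
            return new ConnectionPoolConfigurationImpl("ProfileConnectionPool")
                    .setPort(9160)
                    .setMaxConnsPerHost(20)
                    .setSeeds("lp-host01.vip.slc.qa.host.com:9160,lp-host02.vip.slc.qa.host.com:9160");
        }
    }

The resulting configuration object is then passed into the AstyanaxContext builder via withConnectionPoolConfiguration.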

Dan Washusen

Apr 24, 2013, 10:02:20 PM
to sca...@googlegroups.com

Yep, that's correct.

compte...@gmail.com

Apr 25, 2013, 12:59:08 AM
to sca...@googlegroups.com
Thanks Daniel. And what is the default maximum number of connections Pelops can handle overall, beyond the per-host/node limit?

Any idea?

Dan Washusen

Apr 25, 2013, 1:41:09 AM
to sca...@googlegroups.com
Not sure of a hard number, but I'd say Cassandra would freak out before Pelops did. Just like any DB you'll need to find the sweet spot; too many operations contending for disk will slow all operations down. That said, with the setup you've described (read-only data, small dataset, etc.) there isn't going to be much of an IO/disk issue...