YCSB Driver for Scalaris

Steve Ramage

unread,

Nov 7, 2011, 12:24:35 PM11/7/11

to scalaris

I am in the middle of a project to benchmark various Key-Value stores
and Scalaris was chosen as one project to check. To the best of my
knowledge there is no YCSB driver (as a recent mailing list posting
suggests). I have written one (it's actually quite simple), but i
wanted to check if there was any additional tuning or approach that
would be better.

For those who don't know a YCSB driver is simply an interface that
implements the following operations:

read()
scan()
update()
insert()
delete()

From what I can gather, the delete and scan operations are not
supported.

For the others, I can simply use a single instance of the
TransactionSingleOp class to preform the read(), update(), and
delete(). If I have multiple nodes in my scalaris cluster, I should
simple list them in the scalaris.properties file as a semi-colon
seperated list, or should I have a seperate instance for each node and
randomly select it during runtime?

Are there any other details that are particularly important when using
the Java library?

Nico Kruber

unread,

Nov 11, 2011, 6:52:00 PM11/11/11

to scal...@googlegroups.com

Hi Steve,
thank you for choosing Scalaris among the K/V stores to bench. We are very
interested in the results you gather.
I haven't dug that deep into this topic yet but let me know if I can help any
further (next time a bit quicker I hope).

Regarding your questions, see below (inline)

On Monday 07 November 2011 09:24:35 Steve Ramage wrote:
> I am in the middle of a project to benchmark various Key-Value stores
> and Scalaris was chosen as one project to check. To the best of my
> knowledge there is no YCSB driver (as a recent mailing list posting
> suggests). I have written one (it's actually quite simple), but i
> wanted to check if there was any additional tuning or approach that
> would be better.
>
> For those who don't know a YCSB driver is simply an interface that
> implements the following operations:
>
> read()
> scan()
> update()
> insert()
> delete()
>
> From what I can gather, the delete and scan operations are not
> supported.

we do support delete but discourage using it as some side-effects might occur,
see https://code.google.com/p/scalaris/wiki/FAQ#How_do_I_delete_a_key?

> For the others, I can simply use a single instance of the
> TransactionSingleOp class to preform the read(), update(), and
> delete().

Yes, if the benchmark does not use transactions, then this is the right class
to go with. It also supports bulk operations which will be committed each
separately, but I suppose the YCSB interface only supports single operations.

> If I have multiple nodes in my scalaris cluster, I should
> simple list them in the scalaris.properties file as a semi-colon
> seperated list, or should I have a seperate instance for each node and
> randomly select it during runtime?

If you list all nodes in the properties file, they will all be entered into
the connection policy's node list.
A single connection created by the policy will stick to the node chosen at
that time until it is not available anymore. If you use multiple Connection
objects or use the parameter-less TransactionSingleOp constructor (which
creates a connection on its own), each connection will randomly choose a node
from the provided node list. Alternatively you can set a round-robin
connection policy which I added in svn trunk.
It might be good to use a connection pool - I also added one into svn trunk
recently. (you should be able to use both with 0.3.0 as well if you do not
want to use svn trunk)

Let me know about the strategy you want to use and I can provide you with more
details.

> Are there any other details that are particularly important when using
> the Java library?

We previously had some problems regarding host names and the different
interpretation by Java and Erlang. If you come along connection errors, you
may use IP addresses instead, e.g. to start a Scalaris node use
> ./bin/scalarisctl -s -n no...@10.0.0.1
You can then enter "no...@10.0.0.1" into the properties file.

You should also note that we only support some data types which are mentioned
in our user-dev-guide distributed with the sources. I don't know what YCSB
needs, but among our basic data types are String, Integer, Long, BigInteger,
Boolean, byte[], Double...

Regards
Nico Kruber

signature.asc

Steve R

unread,

Nov 12, 2011, 12:16:43 AM11/12/11

to scal...@googlegroups.com

Thank you for your reply,

1) Yes YCSB only supports single operations that are generally independent of each other.

2) The attached file shows an instance of the interface for YCSB to interact with Scalaris, as I alluded to previously it's quite simple really. YCSB will simply associate a list of key-value pairs of strings, for each key. This maps into Scalaris's interface easily (on or around) line 47, we simply take the Map<String, Object> we get from Scalaris-API and put it into the Map<String, String> YCSB wants. Because YCSB inserted the value, we can simply do this directly as opposed to taking the hit with casting toString on every object in the collection.

3) YCSB will use one instance of the attached class per thread, and you generally scale up the number of threads. So if as I understand your response, TransactionSingleOp will randomly pick one of the nodes listed in the properties file, we are okay without the Connection pool.

4) Yes the Hostname issue was problematic originally but we have the culster up.

The one issue I do seem to be having is exceptions when I increase the number of threads, if you check the exceptions.txt file, you'll see an exception with a number like X-YY

the X is the workload type (YCSB specific), and Y is the number of threads.

I'm running on the cross product of

{1,2,3} of {1,2,4,8,16}

Of note is that it only appears in

in (2,2),(2.4),(2,8),(2,16),(3,4),(3,8),(3.16).

I have run the tests twice and it appears to be the same set for each.

The workloads just correspond to what kind of operations are run. In workload 1 it's only reads, and in workload 2 it's 50% read 50% update, and in workload 3 it's 10% read, 10% update and 80% insert.If I had to venture a very uninformed guess is this perhaps a concurrency bug in the library?

exceptions.txt

ScalarisDB.java

Florian Schintke

unread,

Nov 15, 2011, 4:39:08 AM11/15/11

to scal...@googlegroups.com

Hi Steve,

> The one issue I do seem to be having is exceptions when I increase the number
> of threads, if you check the exceptions.txt file, you'll see an exception with
> a number like X-YY
> the X is the workload type (YCSB specific), and Y is the number of threads.
>
> I'm running on the cross product of
> {1,2,3} of {1,2,4,8,16}
>
> Of note is that it only appears in
> in (2,2),(2.4),(2,8),(2,16),(3,4),(3,8),(3.16).
> I have run the tests twice and it appears to be the same set for each.
>
> The workloads just correspond to what kind of operations are run. In workload 1
> it's only reads, and in workload 2 it's 50% read 50% update, and in workload 3
> it's 10% read, 10% update and 80% insert.If I had to venture a very uninformed
> guess is this perhaps a concurrency bug in the library?

(1) Do you have concurrent operations on the same key?

Then, you have to expect {fail, abort} events from time to time and
should retry the operation.

Scalaris performs each operation as a transaction:

(a) It reads the status of the distributed replicas and remembers their
version stamp of the majority of the replicas.
(b) It tries to establish the update on a majority of replicas.
(c) If another transaction modified the item in the meantime,
Scalaris detects this and reports {fail, abort}.

In general {fail, abort} may happen when a read and a write operation
or two writes occur at the same time (on the same key). As you perform
only single operations, conflicts can only happen on two concurrent
writes. Single op reads should always succeed as they just do a quorum
read.

> 2-16:de.zib.scalaris.AbortException: Erlang message: {fail,abort}

If you just want to override the key and are not interested in the
newer content another thread has written, you can just retry until you
succeed.

You scenario 1 remains fail-free as concurrent reads are not conflicting.

(2) If all your operations are running on disjunct keys and
concurrency on the same keys cannot happen, Scalaris is doing
something wrong and we have to investigate this further.

Cheers

Florian

Steve R

unread,

Nov 16, 2011, 6:32:40 PM11/16/11

to scal...@googlegroups.com

Yeah that makes sense, I assume YCSB is probably writing the same key concurrently. Yes I will simply have the threads retry until they can make the write.

Thanks,

--
You received this message because you are subscribed to the Google Groups "scalaris" group.
To post to this group, send email to scal...@googlegroups.com.
To unsubscribe from this group, send email to scalaris+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/scalaris?hl=en.

Steve Ramage

unread,

Nov 28, 2011, 8:51:47 AM11/28/11

to scalaris

Hello,

So I've run some benchmarks with Scalaris, and the driver I wrote. To
handle the AbortExceptions, I've tried two strategies, one spinning
retry, and the other sleeping about 100 ms, before retrying. Even with
a read only load, the results I get for Scalaris seem like something
is configured incorrectly.

We are generally concerned with measuring how various Key Value
stories handle Latency while throughput increases, we have 4x Large
(7.5 GB RAM, 4x cores) machines running on Amazon EC2, and another
running YCSB to generate traffic, we run the tests with more and more
threads, until the throughput starts decreasing and latency increases.

For Oracle NoSQL on Reads, that seems to occur at about ~6000 ops/sec,
with average latency being 4.5 ms. For a MySQL Cluster, ~8000 ops/sec
and 3.5 ms. For a sharded redis (which will be much faster), ~10,000
op/sec latency 3.0 ms, and for Voldemort it seems at ~4000 ops/sec ~
1.0 ms.
For Scalaris the peak throughput is at ~4000 ops/sec, but the latency
is at around 250 ms. Now there is a bunch of sensititivity in the
larger numbers, but the Latency vs Throughput graph, has Scalaris
shoot way up, and the others remain flat. I'm wondering if this is to
be expected.

I have one node as the master. All the nodes share a configuration
file mounted over network (I hope this isn't the problem). The highest
throughput seemed to occur at around 32-64 threads making
simulataneous requests.

Anyway I was wondering if you had any thoughts as to why the latency
was so high.

On Nov 11, 3:52 pm, Nico Kruber <kru...@zib.de> wrote:
> Hi Steve,
> thank you for choosing Scalaris among the K/V stores to bench. We are very
> interested in the results you gather.
> I haven't dug that deep into this topic yet but let me know if I can help any
> further (next time a bit quicker I hope).
>
> Regarding your questions, see below (inline)
>
> On Monday 07 November 2011 09:24:35 Steve Ramage wrote:
>
>
>
>
>
>
>
>
>
> > I am in the middle of a project to benchmark various Key-Value stores
> > and Scalaris was chosen as one project to check. To the best of my
> > knowledge there is no YCSB driver (as a recent mailing list posting
> > suggests). I have written one (it's actually quite simple), but i
> > wanted to check if there was any additional tuning or approach that
> > would be better.
>
> > For those who don't know a YCSB driver is simply an interface that
> > implements the following operations:
>
> > read()
> > scan()
> > update()
> > insert()
> > delete()
>
> > From what I can gather, the delete and scan operations are not
> > supported.
>
> we do support delete but discourage using it as some side-effects might occur,

> seehttps://code.google.com/p/scalaris/wiki/FAQ#How_do_I_delete_a_key?

> may use IP addresses instead, e.g. to start a Scalaris node use> ./bin/scalarisctl -s -n n...@10.0.0.1
>
> You can then enter "n...@10.0.0.1" into the properties file.

>
> You should also note that we only support some data types which are mentioned
> in our user-dev-guide distributed with the sources. I don't know what YCSB
> needs, but among our basic data types are String, Integer, Long, BigInteger,
> Boolean, byte[], Double...
>
> Regards
> Nico Kruber
>

> signature.asc
> < 1KViewDownload

Steve Ramage

unread,

Nov 28, 2011, 4:02:47 PM11/28/11

to scalaris

To give an idea of the numbers we are seeing, we are running on a 4
node ring that consists of 4 Large Amazon EC2 instances. Allegedly
they have 7.5 GB of RAM each. We are inserting records that are about
1K each. Primarily as I alluded to previously we are looking at how
Latency increases as Throughput Increases (up to the point where your
throughput drops because you are thrashing),

We load up 100,000 rows, and then depending on the workload do one of
the following:
(When I say Randomly, I mean that the chance we read a record is given
by the Zipfain distribution).

Workload 1: Randomly read a record.
Workload 2: 50% chance we randomly read a record, or 50% chance we
write a record.
Workload 3: 80% chance we insert a record, and 20% chance we read it.

We compared a few different data stores, Oracle NoSQL, MySQL Cluster,
Voldemort, as well as Scalaris.

All the workloads had Scalaris having huge Average Latency, for
surprising little latency.

Workload 2 for instance:
These numbers are ballparked
Data Store, Threads Max Throughput (op/sec), Latency
(update) (us) Latency (read) (us)
MySQL 16
6663 2300
2200
Oracle NoSQL 16
3433 7400
1600
Voldemort 4
2625 1920 966
Scalaris 32
618 72000
26000

As you can see the numbers for Scalaris are two orders of magnitude
greater than (one magnitude less throughput, for one magnitude greater
latency).

Anyway I just wondering on any thoughts, the numbers are the same for
the workload that consists only of reads.

Nico Kruber

unread,

Dec 2, 2011, 6:20:31 AM12/2/11

to scal...@googlegroups.com

Hi Steve,
let me briefly address some points inline in your last two emails:

On Monday 28 November 2011 05:51:47 Steve Ramage wrote:
> So I've run some benchmarks with Scalaris, and the driver I wrote. To
> handle the AbortExceptions, I've tried two strategies, one spinning
> retry, and the other sleeping about 100 ms, before retrying. Even with
> a read only load, the results I get for Scalaris seem like something
> is configured incorrectly.

Actually, you should not get any AbortExceptions in the read-only use case
(and I hope you didn't). Sleeping for 100ms is probably a bit too much in your
scenario though but it is probably good to wait a few ms before re-trying.
Since you only have single-write transactions, a value <10ms is probably the
best but this needs to be evaluated.

> We are generally concerned with measuring how various Key Value
> stories handle Latency while throughput increases, we have 4x Large
> (7.5 GB RAM, 4x cores) machines running on Amazon EC2, and another
> running YCSB to generate traffic, we run the tests with more and more
> threads, until the throughput starts decreasing and latency increases.

According to https://aws.amazon.com/ec2/instance-types/ the large instances
only have 2 CPU cores, i.e. "virtual cores". Or is it actually 4?

> For Oracle NoSQL on Reads, that seems to occur at about ~6000 ops/sec,
> with average latency being 4.5 ms. For a MySQL Cluster, ~8000 ops/sec
> and 3.5 ms. For a sharded redis (which will be much faster), ~10,000
> op/sec latency 3.0 ms, and for Voldemort it seems at ~4000 ops/sec ~
> 1.0 ms.
> For Scalaris the peak throughput is at ~4000 ops/sec, but the latency
> is at around 250 ms. Now there is a bunch of sensititivity in the
> larger numbers, but the Latency vs Throughput graph, has Scalaris
> shoot way up, and the others remain flat. I'm wondering if this is to
> be expected.
>
> I have one node as the master. All the nodes share a configuration
> file mounted over network (I hope this isn't the problem). The highest
> throughput seemed to occur at around 32-64 threads making
> simulataneous requests.

Sharing a configuration file shouldn't be a problem since this is read into
memory during program start (as is most of the code, so it should be safe to
also have Scalaris itself in a network-mounted file system).

Recall that Scalaris uses 4 replicas per K/V item. If you are only using 4
Scalaris nodes, then for every transaction every single node will be used. As
DB access needs to be serialised, maybe this is your bottleneck. You could try
running 2 Scalaris nodes on each machine, e.g. 1 Scalaris node per CPU core,
or maybe more. This way, data will be spread over several nodes reducing this
bottleneck.

Actually, 72ms for writes and 26ms for reads don't look too bad after all. The
difference between the other solutions and Scalaris is that we need to contact
all 4 replicas (or at least a majority, i.e. 3) in order to successfully and
consistently read/write a value for which some communication is required.
Keeping 4 replicas allows us to offer high availability in case of node or
network failures. This is something which is difficult to grasp in a
benchmark.
Please also keep in mind that the YCSB benchmark is quite simple, i.e. it only
allows to bench reads or writes. Other work loads like transactions involving
multiple keys, mixed reads/writes etc. are more interesting, especially, since
those are the points where simple data-partitioning schemes are not working
anymore or involve costly operations.

If you like, I could have a look at your new YCSB driver implementation to see
if there's anything that negatively influences Scalaris' performance. If you
send me (maybe off list) this class, your workload and a brief description on
how to set up the bench, I could look a bit deeper.

Otherwise, increasing the number of nodes per machine (as mentioned above)
should probably give you some more performance. Maybe instead of using 4 large
instances, you could also try using 16 small instances which cost about the
same thing.
After setting up Scalaris, please verify that the ranges are split equally
among the participating nodes. You can use the our web debug interface for
that (per default on <IP>:8000/ring.yaws). If they are started one after
another, they should automatically arrange themselves in such a way. Starting
them simultaneously may yield different partitions.