Re: counters + replication = awful performance?

Re: counters + replication = awful performance? Juan Valencia 11/27/12 9:38 AM
Hi Sergey,

I know I've had similar issues with counters which were bottle-necked by network throughput.  You might be seeing a problem with throughput between the clients and Cass or between the two Cass nodes.  It might not be your case, but that was what happened to me :-)

Juan


On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf....@gmail.com> wrote:
Hi,

I have a serious problem with counters performance and I can't seem to
figure it out.

Basically I'm building a system for accumulating some statistics "on the
fly" via Cassandra distributed counters. For this I need counter updates to
work "really fast" and herein lies my problem -- as soon as I enable
replication_factor = 2, the performance goes down the drain. This happens in
my tests using both 1.0.x and 1.1.6.

Let me elaborate:

I have two boxes (virtual servers on top of physical servers rented
specifically for this purpose, i.e. it's not a cloud, nor is it shared;
virtual servers are managed by our admins as a way to limit damage as I
suppose :)). Cassandra partitioner is set to ByteOrderedPartitioner because
I want to be able to do some range queries.

First, I set up Cassandra individually on each box (not in a cluster) and
test counter increments performance (exclusively increments, no reads). For
tests I use code that is intended to somewhat resemble the expected load
pattern -- particularly the majority of increments create new counters with
some updating (adding) to already existing counters. In this test each
single node exhibits respectable performance - something on the order of 70k
(seventy thousand) increments per second.

I then join both of these nodes into single cluster (using SimpleSnitch and
SimpleStrategy, nothing fancy yet). I then run the same test using
replication_factor=1. The performance is on the order of 120k increments per
second -- which seems to be a reasonable increase over the single node
performance.


HOWEVER I then rerun the same test on the two-node cluster using
replication_factor=2 -- which is the least I'll need for actual production
for redundancy purposes. And the performance I get is absolutely horrible --
much, MUCH worse than even single-node performance -- something on the order
of less than 25k increments per second. In addition to clients not being
able to push updates fast enough, I also see a lot of 'messages dropped'
messages in the Cassandra log under this load.

Could anyone advise what could be causing such drastic performance drop
under replication_factor=2? I was expecting something on the order of
single-node performance, not approximately 3x less.


When testing replication_factor=2 on 1.1.6 I can see that CPU usage goes
through the roof. On 1.0.x I think it looked more like disk overload, but
I'm not sure (being on virtual server I apparently can't see true iostats).

I do have Cassandra data on a separate disk, commit log and cache are
currently on the same disk as the system. I experimented with commit log
flush modes and even with disabling the commit log altogether -- but it doesn't seem
to have noticeable impact on the performance when under
replication_factor=2.


Any suggestions and hints will be much appreciated :) And please let me know
if I need to share additional information about the configuration I'm
running on.

Best regards,
Sergey



--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html
Sent from the cassand...@incubator.apache.org mailing list archive at Nabble.com.



--

Learn More:  SQI (Social Quality Index) - A Universal Measure of Social Quality

Re: counters + replication = awful performance? Michael Kjellman 11/27/12 10:02 AM
Are you writing with QUORUM consistency or ONE?

On 11/27/12 9:52 AM, "Sergey Olefir" <solf....@gmail.com> wrote:

>Hi Juan,
>
>thanks for your input!
>
>In my case, however, I doubt this is the case -- clients are able to push
>many more updates than I need to saturate replication_factor=2 case (e.g.
>I'm doing as many as 6x more increments when testing 2-node cluster with
>replication_factor=1), so bandwidth between clients and server should be
>sufficient.
>
>Bandwidth between nodes in the cluster should also be quite sufficient
>since
>they are both in the same DC. But it is something to check, thanks!
>
>Best regards,
>Sergey
>
>
>Juan Valencia wrote
>> Hi Sergey,
>>
>> I know I've had similar issues with counters which were bottle-necked by
>> network throughput.  You might be seeing a problem with throughput
>>between
>> the clients and Cass or between the two Cass nodes.  It might not be
>>your
>> case, but that was what happened to me :-)
>>
>> Juan
>>
>>


'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions.
Visit http://barracudanetworks.com/facebook


Re: counters + replication = awful performance? Edward Capriolo 11/27/12 1:44 PM
The difference between replication factor = 1 and replication factor > 1 is significant. Also, it sounds like your cluster is 2 nodes, so going from RF=1 to RF=2 means double the load on both nodes.

You may want to experiment with the very dangerous column family attribute:

- replicate_on_write: Replicate every counter update from the leader to the
follower replicas. Accepts the values true and false.
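
A hedged sketch of what toggling that attribute looked like in the cassandra-cli of the 1.0/1.1 era (the column family name `stats` is hypothetical; understand the data-loss caveats before trying this):

```sql
update column family stats with replicate_on_write = false;
```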

Edward
Re: counters + replication = awful performance? Scott McKay 11/27/12 3:13 PM
We're having a similar performance problem.  Setting 'replicate_on_write: false' fixes the performance issue in our tests.

How dangerous is it?  What exactly could go wrong?






--
Scott McKay, Sr. Software Developer
MailChannels

Tel: +1 604 685 7488 x 509
www.mailchannels.com
Re: counters + replication = awful performance? Edward Capriolo 11/27/12 3:21 PM
I misspoke, really. It is not dangerous; you just have to understand what it means. This JIRA issue discusses it: https://issues.apache.org/jira/browse/CASSANDRA-3868

Re: counters + replication = awful performance? Edward Capriolo 11/27/12 4:14 PM

Cassandra's counters read on increment. Additionally, they are distributed, so there can be multiple reads per increment. If they are not fast enough and you have avoided all tuning options, add more servers to handle the load.

In many cases incrementing the same counter n times can be avoided.

Twitter's Rainbird did just that. It avoided multiple counter increments by batching them.

I have done a similar thing using Cassandra and Kafka.

https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java
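
The batching idea can be sketched roughly like this (a toy in-memory buffer, not the IronCount code; in real use, `flush` would issue one counter add per key via a Cassandra client):

```python
from collections import defaultdict

class CounterBuffer:
    """Coalesce many increments to the same key into one Cassandra update."""

    def __init__(self, flush_every=1000):
        self.pending = defaultdict(int)   # key -> accumulated delta
        self.seen = 0
        self.flush_every = flush_every

    def incr(self, key, delta=1):
        self.pending[key] += delta
        self.seen += 1
        if self.seen >= self.flush_every:
            self.flush()

    def flush(self):
        # A real implementation would push each (key, delta) pair to
        # Cassandra here; we just hand the batch back.
        flushed = dict(self.pending)
        self.pending.clear()
        self.seen = 0
        return flushed

buf = CounterBuffer(flush_every=5)
for _ in range(4):
    buf.incr("page:/home")
# four client-side increments have collapsed into one pending delta of 4
```

The trade-off is the same as Rainbird's: you lose up to one buffer's worth of increments on a crash, in exchange for far fewer counter writes hitting the cluster.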


On Tuesday, November 27, 2012, Sergey Olefir <solf....@gmail.com> wrote:
> Hi, thanks for your suggestions.
>
> Regarding replicate=2 vs replicate=1 performance: I expected that the
> configurations below would have similar performance:
> - single node, replicate = 1
> - two nodes, replicate = 2 (okay, this probably should be a bit slower due
> to additional overhead).
>
> However what I'm seeing is that second option (replicate=2) is about THREE
> times slower than single node.
>
>
> Regarding replicate_on_write -- it is, in fact, a dangerous option. As JIRA
> discusses, if you make changes to your ring (moving tokens and such) you
> will *silently* lose data. That is on top of whatever data you might end up
> losing if you run replicate_on_write=false and the only node that got the
> data fails.
>
> But what is much worse -- with replicate_on_write being false the data will
> NOT be replicated (in my tests) ever unless you explicitly request the cell.
> Then it will return the wrong result. And only on subsequent reads it will
> return adequate results. I haven't tested it, but documentation states that
> range query will NOT do 'read repair' and thus will not force replication.
> The test I did went like this:
> - replicate_on_write = false
> - write something to node A (which should in theory replicate to node B)
> - wait for a long time (longest was on the order of 5 hours)
> - read from node B (and here I was getting null / wrong result)
> - read from node B again (here you get what you'd expect after read repair)
>
> In essence, using replicate_on_write=false with rarely read data will
> practically defeat the purpose of having replication in the first place
> (failover, data redundancy).
>
>
> Or, in other words, this option doesn't look to be applicable to my
> situation.
>
> It looks like I will get much better performance by simply writing to two
> separate clusters rather than using single cluster with replicate=2. Which
> is kind of stupid :) I think something's fishy with counters and
> replication.
Re: counters + replication = awful performance? Edward Capriolo 11/27/12 4:21 PM

By the way, the other issues you are seeing with replicate_on_write at false could be because you did not repair. You should do that when changing the RF.
Re: counters + replication = awful performance? Edward Capriolo 11/27/12 5:26 PM
Say you are doing 100 inserts with RF=1 on two nodes. That is 50 inserts per node. If you go to RF=2, that is 100 inserts per node. If you were at 75% capacity on each node, you're now at 150%, which is not possible, so things bog down.
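
That arithmetic, as a quick sketch (numbers from the example above; the utilization figures are illustrative):

```python
def per_node_writes(client_writes, rf, nodes):
    # Every write lands on rf replicas, spread evenly over the node count.
    return client_writes * rf / nodes

# 100 client inserts on a 2-node cluster:
assert per_node_writes(100, rf=1, nodes=2) == 50.0   # RF=1: 50 inserts per node
assert per_node_writes(100, rf=2, nodes=2) == 100.0  # RF=2: 100 inserts per node
```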

To figure out what is going on, we would need to see tpstats, iostat, and top information.

I think you're looking at the performance the wrong way. Starting off at RF=1 is not the way to understand Cassandra performance.

The benefits of "scale out" don't happen until you fix your RF and increase your node count. I.e., 5 nodes at RF=3 is fast; 10 nodes at RF=3 is even better.

On Tuesday, November 27, 2012, Sergey Olefir <solf....@gmail.com> wrote:
> I already do a lot of in-memory aggregation before writing to Cassandra.
>
> The question here is what is wrong with Cassandra (or its configuration)
> that causes huge performance drop when moving from 1-replication to
> 2-replication for counters -- and more importantly how to resolve the
> problem. 2x-3x drop when moving from 1-replication to 2-replication on two
> nodes is reasonable. 6x is not. Like I said, with this kind of performance
> degradation it makes more sense to run two clusters with replication=1 in
> parallel rather than rely on Cassandra replication.
>
> And yes, Rainbird was the inspiration for what we are trying to do here :)
Re: counters + replication = awful performance? Sylvain Lebresne 11/28/12 12:24 AM
Counter replication works differently from that of "normal" writes. Namely, a counter update is written to a first replica, then a read is performed, and the result of that is replicated to the other nodes. With RF=1, since there is only one replica, no read is involved, but in a way it's a degenerate case. So there are two reasons why RF=2 is much slower than RF=1:
1) it involves a read to replicate, and that read takes time. Especially if that read hits the disk, it may even dominate the insertion time.
2) the replication to the first replica and the one to the rest of the replicas are not done in parallel but sequentially. Note that this is only true for the first replica versus the others. In other words, from RF=2 to RF=3 you should not see a significant performance degradation.
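
The two effects can be sketched with a toy latency model (timings are made-up units; only the structure matters):

```python
def counter_write_latency(rf, t_write=1.0, t_read=2.0, t_fanout=1.0):
    if rf == 1:
        return t_write  # degenerate case: no read, no fan-out
    # Leader write, then a local read, then replication to the remaining
    # rf - 1 replicas; those go out in parallel, so the fan-out costs one
    # t_fanout regardless of rf.
    return t_write + t_read + t_fanout

# RF=1 is much cheaper; RF=2 and RF=3 cost the same in this model
print(counter_write_latency(1), counter_write_latency(2), counter_write_latency(3))
```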

Note that while there is nothing you can do for 2), you can try to speed up 1) by using row cache for instance (in case you weren't).
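
If you do try the row cache on 1.1, it was a per-column-family `caching` attribute plus a global size setting in cassandra.yaml; a hedged cassandra-cli sketch (the column family name `stats` is hypothetical):

```sql
update column family stats with caching = 'rows_only';
```

With `row_cache_size_in_mb` left at 0 in cassandra.yaml (the default, if memory serves), the attribute alone does nothing, so both need setting.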

In other words, with counters, it is expected that RF=1 be potentially much faster than RF>1. That is the way counters work.

And don't get me wrong, I'm not suggesting you should use RF=1 at all. What I am saying is that the performance you see with RF=2 is the performance of counters in Cassandra.

--
Sylvain


On Wed, Nov 28, 2012 at 7:34 AM, Sergey Olefir <solf....@gmail.com> wrote:
I think there might be a misunderstanding as to the nature of the problem.

Say, I have test set T. And I have two identical servers A and B.
- I tested that server A (singly) is able to handle load of T.
- I tested that server B (singly) is able to handle load of T.
- I then join A and B in the cluster and set replication=2 -- this means
that each server in effect has to handle full test load individually
(because there are two servers and replication=2 it means that each server
effectively has to handle all the data written to the cluster). Under these
circumstances it is reasonable to assume that cluster A+B shall be able to
handle load T because each server is able to do so individually.

HOWEVER, this is not the case. In fact, A+B together are only able to handle
less than 1/3 of T DESPITE the fact that A and B individually are able to
handle T just fine.

I think there's something wrong with Cassandra replication (possibly as
simple as me misconfiguring something) -- it shouldn't be three times faster
to write to two separate nodes in parallel as compared to writing to a 2-node
Cassandra cluster with replication=2.



Re: counters + replication = awful performance? Robin Verlangen 11/28/12 12:29 AM
Not sure whether it's an option for you, but you might consider doing some in-memory aggregation of counter values and flushing only once every X updates / seconds. This will decrease both load and latency.
However this is not possible in every single use case. 

Best regards, 

Robin Verlangen
Software engineer




Re: counters + replication = awful performance? Rob Coli 11/28/12 1:21 AM
On Tue, Nov 27, 2012 at 3:21 PM, Edward Capriolo <edlin...@gmail.com> wrote:
> I mispoke really. It is not dangerous you just have to understand what it
> means. this jira discusses it.
>
> https://issues.apache.org/jira/browse/CASSANDRA-3868

Per Sylvain on the referenced ticket :

"
I don't disagree about the efficiency of the valve, but at what price?
'Bootstrapping a node will make you lose increments (you don't know
which ones, you don't know how many and this even if nothing goes
wrong)' is a pretty bad drawback. That is pretty much why that option
makes me uncomfortable: it does give you better performance, so people
may be tempted to use it. Now if it was only a matter of replicating
writes only through read-repair/repair, then ok, it's pretty dangerous
but it's rather easy to explain/understand the drawback (if you don't
lose a disk, you don't lose increments, and you'd better use CL.ALL or
have read_repair_chance to 1). But the fact that it doesn't work with
bootstrap/move makes me wonder if having the option at all is not
making a disservice to users.
"

To me anything that can be described as "will make you lose increments
(you don't know which ones, you don't know how many and this even if
nothing goes wrong)" and which therefore "doesn't work with
bootstrap/move" is correctly described as "dangerous." :D

=Rob

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
Re: counters + replication = awful performance? Edward Capriolo 11/28/12 7:15 AM
I may be wrong, but during a bootstrap hints can be silently discarded if the node they are destined for leaves the ring.

There are a large number of people using counters for 5 minute "real-time" statistics. On the back end they use ETL-based reporting to compute the true stats at an hourly or daily interval.

A user like this might benefit from DANGER counters. They are not looking for perfection, only better performance, and the counter row keys themselves roll over in 5 minutes anyway.

Options like this are also great for winning benchmarks. When some other NoSQL (that is not as fast as C*) wants to win a benchmark, they turn off/on the WAL, or write acks, or something that compromises their ACID/CAP story for the purpose of winning. We need our own secret awesome-sauce dangerous options too! jk
Re: counters + replication = awful performance? Edward Capriolo 11/28/12 11:58 AM
Just for reference, HBase's counters also do a local read. I am not saying they work better/worse/faster/slower, but I would not expect any system that reads on increment to be significantly faster than what Cassandra does.

Just saying that your counter throughput is read-bound; this is not unique to C*'s implementation.



On Wed, Nov 28, 2012 at 2:41 PM, Sergey Olefir <solf....@gmail.com> wrote:
Well, that's sad news then. I don't think I can consider 20k increments
per second for a two-node cluster (with RF=2) reasonable performance (cost
vs. benefit).

I might have to look into other storage solutions or perhaps experiment with
duplicate clusters with RF=1 or replicate_on_write=false.

Although yes, I probably should try that row cache you mentioned -- I saw
that key cache was going unused (so saw no reason to try to enable row
cache), but I think it was on RF=1, it might be different on RF=2.



Re: counters + replication = awful performance? Rob Coli 11/28/12 11:34 PM
On Wed, Nov 28, 2012 at 7:15 AM, Edward Capriolo <edlin...@gmail.com> wrote:
> I may be wrong but during a bootstrap hints can be silently discarded, if
> the node they are destined for leaves the ring.

Yeah : https://issues.apache.org/jira/browse/CASSANDRA-2434

> A user like this might benefit from DANGER counters. They are not looking
> for perfection, only better performance, and the counter row keys themselves
> roll over in 5 minutes anyway.

Yep, I agree that if you don't care about accurate counting, Cassandra
counters may be for you. Cassandra counters in mongo mode are even
more web scale! The unfortunate thing is that people seem to assume
that software does what it is supposed to do, and probably do not get
a great impression of said software when it doesn't. :D