|Re: counters + replication = awful performance?||Juan Valencia||11/27/12 9:39 AM|
I know I've had similar issues with counters which were bottle-necked by network throughput. You might be seeing a problem with throughput between the clients and Cass or between the two Cass nodes. It might not be your case, but that was what happened to me :-)
On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf....@gmail.com> wrote:
|Re: counters + replication = awful performance?||Michael Kjellman||11/27/12 10:08 AM|
Are you writing with QUORUM consistency or ONE?
On 11/27/12 9:52 AM, "Sergey Olefir" <solf....@gmail.com> wrote:
>thanks for your input!
>In my case, however, I doubt this is the case -- clients are able to push
>many more updates than I need to saturate the replication_factor=2 case (e.g.
>I'm doing as many as 6x more increments when testing a 2-node cluster with
>replication_factor=1), so bandwidth between clients and server should be
>sufficient.
>Bandwidth between nodes in the cluster should also be quite sufficient,
>since they are both in the same DC. But it is something to check, thanks!
>Juan Valencia wrote
>> Hi Sergey,>> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <
|Re: counters + replication = awful performance?||Edward Capriolo||11/27/12 1:44 PM|
The difference between replication factor = 1 and replication factor > 1 is significant. Also, it sounds like your cluster has 2 nodes, so going from RF=1 to RF=2 means double the load on both nodes.
You may want to experiment with the very dangerous column family attribute:
- replicate_on_write: Replicate every counter update from the leader to the
follower replicas. Accepts the values true and false.
|Re: counters + replication = awful performance?||Scott McKay||11/27/12 3:13 PM|
We're having a similar performance problem. Setting 'replicate_on_write: false' fixes the performance issue in our tests.
How dangerous is it? What exactly could go wrong?
|Re: counters + replication = awful performance?||Edward Capriolo||11/27/12 3:21 PM|
I misspoke, really. It is not dangerous; you just have to understand what it means. This JIRA discusses it.
|Re: counters + replication = awful performance?||Edward Capriolo||11/27/12 4:14 PM|
Cassandra's counters read on increment. Additionally, they are distributed, so there can be multiple reads per increment. If they are not fast enough and you have exhausted all tuning options, add more servers to handle the load.
In many cases incrementing the same counter n times can be avoided.
Twitter's Rainbird did just that. It avoided multiple counter increments by batching them.
I have done a similar thing using Cassandra and Kafka.
On Tuesday, November 27, 2012, Sergey Olefir <solf....@gmail.com> wrote:
> Hi, thanks for your suggestions.
> Regarding replicate=2 vs replicate=1 performance: I expected the
> configurations below to have similar performance:
> - single node, replicate = 1
> - two nodes, replicate = 2 (okay, this probably should be a bit slower due
> to additional overhead).
> However, what I'm seeing is that the second option (replicate=2) is about THREE
> times slower than single node.
> Regarding replicate_on_write -- it is, in fact, a dangerous option. As JIRA
> discusses, if you make changes to your ring (moving tokens and such) you
> will *silently* lose data. That is on top of whatever data you might end up
> losing if you run replicate_on_write=false and the only node that got the
> data fails.
> But what is much worse -- with replicate_on_write being false the data will
> NOT be replicated (in my tests) ever unless you explicitly request the cell.
> Then it will return the wrong result. And only on subsequent reads it will
> return adequate results. I haven't tested it, but documentation states that
> range query will NOT do 'read repair' and thus will not force replication.
> The test I did went like this:
> - replicate_on_write = false
> - write something to node A (which should in theory replicate to node B)
> - wait for a long time (longest was on the order of 5 hours)
> - read from node B (and here I was getting null / wrong result)
> - read from node B again (here you get what you'd expect after read repair)
> In essence, using replicate_on_write=false with rarely read data will
> practically defeat the purpose of having replication in the first place
> (failover, data redundancy).
> Or, in other words, this option doesn't look to be applicable to my
> It looks like I will get much better performance by simply writing to two
> separate clusters rather than using single cluster with replicate=2. Which
> is kind of stupid :) I think something's fishy with counters and
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7584011.html
|Re: counters + replication = awful performance?||Edward Capriolo||11/27/12 4:21 PM|
By the way, the other issues you are seeing with replicate_on_write set to false could be because you did not run repair. You should do that when changing RF.
|Re: counters + replication = awful performance?||Edward Capriolo||11/27/12 5:26 PM|
Say you are doing 100 inserts at RF 1 on two nodes. That is 50 inserts per node. If you go to RF 2, that is 100 inserts per node. If you were at 75% capacity on each node, you're now at 150%, which is not possible, so things bog down.
To figure out what is going on we would need to see tpstats, iostat, and top information.
I think you're looking at the performance the wrong way. Starting off at RF 1 is not the way to understand Cassandra performance.
You do not get the benefits of "scale out" until you fix your RF and increase your node count, i.e. 5 nodes at RF 3 is fast, 10 nodes at RF 3 even better.
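The arithmetic in this message can be sketched in a few lines of Python. This is only a back-of-the-envelope model, assuming writes are spread evenly across nodes and that load scales linearly with the number of replica writes; the function names are made up for illustration and are not part of any Cassandra tooling.

```python
def writes_per_node(client_writes: float, rf: int, nodes: int) -> float:
    """Replica writes each node absorbs, assuming an evenly balanced ring."""
    if rf < 1 or rf > nodes:
        raise ValueError("rf must be between 1 and the number of nodes")
    # Every client write is stored rf times, spread over all nodes.
    return client_writes * rf / nodes


def utilization_after_rf_change(util_before: float, rf_before: int, rf_after: int) -> float:
    """Per-node utilization after changing RF on a fixed-size cluster (linear model)."""
    return util_before * rf_after / rf_before


# The two-node example from the message above.
print(writes_per_node(100, rf=1, nodes=2))        # 50.0
print(writes_per_node(100, rf=2, nodes=2))        # 100.0
print(utilization_after_rf_change(0.75, 1, 2))    # 1.5, i.e. 150%

# Fixed RF, growing node count: per-node load drops.
print(writes_per_node(100, rf=3, nodes=5))        # 60.0
print(writes_per_node(100, rf=3, nodes=10))       # 30.0
```

Real clusters will not follow this exactly (counter writes also involve a read, and compaction and caching are ignored here), but it shows why RF=1 versus RF=2 on two nodes is not a like-for-like comparison.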
> I already do a lot of in-memory aggregation before writing to Cassandra.
> The question here is what is wrong with Cassandra (or its configuration)
> that causes huge performance drop when moving from 1-replication to
> 2-replication for counters -- and more importantly how to resolve the
> problem. 2x-3x drop when moving from 1-replication to 2-replication on two
> nodes is reasonable. 6x is not. Like I said, with this kind of performance
> degradation it makes more sense to run two clusters with replication=1 in
> parallel rather than rely on Cassandra replication.
> And yes, Rainbird was the inspiration for what we are trying to do here :)
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7584014.html
|Re: counters + replication = awful performance?||Sylvain Lebresne||11/28/12 12:24 AM|
Counter replication works differently from that of "normal" writes. Namely, a counter update is written to a first replica, then a read is performed and the result of that read is replicated to the other nodes. With RF=1, since there is only one replica, no read is involved, but in a way it's a degenerate case. So there are two reasons why RF>1 is much slower than RF=1:
1) it involves a read to replicate, and that read takes time. Especially if that read hits the disk, it may even dominate the insertion time.
2) the replication to the first replica and the one to the rest of the replicas are not done in parallel but sequentially. Note that this is only true for the first replica versus the others. In other words, from RF=2 to RF=3 you should not see a significant performance degradation.
Note that while there is nothing you can do for 2), you can try to speed up 1) by using row cache for instance (in case you weren't).
In other words, with counters, it is expected that RF=1 be potentially much faster than RF>1. That is the way counters work.
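As a rough illustration of the two points above, here is a toy cost model in Python. It is a sketch, not a description of Cassandra internals beyond what this message states: the function name and all timings (in microseconds) are hypothetical, chosen only to show the shape of the effect.

```python
def counter_increment_cost_us(rf: int, write_us: int, read_us: int, replicate_us: int) -> int:
    """Toy cost of one counter increment, following the steps described above.

    All timings are hypothetical inputs, not measurements.
    """
    if rf < 1:
        raise ValueError("rf must be at least 1")
    if rf == 1:
        # Degenerate case: one replica, no read, nothing to replicate.
        return write_us
    # RF > 1: write to the first replica, read the result, then replicate it.
    # The other replicas are updated in parallel with each other, so the
    # replication step is paid once regardless of how many there are.
    return write_us + read_us + replicate_us


WRITE, REPLICATE = 200, 500          # hypothetical microseconds
CACHED_READ, DISK_READ = 100, 5000   # hypothetical microseconds

print(counter_increment_cost_us(1, WRITE, CACHED_READ, REPLICATE))  # 200
print(counter_increment_cost_us(2, WRITE, CACHED_READ, REPLICATE))  # 800
print(counter_increment_cost_us(2, WRITE, DISK_READ, REPLICATE))    # 5700
print(counter_increment_cost_us(3, WRITE, DISK_READ, REPLICATE))    # 5700
```

Under this model the jump happens between RF=1 and RF=2, a read that hits disk dominates everything else, and adding a third replica costs nothing extra per increment.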
And don't get me wrong, I'm not suggesting you should use RF=1 at all. What I am saying is that the performance you see with RF=2 is the performance of counters in Cassandra.
On Wed, Nov 28, 2012 at 7:34 AM, Sergey Olefir <solf....@gmail.com> wrote:
I think there might be a misunderstanding as to the nature of the problem.
|Re: counters + replication = awful performance?||Robin Verlangen||11/28/12 12:29 AM|
Not sure whether it's an option for you, but you might consider doing some in-memory aggregation of counter values and flushing only once every X updates / seconds. This will decrease both load and latency, and should increase throughput.
However this is not possible in every single use case.
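A minimal sketch of that kind of in-memory aggregation, in Python. Everything here is an assumption for illustration: `CounterBuffer` is a made-up name, and the `flush` callback stands in for whatever code actually sends the summed deltas to Cassandra. It is not thread-safe, the age limit is only checked when an increment arrives, and anything still buffered is lost if the process dies.

```python
import time
from collections import Counter
from typing import Callable, Dict


class CounterBuffer:
    """Sums increments per key and flushes every max_updates or max_age_s."""

    def __init__(self, flush: Callable[[Dict[str, int]], None],
                 max_updates: int = 1000, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush            # hypothetical sink, e.g. one write per key
        self._max_updates = max_updates
        self._max_age_s = max_age_s
        self._clock = clock
        self._pending: Counter = Counter()
        self._updates = 0
        self._last_flush = clock()

    def increment(self, key: str, delta: int = 1) -> None:
        self._pending[key] += delta
        self._updates += 1
        too_many = self._updates >= self._max_updates
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            # One summed delta per key instead of one write per increment.
            self._flush(dict(self._pending))
        self._pending.clear()
        self._updates = 0
        self._last_flush = self._clock()


sent = []
buf = CounterBuffer(sent.append, max_updates=3, max_age_s=60.0)
for key in ["a", "b", "a"]:
    buf.increment(key)
print(sent)  # [{'a': 2, 'b': 1}]
```

Here three increments turn into two counter writes; with hot keys the reduction is much larger, at the price of delayed visibility and possible loss of the unflushed window.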
|Re: counters + replication = awful performance?||Rob Coli||11/28/12 1:21 AM|
On Tue, Nov 27, 2012 at 3:21 PM, Edward Capriolo <edlin...@gmail.com> wrote:
Per Sylvain on the referenced ticket:
I don't disagree about the efficiency of the valve, but at what price?
'Bootstrapping a node will make you lose increments (you don't know
which ones, you don't know how many and this even if nothing goes
wrong)' is a pretty bad drawback. That is pretty much why that option
makes me uncomfortable: it does give you better performance, so people
may be tempted to use it. Now if it was only a matter of replicating
writes only through read-repair/repair, then ok, it's pretty dangerous
but it's rather easy to explain/understand the drawback (if you don't
lose a disk, you don't lose increments, and you'd better use CL.ALL or
have read_repair_chance to 1). But the fact that it doesn't work with
bootstrap/move makes me wonder if having the option at all is not
making a disservice to users.
To me anything that can be described as "will make you lose increments
(you don't know which ones, you don't know how many and this even if
nothing goes wrong)" and which therefore "doesn't work with
bootstrap/move" is correctly described as "dangerous." :D
|Re: counters + replication = awful performance?||Edward Capriolo||11/28/12 7:15 AM|
I may be wrong but during a bootstrap hints can be silently discarded, if the node they are destined for leaves the ring.
There are a large number of people using counters for 5 minute "real-time" statistics. On the back end they use ETL-based reporting to compute the true stats at an hourly or daily interval.
A user like this might benefit from DANGER counters. They are not looking for perfection, only better performance, and the counter row keys themselves roll over in 5 minutes anyway.
Options like this are also great for winning benchmarks. When some other NoSQL (that is not as fast as C*) wants to win a benchmark, they turn off/on WAL, or write acks, or something that compromises their ACID/CAP story for the purpose of winning. We need our own secret awesome-sauce dangerous options too! jk
|Re: counters + replication = awful performance?||Edward Capriolo||11/28/12 11:58 AM|
Just for reference, HBase's counters also do a local read. I am not saying they work better/worse/faster/slower, but I would not expect any system that reads on increment to be significantly faster than what Cassandra does.
Just saying: your counter throughput is read bound, and this is not unique to C*'s implementation.
On Wed, Nov 28, 2012 at 2:41 PM, Sergey Olefir <solf....@gmail.com> wrote:
Well, those are sad news then. I don't think I can consider 20k increments
|Re: counters + replication = awful performance?||Rob Coli||11/28/12 11:34 PM|
On Wed, Nov 28, 2012 at 7:15 AM, Edward Capriolo <edlin...@gmail.com> wrote:
Yeah: https://issues.apache.org/jira/browse/CASSANDRA-2434
Yep, I agree that if you don't care about accurate counting, Cassandra
counters may be for you. Cassandra counters in mongo mode are even
more web scale! The unfortunate thing is that people seem to assume
that software does what it is supposed to do, and probably do not get
a great impression of said software when it doesn't. :D