Questions on Ec2Snitch and replication strategies

Johann Tagle

<johanntagle@gmail.com>
Jul 3, 2017, 4:38:06 AM
to ScyllaDB users
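Hi. Right now we have a 3-node cluster on AWS in a single region, with one node in each availability zone. We use Ec2Snitch and have some questions:

1. These lines from https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configSnitchEC2.html confuse me:

It says: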

"Use the Ec2Snitch for simple cluster deployments on Amazon EC2 where all nodes in the cluster are within a single region.

In EC2 deployments, the region name is treated as the datacenter name and availability zones are treated as racks within a datacenter."


Then a few lines down, it says:


"If you need multiple datacenters, set the dc_suffix options in the cassandra-rackdc.properties file. Any other lines are ignored."


If all nodes are within a single region, and if a region name is treated as the datacenter name, how can you have multiple datacenters?



2. Our intention is to have a copy of the data in each availability zone, so right now we define keyspaces with "WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'us-west-2' : 3 };". Is this the best way to do it? I ask because each node still holds a 100% copy of the data (as reported by nodetool status) if we use SimpleStrategy with replication_factor set to 3. I am wondering what will happen if we add nodes to each availability zone without changing the replication factor: will both replication strategies result in data being spread across the nodes within an AZ?


When using SimpleStrategy with Ec2Snitch, which one takes effect: SimpleStrategy's "Additional replicas are placed on the next nodes clockwise in the ring without considering topology", or Ec2Snitch's "knowledge" of the racks, where "Scylla will do its best not to have more than one replica on the same "rack""?


Thanks!


Johann

Johann Vincent Paul Tagle

<johanntagle@gmail.com>
Jul 4, 2017, 12:30:18 PM
to ScyllaDB users
anyone?
--
You received this message because you are subscribed to a topic in the Google Groups "ScyllaDB users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scylladb-users/CrPOmQaER14/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scylladb-user...@googlegroups.com.
To post to this group, send email to scyllad...@googlegroups.com.
Visit this group at https://groups.google.com/group/scylladb-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/9770ae42-5df3-40a6-be78-d8f99b92b3a9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Glauber Costa

<glauber@scylladb.com>
Jul 4, 2017, 4:17:01 PM
to ScyllaDB users
On Mon, Jul 3, 2017 at 4:38 AM, Johann Tagle <johan...@gmail.com> wrote:
> Hi. Right now we have a 3-node cluster on AWS, single region, one on each
> availability zone. We used Ec2Snitch. Have some questions:
>
> 1. These lines from
> https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configSnitchEC2.html
> confuses me:
>
> It says:
>
> "Use the Ec2Snitch for simple cluster deployments on Amazon EC2 where all
> nodes in the cluster are within a single region.
>
> In EC2 deployments, the region name is treated as the datacenter name and
> availability zones are treated as racks within a datacenter."
>
>
> Then a few lines down, it says:
>
>
> "If you need multiple datacenters, set the dc_suffix options in the
> cassandra-rackdc.properties file. Any other lines are ignored."
>
>
> If all nodes are within a single region, and if a region name is treated as
> the datacenter name, how can you have multiple datacenters?

If you specify that, then the region name is no longer treated as the
(sole) datacenter name.
It will now be region name + suffix.
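
To make the naming rule above concrete, here is a minimal Python sketch. It is only a model of the rule as described in this thread (datacenter name = region name + dc_suffix), not the snitch's actual code; the function names and the "_a" / "_b" suffixes are made up for illustration.

```python
# Simplified model of the naming rule described above; not the snitch's code.
# read_dc_suffix and datacenter_name are hypothetical helper names.

def read_dc_suffix(properties_text: str) -> str:
    """Return the dc_suffix value from properties text, ignoring other lines."""
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dc_suffix":
            return value.strip()
    return ""  # no suffix configured


def datacenter_name(region: str, dc_suffix: str = "") -> str:
    """Datacenter name as described above: region name plus optional suffix."""
    return region + dc_suffix


# No suffix: every node in the region reports the same datacenter.
print(datacenter_name("us-west-2"))  # us-west-2

# Different suffixes on different nodes "split" one region into two DCs.
print(datacenter_name("us-west-2", read_dc_suffix("dc_suffix=_a")))  # us-west-2_a
print(datacenter_name("us-west-2", read_dc_suffix("dc_suffix=_b")))  # us-west-2_b
```

With this model, two groups of nodes in the same region end up in two logical datacenters only because their configured suffixes differ.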

>
>
>
> 2. Our intention is to have a copy of the data on each availability zone.
> So right now we define keyspaces with "WITH REPLICATION = {'class' :
> 'NetworkTopologyStrategy', 'us-west-2' : 3 };" The question is, is this the
> best way to do it? Because I notice I still achieve getting 100% copy of
> data on each node (reported by nodetool status) if we use SimpleStrategy
> with replication_factor set to 3. Wondering what will happen if we add
> nodes to each availability zone without changing replication factors - will
> both replication strategies result in data being spread across nodes within
> an AZ?

Different AZs within the same region appear as different racks.

Also quoting DS documentation, you have:

"NetworkTopologyStrategy attempts to place replicas on distinct racks
because nodes in the same rack (or similar physical grouping) often
fail at the same time due to power, cooling, or network issues."

(http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html)

Therefore, if you add more nodes to the AZs, you will still have a copy in each
AZ (assuming RF=3 and num_AZs=3).
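
The rack-aware behaviour quoted above can be illustrated with a small Python sketch. This is a simplified model written for this thread, not Scylla's or Cassandra's implementation: the ring layout, token values, and node names are invented, and each AZ name is used directly as the rack label.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node, rack). Two nodes per AZ; tokens are made up.
RING = [
    (0, "n1", "us-west-2a"), (10, "n2", "us-west-2a"),
    (20, "n3", "us-west-2b"), (30, "n4", "us-west-2b"),
    (40, "n5", "us-west-2c"), (50, "n6", "us-west-2c"),
]


def rack_aware_replicas(ring, key_token, rf):
    """Walk the ring clockwise, preferring nodes on racks not used yet."""
    ring = sorted(ring)
    tokens = [token for token, _, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, key_token) % len(ring)
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        if len(replicas) == rf:
            break
        _, node, rack = ring[(start + i) % len(ring)]
        if rack in seen_racks:
            skipped.append(node)  # same rack as an earlier replica
        else:
            replicas.append(node)
            seen_racks.add(rack)
    # Fewer racks than rf: fall back to the nodes skipped earlier.
    replicas.extend(skipped[: rf - len(replicas)])
    return replicas


print(rack_aware_replicas(RING, 5, 3))  # ['n2', 'n3', 'n5']
```

In this model, with RF=3 and three AZs, every key gets exactly one replica per AZ no matter how many nodes each AZ holds, and each node stores only part of the data once an AZ has more than one node.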

>
>
> When using SimpleStrategy with Ec2Snitch, which one will take effect -
> SimpleStrategy's "Additional replicas are placed on the next nodes clockwise
> in the ring without considering topology" or Ec2Snitch's "knowledge" of the
> racks and that "Scylla will do its best not to have more than one replica on
> the same "rack""

We do not recommend using SimpleStrategy in this case at all.
But if you do, then the rack information will just be ignored.
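
For contrast, here is a minimal Python sketch of placement that ignores rack information, as described above. It is a simplified model with an invented ring (tokens, node names, and rack labels are assumptions), not the actual SimpleStrategy code.

```python
from bisect import bisect_left

# Hypothetical rings: (token, node, rack). Tokens and names are made up.
SIX_NODES = [
    (0, "n1", "us-west-2a"), (10, "n2", "us-west-2a"),
    (20, "n3", "us-west-2b"), (30, "n4", "us-west-2b"),
    (40, "n5", "us-west-2c"), (50, "n6", "us-west-2c"),
]
THREE_NODES = [
    (0, "n1", "us-west-2a"), (20, "n2", "us-west-2b"), (40, "n3", "us-west-2c"),
]


def simple_replicas(ring, key_token, rf):
    """Take the next rf nodes clockwise in the ring, ignoring racks."""
    ring = sorted(ring)
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Three nodes, RF=3: every node holds every key, whatever the strategy.
print(sorted(simple_replicas(THREE_NODES, 5, 3)))  # ['n1', 'n2', 'n3']

# Six nodes, RF=3: n3 and n4 are both in us-west-2b, and us-west-2c gets no copy.
print(simple_replicas(SIX_NODES, 5, 3))  # ['n2', 'n3', 'n4']
```

This matches the observation in the question: on a 3-node cluster with RF=3 both strategies show 100% ownership per node, and the difference only appears once more nodes are added.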

>
>
> Thanks!
>
>
> Johann

Johann Vincent Paul Tagle

<johanntagle@gmail.com>
Jul 4, 2017, 9:38:11 PM
to scylladb-users@googlegroups.com
Thanks Glauber for the response.  Just one follow-up question below:

On Wed, Jul 5, 2017 at 4:17 AM Glauber Costa <gla...@scylladb.com> wrote:
> > "If you need multiple datacenters, set the dc_suffix options in the
> > cassandra-rackdc.properties file. Any other lines are ignored."
> >
> > If all nodes are within a single region, and if a region name is treated as
> > the datacenter name, how can you have multiple datacenters?
>
> If you specify that, then the region name is no longer treated as the
> (sole) datacenter name.
> It will now be region name + suffix.


Ok so multiple datacenters in this case do not mean multiple regions.  Specifying that suffix just effectively "splits" a region, right?  So if I want a cluster that has nodes in us-west-1 and us-west-2 I need Ec2MultiRegionSnitch?

Thanks again.

Johann

Glauber Costa

<glauber@scylladb.com>
Jul 4, 2017, 9:46:39 PM
to ScyllaDB users
On Tue, Jul 4, 2017 at 9:38 PM, Johann Vincent Paul Tagle
<johan...@gmail.com> wrote:
> Thanks Glauber for the response. Just one follow-up question below:
>
> On Wed, Jul 5, 2017 at 4:17 AM Glauber Costa <gla...@scylladb.com> wrote:
>>
>> > "If you need multiple datacenters, set the dc_suffix options in the
>> > cassandra-rackdc.properties file. Any other lines are ignored."
>> >
>> > If all nodes are within a single region, and if a region name is treated
>> > as
>> > the datacenter name, how can you have multiple datacenters?
>>
>> If you specify that, then the region name is no longer treated as the
>> (sole) datacenter name.
>> It will now be region name + suffix.
>>
>
> Ok so multiple datacenters in this case do not mean multiple regions.

Exactly. That is a very uncommon scenario, but say you want to test
it, or try something more esoteric: that will allow you to do so. I
don't know of anyone deploying it this way.

> Specifying that suffix just effectively "splits" a region, right?

yes.

> So if I
> want a cluster that has nodes in us-west-1 and us-west-2 I need
> Ec2MultiRegionSnitch?
>

Yes. If they are in different (real) regions altogether, then you
should use the Ec2MultiRegionSnitch and configure your yaml to use the
public, rather than the private, IPs.
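
With more than one datacenter, a NetworkTopologyStrategy keyspace takes one replica count per datacenter name. Below is a minimal Python sketch that only builds the CQL text, following the statement format used earlier in this thread. The helper name and the datacenter names "dc_one" and "dc_two" are placeholders, not names the snitch will report; check the datacenter names shown by nodetool status before using anything like this.

```python
# Hypothetical helper; it only formats a string and does not talk to a cluster.

def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with one replica count per datacenter."""
    parts = ["'class' : 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        parts.append(f"'{dc}' : {int(rf)}")
    options = ", ".join(parts)
    return f"CREATE KEYSPACE {keyspace} WITH REPLICATION = {{ {options} }};"


# "dc_one" and "dc_two" are placeholder datacenter names.
print(create_keyspace_cql("ks", {"dc_one": 3, "dc_two": 3}))
```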

> Thanks again.
>
> Johann

Johann Vincent Paul Tagle

<johanntagle@gmail.com>
Jul 4, 2017, 9:51:41 PM
to scylladb-users@googlegroups.com
Thanks!
