Exposing Cassandra-as-a-service: How to allow an end user to connect to the DB instance using the Java driver?

Rohit Sardesai

Feb 17, 2016, 5:11:22 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hello,

I am trying to expose Cassandra as a service to an end user. The end user would typically request an <n>-node Cassandra cluster. Once the cluster is provisioned, the user should be able to bind this Cassandra service instance to his app. In the Java driver, we need to provide the IPs of the Cassandra seed nodes. Instead of providing the IPs, could I provide a single DNS name that maps to the public IP of a node in the Cassandra cluster? What are the best practices for exposing a Cassandra cluster in a public cloud? Should I put an HAProxy in front of the cluster to do this? I know this would be a bad option, since it defeats the purpose for which the client drivers were built and would introduce a bottleneck. Thoughts?

Olivier Michallat

Feb 17, 2016, 12:02:38 PM
to java-dri...@lists.datastax.com
Hi,

Yes, having a proxy in front of the cluster would defeat the driver's built-in load balancing. The driver uniquely identifies each node by the address it uses to connect to it, so having all nodes behind the same public address would not work as expected.

One thing that can simplify configuration is if you expose the list of seeds as a single DNS entry with multiple A-records. Then the user can pass that single DNS name to addContactPoint(String) (or addContactPoints(String) on the 2.1 branch), and have all seeds added without having to list individual addresses.

--

Olivier Michallat

Driver & tools engineer, DataStax
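
The single-DNS-name suggestion above can be illustrated with plain JDK calls. This is only a minimal sketch of what "one name, several A-records" expands to, not the driver's own code; the class name `SeedResolver` and the host name `cassandra.example.com` in the usage note are made up for illustration, and 9042 is assumed as the native transport port.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical, not part of the driver): one DNS name in, one contact point per address out. */
final class SeedResolver {
    /** Default CQL native transport port; adjust if the cluster uses another one. */
    static final int NATIVE_PORT = 9042;

    static List<InetSocketAddress> resolveContactPoints(String dnsName, int port) {
        List<InetSocketAddress> contactPoints = new ArrayList<>();
        try {
            // getAllByName returns every address the resolver knows for the name,
            // so a DNS entry with three A-records yields three contact points.
            for (InetAddress address : InetAddress.getAllByName(dnsName)) {
                contactPoints.add(new InetSocketAddress(address, port));
            }
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + dnsName, e);
        }
        return contactPoints;
    }
}
```

Calling `SeedResolver.resolveContactPoints("cassandra.example.com", SeedResolver.NATIVE_PORT)` would return one entry per A-record. According to the reply above, `addContactPoint(String)` already does this expansion itself, so the helper is only useful for seeing which addresses the driver would end up with.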



Vishy Kasaravalli

Feb 17, 2016, 12:36:21 PM
to java-dri...@lists.datastax.com
Here is one approach that could work.

Host a simple discovery service that exposes a REST API, List<String> getContactPoints(String clusterName), which returns the list of Cassandra host names for a given cluster name.

Extend Cluster.Builder and add a withDiscoveryService(discoveryURL) method. withDiscoveryService invokes getContactPoints(clusterName) to get all the Cassandra nodes and calls addContactPoint() on each of them.

I raised the JIRA ticket below, which should let you do this some day without extending Cluster.Builder.

https://datastax-oss.atlassian.net/browse/JAVA-1082 : Add support for a discovery service in Cluster.Builder
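
The approach above could look roughly like the following sketch. Everything here is hypothetical: `DiscoveryService`, `StaticDiscoveryService` and `ContactPointCollector` are invented names, the REST call is replaced by an in-memory map so the example runs without a server, and the collector only stands in for a `Cluster.Builder` subclass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side view of the discovery service described above. */
interface DiscoveryService {
    List<String> getContactPoints(String clusterName);
}

/** In-memory stand-in for the REST endpoint (assumption: a real one would issue an HTTP GET). */
final class StaticDiscoveryService implements DiscoveryService {
    private final Map<String, List<String>> hostsByCluster = new HashMap<>();

    void register(String clusterName, List<String> hosts) {
        hostsByCluster.put(clusterName, new ArrayList<>(hosts));
    }

    @Override
    public List<String> getContactPoints(String clusterName) {
        List<String> hosts = hostsByCluster.get(clusterName);
        if (hosts == null) {
            throw new IllegalArgumentException("Unknown cluster: " + clusterName);
        }
        return Collections.unmodifiableList(hosts);
    }
}

/** Stand-in for a Cluster.Builder subclass with a withDiscoveryService(...) method. */
final class ContactPointCollector {
    private final List<String> contactPoints = new ArrayList<>();

    ContactPointCollector withDiscoveryService(DiscoveryService service, String clusterName) {
        for (String host : service.getContactPoints(clusterName)) {
            // With the real driver, this is where addContactPoint(host) would be called.
            contactPoints.add(host);
        }
        return this;
    }

    List<String> contactPoints() {
        return Collections.unmodifiableList(contactPoints);
    }
}
```

Keeping the lookup behind an interface means the service decides which hosts a given client sees, and the client code does not change when that decision does.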


Rohit Sardesai

Feb 18, 2016, 9:59:32 AM
to java-dri...@lists.datastax.com
Hello,

Thanks for your input. In a public cloud, I typically won't be exposing all the nodes in the cluster (since they would all have private IP addresses). Even if I return these addresses to the client, it won't be able to connect. How do I address this issue?

Olivier Michallat

Feb 18, 2016, 10:18:30 AM
to java-dri...@lists.datastax.com
You would typically handle that with the load balancing policy: either a WhiteListPolicy (or the more generic HostFilterPolicy) that only includes the "public" nodes or, if the private nodes are in a different datacenter, the DCAwareRoundRobinPolicy with localDC = the public DC and usedHostsPerRemoteDc = 0.

The downside is that this is client-side: you have to rely on your users configuring the policy correctly.


--

Olivier Michallat

Driver & tools engineer, DataStax
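
The effect of the whitelist option above can be shown with a simplified model. This is not the driver's WhiteListPolicy, only a sketch of the filtering idea; `PublicNodeFilter` is an invented name and the addresses in the usage note are placeholders.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of a whitelist filter (hypothetical; the driver's WhiteListPolicy is the real mechanism). */
final class PublicNodeFilter {
    private final Set<InetSocketAddress> whiteList;

    PublicNodeFilter(Collection<InetSocketAddress> publicNodes) {
        this.whiteList = new HashSet<>(publicNodes);
    }

    /** Keeps only the discovered hosts that are on the whitelist, preserving their order. */
    List<InetSocketAddress> usableHosts(List<InetSocketAddress> discoveredHosts) {
        List<InetSocketAddress> usable = new ArrayList<>();
        for (InetSocketAddress host : discoveredHosts) {
            if (whiteList.contains(host)) {
                usable.add(host);
            }
        }
        return usable;
    }
}
```

For example, a filter built with only `203.0.113.10:9042` drops a discovered `10.0.0.6:9042` and keeps the public node. With the driver itself, the same collection of addresses would be handed to a WhiteListPolicy wrapping a child policy, and every client application has to be configured that way, which is the downside noted above.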


Vishy Kasaravalli

Feb 18, 2016, 12:26:08 PM
to java-dri...@lists.datastax.com
It is up to your discovery service to decide exactly what to expose to clients. It can vend only those IPs that the client is able to connect to.

Jack Krupansky

Feb 18, 2016, 12:34:15 PM
to java-dri...@lists.datastax.com
To be clear, there are two distinct but related concepts here: 1) contact points, 2) connection IP addresses.

The theory is that you provide only a small number of IP addresses or, even better, a DNS name to the Cassandra Java driver. The driver then queries the Cassandra cluster through one of those contact points to get the full list of cluster node connection IP addresses (based on the state of gossip among the nodes), and the driver's load balancing policy round-robins among those discovered addresses. An external load balancer is not needed.


-- Jack Krupansky
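
The round-robin step described above can be modelled in a few lines. This is a toy selector under the assumption that the list of discovered node addresses is already known; it is not the driver's policy code, and `RoundRobinSelector` is an invented name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy round-robin selector over discovered nodes (hypothetical, for illustration only). */
final class RoundRobinSelector {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<String> discoveredNodes) {
        if (discoveredNodes.isEmpty()) {
            throw new IllegalArgumentException("At least one node is required");
        }
        this.nodes = new ArrayList<>(discoveredNodes);
    }

    /** Returns the next node in rotation; safe to call from multiple threads. */
    String nextNode() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

Because the client itself rotates across every node it has discovered, each request goes straight to a node, which is why an external load balancer adds nothing here.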

Rohit Sardesai

Feb 19, 2016, 6:12:50 AM
to java-dri...@lists.datastax.com
OK, so say I have a 3-node Cassandra cluster running in an OpenStack environment.

Node-1   private IP: 10.10.0.5 / public IP: 54.204.15.20
Node-2   private IP: 10.10.0.6 / public IP: <none>
Node-3   private IP: 10.10.0.7 / public IP: <none>

So if my client driver (sitting on my desktop) needs to talk to the cluster, I provide the public IP of Node-1 (or a DNS name mapping to this public IP) to the driver as a contact point. The client driver will then query the C* cluster on Node-1 and get the full list of cluster node connection IP addresses (which are private addresses). The client can't connect to those private IPs, so it can't load-balance across them. Effectively, it would only be able to talk to the nodes with public IPs. This would again be a bottleneck, right? I cannot have public IPs for all of the nodes in the cluster. Thoughts?
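
The situation in this example can be checked mechanically. The sketch below assumes the client ends up knowing the contact point plus the private addresses of the other two nodes, and it uses "is a private-range address" as a stand-in for "unreachable from the desktop"; the class name `Reachability` is invented.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: treats private-range addresses as unreachable from outside the cloud. */
final class Reachability {
    /** True for site-local (private) addresses such as 10.x.x.x. */
    static boolean isPrivate(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the text; no DNS lookup happens.
            return InetAddress.getByName(ipLiteral).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    /** Returns the known hosts that an outside client could plausibly connect to. */
    static List<String> reachableFromOutside(List<String> knownHosts) {
        List<String> reachable = new ArrayList<>();
        for (String host : knownHosts) {
            if (!isPrivate(host)) {
                reachable.add(host);
            }
        }
        return reachable;
    }
}
```

Given 54.204.15.20, 10.10.0.6 and 10.10.0.7, only 54.204.15.20 remains, so every request from the desktop would go through Node-1.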


Jack Krupansky

Feb 19, 2016, 10:06:47 AM
to java-dri...@lists.datastax.com
You need to have public IPs for each node. Period. If you don't, then you really can't get away with saying that you are offering a fully distributed database, which requires that you not have a SPOF (single point of failure), and a single IP is exactly that. You have no HA: if the one public node goes down, the cluster is completely inaccessible. Besides, three nodes is really just a toy or development cluster. What cluster sizes are you envisioning in production? Certainly in production you wouldn't have just a single public IP.

-- Jack Krupansky

Rohit Sardesai

Feb 19, 2016, 12:27:38 PM
to java-dri...@lists.datastax.com

We intend to provide the db as a service in a public cloud, so in production we would be targeting typically 1000+ nodes deployed across multiple regions. The three node cluster was just an example.

Lahiru

May 10, 2016, 2:09:48 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Rohit,

I have a very similar issue to address. I am thinking of exposing an HTTP API and serving the data through it, but a better solution would be great, because with an API I have to change it every time I want to query the data in a new way.

Lahiru