Accessing Cassandra Node by broadcast_rpc_address

dekarr...@gmail.com

Nov 15, 2017, 10:53:34 AM
to gocql
Hello all,

I've got a Cassandra v3.10 cluster (CQL v3.4.4, native protocol v4) where each node is hosted on a separate compute instance in an IaaS environment. The networking is set up such that each instance in the cluster has a private address (in my case, 192.168.*.* addresses) and a public address (in my case, 10.*.*.* addresses). Nodes bind their services to their private IPs. Each node can reach all other instances via their public addresses, and clients of the cluster must also access nodes via those public addresses.

I'm using the latest version of gocql at the time of this writing, commit 33a5f3c1bcc2c421b3221c5858312afb141bf605.

It seems that gocql is having trouble forming a data connection with the first node it connects to. It's able to establish a control connection and use it to determine the public addresses of peer nodes just fine, but after closing the control connection, it tries to form a data connection with the control node via its private IP address, which fails as it is not accessible.

The nodes are set up with the following config parameters:
| listen_address: <private-ip>
| broadcast_address: <public-ip>
| rpc_address: <private-ip>
| broadcast_rpc_address: <public-ip>

The node being used for the control connection has the following configuration:
| listen_address: 192.168.25.16
| broadcast_address: 10.1.110.1
| rpc_address: 192.168.25.16
| broadcast_rpc_address: 10.1.110.1

I am trying to create the connection with the following test code:
| cluster := gocql.NewCluster("10.1.110.1")
| cluster.Keyspace = "mykeyspace"
| cluster.Authenticator = gocql.PasswordAuthenticator{
|     Username: os.Args[1],
|     Password: os.Args[2],
| }
|
| cluster.Consistency = gocql.LocalOne
|
| sess, err := cluster.CreateSession()
|
| if err != nil {
|     panic(err)
| }
|
| defer sess.Close()
|
| ... // query the cluster, show results

When I execute the program, the output indicates that gocql is trying to dial the node at 10.1.110.1 by its private IP address:
| [dekarrin@meiko 09:14:22 test] $ cat ~/.casspass | xargs go run --tags="gocql_debug" main.go
| 2017/11/15 09:13:26 gocql: Session.handleNodeUp: 192.168.25.16:9042
| 2017/11/15 09:13:28 unable to dial "192.168.25.16": dial tcp 192.168.25.16:9042: i/o timeout
| 2017/11/15 09:13:28 gocql: Session.handleNodeDown: 192.168.25.16:9042
| 2017/11/15 09:13:31 unable to dial "192.168.25.16": dial tcp 192.168.25.16:9042: i/o timeout
| 2017/11/15 09:13:31 gocql: Session.handleNodeDown: 192.168.25.15:9042
| (output of queries)

I narrowed it down by using an initial host that sits in its own data center and adding a host filter so that it is the only host gocql connects to for data:
| ...
| cluster.Consistency = gocql.LocalOne
|
| cluster.HostFilter = gocql.HostFilterFunc(func(host *gocql.HostInfo) bool {
|     return host.DataCenter() == "datacenter-of-host"
| })
|
| sess, err := cluster.CreateSession()
|
| if err != nil {
|     panic(err)
| }
| ...

And then the connection could not be made at all:
| [dekarrin@meiko 09:30:17 test] $ cat ~/.casspass | xargs go run --tags="gocql_debug" main.go
| 2017/11/15 09:30:24 gocql: Session.handleNodeUp: 192.168.25.16:9042
| 2017/11/15 09:30:26 unable to dial "192.168.25.16": dial tcp 192.168.25.16:9042: i/o timeout
| 2017/11/15 09:30:26 gocql: Session.handleNodeDown: 192.168.25.16:9042
| 2017/11/15 09:30:28 unable to dial "192.168.25.16": dial tcp 192.168.25.16:9042: i/o timeout
| panic: no connections were made when creating the session

When I look at the source for HostInfo, it seems that the rpc_address of the initial host is used as the connection address if it is set:
host_source.go:
| func (h *HostInfo) connectAddressLocked() (net.IP, string) {
|     if validIpAddr(h.connectAddress) {
|         return h.connectAddress, "connect_address"
|     } else if validIpAddr(h.rpcAddress) {
|         return h.rpcAddress, "rpc_adress"
|     } else if validIpAddr(h.preferredIP) {
|         // where does perferred_ip get set?
|         return h.preferredIP, "preferred_ip"
|     } else if validIpAddr(h.broadcastAddress) {
|         return h.broadcastAddress, "broadcast_address"
|     } else if validIpAddr(h.peer) {
|         return h.peer, "peer"
|     }
|     return net.IPv4zero, "invalid"
| }

As I understand the Cassandra configuration docs, broadcast_rpc_address is the address that Cassandra should tell clients to reach it by, and my rpc_address is explicitly private. Is there some way I can have gocql use the IP address set as broadcast_rpc_address rather than the one set as rpc_address?

dekarr...@gmail.com

Nov 15, 2017, 10:56:03 AM
to gocql
On Wednesday, November 15, 2017 at 9:53:34 AM UTC-6, dekarr...@gmail.com wrote:
> When I execute the program, I get output indicating that the node at 10.1.110.1 is trying to be dialed by its private IP address:
> | [dekarrin@meiko 09:14:22 test] $ cat ~/.casspass | xargs go run --tags="gocql_debug" main.go
> | 2017/11/15 09:13:26 gocql: Session.handleNodeUp: 192.168.25.16:9042
> | 2017/11/15 09:13:28 unable to dial "192.168.25.16": dial tcp 192.168.25.16:9042: i/o timeout
> | 2017/11/15 09:13:28 gocql: Session.handleNodeDown: 192.168.25.16:9042
> | 2017/11/15 09:13:31 unable to dial "192.168.25.16": dial tcp 192.168.25.16:9042: i/o timeout
> | 2017/11/15 09:13:31 gocql: Session.handleNodeDown: 192.168.25.15:9042
> | (output of queries)
>

My mistake; the IP address in the last line of output should end with a 16, not a 15.

Chris Bannister

Nov 15, 2017, 1:56:53 PM
to dekarr...@gmail.com, gocql
As far as I am aware, the columns in the system tables unfortunately don't map directly to the config values, though your setup should be working. Can you post the output of SELECT peer, rpc_address FROM system.peers?

Thanks

dekarr...@gmail.com

Nov 16, 2017, 11:17:36 AM
to gocql
Sure. I can't help but think that there must be some way of getting Cassandra to share its broadcast_rpc_address, though I also noticed that it doesn't seem to be in the system.local table. It just seems strange that the setting would have been added without being accessible over the native protocol. I'm half-ready to dive into the Cassandra source just to see where broadcast_rpc_address is exposed, but I was hoping someone else might know before I dig into that.

Anyways, here's my output from the system.peers table:
test_user@cqlsh> SELECT peer, rpc_address FROM system.peers;

 peer       | rpc_address
------------+-------------
 10.1.110.2 |  10.1.110.2
 10.1.110.3 |  10.1.110.3
 10.2.110.1 |  10.2.110.1
 10.2.110.2 |  10.2.110.2
 10.2.110.3 |  10.2.110.3

(7 rows)