Question about removing and adding bigcouch nodes


joe

Mar 30, 2016, 7:56:06 AM
to 2600hz-dev
We have recently been getting strange errors when trying to save voicemails to bigcouch. To remove complexity from the situation and make the logs easier to follow, we reduced the number of bigcouch servers from 3 to 1.

I see a lot of warnings for heartbeat expired messages for the servers we've taken down. 

First of all, I'm surprised to see the nodes being used directly, because everything is set up to use bigcouch through the load balancer.

If kazoo is directly talking to bigcouch nodes, what is the point of load balancing bigcouch? This just seems really weird. 

Instead of benefitting from the abstraction of a load balancer, it seems to be going around it.  How do I tell it to stop doing this?


Darren Schreiber

Mar 30, 2016, 7:56:52 AM
to 2600h...@googlegroups.com
What you've stated doesn't make sense. How are you coming to this conclusion? Please post relevant logs that back this up. Sounds like something is badly misconfigured.

Also, how did you "remove" BigCouch nodes?


Jawaid Bazyar

Mar 30, 2016, 1:20:02 PM
to 2600hz-dev
Hi Joe,

The first thing you want to check is the /etc/kazoo/config.ini file on the host that is running ecallmgr and whapps.

Here are the important bits:

[bigcouch]
compact_automatically = true
cookie = yourcookiegoeshere
ip = "127.0.0.1"
port = 15984
admin_port = 15986

Make sure that the IP is localhost (127.0.0.1), and make sure the port is 15984. This should correspond to your /etc/kazoo/haproxy/haproxy.cfg setup on the same VM:

listen bigcouch-data 127.0.0.1:15984
  balance roundrobin
  server bc1.cluster.net xxxx:5984 check
  server bc2.cluster.net yyyy:5984 check backup
  server bc3.cluster.net zzzz:5984 check backup

If your config is like this, then Kazoo is definitely going through the haproxy load balancer.
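If you want to check this without eyeballing the file, here is a minimal Python sketch. It assumes config.ini parses as ordinary INI, and the function names are made up for illustration; they are not part of Kazoo. The sample text is just the snippet above.

```python
import configparser

def bigcouch_endpoint(ini_text):
    """Return (ip, port, admin_port) from the [bigcouch] section."""
    # strict=False tolerates repeated sections; interpolation=None keeps
    # any '%' characters in other values from being interpreted.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    section = parser["bigcouch"]
    ip = section.get("ip", "").strip('"')  # values are quoted in config.ini
    return ip, section.getint("port"), section.getint("admin_port")

def goes_through_local_haproxy(ini_text, expected_port=15984):
    """True when Kazoo is pointed at a local listener on the expected port."""
    ip, port, _ = bigcouch_endpoint(ini_text)
    return ip in ("127.0.0.1", "localhost") and port == expected_port

SAMPLE = """
[bigcouch]
compact_automatically = true
cookie = yourcookiegoeshere
ip = "127.0.0.1"
port = 15984
admin_port = 15986
"""

if __name__ == "__main__":
    print(bigcouch_endpoint(SAMPLE))
    print(goes_through_local_haproxy(SAMPLE))
```

On a real host you would read /etc/kazoo/config.ini and pass its contents in place of SAMPLE.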

Try looking at the loadbalancer stats:

Make sure you have the following in your haproxy.cfg:

listen haproxy-stats-pub 0.0.0.0:22003
  mode http
  stats uri /

Then browse to http://kazoo-ip:22003/
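If you would rather script the check, HAProxy can also serve the same stats as CSV when you append ";csv" to the stats URI. Below is a rough Python sketch that pulls out the per-server status for the bigcouch-data proxy. The sample CSV is heavily abbreviated (real output has many more columns), and the function names are only illustrative.

```python
import csv
import io
import urllib.request

def fetch_stats_csv(base_url):
    """Download the CSV stats, e.g. base_url='http://kazoo-ip:22003/'."""
    with urllib.request.urlopen(base_url + ";csv") as response:
        return response.read().decode("utf-8")

def backend_status(csv_text, proxy="bigcouch-data"):
    """Map server name -> status for the servers of one proxy."""
    # HAProxy prefixes the header row with "# "; strip it so the column
    # names come out clean.
    reader = csv.DictReader(io.StringIO(csv_text.strip().lstrip("# ")))
    return {
        row["svname"]: row["status"]
        for row in reader
        # FRONTEND and BACKEND are summary rows, not real servers.
        if row["pxname"] == proxy and row["svname"] not in ("FRONTEND", "BACKEND")
    }

# Abbreviated, made-up sample: one server up, two backups down.
SAMPLE = """# pxname,svname,status,weight
bigcouch-data,FRONTEND,OPEN,
bigcouch-data,bc1.cluster.net,UP,1
bigcouch-data,bc2.cluster.net,DOWN,1
bigcouch-data,bc3.cluster.net,DOWN,1
bigcouch-data,BACKEND,UP,1
"""

if __name__ == "__main__":
    print(backend_status(SAMPLE))
```

Any server that shows DOWN here is one haproxy has already stopped sending traffic to.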

joe

Mar 31, 2016, 7:15:02 PM
to 2600hz-dev, dschr...@2600hz.com
I didn't explain this very well before because it was literally right before I went home.  

My config.ini looks like the following:

[zone]
name = "east-sl"
amqp_uri = "amqp://guest:gu...@172.17.1.139:5672"

[bigcouch]
compact_automatically = true
ip = "172.17.1.232"
port = 5984
admin_port = 5986
zone = "east-sl"
cookie = VxvM7akBsq3FgBbqvvZ6Aw9T92tbD7sYztdqBqoYrFXXGbVpCFHvdMLNfegrrD2Gpg9Wv4onga2QzR96RTyajm2neurnyHXUxKuC
;username = "admin"
;password = "secret"

[whistle_apps]
zone = "east-sl"
cookie = VxvM7akBsq3FgBbqvvZ6Aw9T92tbD7sYztdqBqoYrFXXGbVpCFHvdMLNfegrrD2Gpg9Wv4onga2QzR96RTyajm2neurnyHXUxKuC

[ecallmgr]
zone = "east-sl"
cookie = VxvM7akBsq3FgBbqvvZ6Aw9T92tbD7sYztdqBqoYrFXXGbVpCFHvdMLNfegrrD2Gpg9Wv4onga2QzR96RTyajm2neurnyHXUxKuC

[log]
console = info
file = error


The load balancer for bigcouch is 172.17.1.232. This IP address is used exclusively for bigcouch's load balancer; all the bigcouch servers have their own IP addresses. Our setup has 172.17.1.0/24 reserved exclusively for load balancing services.


In the logs for the whapps host, when bringing up a new cluster, I see the following repeat on a regular basis:

20:14:39.637 [info] connected successfully to 172.17.1.232:5984
20:15:41.199 [info] getting connection information for 172-17-20-61.default.pod.cluster.local, 5984 and 5986
20:15:41.227 [info] connected successfully to 172-17-20-61.default.pod.cluster.local:5984


It looks as if the whistle_apps erlang node first connects to the load balancer, asks for the bigcouch node's own address, and from then on uses that direct address to connect. I could be wrong, but that's what it looks like it's doing according to the logs.
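For anyone who wants to reproduce this from their own logs, here is a small Python sketch that lists every host the "connected successfully" lines mention and flags the ones that are not the load balancer. The regex is based only on the three log lines above, so treat the format as an assumption.

```python
import re
from collections import Counter

# Matches lines such as:
#   20:14:39.637 [info] connected successfully to 172.17.1.232:5984
CONNECTED = re.compile(r"connected successfully to (?P<host>[^\s:]+):(?P<port>\d+)")

def connection_targets(log_text):
    """Count successful connections per (host, port)."""
    return Counter((m["host"], int(m["port"])) for m in CONNECTED.finditer(log_text))

def direct_targets(log_text, balancer_hosts):
    """Hosts that were connected to but are not known load balancer addresses."""
    seen = {host for host, _ in connection_targets(log_text)}
    return sorted(seen - set(balancer_hosts))

SAMPLE = """\
20:14:39.637 [info] connected successfully to 172.17.1.232:5984
20:15:41.199 [info] getting connection information for 172-17-20-61.default.pod.cluster.local, 5984 and 5986
20:15:41.227 [info] connected successfully to 172-17-20-61.default.pod.cluster.local:5984
"""

if __name__ == "__main__":
    print(direct_targets(SAMPLE, {"172.17.1.232"}))
```

With the sample above it reports the pod hostname as the only direct target.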

Load Balancer: 
ip: 172.17.1.232
host: bigcouch.default.svc.cluster.local

BigcouchNode01:
ip: 172.17.20.61
host: 172-17-20-61.default.pod.cluster.local

So this makes me curious. If it is true that kazoo is keeping track of each bigcouch server's direct hostname/IP address:

* Where do I find this information?  
* If I remove a node from the cluster, how do I go about telling kazoo that that server is now gone?
* If I add a bigcouch node to the cluster, how do I go about telling kazoo that there is a new node?

Currently, when I remove a node, I get a bunch of lines in the log at a regular interval complaining about a missing heartbeat. (I don't have those logs handy right now since I just recently reset the cluster, but that was the gist of it.)

Thanks for your help as always, it is truly appreciated :)

Jawaid Bazyar

Mar 31, 2016, 10:19:02 PM
to 2600hz-dev, dschr...@2600hz.com
Joe,

Based on your description, there are some discrepancies from the standard cluster setup.

First, each Kazoo node (i.e., nodes that run whapps/ecallmgr) should run its own local copy of haproxy in order to reach the databases. This way a failure of a single haproxy node will not cause loss of connectivity to the whole database.

Second, the standard configuration puts the load-balanced database port on 15984, not 5984.

Third, I believe whapps may indeed call each database node directly, but only to do periodic compaction on the databases. (Is this right, 2600hz-ers?) All routine queries go through the load balancer.

Finally, once a BigCouch cluster is set up and initialized, you cannot add or remove nodes without building a new BigCouch cluster and using CloneTools to copy the data from the old cluster to the new one.

It will, of course, continue to work if a BigCouch node fails, but you cannot *remove* a node from the cluster and expect it to work - there are apparently assumptions that get baked into the distribution of shards such that you just can't do it.
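To see which nodes the cluster itself still expects, you can look at BigCouch's own membership record. The sketch below is a rough Python illustration, assuming the `_membership` endpoint on the data port returns `all_nodes` (nodes currently connected) and `cluster_nodes` (nodes recorded in the cluster's nodes database). The node names in the sample are made up.

```python
import json

def unreachable_nodes(membership):
    """Nodes the cluster expects (cluster_nodes) that are not connected (all_nodes)."""
    expected = set(membership.get("cluster_nodes", []))
    connected = set(membership.get("all_nodes", []))
    return sorted(expected - connected)

# Hypothetical response body, shaped like the output of
# GET http://<bigcouch-node>:5984/_membership
SAMPLE = json.loads("""
{
  "all_nodes": ["bigcouch@bc1.cluster.net"],
  "cluster_nodes": [
    "bigcouch@bc1.cluster.net",
    "bigcouch@bc2.cluster.net",
    "bigcouch@bc3.cluster.net"
  ]
}
""")

if __name__ == "__main__":
    print(unreachable_nodes(SAMPLE))
```

A node that shows up here is still part of the shard layout even though it is gone, which is consistent with a removed node being treated as failed rather than retired.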

joe

Apr 1, 2016, 5:08:44 PM
to 2600hz-dev, dschr...@2600hz.com
Ok, so I understand now that bigcouch simply doesn't allow you to scale down, but only up, correct?

In our setup, everything is containerized, so we're not so much taking away nodes as replacing them elsewhere, and by nature the IPs can change at any time. For persistence, we're saving bigcouch data to /var/lib/bigcouch/data, which is really just a symbolic link to something like /mnt/gv0/$(hostname).

If anyone is curious to take a look at our dockerized bigcouch and all the manifest information for deploying it into kubernetes, check it out. 


So I guess what I'm more specifically wondering about now is how to remove the old hostname from kazoo and add the new one when a node is replaced. Is there a sup command for this, or is there a document in bigcouch we should edit? If the latter, what sup command(s) would invalidate the cached info?

Thanks,

Joe

joe

Apr 1, 2016, 5:40:51 PM
to 2600hz-dev, dschr...@2600hz.com
Forgot to add that we're accomplishing the load balancing using a virtual IP rather than a dedicated haproxy, via kube-proxy (http://kubernetes.io/docs/admin/kube-proxy/). It lets us load balance using label selectors on container labels, and it's self-configuring. I want to say it's doing its magic using iptables, but I'm not entirely sure. Interestingly enough, it works for UDP as well. :)



Henrique Fernandes

Oct 11, 2016, 4:37:59 PM
to 2600hz-dev, dschr...@2600hz.com
Hello,

Have you "fixed" this, Joe? One note: bigcouch does not allow scaling up either. It will only share newly created databases with the new node.

I'm facing the problem where I have 1 bigcouch node and want to increase it to 3... I don't know the best way of doing this. Still searching a bit.

I think it may be better to create a new cluster with 3 bigcouch nodes and then use CloneTools from kazoo to migrate to the new cluster. Not sure yet.

Appreciate any help

Regards

Joe

Oct 11, 2016, 6:16:31 PM
to 2600hz-dev, dschr...@2600hz.com
You are correct that one node is not enough. You also guessed correctly the best way to add nodes to your bigcouch cluster. There is another way, but it requires the databases to have been presharded into multiple shards, and it seems unnecessarily complicated.

Henrique Santos Fernandes

Oct 12, 2016, 2:04:14 PM
to 2600hz-dev, dschr...@2600hz.com
Thanks Joe, I am testing the tool.

I think I will over-provision with 5 nodes up front, so I don't have to change my cluster for a long time, since there will be downtime whenever I need to scale up again.
