Issue Initializing New Mongo Arbiter in 19.12


Peter Krautle

Dec 16, 2021, 3:12:50 PM
to sipxcom-users
We are having trouble initializing a new arbiter in Mongo on a production cluster where a server is being retired. The release is 19.12, and we have done this many times on test systems without issue. The new arbiter is defined in the servers section, and a new ID is generated and used in sipxecs-setup. The primary server is on IP 10.20.2.100. The only log file generated is sipxagent.log, and in browsing it for issues, the following messages appear:
cf3 -> Connect to 10.20.2.100 = 10.20.2.100 on port 5308
cf3>  !! Unable to connect to server 10.20.2.100
cf3> Unable to establish connection with 10.20.2.100

When the /var/cfengine directory is compared with a working initialized secondary server, the following differences are seen:

Uninitialized System
cfagent.$(sipx.host).$(sipx.net_domain).log
cfagent.localhost.localdomain.log
cf3.$(sipx.runlog
cd ppkeys
localhost.priv  localhost.pub

Configured System
cfagent.pbx4.abcd.org.log
cfagent.localhost.localdomain.log
cf3.pbx4.runlog
cd ppkeys
localhost.priv  root-MD5=a6a287ed17c18bf14da222ce4566fc58.pub
localhost.pub   sipx-MD5=2d06f8bfcf19d94322dc4f1c83e4d609.pub
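
The ppkeys difference above can be checked mechanically. In the listings, the uninitialized host holds only its own localhost key pair, while the configured host also holds peer public keys named like root-MD5=<hash>.pub. A minimal sketch, assuming the directory is /var/cfengine/ppkeys as in the listings; has_peer_keys is a hypothetical helper name, not a sipXcom or cfengine command:

```shell
# Report whether a cfengine ppkeys directory holds any peer public keys
# (files named like root-MD5=<hash>.pub) besides the local key pair.
# has_peer_keys is a hypothetical helper, not part of sipXcom or cfengine.
has_peer_keys() {
    dir=${1:-/var/cfengine/ppkeys}
    for key in "$dir"/*-MD5=*.pub; do
        # If the glob matched nothing, $key is the literal pattern and -e fails.
        [ -e "$key" ] && return 0
    done
    return 1
}

# Example:
#   has_peer_keys && echo "peer keys present" || echo "only local keys"
```

A host that reports only local keys has probably never completed a key exchange with the primary, which would be consistent with the connection errors in sipxagent.log.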

I've tried four times on two different servers, reinstalling Sipxcom each time, and cannot get the server initialized. After sipxecs-setup completes, Mongo also goes from an active to a stopped state.

Any ideas on how to troubleshoot, or on next steps, are appreciated.

All the best
Peter


Iuliu Blaga

Dec 16, 2021, 3:16:47 PM
to Peter Krautle, sipxcom-users
Run sipxagent on this secondary/arbiter
--
You received this message because you are subscribed to the Google Groups "sipxcom-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sipxcom-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sipxcom-users/df255164-0da5-4dd2-ac65-996cfa194cean%40googlegroups.com.


--

 

Iuliu Blaga
Sr. Support Engineer
 
The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.

Peter Krautle

Dec 16, 2021, 5:54:34 PM
to sipxcom-users
Thanks Iuliu. Are there parameters to specify for sipxagent? For example, here is the response when I add the -h parameter:

[root@$(sipx cfengine]# sipxagent -h pbx6.abcd.org
Error. System error for fopen: "No such file or directory". Unable to find host or service: (pbx6.abcd.org/5308) Name or service not known. No server is responding on this port

I have defined pbx6.abcd.org on the primary server, and it is assigned a mongo-id of 6. Running sipxagent without parameters does not resolve the issue.

Appreciate the support!

Peter

Iuliu Blaga

Dec 16, 2021, 6:23:48 PM
to Peter Krautle, sipxcom-users
Hi, no, it should not need any parameters.

Try systemctl restart sipxsupervisor on both the primary and the arbiter

Then run sipxagent again on the arbiter
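
The order of those steps can be sketched as a small script. This is only an illustration of the sequence described above: the run wrapper and the two function names are hypothetical, and DRY_RUN=1 prints each command instead of executing it so the sequence can be reviewed first.

```shell
# Dry-run wrapper: with DRY_RUN=1 the command is only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: run on BOTH the primary and the arbiter.
restart_supervisor() {
    run systemctl restart sipxsupervisor
}

# Step 2: run on the arbiter only, after both supervisors have been restarted.
rerun_agent() {
    run sipxagent
}
```

For example, DRY_RUN=1 restart_supervisor prints the systemctl command without touching the service.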

Peter Krautle

Dec 17, 2021, 2:48:18 PM
to sipxcom-users
Thank you again, but no joy. I have scheduled a restart of the primary server for early Sunday morning and will then retry. Peter

Peter Krautle

Dec 19, 2021, 12:28:31 PM
to sipxcom-users
Reboot of the primary server did not resolve the issue, but in looking at the Mongo primary server logs, I see that the list of primary and secondary servers defined in the GUI does not align with the underlying MongoDB configuration. This is a five-server cluster across 3 physical servers - pbx, pbx2, pbx3, pbx4, pbx5. The old arbiter was pbx3; it was deleted as a server in the GUI, and pbx6 was added as the new server. However, the Mongo rs.config() results still show pbx3, with 'not found' errors in the Mongo log files, and pbx6 is not listed. Is there an easy way to resync the Mongo database to reflect what is in the Sipxcom Servers GUI without issuing manual Mongo database commands?
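
A read-only way to make that mismatch explicit is to diff the member list Mongo reports against the servers defined in the GUI. The sketch below does not change any configuration. It assumes two plain-text files with one host per line; the file layout and the helper names stale_members and missing_members are hypothetical, and the mongo command in the comment assumes the legacy mongo shell on its default port.

```shell
# Print members present in Mongo's replica set config but absent from the GUI list.
stale_members() {   # $1 = Mongo member list file, $2 = GUI server list file
    grep -vxF -f "$2" "$1"
}

# Print servers defined in the GUI but absent from Mongo's replica set config.
missing_members() { # $1 = Mongo member list file, $2 = GUI server list file
    grep -vxF -f "$1" "$2"
}

# One possible way to produce the Mongo-side list (assumption: legacy mongo shell):
#   mongo --quiet --eval 'rs.conf().members.forEach(function (m) { print(m.host); })' > mongo.txt
# The GUI-side list is typed by hand from the Servers page, in the same host format.
```

With the situation described above, stale_members would be expected to print the pbx3 entry and missing_members the pbx6 entry.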

Iuliu Blaga

Dec 19, 2021, 2:40:33 PM
to Peter Krautle, sipxcom-users
Hi, please select all servers and send profiles to them (off hours)

Peter Krautle

Dec 19, 2021, 10:58:50 PM
to sipxcom-users
Thanks again Iuliu - much appreciated. I tried twice this morning to push the profiles without success before inspecting the Mongo configuration on the primary server. I have a three-server test system at 21.04. I'll add two more servers tomorrow morning, and then repeat the steps taken on the production server to try to replicate the issue. I'll open a Jira if the problem can be replicated. Peter

Peter Krautle

Dec 28, 2021, 12:02:31 AM
to sipxcom-users
Hi Iuliu - hope your holidays were enjoyable and your family/friends are staying safe. Here in the NYC region, the coronavirus is spreading very quickly again, but with the high vaccination rates, the impact of the virus is lower and people recover quickly. On our test system, when a server is deleted (e.g. pbx3), the Mongo DB configuration gets updated after a positive response is given to the 'Remove pbx3.abcd.org:27018 from configuration' prompt in the System->Databases->Database menu. I checked the production cluster and the same prompt appeared, so I responded positively and the Mongo configuration was updated. However, on the production cluster, I cannot get the new arbiter to initialize despite a reinstall of Sipxcom, a reboot of the primary server, a restart of sipxsupervisor on the primary and arbiter, etc. Any other suggestions on things to try are appreciated. Again, a big thank you for your support. Peter

Iuliu Blaga

Dec 28, 2021, 7:10:41 AM
to Peter Krautle, sipxcom-users
Hi, this is very strange. Are you certain the nodes can communicate with each other without restriction? Can you ssh from one to the other, both ways?

Peter Krautle

Dec 29, 2021, 4:39:31 AM
to sipxcom-users
Hi Iuliu - there is an IPsec tunnel connecting the data center that hosts the arbiter with the data center where the primary server is located, but the new arbiter is now 30 km away instead of the 160 km of the old arbiter - ping times are under 10 msec. I rebuilt the arbiter yesterday from scratch - here are the sipxagent.log files from the arbiter https://www.dropbox.com/s/uuypa0xv2mwlii8/sipxagentarbiter.log?dl=0 and primary server https://www.dropbox.com/s/opu9pzpsaf91cy2/sipxagentprimary.log?dl=0 after sipxecs-setup was run on the arbiter. The arbiter is at 192.168.111.212 and the primary server is at 10.20.2.100. Peter

Iuliu Blaga

Dec 29, 2021, 4:55:50 AM
to Peter Krautle, sipxcom-users
Please run on the arbiter:

nmap -sT -p 5308 10.20.2.100

Peter Krautle

Dec 29, 2021, 8:05:40 AM
to sipxcom-users

[root@$(sipx ~]# nmap -sT -p 5308 10.20.2.100

Starting Nmap 6.40 ( http://nmap.org ) at 2021-12-29 06:40 EST
Nmap scan report for 10.20.2.100
Host is up (0.0075s latency).
PORT     STATE    SERVICE
5308/tcp filtered cfengine

Nmap done: 1 IP address (1 host up) scanned in 0.45 seconds
[root@$(sipx ~]# ping 10.20.2.100
PING 10.20.2.100 (10.20.2.100) 56(84) bytes of data.
64 bytes from 10.20.2.100: icmp_seq=1 ttl=62 time=9.40 ms
64 bytes from 10.20.2.100: icmp_seq=2 ttl=62 time=7.70 ms
64 bytes from 10.20.2.100: icmp_seq=3 ttl=62 time=9.76 ms

Iuliu Blaga

Dec 29, 2021, 1:05:09 PM
to Peter Krautle, sipxcom-users
Hi, "filtered" is not good :)

"Filtered means that a firewall, filter, or other network obstacle is blocking the port so that Nmap cannot tell whether it is open or closed. Closed ports have no application listening on them, though they could open up at any time."
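
To re-check this after each firewall change without reading the whole report, the state column can be pulled out of the nmap output. A minimal sketch; port_state is a hypothetical helper name, and it only parses the normal text report format shown earlier in the thread:

```shell
# Print the state nmap reports for one TCP port, reading nmap output on stdin.
# port_state is a hypothetical helper name.
port_state() {
    awk -v port="$1/tcp" '$1 == port { print $2 }'
}

# Example, run from the arbiter against the primary:
#   nmap -sT -p 5308 10.20.2.100 | port_state 5308
```

The goal is for this to print open rather than filtered.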

It is clear from the logs that cfengine cannot reach port 5308 on the master. Why that happens is a network issue: either you are using the "unmanaged" firewall setting and the proper rules were not added to iptables by sipxcom, or something along the way is blocking some of the traffic.
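
One way to test the first possibility is to look for an ACCEPT rule for port 5308 in the local iptables rules. This is a rough sketch under assumptions: accepts_tcp_port is a hypothetical helper, it only recognizes single-port rules in the `iptables -S` output format, and it will miss multiport or port-range rules and rules held in custom chains.

```shell
# Succeed if the rules on stdin (in `iptables -S` format) contain an
# ACCEPT rule for the given single TCP destination port.
# accepts_tcp_port is a hypothetical helper name.
accepts_tcp_port() {
    grep -q -E -- "-p tcp .*--dport $1 .*-j ACCEPT"
}

# Example, run as root on the host whose port is being tested:
#   iptables -S INPUT | accepts_tcp_port 5308 \
#       && echo "ACCEPT rule for 5308 found" \
#       || echo "no single-port ACCEPT rule for 5308"
```

A missing rule does not prove the port is blocked (the chain policy may be ACCEPT), so the nmap result from the remote side remains the real test.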

Michael Picher

Dec 29, 2021, 2:08:41 PM
to Iuliu Blaga, Peter Krautle, sipxcom-users
Also note that tunnels can reduce mtu size...

Mike
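
The MTU point can be checked with a ping that forbids fragmentation. A small sketch, assuming IPv4 without header options and the Linux iputils ping; icmp_payload_for_mtu is a hypothetical helper name, and 1500 is only the usual Ethernet MTU, not a value taken from this network:

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Example, run from the arbiter (-M do sets the don't-fragment bit):
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 1500)" 10.20.2.100
# If this fails while a smaller size succeeds, the path MTU through the
# tunnel is below 1500.
```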

Peter Krautle

Dec 30, 2021, 5:49:12 PM
to sipxcom-users
The arbiter is now configured and connected to the cluster. The nmap output pointed to firewall issues, so I turned off the iptables service (which I thought was already off). Port 5308 showed as 'open' when running nmap after iptables was turned off - the arbiter immediately configured and I was able to include it in Mongo. In auditing the system, I found that the 'unmanaged firewall service' option was enabled, since the primary and secondary servers are sitting behind firewalls. But after reading some of the threads, we will disable the unmanaged firewall service option. Iuliu - many thanks again, your guidance is appreciated. Happy New Year!