Creating cluster


Kyle

Sep 10, 2013, 1:00:51 AM
to nxfil...@googlegroups.com
Hi Jinhee

I'm trying to cluster two NxFilter nodes. Both run CentOS 6.4 and NxFilter 1.5.3, and both start fine when unclustered.

I created the cluster by filling in the Slave IP field on the master and the Master IP field on the slave. Then I restarted NxFilter on the master node, which came up OK.

When I try to restart NxFilter on the slave node, though, it hangs at this output and goes no further:

./startup.sh
*********************************************************************
NxFilter v1.5.3
  Author : Jinhee Lee
  Contact : sup...@nxfilter.org
*********************************************************************
 INFO [09-10 14:47:17] - Starting NxFilter.

I've confirmed that I can connect from the slave to ports 19003 and 19004 on the master.
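For anyone repeating this check without telnet, here is a minimal Python sketch. It only tests that a TCP connection to a port can be opened, which is all a telnet test shows as well; it says nothing about whether NxFilter's own exchange on that port succeeds. The function name `can_connect` is made up for illustration, and the address in the usage comment is a placeholder.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (refused, unreachable, timed out) if it cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage, run on the slave node (replace the placeholder with the master's IP):
#   for port in (19003, 19004):
#       print(port, can_connect("MASTER_IP_HERE", port))
```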

I've also confirmed that nxfilter/conf/cfg.properties on the slave shows cluster_mode = 2 and the correct master_ip. The settings on the master node are correct as well.
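The same check can be scripted. The sketch below assumes the file uses plain `key = value` lines, and it takes from this post that a slave shows `cluster_mode = 2` together with `master_ip`; nothing else about the file format is known here. The helper names `parse_properties` and `is_slave_of` are hypothetical.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def is_slave_of(props, expected_master_ip):
    """True if the settings match what this post describes for a slave node."""
    return (props.get("cluster_mode") == "2"
            and props.get("master_ip") == expected_master_ip)


# Usage on the slave node (the IP is a placeholder for your master's address):
#   with open("nxfilter/conf/cfg.properties") as f:
#       print(is_slave_of(parse_properties(f.read()), "MASTER_IP_HERE"))
```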

If I blank out the values of master_ip and cluster_mode, I'm able to start it again.

Would you be able to offer any advice please?

Thanks,

Kyle


Kyle

Sep 10, 2013, 1:07:18 AM
to nxfil...@googlegroups.com
Sorry, I should also mention that I've checked nxd.log, but all it shows for the hung startup is "Starting NxFilter".

Jinhee

Sep 10, 2013, 3:19:26 AM
to nxfil...@googlegroups.com
Hi Kyle,

Are both nodes on the same subnet?
What happens if you leave slave-ip unset on the master node?

Jinhee

Kyle

Sep 10, 2013, 9:45:08 PM
to nxfil...@googlegroups.com
Hi Jinhee,

Thanks for the quick response, appreciate it.

Yes, both nodes are on the same subnet - the IPs are, for example, 10.0.0.11 and 10.0.0.12.

If I remove the slave-ip from the master node, I get this prompt:

*********************************************************************
NxFilter v1.5.3
  Author : Jinhee Lee
  Contact : sup...@nxfilter.org
*********************************************************************
 INFO [09-11 11:42:49] - Starting NxFilter.
ERROR [09-11 11:42:52] - Couldn't connect to config DB!

The "Couldn't connect to config DB!" message keeps recurring.

Thanks again,

Kyle

Jinhee

Sep 10, 2013, 10:50:58 PM
to nxfil...@googlegroups.com
The slave-ip setting on the master node is for access control.
But you said that you were able to connect to its port 19003.
Did you use a command like 'telnet 10.0.0.11 19003'?

Jinhee

Kyle

Sep 10, 2013, 11:13:02 PM
to nxfil...@googlegroups.com
Yes, that's correct.

With slave-ip set on the master node, I'm able to "telnet 10.0.0.11 19003" from what is supposed to be the slave node (10.0.0.12).

With slave-ip not set on the master node, if I try "telnet 10.0.0.11 19003" from 10.0.0.12, I get connection refused.

So it would seem it can connect to the port, but is possibly not able to download the config?

If I run "netstat -a" on the master node while the slave node is trying to start, with slave-ip set on the master node, this is what I get:

tcp        1      0 ::ffff:10.0.0.11:19003  ::ffff:10.0.0.12:53294  CLOSE_WAIT
tcp      160      0 ::ffff:10.0.0.11:19003  ::ffff:10.0.0.12:53302  ESTABLISHED
tcp      161      0 ::ffff:10.0.0.11:19003  ::ffff:10.0.0.12:53298  CLOSE_WAIT
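As a reading aid for the netstat lines above: the second column is Recv-Q, the bytes the kernel has received that the local application has not yet read, and CLOSE_WAIT means the remote side (here the slave) has closed the connection while the local application has not yet closed its end. The Python sketch below tallies the three quoted lines by state. It is only an illustration of how to read this output; the function name `connections_on_port` is made up, and it says nothing about why NxFilter leaves the data unread.

```python
from collections import Counter

# The three lines quoted above, copied from the master node's netstat output.
SAMPLE = """\
tcp        1      0 ::ffff:10.0.0.11:19003  ::ffff:10.0.0.12:53294  CLOSE_WAIT
tcp      160      0 ::ffff:10.0.0.11:19003  ::ffff:10.0.0.12:53302  ESTABLISHED
tcp      161      0 ::ffff:10.0.0.11:19003  ::ffff:10.0.0.12:53298  CLOSE_WAIT
"""


def connections_on_port(netstat_text, port):
    """Return one dict per TCP connection whose local port matches."""
    rows = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expected columns: proto, Recv-Q, Send-Q, local, foreign, state
        if len(parts) != 6 or not parts[0].startswith("tcp"):
            continue
        _proto, recv_q, send_q, local, foreign, state = parts
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        rows.append({"recv_q": int(recv_q), "send_q": int(send_q),
                     "peer": foreign, "state": state})
    return rows


rows = connections_on_port(SAMPLE, 19003)
print(Counter(row["state"] for row in rows))
print(sum(row["recv_q"] for row in rows), "bytes received but not yet read")
```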

Jinhee

Sep 10, 2013, 11:46:01 PM
to nxfil...@googlegroups.com
I just tested it again to look for a possible bug.
But on my side it's working.
One node is Ubuntu and the other is Windows 7.
I swapped the master role between the two OSes and found no problem.

Do you see any logging on the master node side?
When a slave node is connected you get a 'work_cnt = 1' message.
And when there's a connection attempt from an IP that isn't allowed you get a 'Not allowed IP : 192.168.0.102' message.
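A small Python sketch for pulling those two kinds of message out of the master's log. It relies only on the two message texts quoted in this post; the rest of the log format is not assumed, and the function name `cluster_events` is hypothetical.

```python
import re

# Patterns built from the two messages quoted above.
WORK_CNT = re.compile(r"work_cnt = (\d+)")
NOT_ALLOWED = re.compile(r"Not allowed IP : (\S+)")


def cluster_events(lines):
    """Return ('work_cnt', n) and ('not_allowed', ip) tuples in log order."""
    events = []
    for line in lines:
        match = WORK_CNT.search(line)
        if match:
            events.append(("work_cnt", int(match.group(1))))
            continue
        match = NOT_ALLOWED.search(line)
        if match:
            events.append(("not_allowed", match.group(1)))
    return events


# Usage on the master node (adjust the path to wherever nxd.log lives):
#   with open("nxd.log") as f:
#       for event in cluster_events(f):
#           print(event)
```

An empty result while the slave is trying to start would suggest the slave's requests are not reaching the point where the master logs them.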

Jinhee

Kyle

Sep 11, 2013, 12:44:14 AM
to nxfil...@googlegroups.com
Thanks again for the quick replies.

There's nothing in nxd.log on the master side from today regarding connection attempts.

However, I can see one short connection from when I first started attempting this yesterday:

[root@master-node nxfilter]# cat log/nxd.log | grep work_cnt
 INFO [09-10 14:39:12] - work_cnt = 1
 INFO [09-10 14:39:35] - work_cnt = 0

Very strange!

Are there any other configuration options that need to be set for clustering to work other than what's in the cluster menu? If not, I may try a reinstall on the slave node.

Thanks,

Kyle

Jinhee

Sep 11, 2013, 1:07:31 AM
to nxfil...@googlegroups.com
Yeah. I think it was working at that time and stopped working afterwards.
I don't know what it is, but something changed.
There are no other options for clustering.

Jinhee

Kyle

Sep 11, 2013, 1:31:52 AM
to nxfil...@googlegroups.com
OK thanks Jinhee.

I'll try a reinstall.

Catalin Alexandru Patrascu

Sep 11, 2013, 2:12:21 AM
to nxfil...@googlegroups.com
Same problem here. I posted about it in June. I've had this problem since the update to 1.4.6 and still haven't been able to solve it with any of the newer versions. I'm running Ubuntu on the master and XP SP3 on the slave. For now I run both machines standalone with the same config.
Fortunately there are not many configuration changes, and I'm not particularly interested in reports. Apart from having to configure both machines, running standalone has an advantage: if one machine fails, the other takes over all traffic, filtering, and reports. When running in a cluster, if the master fails, the slave does not filter; it just lets DNS requests through. It would be great to have an option to sync configuration between standalone machines.

Jinhee

Sep 11, 2013, 2:41:02 AM
to nxfil...@googlegroups.com
Yeah, I remember.
You still couldn't get it working?
If yours was working before the update, could someone test it with a pre-1.4.6 version?
That might give me a clue to the solution.
On my side it's working fine, so I haven't been able to find a solution so far.

And syncing config between standalone nodes is not that simple.
If I just copy the config values across nodes, you might need to log in twice.
If you use SSO with AD that's not a problem, but if you use the login page it could be.
If you use the quota-time function, you can't share the quota time across the nodes.
And since I implemented bandwidth control, the bandwidth consumption data needs to be shared across the nodes.

Maybe there's a problem with the DB access control that was added in v1.4.6.
If I can confirm that, I can restore the pre-v1.4.6 behavior.

Jinhee

Kyle

Sep 11, 2013, 9:42:47 PM
to nxfil...@googlegroups.com
Hi Jinhee,

I just tested installing version 1.4.5 and configuring a cluster - worked perfectly, immediately.

I set everything up exactly the same as in my previous config.

Thanks,

Kyle

Jinhee

Sep 11, 2013, 11:00:46 PM
to nxfil...@googlegroups.com
Hi Kyle,

Thanks for testing.
Can you confirm that it doesn't work when you use v1.5.3 in the same environment?

Jinhee

Kyle

Sep 12, 2013, 8:52:54 PM
to nxfil...@googlegroups.com
Hi Jinhee,

Sorry for the delayed response, I missed this message.

I've just tested v1.5.3 in this same lab environment and it seems to be working now! Very strange.

I'm going to try a fresh install of v1.5.3 on both nodes in our production environment now and I'll let you know how it goes.

Kyle

Jinhee

Sep 12, 2013, 8:59:45 PM
to nxfil...@googlegroups.com
Hi Kyle,

Thanks for reporting.
I will revert to the old method anyway.
But I'd still like to know what caused the problem,
though I can't investigate anything if it's working with v1.5.3.

Jinhee

Kyle

Sep 12, 2013, 10:55:32 PM
to nxfil...@googlegroups.com
Hi Jinhee,

No worries, glad to help. We really like the options NXFilter provides to us. This just gets more and more strange on this side.

I've done a clean install of v1.5.3 in our production environment and still can't create a cluster. 

Then I rebuilt the CentOS operating system on both production nodes to ensure it was exactly the same as the lab environment; reinstalled v1.5.3 and still couldn't create a cluster.

The next step was trying v1.4.5 in the production environment on the fresh machines and v1.4.5 experienced the exact same issue!

The only difference now between my lab and production environments is the hypervisor the VMs are running on - but I can't see how that would make any difference?

Cheers,

Kyle

Jinhee

Sep 12, 2013, 11:37:08 PM
to nxfil...@googlegroups.com
So you mean v1.4.5 is also not working?

Jinhee

Kyle

Sep 13, 2013, 12:03:45 AM
to nxfil...@googlegroups.com
Yeah that's right. It has the exact same symptoms I'm getting with v1.5.3.

As soon as the cluster is configured, the slave node just hangs at "Starting NxFilter"

The production environment is two VMs running on ESXi; the lab environment is two VMs running on XenServer. That's the only difference.

Jinhee

Sep 13, 2013, 12:48:46 AM
to nxfil...@googlegroups.com
I tested it on VMware Workstation.
So I guess VMware Workstation is fine.
Maybe something is blocking the DB connection or data transmission.
It's very weird, though.

When you say 'ESXi', are you talking about this one?
  https://my.vmware.com/web/vmware/evalcenter?p=esxi&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDYQFjAB&url=https://www.vmware.com/go/getesxi/&ei=15gyUuXOO8nmiAeg9oD4Cw&usg=AFQjCNHBrjOyvlZeYDC1V2Y9kFbxuIp9sg&sig2=cVfmClMfUNYrxwOy-RusmA&bvm=bv.52164340,d.aGc&cad=rjt

After I finish v1.5.4 I will try to find out the source of the problem.

Jinhee

Kyle

Sep 15, 2013, 6:52:09 PM
to nxfil...@googlegroups.com
Hi Jinhee,

Same product, but we're using a later version:

You can download it for free.

Running nxfilter on CentOS 6.4 64 bit

Great, really appreciate the assistance!

In the meantime, I think we're going to need to just run 2 individual nodes not in a cluster. 

Kyle

Jinhee

Sep 19, 2013, 9:50:48 AM
to nxfil...@googlegroups.com
Hi Kyle,

Today I installed VMware ESXi 5.1 and CentOS 6.4 x86 64-bit.
I had the same problem you described.
And when I installed it on VMware Workstation it was OK.
I don't know exactly what causes the problem, but I got it working anyway.

My solution was to put the hostname into the /etc/hosts file.
It's needed for the master node.
The hostname of my master node was 'cent-122' and its IP address was 192.168.0.122.
So I had to put this line into the /etc/hosts file:

  192.168.0.122 cent-122

And it was working.
Try that.
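To check whether a node already has such an entry, a minimal Python sketch along these lines could be used. It only parses hosts-file text and looks for a given hostname-to-IP mapping; it does not explain why the entry matters, which this post leaves open. The helper names `hosts_entries` and `maps_to` are made up for illustration.

```python
def hosts_entries(hosts_text):
    """Map each hostname or alias in hosts-file text to its addresses."""
    mapping = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping


def maps_to(hosts_text, hostname, ip):
    """True if the hosts text maps hostname to ip."""
    return ip in hosts_entries(hosts_text).get(hostname, [])


# Usage on the master node, using the example values from this post:
#   import socket
#   with open("/etc/hosts") as f:
#       text = f.read()
#   print(socket.gethostname(), maps_to(text, "cent-122", "192.168.0.122"))
```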

Jinhee

Jinhee

Sep 19, 2013, 9:51:19 AM
to nxfil...@googlegroups.com
I was using v1.5.4.

Kyle

Sep 19, 2013, 5:56:58 PM
to nxfil...@googlegroups.com
That's great news, thanks.

To confirm - you made that hosts file entry on the slave node?

It doesn't seem to work for v1.5.3, but I'm happy to upgrade to v1.5.4.

Is there an upgrade process so we don't have to reconfigure all our other settings?

Jinhee

Sep 19, 2013, 6:02:18 PM
to nxfil...@googlegroups.com
No. I didn't put it on the slave.
Updating is simple: just copy everything into /nxfilter.
However, if you modified GUI files you'd need to exclude them.
And in your case the /nxfilter/conf/cfg.properties file needs to be excluded, as the cluster config resides in that file.
But if it doesn't work with v1.5.3, I don't know whether this is the real solution for you or not.
I tested with two CentOS nodes, and with one CentOS (master) and one Ubuntu node, all working.
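The copy-with-exclusions step can be sketched in Python as below. This is an illustration under assumptions, not an official update procedure: the function name `apply_update` and the `keep` parameter are hypothetical, and the only excluded path taken from this post is conf/cfg.properties. Any modified GUI files would need to be added to `keep` by their path relative to the package root.

```python
import shutil
from pathlib import Path


def apply_update(package_dir, install_dir, keep=("conf/cfg.properties",)):
    """Copy every file from the unpacked package into the install directory,
    except the relative paths listed in `keep`. Returns the skipped paths."""
    package_dir = Path(package_dir)
    install_dir = Path(install_dir)
    skipped = []
    for src in sorted(package_dir.rglob("*")):
        if src.is_dir():
            continue
        rel = src.relative_to(package_dir)
        if rel.as_posix() in keep:
            skipped.append(rel.as_posix())  # leave the installed copy alone
            continue
        dest = install_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return skipped


# Usage (the package path is a placeholder for the unpacked new version):
#   apply_update("/tmp/nxfilter-new", "/nxfilter")
```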

Jinhee

Kyle

Sep 19, 2013, 6:04:57 PM
to nxfil...@googlegroups.com
Oh, so on the master node, you put its own IP into the hosts file?

Maybe I've done it wrong then.

Do we also exclude the DB folder?

Thanks for all your help

Kyle

Jinhee

Sep 19, 2013, 6:18:00 PM
to nxfil...@googlegroups.com
Yes. On the master node. Its own IP and hostname.
There's no DB file inside the package so it's already excluded.
However, you should take a backup before updating.
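One way to take such a backup is to archive the whole install directory first. The Python sketch below does that with the standard library; the function name `backup_install` and the backup location are assumptions, and the backup directory should sit outside the install directory. Stopping NxFilter before archiving is probably sensible so the DB files are not changing mid-copy, though this thread does not say so.

```python
import shutil
import time
from pathlib import Path


def backup_install(install_dir, backup_dir):
    """Write a timestamped .tar.gz of install_dir into backup_dir and
    return the path of the archive."""
    install_dir = Path(install_dir).resolve()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"{install_dir.name}-{stamp}"
    # root_dir/base_dir make the archive contain "<dirname>/..." paths.
    return shutil.make_archive(
        str(base_name), "gztar",
        root_dir=str(install_dir.parent), base_dir=install_dir.name,
    )


# Usage (the backup directory is a placeholder):
#   print(backup_install("/nxfilter", "/root/nxfilter-backups"))
```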

Jinhee

Kyle

Sep 19, 2013, 6:20:12 PM
to nxfil...@googlegroups.com
Ok thanks, I did it incorrectly then.

I'll try that and let you know.

Kyle

Kyle

Sep 22, 2013, 6:26:24 PM
to nxfil...@googlegroups.com
Hi Jinhee,

Just wanted to let you know, that fix worked for me as well.

I upgraded to v1.5.4 and edited the hosts file of the master node. Then the cluster started perfectly.

Thanks heaps for all your help!

Kyle

Catalin Alexandru Patrascu

Sep 23, 2013, 7:32:08 AM
to nxfil...@googlegroups.com
Hello. It also works for me after editing the hosts file on 1.5.4.
Thanks !

Harxel

Sep 16, 2015, 1:36:31 AM
to NxFilter
I'm using v2.8.7. After setting up the cluster, I could not access the slave GUI.

I was able to verify the connection using:
telnet masterip 19003
telnet masterip 19004

I got this error on the slave:

 INFO [09-16 13:04:09] - Starting NxFilter.
 INFO [09-16 13:04:09] - It's running as a slave node.
 INFO [09-16 13:04:09] - MasterCheck started.
 INFO [09-16 13:04:10] - updating to 140.
ERROR [09-16 13:04:10] - Couldn't update config DB!



On master log:

 INFO [09-16 05:30:00] - Adding a slave-node down alert email.
 INFO [09-16 05:30:00] - An email discarded as the alert email setup inactive!
 INFO [09-16 05:30:00] - Writing logs, log_cnt = 0, signal_cnt = 0, flow_cnt = 0, recv_flow = 0.
 INFO [09-16 05:31:00] - Writing logs, log_cnt = 0, signal_cnt = 0, flow_cnt = 0, recv_flow = 0.
 INFO [09-16 05:32:00] - Writing logs, log_cnt = 0, signal_cnt = 0, flow_cnt = 0, recv_flow = 0.
 INFO [09-16 05:33:00] - Writing logs, log_cnt = 0, signal_cnt = 0, flow_cnt = 0, recv_flow = 0.
 INFO [09-16 05:34:00] - Writing logs, log_cnt = 0, signal_cnt = 0, flow_cnt = 0, recv_flow = 0.
 INFO [09-16 05:35:00] - Adding a slave-node down alert email.
 INFO [09-16 05:35:00] - An email discarded as the alert email setup inactive!


As usual, thanks for your prompt support.



Jinhee

Sep 16, 2015, 4:53:12 AM
to NxFilter

 INFO [09-16 13:04:10] - updating to 140.
ERROR [09-16 13:04:10] - Couldn't update config DB!

This is about its local DB. Do you see anything in /nxfilter/db on the slave node? Do you have any trace files in the directory?

Harxel

Sep 16, 2015, 5:33:23 AM
to NxFilter
It has DB files in it, but when I set it up as a slave, the error came up. (I'm using Ubuntu on this one.)

What I did next was set up NxFilter on a CentOS server. It worked, but when I access the slave GUI I can't go to any menu; it just returns me to the cluster settings. There I can see it's connected.

Jinhee

Sep 16, 2015, 5:41:36 AM
to NxFilter
A slave node uses the master node's setup. That's why you get redirected on the slave node. That's how clustering works.

Harxel

Sep 16, 2015, 5:45:38 AM
to NxFilter
So it's normal?

Jinhee

Sep 16, 2015, 5:46:45 AM
to NxFilter
Yeah, you have to use your master node for setup.

Harxel

Sep 16, 2015, 5:56:19 AM
to NxFilter
Now I understand.

Thank you so much!!!!