Question regarding AWS autodiscovery INFO log output and timeouts

Iso

May 21, 2013, 10:39:40 AM
to haze...@googlegroups.com
Hi,

We are testing Hazelcast in EC2.

We don't use any securityGroupName, tagKey or tagValue.

Now we have a few questions:

1.
For example, we get log output like the following:

2013-05-21 10:17:51 INFO [<default>] [10.165.21.89]:5701 [myGroup] Could not connect to: /10.211.174.18:5701. Reason: ConnectException[Connection timed out]
2013-05-21 10:17:51 INFO [<default>] [10.165.21.89]:5701 [myGroup] Could not connect to: /10.211.174.18:5703. Reason: ConnectException[Connection timed out]
2013-05-21 10:17:51 INFO [<default>] [10.165.21.89]:5701 [myGroup] Could not connect to: /10.211.174.18:5702. Reason: ConnectException[Connection timed out]
.....
2013-05-21 10:17:51 INFO [<default>] [10.165.21.89]:5701 [myGroup] Could not connect to: /10.139.10.41:5701. Reason: ConnectException[Connection timed out]
2013-05-21 10:17:51 INFO [<default>] [10.165.21.89]:5701 [myGroup] Could not connect to: /10.139.10.41:5703. Reason: ConnectException[Connection timed out]
2013-05-21 10:17:51 INFO [<default>] [10.165.21.89]:5701 [myGroup] Could not connect to: /10.139.10.41:5702. Reason: ConnectException[Connection timed out]
.......
....
...

I found this issue and these discussions:
https://github.com/hazelcast/hazelcast/issues/140
https://groups.google.com/forum/#!searchin/hazelcast/AWS/hazelcast/rnBcHc2j5Wo/bhwfKGw_pxIJ
https://groups.google.com/forum/#!topic/hazelcast/k9tVcagUTm8/discussion

Does that mean Hazelcast (in this case the split-brain handler, I guess) is trying to connect to ALL servers known to the AWS auto-discovery mechanism, even if Hazelcast is not running on those AWS instances?
And why is Hazelcast trying 3 different ports? Does this have something to do with port auto-increment? (Port auto-increment is set to true.)

Would it help if we defined a security group plus a tagKey or tagValue, so that other instances, where Hazelcast is not running, won't be in the list?

2.
We sometimes get timeout issues. The TCP/IP connection timeout is set to 10 seconds at the moment, which does not always seem to be enough. What is your experience with timeouts in EC2, let's say for a cluster of 10-20 nodes?


Thanks




Peter Veentjer

May 21, 2013, 10:56:34 AM
to haze...@googlegroups.com
Hi Iso,

Answer to question 1)

If I remember correctly, in your current situation it tries to connect to all ec2 instances within a specific region.

Normally you want to set a security group, so that it only tries to connect to instances in that security group.

Or you can provide a tag-key/tag-value, so that only instances with that same tag-key/tag-value are tried.
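
To make the effect of these filters concrete, here is a minimal sketch in plain Java. It is not Hazelcast's discovery code: the Instance class, the candidateIps helper, and the group and tag names are hypothetical, and requiring a match on both filters when both are set is an assumption. The IP addresses are the ones from the log in the question. The filter arguments stand in for the securityGroupName, tagKey and tagValue settings mentioned there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is NOT Hazelcast's discovery code.
final class DiscoveryFilterSketch {

    // Minimal stand-in for one EC2 instance as seen by a discovery call.
    static final class Instance {
        final String privateIp;
        final String securityGroup;
        final String tagKey;
        final String tagValue;

        Instance(String privateIp, String securityGroup, String tagKey, String tagValue) {
            this.privateIp = privateIp;
            this.securityGroup = securityGroup;
            this.tagKey = tagKey;
            this.tagValue = tagValue;
        }
    }

    // Returns the IPs a member would try to contact. A null filter means "no filter".
    // Assumption: when both filters are set, an instance must match both.
    static List<String> candidateIps(List<Instance> region, String securityGroup,
                                     String tagKey, String tagValue) {
        List<String> ips = new ArrayList<>();
        for (Instance instance : region) {
            boolean groupOk = securityGroup == null
                    || securityGroup.equals(instance.securityGroup);
            boolean tagOk = tagKey == null
                    || (tagKey.equals(instance.tagKey)
                        && (tagValue == null || tagValue.equals(instance.tagValue)));
            if (groupOk && tagOk) {
                ips.add(instance.privateIp);
            }
        }
        return ips;
    }

    public static void main(String[] args) {
        // Group names and tags below are made up for the example.
        List<Instance> region = Arrays.asList(
                new Instance("10.165.21.89", "hazelcast-sg", "role", "cache"),
                new Instance("10.211.174.18", "web-sg", "role", "web"),
                new Instance("10.139.10.41", "db-sg", "role", "db"));

        // No filter: every instance in the region is a candidate.
        System.out.println(candidateIps(region, null, null, null));
        // Security group filter: only instances in that group remain.
        System.out.println(candidateIps(region, "hazelcast-sg", null, null));
        // Tag filter: only instances carrying that tag-key/tag-value remain.
        System.out.println(candidateIps(region, null, "role", "cache"));
    }
}
```

With no filter set, the unrelated instances stay in the candidate list, which matches the "Could not connect" lines in the log.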

Answer to question 2)

Using the conn-timeout-seconds setting on the aws configuration, you can increase the timeout. Try setting it to a higher value.

Can you post your configuration? (And please don't post the aws credentials!)







Iso

May 21, 2013, 11:44:12 AM
to haze...@googlegroups.com
Thanks for the answer.
We set our config through the API.

We decreased most timeouts:

joinConfig.getTcpIpConfig().setConnectionTimeoutSeconds("10");
// initial timeout setting
config.setProperty("hazelcast.merge.first.run.delay.seconds", "60");
config.setProperty("hazelcast.merge.next.run.delay.seconds", "60");
config.setProperty("hazelcast.redo.giveup.threshold", "30");
config.setProperty("hazelcast.redo.log.threshold", "15");
config.setProperty("hazelcast.max.operation.timeout", "60000");
config.setProperty("hazelcast.max.no.heartbeat.seconds", "60");
config.setProperty("hazelcast.master.confirmation.interval.seconds", "20");
config.setProperty("hazelcast.max.no.master.confirmation.seconds", "120");
config.setProperty("hazelcast.member.list.publish.interval.seconds", "180");

but set the connection timeout to a higher value (I believe the default is only 5 seconds).

Iso

May 21, 2013, 11:46:30 AM
to haze...@googlegroups.com
joinConfig.getTcpIpConfig().setConnectionTimeoutSeconds(10);

The parameter is an int, of course :)

Peter Veentjer

May 22, 2013, 3:20:36 AM
to haze...@googlegroups.com
Which join mechanisms do you have enabled?

I ask because I'm not sure whether you are using aws or tcpip.



Iso

May 23, 2013, 4:05:53 AM
to haze...@googlegroups.com
AWS is enabled and TCP/IP is disabled.
I didn't post the whole config factory class here.

It looks like this:

....
Join joinConfig = config.getNetworkConfig().getJoin();
joinConfig.getMulticastConfig().setEnabled(false);
joinConfig.getTcpIpConfig().setEnabled(false);
joinConfig.getTcpIpConfig().setConnectionTimeoutSeconds(10);
AwsConfig awsConfig = joinConfig.getAwsConfig();

awsConfig.setEnabled(true);
....

The only setConnectionTimeoutSeconds(int value) I found is in the TCP/IP config.
I also remember a discussion here saying that setting the timeout in the TCP/IP config works for AWS as well.
At least, this is the only way to set it through the API.

Peter Veentjer

May 23, 2013, 4:15:21 AM
to haze...@googlegroups.com
OK.

Question: do you have other EC2 instances running that are not related to your Hazelcast cluster?

If you do, can you specify a security group or a tag-key/tag-value to make sure that only the instances of your cluster are contacted?

Iso

May 23, 2013, 4:41:36 AM
to haze...@googlegroups.com
Yes, it seems so (or must be).
(The whole EC2 setup is not my responsibility, not even my team's, so I am not really sure :) I just have to, and want to, prove that this is not really a Hazelcast issue.)

But you already helped me with your first post, thank you.
As you wrote in your answer: "..... in your current situation it tries to connect to all ec2 instances within a specific region....". I double-checked the AWS documentation etc. Thanks. This point is now clear.

The last thing I don't understand is why it always tries 3 different ports (5701-5703). Is it because I enabled port auto-increment?
I enabled port auto-increment for the test environment so that we can test multiple Hazelcast instances on the same machine/EC2 instance.

Thanks for your help.
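
The port question is not answered in this thread. One reading that fits the log, offered as an assumption and not as confirmed Hazelcast behaviour: AWS discovery yields only IP addresses, and with port auto-increment a member on a given address may be listening on 5701 or on one of the following ports, so the joining member probes a few consecutive ports per address. A minimal plain-Java sketch of that idea follows. PortProbeSketch, candidateAddresses and ASSUMED_PORT_TRIES are hypothetical names, and the value 3 is simply read off the log (5701-5703).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is NOT Hazelcast's join code.
final class PortProbeSketch {

    // Assumption inferred from the log: three consecutive ports are tried per host.
    static final int ASSUMED_PORT_TRIES = 3;

    // Expands discovered IPs into the host:port pairs a joining member would probe.
    // Without port auto-increment only the base port is considered.
    static List<String> candidateAddresses(List<String> ips, int basePort,
                                           boolean portAutoIncrement) {
        int tries = portAutoIncrement ? ASSUMED_PORT_TRIES : 1;
        List<String> addresses = new ArrayList<>();
        for (String ip : ips) {
            for (int offset = 0; offset < tries; offset++) {
                addresses.add(ip + ":" + (basePort + offset));
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        // The two unreachable addresses from the log in the first post.
        List<String> discovered = Arrays.asList("10.211.174.18", "10.139.10.41");
        for (String address : candidateAddresses(discovered, 5701, true)) {
            System.out.println(address);
        }
    }
}
```

If this reading is right, every unrelated instance in the region contributes three "Could not connect" lines, so restricting discovery with a security group or tag should remove most of them.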