On 31.10.18 12:42, Simeon Angelov wrote:
> Hi all,
>
> I would like to ask for suggestions on improving our JGroups
> configuration, or for improvements in general, in order to mitigate
> some issues we faced during our last sales period.
>
> Set up of the environment: We are using an e-commerce platform running
> on Hybris (Tomcat) that uses jgroups version 3.6.16. Please note that we
> are using TCP as opposed to UDP because AWS can't support multicast on
> our VPC.
>
> Issues:
>
> 1. We get the error "JGRP000034: hybrisnode-8355: failure sending message to
> hybrisnode-8369: java.net.SocketTimeoutException: connect timed out",
> especially when we scale some of the nodes in the cluster up or down as
> user traffic increases significantly and the nodes in the cluster come
> under heavy load (CPU above 80-90%).
> Is sock_conn_timeout="300" what we need to increase in order to avoid
> this SocketTimeoutException? Are there any other dependent
> configuration properties we should keep in mind when changing it?
In general, a low timeout is fine: if a connection cannot be
established, the next attempt might succeed (retransmission will take
care of that).
> 2. At some point an AWS EC2 instance (node) receives a SIGKILL (9), e.g.
> when a scale-down policy is executed or the instance is being
> terminated. This forces the application to stop and the EC2 instance to
> be terminated. What we see occasionally (not always) is that the other
> nodes in the cluster do not recognize that this node has left the
> cluster / has already been terminated.
This would be detected by FD_SOCK (if the sockets of the process are
closed) or by FD (if FD_SOCK doesn't detect this). I recommend replacing
FD with FD_ALL, which is much faster at detecting multiple member crashes.
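
As a rough sketch of that suggestion, the failure-detection part of a
3.6-style TCP stack could look like the fragment below. The timeout and
interval values are illustrative assumptions, not values given in this
thread, and the rest of the stack is omitted.

```xml
<!-- Fragment of a JGroups 3.6 protocol stack; values are illustrative only -->
<FD_SOCK/>
<!-- Takes the place of <FD .../>: a member is suspected when no heartbeat
     (sent every `interval` ms) has been received from it for `timeout` ms -->
<FD_ALL timeout="12000" interval="3000"/>
<VERIFY_SUSPECT timeout="1500"/>
```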
> And so the other nodes keep trying to send messages to that node (even
> though it has already been terminated). It is as if the terminated EC2
> instance could not notify the other EC2 instances that it is out of the
> cluster. We can tell because we again get the
> "JGRP000034: hybrisnode-8355: failure sending message to
> hybrisnode-8369: java.net.SocketTimeoutException: connect timed out"
> error, where the destination EC2 instance of the message is the node
> that has been terminated.
> Again, this does not happen every time; we see it only occasionally.
> Is there a property we could change in order to mitigate that situation
If UNICAST3 tries to retransmit to a member that has died, then
max_retransmit_time will remove that connection after a given time and
the sending will stop. Also take a look at conn_close_timeout.
I suggest upgrading from UNICAST to UNICAST3 anyway.
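
For orientation, the two UNICAST3 properties mentioned above are set as
attributes on the protocol element. The values below are illustrative
assumptions rather than recommendations from this thread; the JGroups
manual for the version in use has the exact semantics and defaults.

```xml
<!-- Illustrative values only, not taken from the thread -->
<!-- max_retransmit_time: after this many ms, the connection to a member that
     is no longer in the cluster is removed and retransmission to it stops -->
<!-- conn_close_timeout: how long (ms) a connection marked for closing is
     kept before it is removed -->
<UNICAST3 max_retransmit_time="60000"
          conn_close_timeout="10000"/>
```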
> ? And what could be the reason that the terminated EC2 instance is not
> recognized as terminated by the other EC2 instances?
It *should* be recognized after ca. 10 seconds (3 * 3 s in FD plus 1.5 s
in VERIFY_SUSPECT), or immediately by FD_SOCK. I have not seen such
behavior in my own AWS tests...
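
The arithmetic above corresponds to settings like the following. This is
a sketch that assumes the stack uses FD with a 3000 ms timeout and 3
tries plus a 1500 ms VERIFY_SUSPECT timeout, as the numbers in the reply
imply; the poster's actual configuration is not shown in this thread.

```xml
<FD_SOCK/>
<!-- 3 tries x 3000 ms = 9000 ms until an unresponsive member is suspected -->
<FD timeout="3000" max_tries="3"/>
<!-- plus up to 1500 ms to verify the suspicion: about 10500 ms in total -->
<VERIFY_SUSPECT timeout="1500"/>
```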
> 3. We also see, again occasionally, that even though an AWS EC2 instance
> is in the cluster, messages sent to that node produce the "JGRP000034:"
> error for a while, a couple of minutes, let's say.
> What could be the reason behind that? Is the destination EC2 instance
> too busy to accept any kind of JGroups messages, so to speak? Could we
> mitigate that case with a configuration change or in some other way?
Not sure I understand what you mean... do you have any firewalls in place?
--
Bela Ban | http://www.jgroups.org