Cluster node not rejoining


Duke Tantiprasut

Aug 14, 2010, 10:58:17 PM
to haze...@googlegroups.com, ta...@hazelcast.com
Hi all,

Any ideas why a node would be added to a "dead" list and not be allowed to rejoin once network connectivity is restored (see the SERVER 2 log entries for Aug 11 below)?

Thanks
Duke

--- SERVER1 ---

Jul 26, 2010 4:23:51 PM com.hazelcast.config.XmlConfigBuilder
INFO: Using configuration file at /apps/resolve/rscontrol/config/hazelcast.xml
Jul 26, 2010 4:23:51 PM com.hazelcast.config.XmlConfigBuilder
INFO: Using configuration file at /apps/resolve/rscontrol/config/hazelcast.xml
Jul 26, 2010 4:23:52 PM com.hazelcast.system
INFO: [dev] Hazelcast 1.8.4 (20100525) starting at Address[169.70.34.153:5701]
Jul 26, 2010 4:23:52 PM com.hazelcast.system
INFO: [dev] Copyright (C) 2008-2010 Hazelcast.com
Jul 26, 2010 4:23:59 PM com.hazelcast.cluster.ClusterManager
INFO: [dev]

Members [2] {
       Member [169.70.34.150:5701]
       Member [169.70.34.153:5701] this
}

log4j:ERROR Failed to flush writer,
java.io.InterruptedIOException
       at java.io.FileOutputStream.writeBytes(Native Method)
       at java.io.FileOutputStream.write(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(Unknown Source)
       at sun.nio.cs.StreamEncoder.flush(Unknown Source)
       at java.io.OutputStreamWriter.flush(Unknown Source)
       at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:57)
       at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:315)
       at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:234)
       at org.apache.log4j.WriterAppender.append(WriterAppender.java:159)
       at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
       at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
       at org.apache.log4j.Category.callAppenders(Category.java:203)
       at org.apache.log4j.Category.forcedLog(Category.java:388)
       at org.apache.log4j.Category.debug(Category.java:257)
       at com.resolve.rscontrol.MAction.removeProcessCleanup(MAction.java:727)
       at com.resolve.rscontrol.MAction.removeProcess(MAction.java:684)
       at com.resolve.rscontrol.MAction.abortProcess(MAction.java:608)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.lang.reflect.Method.invoke(Unknown Source)
       at com.resolve.thread.ExecutorTask.execute(ExecutorTask.java:268)
       at com.resolve.thread.ExecutorTask.call(ExecutorTask.java:114)
       at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
       at java.util.concurrent.FutureTask.run(Unknown Source)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
       at java.lang.Thread.run(Unknown Source)
Aug 11, 2010 11:26:20 PM com.hazelcast.impl.PartitionManager
INFO: [dev] Address[169.70.34.153:5701] will backup 615
Aug 11, 2010 11:26:20 PM com.hazelcast.cluster.ClusterManager
INFO: [dev]

Members [1] {
       Member [169.70.34.153:5701] this
}

--- SERVER 2 ---
Jun 15, 2010 10:43:21 AM com.hazelcast.config.XmlConfigBuilder
INFO: Using configuration file at /apps/resolve/rscontrol/config/hazelcast.xml
Jun 15, 2010 10:43:22 AM com.hazelcast.config.XmlConfigBuilder
INFO: Using configuration file at /apps/resolve/rscontrol/config/hazelcast.xml
Jun 15, 2010 10:43:22 AM com.hazelcast.system
INFO: [dev] Hazelcast 1.8.4 (20100525) starting at Address[169.70.34.150:5701]
Jun 15, 2010 10:43:22 AM com.hazelcast.system
INFO: [dev] Copyright (C) 2008-2010 Hazelcast.com
Jun 15, 2010 10:43:30 AM com.hazelcast.cluster.ClusterManager
INFO: [dev]

Members [2] {
       Member [169.70.34.153:5701]
       Member [169.70.34.150:5701] this
}

log4j:ERROR Failed to flush writer,
java.io.InterruptedIOException
       at java.io.FileOutputStream.writeBytes(Native Method)
       at java.io.FileOutputStream.write(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(Unknown Source)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(Unknown Source)
       at sun.nio.cs.StreamEncoder.flush(Unknown Source)
       at java.io.OutputStreamWriter.flush(Unknown Source)
       at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:57)
       at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:315)
       at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:234)
       at org.apache.log4j.WriterAppender.append(WriterAppender.java:159)
       at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
       at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
       at org.apache.log4j.Category.callAppenders(Category.java:203)
       at org.apache.log4j.Category.forcedLog(Category.java:388)
       at org.apache.log4j.Category.debug(Category.java:257)
       at com.resolve.rscontrol.MAction.removeProcessCleanup(MAction.java:727)
       at com.resolve.rscontrol.MAction.removeProcess(MAction.java:684)
       at com.resolve.rscontrol.MAction.abortProcess(MAction.java:608)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.lang.reflect.Method.invoke(Unknown Source)
       at com.resolve.thread.ExecutorTask.execute(ExecutorTask.java:268)
       at com.resolve.thread.ExecutorTask.call(ExecutorTask.java:114)
       at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
       at java.util.concurrent.FutureTask.run(Unknown Source)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
       at java.lang.Thread.run(Unknown Source)
Jul 26, 2010 4:22:49 PM com.hazelcast.impl.PartitionManager
INFO: [dev] Address[169.70.34.150:5701] will backup 599
Jul 26, 2010 4:22:49 PM com.hazelcast.cluster.ClusterManager
INFO: [dev]

Members [1] {
       Member [169.70.34.150:5701] this
}

Jul 26, 2010 4:23:59 PM com.hazelcast.cluster.ClusterManager
INFO: [dev]

Members [2] {
       Member [169.70.34.150:5701] this
       Member [169.70.34.153:5701]
}

Aug 11, 2010 11:26:49 PM com.hazelcast.cluster.ClusterManager
WARNING: [dev] Added Address[169.70.34.153:5701] to list of dead addresses because of timeout since last read
Aug 11, 2010 11:26:49 PM com.hazelcast.impl.PartitionManager
INFO: [dev] Address[169.70.34.150:5701] will backup 615
Aug 11, 2010 11:26:49 PM com.hazelcast.cluster.ClusterManager
INFO: [dev]

Members [1] {
       Member [169.70.34.150:5701] this
}


--
Duke Tantiprasut
Chief Technology Officer (CTO)
GenerationE Technologies
Email: duke.tan...@generationetech.com
Office: 949-325-0103
Cell: 858-232-8287

Talip Ozturk

Aug 15, 2010, 1:20:45 AM
to Duke Tantiprasut, haze...@googlegroups.com
It is because 169.70.34.153 couldn't send any heartbeat (any message,
any request) to 169.70.34.150 for a long time
(GroupProperties.MAX_NO_HEARTBEAT_SECONDS), or 169.70.34.150 couldn't
receive anything, possibly because of a long GC pause on one of the
two servers.
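
To make the timeout rule above concrete, here is a minimal sketch of a "timeout since last read" check. It is not Hazelcast's ClusterManager code; the class name HeartbeatTracker, its methods, and the 30-second limit in main are assumptions chosen only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative tracker for "timeout since last read"; hypothetical, not Hazelcast code. */
public class HeartbeatTracker {
    private final long maxNoHeartbeatMillis;
    // Last time (in milliseconds) anything was read from each member address.
    private final Map<String, Long> lastReadMillis = new HashMap<>();

    public HeartbeatTracker(long maxNoHeartbeatSeconds) {
        this.maxNoHeartbeatMillis = maxNoHeartbeatSeconds * 1000L;
    }

    /** Record that any message (heartbeat or request) arrived from the member. */
    public void onRead(String address, long nowMillis) {
        lastReadMillis.put(address, nowMillis);
    }

    /** Remove and return every member whose last read is older than the limit. */
    public Set<String> removeDead(long nowMillis) {
        Set<String> dead = new HashSet<>();
        for (Map.Entry<String, Long> entry : lastReadMillis.entrySet()) {
            if (nowMillis - entry.getValue() > maxNoHeartbeatMillis) {
                dead.add(entry.getKey());
            }
        }
        lastReadMillis.keySet().removeAll(dead);
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(30); // 30 s is an arbitrary example value
        tracker.onRead("169.70.34.153:5701", 0L);
        System.out.println(tracker.removeDead(10_000L)); // prints []
        System.out.println(tracker.removeDead(31_000L)); // prints [169.70.34.153:5701]
    }
}
```

Note that in this sketch any received message resets the timer, which matches the "any message, any request" wording above; a paused sender and a paused receiver look identical to the check.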

Although you can set the MAX_NO_HEARTBEAT_SECONDS property to a
higher value, consider this: if Hazelcast didn't receive anything
for that long, your JVM wasn't responsive for that long. If you can
prove that your JVM was responsive, then it is a Hazelcast bug.
Otherwise you have to fix the long GC pause problem in your
application.
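
One way to gather evidence about JVM responsiveness is a small watchdog thread that sleeps for a fixed interval and records how much longer than expected it actually slept. The sketch below is only an assumption about how such a check might look; PauseDetector and its 100 ms interval are hypothetical and not part of Hazelcast.

```java
/** Illustrative pause detector; hypothetical helper, not part of Hazelcast. */
public class PauseDetector implements Runnable {
    private final long intervalMillis;
    private volatile long maxPauseMillis = 0L;

    public PauseDetector(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** How far the measured sleep overshot the requested interval (never negative). */
    public static long pauseMillis(long intervalMillis, long elapsedMillis) {
        return Math.max(0L, elapsedMillis - intervalMillis);
    }

    public long getMaxPauseMillis() {
        return maxPauseMillis;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long elapsed = (System.nanoTime() - start) / 1_000_000L;
            long pause = pauseMillis(intervalMillis, elapsed);
            // A large overshoot means this thread could not run, e.g. during a GC pause.
            if (pause > maxPauseMillis) {
                maxPauseMillis = pause;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PauseDetector detector = new PauseDetector(100L);
        Thread watchdog = new Thread(detector, "pause-detector");
        watchdog.setDaemon(true);
        watchdog.start();
        Thread.sleep(500L); // let it sample for a short while
        watchdog.interrupt();
        System.out.println("Longest observed pause (ms): " + detector.getMaxPauseMillis());
    }
}
```

If the longest observed pause stays far below the heartbeat limit around the time a member is dropped, that points away from a stalled JVM; a pause close to the limit points toward it.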


twitter @oztalip

Duke

Aug 16, 2010, 12:36:04 PM
to Hazelcast
Hi Talip,

Is there a way to configure GroupProperties.MAX_NO_HEARTBEAT_SECONDS
from hazelcast.xml or via a system property
(hazelcast.max.no.heartbeat.seconds)?

Thanks
Duke

Duke

Aug 16, 2010, 12:15:45 PM
to Hazelcast
Thanks, Talip. I just got the response back from the customer. It looks
like the JVM is responsive and replies to JMS messages sent to it. I'll
set MAX_NO_HEARTBEAT_SECONDS to a high number in the meantime.
Please let me know if you see any issue on your end.

Thanks
Duke


Talip Ozturk

Aug 18, 2010, 7:02:50 AM
to haze...@googlegroups.com
Yes, you can set it as a system property or as a property in the hazelcast.xml file.

<hazelcast>
    <properties>
        <hazelcast.max.no.heartbeat.seconds>180</hazelcast.max.no.heartbeat.seconds>
    </properties>
    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
    .......

-talip
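
For the system-property route mentioned above, a minimal sketch could look like the following. The class HeartbeatProperty and its configure method are hypothetical; only the property name and the value 180 come from this thread. The sketch assumes the property has to be set before Hazelcast is first initialized in the JVM.

```java
/** Illustrative helper for setting the heartbeat limit as a system property. */
public class HeartbeatProperty {
    public static final String KEY = "hazelcast.max.no.heartbeat.seconds";

    /** Set the limit programmatically; assumed to be called before Hazelcast starts. */
    public static void configure(int seconds) {
        System.setProperty(KEY, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        configure(180);
        System.out.println(KEY + "=" + System.getProperty(KEY));
        // Start Hazelcast after this point so it can pick the value up.
    }
}
```

The same value can be passed on the command line as -Dhazelcast.max.no.heartbeat.seconds=180 when launching the JVM.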

