Nathan, thank you very much for your reply.
I am using 0.6.2, and there are no supervisor.worker.* configs.
It seems that the worker first found that its connection to ZooKeeper
was broken, and then, about 20 seconds later, tried to reconnect to the
same session on ZooKeeper, which had already timed out and been deleted.
The worker then tried to rebuild the connection to ZooKeeper and failed.
I cannot find any application log in the worker log, so I don't know
what happened.
In my bolt, I start a timer task (every 5 minutes) that runs some
time-consuming work (a few seconds each run). Will that affect Storm's
performance?
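Whether a few seconds of work every 5 minutes matters depends partly on which thread runs it. Below is a minimal sketch, assuming the periodic job is kept on its own daemon thread instead of inside the bolt's execute(). The class `PeriodicFlusher` and its methods are hypothetical, and the Storm bolt API is left out so the example stays self-contained; where prepare()/execute()/cleanup() would call it is indicated only in comments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper a bolt could own: execute() calls record(), and a
// background thread runs the slow periodic job so tuple processing is not blocked.
final class PeriodicFlusher {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();
    private final AtomicInteger flushCount = new AtomicInteger();
    private volatile Map<String, Long> lastSnapshot = new HashMap<>();
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-flush");
            t.setDaemon(true); // do not keep the worker JVM alive on shutdown
            return t;
        });

    // Call once, e.g. from the bolt's prepare(). scheduleWithFixedDelay waits
    // intervalMs after each run finishes, so a slow run never overlaps the next.
    void start(long initialDelayMs, long intervalMs) {
        timer.scheduleWithFixedDelay(this::flush, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Cheap and thread-safe; suitable for calling on every tuple.
    void record(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // The periodic job: move the current counts into a snapshot, then do the
    // slow work (e.g. a database write) off the tuple-processing thread.
    void flush() {
        try {
            Map<String, Long> snapshot = new HashMap<>();
            for (String key : counts.keySet()) {
                Long value = counts.remove(key);
                if (value != null) {
                    snapshot.put(key, value);
                }
            }
            // ... slow work on `snapshot` would go here ...
            lastSnapshot = snapshot;
            flushCount.incrementAndGet();
        } catch (RuntimeException e) {
            // An exception escaping the task would make the executor suppress
            // all later runs, so report it and keep the schedule alive.
            System.err.println("periodic flush failed: " + e);
        }
    }

    Map<String, Long> lastSnapshot() { return lastSnapshot; }

    int flushCount() { return flushCount.get(); }

    // Call from the bolt's cleanup().
    void stop() { timer.shutdownNow(); }
}
```

The interval passed to start() could come from a setting such as "SubPartner_RUN_INTERVAL" "300000" in the submitted conf (an assumption about how that value is used). scheduleWithFixedDelay was chosen over scheduleAtFixedRate so that a slow run pushes the next one back rather than letting late runs start immediately one after another.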
The worker's log looks like this:
2012-04-04 14:45:22 ClientCnxn [INFO] Client session timed out, have not heard from server in 13400ms for sessionid 0x136695d83c700b1, closing socket connection and attempting reconnect
2012-04-04 14:45:33 ConnectionStateManager [INFO] State change: SUSPENDED
2012-04-04 14:45:33 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2012-04-04 14:45:36 ClientCnxn [INFO] Opening socket connection to server data1.wf.ppweb.com.cn/192.168.0.101:2181
2012-04-04 14:45:36 ClientCnxn [INFO] Socket connection established to data1.wf.ppweb.com.cn/192.168.0.101:2181, initiating session
2012-04-04 14:45:46 cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2012-04-04 14:45:47 ConnectionState [WARN] Session expired event received
2012-04-04 14:45:47 ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x136695d83c700b1 has expired, closing socket connection
2012-04-04 14:45:47 ZooKeeper [INFO] Initiating client connection, connectString=192.168.0.101:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@24cd59e9
2012-04-04 14:45:47 ClientCnxn [INFO] Opening socket connection to server /192.168.0.101:2181
2012-04-04 14:45:50 ClientCnxn [INFO] Socket connection established to data1.wf.ppweb.com.cn/192.168.0.101:2181, initiating session
2012-04-04 14:45:50 ConnectionStateManager [INFO] State change: LOST
2012-04-04 14:45:50 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2012-04-04 14:45:50 cluster [WARN] Received event :expired::none: with disconnected Zookeeper.
2012-04-04 14:45:53 ClientCnxn [INFO] Session establishment complete on server data1.wf.ppweb.com.cn/192.168.0.101:2181, sessionid = 0x136695d83c700ca, negotiated timeout = 20000
2012-04-04 14:45:53 ConnectionStateManager [INFO] State change: RECONNECTED
2012-04-04 14:45:53 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2012-04-04 14:45:59 ClientCnxn [INFO] EventThread shut down
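The timings in the worker log above can be checked against ZooKeeper's timeout rules. A small sketch, assuming stock ZooKeeper client behaviour: the client gives up on a connection after hearing nothing from the server for 2/3 of the negotiated session timeout (20000 * 2 / 3 = 13333 ms, just under the 13400 ms in the first line), and the server expires a session it has not heard from for roughly the full session timeout. The "silence" figure is only a rough estimate from second-resolution timestamps, and it treats the time since the client last heard from the server as a stand-in for the time since the server last heard from the client. `ZkTimeouts` and its methods are hypothetical names.

```java
// Hypothetical helper reproducing ZooKeeper's client/server timeout arithmetic.
final class ZkTimeouts {
    // The ZooKeeper client declares the connection dead after hearing nothing
    // from the server for 2/3 of the negotiated session timeout.
    static int clientReadTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    // The server expires a session it has not heard from for longer than
    // (roughly) the session timeout; after that, the same session id cannot be reused.
    static boolean sessionExpired(long silenceMs, int negotiatedSessionTimeoutMs) {
        return silenceMs > negotiatedSessionTimeoutMs;
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 20000;  // "negotiated timeout = 20000" in the log
        long lastHeardMs = 13400;      // "not heard from server in 13400ms" at 14:45:22
        long untilReconnectMs = 14000; // 14:45:22 -> 14:45:36 (second resolution only)
        long silenceMs = lastHeardMs + untilReconnectMs;

        // prints "client read timeout: 13333 ms"
        System.out.println("client read timeout: " + clientReadTimeoutMs(sessionTimeoutMs) + " ms");
        // prints "rough silence before reconnect: 27400 ms"
        System.out.println("rough silence before reconnect: " + silenceMs + " ms");
        // prints "session expired: true"
        System.out.println("session expired: " + sessionExpired(silenceMs, sessionTimeoutMs));
    }
}
```

Under these assumptions the worker was out of contact for well over 20 seconds, which is consistent with the "session ... has expired" message when it finally reconnected.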
My nimbus log looks like this:
2012-04-01 01:00:22 nimbus [INFO] Cleaning inbox ... deleted: stormjar-2b2796d7-a3d1-41ed-8099-e6784e3dbf97.jar
2012-04-01 16:25:49 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:2 timed out
2012-04-01 16:25:49 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:12 timed out
2012-04-01 16:25:49 nimbus [INFO] Reassigning RealTimeAnalysis-1-1333209246 to 10 slots
2012-04-01 16:25:49 nimbus [INFO] Reassign ids: [2 12]
2012-04-01 16:25:49 nimbus [INFO] Available slots: (["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023] ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10023])
2012-04-01 16:25:49 nimbus [INFO] Setting new assignment for storm id RealTimeAnalysis-1-1333209246:
  #:backtype.storm.daemon.common.Assignment{:master-code-dir "/home/op/work/storm/state/nimbus/stormdist/RealTimeAnalysis-1-1333209246",
  :node->host {"c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" "data0.wf.ppweb.com.cn",
  "80954a44-5bc6-4a67-aa98-2794447c5fd6" "data1.wf.ppweb.com.cn",
  "b5b94eb7-485a-493b-8780-d9835dab3e2e" "data2.wf.ppweb.com.cn"},
  :task->node+port {1 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  2 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  3 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  4 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  5 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  6 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  7 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  8 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  9 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  10 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  11 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  12 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  13 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  14 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  15 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  16 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  17 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  18 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  19 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  20 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  21 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020]},
  :task->start-time-secs {1 1333209247, 2 1333268749, 3 1333209247, 4 1333209247, 5 1333209247, 6 1333209247, 7 1333209247,
  8 1333209247, 9 1333209247, 10 1333209247, 11 1333209247, 12 1333268749, 13 1333209247, 14 1333209247,
  15 1333209247, 16 1333209247, 17 1333209247, 18 1333209247, 19 1333209247, 20 1333209247, 21 1333209247}}
2012-04-01 17:48:44 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:14 timed out
2012-04-01 17:48:44 nimbus [INFO] Reassigning RealTimeAnalysis-1-1333209246 to 10 slots
2012-04-01 17:48:44 nimbus [INFO] Reassign ids: [4 14]
2012-04-01 17:48:44 nimbus [INFO] Available slots: (["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020] ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10023])
2012-04-01 17:48:44 nimbus [INFO] Setting new assignment for storm id RealTimeAnalysis-1-1333209246:
  #:backtype.storm.daemon.common.Assignment{:master-code-dir "/home/op/work/storm/state/nimbus/stormdist/RealTimeAnalysis-1-1333209246",
  :node->host {"c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" "data0.wf.ppweb.com.cn",
  "80954a44-5bc6-4a67-aa98-2794447c5fd6" "data1.wf.ppweb.com.cn",
  "b5b94eb7-485a-493b-8780-d9835dab3e2e" "data2.wf.ppweb.com.cn"},
  :task->node+port {1 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  2 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  3 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  4 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  5 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  6 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  7 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  8 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  9 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  10 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  11 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  12 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  13 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  14 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  15 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  16 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  17 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  18 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  19 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  20 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  21 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020]},
  :task->start-time-secs {1 1333209247, 2 1333268749, 3 1333209247, 4 1333273724, 5 1333209247, 6 1333209247, 7 1333209247,
  8 1333209247, 9 1333209247, 10 1333209247, 11 1333209247, 12 1333268749, 13 1333209247, 14 1333273724,
  15 1333209247, 16 1333209247, 17 1333209247, 18 1333209247, 19 1333209247, 20 1333209247, 21 1333209247}}
2012-04-01 18:19:34 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:10 timed out
2012-04-01 18:19:34 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:20 timed out
2012-04-01 18:19:34 nimbus [INFO] Reassigning RealTimeAnalysis-1-1333209246 to 10 slots
2012-04-01 18:19:34 nimbus [INFO] Reassign ids: [10 20]
2012-04-01 18:19:34 nimbus [INFO] Available slots: (["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021] ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10023])
2012-04-01 18:19:34 nimbus [INFO] Setting new assignment for storm id RealTimeAnalysis-1-1333209246:
  #:backtype.storm.daemon.common.Assignment{:master-code-dir "/home/op/work/storm/state/nimbus/stormdist/RealTimeAnalysis-1-1333209246",
  :node->host {"c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" "data0.wf.ppweb.com.cn",
  "80954a44-5bc6-4a67-aa98-2794447c5fd6" "data1.wf.ppweb.com.cn",
  "b5b94eb7-485a-493b-8780-d9835dab3e2e" "data2.wf.ppweb.com.cn"},
  :task->node+port {1 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  2 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  3 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  4 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  5 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  6 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  7 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  8 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  9 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  10 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  11 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  12 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  13 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  14 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  15 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  16 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  17 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  18 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  19 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  20 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  21 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020]},
  :task->start-time-secs {1 1333209247, 2 1333268749, 3 1333209247, 4 1333273724, 5 1333209247, 6 1333209247, 7 1333209247,
  8 1333209247, 9 1333209247, 10 1333275574, 11 1333209247, 12 1333268749, 13 1333209247, 14 1333273724,
  15 1333209247, 16 1333209247, 17 1333209247, 18 1333209247, 19 1333209247, 20 1333275574, 21 1333209247}}
2012-04-01 19:16:42 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:7 timed out
2012-04-01 19:16:42 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:17 timed out
2012-04-01 19:16:42 nimbus [INFO] Reassigning RealTimeAnalysis-1-1333209246 to 10 slots
2012-04-01 19:16:42 nimbus [INFO] Reassign ids: [7 17]
2012-04-01 19:16:42 nimbus [INFO] Available slots: (["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023] ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10023])
2012-04-01 19:16:42 nimbus [INFO] Setting new assignment for storm id RealTimeAnalysis-1-1333209246:
  #:backtype.storm.daemon.common.Assignment{:master-code-dir "/home/op/work/storm/state/nimbus/stormdist/RealTimeAnalysis-1-1333209246",
  :node->host {"c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" "data0.wf.ppweb.com.cn",
  "80954a44-5bc6-4a67-aa98-2794447c5fd6" "data1.wf.ppweb.com.cn",
  "b5b94eb7-485a-493b-8780-d9835dab3e2e" "data2.wf.ppweb.com.cn"},
  :task->node+port {1 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  2 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  3 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  4 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  5 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  6 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  7 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  8 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  9 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  10 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  11 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  12 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  13 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  14 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  15 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  16 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  17 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  18 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  19 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10022],
  20 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  21 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020]},
  :task->start-time-secs {1 1333209247, 2 1333268749, 3 1333209247, 4 1333273724, 5 1333209247, 6 1333209247, 7 1333279002,
  8 1333209247, 9 1333209247, 10 1333275574, 11 1333209247, 12 1333268749, 13 1333209247, 14 1333273724,
  15 1333209247, 16 1333209247, 17 1333279002, 18 1333209247, 19 1333209247, 20 1333275574, 21 1333209247}}
2012-04-01 19:47:44 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:9 timed out
2012-04-01 19:47:44 nimbus [INFO] Task RealTimeAnalysis-1-1333209246:19 timed out
2012-04-01 19:47:44 nimbus [INFO] Reassigning RealTimeAnalysis-1-1333209246 to 10 slots
2012-04-01 19:47:44 nimbus [INFO] Reassign ids: [9 19]
2012-04-01 19:47:44 nimbus [INFO] Available slots: (["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022] ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10023])
2012-04-01 19:47:44 nimbus [INFO] Setting new assignment for storm id RealTimeAnalysis-1-1333209246:
  #:backtype.storm.daemon.common.Assignment{:master-code-dir "/home/op/work/storm/state/nimbus/stormdist/RealTimeAnalysis-1-1333209246",
  :node->host {"c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" "data0.wf.ppweb.com.cn",
  "80954a44-5bc6-4a67-aa98-2794447c5fd6" "data1.wf.ppweb.com.cn",
  "b5b94eb7-485a-493b-8780-d9835dab3e2e" "data2.wf.ppweb.com.cn"},
  :task->node+port {1 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  2 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  3 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  4 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  5 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  6 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  7 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  8 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  9 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  10 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  11 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020],
  12 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10023],
  13 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10020],
  14 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10020],
  15 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10021],
  16 ["c45b8f05-4f2c-4a9f-85fb-5e64f6a31d81" 10021],
  17 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10023],
  18 ["80954a44-5bc6-4a67-aa98-2794447c5fd6" 10022],
  19 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10022],
  20 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10021],
  21 ["b5b94eb7-485a-493b-8780-d9835dab3e2e" 10020]},
  :task->start-time-secs {1 1333209247, 2 1333268749, 3 1333209247, 4 1333273724, 5 1333209247, 6 1333209247, 7 1333279002,
  8 1333209247, 9 1333280864, 10 1333275574, 11 1333209247, 12 1333268749, 13 1333209247, 14 1333273724,
  15 1333209247, 16 1333209247, 17 1333279002, 18 1333209247, 19 1333280864, 20 1333275574, 21 1333209247}}
2012-04-01 20:00:28 nimbus [INFO] Delaying event :remove for 30 secs for RealTimeAnalysis-1-1333209246
2012-04-01 20:00:28 nimbus [INFO] Updated RealTimeAnalysis-1-1333209246 with status {:type :killed, :kill-time-secs 30}
2012-04-01 20:00:58 nimbus [INFO] Killing topology: RealTimeAnalysis-1-1333209246
2012-04-01 20:00:59 nimbus [INFO] Cleaning up RealTimeAnalysis-1-1333209246
2012-04-01 20:03:24 nimbus [INFO] Uploading file from client to /home/op/work/storm/state/nimbus/inbox/stormjar-edb4c029-7c1b-408d-8af3-176771aafb34.jar
2012-04-01 20:03:24 nimbus [INFO] Finished uploading file from client: /home/op/work/storm/state/nimbus/inbox/stormjar-edb4c029-7c1b-408d-8af3-176771aafb34.jar
2012-04-01 20:03:24 nimbus [INFO] Received topology submission for RealTimeAnalysis with conf {"storm.id" "RealTimeAnalysis-2-1333281804", "DB_PORT" "27017",
  "Global_DB_COLLECTION" "globalState", "Partner_DB_COLLECTION" "partnerState", "SubPartner_DB_COLLECTION" "subPartnerState",
  "com.yuncheng.realtime.sourceCount" "3", "DB_NAME" "p2pstatus", "DB_USER" "p2pstatus", "LoadBalanceSubPartner_RUN_DELAY" "10",
  "SubPartner_RUN_INTERVAL" "300000", "Partner_RUN_INTERVAL" "300000", "com.yuncheng.realtime.cluster" "wf", "Global_RUN_DELAY" "20000",
  "DB_ADDRESS" "bj2.ppweb.com.cn", "com.yuncheng.realtime.source_0" "192.168.0.100:10100", "topology.kryo.register" nil,
  "Global_RUN_INTERVAL" "300000", "topology.workers" 10, "com.yuncheng.realtime.source_1" "192.168.0.101:10100",
  "com.yuncheng.realtime.source_2" "192.168.0.102:10100", "SubPartner_RUN_DELAY" "5000", "Partner_RUN_DELAY" "10000",
  "LoadBalanceSubPartner_RUN_INTERVAL" "10000", "display_port" "tcp://192.168.0.100:20000", "DB_PASSWD" "status"}