We have a very similar problem. A topology was working flawlessly, then one day it hung.
Nimbus and the supervisor start up fine. The error only appears when we try to deploy the topology, and the deployment fails.
- Restarting ==> does not help
- Cleaning storm data directories + restarting ==> does not help
- Cleaning zookeeper + cleaning storm data directories + restarting ==> does not help.
Nimbus log (fragments):
2013-10-16 11:14:12 b.s.d.nimbus [DEBUG] Assignment for EsperTopology-1-1381914842 hasn't changed
2013-10-16 11:14:12 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 82,12 replyHeader:: 82,22,0 request:: '/storm/workerbeats,F response:: v{'EsperTopology-1-1381914842},s{13,13,1381914838566,1381914838566,0,1,0,0,1,1,18}
2013-10-16 11:14:12 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 83,12 replyHeader:: 83,22,0 request:: '/storm/errors,F response:: v{},s{14,14,1381914838574,1381914838574,0,0,0,0,1,0,14}
2013-10-16 11:14:12 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 84,12 replyHeader:: 84,22,0 request:: '/storm/storms,F response:: v{'EsperTopology-1-1381914842},s{10,10,1381914838538,1381914838538,0,1,0,0,1,1,19}
2013-10-16 11:14:19 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x141c08aef2e0003 after 1ms
2013-10-16 11:14:23 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 85,12 replyHeader:: 85,40,0 request:: '/storm/storms,F response:: v{},s{10,10,1381914838538,1381914838538,0,2,0,0,1,0,31}
2013-10-16 11:14:23 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 86,12 replyHeader:: 86,40,0 request:: '/storm/assignments,F response:: v{},s{9,9,1381914838531,1381914838531,0,2,0,0,1,0,30}
2013-10-16 11:14:23 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 87,12 replyHeader:: 87,40,0 request:: '/storm/supervisors,F response:: v{'83464703-06e0-4697-b8d6-08f34bfce926},s{11,11,1381914838549,1381914838549,0,1,0,0,1,1,16}
2013-10-16 11:14:23 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0003, packet:: clientPath:null serverPath:null finished:false header:: 88,3 replyHeader:: 88,40,0 request:: '/storm/supervisors/83464703-06e0-4697-b8d6-08f34bfce926,F response:: s{16,32,1381914838648,1381914861412,5,0,0,90565170474909698,957,0,16}
Supervisor log:
2013-10-16 11:14:19 b.s.d.supervisor [INFO] 7aab2452-23d3-4084-9e81-d9c79a07b016 still hasn't started
2013-10-16 11:14:20 b.s.d.supervisor [INFO] 7aab2452-23d3-4084-9e81-d9c79a07b016 still hasn't started
2013-10-16 11:14:20 b.s.d.supervisor [INFO] 7aab2452-23d3-4084-9e81-d9c79a07b016 still hasn't started
2013-10-16 11:14:21 o.a.z.ClientCnxn [DEBUG] Got notification sessionid:0x141c08aef2e0002
2013-10-16 11:14:21 o.a.z.ClientCnxn [DEBUG] Got WatchedEvent state:SyncConnected type:NodeDeleted path:/assignments/EsperTopology-1-1381914842 for sessionid 0x141c08aef2e0002
2013-10-16 11:14:21 o.a.z.ClientCnxn [DEBUG] Got notification sessionid:0x141c08aef2e0002
2013-10-16 11:14:21 o.a.z.ClientCnxn [DEBUG] Got WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/assignments for sessionid 0x141c08aef2e0002
2013-10-16 11:14:21 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e0002, packet:: clientPath:null serverPath:null finished:false header:: 36,12 replyHeader:: 36,30,0 request:: '/storm/assignments,T response:: v{},s{9,9,1381914838531,1381914838531,0,2,0,0,1,0,30}
2013-10-16 11:14:21 b.s.d.supervisor [DEBUG] Synchronizing supervisor
2013-10-16 11:14:21 b.s.d.supervisor [DEBUG] Storm code map: {}
2013-10-16 11:14:21 b.s.d.supervisor [DEBUG] Downloaded storm ids: #{"EsperTopology-1-1381914842"}
2013-10-16 11:14:21 b.s.d.supervisor [DEBUG] All assignment:
2013-10-16 11:14:21 b.s.d.supervisor [DEBUG] New assignment: {}
2013-10-16 11:14:21 b.s.d.supervisor [DEBUG] Writing new assignment {}
2013-10-16 11:14:21 b.s.d.supervisor [INFO] Removing code for storm id EsperTopology-1-1381914842
2013-10-16 11:14:21 b.s.util [DEBUG] Rmr path /data/storm/supervisor/stormdist/EsperTopology-1-1381914842
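Reading the supervisor log, the sequence appears to be: the assignment node /assignments/EsperTopology-1-1381914842 is deleted at 11:14:21, the supervisor resynchronizes, sees that topology among its "Downloaded storm ids" but not in the (now empty) "New assignment", and removes its stormdist directory. The workers start a second later and fail because stormconf.ser is gone. The sketch below is a hypothetical, simplified model of that cleanup decision, inferred only from the log messages; it is not Storm's actual supervisor code, and the class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the cleanup step seen in the supervisor log.
// NOT Storm's real implementation; names are illustrative only.
public class SupervisorSyncSketch {

    /**
     * Returns the storm ids whose local code would be removed: every id that
     * has been downloaded but does not appear in the new assignment.
     */
    static Set<String> stormIdsToRemove(Set<String> downloadedIds, Set<String> assignedIds) {
        Set<String> toRemove = new HashSet<>(downloadedIds);
        toRemove.removeAll(assignedIds);
        return toRemove;
    }

    public static void main(String[] args) {
        // "Downloaded storm ids: #{"EsperTopology-1-1381914842"}"
        Set<String> downloaded = Set.of("EsperTopology-1-1381914842");
        // "New assignment: {}" after the assignment node was deleted
        Set<String> assigned = Collections.emptySet();

        for (String id : stormIdsToRemove(downloaded, assigned)) {
            // In the log this corresponds to deleting
            // /data/storm/supervisor/stormdist/<id>, where workers later
            // look for stormconf.ser.
            System.out.println("Removing code for storm id " + id);
        }
    }
}
```

Under this model, any window in which the assignment is empty (or deleted) while workers are still being launched is enough to remove the code out from under them, which matches the timestamps in the supervisor and worker logs.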
Worker log (similar for all workers):
2013-10-16 11:14:22 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e000a, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,38,0 request:: '/storm/assignments,F response:: s{9,9,1381914838531,1381914838531,0,2,0,0,1,0,30}
2013-10-16 11:14:22 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e000a, packet:: clientPath:null serverPath:null finished:false header:: 2,3 replyHeader:: 2,38,0 request:: '/storm/storms,F response:: s{10,10,1381914838538,1381914838538,0,2,0,0,1,0,31}
2013-10-16 11:14:22 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e000a, packet:: clientPath:null serverPath:null finished:false header:: 3,3 replyHeader:: 3,39,0 request:: '/storm/supervisors,F response:: s{11,11,1381914838549,1381914838549,0,1,0,0,1,1,16}
2013-10-16 11:14:22 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e000a, packet:: clientPath:null serverPath:null finished:false header:: 4,3 replyHeader:: 4,39,0 request:: '/storm/workerbeats,F response:: s{13,13,1381914838566,1381914838566,0,2,0,0,1,0,33}
2013-10-16 11:14:22 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x141c08aef2e000a, packet:: clientPath:null serverPath:null finished:false header:: 5,3 replyHeader:: 5,39,0 request:: '/storm/errors,F response:: s{14,14,1381914838574,1381914838574,0,0,0,0,1,0,14}
2013-10-16 11:14:23 b.s.d.worker [ERROR] Error on initialization of server mk-worker
java.io.FileNotFoundException: File '/data/storm/supervisor/stormdist/EsperTopology-1-1381914842/stormconf.ser' does not exist