Storm 0.8.2 - stormconf.ser does not exist


Michael Rose

Feb 5, 2013, 4:04:12 PM
to storm...@googlegroups.com
Hey Nathan,

We're still getting supervisors that go down because they can't find stormconf.ser, which leads to perpetual restarts.

If the bug has been fixed, where should I be looking to track this down? It's a brand-new Storm 0.8.2 cluster with 8 supervisors and a brand-new ZooKeeper.

It happens every few days, and it's generally two supervisors at a time. The topology uses two workers. It always seems to follow a long string of workers not starting up in time.

2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@5e725967
2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb5, negotiated timeout = 20000
2013-02-05 20:56:08 zookeeper [INFO] Zookeeper state update: :connected:none
2013-02-05 20:56:08 ClientCnxn [INFO] EventThread shut down
2013-02-05 20:56:08 ZooKeeper [INFO] Session: 0x13c6e2bce917bb5 closed
2013-02-05 20:56:08 CuratorFrameworkImpl [INFO] Starting
2013-02-05 20:56:08 ZooKeeper [INFO] Initiating client connection, connectString=10.38.9.44:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@74c12978
2013-02-05 20:56:08 ClientCnxn [INFO] Opening socket connection to server /10.38.9.44:2181
2013-02-05 20:56:08 ClientCnxn [INFO] Socket connection established to ip-10-38-9-44.ec2.internal/10.38.9.44:2181, initiating session
2013-02-05 20:56:08 ClientCnxn [INFO] Session establishment complete on server ip-10-38-9-44.ec2.internal/10.38.9.44:2181, sessionid = 0x13c6e2bce917bb6, negotiated timeout = 20000
2013-02-05 20:56:08 supervisor [INFO] Starting supervisor with id 05fe7be7-2971-4a7a-9cfc-275146ff48de at host ip-10-82-50-86.ec2.internal
2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id f3dad62e-8819-4571-8f26-a6497772325a. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:f3dad62e-8819-4571-8f26-a6497772325a
2013-02-05 20:56:09 supervisor [INFO] Shutting down and clearing state for id d48cc968-8be8-48f1-81df-33e9e41fa8c0. Current supervisor time: 1360097769. State: :disallowed, Heartbeat: nil
2013-02-05 20:56:09 supervisor [INFO] Shutting down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
2013-02-05 20:56:09 supervisor [INFO] Shut down 05fe7be7-2971-4a7a-9cfc-275146ff48de:d48cc968-8be8-48f1-81df-33e9e41fa8c0
2013-02-05 20:56:09 supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "cherry-pitter-import-staging-528-1360097475", :executors ([3 3] [35 35] [67 67] [99 99] [131 131] [5 5] [37 37] [69 69] [101 101] [133 133] [7 7] [39 39] [71 71] [103 103] [135 135] [9 9] [41 41] [73 73] [105 105] [137 137] [11 11] [43 43] [75 75] [107 107] [139 139] [13 13] [45 45] [77 77] [109 109] [141 141] [15 15] [47 47] [79 79] [111 111] [17 17] [49 49] [81 81] [113 113] [19 19] [51 51] [83 83] [115 115] [21 21] [53 53] [85 85] [117 117] [23 23] [55 55] [87 87] [119 119] [25 25] [57 57] [89 89] [121 121] [27 27] [59 59] [91 91] [123 123] [29 29] [61 61] [93 93] [125 125] [31 31] [63 63] [95 95] [127 127] [1 1] [33 33] [65 65] [97 97] [129 129])} for this supervisor 05fe7be7-2971-4a7a-9cfc-275146ff48de on port 6703 with id 0f3342c4-f2ca-46db-80f6-70b5c5bd95e4
2013-02-05 20:56:09 event [ERROR] Error when processing event
java.io.FileNotFoundException: File '/mnt/storm/supervisor/stormdist/cherry-pitter-import-staging-528-1360097475/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
at backtype.storm.daemon.supervisor$fn__4793.invoke(supervisor.clj:414)
at clojure.lang.MultiFn.invoke(MultiFn.java:177)
at backtype.storm.daemon.supervisor$sync_processes$iter__4684__4688$fn__4689.invoke(supervisor.clj:249)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:473)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$dorun.invoke(core.clj:2725)
at clojure.core$doall.invoke(core.clj:2741)
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:237)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at backtype.storm.event$event_manager$fn__2507.invoke(event.clj:24)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:662)
2013-02-05 20:56:09 util [INFO] Halting process: ("Error when processing an event")

Example of workers not starting up in time:

supervisor/stormdist/contact-pull-1059-491-1360084684/stormjar.jar backtype.storm.daemon.worker contact-pull-1059-491-1360084684 b489ee56-12cf-423a-8eb1-794d04c329ef 6702 58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 17:18:16 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:17 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:17 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:18 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:18 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:19 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:19 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:20 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 17:18:20 supervisor [INFO] 58c2eba0-a51d-4baf-95db-6c18538ed5a9 still hasn't started
2013-02-05 18:00:55 supervisor [INFO] Removing code for storm id contact-pull-1059-491-1360084684
2013-02-05 18:00:55 supervisor [INFO] Shutting down and clearing state for id 58c2eba0-a51d-4baf-95db-6c18538ed5a9. Current supervisor time: 1360087255. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1360087255, :storm-id "contact-pull-1059-491-1360084684", :executors #{[3 3] [35 35] [67 67] [99 99] [131 131] [163 163] [195 195] [227 227] [259 259] [291 291] [323 323] [355 355] [7 7] [39 39] [71 71] [103 103] [135 135] [167 167] [199 199] [231 231] [263 263] [295 295] [327 327] [359 359] [11 11] [43 43] [75 75] [107 107] [139 139] [171 171] [203 203] [235 235] [267 267] [299 299] [331 331] [363 363] [15 15] [47 47] [79 79] [111 111] [143 143] [175 175] [207 207] [239 239] [271 271] [303 303] [335 335] [19 19] [51 51] [83 83] [115 115] [147 147] [179 179] [211 211] [243 243] [275 275] [307 307] [339 339] [23 23] [55 55] [87 87] [119 119] [151 151] [183 183] [215 215] [247 247] [279 279] [311 311] [343 343] [27 27] [59 59] [91 91] [123 123] [155 155] [187 187] [219 219] [251 251] [283 283] [315 315] [347 347] [31 31] [63 63] [95 95] [127 127] [159 159] [191 191] [223 223] [255 255] [287 287] [319 319] [351 351]}, :port 6702}
2013-02-05 18:00:55 supervisor [INFO] Shutting down b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 18:00:55 supervisor [INFO] Shut down b489ee56-12cf-423a-8eb1-794d04c329ef:58c2eba0-a51d-4baf-95db-6c18538ed5a9
2013-02-05 18:21:12 supervisor [INFO] Downloading code for storm id contact-pull-1067-504-1360088463 from /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463
2013-02-05 18:21:20 supervisor [INFO] Finished downloading code for storm id contact-pull-1067-504-1360088463 from /mnt/storm/nimbus/stormdist/contact-pull-1067-504-1360088463

-- 
Michael Rose (@Xorlev)
Senior Platform Engineer, FullContact
mic...@fullcontact.com


Michael Rose

Feb 7, 2013, 6:48:09 PM
to storm...@googlegroups.com

They are not installed together.

Not sent from my iPhone

On Feb 7, 2013 4:04 PM, "ttyunix ttyunix" <tty...@gmail.com> wrote:
Are the nimbus and the supervisor installed together on one machine?

Matthew Gordon

Feb 11, 2013, 6:33:38 PM
to storm...@googlegroups.com
We have also been seeing this on 0.7.3, so it doesn't look like a new problem. It would be great to get a fix for this.


On Fri, Feb 8, 2013 at 5:57 AM, Enno Shioji <esh...@gmail.com> wrote:
> I've seen the same message on my cluster.
> I stopped all services and nuked all states from my cluster (storm temp
> directory etc.) and restarted them. That made it go away.

Michael Rose

Feb 11, 2013, 6:39:53 PM
to storm...@googlegroups.com
Supposedly this was fixed in 0.8.2, but we're still experiencing it on a regular basis. Brand new cluster, no state shared with old 0.7.2 cluster.

-- 
Michael Rose (@Xorlev)
Senior Platform Engineer, FullContact
mic...@fullcontact.com

Nathan Marz

Feb 18, 2013, 10:01:10 PM
to storm...@googlegroups.com
Hi Michael,

I believe I've now fixed this bug. Try out 0.9.0-wip16. Alternatively, you can apply this patch to 0.8.2 and build your own release with only this change:


Let me know how it goes.

-Nathan
Twitter: @nathanmarz
http://nathanmarz.com

Michael Rose

Feb 18, 2013, 10:22:33 PM
to storm...@googlegroups.com
Thanks Nathan!

I'll roll a patched 0.8.2 release tomorrow and see how things go.

-- 
Michael Rose (@Xorlev)
Senior Platform Engineer, FullContact
mic...@fullcontact.com

Viral Bajaria

Feb 25, 2013, 4:30:37 PM
to storm...@googlegroups.com, nat...@nathanmarz.com
I hit this issue when running 0.9.0-wip15. Here is what I did to reproduce it:

- start a supervisor on a node
- submit a topology
- wait for the workers to start running
- kill the supervisor ---> for some reason some workers also died, but the topology kept on running
- kill the topology (this step is optional)
- restart the supervisor ---> this fails with the "stormconf.ser does not exist" error

Even after creating the directory, it still had issues starting up. I had to kill all the workers on that node and then restart the supervisor to get it working again. A rough sketch of these steps is below.
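For what it's worth, here is the same sequence as a small Python script driving the storm CLI, in case someone wants to automate the repro. The jar path, main class, and topology name are made-up placeholders, and it assumes the storm binary is on the PATH of the supervisor node and that the supervisor is launched with "storm supervisor" rather than under a process supervisor:

import subprocess
import time

TOPOLOGY = "repro-topology"            # hypothetical topology name
JAR = "target/repro-topology.jar"      # hypothetical jar
MAIN = "com.example.ReproTopology"     # hypothetical main class

# 1. the supervisor is assumed to already be running on this node
# 2. submit a topology
subprocess.check_call(["storm", "jar", JAR, MAIN, TOPOLOGY])
# 3. wait for the workers to start running
time.sleep(120)
# 4. kill the supervisor daemon (some worker JVMs died with it in my case)
subprocess.call(["pkill", "-f", "backtype.storm.daemon.supervisor"])
# 5. kill the topology (optional)
subprocess.call(["storm", "kill", TOPOLOGY])
# 6. restart the supervisor; this is where the stormconf.ser error appears
subprocess.Popen(["storm", "supervisor"])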

Thanks,
Viral

On Mon, Feb 25, 2013 at 3:24 AM, vonPuh fonPuhendorf <vonpuhfon...@gmail.com> wrote:
Confirming that the issue is resolved: I tried 0.9.0-wip15 and then 0.9.0-wip16, tested, and it worked. Thanks.

Nathan Marz

Feb 25, 2013, 4:42:25 PM
to Viral Bajaria, storm...@googlegroups.com
It's fixed in 0.9.0-wip16, not in 0.9.0-wip15.

Viral Bajaria

Feb 25, 2013, 5:15:36 PM
to Nathan Marz, storm...@googlegroups.com
ahh.. sorry... I read the previous email as saying that the tests worked for both wip15 and wip16

Thanks Nathan.

Richards Peter

Apr 25, 2013, 8:28:01 AM
to storm...@googlegroups.com
Hi,

I would like to know the release dates for Storm 0.8.3 and Storm 0.9.0. We hit this issue with Storm 0.8.2 today, and we have seen it in some previous Storm releases as well. The changelog on the Storm web page says the issue is also fixed in Storm 0.8.3, so I am keen to know when these builds will be released.

Thanks,
Richards Peter.

Michael Rose

Apr 25, 2013, 9:54:45 AM
to storm...@googlegroups.com
You can download 0.8.3-wip3 from storm-project.net, which contains this fix. 0.8.3-wip3 only has a few bug fixes so far.

-- 
Michael Rose (@Xorlev)
Senior Platform Engineer, FullContact
mic...@fullcontact.com


Richards Peter

May 13, 2013, 7:39:44 AM
to storm...@googlegroups.com
Hi Nathan,

I would like to verify whether the fix mentioned in https://github.com/nathanmarz/storm/blob/master/CHANGELOG.md for Storm 0.8.3 is the one related to this issue. I can also see a similar changelog entry for Storm 0.8.2, so I am a little confused about the status of this issue. Are the fixes in Storm 0.8.2 and Storm 0.8.3-wip partial fixes?

Is this issue fixed in both storm-0.8.3-wip and storm-0.9.0-wip16 or only in storm-0.9.0-wip16?

Richards Peter.

Michael Rose

May 13, 2013, 10:01:38 AM
to storm...@googlegroups.com

We upgraded to 0.8.3-wip3, and it is fixed as far as we can tell. We encountered it once, the day we upgraded, but haven't been able to replicate it since. After clearing ZK and the worker directories it's been smooth sailing.

-- Sent from mobile


Patricio Echagüe

May 13, 2013, 11:29:58 AM
to storm-user

We are on 0.8.3-wip3 as well and ran into the same issue, but it has only happened once in about a month.

Allan C

May 13, 2013, 12:46:17 PM
to storm...@googlegroups.com
We haven't seen it since we upgraded to 0.8.3-wip a few months ago.

caim...@gmail.com

Aug 5, 2013, 3:43:20 AM
to storm...@googlegroups.com

I use 0.9.0-wip16 and I have the same problem!


On Tuesday, February 19, 2013 at 11:01:10 AM UTC+8, Nathan Marz wrote:

姚仁捷

Aug 12, 2013, 4:21:05 AM
to storm...@googlegroups.com
In my environment, this problem usually happened with storm-0.8.2; after upgrading to 0.9.0 it works well.

On Wednesday, February 6, 2013 at 5:04:12 AM UTC+8, Michael Rose wrote:

Quinton Anderson

Aug 13, 2013, 5:25:30 PM
to storm...@googlegroups.com
In 0.9.0-wip16 I still saw the problem until I cleared all the state from the supervisor nodes and ZooKeeper and then restarted; after that it went away. I validated this by going back to 0.8.2, and the defect came back.

So: upgrade, clear your state, and then try again. Note that I did still see the errors in the log, but that was a transient effect, a warning rather than an error.
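In case it helps anyone, here is roughly what "clear your state" involves, as a Python sketch. The ZooKeeper address, the /storm chroot, and the /mnt/storm local dir are taken from the logs earlier in this thread and will differ for your cluster; the zkCli.sh path and the exact set of subdirectories to wipe are assumptions for your own setup. Stop nimbus, the UI, and all supervisors and workers before running anything like this:

import os
import shutil
import subprocess

ZK_CLI = "/opt/zookeeper/bin/zkCli.sh"   # hypothetical ZooKeeper install path
ZK_SERVER = "10.38.9.44:2181"            # from the connectString in the logs above
STORM_LOCAL_DIR = "/mnt/storm"           # storm.local.dir, per the paths in the logs

# wipe Storm's subtree in ZooKeeper (assignments, storms, supervisors, workerbeats, ...)
subprocess.check_call([ZK_CLI, "-server", ZK_SERVER, "rmr", "/storm"])

# on every nimbus and supervisor node, wipe the local state directories
for sub in ("nimbus", "supervisor", "workers"):
    path = os.path.join(STORM_LOCAL_DIR, sub)
    if os.path.isdir(path):
        shutil.rmtree(path)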

Billy Watson

Sep 24, 2013, 8:42:55 AM
to storm...@googlegroups.com
I am on 0.8.3 and have just seen this error. Is it supposed to be fixed in this version? 

java.io.FileNotFoundException: File '/mnt/storm/supervisor/stormdist/topology_master-1-1379992108/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
at backtype.storm.daemon.worker$worker_data.invoke(worker.clj:146)
at backtype.storm.daemon.worker$fn__4322$exec_fn__1202__auto____4323.invoke(worker.clj:332)
at clojure.lang.AFn.applyToHelper(AFn.java:185)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:601)
at backtype.storm.daemon.worker$fn__4322$mk_worker__4378.doInvoke(worker.clj:323)
at clojure.lang.RestFn.invoke(RestFn.java:512)
at backtype.storm.daemon.worker$_main.invoke(worker.clj:433)
at clojure.lang.AFn.applyToHelper(AFn.java:172)
at clojure.lang.AFn.applyTo(AFn.java:151)
at backtype.storm.daemon.worker.main(Unknown Source)

Thomas Söhngen

Sep 25, 2013, 7:41:07 AM
to storm...@googlegroups.com
I would really like to know if this is fixed too. This error is the biggest issue we have with Storm at the moment. We run a lot of topologies and this error occurs quite often. The only workaround we have found is to wipe the Storm and ZooKeeper data dirs on the whole cluster and resubmit every topology, which is very time-consuming when you have over 20 topologies running on your cluster.

We are looking forward to 0.8.3 mostly to see this fixed! It's the cause of continuing annoyance and downtime!
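If it helps, the resubmission half of that workaround can at least be scripted against the storm CLI. A rough Python sketch of resubmitting everything after the wipe and daemon restart; the topology names, jars, and main classes are made-up placeholders for whatever you deploy:

import subprocess

# hypothetical inventory: topology name -> (jar, main class)
TOPOLOGIES = {
    "topology-one": ("topology-one.jar", "com.example.TopologyOne"),
    "topology-two": ("topology-two.jar", "com.example.TopologyTwo"),
    # ... the remaining 20+ entries
}

for name, (jar, main) in TOPOLOGIES.items():
    # each main class is assumed to take the topology name as its first argument
    subprocess.check_call(["storm", "jar", jar, main, name])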
-- 
Thomas Söhngen

Office: +49 221 294 975 20
Email: thomas....@stockpulse.de

www.stockpulse.de
www.facebook.com/stockpulse

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
StockPulse GmbH
Sitz der Gesellschaft: Köln
Amtsgericht: Köln (HRB 72529)
Vertretungsberechtige Geschäftsführer: Stefan Nann, Jonas Krauss 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StockPulse GmbH
Registered Office: Cologne
District Court: Cologne HRB (72529)
Managing Director: Stefan Nann, Jonas Krauss 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Quinton Anderson

Sep 25, 2013, 2:31:33 PM
to storm...@googlegroups.com
It was fixed properly in 0.9.0-wip16 

Jon

Oct 18, 2013, 3:18:16 PM
to storm...@googlegroups.com
I just had this happen in 0.9.0-rc2

P. Taylor Goetz

Oct 18, 2013, 6:30:20 PM
to storm...@googlegroups.com, storm...@googlegroups.com
Did you clear all zookeeper and local storm state?

We used to see this on a regular basis under 0.8.x, but since upgrading to 0.9.0-rc2 we haven't seen it at all. 

When upgrading, clearing state is very important. You also want to make sure all processes from the previous version are killed. 
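One way to double-check that nothing from the old version survived: the nimbus, supervisor, and worker JVMs all carry backtype.storm.daemon.* on their java command line (you can see it in the worker launch line quoted earlier in the thread). A rough Python sketch, run on each node, assuming pgrep/pkill are available:

import subprocess

# list any Storm JVMs still running, with their full command lines
subprocess.call(["pgrep", "-f", "-l", "backtype.storm.daemon"])

# once you are sure nothing should survive the upgrade, kill the leftovers
subprocess.call(["pkill", "-9", "-f", "backtype.storm.daemon"])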

-Taylor
--
P. Taylor Goetz
Software Architect
 
Health Market Science
The Science of Better Results
2700 Horizon Drive •    King of Prussia, PA •   19406

Jon

Oct 18, 2013, 6:35:10 PM
to storm...@googlegroups.com
Restarting everything seemed to fix it -- we had weird issues where it chewed through all of the Zookeeper connections though, with Nimbus then not starting, and had to restart Zookeeper as well.

I didn't clear state before updating to 0.9, but this specific topology did not exist before in 0.8.


Sort of an aside, but it might be worthwhile to have an easy way to clear state (unless there already is one and I'm unaware of it). Maybe I'll throw something together and make a pull request for it.