Warnings when running multiple workers


Shrikar archak

Oct 13, 2012, 3:32:06 PM10/13/12
to storm...@googlegroups.com
Hi All,
I need some help with setting the number of workers in a cluster. I get a lot of warning messages when I change the number of workers.

Setup:
1) Nimbus on a separate host.
2) 2 supervisors on 2 different nodes. (I am not providing any ports for the supervisors in storm.yaml, so I assume it's 4 slots by default; see the snippet below.)
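
For reference, I believe the shipped defaults.yaml gives each supervisor four worker slots, which is why I'm assuming 4:

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703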
 
When I don't provide the worker setting, there is only one worker and I don't see these warnings.

conf.setMaxTaskParallelism(80);

conf.setNumAckers(4);

For the above settings I don't see any warnings.

Problematic settings:

conf.setMaxTaskParallelism(80);

conf.setNumAckers(4);

conf.setNumWorkers(4);
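
For context, here is roughly how the topology gets submitted (a sketch only; the builder wiring and the topology name are placeholders, not my exact code):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class WorkerConfigSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... spouts and bolts wired up here ...

        Config conf = new Config();
        conf.setMaxTaskParallelism(80);
        conf.setNumAckers(4);
        conf.setNumWorkers(4); // the setting that triggers the warnings

        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}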

Lots of warning messages:
2012-10-12 18:25:16 worker [WARN] Received invalid messages for unknown tasks. Dropping... 
2012-10-12 18:25:16 worker [WARN] Received invalid messages for unknown tasks. Dropping... 
2012-10-12 18:25:16 worker [WARN] Received invalid messages for unknown tasks. Dropping... 
2012-10-12 18:25:17 worker [WARN] Received invalid messages for unknown tasks. Dropping... 
2012-10-12 18:25:17 worker [WARN] Received invalid messages for unknown tasks. Dropping... 

PS: storm.local.dir is not shared among the supervisors since they are on different nodes.

Am I missing something? Any help greatly appreciated.

Thanks,

Shrikar

Shrikar archak

Oct 14, 2012, 8:16:25 PM10/14/12
to storm...@googlegroups.com
Has anyone faced similar problems?

Nathan Marz

Oct 14, 2012, 8:33:16 PM10/14/12
to storm...@googlegroups.com
It sounds like you have something wrong with your networking setup. Take a look at this page: https://github.com/nathanmarz/storm/wiki/Troubleshooting
--
Twitter: @nathanmarz
http://nathanmarz.com

Shrikar archak

Jan 5, 2013, 2:03:41 AM1/5/13
to storm...@googlegroups.com
Hi Saisai,
It was a networking problem. Make sure your cluster machines have unique hostnames
and also have mappings in your /etc/hosts file as below.

The naming convention I used:

TYPE    : hostname
Nimbus  : nimbus
Supervisor : supervisor-1
Supervisor : supervisor-2
Supervisor : supervisor-3
Supervisor : supervisor-4

Example /etc/hosts file:
192.168.8.122   nimbus
192.168.8.123   supervisor-1
192.168.8.124   supervisor-2
192.168.8.125   supervisor-3
192.168.8.126   supervisor-4
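
To verify the fix on each node, I printed what Java resolves as the canonical hostname, since I believe that is what Storm picks up by default (a quick standalone sketch, not part of Storm; it should print supervisor-1 and so on, not localhost):

import java.net.InetAddress;

public class HostnameCheck {
    public static void main(String[] args) throws Exception {
        // Should print this node's entry from /etc/hosts
        // (e.g. supervisor-1), not localhost.
        System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
    }
}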

Hope that helps.
Thanks,
Shrikar

On Tuesday, December 25, 2012 5:07:14 PM UTC-8, 邵赛赛 wrote:
Hi Shrikar,
I have also met this problem. Could you please tell me how you solved it? It is very confusing and I have no clue.
Any help is greatly appreciated.

Thanks
Saisai

On Sunday, October 14, 2012 at 3:32:06 AM UTC+8, Shrikar archak wrote:

Shrikar archak

May 1, 2013, 7:07:10 PM5/1/13
to storm...@googlegroups.com
I remember fixing this problem as follows:

1) Make sure the hostnames of all the supervisors and Nimbus are different. In your case I see that both supervisors are localhost.
2) The /etc/hosts file above should have similar entries on the supervisors too.

Hope that helps.

Thanks,
Shrikar


On Wed, May 1, 2013 at 11:55 AM, Rich Schumacher <rich...@gmail.com> wrote:
I have a similar setup and am experiencing the same problem.

I have three AWS EC2 instances (provisioned by Chef/OpsWorks): one Nimbus and two Supervisor nodes. When I execute the ExclamationTopology from the tutorial on a single Supervisor it works as expected. However, if this topology executes on both Supervisors I see a lot of the dropped-message warnings, mixed in with stdout from the PrintingBolt I'm using.

[2013-05-01 16:47:03,651] worker [WARN] Received invalid messages for unknown tasks. Dropping... 
[2013-05-01 16:47:03,652] worker [WARN] Received invalid messages for unknown tasks. Dropping... 
[2013-05-01 16:47:03,655] worker [WARN] Received invalid messages for unknown tasks. Dropping... 
[2013-05-01 16:47:03,658] STDIO [INFO] source: exclaim2:6, stream: default, id: {}, [jackson!!!!!!]
[2013-05-01 16:47:03,756] STDIO [INFO] source: exclaim2:9, stream: default, id: {}, [mike!!!!!!]
[2013-05-01 16:47:03,758] worker [WARN] Received invalid messages for unknown tasks. Dropping...

I've read the Troubleshooting wiki as well as the above recommendations for setting /etc/hosts properly but to no avail. AWS OpsWorks automatically adds records to /etc/hosts, so each machine has something very similar to the following:

# This file was generated by OpsWorks
# any manual changes will be removed on the next update.

127.0.0.1 localhost localhost.localdomain

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

# OpsWorks Layer State
127.0.0.1 storm-nimbus.localdomain storm-nimbus
10.141.143.158 storm-nimbus
<public IP> storm-nimbus-ext
10.137.30.236 storm-supervisor-1
<public IP> storm-supervisor-1-ext
10.142.132.134 storm-supervisor-2
<public IP> storm-supervisor-2-ext
10.209.138.3 zookeeper-1
<public IP> zookeeper-1-ext
10.254.226.114 zookeeper-2
<public IP> zookeeper-2-ext
10.209.135.84 zookeeper-3
<public IP> zookeeper-3-ext

As far as I can tell, everything in here is sane and works correctly.

The one thing that sticks out in my mind is that the Storm UI indicates both Supervisor nodes report themselves as "localhost" (http://cl.ly/image/1I1D2D0W3T46). Likewise, I notice that whenever any of the Storm components (Nimbus and Supervisor) starts up, the ZooKeeper client log output lists its hostname as localhost, as can be seen here:

[2013-05-01 17:36:30,742] ZooKeeper [INFO] Client environment:host.name=localhost

Does anyone have any ideas about what is happening here?

Thanks for listening!

Rich Schumacher

May 1, 2013, 9:29:13 PM5/1/13
to storm...@googlegroups.com
I managed to track this down not long after I posted, and it was definitely the localhost issue. I stumbled across the STORM_LOCAL_HOSTNAME property in backtype.storm.Config, whose docs state that the default value is the result of InetAddress.getLocalHost().getCanonicalHostName(). Sure enough, on instances launched via OpsWorks in AWS this value is always localhost. I'm not entirely sure why that is, but instances launched directly via EC2 have the appropriate value.

In any event, the fix was to explicitly set storm.local.hostname in storm.yaml on each node. After that everything worked fine.
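
For anyone else who hits this, the entry looks like this (the value is whatever hostname that particular node should report; storm-supervisor-1 here is just an example from my cluster):

storm.local.hostname: "storm-supervisor-1"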

Thanks for the help and I hope this will help someone else out in the future.