Hostname <-> IP resolution troubles.


Michael Brauwerman

Sep 20, 2011, 9:07:45 PM
to brisk...@googlegroups.com

Hello fellow Briskers,

My team has been struggling with hostname-to-IP resolution in our multi-node Brisk cluster. (A single-node cluster seems to run fine.)

We have worked through various difficulties with listen_address, the jobtracker, and InetAddress.getLocalHost(), and we are down to one major blocker in this area:

When we run a MapReduce job generated from a Hive query, the reduce phase fails. Specifically, we get a FileNotFoundException when the reducer on node-1 (the seed/jobtracker) tries to fetch the map output for a task that completed on node-2 (and is visible in the tasktracker web UI).

The problem seems to be that node-1 requests http://localhost.localdomain:50060/mapOutput?... instead of http://node-2:50060. (Job tracking and map task distribution and execution seem to complete mostly OK; it is the subsequent fetch of the map output that fails.)
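A minimal sketch of why that URL breaks the fetch, assuming the host part of the URL is resolved on node-1 itself: "localhost.localdomain" resolves to node-1's own loopback address, so node-1 would be asking its own tasktracker for output that only exists on node-2, which is consistent with the FileNotFoundException. The class and method names below are hypothetical, and the query string is left out because it is elided in the report.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper illustrating the failure mode; not part of Brisk or Hadoop.
public class MapOutputUrl {

    // Builds the base map-output URL for a tasktracker host and HTTP port.
    // The query string is deliberately omitted (it is elided in the original report).
    static String baseUrl(String host, int port) {
        return "http://" + host + ":" + port + "/mapOutput";
    }

    // True if the host resolves to a loopback address, i.e. a URL built from it
    // sends the requesting node back to itself instead of to the node that ran the map task.
    static boolean pointsAtLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false; // unresolvable is a different failure, not a loopback one
        }
    }

    public static void main(String[] args) {
        String[] hosts = {"localhost.localdomain", "node-2"};
        for (String h : hosts) {
            System.out.println(baseUrl(h, 50060) + " loopback=" + pointsAtLoopback(h));
        }
    }
}
```

Run on node-1, the first line should report loopback=true, while a correctly configured peer name such as node-2 should not.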

I suspect the root cause is that InetAddress.getLocalHost() returns 127.0.0.1, which maps back to localhost.localdomain.
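One way to confirm this suspicion on each node is a small diagnostic that prints what the JVM itself sees. This is a sketch, assuming the Brisk processes use the same default name resolution as a plain JVM; the class and method names are illustrative only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative diagnostic: run it on every node with the same JVM Brisk uses.
public class LocalHostCheck {

    // Summarizes an address: its IP, its reverse-resolved name, and whether it is loopback.
    static String report(InetAddress addr) {
        return "address=" + addr.getHostAddress()
            + " canonical=" + addr.getCanonicalHostName() // performs a reverse lookup
            + " loopback=" + addr.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // The call suspected above; on the affected nodes it reportedly yields 127.0.0.1.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("getHostName()=" + local.getHostName());
        System.out.println(report(local));
    }
}
```

If the output shows loopback=true together with a canonical name like localhost.localdomain, any component that advertises this address to other nodes will hand them an unusable name.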

(My DataCenter team assures me that this is the correct network interface configuration, and that the IP address of $(hostname) should always be 127.0.0.1, even when $(hostname) is a public name such as "node-2.cloud.redfin.com".)

As such, I have already set the listen_address value in each node's cassandra.yaml to an explicit IP such as 10.13.0.102, because leaving it blank or setting it to localhost causes 127.0.0.1 to be stored in Cassandra as the jobtracker address.
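Picking that explicit address can be sketched in code: enumerate the machine's network interfaces and take the first IPv4 address that is neither loopback nor link-local. This is an assumption-laden sketch, not how Brisk or Cassandra chooses an address; on a multi-homed host the first match (enumeration order is not guaranteed) may not be the interface the cluster should use, so the result is only a candidate for listen_address.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Illustrative helper, not part of Brisk or Cassandra.
public class ListenAddressPicker {

    // Returns the first IPv4 address on an up, non-loopback interface, skipping
    // link-local (169.254.x.x) addresses; returns null if none is found.
    static InetAddress firstNonLoopbackIpv4() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) return null; // defensive: treat a null result as "no interfaces"
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) continue;
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address
                            && !addr.isLoopbackAddress()
                            && !addr.isLinkLocalAddress()) {
                        return addr;
                    }
                }
            }
        } catch (SocketException e) {
            // Treat an interface-enumeration failure as "no candidate found".
        }
        return null;
    }

    public static void main(String[] args) {
        InetAddress candidate = firstNonLoopbackIpv4();
        System.out.println(candidate == null
            ? "no non-loopback IPv4 address found"
            : "listen_address candidate: " + candidate.getHostAddress());
    }
}
```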

Could someone post a sample /etc/hosts file that shows a recommended name-resolution configuration for a Brisk node?
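Whatever sample file is adopted, the property that matters here can be checked mechanically: no name other than the localhost aliases should map to a loopback address. The sketch below lints /etc/hosts-style lines for that. It is a simplification under stated assumptions: loopback is detected textually (127.x.x.x or ::1) to avoid DNS lookups, the alias set is a hypothetical choice, and the sample lines, including the pairing of 10.13.0.102 with node-2, are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative /etc/hosts lint; requires Java 9+ for Set.of / List.of.
public class HostsLint {

    // Names that are expected to map to loopback (hypothetical alias set).
    private static final Set<String> LOCAL_NAMES = Set.of(
        "localhost", "localhost.localdomain", "localhost4", "localhost4.localdomain4",
        "localhost6", "localhost6.localdomain6", "ip6-localhost", "ip6-loopback");

    // Simplified textual loopback test: IPv4 127.0.0.0/8 or IPv6 ::1.
    static boolean isLoopbackText(String ip) {
        return ip.startsWith("127.") || ip.equals("::1");
    }

    // Returns every non-localhost name that the given hosts-file lines map to a loopback address.
    static List<String> loopbackMappedNames(List<String> lines) {
        List<String> flagged = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String body = (hash >= 0 ? line.substring(0, hash) : line).trim(); // drop comments
            if (body.isEmpty()) continue;
            String[] fields = body.split("\\s+"); // address first, then host names
            if (!isLoopbackText(fields[0])) continue;
            for (int i = 1; i < fields.length; i++) {
                if (!LOCAL_NAMES.contains(fields[i].toLowerCase())) flagged.add(fields[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "127.0.0.1   localhost.localdomain localhost node-2.cloud.redfin.com node-2",
            "10.13.0.102 node-2.cloud.redfin.com node-2");
        System.out.println(loopbackMappedNames(sample)); // prints [node-2.cloud.redfin.com, node-2]
    }
}
```

The first sample line models the configuration described above and is flagged; a file in which each node's name appears only on its own routable address produces an empty list.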

Or does anyone have other suggestions for fixing the host names exposed to the reducers?

Thank you,

-Mike Brauwerman

Michael Brauwerman

Sep 21, 2011, 9:58:59 PM
to brisk...@googlegroups.com
> The problem seems to be that node-1 looks for http://localhost.localdomain:50060/mapOutput?... instead of http://node-2:50060

I worked around this problem by editing /etc/hosts on each machine in the cluster so that the machine's hostname maps to its public IP instead of 127.0.0.1.

So I guess I need to negotiate a proper long-term fix with my Operations team, one that balances Brisk's needs against those of everything else running on the machines.

If anyone has had a similar experience, or can share any advice for me or for my Operations team (to convince them that public IPs are OK in /etc/hosts?), I would very much appreciate it!

-Mike Brauwerman
--
Mike Brauwerman
Data Team, Redfin