Networking problem in creating an HDFS cluster.


Luqman

Dec 23, 2014, 4:00:17 AM
to google-c...@googlegroups.com
I have set up a Kubernetes cluster on CoreOS, using Flannel, on DigitalOcean. I have images for the Hadoop Namenode and the Hadoop Datanode. The Datanode binds to 0.0.0.0:50010 to serve, as is the default.
Problem:
When the Datanode tries to register itself with the Namenode, it sends an RPC request to the Namenode. The Namenode then registers the Datanode with the IP of the docker (or flannel) interface.

I want to ask why the IP of the container is not used instead. Is it that the headers of the request get changed during forwarding?


Brendan Burns

Dec 23, 2014, 12:31:40 PM
to google-c...@googlegroups.com
Hrm, I don't know enough about the details of flannel's encap/decap scheme.  My guess is that this is because there is NAT somewhere in the flannel network routing, and so the packets that the namenode is seeing are NAT-ed packets from the host rather than from the container.

On the GCE virtual network, you see the source address of the container.

Perhaps the CoreOS folks who wrote flannel can give deeper insight into the expected behavior of the flannel virtual network...  (sorry!)

--brendan



Eugene Yakubovich

Dec 24, 2014, 1:29:16 PM
to google-c...@googlegroups.com
On Tuesday, December 23, 2014 9:31:40 AM UTC-8, Brendan Burns wrote:
Hrm, I don't know enough about the details of flannel's encap/decap scheme.  My guess is that this is because there is NAT somewhere in the flannel network routing, and so the packets that the namenode is seeing are NAT-ed packets from the host rather than from the container.

On Tue, Dec 23, 2014 at 1:00 AM, Luqman <lgs...@gmail.com> wrote:
I have setup Kubernetes cluster over CoreOS, using Flannel on DigitalOcean. I have images for Hadoop Namenode, Hadoop Datanode. The datanode binds to 0.0.0.0:50010 to serve as default. 
Problem: 
Now when the datanode tries to register itself to the namenode, it sends a rpc request to namenode. Now, the namenode registers the datanode with the IP of the docker (or flannel) interface. 

I want to ask why is that the IP of the container is not used instead. Is it that the header of the request get changed during the forwarding?

Brendan is right that NAT is to blame here. Docker installs a NAT (MASQUERADE) rule for traffic originating from the docker0 address range and going anywhere other than docker0. This makes sense, as any traffic destined for the Internet has to be NAT'ed. In your case the datanode is sitting on docker0 with 10.244.73.1/24, so the corresponding iptables rule Docker has inserted is equivalent to:

src: 10.244.73.0/24 dst: !10.244.73.0/24 action: MASQUERADE

So traffic going through the flannel network still gets NAT'ed. As you have noticed, this is clearly not desirable. To fix it, you need to do two things:
1. Tell Docker not to install the masquerading rule: in Docker 1.3+, pass --ip-masq=false when starting the Docker daemon.
2. Tell flannel to install a "wider" masquerading rule: start flannel with --ip-masq=true. In your case, it will install something equivalent to:

src: 10.244.0.0/16 dst: !10.244.0.0/16 action: MASQUERADE

Traffic over flannel will not get NAT'ed, but anything going outside of it still will. The only caveat is that the VXLAN backend (recently added) doesn't support --ip-masq yet (this will be fixed soon).
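
As a rough sketch of those two changes (illustrative invocations only; unit files and any other flags are omitted, and exact spellings may vary by version):

$ docker -d --ip-masq=false     # Docker 1.3+ daemon flag (newer releases: dockerd --ip-masq=false)
$ flanneld --ip-masq=true       # flannel then installs the wider 10.244.0.0/16 MASQUERADE rule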

-Eugene

Luqman

Dec 28, 2014, 1:01:01 PM
to google-c...@googlegroups.com
Thanks a lot, Eugene and Brendan. That solved the issue. I am looking forward to --ip-masq support for the VXLAN backend.

Luqman

Dec 29, 2014, 2:38:16 AM
to
Hi, I am back. Actually, in my last message I had only tested with ping, and the ping packets showed the correct IP of the container. But now that I am testing the Hadoop cluster setup, the same issue is back: the Datanode is connecting with the flannel interface IP. Here is the flannel log:

Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.277014 00646 main.go:233] Installing signal handlers
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.296478 00646 main.go:191] Using 10.132.229.150 as external interface
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.301091 00646 subnet.go:317] Picking subnet in range 10.244.1.0 ... 10.244.255.0
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.375500 00646 subnet.go:80] Subnet lease acquired: 10.244.23.0/24
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.482750 00646 udp.go:227] Adding iptables rule: FLANNEL -d 10.244.0.0/16 -j ACCEPT
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.487414 00646 udp.go:227] Adding iptables rule: FLANNEL -d 224.0.0.0/4 -j ACCEPT
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.493756 00646 udp.go:227] Adding iptables rule: FLANNEL ! -o flannel0 -j MASQUERADE
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.507265 00646 udp.go:227] Adding iptables rule: POSTROUTING -s 10.244.0.0/16 -j FLANNEL
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.515950 00646 main.go:201] UDP mode initialized
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.516010 00646 udp.go:239] Watching for new subnet leases
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.516823 00646 udp.go:264] Subnet added: 10.244.32.0/24
Dec 28 18:03:17 coreos-3 flanneld[646]: I1228 18:03:17.516901 00646 udp.go:264] Subnet added: 10.244.12.0/24

The log is almost the same on all the nodes.

Luqman

Dec 29, 2014, 3:47:16 AM
to
Also, take a look at the iptables rules:

$ sudo iptables -t nat -L -n -v

Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target     prot opt in     out     source               destination         
5012  524K KUBE-PROXY  all  --  *      *      0.0.0.0/0            0.0.0.0/0           
2802  162K DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 24 packets, 1440 bytes)
pkts bytes target     prot opt in     out     source               destination         
51508 3143K KUBE-PROXY  all  --  *      *     0.0.0.0/0            0.0.0.0/0           
12789  767K DOCKER     all  --  *      *      0.0.0.0/0            !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 24 packets, 1440 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 4296  259K FLANNEL    all  --  *      *       10.244.0.0/16        0.0.0.0/0           

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DNAT       tcp  --  !docker0 *     0.0.0.0/0            0.0.0.0/0            tcp dpt:6062 to:10.244.32.9:6062

Chain FLANNEL (1 references)
 pkts bytes target     prot opt in   out       source               destination         
   16   960 ACCEPT     all  --  *    *         0.0.0.0/0            10.244.0.0/16       
    0     0 ACCEPT     all  --  *    *         0.0.0.0/0            224.0.0.0/4         
   20  1501 MASQUERADE  all  --  *   !flannel0 0.0.0.0/0            0.0.0.0/0           

Chain KUBE-PROXY (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            10.1.170.62          /* kubernetes */ tcp dpt:443 redir ports 51646
    0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            10.1.64.54           /* kubernetes-ro */ tcp dpt:80 redir ports 45309
    3   180 REDIRECT   tcp  --  *      *       0.0.0.0/0            10.1.244.74          /* hbase-master */ tcp dpt:6060 redir ports 43020
    0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            10.1.163.130         /* hadoop-datanode */ tcp dpt:50010 redir ports 46277
    3   180 REDIRECT   tcp  --  *      *       0.0.0.0/0            10.1.172.216         /* hadoop-namenode */ tcp dpt:9000 redir ports 57779
    1    60 REDIRECT   tcp  --  *      *       0.0.0.0/0            10.1.190.187         /* zookeeper */ tcp dpt:2181 redir ports 35934
Can you explain what's happening?
Message has been deleted

Eugene Yakubovich

Dec 29, 2014, 1:47:48 PM
to google-c...@googlegroups.com
On Mon, Dec 29, 2014 at 2:52 AM, Luqman <lgs...@gmail.com> wrote:
> Shouldn't there be a rule for unwrapping the masquerade when flannel0 gets
> packets from 10.244.0.0/16?

Packets coming from 10.244.0.0/16 (the flannel network) are not
masqueraded -- it's the opposite direction that is. And there is no UNMASQUERADE rule --
MASQUERADE does the translation both ways (the reply packets are rewritten back automatically).

Can you use something like ncat to establish a tcp connection between
two containers and then "netstat -nt" to see if the IP gets NAT'ed?
I'm not sure what would cause different masquerading behavior for TCP
and ICMP packets.
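
For example, a rough sketch of that check (the port number is arbitrary):

(container A) $ ncat -l 9000                # listen inside one container
(container B) $ ncat <container-A-IP> 9000  # connect from a container on another node
(container A) $ netstat -nt                 # does the peer address show container B's 10.244.x.x IP, or a NAT'ed one?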

It's not connecting via k8s proxy, is it? What IP are you connecting to?

Luqman

Dec 29, 2014, 6:14:17 PM
to
Oh, I think I have found the issue: I am also using a Kubernetes service to reach the Namenode. I think that is why the masquerading rule is applied. I will test it tomorrow. Thanks.

Luqman

Dec 30, 2014, 5:54:08 AM
to
The Datanodes are connecting now that I'm not using the Kubernetes services; I'm using the Namenode pod IP directly.
I have a follow-up question. It seems that, in my case, I cannot use Kubernetes services for my purpose. I have two directions in mind:

1. Inside a Datanode container, use the Kubernetes API to get the actual IP of the Namenode container and use that to connect to the Namenode.
2. Use flannel with --ip-masq=false, and apply IP masquerading with iptables only to the traffic leaving the network (to the Internet).

Please suggest which is the better way to go; a rough sketch of option 2 follows below. The 2nd way would allow me to keep using services.
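
A rough sketch of option 2 (the 10.244.0.0/16 subnet is taken from this thread; the eth0 interface name is an assumption and may differ):

$ sudo iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -o eth0 -j MASQUERADE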

Tim Hockin

Dec 30, 2014, 11:47:01 AM
to google-c...@googlegroups.com
I don't know how these APIs work, so it's hard to guess why services
don't work. Does it use the source IP as the registrant, rather than
an explicit client-provided IP? If so, then yeah, services are broken
because there is a proxy in-between.

We have discussed ways to mitigate or eliminate this, but they are all
fairly complex solutions, so will not be landing any time soon.



On Tue, Dec 30, 2014 at 2:54 AM, Luqman <lgs...@gmail.com> wrote:
> The Datanodes are connecting now, when I'm not using the Kubernetes. Now,
> I'm using the Namenode Pod IP directly.
> I have a follow up question. In my case, it seems that I cannot use
> Kubernetes services for my purpose. I have two directions in my mind:
>
> 1. Inside a Datanode container, use Kubernetes API to get the actual IP of
> Namenode and use that to connect the namenode.
> 2. Use flannel with --ip-masq=false, and apply IP masquerading for only the
> traffic leaving the network (to the Internet) using iptables.
>
> Please, suggest what is the better way to do that!
>
> On Tuesday, 23 December 2014 14:00:17 UTC+5, Luqman wrote:
>>

Luqman

Dec 30, 2014, 1:17:58 PM
to
Tim, it's not that there is a problem with services. They are forwarding traffic as required, but the IP-masquerading rule installed by flannel is causing the IP in the packets to be modified. Since the rule masquerades any traffic going out an interface other than flannel0, when I use a service to communicate with the master (Namenode), the rule changes the IP in the requests; hence the problem. That is why I am suggesting the above two solutions (for myself).

The second one seems more plausible, so I am going to use it for a while.

However, this is a real use case (using the non-masqueraded, real IPs of containers within the network), and I think it needs to be addressed.

Eugene Yakubovich

Dec 30, 2014, 2:45:16 PM
to google-c...@googlegroups.com
I am actually unclear on why the flannel rule gets applied:

- A connection is made from a container (pod) to a 10.1.x.x service IP, from a flannel IP (10.244.x.x).
- It first hits the KUBE-PROXY PREROUTING rule, which redirects it to the service proxy.
- The service proxy makes a connection to the namenode, but the source IP at this point should be something like the host's eth0 address. It should not be 10.244.x.x, and so it should not hit flannel's POSTROUTING rule.

Can you run iptables trace
(http://backreference.org/2010/06/11/iptables-debugging/) to see which
rules are getting hit and when?

Tim Hockin

Dec 30, 2014, 3:08:53 PM
to google-c...@googlegroups.com
Yeah, I am still operating under the assumption that, since you use a
service, you are seeing the node's IP as src, rather than the pod's
IP.

Luqman

Jan 1, 2015, 2:55:59 AM
to
Eugene, I followed the link you gave and ran the following commands on the node running the Datanode:

$ sudo iptables -t raw -A OUTPUT -j TRACE
$ sudo iptables -t raw -A PREROUTING -j TRACE
The TRACE logs from these did not show up in the system logs. I then entered the following commands:

$ sudo iptables -t raw -A OUTPUT -m limit --limit 2/m --limit-burst 5 -j TRACE
$ sudo iptables -t raw -A PREROUTING -m limit --limit 2/m --limit-burst 5 -j TRACE
$ sudo iptables -t raw -A OUTPUT -m limit --limit 2/m --limit-burst 10 -j LOG
$ sudo iptables -t raw -A PREROUTING -m limit --limit 2/m --limit-burst 10 -j LOG
I applied the limits because otherwise the node reboots. I initiated the join request and copied the logs while that request was happening.
See the logs and supplementary info here: https://gist.github.com/LuqmanSahaf/5a219a3c9d926b93d1d1
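
(For reference, one way to watch these TRACE entries as they are hit is the kernel log; depending on the kernel/distro, the relevant nf_log/ipt_LOG module may need to be loaded first:)

$ sudo journalctl -k -f | grep 'TRACE:'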

Can you explain what is happening? Is it working fine?


Luqman

Jan 1, 2015, 7:04:57 AM
to google-c...@googlegroups.com
I tweaked the above rule a little for the OUTPUT chain:

$ sudo iptables -t raw -A OUTPUT -o flannel0 -m limit --limit 10/m --limit-burst 10 -j TRACE
and made a curl request from the Datanode container to the Namenode container through a service. See the same gist (file: trace_logs_2): https://gist.github.com/LuqmanSahaf/5a219a3c9d926b93d1d1

Here are the new iptables rules at the time of this request:
Chain PREROUTING (policy ACCEPT 6 packets, 360 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1      974 87618 KUBE-PROXY  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
2      826 49671 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 6 packets, 360 bytes)
num   pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 6 packets, 360 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     6960  423K KUBE-PROXY  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
2     3029  182K DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 6 packets, 360 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1       24  1717 FLANNEL    all  --  *      *       10.244.0.0/16        0.0.0.0/0           

Chain DOCKER (2 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:50010 to:10.244.93.2:50010

Chain FLANNEL (1 references)
num   pkts bytes target     prot opt in  out        source     destination         
1        5   300 ACCEPT     all  --  *   *          0.0.0.0/0  10.244.0.0/16       
2        0     0 ACCEPT     all  --  *   *          0.0.0.0/0  224.0.0.0/4         
3       19  1417 MASQUERADE all  --  *   !flannel0  0.0.0.0/0  0.0.0.0/0           

Chain KUBE-PROXY (2 references)
num pkts bytes target   prot opt in out  source     destination         
1   0    0     REDIRECT tcp  --  *  *    0.0.0.0/0  10.1.192.92 /* kubernetes */ tcp dpt:443 redir ports 55849
2   0    0     REDIRECT tcp  --  *  *    0.0.0.0/0  10.1.190.79 /* kubernetes-ro */ tcp dpt:80 redir ports 49995
3   5    300   REDIRECT tcp  --  *  *    0.0.0.0/0  10.1.53.12  /* hadoop-namenode */ tcp dpt:9000 redir ports 55430




Luqman

Jan 1, 2015, 7:37:00 AM
to google-c...@googlegroups.com
It seems to me that the MASQUERADE rule is not applied on the first request. This is what I noticed: when I make a join request to the service (10.1.53.12), the pkts counter of KUBE-PROXY:REDIRECT increments by 1, but further communication causes the FLANNEL:MASQUERADE pkts counter to increment. See the gist for further logs (file: trace_logs_3_for_request): https://gist.github.com/LuqmanSahaf/5a219a3c9d926b93d1d1

Luqman

Jan 2, 2015, 9:06:40 AM
to google-c...@googlegroups.com
Eugene, do you know what's going on here?

Eugene Yakubovich

Jan 2, 2015, 2:21:48 PM
to google-c...@googlegroups.com
After looking at the logs and thinking some more, I think my previous
story was not correct. It should be more like this:

- A connection is made from a container (pod) to a 10.1.x.x service IP, from a flannel IP (10.244.x.x).
- It first hits the KUBE-PROXY PREROUTING rule, which redirects it to the service proxy.
- The service proxy makes a connection to the namenode, which has a 10.244.x.x (flannel) IP. Since the route to such IPs is via flannel0, the source IP will be that of flannel0 (10.244.x.0).
- FLANNEL:MASQUERADE is NOT applied, but the src IP is nevertheless that of flannel0.

This is actually consistent with the iptables logs, as there is no trace showing FLANNEL:rule:3 (MASQUERADE) being applied. I'm not sure what causes the FLANNEL:MASQUERADE pkts counter to increment (it may be some other traffic).
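
One way to confirm this from the host running the proxy (a rough check; substitute the namenode pod's flannel IP) is to ask the routing table which local address would be used -- the "src" field in the output should be flannel0's address:

$ ip route get <namenode-pod-IP>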

Regarding how best to solve this, you proposed:

> 1. Inside a Datanode container, use Kubernetes API to get the actual IP of Namenode and use that to connect the namenode.
> 2. Use flannel with --ip-masq=false, and apply IP masquerading for only the traffic leaving the network (to the Internet) using iptables.

Option 2 is not workable, as masquerading does not seem to be the cause here. So that
leaves option 1.

-Eugene

On Fri, Jan 2, 2015 at 6:06 AM, Luqman <lgs...@gmail.com> wrote:
> Eugene, do you know what's going on here?
>
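
As an illustration of option 1, a rough sketch of looking the pod IP up from the apiserver (the address, API path, pod name, and output field are placeholders and depend on the Kubernetes version):

$ curl -s http://<apiserver>:8080/api/v1beta1/pods/<namenode-pod> | grep -i podIP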

prateek arora

Jan 16, 2015, 12:49:57 PM
to google-c...@googlegroups.com
Hi,

I am also facing the same issue with an HDFS setup on Kubernetes and flannel. I used --ip-masq=false when starting the Docker daemon and started flannel with --ip-masq=true.
Right now my Datanodes are connecting to the Namenode using the Kubernetes API, but when I try it with a Kubernetes service, the Namenode registers the Datanodes with the IP of the flannel interface.

You suggested a second solution:

---- Use flannel with --ip-masq=false, and apply IP masquerading for only the traffic leaving the network (to the Internet) using iptables.

Can you please tell me the iptables command to apply this solution?



Eugene Yakubovich

Jan 16, 2015, 1:05:02 PM
to google-c...@googlegroups.com
Please see my previous message in this thread. I believe the flannel IP is showing up not because of the masquerade rule, but because kube-proxy is making its connection via the flannel interface.

prateek arora

Jan 16, 2015, 2:17:24 PM
to google-c...@googlegroups.com

Thanks, Eugene. So is there any way to fix this issue?

Luqman

Jan 19, 2015, 2:06:35 AM
to
@prateek, @eugene

I have moved to the 1st solution for now, but I don't think it is a permanent solution. A user might want to use services, which in this use case they cannot.

陶征霖

Nov 11, 2015, 12:59:56 AM
to google-c...@googlegroups.com
@Luqman, I encountered the same issue as you. But even after I moved to the 1st solution, it still doesn't work. In my case, the Datanode takes the actual address of the Namenode container to be "k8s_POD-2fdae8b2_namenode-controller-keptk_default_55b8147c-881f-11e5-abad-02d07c9f6649_e41f815f.bridge", and the Datanode fails to start because of this. Do you happen to know why? Also, do services work now?

Luqman Ghani

Nov 11, 2015, 1:43:42 AM
to google-c...@googlegroups.com
@zhenglin, I have long been using DNS to solve problems like these. For Hadoop and HBase, the only solution that seemed plausible was DNS. I used SkyDNS, with scripts that upload the IP of every container into etcd when the container starts (SkyDNS uses etcd). A Namenode pod, say "k8s_POD-f23f_namenode-f2ff444-4f4fsd", for instance, will save its IP into etcd under a name like k8s_POD-f23f_namenode-f2ff444-4f4fsd.domain.com. Then this name is injected into the Datanode containers as an environment variable. I hope this answers the question.
If you still want to follow up on this, I guess you should create a GitHub issue.
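
As a rough sketch of that registration step (SkyDNS 2.x etcd schema; the key path, pod name, and IP below are purely illustrative):

$ etcdctl set /skydns/com/domain/k8s_POD-f23f_namenode-f2ff444-4f4fsd '{"host":"10.244.23.5"}'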

On Wed, Nov 11, 2015 at 10:59 AM, 陶征霖 <zhengli...@gmail.com> wrote:
@Luqman, I encountered the same issue as you. But even if I moved to the 1st solution, it still doesn't work. In my case, the actual IP of Namenode container is considered to be "k8s_POD-2fdae8b2_namenode-controller-keptk_default_55b8147c-881f-11e5-abad-02d07c9f6649_e41f815f.bridge" by datanode. And datanode failed to start due to this. Do you happen to know why? Also does option 2 work now? 


On Monday, January 19, 2015 at 3:06:35 PM UTC+8, Luqman wrote:
@prateek, @eugene

I have moved to the 1st solution for now. But I don't think this is a permanent solution. A user might want to use services, which in this use case, he cannot.


陶征霖

Nov 11, 2015, 9:40:49 PM
to Containers at Google
Thanks for your reply; it's very helpful. I will try SkyDNS in my cluster, and I will keep tracking this issue to see whether the service IP can be used. I will post if I find anything.

On Wednesday, November 11, 2015 at 2:43:42 PM UTC+8, Luqman wrote:

陶征霖

Nov 12, 2015, 3:41:51 AM
to Containers at Google
@Luqman, try using the latest Kubernetes and pass the parameter --proxy-mode=iptables to the kube-proxy start command; the HDFS cluster should work then. It works in my cluster.
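
For reference, a rough sketch of that kube-proxy invocation (the master address is a placeholder; in most deployments the flag would go into the kube-proxy unit file or manifest instead):

$ kube-proxy --master=http://<apiserver>:8080 --proxy-mode=iptables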


On Wednesday, November 11, 2015 at 2:43:42 PM UTC+8, Luqman wrote:
@zhenglin, I have long been using DNS to solve the problems like these. For Hadoop and HBase, the only solution seemed plausible was DNS. I used SkyDNS, and used scripts to upload IPs of every container into the etcd when the container starts (SkyDNS uses etcd). A Namenode pod, say "k8s_POD-f23f_namenode-f2ff444-4f4fsd", for instance, will save its IP into etcd like : k8s_POD-f23f_namenode-f2ff444-4f4fsd.domain.com. Then this name is injected into the Datanode containers as an env variable. I hope this gives the answer.

Luqman Ghani

Nov 15, 2015, 3:38:34 PM
to google-c...@googlegroups.com
@zhenglin, I'll try that, sure. Thanks.

dharmisha doshi

Aug 26, 2016, 5:50:06 PM
to Kubernetes user discussion and Q&A
Hi, I am facing the same issue. Have you solved it?

Huihui He

Sep 20, 2016, 8:33:01 AM
to Kubernetes user discussion and Q&A, google-c...@googlegroups.com
Thanks a lot. I successfully solved this problem with your method, using a Kubernetes service to connect the Namenode and Datanode:
1. Update the Docker version (from 1.9.1 to 1.12.1):
sudo curl -sSL https://get.docker.com/ | sh

2. Stop the Docker service and restart the daemon with --ip-masq=false:
sudo service docker stop
/usr/bin/dockerd --ip-masq=false

Then it works. Sorry for my poor English.


On Saturday, January 17, 2015 at 1:49:57 AM UTC+8, prateek arora wrote:

semaa...@gmail.com

Mar 20, 2017, 10:21:19 AM
to Kubernetes user discussion and Q&A, google-c...@googlegroups.com
I had the exact same issue. You shouldn't play with IP masquerading; VXLAN-backed flannel relies on it to pass traffic between pods on different nodes.

What worked for me was to set the following properties in hdfs-site.xml:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>

I use a headless service called datanode. In the deployment YAML file, I also set the hostname to "datanode".
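
As a quick sanity check of this setup (a rough example; run from inside another pod, and the default namespace is an assumption), the headless service should resolve directly to the datanode pod IPs:

$ nslookup datanode
$ nslookup datanode.default.svc.cluster.local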