HDFS connector connecting to private IP instead of hostname in multi-DC setup


Guillaume

Sep 28, 2016, 8:59:36 AM
to Confluent Platform
Hello,

I have 2 clusters:
- one in-house with Confluent (3.0.0-1)
- one in AWS with Hadoop (HDP 2.4)

I am trying to use the HDFS connector to write from Confluent to Hadoop.

Long story short: the connector tries to connect to a private IP of the Hadoop cluster instead of using the hostname. On the in-house cluster, /etc/hosts has been updated so that the internal Hadoop hostnames resolve to the relevant public IPs (see the sketch below).
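
To illustrate, the /etc/hosts entries on the in-house machines look roughly like this (the public IPs and datanode hostnames below are made-up placeholders; only ambari.dp.webpower.io is real):

# placeholder public IPs and datanode hostnames, for illustration only
52.0.0.10   ambari.dp.webpower.io
52.0.0.11   datanode1.dp.webpower.io
52.0.0.12   datanode2.dp.webpower.io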

I am using the distributed connector, and I have a bunch of connector JSON files like the following:
{
  "name": "sent-connector",

  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "tasks.max": "1",
  "topics": "sent",

  "topics.dir": "/kafka-connect/topics",
  "logs.dir": "/kafka-connect/wal",
  "hdfs.url": "hdfs://ambari.dp.webpower.io:8020",

  "hadoop.conf.dir": "/etc/hadoop/conf",
  "hadoop.home": "/usr/hdp/current/hadoop-client",

  "flush.size": "100",

  "hive.integration": "true",
  "hive.metastore.uris": "thrift://ambari.dp.webpower.io:9083",
  "schema.compatibility": "FULL",

  "partitioner.class": "io.confluent.connect.hdfs.partitioner.HourlyPartitioner",
  "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/",
  "locale": "C",
  "timezone": "UTC",

  "rotate.interval.ms": "2000"
}
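
For completeness, each JSON file is pushed to the worker's REST interface with something along these lines (the host and file name here are placeholders):

# localhost and the file name are placeholders
curl -X PUT -H "Content-Type: application/json" \
     --data @sent-connector.json \
     http://localhost:8083/connectors/sent-connector/config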


and the worker is defined as follows:

rest.port=8083
bootstrap.servers=<eth0 IP of the server>:9092
group.id=dp2hdfs
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=schemareg.dpe.webpower.io
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=schemareg.dpe.webpower.io
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
config.storage.topic=k2hdfs-configs
offset.storage.topic=k2hdfs-offsets
status.storage.topic=k2hdfs-statuses
debug=true
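
The workers are started with the stock distributed script, roughly like this (the properties file path is just an example):

# the properties file path is an example
connect-distributed /etc/kafka/connect-distributed.properties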


A few notes: 
  • /kafka-connect exists on HDFS and is world-writable
  • the 3 topics (*.storage.topic) do exist
  • I have one worker running on each of the 3 servers with a Kafka broker (a schema registry, REST API and ZooKeeper server also run on all brokers)
  • I have set dfs.client.use.datanode.hostname to true, and this property is set on the client in $HADOOP_HOME/hdfs-site.xml (see the snippet below)
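
For reference, here is the relevant snippet from the client-side hdfs-site.xml (only this property is shown; the rest of the file is omitted):

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>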

I can see that the subdirectories of /kafka-connect are created, as well as the Hive metadata. When I start the connector, the log shows:

Sep 28 14:34:41 prod-nl-kafka2 connect-distributed[26492]: [2016-09-28 14:34:41,893] WARN Hive table already exists: default.sent (io.confluent.connect.hdfs.hive.HiveMetaStore:198)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: [2016-09-28 14:35:41,987] INFO Exception in createBlockOutputStream (org.apache.hadoop.hdfs.DFSClient:1471)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.0.0.231:50010]
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1610)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
Sep 28 14:35:41 prod-nl-kafka2 connect-distributed[26492]: [2016-09-28 14:35:41,989] INFO Abandoning BP-429601535-10.0.0.167-1471011443948:blk_1073745108_4284 (org.apache.hadoop.hdfs.DFSClient:1364)
Sep 28 14:35:42 prod-nl-kafka2 connect-distributed[26492]: [2016-09-28 14:35:42,013] INFO Excluding datanode 10.0.0.231:50010 (org.apache.hadoop.hdfs.DFSClient:1368)
Sep 28 14:36:42 prod-nl-kafka2 connect-distributed[26492]: [2016-09-28 14:36:42,084] INFO Exception in createBlockOutputStream (org.apache.hadoop.hdfs.DFSClient:1471)

[rinse and repeat with the other datanodes]

Any idea how to fix this? It looks like the connector receives the datanode's private IP directly, not a hostname.

Thanks,

 