Hi guys,
I'm running the batch tool with the following configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<value>cluster</value>
<comment>
Name of the source cluster. It can be an arbitrary string and is used in
logs, tags, etc.
</comment>
</property>
<property>
<name>airbnb.reair.clusters.src.metastore.url</name>
<value>thrift://host:9083</value>
<comment>Source metastore Thrift URL.</comment>
</property>
<property>
<name>airbnb.reair.clusters.src.hdfs.root</name>
<value>hdfs:///host:8020/</value>
<comment>Source cluster HDFS root. Note trailing slash.</comment>
</property>
<property>
<name>airbnb.reair.clusters.src.hdfs.tmp</name>
<value>hdfs:///tmp/replication</value>
<comment>
Directory for temporary files on the source cluster.
</comment>
</property>
<property>
<value>cluster</value>
<comment>
Name of the source cluster. It can be an arbitrary string and is used in
logs, tags, etc.
</comment>
</property>
<property>
<name>airbnb.reair.clusters.dest.metastore.url</name>
<value>thrift://host:9083</value>
<comment>Destination metastore Thrift URL.</comment>
</property>
<property>
<name>airbnb.reair.clusters.dest.hdfs.root</name>
<value>hdfs:///host:8020/</value>
<comment>Destination cluster HDFS root. Note trailing slash.</comment>
</property>
<property>
<name>airbnb.reair.clusters.dest.hdfs.tmp</name>
<value>hdfs:///tmp/hive_replication</value>
<comment>
Directory for temporary files on the source cluster. Table / partition
data is copied to this location before it is moved to the final location,
so it should be on the same filesystem as the final location.
</comment>
</property>
<property>
<name>airbnb.reair.clusters.batch.output.dir</name>
<value>hdfs:///user/batchOutput/output1</value>
<comment>
This configuration must be provided. It gives location to store each stage
MR job output.
</comment>
</property>
<property>
<name>airbnb.reair.clusters.batch.metastore.blacklist</name>
<value>testdb:test.*,tmp_.*:.*</value>
<comment>
Comma separated regex blacklist. dbname_regex:tablename_regex,...
</comment>
</property>
<property>
<name>airbnb.reair.batch.metastore.parallelism</name>
<value>150</value>
<comment>
The parallelism to use for jobs requiring metastore calls. This translates to the number of
mappers or reducers in the relevant jobs.
</comment>
</property>
<property>
<name>airbnb.reair.batch.copy.parallelism</name>
<value>150</value>
<comment>
The parallelism to use for jobs that copy files. This translates to the number of reducers
in the relevant jobs.
</comment>
</property>
<property>
<name>airbnb.reair.batch.overwrite.newer</name>
<value>true</value>
<comment>
Whether the batch job will overwrite newer tables/partitions on the destination. Default is true.
</comment>
</property>
<property>
<name>mapreduce.map.speculative</name>
<value>false</value>
<comment>
Speculative execution is currently not supported for batch replication.
</comment>
</property>
<property>
<name>mapreduce.reduce.speculative</name>
<value>false</value>
<comment>
Speculative execution is currently not supported for batch replication.
</comment>
</property>
</configuration>
The tool finished successfully and the metadata was created, but I don't see the data in the destination cluster.
I'm running the job on the destination cluster, as instructed in the docs.
Any ideas what I'm missing here?
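In case it helps narrow things down, here is a minimal sketch for sanity-checking the configuration shown earlier in this post. It uses only the Python standard library and flags two things: `<property>` entries without a `<name>`, and `hdfs://` values where a `host:port` ended up in the path because of a third slash. The function name `check_config` is made up for illustration, and it is only an assumption that either of these has anything to do with the missing data (the missing names may just be a copy/paste artifact).

```python
import re
import sys
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Matches a path whose first segment looks like "host:port", e.g. "/host:8020/".
HOST_PORT_IN_PATH = re.compile(r"^/[^/]+:\d+(/|$)")


def check_config(xml_text):
    """Return a list of warnings for a Hadoop-style <configuration> document."""
    warnings = []
    root = ET.fromstring(xml_text)
    for index, prop in enumerate(root.findall("property"), start=1):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if not name:
            warnings.append(f"property #{index} (value={value!r}) has no <name>")
            continue
        if value.startswith("hdfs://"):
            parsed = urlparse(value)
            # "hdfs:///tmp/x" (no authority) is a normal default-filesystem path,
            # so only flag the case where host:port slipped into the path.
            if not parsed.netloc and HOST_PORT_IN_PATH.match(parsed.path):
                warnings.append(
                    f"{name}: {value!r} has an empty authority; "
                    "host:port is being read as part of the path"
                )
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_config.py <path-to-config.xml>
    with open(sys.argv[1], encoding="utf-8") as handle:
        for warning in check_config(handle.read()):
            print(warning)
```

Values such as `hdfs:///tmp/replication` are deliberately not flagged, since an `hdfs://` URI without an authority normally resolves against the cluster's default filesystem.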
P.S.
I don't know if it's related, but I'm using a single Oracle DB for the metastore.
Thanks,