Hello ReAir community,
I used the tool successfully to move data within HDFS (source: HDFS, target: HDFS). However, when I attempt to move data from HDFS to S3, the data lands in the wrong location in S3: the tool copies the entire directory structure of the tmp directory (supplied via the -temp argument) into the root of the S3 bucket, rather than into the target S3 directory within the bucket. (Using s3n instead of s3a results in the same behavior.)
Here is the command that I executed:
hadoop jar airbnb-reair-main-1.0.0-all.jar com.airbnb.reair.batch.hdfs.ReplicationJob \
  -Dmapreduce.job.reduces=10 \
  -Dmapreduce.map.memory.mb=8000 \
  -Dmapreduce.map.java.opts="-Djava.net.preferIPv4Stack=true -Xmx7000m" \
  -source hdfs://<hdfs_dir_path>/ \
  -destination s3a://<s3_key>:<s3_secret>@<s3_bucket>/<s3_dir_name>/ \
  -log hdfs://<hdfs_log_path>/$JOB_START_TIME \
  -temp hdfs://<hdfs_tmp_dir_path>/$JOB_START_TIME \
  -blacklist ".*/tmp/.*" \
  -operations a,u,d
Expected:
[s3_bucket] > [s3_dir_name] > [File 1] [File 2] [File 3]
Actual:
[s3_bucket] > [s3_dir_name] > (empty)
[s3_bucket] > tmp > reair > 1490203731429 > [__tmp_copy__file_attempt_1490...] [__tmp_copy__file_attempt_1490...] [__tmp_copy__file_attempt_1490...] [__tmp_copy__file_attempt_1490...]