Hive metastore not copied by batch update

Joe Meadows

Aug 2, 2016, 4:28:43 PM
to reair
Hi,

I'm tasked with migrating a Hive database from one Hadoop cluster to another and ReAir looks like it will be very helpful. 

I have set up a test with two single-node Hadoop clusters (Hadoop 2.4) and a small Hive table (ORC format, 100 rows) on the source cluster.  I set up the config file according to the directions for a batch copy and ran ReAir.  All three steps appear to succeed, but once the migration has completed, a 'SHOW TABLES' on the destination does not list the table that should have been migrated.

The data file was moved successfully and it matches exactly the data file on the source.  If I create the table manually and then run MSCK REPAIR on the table it looks like a perfect replica of the source.

I stepped through in the Eclipse debugger and it looks like something goes wrong in step 3.  There it tries to COPY_UNPARTITIONED_TABLE and determines that a data copy is needed (i.e. needToCopy == true), but the boolean allowCopy is set to false, so no further action is taken.  I forced allowCopy to true and then got an exception:

Error: java.io.IOException: com.airbnb.reair.common.DistCpException: distcp result mismatch
    at com.airbnb.reair.incremental.DirectoryCopier.copy(DirectoryCopier.java:89)
    at com.airbnb.reair.incremental.primitives.CopyUnpartitionedTableTask.runTask(CopyUnpartitionedTableTask.java:134)
    at com.airbnb.reair.batch.hive.Stage3CommitChangeMapper.map(Stage3CommitChangeMapper.java:123)
    at com.airbnb.reair.batch.hive.Stage3CommitChangeMapper.map(Stage3CommitChangeMapper.java:42)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: com.airbnb.reair.common.DistCpException: distcp result mismatch
    at com.airbnb.reair.common.DistCpWrapper.copy(DistCpWrapper.java:193)
    at com.airbnb.reair.incremental.DirectoryCopier.copy(DirectoryCopier.java:86)
    ... 11 more

One point that may matter is that the DFS on the destination cluster is not HDFS; it is a proprietary HCFS implementation with the URL scheme 'escalefs://'.  I don't know whether this matters since, as I said, the data copy was successful, but I will set up a second destination with HDFS to see if that is a factor.

Thanks in advance if you have any troubleshooting advice, and thanks for making ReAir available; it's a great tool.

Best regards,
Joe Meadows
Hitachi Data Systems


Paul Yang

Aug 2, 2016, 4:58:01 PM
to Joe Meadows, reair
Hi Joe,

After the files are copied in the first 2 stages, the 3rd stage compares file sizes and modification times between the source and the destination to make sure they match. If the files don't match, the corresponding Hive metadata is not added to the metastore. From what you're describing, it sounds like there may be a mismatch in the modification times. Does your proprietary filesystem support a user job changing a file's modification time? If not, can you try adding

  <property>
    <name>airbnb.reair.copy.sync_modified_times</name>
    <value>false</value>
  </property>

to your configuration and re-running the job?
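
A minimal sketch of the kind of check described above, assuming it reduces to comparing size and modification time per file. The class and method names here (FileMatchSketch, FileMeta, filesMatch) are hypothetical and are not ReAir's actual code, which works on Hadoop file status objects and may differ in detail. It only illustrates why a filesystem that cannot preserve modification times would fail the comparison unless the time check is switched off.

```java
/** Illustrative only: hypothetical names, not part of ReAir. */
public class FileMatchSketch {

    /** Minimal stand-in for the metadata a filesystem reports for one file. */
    public static final class FileMeta {
        final String path;
        final long sizeBytes;
        final long modifiedMillis;

        public FileMeta(String path, long sizeBytes, long modifiedMillis) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.modifiedMillis = modifiedMillis;
        }
    }

    /**
     * Returns true when the source and destination files look identical.
     * When syncModifiedTimes is false, only the path and size are compared.
     */
    public static boolean filesMatch(FileMeta src, FileMeta dest, boolean syncModifiedTimes) {
        if (!src.path.equals(dest.path)) {
            return false;
        }
        if (src.sizeBytes != dest.sizeBytes) {
            return false;
        }
        // Modification times are only compared when the sync option is on.
        return !syncModifiedTimes || src.modifiedMillis == dest.modifiedMillis;
    }

    public static void main(String[] args) {
        // Same name and size, but the destination could not keep the source's mtime.
        FileMeta src = new FileMeta("000000_0", 4096L, 1470170000000L);
        FileMeta dest = new FileMeta("000000_0", 4096L, 1470173600000L);

        System.out.println(filesMatch(src, dest, true));   // prints "false"
        System.out.println(filesMatch(src, dest, false));  // prints "true"
    }
}
```

With the time comparison enabled, the two files are treated as different even though the bytes are the same, which would be consistent with the data landing correctly while the metadata step is skipped.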

Cheers,
Paul

--
You received this message because you are subscribed to the Google Groups "reair" group.
To unsubscribe from this group and stop receiving emails from it, send an email to airbnb-reair...@googlegroups.com.
To post to this group, send email to airbnb...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/airbnb-reair/4bd32c60-af47-421a-b5de-bb8a081821d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joe Meadows

Aug 2, 2016, 5:02:36 PM
to reair
Hi Paul,

Thanks a lot for your quick reply.  I will try this when I get back to my office and let you know how it works.  Our HCFS plugin layers over NFS, so your hunch about modification time sounds quite plausible.

Cheers,
Joe


Joe Meadows

Aug 2, 2016, 8:57:01 PM
to reair, webo...@gmail.com
Hi Paul,

I tried the property setting and got the same result: needToCopy evaluates to true and allowDataCopy is set to false, so it returns status NOT_COMPLETEABLE.

If I force needToCopy to be false then step 3 completes successfully with the table being created on destination as expected.

I will dive into the code that determines needToCopy and see what I can find out; perhaps my oddball, non-HDFS path is confusing matters.

I'll post back when I find anything interesting.  Thanks again for your help and for ReAir.

Cheers,
Joe

Joe Meadows

Aug 2, 2016, 10:14:46 PM
to reair, webo...@gmail.com
Hi Paul,
Your initial hunch was correct: it was failing because the modification times did not match.  I found that the property setting you suggested was not getting merged into the job's configuration because its key was not included in the list of configs to be merged.  After making the change below it seems to be working perfectly :)  Note that I have only tried batch replication so far, not incremental, so I don't know if there's a similar case there, but at least I'll know what to look for.

One more time, thanks for your help and for sharing this tool!

Best regards,
Joe


diff --git a/main/src/main/java/com/airbnb/reair/batch/hive/MetastoreReplicationJob.java b/main/src/main/java/com/airbnb/reair/batch/hive/MetastoreReplicationJob.java
index 228aff9..49da5ad 100644
--- a/main/src/main/java/com/airbnb/reair/batch/hive/MetastoreReplicationJob.java
+++ b/main/src/main/java/com/airbnb/reair/batch/hive/MetastoreReplicationJob.java
@@ -323,6 +323,7 @@ public class MetastoreReplicationJob extends Configured implements Tool {
         ConfigurationKeys.BATCH_JOB_INPUT_LIST,
         ConfigurationKeys.BATCH_JOB_METASTORE_PARALLELISM,
         ConfigurationKeys.BATCH_JOB_COPY_PARALLELISM,
+        ConfigurationKeys.SYNC_MODIFIED_TIMES_FOR_FILE_COPY,
         MRJobConfig.MAP_SPECULATIVE,
         MRJobConfig.REDUCE_SPECULATIVE
         );
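
To illustrate why the missing list entry mattered, here is a minimal sketch of an allow-list merge. It assumes the job copies only explicitly listed keys from the user's config into the job config, and that the sync option defaults to true when absent (inferred from this thread, not checked against the ReAir source). ConfigMergeSketch, mergeAllowed, and the "example.batch.copy.parallelism" key are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not part of ReAir. */
public class ConfigMergeSketch {

    /** Copies only the listed keys from the user config into a fresh job config. */
    public static Map<String, String> mergeAllowed(Map<String, String> userConf,
                                                   List<String> allowedKeys) {
        Map<String, String> jobConf = new HashMap<>();
        for (String key : allowedKeys) {
            String value = userConf.get(key);
            if (value != null) {
                jobConf.put(key, value);
            }
        }
        return jobConf;
    }

    public static void main(String[] args) {
        String syncKey = "airbnb.reair.copy.sync_modified_times";
        String otherKey = "example.batch.copy.parallelism"; // hypothetical key

        Map<String, String> userConf = new HashMap<>();
        userConf.put(syncKey, "false");
        userConf.put(otherKey, "4");

        // Key missing from the list: the user's "false" is silently dropped,
        // so a reader falls back to the assumed default of "true".
        Map<String, String> before = mergeAllowed(userConf, Arrays.asList(otherKey));
        System.out.println(before.getOrDefault(syncKey, "true"));  // prints "true"

        // Key added to the list: the user's setting reaches the job.
        Map<String, String> after = mergeAllowed(userConf, Arrays.asList(otherKey, syncKey));
        System.out.println(after.getOrDefault(syncKey, "true"));   // prints "false"
    }
}
```

The failure mode is silent: a key that is not on the list produces no error, and the job simply runs with the default.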


Paul Yang

Aug 2, 2016, 11:21:49 PM
to Joe Meadows, reair
Awesome, great to hear that it worked out for you and thanks for finding that issue! That option was originally related to the incremental replication part of the project, so I can see how that came about. Feel free to make a pull request for it, or we can put out the fix as well.

Cheers,
Paul

Joe Meadows

Aug 3, 2016, 12:40:14 PM
to reair, webo...@gmail.com
Pull request sent.

Thanks!
Joe