How do we compare com.airbnb.reair.batch.hdfs.ReplicationJob and DistCp?

Zheng Shao

Nov 2, 2016, 7:10:41 PM
to reair
Is ReplicationJob intended to replace DistCp in ReAir, or do we need both? What is the long-term differentiation between them?


--
Zheng

Paul Yang

Nov 2, 2016, 7:51:35 PM
to Zheng Shao, reair
ReplicationJob was a tool that was developed to address some of the shortcomings of DistCp when copying directories with a large number of files (e.g. /user/hive/warehouse). It was included in the repo since it might be useful, but it's not directly used for Hive replication.

It could potentially replace DistCp in incremental replication. However, incremental replication generally copies shallow directories with a relatively small number of files (e.g. /user/hive/warehouse/my_table/ds=2016-01-01), so there isn't a strong need.
