Make Batch replication more efficient/consistent for partition append (insert into instead of insert overwrite)

24 views
Skip to first unread message

Zheng Shao

unread,
Jan 12, 2017, 2:33:54 AM1/12/17
to reair, Jingwei Lu, Paul Yang
Background:
Today, in Batch replication, we have 3 jobs:

* In first job, copyData =  whether a partition's source and dest directory is not exactly the same;
* In second job, for directories that have copyData = true, remove the dest directory if exists, and then copy files over in parallel in the reducers.
* In third job, manipulate metadata (register partition etc).

While this works well for newly created partitions, it doesn't really work well for partitions that are appended to, for example, when "INSERT INTO" is executed instead of "INSERT OVERWRITE".

Problems:
Specifically, there are 2 problems:
1. Race Condition: There will be a while that the partition is cleared to empty because the dest directory is removed, so users may end up with wrong results in the dest cluster.
2. Efficiency: The efficiency is low when the incremental addition is very small compared with the old data.


Proposals:
I'd like to propose this Proposal A:
* in the second job, we always copy files into a temporary location, and no deletion of the dest directory is done here.
* In the third job, right before we commit the metadata change, we commit the file system change (move all temporary files inside, and then delete the extra files already in the dest directory).

Proposal A fixes the efficiency problem, and dramatically reduces the window for race condition.


(I haven't looked at the incremental replication code to see if we have similar problems, but even if we do, the race condition window there will be very small already)
--
Zheng

Reply all
Reply to author
Forward
0 new messages