The OP is trying to optimize (i.e. minimize read/write I/O) for two different cases:
- When the file is on the same volume as the destination: read it once to compute the checksum, then move/rename it. i.e. 1 read of the file (a rename/move is cheap).
- When the file is on a different volume: compute the checksum while it is being copied. i.e. 1 read and 1 write of the file (instead of 2 reads + 1 write).
The problem is discerning between the two *before you start*. Your hardlink solution, like his move solution, results in the worst-case performance when the file is not on the same volume: it reads the file once to compute the checksum, performs a fast operation that fails, and then has to manually copy the file (2 reads + 1 write). Hardlinking is also less portable, both to some OSes (eg: Windows) and to certain filesystems.
The only advantage hard linking would provide is that the destination file would always be a "copy" of the source, and so could be erased (unlinked, not zeroed) if the checksum is not found, and the operation re-done.
To Jared: if your program crashes or power is cut, you still have a problem with #2, similar to a crash occurring during the checksum calculation. You would be left in the same situation: a file in your destination folder with no checksum, the result of a failed operation. The only differences would be that the file is an incomplete copy, and that it can be safely erased.
My suggestion to the original problem would be to split the operation into two steps:
- Move, or if that fails, copy the file to the same volume as the destination folder (eg: into a temporary directory inside the destination dir), and compute the checksum (either post-move, or during the copy)
- Move the file to its final location, since it is now known to be on the same volume (and do what you will with the checksum)
This way a failed move/copy doesn't result in garbage, or at least leaves garbage (in the staging directory) that is easily ignored by the server.