How can I check if two file paths are on the same partition?

Jared Bischof

Nov 8, 2013, 4:56:58 PM
to golan...@googlegroups.com
I have a server application that accepts file uploads from the local disk into the server's data directory. My application reads the input file in chunks and calculates the md5 checksum of each chunk while copying the file to its destination. However, I've found that if the file is on the same partition as my server's data directory, it is much faster to simply calculate the md5 checksum and then move the file with os.Rename. But os.Rename only works if the file is on the same partition. The problem is that I would like to calculate the md5 checksum before I import the file, and the only way I can currently determine whether a file is on the same partition is to call os.Rename and see whether it returns an error.

So, basically, my question is: is there a way to determine in Go if two file paths are on the same partition? That way, if I know I won't be able to rename the file, I can write the new file while I'm calculating the md5 checksum, rather than calculating the checksum and then copying the file in 2 steps.

Thanks.

Kamil Kisiel

Nov 8, 2013, 5:12:04 PM
to golan...@googlegroups.com
Why not try to os.Rename the file first, and if it fails, then write out to the new file while calculating the MD5?

Jared Bischof

Nov 8, 2013, 5:16:34 PM
to golan...@googlegroups.com
I would like to calculate the MD5 before moving the file. If something goes wrong or the server dies, I don't want the file moved before it's properly imported (including MD5 calculation). Currently, I'm doing os.Rename to test if it works, and then putting the file back in its original location until it's been properly imported, but that's kind of a hack. I would like a more straightforward approach if possible.

Maxim Khitrov

Nov 8, 2013, 5:28:35 PM
to Jared Bischof, golang-nuts
I haven't tested this, but try using os.Stat to get the device id for
both files. FileInfo.Sys() should be *syscall.Stat_t and Dev field
should be the device id.

Ingo Oeser

Nov 8, 2013, 5:45:07 PM
to golan...@googlegroups.com
One approach is to check the directories where those files are contained in.
Like here: http://play.golang.org/p/sIJHBtWksh

minux

Nov 8, 2013, 7:20:46 PM
to Ingo Oeser, golang-nuts


On Nov 8, 2013 5:45 PM, "Ingo Oeser" <night...@googlemail.com> wrote:
> One approach is to check the directories where those files are contained in.
> Like here: http://play.golang.org/p/sIJHBtWksh

Could it handle /part/a/b and /part/c/d?

Tamás Gulácsi

Nov 9, 2013, 7:19:28 AM
to golan...@googlegroups.com
Why not hardlink them? That fails across device boundaries, but leaves the file there. I do have a linkOrCopy function implementing link-if-possible-otherwise-copy.

Carlos Castillo

Nov 11, 2013, 5:36:52 AM
to golan...@googlegroups.com
The OP is trying to optimize (ie: minimize the read/write IO) for two different cases:
  1. When the file is on the same volume as the destination, read it once to compute the checksum then move/rename it. ie: 1 read of file (rename/move is cheap)
  2. When the file is on a different volume, compute the checksum while it's being copied. ie: 1 read of file and 1 write of the file (instead of 2 reads + 1 write)
The problem is discerning between the two *BEFORE YOU START*. Your hardlink solution, like his move solution, results in the worst-case performance when the file is not on the same volume: it reads the file once to compute the checksum, then performs a fast operation that fails, and then has to manually copy the file (2 reads + 1 write). Also, hardlinking is less portable, both to some OSes (eg: windows) and to certain filesystems.

The only advantage that hard linking would provide is that the destination file would always be a "copy" of the source, and so could be erased (unlinked, not zeroed) if the checksum is not found, and the operation re-done.

To Jared: if your program crashes or power is cut, you still have problems with #2 similar to those from a crash during the checksum calculation. You would be left in a similar situation: a file in your destination folder with no checksum as the result of a failed operation. The only differences would be that the file is an incomplete copy, and it can be safely erased.

My suggestion to the original problem would be to split the operation into two steps:
  1. Move, or if that fails, Copy the file to the same volume as the destination folder (eg: in a temporary directory in the destination dir), and compute the checksum (either post-move or during the copy)
  2. Move the file to its final location since it's now known to be on the same volume. (and do what you will with the checksum)
This way a failed move/copy doesn't result in garbage, or at least leaves garbage that is easily ignored by the server.

Jared Bischof

Nov 11, 2013, 11:41:45 AM
to golan...@googlegroups.com
Thanks Carlos, you've characterized the situation accurately. I didn't mention this, but I am already using a temp directory in the manner you suggested. However, the issue remains: if I can simply rename the file, I would like to do so only after calculating its md5 checksum. Otherwise, if the server crashed, the file could already have been moved to the temp directory but not yet imported into the server's data directory and the corresponding metadata database, because the md5 calculation had not yet completed. I'm currently trying to figure out how to implement Maxim's suggestion to see if that will work.

Jared Bischof

Nov 11, 2013, 12:17:38 PM
to golan...@googlegroups.com, Jared Bischof
This is what I needed, and it worked, thanks Maxim!

Jared Bischof

Nov 11, 2013, 1:25:36 PM
to golan...@googlegroups.com, Jared Bischof
In case anyone is interested, here is how I accessed the device id:

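package main

import (
    "fmt"
    "os"
    "syscall"
)

func main() {
    // os.Stat is enough to read the metadata; the file need not be opened.
    fi, err := os.Stat("foo")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    // On Unix-like systems, Sys() returns a *syscall.Stat_t.
    switch s := fi.Sys().(type) {
    case *syscall.Stat_t:
        fmt.Printf("Device id: %v\n", s.Dev)
    default:
        fmt.Printf("unexpected type %T\n", s)
    }
}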