It would be great if the SFTP protocol could be enhanced to return
a SHA (or other hash) value for a file or part of a file.
This would make it possible to run an rsync-like file transfer
over SFTP.
I would suggest a protocol like this:
Client sends to Server:
get-supported hash-methods
returns: whitespace-separated list like md5 sha1 sha256 ....
get-hash HASH-METHOD FILENAME STARTOFFSET BYTECOUNT
returns: hexlified hash value (all lowercase)
To get the hash value of the whole file: STARTOFFSET=0 and BYTECOUNT=0
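A minimal sketch of what the two proposed requests might compute on the server side, written in Python only for illustration. The function names, the fixed list of candidate methods and the 64 KiB read size are assumptions, not part of any existing sftp-server code; BYTECOUNT=0 is treated as "to end of file", as proposed above.

```python
import hashlib

def get_supported_hash_methods():
    """Return a whitespace-separated list of methods (hypothetical helper)."""
    wanted = ("md5", "sha1", "sha256")  # assumed candidate list
    return " ".join(m for m in wanted if m in hashlib.algorithms_available)

def get_hash(method, filename, start_offset=0, byte_count=0):
    """Hash byte_count bytes of filename starting at start_offset.

    byte_count == 0 means "up to end of file", following the proposal.
    Returns the digest as lowercase hex.
    """
    h = hashlib.new(method)
    remaining = None if byte_count == 0 else byte_count
    with open(filename, "rb") as f:
        f.seek(start_offset)
        while remaining is None or remaining > 0:
            size = 65536 if remaining is None else min(65536, remaining)
            chunk = f.read(size)
            if not chunk:  # reached end of file
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()
```

A client could call get_hash("sha256", path) for the whole file, or get_hash("sha256", path, 4096, 4096) for the second 4 KiB block.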
Anyone interested?
Thomas Güttler
_______________________________________________
openssh-unix-dev mailing list
openssh-...@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
But then you'd be reimplementing rsync anyway...
> To get the hash value of the whole file: STARTOFFSET=0 and BYTECOUNT=0
Better make that BYTECOUNT=-1, that's easier for divide-and-conquer
strategies - and it works for the case when the file is zero bytes long,
too.
Regards,
Phil
Hi Phil,
I tried to find more info about "Manber-Hash". Are you the author
of this Perl module?
Unfortunately the link on this page is broken:
http://search.cpan.org/~pmarek/Digest-ManberHash-0.7/ManberHash.pm
This page does not exist any more:
http://citeseer.nj.nec.com/manber94finding.html.
My intention for the hash values enhancement for sftp is a deduplication
backup system. I would cut files into chunks with a fixed offset.
Thomas
On Wed, Jun 29, 2011 at 3:57 AM, Thomas Güttler
<gue...@thomas-guettler.de> wrote:
> This page does not exist any more:
> http://citeseer.nj.nec.com/manber94finding.html.
The paper was named "Finding Similar Files in a Large File System";
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.3222&rep=rep1&type=pdf
seems to be a version of it.
> My intention for the hash values enhancement for sftp is a deduplication
> backup system. I would cut files into chunks with a fixed offset.
Well, if you'd prefer a C implementation, there's
http://fsvs.tigris.org/source/browse/fsvs/branches/fsvs-1.2.x/fsvs/src/checksum.c?revision=2423&view=markup
with documentation here:
http://doc.fsvs-software.org/doxygen-gif/checksum_8c.html#_details
Regards,
Phil
>> My intention for the hash values enhancement for sftp is a deduplication
>> backup system. I would cut files into chunks with a fixed offset.
> Well, if you'd prefer a C implementation, there's
> http://fsvs.tigris.org/source/browse/fsvs/branches/fsvs-1.2.x/fsvs/src/checksum.c?revision=2423&view=markup
> with documentation here:
> http://doc.fsvs-software.org/doxygen-gif/checksum_8c.html#_details
Incompatible license. =( It would have to be something closer to BSD 2 clause to be acceptable as part of
OpenSSH.
- Ben
A few years back I hacked in a simple "sums...@eviladmin.org" protocol
based on the block size that sftp set for its window, but instead of SHA1
I was using MD5 at the time. You could simply request a single block
or loop through and request a list of blocks.
The server-side code is dead simple and, following the tradition of the
rest of the sftp-server code, is rather unintelligent and very, very simple
(read: if you wanted a block list, the client had to loop through the local
file with the current window size and request an MD5 checksum
per block).
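The client-side loop described above could look roughly like this. It is a sketch, not the original patch: request_remote_md5 is a hypothetical callable standing in for the per-block request to the server, and the block size is whatever the client picks.

```python
import hashlib

def local_block_md5(path, offset, length):
    """MD5 of one block of a local file, as lowercase hex."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.md5(f.read(length)).hexdigest()

def differing_blocks(local_path, local_size, block_size, request_remote_md5):
    """Return the offsets of blocks whose checksums differ.

    request_remote_md5(offset, length) is a hypothetical stand-in for
    asking the server for the MD5 of one block; one round trip per block.
    """
    changed = []
    for offset in range(0, local_size, block_size):
        length = min(block_size, local_size - offset)
        local_sum = local_block_md5(local_path, offset, length)
        if local_sum != request_remote_md5(offset, length):
            changed.append(offset)
    return changed
```

Only the blocks listed in the result would need to be transferred. Note that every block costs a round trip here, which matters for the overall cost of the approach.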
It was under 400 lines so it isn't that complex. It didn't support any cool
features like sliding windows, etc. But that complexity could be
implemented on the client side. It was more a proof of concept than a
real implementation (the implementation sucks rocks and I know there
are bugs in it).
I abandoned it for some reason. I really wish I knew why. I suspect it
had to do with the cost of building the checksum list approaching
the cost of actually downloading the file, given the method I chose to
implement it.
- Ben
Rather than reinventing the wheel, you might take a look at the (expired)
Draft which proposed the "check-file-name" and "check-file-handle" SFTP
extensions:
http://tools.ietf.org/html/draft-ietf-secsh-filexfer-extensions-00
These extensions have been implemented by various SFTP clients and
servers.
Cheers,
TJ
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To spend too much time in studies is sloth.
-Francis Bacon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Rather than reinventing the wheel, you might take a look at the (expired)
> Draft which proposed the "check-file-name" and "check-file-handle" SFTP
> extensions:
>
> http://tools.ietf.org/html/draft-ietf-secsh-filexfer-extensions-00
>
> These extensions have been implemented by various SFTP clients and
> servers.
Thank you very much! I googled for servers which support
check-file-handle, but I found only proftpd.
The IETF extension is even better than my first proposal: you can
give a block size and get N hash values with one response.
I am not a fluent C programmer. Any volunteers to implement this?
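To illustrate the "block size in, N hash values out" idea, here is a small Python sketch of the computation only. It does not reproduce the draft's wire format; the function name, the parameter names and the use of length=None for "to end of file" are assumptions made for this example.

```python
import hashlib

def block_hashes(path, algorithm, block_size, start=0, length=None):
    """Return one hex digest per block_size-sized block of the range.

    The range begins at start and covers length bytes, or runs to the
    end of the file when length is None (an assumption of this sketch).
    The last block may be shorter than block_size.
    """
    digests = []
    remaining = length
    with open(path, "rb") as f:
        f.seek(start)
        while remaining is None or remaining > 0:
            want = block_size if remaining is None else min(block_size, remaining)
            block = f.read(want)
            if not block:  # end of file
                break
            digests.append(hashlib.new(algorithm, block).hexdigest())
            if remaining is not None:
                remaining -= len(block)
    return digests
```

With this shape, a client gets the whole list in one request instead of one round trip per block.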
Thomas Güttler
Perhaps it saves a bit of time - it certainly is no complicated piece of
code.
Regards,
Phil
In the file I referenced in the other mail I use the MD5, the
previous-to-last Manber hash and the last Manber hash (which by definition
has its N rightmost bits zero) - that's a few bits more security than just
using MD5 (where collisions can be created).
Of course, using SHA1 might (at least for the moment ;) be enough.
Perhaps, to be on the safe side, another optional parameter could specify
"MD5+SHA1+SHA512+CRC32+..." to get all of these checksums ;)
Regards,
Phil
Again mine was a proof of concept where the server can have a very simple
extended feature and the client does all the heavy lifting (e.g. like with our
remote_glob() implementation).
And it mostly succeeded, and I'm sure it would have been better if the tests were
non-localhost. =)
At this moment I'm not inclined to breathe life into my patch nor implement the
RFC. It just isn't in my timeline for the next month or two. However, implementing
the server extension should be pretty much child's play. The hard part is making
a get/put that groks it and takes effective advantage of it (or so my experience has
shown).
- Ben
Hi,
I opened a "search for help" on openhatch:
http://openhatch.org/+projects/sftp%20get%20hash%20value
Maybe someone can help us ...
Thomas