Enhance sftp protocol: get SHA hash of file


Thomas Güttler

29.06.2011, 06:57:15
to openssh-...@mindrot.org
Hi,

it would be great, if the sftp protocol could be
enhanced: get sha (or other hash value) from a file or part of a file.

This would make it possible to run an rsync-like file transfer
over sftp.

I would suggest a protocol like this

Client sends to Server:

get-supported hash-methods

returns: whitespace-separated list like md5 sha1 sha256 ...

get-hash HASH-METHOD FILENAME STARTOFFSET BYTECOUNT

returns: hexlified hash value (all lowercase)

To get the hash value of the whole file: STARTOFFSET=0 and BYTECOUNT=0
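
A minimal sketch of how a server might answer the two proposed requests, written in Python rather than as an actual sftp-server patch. The function names `supported_methods` and `get_hash` and the choice of three digests are assumptions for illustration; only the request shape and the lowercase hex result come from the proposal above.

```python
import hashlib

# Assumed subset of digests the hypothetical server offers.
SUPPORTED = ("md5", "sha1", "sha256")


def supported_methods() -> str:
    """Answer 'get-supported hash-methods' as a whitespace-separated list."""
    return " ".join(SUPPORTED)


def get_hash(method: str, filename: str, start: int, count: int) -> str:
    """Answer 'get-hash METHOD FILENAME STARTOFFSET BYTECOUNT'.

    As in the proposal, start=0 and count=0 means the whole file.
    """
    if method not in SUPPORTED:
        raise ValueError(f"unsupported hash method: {method}")
    digest = hashlib.new(method)
    whole_file = (start == 0 and count == 0)
    remaining = count
    with open(filename, "rb") as fh:
        fh.seek(start)
        while whole_file or remaining > 0:
            want = 65536 if whole_file else min(65536, remaining)
            chunk = fh.read(want)
            if not chunk:
                break  # reached end of file
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()  # hexdigest() is already lowercase
```

Reading in 64 KiB pieces keeps memory use constant regardless of file size; that buffer size is an arbitrary choice.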

Anyone interested?

Thomas Güttler
_______________________________________________
openssh-unix-dev mailing list
openssh-...@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev

Philipp Marek

29.06.2011, 07:04:59
to openssh-...@mindrot.org, Thomas Güttler
On Wednesday 29 June 2011, Thomas Güttler wrote:
> This would make it possible to run an rsync-like file transfer
> over sftp.
Well, this would work for append-only files; if bytes get inserted or
deleted in the middle, you'd need a rolling hash (e.g. a Manber hash), as
rsync uses.

But then you'd be reimplementing rsync anyway...
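
To illustrate the point, the following sketch hashes fixed-offset blocks before and after a change. The 4-byte block size and the helper names `block_hashes` and `matching_blocks` are arbitrary choices for the demonstration.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-1 of each fixed-offset block of data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def matching_blocks(old: bytes, new: bytes, block_size: int = 4) -> int:
    """Count block positions whose hash is identical in old and new."""
    pairs = zip(block_hashes(old, block_size), block_hashes(new, block_size))
    return sum(1 for a, b in pairs if a == b)


original = b"abcdefghijklmnop"            # four 4-byte blocks
appended = original + b"qrst"             # data added at the end
inserted = b"X" + original                # one byte inserted at the front

print(matching_blocks(original, appended))  # all four old blocks still match
print(matching_blocks(original, inserted))  # every block boundary has shifted
```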


> To get the hash value of the whole file: STARTOFFSET=0 and BYTECOUNT=0

Better make that BYTECOUNT=-1, that's easier for divide-and-conquer
strategies - and it works for the case when the file is zero bytes long,
too.
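
The difference between the two conventions can be shown with a small range-resolution helper. This is an illustrative sketch; `resolve_range` is a hypothetical name and not part of any proposal in this thread.

```python
def resolve_range(file_size: int, start: int, count: int) -> tuple[int, int]:
    """Return the (start, end) byte range to hash; count=-1 means 'to end of file'."""
    if start < 0 or start > file_size:
        raise ValueError("start offset outside file")
    if count == -1:
        return start, file_size
    if count < 0:
        raise ValueError("count must be -1 or non-negative")
    return start, min(start + count, file_size)


# With -1 as the sentinel, a zero-length request stays a zero-length request,
# so a divide-and-conquer client that ends up with an empty half does not
# accidentally ask for the whole file.
print(resolve_range(100, 0, -1))   # whole file
print(resolve_range(100, 40, 0))   # empty range at offset 40
print(resolve_range(0, 0, -1))     # zero-byte file
```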


Regards,

Phil

Thomas Güttler

29.06.2011, 07:57:55
to openssh-...@mindrot.org
On 29.06.2011 13:04, Philipp Marek wrote:
> On Wednesday 29 June 2011, Thomas Güttler wrote:
>> This would make it possible to run an rsync-like file transfer
>> over sftp.
> Well, this would work for append-only files; if bytes get inserted or
> deleted in the middle, you'd need a Manber-Hash like rsync uses.

Hi Phil,

I tried to find more info about "Manber-Hash". You are the author
of this perl module?

Unfortunately the link on this page is broken:

http://search.cpan.org/~pmarek/Digest-ManberHash-0.7/ManberHash.pm

This page does not exist any more:
http://citeseer.nj.nec.com/manber94finding.html.


My goal for the hash value enhancement for sftp is a deduplicating
backup system. I would cut files into chunks at fixed offsets.

Thomas

Dan Kaminsky

29.06.2011, 07:05:57
to Thomas Güttler, openssh-...@mindrot.org
I could see various uses of this, and it's not like OpenSSH doesn't already
have sha1 built in. It could also be hacked in via a command line channel,
seeking sha1sum or a perl one-liner.
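
The command-channel approach might look like the sketch below, which runs `sha1sum` on the remote host through the `ssh` client. It assumes `sha1sum` is installed remotely and that the account has shell access, which is often not true for sftp-only accounts. The helper names and the host `backup.example.org` are hypothetical.

```python
import shlex
import subprocess


def remote_sha1_command(host: str, path: str) -> list[str]:
    """Build the ssh invocation; the remote path is quoted for the remote shell."""
    return ["ssh", host, "sha1sum -- " + shlex.quote(path)]


def remote_sha1(host: str, path: str) -> str:
    """Run sha1sum remotely and return the hex digest (first output field)."""
    result = subprocess.run(
        remote_sha1_command(host, path),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()[0]


# Example invocation (needs a reachable host, so it is not run here):
# remote_sha1("backup.example.org", "/srv/data/archive.tar")
print(remote_sha1_command("backup.example.org", "/srv/data/my file.tar"))
```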

On Wed, Jun 29, 2011 at 3:57 AM, Thomas Güttler <gue...@thomas-guettler.de> wrote:

Philipp Marek

29.06.2011, 08:07:10
to openssh-...@mindrot.org, Thomas Güttler
On Wednesday 29 June 2011, Thomas Güttler wrote:
> I tried to find more info about "Manber-Hash". You are the author
> of this perl module?
Yes.


> Unfortunately the link on this page is broken:
>
> http://search.cpan.org/~pmarek/Digest-ManberHash-0.7/ManberHash.pm
That's bad.

> This page does not exist any more:
> http://citeseer.nj.nec.com/manber94finding.html.

The paper was named "Finding Similar Files in a Large File System";
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.3222&rep=rep1&type=pdf
seems to be a version of it.


> My intention for the hash values enhancement for sftp is a deduplication
> backup system. I would cut files into chunks with a fixed offset.

Well, if you'd prefer a C implementation, there's
http://fsvs.tigris.org/source/browse/fsvs/branches/fsvs-1.2.x/fsvs/src/checksum.c?revision=2423&view=markup
with documentation here:
http://doc.fsvs-software.org/doxygen-gif/checksum_8c.html#_details


Regards,

Phil

Ben Lindstrom

29.06.2011, 10:40:00
to Philipp Marek, Thomas Güttler, openssh-...@mindrot.org

On Jun 29, 2011, at 7:07 AM, Philipp Marek wrote:

>> My intention for the hash values enhancement for sftp is a deduplication
>> backup system. I would cut files into chunks with a fixed offset.
> Well, if you'd prefer a C implementation, there's
> http://fsvs.tigris.org/source/browse/fsvs/branches/fsvs-1.2.x/fsvs/src/checksum.c?revision=2423&view=markup
> with documentation here:
> http://doc.fsvs-software.org/doxygen-gif/checksum_8c.html#_details

Incompatible license. =( It would have to be something closer to BSD 2 clause to be acceptable as part of
OpenSSH.

- Ben

Ben Lindstrom

29.06.2011, 10:00:18
to Dan Kaminsky, Thomas Güttler, openssh-...@mindrot.org

However, sftp doesn't link to crypto libraries by default. =-)

A few years back I hacked in a simple "sums...@eviladmin.org" protocol
based on the block size that sftp set for its window, but instead of SHA1
I was using MD5 at the time. You could simply request a single block
or loop through and request a list of blocks.

The server-side code is dead simple and, following the tradition of the
rest of the sftp-server code, is rather unintelligent and very, very simple
(read: if you wanted a block list, the client had to loop through the local
file with the current window size and request an MD5 checksum
per block).
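
The division of labour described here, a simple server answering one block at a time and a client doing the looping, can be sketched as follows. This is not Ben's patch: `remote_block_md5` stands in for whatever request the extension would send, and the 32 KiB block size is an assumed placeholder for the window-derived size.

```python
import hashlib
from typing import Callable

BLOCK_SIZE = 32 * 1024  # assumed; the original used the size sftp set for its window


def changed_blocks(local_path: str,
                   remote_block_md5: Callable[[int, int], str],
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Return offsets of blocks whose local MD5 differs from the remote one.

    remote_block_md5(offset, length) is a hypothetical callable that performs
    one request to the server and returns a lowercase hex MD5.
    """
    differing = []
    offset = 0
    with open(local_path, "rb") as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            local = hashlib.md5(block).hexdigest()
            if local != remote_block_md5(offset, len(block)):  # one round trip per block
                differing.append(offset)
            offset += len(block)
    return differing
```

Each block costs one round trip in this scheme, so the number of requests grows linearly with the file size.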

It was under 400 lines so it isn't that complex. It didn't support any cool
features like sliding windows, etc. But that complexity could be
implemented on the client side. It was more a proof of concept than a
real implementation (the implementation sucks rocks and I know there
are bugs in it).

I abandoned it for some reason. I really wish I knew why. I suspect it
was because the cost of computing the checksum list was approaching
the cost of actually downloading the file with the method I chose to
implement it.

- Ben

TJ Saunders

29.06.2011, 12:37:58
to Thomas Güttler, openssh-...@mindrot.org

> it would be great, if the sftp protocol could be
> enhanced: get sha (or other hash value) from a file or part of a file.
>
> This would make it possible to run an rsync-like file transfer
> over sftp.
>
> I would suggest a protocol like this
>
> Client sends to Server:
>
> get-supported hash-methods
>
> returns: whitespace-separated list like md5 sha1 sha256 ...
>
> get-hash HASH-METHOD FILENAME STARTOFFSET BYTECOUNT
>
> returns: hexlified hash value (all lowercase)
>
> To get the hash value of the whole file: STARTOFFSET=0 and BYTECOUNT=0
>
> Anyone interested?

Rather than reinventing the wheel, you might take a look at the (expired)
Draft which proposed the "check-file-name" and "check-file-handle" SFTP
extensions:

http://tools.ietf.org/html/draft-ietf-secsh-filexfer-extensions-00

These extensions have been implemented by various SFTP clients and
servers.

Cheers,
TJ

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To spend too much time in studies is sloth.

-Francis Bacon

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Thomas Güttler

29.06.2011, 16:25:10
to openssh-...@mindrot.org
On 29.06.2011 18:37, TJ Saunders wrote:
>
>> it would be great, if the sftp protocol could be
>> enhanced: get sha (or other hash value) from a file or part of a file.
...

>
> Rather than reinventing the wheel, you might take a look at the (expired)
> Draft which proposed the "check-file-name" and "check-file-handle" SFTP
> extensions:
>
> http://tools.ietf.org/html/draft-ietf-secsh-filexfer-extensions-00
>
> These extensions have been implemented by various SFTP clients and
> servers.

Thank you very much! I searched for servers which support
check-file-handle, but I found only proftpd.

The ietf extension is even better than my first proposal: You can
give a blocksize and get N hash values with one response.
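
The advantage mentioned here, one request yielding N hash values, can be illustrated with a sketch. The function and parameter names are hypothetical and do not reproduce the wire format of the draft; the draft itself is the reference for the actual field layout.

```python
import hashlib


def check_file_blocks(path: str, start: int, length: int,
                      block_size: int, algorithm: str = "sha1") -> list[str]:
    """Hash length bytes from start in block_size pieces, returned as one list."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    hashes = []
    remaining = length
    with open(path, "rb") as fh:
        fh.seek(start)
        while remaining > 0:
            block = fh.read(min(block_size, remaining))
            if not block:
                break  # range extends past end of file
            hashes.append(hashlib.new(algorithm, block).hexdigest())
            remaining -= len(block)
    return hashes
```

Compared with one request per block, the whole list travels in a single response, which removes the per-block round trip.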

I am not a fluent C programmer. Any volunteers to implement this?

Thomas Güttler

Philipp Marek

30.06.2011, 02:24:45
to Ben Lindstrom, Thomas Güttler, openssh-...@mindrot.org
> >> My intention for the hash values enhancement for sftp is a
> >> deduplication backup system. I would cut files into chunks with a
> >> fixed offset.
> >
> > Well, if you'd prefer a C implementation, there's
> >
> > http://fsvs.tigris.org/source/browse/fsvs/branches/fsvs-1.2.x/fsvs/src/checksum.c?revision=2423&view=markup
> >
> > with documentation here:
> > http://doc.fsvs-software.org/doxygen-gif/checksum_8c.html#_details
>
> Incompatible license. =( It would have to be something closer to BSD 2
> clause to be acceptable as part of OpenSSH.
Well, as I'm the author of that file, I hereby license the manber-related
code in that file as BSD 2 - or whatever else is needed for use in openssh.

Perhaps it saves a bit of time - it certainly is no complicated piece of
code.


Regards,

Phil

Philipp Marek

30.06.2011, 02:34:09
to openssh-...@mindrot.org, Dan Kaminsky, Thomas Güttler, Ben Lindstrom
> A few years back I hacked in a simple "sums...@eviladmin.org" protocol
> based on the block size that sftp set for its window, but instead of
> SHA1 I was using MD5 at the time. You could simply request a single
> block or loop through and request a list of blocks.
...

> I abandoned it for some reason. I really wish I knew why. I suspect it
> was because the cost of computing the checksum list was approaching
> the cost of actually downloading the file with the method I chose to
> implement it.
Well, I'd expect a simple command "manber-hashes START LENGTH " - perhaps
with an optional setting that defines the average block size - that streams
(start, length, manber-hash, MD5/SHA1) back to the client to be much more
useful; it would be much faster than transmitting the whole file and
wouldn't need that many query operations.


In the file I referenced in the other mail I use the MD5, the
previous-to-last manber hash and the last manber hash (which by definition
has its N rightmost bits zero) - that's a few bits more security than just
using MD5 (where collisions can be created).
Of course, using SHA1 might (at least for the moment ;) be enough.
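
The idea of cutting blocks where a rolling hash has its N rightmost bits zero can be sketched as below. This is a simplified stand-in, not the Manber hash from the paper or the FSVS code: the window size, the multiplier `PRIME`, the modulus and the number of zero bits are arbitrary assumptions chosen for readability.

```python
import hashlib

WINDOW = 16          # bytes covered by the rolling hash (assumed)
PRIME = 31           # polynomial base (assumed)
MOD = 1 << 32        # keep the hash in 32 bits
ZERO_BITS = 6        # boundary candidate when the low 6 bits are zero


def chunk_boundaries(data: bytes) -> list[tuple[int, int]]:
    """Split data into (start, length) blocks at content-defined boundaries."""
    top = pow(PRIME, WINDOW - 1, MOD)  # weight of the byte leaving the window
    mask = (1 << ZERO_BITS) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * PRIME + byte) % MOD                # add the newest byte
        if i + 1 - start >= WINDOW and (h & mask) == 0:
            chunks.append((start, i + 1 - start))
            start = i + 1
    if start < len(data):
        chunks.append((start, len(data) - start))   # trailing partial block
    return chunks


def chunk_digests(data: bytes) -> list[str]:
    """SHA-1 of each content-defined block."""
    return [hashlib.sha1(data[s:s + n]).hexdigest() for s, n in chunk_boundaries(data)]
```

Because a boundary depends only on the bytes inside the window, inserting data near the start of a file shifts the block offsets but leaves most block contents, and therefore most block digests, unchanged.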

Perhaps, to be on the safe side, another optional parameter could specify
"MD5+SHA1+SHA512+CRC32+..." to get all of these checksums ;)

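Computing several checksums of the same data, as such a parameter would request, might look like the following sketch. The `+`-separated spec string follows the example above; the function name `multi_checksum` and the mapping of names onto Python's `hashlib` and `zlib.crc32` are assumptions for illustration.

```python
import hashlib
import zlib

# Digest names this sketch accepts in addition to CRC32 (assumed set).
DIGESTS = ("md5", "sha1", "sha256", "sha512")


def multi_checksum(spec: str, data: bytes) -> dict[str, str]:
    """Compute every checksum named in a '+'-separated spec such as 'MD5+SHA1+CRC32'."""
    results = {}
    for name in spec.split("+"):
        key = name.strip().lower()
        if key == "crc32":
            results[name] = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
        elif key in DIGESTS:
            results[name] = hashlib.new(key, data).hexdigest()
        else:
            raise ValueError(f"unknown checksum: {name}")
    return results


print(multi_checksum("MD5+SHA1+CRC32", b"example block"))
```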

Regards,

Phil

Ben Lindstrom

30.06.2011, 02:55:36
to Philipp Marek, openssh openssh

Looking at the RFC that was posted in this thread, it is best to implement that,
as it is a bit more robust than mine.

Again, mine was a proof of concept where the server has a very simple
extended feature and the client does all the heavy lifting (e.g. like with our
remote_glob() implementation).

And it mostly succeeded, and I'm sure it would have been better if the tests were
non-localhost. =)

At this moment I'm not inclined to breathe life into my patch nor implement the
RFC. It just isn't in my timeline for the next month or two. However, implementing
the server extension should be pretty much child's play. The hard part is making
a get/put that groks it and takes effective advantage of it (or so my experience has
shown).

- Ben

Thomas Guettler

30.06.2011, 11:22:31
to openssh-...@mindrot.org
...
>> Rather than reinventing the wheel, you might take a look at the (expired)
>> Draft which proposed the "check-file-name" and "check-file-handle" SFTP
>> extensions:
>>
>> http://tools.ietf.org/html/draft-ietf-secsh-filexfer-extensions-00
>>
>> These extensions have been implemented by various SFTP clients and
>> servers.
>
> Thank you very much! I searched for servers which support
> check-file-handle, but I found only proftpd.
>
> The ietf extension is even better than my first proposal: You can
> give a blocksize and get N hash values with one response.
>
> I am not a fluent C programmer. Any volunteers to implement this?

Hi,

I posted a request for help on OpenHatch:

http://openhatch.org/+projects/sftp%20get%20hash%20value

Maybe someone can help us ...

Thomas
