[PATCH] reject PUTs via Content-MD5 header on corrupt transfers

25 views
Skip to first unread message

Eric Wong

unread,
Nov 27, 2011, 5:06:45 AM11/27/11
to per...@googlegroups.com
If a client knows the MD5 checksum of what it is uploading ahead
of time, it may specify the Content-MD5 header and ask Perlbal
to verify it while the file is being read off the socket and
detect corrupted transfers before completion.

This allows to one to avoid rereading the entire file to verify
the MD5 post-upload. I find this functionality useful because
TCP checksums are weak and I've experienced network corruption
that TCP did not detect several times over the years.

I hope I've implemented this correctly according to section
14.15 of RFC 2616. I don't know if there are other examples of
other HTTP servers that can reject PUTs based on Content-MD5
(other servers not written by me, I don't trust myself with
this stuff).

Signed-off-by: Eric Wong <normal...@yhbt.net>
---

I'm working on checksumming for MogileFS, and that will be able
to (optionally) use this feature and avoid rereading a
just-uploaded file to get it's MD5. No easy/portable way to
know what actually hit the physical storage device isn't corrupt,
though...

lib/Perlbal/ClientHTTP.pm | 18 ++++++++++++++++++
t/20-put.t | 12 ++++++++++++
2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/lib/Perlbal/ClientHTTP.pm b/lib/Perlbal/ClientHTTP.pm
index 142eee2..9ebac43 100644
--- a/lib/Perlbal/ClientHTTP.pm
+++ b/lib/Perlbal/ClientHTTP.pm
@@ -22,6 +22,7 @@ use fields ('put_in_progress', # 1 when we're currently waiting for an async job
'content_length', # length of document being transferred
'content_length_remain', # bytes remaining to be read
'chunked_upload_state', # bool/obj: if processing a chunked upload, Perlbal::ChunkedUploadState object, else undef
+ 'md5_ctx', # Digest::MD5 used to verify Content-MD5
);

use HTTP::Date ();
@@ -29,6 +30,7 @@ use File::Path;

use Errno qw( EPIPE );
use POSIX qw( O_CREAT O_TRUNC O_WRONLY O_RDONLY ENOENT );
+use Digest::MD5;

# class list of directories we know exist
our (%VerifiedDirs);
@@ -61,6 +63,7 @@ sub init {
$self->{put_fh} = undef;
$self->{put_pos} = 0;
$self->{chunked_upload_state} = undef;
+ $self->{md5_ctx} = undef;
}

sub close {
@@ -134,6 +137,8 @@ sub handle_put {

return $self->send_response(403) unless $self->{service}->{enable_put};

+ $self->{md5_ctx} = $hd->header('Content-MD5') ? Digest::MD5->new : undef;
+
return if $self->handle_put_chunked;

# they want to put something, so let's setup and wait for more reads
@@ -427,6 +432,8 @@ sub put_writeout {

my $data = join("", map { $$_ } @{$self->{read_buf}});
my $count = length $data;
+ my $md5_ctx = $self->{md5_ctx};
+ $md5_ctx->add($data) if $md5_ctx;

# reset our input buffer
$self->{read_buf} = [];
@@ -469,6 +476,17 @@ sub put_close {

if (CORE::close($self->{put_fh})) {
$self->{put_fh} = undef;
+
+ my $md5_ctx = $self->{md5_ctx};
+ if ($md5_ctx) {
+ my $actual = $md5_ctx->b64digest;
+ my $expect = $self->{req_headers}->header("Content-MD5");
+ $expect =~ s/=+\s*\z//;
+ if ($actual ne $expect) {
+ return $self->send_response(400,
+ "Content-MD5 mismatch, expected: $expect actual: $actual");
+ }
+ }
return $self->send_response(200);
} else {
return $self->system_error("Error saving file", "error in close: $!");
diff --git a/t/20-put.t b/t/20-put.t
index df9f347..6171fb3 100644
--- a/t/20-put.t
+++ b/t/20-put.t
@@ -6,6 +6,7 @@ use warnings;
use Perlbal::Test;

use Test::More 'no_plan';
+use Digest::MD5 qw/md5_base64/;

my $port = new_port();
my $dir = tempdir();
@@ -65,6 +66,11 @@ sub verify_put {
ok(filecontent($disk_file) eq $content, "verified put");
}

+sub content_md5 {
+ # Digest::MD5 doesn't pad base64 digests, so we have to do it ourselves
+ [ "Content-MD5", md5_base64($_[0]) . '==' ]
+}
+
# successful puts
foreach_aio {
my $aio = shift;
@@ -120,5 +126,11 @@ ok(manage("SET test.enable_put = 0"));
ok(! put_file(), "put disabled");
ok(manage("SET test.enable_delete = 0"));
ok(! delete_file(), "delete disabled");
+ok(manage("SET test.enable_put = 1"));
+ok(put_file(), "put re-enabled");
+
+# Content-MD5 checking
+ok(put_file(content => "!", headers => content_md5('!')), "Content-MD5 OK");
+ok(! put_file(content => "?", headers => content_md5('!')), "Content-MD5 rejected");

1;
--
Eric Wong

Jonathan Steinert

unread,
Feb 2, 2012, 9:15:47 PM2/2/12
to per...@googlegroups.com
Hi Eric,

Thank you for your patch to add Content-MD5 support to perlbal. I would
like to see a couple features added to this if you could.

First off, can you please add a server setting to allow disabling of
Content-MD5 handling? The calculation of an MD5 sum of the content is
computationally more expensive than things are in perlbal right now, and
while I would like for this to be enabled by default, I do see a need
for a way to disable this if it causes problems for client usage or
server health.

Second, the Content-MD5 header is supposed to verify the content of the
PUT before committing the content as a written asset. Could you please
modify the code to not overwrite existing content unless the MD5 sum
matches, and do not leave an invalid piece of content on disk unless the
MD5 sum matches?

Thanks

--hachi

Eric Wong

unread,
Feb 2, 2012, 11:01:59 PM2/2/12
to per...@googlegroups.com
Jonathan Steinert <ha...@kuiki.net> wrote:
> Hi Eric,
>
> Thank you for your patch to add Content-MD5 support to perlbal. I
> would like to see a couple features added to this if you could.
>
> First off, can you please add a server setting to allow disabling of
> Content-MD5 handling? The calculation of an MD5 sum of the content
> is computationally more expensive than things are in perlbal right
> now, and while I would like for this to be enabled by default, I do
> see a need for a way to disable this if it causes problems for
> client usage or server health.
>
> Second, the Content-MD5 header is supposed to verify the content of
> the PUT before committing the content as a written asset. Could you
> please modify the code to not overwrite existing content unless the
> MD5 sum matches, and do not leave an invalid piece of content on
> disk unless the MD5 sum matches?

Hello Jonathan, thanks for the response.

I've pushed the following to "master" on top of my previous patch to
git://bogomips.org/perlbal.git which implements the features you've
requested.

I tried running "make docs" to regenerate docs/service-parameters.txt,
but it looks like the "server_tokens" parameter got desynced and there
were also some formatting differences, so I didn't include that in
this patch in this patch.

I'll update the rest of the docs if the parameter name makes sense
(enable_md5 vs enable_put_md5 vs ... ?) Naming things is hard :<

Let me know if there's anything else

From 12ce371274c91edf542aba96770c01bd4ff09267 Mon Sep 17 00:00:00 2001
From: Eric Wong <normal...@yhbt.net>
Date: Thu, 2 Feb 2012 19:46:20 -0800
Subject: [PATCH] implement additional Content-MD5 features

* Add "enable_md5" service parameter to toggle Content-MD5
verification as MD5 verification may use unnecessary CPU

* Do not clobber existing content on disk if Content-MD5
verification fails.

Signed-off-by: Eric Wong <normal...@yhbt.net>
---

lib/Perlbal/AIO.pm | 15 ++++++++++
lib/Perlbal/ClientHTTP.pm | 65 +++++++++++++++++++++++++++++++++++---------
lib/Perlbal/Service.pm | 8 +++++
t/20-put.t | 14 +++++++++
4 files changed, 88 insertions(+), 14 deletions(-)

diff --git a/lib/Perlbal/AIO.pm b/lib/Perlbal/AIO.pm
index fac72ab..95df01f 100644
--- a/lib/Perlbal/AIO.pm
+++ b/lib/Perlbal/AIO.pm
@@ -26,6 +26,21 @@ $Perlbal::AIO_MODE = "ioaio" if $Perlbal::OPTMOD_IO_AIO;
# AIO functions available to callers
############################################################################

+sub aio_rename {
+ my ($srcpath, $dstpath, $user_cb) = @_;
+ aio_channel_push(get_chan($srcpath), $user_cb, sub {
+ my $cb = shift;
+
+ if ($Perlbal::AIO_MODE eq "ioaio") {
+ IO::AIO::aio_rename($srcpath, $dstpath, $cb);
+ } else {
+ my $rv = rename($srcpath, $dstpath);
+ $rv = $rv ? 0 : -1;
+ $cb->($rv);
+ }
+ });
+}
+
sub aio_readahead {
my ($fh, $offset, $length, $user_cb) = @_;

diff --git a/lib/Perlbal/ClientHTTP.pm b/lib/Perlbal/ClientHTTP.pm
index 9ebac43..9fee265 100644
--- a/lib/Perlbal/ClientHTTP.pm
+++ b/lib/Perlbal/ClientHTTP.pm
@@ -17,6 +17,7 @@ use Perlbal::Util;
use fields ('put_in_progress', # 1 when we're currently waiting for an async job to return
'put_fh', # file handle to use for writing data
'put_fh_filename', # filename of put_fh
+ 'put_final_name', # final pathname of put_fh
'put_pos', # file offset to write next data at



'content_length', # length of document being transferred

@@ -29,7 +30,7 @@ use HTTP::Date ();


use File::Path;

use Errno qw( EPIPE );

-use POSIX qw( O_CREAT O_TRUNC O_WRONLY O_RDONLY ENOENT );
+use POSIX qw( O_CREAT O_TRUNC O_WRONLY O_RDONLY O_EXCL ENOENT EEXIST );


use Digest::MD5;

# class list of directories we know exist

@@ -64,6 +65,7 @@ sub init {


$self->{put_pos} = 0;
$self->{chunked_upload_state} = undef;

$self->{md5_ctx} = undef;
+ $self->{put_final_name} = undef;
}

sub close {
@@ -137,7 +139,7 @@ sub handle_put {



return $self->send_response(403) unless $self->{service}->{enable_put};

- $self->{md5_ctx} = $hd->header('Content-MD5') ? Digest::MD5->new : undef;
+ $self->{md5_ctx} = $self->{service}->{enable_md5} && $hd->header('Content-MD5') ? Digest::MD5->new : undef;

return if $self->handle_put_chunked;

@@ -381,10 +383,19 @@ sub validate_min_put_directory {
sub start_put_open {
my Perlbal::ClientHTTP $self = shift;
my ($path, $file) = @_;
+ my ($fs_path, $open_flags);

$self->{put_in_progress} = 1;
+ if ($self->{md5_ctx}) {
+ $fs_path = "$path/$file.$$." . rand . '.tmp';
+ $self->{put_final_name} = "$path/$file";
+ $open_flags = O_CREAT | O_EXCL | O_WRONLY;
+ } else {
+ $fs_path = "$path/$file";
+ $open_flags = O_CREAT | O_TRUNC | O_WRONLY;
+ }

- Perlbal::AIO::aio_open("$path/$file", O_CREAT | O_TRUNC | O_WRONLY, 0644, sub {
+ Perlbal::AIO::aio_open($fs_path, $open_flags, 0644, sub {
# get the fd
my $fh = shift;

@@ -399,6 +410,9 @@ sub start_put_open {

# should be created, call self recursively to try
return $self->start_put_open($path, $file);
+ } elsif ($! == EEXIST && $self->{put_final_name}) {
+ # temp name collision, try again
+ return $self->start_put_open($path, $file);
} else {
return $self->system_error("Internal error", "error = $!, path = $path, file = $file");
}
@@ -406,7 +420,7 @@ sub start_put_open {

$self->{put_fh} = $fh;
$self->{put_pos} = 0;
- $self->{put_fh_filename} = "$path/$file";
+ $self->{put_fh_filename} = $fs_path;

# We just opened the file, haven't read_ahead any bytes, are expecting 0 bytes for read and we're
# not in chunked mode, so close the file immediately, we're done.
@@ -469,6 +483,38 @@ sub put_writeout {
});
}

+sub put_check_md5 {
+ my Perlbal::ClientHTTP $self = shift;
+
+ my $actual = $self->{md5_ctx}->b64digest;


+ my $expect = $self->{req_headers}->header("Content-MD5");
+ $expect =~ s/=+\s*\z//;

+ if ($actual eq $expect) {
+ Perlbal::AIO::aio_rename($self->{put_fh_filename}, $self->{put_final_name}, sub {
+ my $err = shift;
+ $self->{put_fh_filename} = undef;
+ $self->{put_final_name} = undef;
+ if ($err == 0) {
+ return $self->send_response(201);
+ } else {
+ return $self->system_error("Error renaming file", "error in rename: $!");
+ }
+ });
+ } else {
+ Perlbal::AIO::aio_unlink($self->{put_fh_filename}, sub {
+ my $err = shift;
+ $self->{put_fh_filename} = undef;
+ $self->{put_final_name} = undef;
+ if ($err == 0) {


+ return $self->send_response(400,
+ "Content-MD5 mismatch, expected: $expect actual: $actual");

+ } else {
+ return $self->system_error("Error unlinking file", "error in unlink: $!");
+ }
+ });
+ }
+}
+
sub put_close {
my Perlbal::ClientHTTP $self = shift;
return if $self->{put_in_progress};
@@ -477,16 +523,7 @@ sub put_close {


if (CORE::close($self->{put_fh})) {
$self->{put_fh} = undef;

- my $md5_ctx = $self->{md5_ctx};
- if ($md5_ctx) {
- my $actual = $md5_ctx->b64digest;
- my $expect = $self->{req_headers}->header("Content-MD5");
- $expect =~ s/=+\s*\z//;
- if ($actual ne $expect) {
- return $self->send_response(400,
- "Content-MD5 mismatch, expected: $expect actual: $actual");
- }
- }
+ return $self->put_check_md5 if $self->{md5_ctx};


return $self->send_response(200);
} else {
return $self->system_error("Error saving file", "error in close: $!");

diff --git a/lib/Perlbal/Service.pm b/lib/Perlbal/Service.pm
index ad3f958..fed1e21 100644
--- a/lib/Perlbal/Service.pm
+++ b/lib/Perlbal/Service.pm
@@ -31,6 +31,7 @@ use fields (
'index_files', # arrayref of filenames to try for index files
'enable_concatenate_get', # bool: if user can request concatenated files
'enable_put', # bool: whether PUT is supported
+ 'enable_md5', # bool: whether Content-MD5 is supported on PUT
'max_put_size', # int: max size in bytes of a put file
'max_chunked_request_size', # int: max size in bytes of a chunked request (to be written to disk first)
'min_put_directory', # int: number of directories required to exist at beginning of URIs in put
@@ -267,6 +268,13 @@ our $tunables = {
check_type => "bool",
},

+ 'enable_md5' => {
+ des => "Enable verification of the Content-MD5 header in HTTP PUT requests",
+ default => 1,
+ check_role => "web_server",
+ check_type => "bool",
+ },
+
'enable_delete' => {
des => "Enable HTTP DELETE requests.",
default => 0,
diff --git a/t/20-put.t b/t/20-put.t
index 6171fb3..e3c1060 100644
--- a/t/20-put.t
+++ b/t/20-put.t
@@ -127,10 +127,24 @@ ok(! put_file(), "put disabled");


ok(manage("SET test.enable_delete = 0"));
ok(! delete_file(), "delete disabled");

ok(manage("SET test.enable_put = 1"));

+ok(manage("SET test.enable_md5 = 1"));
ok(put_file(), "put re-enabled");

# Content-MD5 checking


ok(put_file(content => "!", headers => content_md5('!')), "Content-MD5 OK");

+verify_put();


ok(! put_file(content => "?", headers => content_md5('!')), "Content-MD5 rejected");

+ok(filecontent($disk_file) ne $content, "verified put failure");
+{
+ my @list = (<$disk_file*>);
+ ok(scalar(@list) == 1 && $list[0] eq $disk_file, "no temporary file leftover");
+}
+
+$content = "!";
+verify_put();
+
+ok(manage("SET test.enable_md5 = 0"), "disable MD5 verification");
+ok(put_file(content => "?", headers => content_md5('!')), "Content-MD5 NOT rejected");
+verify_put();

1;
--
Eric Wong

Jonathan Steinert

unread,
Feb 3, 2012, 5:40:21 PM2/3/12
to per...@googlegroups.com
Thanks much for this work Eric.

I've applied your commits to perlbal master, next release will have
them. I did make a couple small changes to it myself. I hope they don't
change the behavior too much for your clients, but I feel they are safer
choices.

* Temp file names use an integer rather than a float
* If we have a temp file collision, we bail the request rather than
trying again

I figure the chances of having a temp file collision are so low in the
first place that if it does happen we may be in some strange situation
where rand() is broken or something. We should bail rather than
fast-looping for safety.

While applying this work I also discovered that we don't bail from
incomplete requests correctly and fixed that.

--hachi

> + } elsif ($! == EEXIST&& $self->{put_final_name}) {

> + ok(scalar(@list) == 1&& $list[0] eq $disk_file, "no temporary file leftover");

Eric Wong

unread,
Feb 4, 2012, 12:20:59 AM2/4/12
to per...@googlegroups.com
Jonathan Steinert <ha...@kuiki.net> wrote:
> Thanks much for this work Eric.

No problem, thanks for the review and applying!

> I've applied your commits to perlbal master, next release will have
> them. I did make a couple small changes to it myself. I hope they
> don't change the behavior too much for your clients, but I feel they
> are safer choices.

> * Temp file names use an integer rather than a float
> * If we have a temp file collision, we bail the request rather than
> trying again

No objections here.

> I figure the chances of having a temp file collision are so low in
> the first place that if it does happen we may be in some strange
> situation where rand() is broken or something. We should bail rather
> than fast-looping for safety.

I think you need to remove my original

return $self->start_put_open($path, $file);

line in that error path, though.

I wasn't even going to bother retrying, either, but I saw the ENOENT
error path retrying and figured it might be worth a try.

Jonathan Steinert

unread,
Feb 4, 2012, 12:48:47 AM2/4/12
to per...@googlegroups.com
Oh jeeze, and here (I thought) I was doing so well :)

Thanks for catching that.

Reply all
Reply to author
Forward
0 new messages