Problem with streaming...


Joseph Fridy

May 4, 2020, 03:05:11
To: Mojolicious
I am attempting to stream the output of a curl command with Mojolicious.  The curl command looks (roughly) like this:

curl -H "x-amz-server-side-encryption-customer-algorithm:AES256" \
     -H "x-amz-server-side-encryption-customer-key:secretKey=" \
     -H "x-amz-server-side-encryption-customer-key-MD5:HashOfASecret==" \
     "https://mybucket.s3.amazonaws.com/obscureFileLocation?AWSAccessKeyId=secretStuff&Expires=1588568911&Signature=moreSecrets" \
     --dump-header header.461 --silent


The curl command references a presigned URL for a file stored in AWS S3 with Server-Side Encryption with Customer-Provided Keys (SSE-C), and supplies the necessary key material via HTTP headers (the -H options). The curl command works, but I don't want to require my users to have curl installed to access their files. The plan is to open the curl command as input to a pipe and stream its output to the user's browser with Mojolicious. The curl command also dumps the HTTP headers from Amazon, so they can be reused by Mojolicious. They look like this:


x-amz-id-2: sgMzHD2FJEGJrcbvzQwdhZK6mxUW+ePd6xdghTfgSlV45lMhliIw4prfk4cZMTHbS4fJN8N7xio=
x-amz-request-id: 99B9CA56083DD9ED
Date: Mon, 04 May 2020 04:57:22 GMT
Last-Modified: Sat, 02 May 2020 03:47:35 GMT
ETag: "b3a11409be2705e4581119fa59af79d3-1025"
x-amz-server-side-encryption-customer-algorithm: AES256
x-amz-server-side-encryption-customer-key-MD5: HashOfSecretKey==
Content-Disposition: attachment; filename = "fiveGigFile"
Accept-Ranges: bytes
Content-Type: application/octet-stream; charset=UTF-8
Content-Length: 5368709125
Server: AmazonS3


Note that the file is 5 GB.



This is my stab at streaming with Mojolicious:


use strict;
use Mojolicious::Lite;
use FileHandle;
use Digest::MD5;

any '/' => sub {
  my $c = shift;
  $c->render(template => "test");
};

any 'pickup' => sub {
  my $c = shift;
  my $nGigs = 0;
  my $nMegs = 0;
  $| = 1;
  open(CURLCMD, "curlCmd");
  my $curlCmd = <CURLCMD>;
  if ($curlCmd =~ /dump-header\s*(\S+)\s+/) {
    my $headerFile = $1;
    open(my $curl, "$curlCmd |");
    binmode $curl;
    my $initialized = 0;
    my $digester = Digest::MD5->new;
    my $transferLength = 0;
    my $drain;
    $drain = sub {
      my $c = shift;
      my $chunk;
      sysread($curl, $chunk, 1024*1024);
      if (!$initialized) {
        # read the headers, and set up the transfer...
        open(HEADERS, $headerFile);
        while (my $line = <HEADERS>) {
          $c->res->headers->parse($line);
        }
        close(HEADERS);
        $initialized = 1;
        print "header initialization completed for the following headers\n";
        print join("\n", @{$c->res->headers->names}), "\n";
      }
      if ($initialized) {
        while (length($chunk)) {
          $digester->add($chunk);
          $transferLength += length($chunk);
          $c->write($chunk, $drain);
          my $currentMegs = int($transferLength/(1024*1024));
          if (($currentMegs > $nMegs) && ($currentMegs < 1024)) {
            print "TransferLength: $transferLength\n";
            $nMegs = $currentMegs;
          }
          my $currentGigs = int($transferLength/(1024*1024*1024));
          if ($currentGigs > $nGigs) {
            print "TransferLength: $transferLength\n";
            $nGigs = $currentGigs;
          }
        }
        if (length($chunk) <= 1) {
          if ($chunk == 0) {
            print "End of file found on curl pipe.";
            print "$transferLength bytes transmitted\n";
            print "with an MD5 hash of ", $digester->hexdigest, "\n";
            $drain = undef;
          }
          if (!defined $chunk) {
            print "Transfer error encountered on curl pipe.\n";
            print "Error:", $!, "\n";
            $drain = undef;
          }
        }
      }
    };
    $c->$drain;
  }
};

app->start;

__DATA__

@@ test.html.ep
<!DOCTYPE html>
<html>
<body>
<a href="/pickup">Test of curl streaming...</a>
</body>
</html>



When I ran this the first time, it read about 606 MB of data and the server crashed with an "Out of memory!". Subsequent runs failed at about 139 MB, with a server crash and no "Out of memory!" message.


Obviously, I am an idiot.  Some guidance in the precise way I am being an idiot would be greatly appreciated.


Regards,


Joe Fridy

Dan Book

May 4, 2020, 04:01:10
To: mojol...@googlegroups.com
So there is a lot here, and there are definitely easier ways to do it, but just to start: Mojo::UserAgent can handle the streaming response in a more async manner, so you may want to begin by replacing the shell-out to curl with that. Then see https://metacpan.org/pod/Mojolicious::Guides::Cookbook#Streaming-response for a way to deal with the response as it is received, and https://metacpan.org/pod/Mojolicious::Guides::Rendering#Streaming for how to write it to the response.
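[Editor's sketch] A minimal illustration of the approach Dan describes, assuming a Mojolicious::Lite route; the route name, placeholder URL, and omission of the SSE-C headers are my assumptions, not from the thread:

```perl
use strict;
use warnings;
use Mojolicious::Lite;

get '/pickup' => sub {
  my $c = shift;

  # Hypothetical presigned S3 URL; substitute the real one, and add
  # the SSE-C request headers to $tx->req->headers as needed
  my $url = 'https://mybucket.s3.amazonaws.com/obscureFileLocation';

  # max_response_size(0) lifts the default limit on response size
  my $ua = $c->ua->max_response_size(0);
  my $tx = $ua->build_tx(GET => $url);

  # Forward each chunk to the browser as it arrives instead of
  # letting Mojo::UserAgent accumulate the whole body in memory
  $tx->res->content->unsubscribe('read')->on(read => sub {
    my ($content, $bytes) = @_;
    $c->write($bytes);
  });

  # A non-blocking start keeps the event loop free to flush writes;
  # a blocking $ua->start($tx) would only fill the buffer
  $ua->start($tx => sub { $c->finish });

  # Tell Mojolicious the response will be generated asynchronously
  $c->render_later;
};

app->start;
```

This sketch does not add backpressure (pausing the upstream read while the client drains); the Cookbook's streaming examples cover that more carefully.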

-Dan

--
You received this message because you are subscribed to the Google Groups "Mojolicious" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mojolicious...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mojolicious/86c878ad-8bd2-4a40-94fd-cc16fbe39201%40googlegroups.com.

Sebastian Riedel

May 4, 2020, 05:13:59
To: Mojolicious
And if you want to stream directly to the browser that made the original request, there are also proxy helpers that do all the streaming stuff for you.


--
sebastian

Joseph Fridy

May 4, 2020, 23:57:56
To: mojol...@googlegroups.com
I had previously attempted to use Mojo::UserAgent, but failed to get it to work. I resuscitated my prior efforts and demonstrated that I still can't seem to get the Amazon HTTP headers correct.


Here is a curl command that works:

curl -H "x-amz-server-side-encryption-customer-algorithm:AES256" \
     -H "x-amz-server-side-encryption-customer-key:NGU5MWI3MjQ5NzQ5ZmU4NmEyZWVmMGY0MjQxZmE4YTc=" \
     -H "x-amz-server-side-encryption-customer-key-MD5:Hwd4VOQrBAyysThObrXkzg==" \
     "https://secure-transmit-vault.s3.amazonaws.com/ce72c6136b066c0f409354b1c861f1e2fcf716ec4063bd4490140cdf7a11d14e/72d38e96ac72a0b8924cb9905e991a8505135bebed2c72f8ed77300ac0535d48?AWSAccessKeyId=AKIAJSU3EBCVFZQ7KAEA&Expires=1588602165&Signature=IEfU7sPLv9UJ4L8eSaMA7GXjBNc%3D" \
     --dump-header header.5121


Here is my attempt to reproduce it with Mojo::UserAgent:

_______________________________________________________
use strict;
use Mojo::UserAgent;

my $agent = Mojo::UserAgent->new(max_response_size => 0);

# get the presigned URL...
open(URL, "presignedURL");
my $url = <URL>;
chomp($url);
close(URL);

my $tx = $agent->build_tx(GET => "$url");

# get the required headers...
open(HEADERS, "headers");
while (my $line = <HEADERS>) {
  chomp($line);
  if ($line =~ /^\s*(\S+)\s*:\s*(\S.*\S).*$/) {
    my $headerName = $1;
    my $headerValue = $2;
    $tx->req->headers->header($headerName => $headerValue);
    print "Req: Header: ", $headerName, " Value: ", $tx->req->headers->header($headerName), "\n";
    $tx->res->headers->header($headerName => $headerValue);
    print "Res: Header: ", $headerName, " Value: ", $tx->res->headers->header($headerName), "\n";
  }
}
close(HEADERS);

#print join("\n",@{$tx->req->headers->names}),"\n";

$tx->res->content->unsubscribe('read')->on(read => sub {
  my ($content, $bytes) = @_;
  print $bytes;
});

$tx = $agent->start($tx);
__________________________________________________________


The headers look like this:


x-amz-server-side-encryption-customer-algorithm:AES256
x-amz-server-side-encryption-customer-key:NGU5MWI3MjQ5NzQ5ZmU4NmEyZWVmMGY0MjQxZmE4YTc=
x-amz-server-side-encryption-customer-key-MD5:Hwd4VOQrBAyysThObrXkzg==
Content-Disposition: attachment; filename = "fiveGigFile"
Content-Type: application/octet-stream; charset=UTF-8
Content-Length: 5368709125



And the presignedURL looks like this:


https://secure-transmit-vault.s3.amazonaws.com/ce72c6136b066c0f409354b1c861f1e2fcf716ec4063bd4490140cdf7a11d14e/72d38e96ac72a0b8924cb9905e991a8505135bebed2c72f8ed77300ac0535d48?AWSAccessKeyId=AKIAJSU3EBCVFZQ7KAEA&Expires=1588602165&Signature=IEfU7sPLv9UJ4L8eSaMA7GXjBNc%3D


How do I set the headers? Also, how do I set the headers with the proxy->get_p helper?


Thanks for your help and attention.


Regards,


Joe Fridy


Joseph Fridy

May 10, 2020, 03:27:42
To: Mojolicious
I reproduced the curl command with Mojo::UserAgent successfully and fixed my confusion with the headers. Now I just have to stream the results to the browser. Here is the heart of my initial attempt, where $globalC points to the controller that should send data to the browser. This reads 688 MB of data, without apparently writing any of it to the browser, and then fails with an "Out of memory!"

$tx->res->content->unsubscribe('read')->on(read => sub {
  my ($content, $bytes) = @_;
  our $globalC;
  our $digester;
  our $transferLength;
  our $contentLength;
  if (!$content->headers->header('headersWritten')) {
    foreach my $name (@{$content->headers->names}) {
      my $value = $content->headers->header($name);
      # $globalC should receive the streamed file.
      # This is setting the headers from the S3 presigned URL.
      $globalC->res->headers->header($name => $value);
      if ($name =~ /[cC]ontent-[lL]ength/) {
        $contentLength = $value;
      }
    }
    $content->headers->header('headersWritten' => 1);
    $globalC->write;
  }
  $digester->add($bytes);
  $transferLength += length($bytes);
  $globalC->write($bytes);
  print "$transferLength bytes written...\n";
});


I also attempted to use the $c->proxy->get_p helper in the following manner:

$c->proxy->get_p($url => $headers)->catch(sub {
  my $err = shift;
  print "Proxy error is $err\n";
  $c->render(text => "Error: $err\n");
});



This appears to restart three or four times before failing with a "Connection refused". The headers are the same ones that work correctly in the download with Mojo::UserAgent.

Any guidance? I assure you I have assiduously read the references you have mentioned heretofore.

Regards,

Joe Fridy

Sebastian Riedel

May 10, 2020, 06:53:42
To: Mojolicious
Please don't use HTML for formatting, the code is completely unreadable.

--
sebastian

Joseph Fridy

May 10, 2020, 15:26:55
To: Mojolicious
My sincere apologies.  The two scraps are:

$tx->res->content->unsubscribe('read')->on(read => sub {
  my ($content, $bytes) = @_;
  our $globalC;
  our $digester;
  our $transferLength;
  our $contentLength;
  if (!$content->headers->header('headersWritten')) {
    foreach my $name (@{$content->headers->names}) {
      my $value = $content->headers->header($name);
      # $globalC should receive the streamed file.
      # This is setting the headers from the S3 presigned URL.
      $globalC->res->headers->header($name => $value);
      if ($name =~ /[cC]ontent-[lL]ength/) {
        $contentLength = $value;
      }
    }
    $content->headers->header('headersWritten' => 1);
    $globalC->write;
  }
  $digester->add($bytes);
  $transferLength += length($bytes);
  $globalC->write($bytes);
  print "$transferLength bytes written...\n";
});


which fails after reading 668 MB,


and 


$c->proxy->get_p($url => $headers)->catch(sub {
  my $err = shift;
  print "Proxy error is $err\n";
  $c->render(text => "Error: $err\n");
});


which fails after trying three or four times with a "Connection refused".

Thanks,

Joe Fridy

Joseph Fridy

May 15, 2020, 01:13:41
To: Mojolicious
I have Mojo::UserAgent working to get the file from AWS S3 (including all the crypto foo), but I cannot stream its output from a controller attached to a Mojolicious::Lite route.  I am running into the same problem as is discussed in this thread from 2018:


$ua->start($tx) is a blocking call, and as a result my chunks merely append to a buffer with nowhere to go until memory fills up.  Unfortunately, if the way to solve this problem is answered in the responses to the thread above, said answer is too subtle for me to suss out.

So, to restate my problem in words:

given a Mojo::UserAgent transaction that can read a very large file, how can I stream its output, buffer by buffer, into a browser via a Mojolicious::Lite route?

Regards,

Joe Fridy


Dan Book

May 15, 2020, 04:27:06
To: mojol...@googlegroups.com
Take a look at the proxy helpers added recently; at the least they should provide inspiration: https://metacpan.org/pod/Mojolicious::Plugin::DefaultHelpers#proxy-%3Eget_p
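[Editor's sketch] A minimal illustration of that helper in a Mojolicious::Lite route; the URL, the SSE-C header values, and the 502 status are placeholders of mine, not from the documentation:

```perl
use strict;
use warnings;
use Mojolicious::Lite;

get '/pickup' => sub {
  my $c = shift;

  # Placeholder presigned URL and SSE-C headers; substitute real values
  my $url     = 'https://mybucket.s3.amazonaws.com/obscureFileLocation';
  my $headers = {
    'x-amz-server-side-encryption-customer-algorithm' => 'AES256',
    'x-amz-server-side-encryption-customer-key'       => 'secretKey=',
    'x-amz-server-side-encryption-customer-key-MD5'   => 'HashOfASecret==',
  };

  # proxy->get_p performs the upstream GET and streams the response
  # body to this client, handling the write/drain cycle internally
  $c->proxy->get_p($url => $headers)->catch(sub {
    my $err = shift;
    $c->log->error("Proxy error: $err");
    $c->render(text => "Error: $err", status => 502);
  });
};

app->start;
```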

-Dan
