Chunked upload/download from AWS S3

Sal

30.05.2016, 10:49:21
to Haskell Pipes
Hello,

I am planning to use pipes-http for AWS S3 put/get operations (involving big binary objects). I noticed that the documentation for the pipes-http `stream` API says the server must support chunked encoding. So I looked up the AWS documentation, which says that S3 has its own way of doing chunking (basically, adding a signature to every chunk).
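For reference, here is a minimal sketch of the per-chunk framing as I understand it from the AWS Signature Version 4 streaming-upload documentation: each chunk is sent as the payload size in hex, then `;chunk-signature=` and the signature, a CRLF, the payload, and another CRLF, with a zero-length chunk closing the body. The names `frameChunk` and `finalChunk` are made up for illustration, and the signature itself (a chained HMAC-SHA256) is assumed to be computed elsewhere and passed in.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString       as BS
import qualified Data.ByteString.Char8 as BC
import           Numeric               (showHex)

-- | Hex-encoded chunk signature. Computing it is out of scope for this
-- sketch, so the caller supplies it (assumption).
type ChunkSignature = BS.ByteString

-- | Wrap one payload chunk in the framing:
--   <size in hex>;chunk-signature=<signature>\r\n<payload>\r\n
frameChunk :: ChunkSignature -> BS.ByteString -> BS.ByteString
frameChunk sig payload = BS.concat
  [ BC.pack (showHex (BS.length payload) "")
  , ";chunk-signature="
  , sig
  , "\r\n"
  , payload
  , "\r\n"
  ]

-- | The body ends with a zero-length chunk, which still carries a signature.
finalChunk :: ChunkSignature -> BS.ByteString
finalChunk sig = frameChunk sig BS.empty

main :: IO ()
main = do
  let sig = BC.replicate 64 '0'  -- placeholder, NOT a valid signature
  BC.putStr (frameChunk sig "hello")
  BC.putStr (finalChunk sig)
```

A real upload would also need the seed signature in the `Authorization` header and the decoded content length, both of which this sketch leaves out.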

I also checked the `aws` and `amazonka-s3` packages. It seems to me that they are not compatible with pipes-http because they use conduit. Please correct me if I got this wrong. So it seems I must write my own HTTP request/response handling using `pipes` for AWS S3 operations, and must write custom chunking.

If anyone has already done this and could share tips, that would be very helpful.

Thanks.

Sal

05.06.2016, 11:14:51
to Haskell Pipes
Not sure why Ben's post isn't visible in this group yet though it was sent to the mailing list - here is what he wrote:

-----------
Have a look at my recently-uploaded pipes-s3 package [1].

Cheers,

- Ben

[1] https://hackage.haskell.org/package/pipes-s3
-----------

This looks very useful. One question though: shouldn't the HTTP manager be created only once, instead of being recreated for every request in the `fromS3'` request wrapper? Here is my code using Aws.S3 with conduit. Should we take a similar approach, but with the pipes-s3 APIs?

{-# LANGUAGE OverloadedStrings #-}

import qualified Aws
import qualified Aws.Core as Aws
import qualified Aws.S3 as S3
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sourceFile)
import qualified Data.Conduit.List as CL (mapM_)
import           Network.HTTP.Conduit (responseBody,requestBodySource,newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS
import Control.Monad.IO.Class
import System.IO
import Control.Monad.Trans.Resource (runResourceT)
import Control.Concurrent.Async (async,waitCatch)
import Control.Exception (displayException)
import Data.Text as T (pack)
import Data.List (lookup)

main :: IO ()
main = do
  {- Set up AWS credentials and S3 configuration using the IA endpoint. -}
  Just creds <- Aws.loadCredentialsFromEnv
  let cfg = Aws.Configuration Aws.Timestamp creds (Aws.defaultLog Aws.Error)
  let s3cfg = S3.s3 Aws.HTTP S3.s3EndpointUsClassic False

  {- Set up a ResourceT region with an available HTTP manager. -}
  httpmgr <- newManager tlsManagerSettings
  let file = "out" -- can create a 100MB test file like this on linux: dd if=/dev/urandom of=out bs=100M count=1 iflag=fullblock
  let inbytes = sourceFile file
  lenb <- System.IO.withFile file ReadMode hFileSize
  req <- async $ runResourceT $ do
    Aws.pureAws cfg s3cfg httpmgr $
      (S3.putObject "put-your-test-bucket-here" ("testbucket/test") (requestBodySource (fromIntegral lenb) inbytes))
        { S3.poMetadata = [("content-type","text;charset=UTF-8"),("content-length",T.pack $ show lenb)]
          -- Automatically creates bucket on IA if it does not exist,
          -- and uses the above metadata as the bucket's metadata.
        , S3.poAutoMakeBucket = True
        }
  reqRes <- waitCatch req
  case reqRes of
    Left e -> print $ displayException $ e
    Right r -> print $ S3.porVersionId r


Ben Gamari

05.06.2016, 14:55:39
to Sal, Haskell Pipes
Sal <sanket....@gmail.com> writes:

> Not sure why Ben's post isn't visible in this group yet though it was sent
> to the mailing list - here is what he wrote:
>
Ahh, indeed it looks like I sent it from the wrong email address.

> -----------
> Have a look at my recently-uploaded pipes-s3 package [1].
>
> Cheers,
>
> - Ben
>
> [1] https://hackage.haskell.org/package/pipes-s3
> -----------
>
> This looks very useful. One question though - shouldn't HTTP manager be
> created only once, instead of being recreated for every request in
> `fromS3'` request wrapper?
>
Hmmm, perhaps, although in my previous use-cases the objects being
read/written were rather large, so the cost of bringing up a new HTTP
manager was relatively small.

If it would help I could expose another variant of the interface
allowing one to provide a Manager to use. For instance,

    fromS3' :: MonadSafe m
            => Manager -> Aws.Configuration -> Bucket -> Object
            -> (Response (Producer BS.ByteString m ()) -> Producer BS.ByteString m a)
            -> Producer BS.ByteString m a

My only hesitation in doing so is that this is a rather type-unsafe
interface since AWS requires TLS yet there is nothing in the type to
suggest this.

Cheers,

- Ben

Sal

05.06.2016, 15:48:20
to Haskell Pipes, sanket....@gmail.com


> If it would help I could expose another variant of the interface
> allowing one to provide a Manager to use. For instance,
>
>     fromS3' :: MonadSafe m
>             => Manager -> Aws.Configuration -> Bucket -> Object
>             -> (Response (Producer BS.ByteString m ()) -> Producer BS.ByteString m a)
>             -> Producer BS.ByteString m a
>
> My only hesitation in doing so is that this is a rather type-unsafe
> interface since AWS requires TLS yet there is nothing in the type to
> suggest this.

Ben, understood. Perhaps a warning about the TLS requirement can be added to the API documentation. That way we can have one long-lived HTTP manager instead of creating a new one every time, which matters especially for short requests.
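As an alternative to a documentation-only warning, the TLS requirement could be carried in the type. The following is only a sketch under assumptions, not the pipes-s3 API: `TlsManager`, `newTlsManager`, `getManager` and `fetchSize` are hypothetical names, and the URLs are placeholders. It uses plain http-client and http-client-tls to show a manager that is created once, can only be built from `tlsManagerSettings`, and is shared across requests.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy    as LBS
import           Network.HTTP.Client     (Manager, httpLbs, newManager,
                                          parseRequest, responseBody)
import           Network.HTTP.Client.TLS (tlsManagerSettings)

-- | Hypothetical wrapper marking a Manager as TLS-capable. In a library the
-- constructor would stay unexported so 'newTlsManager' is the only way in.
newtype TlsManager = TlsManager Manager

-- | Build the manager from 'tlsManagerSettings'; meant to be called once.
newTlsManager :: IO TlsManager
newTlsManager = TlsManager <$> newManager tlsManagerSettings

-- | Unwrap for functions that expect a plain 'Manager'.
getManager :: TlsManager -> Manager
getManager (TlsManager m) = m

-- | Fetch a URL with the shared manager and return the body size in bytes.
fetchSize :: TlsManager -> String -> IO Int
fetchSize mgr url = do
  req  <- parseRequest url
  resp <- httpLbs req (getManager mgr)
  pure (fromIntegral (LBS.length (responseBody resp)))

main :: IO ()
main = do
  mgr   <- newTlsManager  -- created once, reused for every request below
  sizes <- mapM (fetchSize mgr) ["https://example.com/", "https://example.org/"]
  print sizes
```

A `fromS3'` variant could then take a `TlsManager` instead of a bare `Manager`, so the requirement shows up in the signature at the cost of one extra type for callers to learn.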

Ben Gamari

06.06.2016, 05:01:31
to Sal, Haskell Pipes, sanket....@gmail.com
Indeed, this sounds reasonable. How does this [1] look?

Cheers,

- Ben


[1] https://github.com/bgamari/pipes-s3/commit/598cb0ea1c43b8a11f423e849af047756296c723

Sal

06.06.2016, 13:01:46
to Haskell Pipes, sanket....@gmail.com
Looks good from what I eyeballed. I am going to try it out. I might also try to adapt the streaming package by re-using your code for AWS request signing. The streaming package looks like it has a very clean API, so I am checking it out as well.