S3 storage upload from alternate S3 location

cmc...@gmail.com

Aug 22, 2018, 11:35:46 PM
to Shrine
I have a fully working setup for uploading local files to s3 storage. I also want another mechanism for clients to upload files to a generic bucket that is scanned regularly and then those files picked up and processed via normal mechanism.

From the docs it looks like the only option is to download the file to a temp file and then reattach it to the model correctly. Is there another way to 'move' files from an S3 location (the auto-upload location) so they are read into the correct store on the model?

Thanks,
Craig

Janko Marohnić

Aug 25, 2018, 1:17:10 AM
to cmc...@gmail.com, ruby-shrine
Hello Craig,

Shrine's direct upload flow works in a way that the client uploads their file to a "temporary" S3 location (it can be a temporary bucket, or a directory inside the same bucket), and then when you assign that cached file, Shrine will detect that the cached file is from S3 storage, and simply copy it to the permanent location (without downloading). You can put the copying into the background job by loading the backgrounding plugin.
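
For reference, here is a minimal sketch of that flow, assuming an s3_options hash of credentials like the one in a typical shrine.rb, and with illustrative model/attribute names (not taken from your app). Since both storages live on S3, promotion is an S3 copy rather than a download and re-upload:

  Shrine.storages = {
    cache: Shrine::Storage::S3.new(prefix: "cache", **s3_options), # "temporary" location
    store: Shrine::Storage::S3.new(prefix: "store", **s3_options)  # permanent location
  }

  record = Import.new
  record.file = params[:file_data] # cached file JSON returned by the direct upload
  record.save                      # promotion copies the S3 object from cache/ to store/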

Does that differ somehow from what you're trying to do?

Kind regards,
Janko


cmc...@gmail.com

Sep 7, 2018, 2:04:43 AM
to Shrine
Hi Janko,

I see, makes sense - glad the tool is this clever! Can I do a move or a copy/delete in the same way? I want to remove the files that were uploaded after they have been saved to the permanent location.

For clarity the process works like this:
  • Clients have a PowerShell script running on their local desktop/server that uploads files from a local fileserver to a shared S3 bucket
  • On the server there is a scheduled task that runs every 10 minutes or so and saves files against a specific import process - the code looks like the below (note - I'm not sure this is the best way to do this, but it works!)

    uploader = CsvUploader.new(:store)
    import.file = uploader.upload(Down.open(import_file)) # import_file is the S3 object.key

The CsvUploader is very simple, so I assume I need to add some processing here to do the move/copy?

class CsvUploader < Shrine
  plugin :direct_upload, max_size: 5 * 1024 * 1024 # 5 MB
end

I am using Shrine successfully for image processing; my shrine.rb initializer looks like this:

require 'shrine'
require 'shrine/storage/s3'

s3_options = {
  access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region:            ENV['AWS_REGION'],
  bucket:            ENV['AWS_BUCKET']
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: 'cache', **s3_options),
  store: Shrine::Storage::S3.new(prefix: 'store', **s3_options)
}

Shrine.plugin :activerecord
Shrine.plugin :cached_attachment_data # for forms
Shrine.plugin :logging
Shrine.plugin :backgrounding
Shrine::Attacher.promote { |data| Delayed::Job.enqueue PromoteJob.new(data) }
Shrine::Attacher.delete { |data| Delayed::Job.enqueue DeleteJob.new(data) }

Appreciate any insights,
Craig

Janko Marohnić

Sep 7, 2018, 4:43:55 PM
to Craig McGuff, ruby-shrine
Hi Craig,

Can you describe to me the exact pipeline you want to achieve, from initial upload to end destination? Is it something like this?
  1. original file gets uploaded to temporary storage directly on the client side
  2. cached S3 file is copied to permanent S3 storage in a background job
  3. stored S3 file is copied to a shared S3 bucket in the powershell script
I'm not sure that's correct, because you said the PowerShell script uploads files from a local fileserver to S3 storage, and I don't see any local storage here; both of your Shrine storages are S3.

When I know exactly what you want to happen, I should be able to help you achieve it with optimal performance, as Shrine is quite flexible in this regard. The code snippet you posted above will work, but it does extra uploads and downloads that are not necessary.

Can I do a move or a copy/delete in the same way? I want to remove the files that are uploaded after they have been saved to permanent location.

The delete_promoted plugin should be able to achieve this, but since I have a feeling the solution will be custom, you'll most likely be better off just deleting the uploaded file manually. S3 doesn't have a "move" command, so copy + delete is the way to go.
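
If you do want the plugin route, it would look roughly like this (a sketch, assuming the delete_promoted plugin that ships with Shrine 2.x), which deletes the assigned cached file once it has been promoted to :store:

  class CsvUploader < Shrine
    plugin :delete_promoted # delete the cached/assigned file after promotion to :store
  end

For a custom flow like yours, though, an explicit delete call after the copy succeeds is usually easier to reason about.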

Kind regards,
Janko

Janko Marohnić

Sep 9, 2018, 5:02:13 AM
to Craig McGuff, ruby-shrine
Hi Craig,

You forgot to "Reply to All", so I'll just copy your previous email:

Yeah, not obvious I guess :) There are two separate flows that were using Paperclip that I am trying to replace:
  • Flow #1: Web user using the website attaches a CSV file directly to an Import model
    • The Import model has that CsvUploader attribute
    • Uses the cache/store approach and seems to be working as expected
  • Flow #2: Background task attaches CSV files to an Import model (per previous email)
    • In this case the files are uploaded to another S3 bucket completely - let's call it Upload
    • A rake task runs every 10 minutes looking for files in Upload and, if found, creates an Import and attaches the CSV to it
    • I'm not sure at this point how the cache/store method works or how I should address it?

Thanks for clarifying. Ok, so it seems that in Flow #2 you don't need to go through cache storage, as attaching is done in the background. Shrine has a way to upload files directly to permanent storage and assign them, as shown below.

Is the other S3 bucket static and known to your application? Because if it is, you can copy the file from that bucket directly to your permanent storage (S3 supports cross-bucket copies). I'll illustrate with this example:

  Shrine.storages = {
    cache:       Shrine::Storage::S3.new(bucket: "my-bucket", ...),
    store:       Shrine::Storage::S3.new(bucket: "my-bucket", ...),
    other_store: Shrine::Storage::S3.new(bucket: "other-bucket", ...), # the shared upload bucket
  }

  Shrine.plugin :refresh_metadata

  uploaded_file = CsvUploader.uploaded_file("id" => object_key, "storage" => "other_store")
  uploaded_file.refresh_metadata! # retrieves metadata from the S3 object (recommended)

  store       = CsvUploader.new(:store)
  stored_file = store.upload(uploaded_file) # S3 object is copied from :other_store to :store

  import = Import.new
  import.file_attacher.set(stored_file) # stored file is assigned without any re-uploading
  import.save

If the other S3 bucket is dynamic or unknown to the application, and therefore you would rather treat object URLs as generic remote URLs, you can use `Down.open` like you already suggested; here is an example:

  Shrine.storages = {
    cache: Shrine::Storage::S3.new(bucket: "my-bucket", ...),
    store: Shrine::Storage::S3.new(bucket: "my-bucket", ...),
  }

  remote_file = Down.open(import_url)

  store       = CsvUploader.new(:store)
  stored_file = store.upload(remote_file) # downloads and uploads the remote file to :store

  import = Import.new
  import.file_attacher.set(stored_file) # assigns the stored file without any re-uploading
  import.save

Hope that helps :)

Kind regards,
Janko

Craig McGuff

Sep 11, 2018, 3:55:57 AM
to Janko Marohnić, ruby-shrine
Hi Janko,

Thanks - it's definitely the first option; the S3 bucket is static and known. I am getting an error trying to implement this, failing on this line:

    uploaded_file = CsvUploader.uploaded_file("id" => object_key, "storage" => "other_store")

I am using object.key for the id, but I am not clear what to reference for the storage option.
If I run it in the console I get the following:

    uploaded_file = CsvUploader.uploaded_file('id' => object.key)
    Shrine::Error: {"id"=>"SAMPLE-USERS-09032016.CSV"} isn't valid uploaded file data

Thanks,
Craig

Craig McGuff

Sep 11, 2018, 4:06:02 AM
to Janko Marohnić, ruby-shrine
Hi Janko, actually I think I figured it out:

I have added an extra storage to my shrine.rb to reference the upload location like this:

require 'shrine'
require 'shrine/storage/s3'

s3_options = {
  access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region:            ENV['AWS_REGION'],
  bucket:            ENV['AWS_BUCKET']
}

upload_s3_options = {
  access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region:            ENV['AWS_REGION'],
  bucket:            ENV['UPLOAD_BUCKET']
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: 'cache', **s3_options),
  store: Shrine::Storage::S3.new(prefix: 'store', **s3_options),
  upload_store: Shrine::Storage::S3.new(**upload_s3_options)
}

Shrine.plugin :activerecord
Shrine.plugin :cached_attachment_data # for forms
Shrine.plugin :logging
Shrine.plugin :backgrounding
Shrine.plugin :refresh_metadata


Shrine::Attacher.promote { |data| Delayed::Job.enqueue PromoteJob.new(data) }
Shrine::Attacher.delete { |data| Delayed::Job.enqueue DeleteJob.new(data) }

and now in the background task, I reference the following:

    uploaded_file = CsvUploader.uploaded_file('id' => import_file, 'storage' => 'upload_store')

This seems to be working; I just need to remove the uploaded file after I have copied it.

Thanks,
Craig

Janko Marohnić

Sep 11, 2018, 4:26:18 AM
to Craig McGuff, ruby-shrine
just need to remove the uploaded file after I have copied it.

You can do that with `Shrine::UploadedFile#delete`:

  uploaded_file = CsvUploader.uploaded_file('id' => import_file, 'storage' => 'upload_store')
  # ...
  uploaded_file.delete
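
Put together with the earlier example, the end of your rake task could look roughly like this (a sketch; error handling omitted, and the delete only makes sense once the copy to :store has succeeded):

  uploaded_file = CsvUploader.uploaded_file('id' => import_file, 'storage' => 'upload_store')
  uploaded_file.refresh_metadata!

  stored_file = CsvUploader.new(:store).upload(uploaded_file) # S3 copy from :upload_store to :store

  import = Import.new
  import.file_attacher.set(stored_file)
  import.save

  uploaded_file.delete # remove the original from the shared upload bucket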

Kind regards,
Janko 
