Direct to S3 + Post Processing memory usage question


Sumit Gupta

unread,
Jan 22, 2016, 3:16:07 PM1/22/16
to ruby-shrine
Aloha!

Sorry in advance for what's probably going to be a very noob question. I believe it's related to something you described in this thread, so I'm going to reply here; let me know if it's not and I'm happy to move it to a top-level thread. I'm actually in the midst of moving some thumbnail generation from Paperclip to Shrine before doing the big push of seeing if Shrine can manage the process I'm about to describe.

The one big question I have is: in a "direct to S3" world, if I leave the pre/post-process hooks empty, will Shrine still attempt to load the files it's trying to attach into server/dyno memory?

This matters to me because the files can be many gigabytes, which makes my servers fall over, and other than the encoding work Zencoder is doing I don't have any need for post-processing.

Full description of what I'm going to try and solve with Shrine:

What I've currently got implemented is:
- Custom code to provide a presigned endpoint.
- A JavaScript uploader that sends directly to my S3 bucket (using EvaporateJS) and then hits one of my Rails endpoints to create the record.
- That endpoint kicks off a Zencoder job to create a bunch of encoded versions, which Zencoder puts directly back onto S3, sending a notification to my server with all the locations on the bucket.
- A bunch of code to build proper CDN URLs for all those versions.
- A bunch of code to delete the files when the Rails record is deleted.

I originally had Paperclip managing the files on S3 after they got put there, but Paperclip was automatically trying to read each file into memory and I saw no way around that. I felt forced to write my own helpers to manage the files (which felt like the dark ages of "there should be a library that can handle this for me!").

My hope is that I can use Shrine to:

- Use the built-in presigned endpoint, which looks identical to the one I wrote myself (I always love deleting custom code).
- Continue to use my custom JavaScript uploader, which uses EvaporateJS under the hood to upload directly to S3.
- Save the Rails record with the S3 URL as it does today, and have Shrine manage that file as the "original".
- Kick off the Zencoder job, which creates all the encoded versions and puts them directly back on S3.
- When Zencoder tells my Rails app about those versions, have Shrine manage all of them too.
- Get file deletion on record deletion "for free" (again, love deleting code).
- Get CDN URLs built for me "for free" with some Shrine configuration (again, love deleting code).

All without the memory overhead of Shrine automatically pulling down a multi-gigabyte file.

I saw a lot of resources that touched on a bunch of these points/goals, but nothing specifically about the memory management. My hope is also that one day I can say "hey Shrine, for this one particular version Zencoder gave me, pull it into a background job and rip the EXIF data from it", since I can point it at a known-small version... but that's not a use case I need to worry about for a while.

Sorry for the verbosity, and apologies if this was already answered and I'm just too dense to have found or understood the answer that was in front of me.

Cheers, and I'm excited about what feels like a really big step forward in the Rails file-handling world!
Sumit

Janko Marohnić

unread,
Jan 22, 2016, 9:45:52 PM1/22/16
to Sumit Gupta, ruby-shrine
Aloha! :)

I'm very happy that you want to use Shrine in your advanced upload flow; use cases like these are one of the main reasons I created Shrine, so that it's maximally performant and easily hackable.

The answer is: no, the file won't be downloaded or loaded into memory in your case :)

How Shrine works is that, if you're doing processing, you're in charge of downloading the file from cache, so if you don't have any processing code, the file won't be downloaded. Normally, on "promotion" the cached file would be downloaded anyway so that it can be uploaded to the store. However, Shrine is smarter when both cache and store are S3 (which is your case): when the cached (directly uploaded) file is supposed to be moved to the store, instead of downloading and re-uploading, Shrine simply issues an S3 "COPY" HTTP request, telling S3 to do the copying. So, after the file is directly uploaded via your JS uploader, it won't be downloaded or loaded into memory.
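For readers wondering where that optimization kicks in, here is a minimal sketch of a storage setup in which both cache and store are S3, so promotion becomes a server-side COPY (the bucket name is a placeholder and the credentials are assumed to come from the environment):

```ruby
require "shrine"
require "shrine/storage/s3"

# Both cache (direct uploads) and store live on the same S3 bucket,
# separated by key prefix, so promoting a cached file is an S3 COPY
# rather than a download + re-upload.
s3_options = {
  bucket:            "my-bucket",           # placeholder
  region:            "us-east-1",
  access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
  secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: "cache", **s3_options),
  store: Shrine::Storage::S3.new(prefix: "store", **s3_options),
}
```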

After Zencoder tells you it has finished processing, you just need to update your record with a hash of versions in JSON format, where keys are version names, and values are representations of uploaded files. Hopefully it should be easy to understand what the fields mean, but feel free to ask questions if you need something clarified.
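In case the shape of that hash is unclear, here is a hedged sketch of what the attachment data could look like after the Zencoder callback. The version names, S3 keys, and metadata values below are entirely hypothetical; only the id/storage/metadata structure follows Shrine's uploaded-file data format:

```ruby
require "json"

# Hypothetical result of a Zencoder notification: each version name maps
# to a hash in Shrine's uploaded-file data format (the key on the bucket,
# the storage name, and whatever metadata you know about the output).
versions = {
  "mp4_720p" => {
    "id"       => "videos/abc123/720p.mp4",   # hypothetical S3 key
    "storage"  => "store",
    "metadata" => { "mime_type" => "video/mp4", "size" => 104_857_600 }
  },
  "webm_720p" => {
    "id"       => "videos/abc123/720p.webm",  # hypothetical S3 key
    "storage"  => "store",
    "metadata" => { "mime_type" => "video/webm", "size" => 98_304_000 }
  }
}

# The record's attachment column would be updated with this JSON string:
json = JSON.generate(versions)
puts json
```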

Cheers,
Janko

--
You received this message because you are subscribed to the Google Groups "ruby-shrine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-shrine...@googlegroups.com.
To post to this group, send email to ruby-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ruby-shrine/1d6d45d8-b002-48a9-bf99-4eb55f8184bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sumit Gupta

unread,
Feb 5, 2016, 11:38:07 PM2/5/16
to ruby-shrine
Thank you for all your help! Sorry for the silence since your awesome response. The approach you described works great. I have a few things I hacked into the library that I'd like to get some feedback on, because I think they could be extracted into a plugin or maybe a patch to the library, but I'll gather my thoughts and clear examples for a follow-up post when I'm not under the gun to get some stuff launched.

I should probably make this another thread, but I have what's hopefully a quick question about the MiniMagick / image_processing libraries. I'm attempting to apply a watermark during the process phase with the following code:

def process(io, context)
  case context[:phase]
  when :store
    watermark = MiniMagick::Image.open("#{Rails.root}/app/assets/images/watermarks/ampersand.png")
    watermark.resize '30%'
    asset = io.download

    watermarked = with_minimagick(asset) do |image|
      image.composite(watermark) do |c|
        c.compose 'Over'
        c.gravity 'Center'
      end
    end

    { original: io, watermarked: watermarked }
  end
end


When I manually apply a watermark directly with MiniMagick on the command line, the file is saved properly, but when I do it with the code above during the processing phase I don't see the watermark. I'm hoping it's a dumb issue on my part (most likely), and I'm happy to set up a sample app with the problem highlighted, but I thought I'd first see if your well-trained eyes can spot the issue.

I'm hoping to write at least one blog post on generating watermarks, and also on the way we tied Zencoder + Shrine together for super scalable processing at super cheap cost... but I just need to get this all deployed before I can focus on that fun stuff :)

No worries at all if you're not sure either! I'm going to keep plugging away at it until I get it working this weekend, so hopefully something clicks soon.

Cheers!

Janko Marohnić

unread,
Feb 6, 2016, 9:00:34 AM2/6/16
to ruby-shrine
ImageProcessing::MiniMagick#with_minimagick currently works in a way that expects you to modify the yielded MiniMagick::Image object; it does not use the result of the block. Since MiniMagick::Image#composite is nondestructive (it returns a new MiniMagick::Image instead of modifying the receiver), the yielded image object remains unchanged.

This is probably a bit unfortunate; I will look into changing it in the near future, as it would probably make sense to use the result of the block. For now you can just use MiniMagick directly:

def process(io, context)
  case context[:phase]
  when :store
    watermark = MiniMagick::Image.open("#{Rails.root}/app/assets/images/watermarks/ampersand.png")
    watermark.resize '30%'

    tempfile = io.download
    asset = MiniMagick::Image.new(tempfile.path, tempfile)

    watermarked = asset.composite(watermark) do |c|
      c.compose 'Over'
      c.gravity 'Center'
    end

    { original: io, watermarked: watermarked.tempfile }
  end
end

I added MiniMagick::Image#tempfile method in the latest version of MiniMagick (4.4.0). If you can't/don't want to update MiniMagick, you can alternatively access the tempfile with `watermarked.instance_variable_get("@tempfile")`.

Cheers,
Janko

Sumit Gupta

unread,
Feb 12, 2016, 5:33:51 PM2/12/16
to ruby-shrine

So I actually changed things up from doing the image processing myself. I realized that my current catalog of stock photos is pretty big too (100–200 MB files), and I did not want to bring those into memory on my dynos either. I've written a way to use Imgix directly into the hooks and processing, but I want to extract it into a plugin and use it for all my image transformations at this point.

I'm sure I'll have another 100 questions before all is said and done (and my hope is to have the plugins written by the end of the month, after our big launch on Wednesday [which I also somehow scheduled on the same day as my wedding... but that's another story]).

Thank you so much for all your help!

da...@harris.org.nz

unread,
Jul 6, 2018, 4:46:23 AM7/6/18
to Shrine
Hi Janko,

I have been using the above code with great success for a while, but I've just updated to ImageProcessing v1.4.0 and Shrine v2.11.0 and I can't seem to make it work with the new chainable API.

I have tried this:


pipeline = ImageProcessing::MiniMagick
  .source(io.download)

watermark = ImageProcessing::MiniMagick
  .source(context[:host].logo.download) # Also a Shrine Image...
  .call
  .path

shareable = pipeline
  .resize_to_limit(300, 300)
  .composite(watermark) do |cmd|
    cmd.compose 'Over'
    cmd.gravity 'SouthEast'
    cmd.geometry '+55+55'
  end
  .call

{original: io, shareable: shareable}

It doesn't break, but it produces a really messed-up image; it looks like it's just using the watermark image, as the output has the same dimensions as the resized-on-upload watermark. I notice above that you say MiniMagick::Image#composite is nondestructive, so maybe that is what is going on here.

I have attached the logo and the 'shareable' version from above in case you can glean anything from them.



Any help would be greatly appreciated. Once I have it working I will write it up on the wiki for others to reference as I imagine automatic watermarking is quite a common operation.

Dave

Janko Marohnić

unread,
Jul 6, 2018, 10:01:02 AM7/6/18
to da...@harris.org.nz, Shrine
Hello Dave,

There are two main changes in the new API (other than how you call it). One is that MiniMagick::Tool::Convert is now used directly (you can find it in the MiniMagick README under "Metal"), so it's no longer possible to call MiniMagick::Image methods such as #composite. The other is that MiniMagick::Image#composite used the `composite` command-line tool, whereas ImageProcessing::MiniMagick builds a single `convert` command, so we'll need to translate the arguments (you can see the difference here).
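To make that difference concrete, here is a rough sketch (plain Ruby that only builds the argument lists, so nothing is executed; the file names and resize geometry are assumptions) of roughly what each side ends up invoking:

```ruby
# MiniMagick::Image#composite shells out to the standalone `composite` tool,
# which takes the overlay first, then the base image, then the output path:
composite_cmd = %w[
  composite -compose Over -gravity SouthEast -geometry +55+55
  watermark.png input.jpg output.jpg
]

# ImageProcessing::MiniMagick instead builds one `convert` invocation: the
# overlay is appended as a second input image, and the trailing `-composite`
# operator merges the two layers using the preceding settings.
convert_cmd = %w[
  convert input.jpg -resize 300x300> watermark.png
  -compose Over -gravity SouthEast -geometry +55+55 -composite
  output.jpg
]

puts composite_cmd.join(" ")
puts convert_cmd.join(" ")
```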

The following should generate the correct composite command:

  ImageProcessing::MiniMagick
    .source(original)
    .resize_to_limit(300, 300)
    .append(watermark.path)
    .compose("Over")
    .gravity("SouthEast")
    .geometry("+55+55")
    .composite
    .call

Btw, there is a feature request open for adding a dedicated #composite command (similar to MiniMagick::Image#composite), which wouldn't require you to remember the ImageMagick options. If you'd maybe like to implement it, that addition would be greatly appreciated. Otherwise I'll probably implement it soon.

Once I have it working I will write it up on the wiki for others to reference as I imagine automatic watermarking is quite a common operation.

That would be great! 

BTW, I would recommend using Shrine::UploadedFile#download with a block, so that the temporary file is automatically deleted.

  watermark = context[:host].logo.download do |logo|
    ImageProcessing::MiniMagick.call(logo)
  end

  shareable = io.download do |original|
    ImageProcessing::MiniMagick
      .source(original)
      .resize_to_limit(300, 300)
      .append(watermark.path)
      ...
      .call
  end

  watermark.close!

Kind regards,
Janko


da...@harris.org.nz

unread,
Jul 9, 2018, 5:41:47 PM7/9/18
to Shrine
Hi Janko,

Thank you for getting back to me so quickly, and sorry it's taken a while to respond. Your code worked perfectly.
I was able to clean up my Uploader classes quite a lot using Shrine::UploadedFile#download with a block, and the approach from the Shrine README where you build up the hash of versions as you go:

process(:store) do |io, context|
  versions = { original: io } # retain original

  io.download do |original|
    pipeline = ImageProcessing::MiniMagick.source(original)

    versions[:show]  = pipeline.resize_to_limit!(1200, 1200)

    versions[:index] = pipeline.resize_to_limit!(300, 300)

    # Watermark example
    #
    watermark = host_logo_file(context)

    versions[:shareable] = pipeline
      .resize_to_limit(1200, 1200)
      .append(watermark.path)
      .compose('Over')
      .gravity('SouthEast')
      .geometry('+55+55')
      .composite
      .call

    watermark.close!
  end

  versions # return the hash of processed files
end


Now that I have upgraded I can try switching to VIPS, with hopefully a huge drop in processing time ;)

I see that you also provide documentation for using Uppy with direct S3 uploads now. I'm currently using Dropzone and felt like I had to discover a lot of stuff myself, so it's really great to see a full example of that end-to-end process working. The upgrade from Dropzone to Uppy might be next on the cards; it looks really pretty ;)

Thank you for all the work you put into image_processing and shrine, it's obviously taken a lot of time and care to get everything working so seamlessly.

Dave

Janko Marohnić

unread,
Jul 9, 2018, 8:16:45 PM7/9/18
to Dave Harris, Shrine
Btw, I just implemented #composite and released version 1.5.0 of ImageProcessing. It allows you to specify all the parameters as #composite options, which I think is nicer from a readability perspective. It also loosens the type requirement on the overlay argument: now you can pass a File/Tempfile object directly.

  versions[:shareable] = pipeline
   .resize_to_limit(1200, 1200)
   .composite(watermark, compose: 'Over', gravity: 'SouthEast', geometry: '+55+55')
   .call

I also updated your wiki guide with the new #composite method, thanks for writing it btw!

Now that I have upgraded I can try switching to VIPS, with hopefully a huge drop in processing time ;)

Yeah, VIPS is really nice and fast. Note that the ImageProcessing API is not exactly the same for every method, so it's not a total drop-in replacement, but we try to keep the APIs as similar as possible. Though now I realize that the new #composite APIs could be made much more similar, or actually the same.

I see that you also provide documentation for using Uppy with direct S3 uploads now also. I am currently using Dropzone and felt like I had to discover a lot of stuff myself so it's really great to see a full example of that end-to-end process working. The upgrade from Dropzone to Uppy might be next on the cards, it looks really pretty ;)

Oh yeah, I felt a lot of pain figuring out how to make jQuery-File-Upload, Dropzone or FineUploader work with Shrine. The problem wasn't that the integration was complex – actually it was as simple as it can be – the problem was in the assumptions those libraries made. jQuery-File-Upload was the only one that I was able to set up with direct S3 uploads. That's why I was so happy to discover Uppy; it was really simple to integrate with Shrine's upload_endpoint and presign_endpoint plugins, the presign endpoint even happened to return parameters that were already named according to Uppy's convention.

Thank you for all the work you put into image_processing and shrine, it's obviously taken a lot of time and care to get everything working so seamlessly.

You're welcome, it's great to receive such appreciation :)

Kind regards,
Janko 
