shrine-url as "alternate" cache?


jona...@dnil.net

Nov 13, 2018, 3:32:10 PM
to Shrine
So I have an app that can accept files the "ordinary" sort of way (actually via a direct upload), to an ordinary shrine cache, say FileSystem or S3. 

But it ALSO needs to accept files from a remote url. 

When I first noticed shrine-url, I thought that could maybe magically just do it.  But of course, it turns out it can't, unless remote_url is actually the "cache" storage for the uploader. 

What I naively at first tried, was something like:

Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new("tmp/shrine_storage_testing", prefix: "cache"), # temporary
  store: Shrine::Storage::FileSystem.new("tmp/shrine_storage_testing", prefix: "store"),       # permanent
  remote_url: Shrine::Storage::Url.new
}

then just:

some_model.attached_file = { "id" => "http://example.com", "storage" => "remote_url"}.to_json

That doesn't work, because in several places shrine checks to see whether the uploaded file (or a hash that will become one) is cached?/uploaded?.

Now, I could just handle this outside of shrine -- write my own code to download the file (perhaps using the down gem, probably in my own background job) and assign it to cache. 

But it would be so nice to just let shrine take care of this for me, in the background, because I'm already using :backgrounding.  And the "diy" approach kind of results in an extra copy of the file in cache, when it never really needs to be there. 

So I played around a bit to see what would have to be changed in shrine to let it do this... as a hack, in fact if `uploaded?` (https://github.com/shrinerb/shrine/blob/203c2c9a0c83815c9ded1b09d5d006b2a523579c/lib/shrine.rb#L253) is just overridden to append:

`|| uploaded_file.storage_key == "remote_url"`

...everything works out great!

Of course, that's a terrible hack -- for one thing, we'd want this to apply only to the registered `cache` storage, not the registered `store`. And I'm not sure there's any way to use a shrine plugin to override `uploaded?` only for the cache storage.  In general, even though a one-line intervention is all it takes... I'm not sure there's any non-violent way to get the shrine architecture to allow it. 
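To show what I mean, here's a self-contained toy of the hack (plain Ruby, Shrine not loaded; `Uploader` stands in for the real class, and files are plain hashes). Prepending a module is one way to apply the one-line change:

```ruby
# Toy stand-in: uploaded? normally only accepts files that live on this
# uploader's own registered storage.
class Uploader
  attr_reader :storage_key

  def initialize(storage_key)
    @storage_key = storage_key
  end

  def uploaded?(file)
    file["storage"] == storage_key.to_s
  end
end

# The intervention: also accept files on the "remote_url" storage, but
# only for the uploader whose registered storage is :cache.
module RemoteUrlAsCache
  def uploaded?(file)
    super || (storage_key == :cache && file["storage"] == "remote_url")
  end
end

Uploader.prepend(RemoteUrlAsCache)

cache_uploader = Uploader.new(:cache)
cache_uploader.uploaded?("storage" => "remote_url") #=> true

store_uploader = Uploader.new(:store)
store_uploader.uploaded?("storage" => "remote_url") #=> false
```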

Do you have any thoughts?  Should I give up and just do it "manually", having to write my own code, and with the extra copy to cache?  Is there another useful approach?  Any thoughts welcome. It's just _so close_ to being able to do what I need... but so far. :)

To make matters more complicated, even if I could get this to work... the shrine-url storage doesn't _quite_ have the features I need anyway. In addition to giving it a url (in `id:`), I'd need to give it some headers to send with the HTTP request. These remote file URLs were obtained with OAuth2 (imagine allowing the user to attach a file from, say, their Dropbox account, which is the actual use case here), so shrine-url would need a way to be told, based on the uploaded file hash, to send a particular `Authorization` header along with the HTTP request.  But that seems do-able one way or another, if the architectural issue has a solution. 

Jonathan
   

Janko Marohnić

Nov 14, 2018, 2:27:23 PM
to Jonathan Rochkind, ruby-...@googlegroups.com
Hey Jonathan,

To summarize what we've discussed offline, you could have Shrine trigger promotion by overriding the internal Shrine::Attacher instance that the Shrine::Attachment module adds to the model, and setting the cache storage to your desired storage. For example:

  if remote_url
    photo.image_attacher(cache: :remote_url)
    photo.image_attacher.assign({id: remote_url, storage: :remote_url, metadata: {}}.to_json)
  else
    photo.image = file
  end

  photo.save


That should successfully trigger promotion, in your case spawning a background job. Then in the background, even though the Shrine::Attacher instance that gets instantiated will have the default cache storage again, the promotion will still happen. Shrine only checks whether the assigned file is uploaded to "cache" storage on assignment and when deciding whether to start promoting the file; it's only important to "pass" those two checks. Since the storage is "encoded" in the Shrine::UploadedFile data itself, Shrine will find the :remote_url storage, because it's registered globally.
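To illustrate that lookup with a self-contained toy (plain Ruby, Shrine not loaded; `STORAGES` stands in for the global Shrine.storages registry, and the storage values are just placeholders):

```ruby
# The storage key travels inside the uploaded file data hash, and is
# resolved against the globally registered storages -- which is why the
# background job doesn't need the attacher's original cache setting.
STORAGES = {
  cache:      :filesystem_cache,
  store:      :filesystem_store,
  remote_url: :url_storage,
}.freeze

def resolve_storage(data)
  STORAGES.fetch(data["storage"].to_sym)
end

resolve_storage("id" => "https://example.com/f", "storage" => "remote_url")
#=> :url_storage
```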

-------

Onto the second problem. So, yeah, when shrine-url "opens" the file from cache storage – in your case a remote URL – it will not pass any headers to the download request. So in your case the open will fail, because the URL requires headers. 

But shrine-url has a way to pass headers. Shrine::Storage::Url#open and #download both accept options for Down::Http.open and Down::Http.download (Down::Http is the default downloader of shrine-url). These options can be passed via a Shrine::UploadedFile object:

 uploaded_file # cached file uploaded to shrine-url
 uploaded_file.open(headers: { "Authorization" => "...", ... }) # "opens" the file with headers


When Shrine re-uploads the cached file to permanent storage, it calls Storage#open internally for the source Shrine::UploadedFile, which is the one from cache storage in this case. So, internally this happens:

 store = Shrine.new(:store)
 store.upload(uploaded_file) # here Storage#open is called internally


But you can choose to open the Shrine::UploadedFile yourself before it's uploaded to permanent storage by calling Shrine::UploadedFile#open, like we've shown in the previous example.

There are two ways you could currently get the behaviour you want. One is overriding Shrine::Attacher#promote:

  Shrine.plugin :module_include

  Shrine.attacher_module do
    def promote(cached_file, **options)
      if cached_file.storage.is_a?(Shrine::Storage::Url)
        cached_file.open(headers: { ... }) # open the Shrine::UploadedFile ourselves
      end

      super
    end
  end


The other is overriding Shrine::UploadedFile#open:

  Shrine.plugin :module_include

  Shrine.file_methods do
    def open(**options)
      if storage.is_a?(Shrine::Storage::Url)
        super(headers: { ... }, **options)
      else
        super
      end
    end
  end


I'm open to accepting a PR which adds a way (maybe in the form of a plugin) to set these options dynamically. Maybe a default_open_options plugin, mimicking the existing default_url_options plugin? Whatever you think is best.

-----

I also forgot to mention how we would actually store the headers, which we also discussed offline. I initially suggested storing them in metadata, but that doesn't feel entirely right, and it would mean this information gets copied to the file on permanent storage, where it becomes irrelevant.

As you suggested, we could store it in the uploaded file data hash on the top level:

 photo.image_attacher.assign({ id: remote_url, storage: :remote_url, headers: { ... } }.to_json)


Then from the corresponding Shrine::UploadedFile instance you can access it like this:

 uploaded_file.data["headers"] #=> { ... }


Since only metadata gets copied when re-uploading to permanent storage, this won't stick around anywhere after promotion.
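To make that concrete, here is a self-contained sketch (plain Ruby, Shrine not loaded; the values are illustrative) of what the data hash looks like and what survives re-upload:

```ruby
require "json"

# Headers live at the top level of the data hash, next to "id" and
# "storage", not inside "metadata".
data = JSON.parse({
  id:       "https://example.com/file.pdf",
  storage:  "remote_url",
  metadata: { "size" => 1024 },
  headers:  { "Authorization" => "Bearer ..." },
}.to_json)

data["headers"] #=> { "Authorization" => "Bearer ..." }

# Only metadata is carried over when re-uploading to permanent storage,
# so the promoted file's data hash no longer contains "headers":
promoted = { "id" => "abc123.pdf", "storage" => "store", "metadata" => data["metadata"] }
```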

Kind regards,
Janko

--
You received this message because you are subscribed to the Google Groups "Shrine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-shrine...@googlegroups.com.
To post to this group, send email to ruby-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ruby-shrine/6e09ebcb-32b1-4678-bacc-576dbbb7ac80%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Janko Marohnić

Nov 15, 2018, 4:41:37 AM
to Jonathan Rochkind, ruby-...@googlegroups.com
Jonathan,

You forgot to "Reply All", so I'll quote here what you said:

Thank you!

Before I got this reply, I took a completely different approach -- a plugin that actually allows the shrine uploader object to recognize multiple different storage keys as "cache", and thus be willing to let them be assigned and promoted. 

That actually works fine too.  I wonder what the pros and cons are. 

I haven't tackled the "headers" issue yet though. 

My code isn't actually committed anywhere yet, but it is definitely working (with tests). 

custom plugin: https://gist.github.com/jrochkind/fd112d724a51f01e246aafcd94d7012f
tests: https://gist.github.com/jrochkind/c75b8cf9e5649a270a4576573a1e6df8

My approach was arrived at by basically figuring out what in shrine was _not_ letting me assign an UploadedFile that didn't match the `cache` storage, or _not_ letting such a thing be promoted... and then fixing it so it didn't. Only took a few lines of code, although they're weird ones. 

If you felt like it and had time, I'd be curious of your opinion of this approach vs the one you suggested.

PS: One thing I like about where I ended up is that, with the plugin I arrived at, you can use the shrine model and attacher just like normal, no need for special assignment code like `photo.image_attacher(cache: :remote_url)` in the controller. 

I like it when I can make it work without changing the API that callers interacting with the model/attacher/uploader have to use. 

Your approach looks interesting; I understand your way of thinking. However, I would recommend going one layer up, overriding Shrine::Attacher#cached? instead of Shrine#uploaded?.

Ah, but now I also see why you didn't: it's not called everywhere. We should probably modify Attacher#cached? to accept an argument, so that we can reuse it for uploaded files that are not necessarily assigned to the attacher:

  def cached?(file = get)
    file && cache.uploaded?(file)
  end

  def stored?(file = get)
    file && store.uploaded?(file)
  end

Feel free to make that PR if you agree and you want to (I don't think we'd have to modify any tests for that change).

However, I don't think I would like to merge that functionality into Shrine core. I like the explicitness of specifying which cache storage you want to use for the next assignment. That also has the added advantage that assignment of raw files will go to the specified cache storage as well. So, when the remote_url plugin gets modified to use `Down.open` instead of `Down.download` for "lazy download", you would be able to simply assign the remote URL:

  Shrine.plugin :remote_url

  photo.image_attacher(cache: :remote_url)
  photo.image_remote_url = remote_url
  photo.image # "uploaded" to Shrine::Storage::Url

Kind regards,
Janko

Jonathan Rochkind

Nov 15, 2018, 8:38:12 AM
to janko.m...@gmail.com, ruby-...@googlegroups.com
Thanks for correcting the "reply all"! And thanks for the feedback, much appreciated. 

So what you'd welcome into core is "modify Attacher#cached? to accept an argument" (yes, you noticed the same thing I did about how it's not used everywhere), but not a plugin that allows multiple cache storage keys?

To me it seems quite explicit that I am assigning a hash with a certain storage specified; if it's a storage I've allowed as cache in the uploader definition, it should just be used as cache.  I think there could conceivably be other cases where people want multiple cache storages for different use cases, all of which get promoted to one store storage (always one store).  And it's very nice to have this be something you can configure on the uploader/attacher, without having to use a different API to assign cache data. 

But you've done great at shrine so far, so I'll defer to your judgement!

I will see about a PR on `cached?`... that alone is easy enough, but it ends up being time-consuming in that I have to get that PR in, then wait for it to be released in a shrine version, then re-write my plugin to use it. Software development is hard sometimes. 

Jonathan

Janko Marohnić

Nov 20, 2018, 6:33:05 PM
to Jonathan Rochkind, ruby-...@googlegroups.com
> I think there could conceivably be other cases where people want multiple cache storages for different use cases, all of which get promoted to one store storage (always one store).  And it's very nice to have this be something you can configure on uploader/attacher, without having to use different API to assign cache data.

I still think it's beneficial to explicitly declare which non-default cache storage you want to use. For example, let's say you're using shrine-url as your alternative cache storage, and you're receiving a URL from the user that you want to validate:

  remote_url = photo_params[:image_remote_url]
  if remote_url
    io = Down::Http.open(remote_url)
    validation_error! if io.size > 10.megabytes # io.size will return "Content-Length" header

    photo.image = {id: remote_url, storage: :remote_url}.to_json
  else
    photo.image = file
  end

If you were to allow both cache storages at any time, an attacker could in this case bypass the URL validation by sending the `{"id":"https://...","storage":"remote_url",...}` as the regular attachment parameter (e.g. writing it to the hidden attachment field). In that case it will get submitted as `params[:photo][:image]` instead of `params[:photo][:image_remote_url]`, and the assignment would still succeed, but your validations wouldn't be run because the `else` block would be the one that gets executed.

Because of these cases I would prefer the explicit version, because I think it would be safer. I think the extra line is worth it.

Notice that you don't have to change the code other than adding a `photo.image_attacher(cache: :other_cache)` line beforehand. For some reason I showed calling `Shrine::Attacher#assign` explicitly, but you can use the regular `photo.image =` (that includes mass assignment as well).

Kind regards,
Janko

Jonathan Rochkind

Nov 21, 2018, 1:18:46 AM
to Janko Marohnić, ruby-...@googlegroups.com
I still think it makes a lot of sense for my use case. But if it doesn't make sense for shrine, fortunately I can pretty easily write my own custom plugin to make it so (even easier after your suggestion for https://github.com/shrinerb/shrine/pull/319)

I am not sending a url in a separate param that I might be tempted to validate in the controller. Instead, I have a javascript front-end similar to the shrine uppy examples (it does use uppy), which sends back json hashes specifying the cache storage location for the direct-uploaded files, just like the uppy example. 

The same front-end also allows the user to choose files from various cloud storage locations -- and sends back their selections also as json hashes specifying them, just the same. The remote cloud storage locations can basically be considered 'cache' for shrine, which is of course exactly what shrine-url makes possible. 

So I've got a controller which can take these json-cache-location-hashes and assign them. It is incredibly convenient to be able to do this in very little code -- the controller can treat both kinds of input the same way. 

But I'm also not doing validation quite like that, I'm basically pushing it all into the background. If I _did_ want to validate in controller in foreground like that, since the controller's just getting hashes anyway, it'd just have to de-serialize the JSON and check the `storage` key to know what to do. Which isn't really that odd or hard to do, no harder than checking different param keys in your example. 
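That "de-serialize and check the `storage` key" dispatch might look something like this (a self-contained sketch in plain Ruby; the allowed-hosts rule and all names are made up for illustration):

```ruby
require "json"
require "uri"

# Hypothetical controller-side check: both direct uploads and remote-URL
# selections arrive as the same kind of JSON hash, so we branch on "storage".
ALLOWED_REMOTE_HOSTS = %w[dl.example.com].freeze

def check_cached_file!(json)
  data = JSON.parse(json)
  case data["storage"]
  when "cache"
    data # ordinary direct upload, nothing extra to check here
  when "remote_url"
    uri = URI.parse(data["id"])
    raise "host not allowed" unless ALLOWED_REMOTE_HOSTS.include?(uri.host)
    data
  else
    raise "unknown storage: #{data["storage"].inspect}"
  end
end
```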

Alternately, I might consider a feature for shrine-url where you can set a max filesize when defining a shrine-url storage?

But... maybe I'm making a mistake. It makes sense to me and I want to try it out, but it's okay if you think it's too dangerous or mistaken for general use and shouldn't be included in shrine. The beauty of shrine is how composable/customizable it is so I can do it anyway myself. The danger of course is that my customization may end up breaking in future shrine, and my customization isn't tested by shrine CI.  I'm looking at shrine as an experiment to see how stable and forwards-compatible we can make a super flexible/composable attachment toolkit.  What you've done with shrine so far gives reasons to be optimistic, shrine's history is pretty stable and backwards-compat-maintaining. 

Jonathan

Janko Marohnić

Nov 24, 2018, 6:20:28 PM
to Jonathan Rochkind, ruby-...@googlegroups.com
> I still think it makes a lot of sense for my use case. But if it doesn't make sense for shrine, fortunately I can pretty easily write my own custom plugin to make it so (even easier after your suggestion for https://github.com/shrinerb/shrine/pull/319)

Ok, I'm glad that making a custom plugin suits your needs.

> Alternately, I might consider a feature for shrine-url where you can set a max filesize when defining a shrine-url storage?

Filesize validation doesn't really belong in a storage IMO. I'm curious what you had in mind?

> What you've done with shrine so far gives reasons to be optimistic, shrine's history is pretty stable and backwards-compat-maintaining.

Thanks, I'm really glad to hear that :)

Kind regards,
Janko

jona...@dnil.net

Nov 25, 2018, 3:49:52 PM
to Shrine
> Filesize validation doesn't really belong in a storage IMO. I'm curious what you had in mind?

Hmm, I see what you mean, but I don't really want to do an HTTP request in the controller (not even a HEAD request), because I don't want to do something potentially slow in the foreground -- I want to do it as part of promotion, where I am using `backgrounding`. 

`down` already has great maximum file size support (https://github.com/janko-m/down#maximum-size), which doesn't rely on the Content-Length header exclusively: it'll actually cancel the download if the bytes streamed go above the limit. How might we hook into that feature that's already there in down? 

One way would be, as mentioned, putting it as configuration on the shrine-url Storage, even though it seems like the wrong layer... it might work out pretty fine practically: since the cache can be assigned uploader-specifically, it can still effectively become an attribute of the uploader. (Although note https://github.com/shrinerb/shrine/issues/310)

Another way might be using the same technique you recommended earlier in this thread for getting a shrine-url cache to have specified headers, overriding `promote` to do an early lazy `open` with headers. I suppose you could also do an open with a `down` max_size provided there too. 

Whitelisting URLs doesn't do any network IO, so there's no performance reason not to have that one in the controller. But if we have some validation in the uploader itself, why should we have to split other validation out into the controller instead, having our validation in two places?

Thinking about this, shrine-url is quite analogous to other kinds of 'direct upload', in that shrine (and your app in general) doesn't know anything reliable about the file until promotion.  Which, if you want to use backgrounding AND avoid a foreground fetch of the file (that is, you do not want to use restore_cached_data), already has certain problems for validation with any direct upload case (you can't simply display errors to the user because you're already in the bg).  As discussed in the other thread (https://groups.google.com/forum/#!topic/ruby-shrine/EQ_3yq8u_hg).

But these problems are really quite similar for shrine-url (with backgrounding), shrine-url as cache storage really does work much like a direct upload, once you get to promotion phase. In either case, you're dealing with promoting a file that exists on some storage, and the app hasn't gotten a chance to touch the file before promotion. (Again, in both cases, unless you use `restore_cached_data`, which you might not want to do when you're using backgrounding and trying to keep expensive IO in the bg). 

So... I'm not sure, but being forced to handle certain things in the foreground in controller doesn't seem quite right, when other validation is still possibly in the uploader (maybe preferably, because it requires "expensive" IO, such as metadata extraction, that you might want in backgrounding). 

This isn't actually an immediate need for my app, so I keep letting it marinate in our brains. 

But I feel like I want a solution that is backgrounding-friendly, that can apply to both shrine-url-specific validation as well as other more generic direct-upload validation, and that can use down's already existing fully reliable max-size argument when dealing with max size for shrine-url.  I think that's _some_ kind of validation integrated into promotion, which can handle both shrine-url and generic direct-upload cases (with more ordinary shrine storages), can handle max-size as well as metadata-based validation. 

Jonathan