Empty MIME type using :mime_types analyzer when Tempfile instance is returned from process block

736 views
Skip to first unread message

David Montesdeoca

unread,
Jul 2, 2018, 6:15:49 AM7/2/18
to Shrine
Hi Janko!

Currently my uploader is as follows (only include relevant parts):

class FileUploader < Shrine
  plugin :determine_mime_type, analyzer: -> (io, analyzers) do
    mime_type = analyzers[:file].call(io)
    mime_type = analyzers[:mime_types].call(io) if mime_type == 'text/plain'
    mime_type
  end

  process(:store) do |io, context|
    if context[:record].html?
      ManipulateContent.new(io.download).call
    elsif io.width
      { original: io, thumb: thumb_version(io) }
    end
  end
end

Within the process block, I'm manipulating given record when it's an HTML file using a service object that adds some content using Nokogiri and then return a Tempfile instance.

If I use above configuration for :determine_mime_type plugin, I get 'text/html' as MIME type. However, if I change configuration and only use :mime_types as analyzer, I get an empty MIME type.

plugin :determine_mime_type, analyzer: :mime_types

I know that documentation says that :mime_types analyzer is not guaranteed to return the actual MIME type of the file, but I usually get better results when I use it as analyzer, except for this case and for mp4 files.

Am I missing something? Is there a way to return the correct MIME type?

Kind regards,
David

Janko Marohnić

unread,
Jul 2, 2018, 3:11:40 PM7/2/18
to David Montesdeoca, Shrine
Hi David,

The :mime_types analyzer uses the mime-types gem to determine the MIME type from filename extension. You might expect that creating a Tempfile with

  Tempfile.new("some-basename.html", binmode: true)

will create a file with a .html extension, but it won't – Tempfile treats this whole value as the basename after which it will append a random string, so the filename still won't have any extension, as you can see when you inspect the result:

  #<File:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/some-basename.html20180702-67135-jt4sfe>

To create a Tempfile with the desired extension, you need to pass an array with the desired extension as the second element:

  Tempfile.new(["some-basename", ".html"])
  #=> #<File:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/some-basename20180702-67712-1ep41k.html>

Then the :mime_types analyzer will be able to properly detect the MIME type.

Kind regards,
Janko


Advertencia legal: Este mensaje y, en su caso, los ficheros anexos son confidenciales, especialmente en lo que respecta a los datos personales, y se dirigen exclusivamente al destinatario referenciado. Si usted no lo es y lo ha recibido por error o tiene conocimiento del mismo por cualquier motivo, le rogamos que nos lo comunique por este medio y proceda a destruirlo o borrarlo, y que en todo caso se abstenga de utilizar, reproducir, alterar, archivar o comunicar a terceros el presente mensaje y ficheros anexos, todo ello bajo pena de incurrir en responsabilidades legales. El emisor no garantiza la integridad, rapidez o seguridad del presente correo, ni se responsabiliza de posibles perjuicios derivados de la captura, incorporaciones de virus o cualesquiera otras manipulaciones efectuadas por terceros.

Disclaimer: This message and any attached files transmitted with it, is confidential, especially as regards personal data. It is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient and have received this information in error or have accessed it for any reason, please notify us of this fact by email reply and then destroy or delete the message, refraining from any reproduction, use, alteration, filing or communication to third parties of this message and attached files on penalty of incurring legal responsibilities. The sender does not guarantee the integrity, the accuracy, the swift delivery or the security of this email transmission, and assumes no responsibility for any possible damage incurred through data capture, virus incorporation or any manipulation carried out by third parties.

--
You received this message because you are subscribed to the Google Groups "Shrine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-shrine+unsubscribe@googlegroups.com.
To post to this group, send email to ruby-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ruby-shrine/3c2de98f-2ec6-4410-a3d4-14e16af757d2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Montesdeoca

unread,
Jul 9, 2018, 9:31:01 AM7/9/18
to Shrine
Hi Janko!

I cannot (or I don't know how to do it) specify the name of the Tempfile once it is already created (it comes from process block => io.download).

However, it works simply renaming the file, adding the .html extension, after the file has been manipulated.

Once again, thanks for your priceless help.

To unsubscribe from this group and stop receiving emails from it, send an email to ruby-shrine...@googlegroups.com.

Janko Marohnić

unread,
Jul 9, 2018, 1:38:59 PM7/9/18
to David Montesdeoca, Shrine
I cannot (or I don't know how to do it) specify the name of the Tempfile once it is already created (it comes from process block => io.download).
 
However, it works simply renaming the file, adding the .html extension, after the file has been manipulated.

When manipulating a file, I would recommend always writing the results to a new file, rather than modifying the original in-place. That's just from my experience when maintaining the ImageProcessing gem, that's why in version 1.0 I removed in-place processing.

So, in your code you could have MainpulateContent create a new Tempfile and write result data in it, and use `io.download` with a block so that the original tempfile is automatically deleted aferwards.

  # in your uploader
  io.download { |original| ManipulateContent.new(original).call }

  # path/to/manipulate_content.rb
  class ManipulateContent
    def new(file)
      @file = file
    end

    def call
      tempfile = Tempfile.new(["manipulated", ".html"], binmode: true)

      # ... generate new content ...

      tempfile.write(maniuplated_content)
      tempfile.open # re-open the Tempfile, which is like #flush/#fsync + #rewind
      tempfile
    end
  end

Once again, thanks for your priceless help.

You're welcome, I'm happy to help :)

Kind regards,
Janko

To unsubscribe from this group and stop receiving emails from it, send an email to ruby-shrine+unsubscribe@googlegroups.com.

To post to this group, send email to ruby-...@googlegroups.com.

David Montesdeoca

unread,
Jul 10, 2018, 4:22:20 AM7/10/18
to Shrine
Wow, Janko, it works like a charm!

However, I wonder why is necessary to re-open the file after is has been written. I have tested without that line and it seems to work fine too.

Can you explain me why it would be necessary, please?

Thanks.

Kind regards,
David

Janko Marohnić

unread,
Jul 10, 2018, 7:53:16 AM7/10/18
to David Montesdeoca, Shrine
Re-opening the Tempfile effectively does two things:
  • gives you a rewinded Tempfile
  • flushes any buffered content to disk
Some Shrine storages, when given a File/Tempfile object to upload, they will just take the path to the file and create their own File object from it (e.g. the S3 storage uses this to achieve multipart uploads). In that case it doesn't matter whether the file object was rewinded, but it matters if written content was flushed to disk. When you write content to a Ruby File object, Ruby will buffer received content so that it can write it to disk in larger chunks (write("abc") is more efficient than write("a") + write("b") + write("c")).

When you're done writing content and you don't explicitly tell Ruby to flush it to disk (via File#flush), it's not guaranteed that the content would be written to disk. That means the new File object that the storage opened might not have all the data. Additionally, the OS also has its own buffer, so to really make ensure content was written to disk you should actually call File#fsync (which will raise NotImplementedError on OS-es that don't support it). See refile#253.

On the other hand, if the storage or other Shrine components treat the File/Tempfile object just as any IO object and interact with it using #read, if you've just written the content and didn't #rewind the File aftewards, calling #read on it will just return empty data, because you're at EOF.

So, to be safe, you should both flush content to disk and rewind it. In my opinion the easiest way to do both is just to re-open the file. This also solves another problem not solvable with neither #rewind nor #fsync, which is if another process modifies the file on disk in place. If after this you still use the same file descriptor as before the processing, you could get weird errors. For example, the following script raises an Aws::S3::Errors::XAmzContentSHA256Mismatch exception, and it goes away if I re-open the Tempfile after the `convert` command; when I got this error for the first time was when I started using and recommending this idiom.

require "aws-sdk-s3"
require "down"

client = Aws::S3::Client.new(
  access_key_id:     ENV.fetch("S3_ACCESS_KEY_ID"),
  secret_access_key: ENV.fetch("S3_SECRET_ACCESS_KEY"),
  region:            ENV.fetch("S3_REGION"),
)


system "mogrify -resize 300x300 #{image.path}"

# image.open

client.put_object(
  bucket: ENV.fetch("S3_BUCKET"),
  key:    "foo",
  body:   image,
)

I totally understand that it might seem strange at first, but hopefully these examples explain the reasons :)

Kind regards,
Janko

To unsubscribe from this group and stop receiving emails from it, send an email to ruby-shrine+unsubscribe@googlegroups.com.

To post to this group, send email to ruby-...@googlegroups.com.

David Montesdeoca

unread,
Jul 11, 2018, 9:23:27 AM7/11/18
to Shrine
Thanks for your great explanation, Janko. Understood!

Kind regards,
David

Reply all
Reply to author
Forward
0 new messages