Difference between GridFS metadata and collection metadata retrieval

Christian Kaps

Feb 26, 2018, 4:14:52 AM

to ReactiveMongo - http://reactivemongo.org

Hi,

In my app I serve files from GridFS. Every file also contains some metadata. Now I have two ways to serve the metadata:

1. Serve it directly with the collection API

2. Serve it with the GridFS API

My idea was to unify the model and the access to both the file content and the metadata. So I thought I'd create one model with the file-specific data (md5, filename, ...) and the metadata. The model also gets a "content" property of type Option[Enumerator[Array[Byte]]].
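A minimal sketch of the unified model described above, assuming a plain Iterator[Array[Byte]] as a dependency-free stand-in for Play's Enumerator[Array[Byte]]. The field names follow the GridFS file fields used in this thread; ImageMetadata is a hypothetical metadata type for illustration only.

```scala
// Hypothetical stand-in: Play's Enumerator[Array[Byte]] is replaced by a plain
// Iterator[Array[Byte]] so the sketch compiles without any dependencies.
object UnifiedModel {
  type ByteStream = Iterator[Array[Byte]]

  // File-specific fields (as stored in the GridFS "files" collection),
  // the user-defined metadata, and an optional content stream.
  final case class UploadedFile[M](
    id: String,
    filename: Option[String],
    chunkSize: Option[Int],
    length: Option[Long],
    contentType: Option[String],
    md5: Option[String],
    metadata: M,
    content: Option[ByteStream]
  )

  // Hypothetical metadata type, used only for this illustration.
  final case class ImageMetadata(width: Int, height: Int)

  // A metadata-only instance, e.g. for a table row: no content attached.
  val listing: UploadedFile[ImageMetadata] = UploadedFile(
    id = "file-1",
    filename = Some("cat.png"),
    chunkSize = Some(261120),
    length = Some(1024L),
    contentType = Some("image/png"),
    md5 = None,
    metadata = ImageMetadata(640, 480),
    content = None
  )
}
```

With content as an Option, a table listing can leave it as None, while a download path can fill it in.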

The issue is that sometimes I need only the metadata, to show it in a table. So, if I use the GridFS API to query all the data, is there a lot of overhead when I do not use the content? The content is of type Enumerator[Array[Byte]], so there should be no additional overhead if I do not consume the file, right? If there is more overhead, then I think it would be better to separate content retrieval from metadata retrieval, right?

Any information would be highly appreciated.

Best regards,

Christian

Cédric Chantepie

Feb 26, 2018, 5:23:26 AM

GridFS uses separate collections for the metadata (files) and the chunks.

Using the API, you can find the ReadFile corresponding to the metadata and then fetch the corresponding data.
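The two-step access can be sketched with in-memory stand-ins for the two collections. This is a hypothetical simulation, not ReactiveMongo code: FileDoc, ChunkDoc, findFile, and readContent are illustrative names. The only assumption taken from GridFS itself is that each chunk carries the owning file's id (files_id) and a sequence number (n).

```scala
// In-memory stand-ins for the two GridFS collections (hypothetical names):
// "files" holds one document per file, "chunks" holds the binary pieces,
// each tagged with the owning file's id and its sequence number n.
object TwoStepLookup {
  final case class FileDoc(id: String, filename: String, length: Long, metadata: Map[String, String])
  final case class ChunkDoc(filesId: String, n: Int, data: Array[Byte])

  val files: Map[String, FileDoc] = Map(
    "f1" -> FileDoc("f1", "hello.txt", 11L, Map("owner" -> "christian"))
  )

  // Stored out of order on purpose: readContent must sort by n.
  val chunks: Vector[ChunkDoc] = Vector(
    ChunkDoc("f1", 1, "world".getBytes("UTF-8")),
    ChunkDoc("f1", 0, "hello ".getBytes("UTF-8"))
  )

  // Step 1: find the file document (the metadata) by its id.
  def findFile(id: String): Option[FileDoc] = files.get(id)

  // Step 2: fetch that file's chunks, ordered by n, and concatenate them.
  def readContent(file: FileDoc): Array[Byte] =
    chunks
      .filter(_.filesId == file.id)
      .sortBy(_.n)
      .flatMap(_.data.toVector)
      .toArray
}
```

Step 1 alone is enough to display the metadata; step 2 is the part that reads from the chunks collection.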

Christian Kaps

Feb 26, 2018, 6:18:48 AM

Hi,

Sorry for not being precise enough. I know that GridFS uses different collections, and I know how to use the APIs. My question is whether it makes sense to use the following query even when I only need the metadata. The query uses the fs.enumerate method. Does this produce more overhead (in terms of data read from the chunks collection) on the server or the client side? As far as I can see, it will not produce any content unless I consume the enumerator, right?

// Assumes java.time.Instant, scala.concurrent.Future, reactivemongo.bson._
// and an implicit ExecutionContext are in scope in the enclosing class.
override def findByID[M <: FileMetadata](id: BSONObjectID)(
  implicit
  metadataReads: BSONReader[BSONDocument, M]
): Future[Option[UploadedFile[M]]] = {
  import reactivemongo.api.gridfs.Implicits._
  gridFS.flatMap { fs =>
    // Look up the file document (file fields + metadata) by its _id
    fs.find(BSONDocument(
      "_id" -> id
    )).headOption.map(_.map { file =>
      UploadedFile[M](
        id = file.id.as[BSONObjectID],
        filename = file.filename,
        chunkSize = Some(file.chunkSize),
        length = Some(file.length),
        uploadDate = file.uploadDate.map(Instant.ofEpochMilli),
        contentType = file.contentType,
        md5 = file.md5,
        // The model declares content as Option[Enumerator[Array[Byte]]]
        content = Some(fs.enumerate(file)),
        metadata = file.metadata.as[M]
      )
    })
  }
}

Or should I create a separate query to fetch only the metadata from the respective "files" collection?
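A separate metadata-only query can be sketched as follows. This is a hypothetical in-memory simulation, not ReactiveMongo API: listMetadata reads only the simulated "files" collection, and a counter (chunkQueries) records every access to the simulated "chunks" collection, so it is easy to check that a table listing never touches the chunks.

```scala
// Hypothetical simulation: a metadata-only listing that reads only the
// "files" collection, while content access is isolated in one method.
object MetadataOnly {
  final case class FileDoc(id: String, filename: String, metadata: Map[String, String])
  final case class ChunkDoc(filesId: String, n: Int, data: Array[Byte])

  // Counts how often the (simulated) chunks collection is queried.
  var chunkQueries: Int = 0

  private val files = Vector(
    FileDoc("a", "a.pdf", Map("title" -> "Report")),
    FileDoc("b", "b.pdf", Map("title" -> "Invoice"))
  )

  private val chunks = Vector(
    ChunkDoc("a", 0, Array[Byte](1, 2)),
    ChunkDoc("b", 0, Array[Byte](3))
  )

  // Table listing: (id, title) pairs built from the files collection only.
  def listMetadata(): Vector[(String, String)] =
    files.map(f => f.id -> f.metadata.getOrElse("title", f.filename))

  // Download: the only place that queries the chunks collection.
  def content(id: String): Array[Byte] = {
    chunkQueries += 1
    chunks.filter(_.filesId == id).sortBy(_.n).flatMap(_.data.toVector).toArray
  }
}
```

The design choice is simply that the table view and the download go through different functions, so only the download pays for reading chunks.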

I hope it's clearer now.

Best regards,

Christian

Cédric Chantepie

Feb 27, 2018, 3:24:02 AM

Enumerate will at least need to find the first chunk, so it's better to call it only on demand (lazily).
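One way to keep the content on demand is to store a function in the model instead of an already-created stream. The sketch below is a hypothetical, dependency-free simulation: openStream stands in for the call to fs.enumerate(file), and a counter (chunkLookups) records when the simulated chunks are looked up. In ReactiveMongo terms, the idea would be to defer fs.enumerate(file) until the content is actually requested; the names here are illustrative only.

```scala
// Hypothetical simulation of lazy content: building the model does not look
// up any chunk; the lookup happens only when content() is invoked.
object LazyContent {
  final case class ChunkDoc(filesId: String, n: Int, data: Array[Byte])

  // Counts lookups against the simulated chunks collection.
  var chunkLookups: Int = 0

  private val chunks = Vector(
    ChunkDoc("f1", 0, "abc".getBytes("UTF-8")),
    ChunkDoc("f1", 1, "def".getBytes("UTF-8"))
  )

  // Stand-in for fs.enumerate(file): finding the chunks is the cost to defer.
  private def openStream(filesId: String): Iterator[Array[Byte]] = {
    chunkLookups += 1
    chunks.filter(_.filesId == filesId).sortBy(_.n).iterator.map(_.data)
  }

  // The model holds a function rather than an opened stream, so creating it
  // for a metadata table does not touch the chunks.
  final case class UploadedFile(id: String, filename: String, content: () => Iterator[Array[Byte]])

  def findById(id: String): UploadedFile =
    UploadedFile(id, s"$id.bin", () => openStream(id))
}
```

Usage: LazyContent.findById("f1") leaves chunkLookups at 0; only calling .content() on the result performs the lookup.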
