[confluence] MongoDB > GridFS Specification


nor...@mongodb.onconfluence.com

Jan 16, 2013, 8:05:00 PM
to mongodb...@googlegroups.com

GridFS Specification

Page edited by Sam Kleinman


{toc} {redirect:http://docs.mongodb.org/meta-driver/latest/legacy/gridfs-specification/}
h3. Introduction

GridFS is a storage specification for large objects in MongoDB. It works by splitting a large object into small chunks, usually 256k in size. Each chunk is stored as a separate document in a {{chunks}} collection. Metadata about the file, including the filename, content type, and any optional information needed by the developer, is stored as a document in a {{files}} collection.

So for any given file stored using GridFS, there will exist one document in the {{files}} collection and one or more documents in the {{chunks}} collection.
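The chunk arithmetic can be sketched as follows. This is only an illustration under stated assumptions: {{chunkSizes}} is a hypothetical helper, not part of any driver, and the default of 256 * 1024 bytes is one reading of the "256k" figure above.

```javascript
// Hypothetical helper (not a driver API): list the size in bytes of each
// chunk that a file of `length` bytes would occupy.
// Assumption: "256k" is taken to mean 256 * 1024 bytes.
function chunkSizes(length, chunkSize = 256 * 1024) {
  const sizes = [];
  for (let offset = 0; offset < length; offset += chunkSize) {
    // Every chunk is full-sized except possibly the last one.
    sizes.push(Math.min(chunkSize, length - offset));
  }
  return sizes;
}

console.log(chunkSizes(600 * 1024)); // three chunks: 262144, 262144, 90112
console.log(chunkSizes(1000, 256));  // four chunks: 256, 256, 256, 232
```

Only the final chunk may be shorter than the chunk size; all earlier chunks are full.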

If you're just interested in using GridFS, see the docs on [storing files|DOCS:Storing Files]. If you'd like to understand the GridFS implementation, read on.

{dochub:gridfsspec}

h3. Specification

h5. Storage Collections

GridFS uses two collections to store data:

* {{files}} contains the object metadata
* {{chunks}} contains the binary chunks with some additional accounting information

In order to make more than one GridFS namespace possible for a single database, the files and chunks collections are named with a prefix. By default the prefix is {{fs.}}, so any default GridFS store will consist of collections named {{fs.files}} and {{fs.chunks}}. The drivers make it possible to change this prefix, so you might, for instance, have another GridFS namespace specifically for photos where the collections would be {{photos.files}} and {{photos.chunks}}.
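The naming rule above amounts to simple string concatenation. A minimal sketch, assuming a hypothetical helper named {{gridfsCollectionNames}} (real drivers expose the prefix through their own options):

```javascript
// Hypothetical helper (not a driver API): derive the two collection names
// used by a GridFS namespace from its root prefix.
function gridfsCollectionNames(prefix = "fs") {
  return {
    files: `${prefix}.files`,   // object metadata
    chunks: `${prefix}.chunks`, // binary chunks
  };
}

const defaults = gridfsCollectionNames();
const photos = gridfsCollectionNames("photos");
console.log(defaults.files, defaults.chunks); // fs.files fs.chunks
console.log(photos.files, photos.chunks);     // photos.files photos.chunks
```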

Here's an example of the standard GridFS interface in Java:

{code}/*
* default root collection usage - must be supported
*/
GridFS myFS = new GridFS(myDatabase); // returns a default GridFS (e.g. "fs" root collection)
myFS.storeFile(new File("/tmp/largething.mpg")); // saves the file into the "fs" GridFS store

/*
* specified root collection usage - optional
*/

GridFS myContracts = new GridFS(myDatabase, "contracts"); // returns a GridFS where "contracts" is root
myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf")); // retrieves object whose filename is "smithco" from the "contracts" store
{code}

Note that the above API is for demonstration purposes only - this spec does not (at this time) recommend any API. See individual driver documentation for API specifics.

h6. {{files}}

Documents in the {{files}} collection require the following fields:

{code}
{
"_id" : <unspecified>, // unique ID for this file
"length" : data_number, // size of the file in bytes
"chunkSize" : data_number, // size of each of the chunks. Default is 256k
"uploadDate" : data_date, // date when object first stored
"md5" : data_string // result of running the "filemd5" command on this file's chunks
}
{code}

Any other desired fields may be added to the files document; common ones include the following:

{code}
{
"filename" : data_string, // human name for the file
"contentType" : data_string, // valid mime type for the object
"aliases" : data_array of data_string, // optional array of alias strings
"metadata" : data_object // anything the user wants to store
}
{code}

Note that the {{\_id}} field can be of any type, at the discretion of the implementor.
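As a rough illustration of the required fields, the sketch below builds a {{files}} document in plain JavaScript. The helper {{buildFilesDocument}} and the sample values are hypothetical. The MD5 is computed locally with Node's {{crypto}} module as a stand-in; the spec defines the field as the result of the server-side "filemd5" command, and treating the two as equal for intact data is an assumption here.

```javascript
const crypto = require("crypto");

// Hypothetical helper (not a driver API): build a files document containing
// the required fields, plus any optional fields passed in `extra`.
function buildFilesDocument(id, data, chunkSize = 256 * 1024, extra = {}) {
  return {
    ...extra,                  // optional fields such as filename, contentType
    _id: id,                   // any type, chosen by the implementor
    length: data.length,       // size of the file in bytes
    chunkSize: chunkSize,      // size of each chunk
    uploadDate: new Date(),    // date when the object was first stored
    // Assumption: a local MD5 of the whole payload stands in for "filemd5".
    md5: crypto.createHash("md5").update(data).digest("hex"),
  };
}

const doc = buildFilesDocument("example-id", Buffer.from("hello world"), 256 * 1024, {
  filename: "hello.txt",
  contentType: "text/plain",
});
console.log(doc.length); // 11
console.log(doc.md5);    // 5eb63bbbe01eeed093cb22bb8f5acdc3
```

The required fields are written after the optional ones so that a stray key in {{extra}} cannot overwrite them.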

h6. {{chunks}}

The structure of documents from the {{chunks}} collection is as follows:

{code}{
"_id" : <unspecified>, // ID of this chunk document in the chunks collection
"files_id" : <unspecified>, // _id of the corresponding files collection entry
"n" : chunk_number, // chunks are numbered in order, starting with 0
"data" : data_binary // the chunk's payload as a BSON binary type
}
{code}

Notes:
* The {{\_id}} is whatever type you choose. As with any MongoDB document, the default will be a BSON object id.
* The {{files_id}} is a foreign key containing the {{\_id}} field for the relevant {{files}} collection entry
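To make the chunk layout concrete, here is a small in-memory sketch that splits a buffer into chunk documents and puts them back together. Both helpers ({{toChunkDocuments}} and {{reassemble}}) are hypothetical and talk to no database. The chunk {{\_id}} is omitted on the assumption that it would be assigned on insert, and the 10-byte chunk size is chosen only to keep the example small.

```javascript
// Hypothetical helper (not a driver API): split `data` into chunk documents
// that all point back to the same files document via files_id.
function toChunkDocuments(filesId, data, chunkSize) {
  const docs = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    docs.push({
      files_id: filesId,                             // foreign key to the files entry
      n: n,                                          // chunk number, starting with 0
      data: data.subarray(offset, offset + chunkSize), // this chunk's payload
    });
  }
  return docs;
}

// Hypothetical helper: rebuild the payload from chunk documents in any order.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  ordered.forEach((doc, i) => {
    if (doc.n !== i) throw new Error(`missing or duplicate chunk at n=${i}`);
  });
  return Buffer.concat(ordered.map((doc) => doc.data));
}

const original = Buffer.from("The quick brown fox jumps over the lazy dog");
const docs = toChunkDocuments("example-id", original, 10);
console.log(docs.length); // 5

const restored = reassemble(docs.slice().reverse()); // order does not matter
console.log(restored.equals(original)); // true
```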

h5. Indexes

GridFS implementations should create a unique, compound index in the {{chunks}} collection for {{files_id}} and {{n}}. Here's how you'd do that from the shell:

{code}
db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});
{code}

This way, a file's chunks can be retrieved efficiently, and in order, using the {{files_id}} and {{n}} values:

{code}
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
{code}
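The same index also serves lookups of a single chunk by {{files_id}} and {{n}}, which is what reading from the middle of a file needs. The sketch below shows only the arithmetic; {{chunkQueryForOffset}} is a hypothetical helper, and the resulting query document would still have to be passed to a driver or the shell.

```javascript
// Hypothetical helper (not a driver API): work out which chunk holds a given
// byte offset and where inside that chunk the byte sits.
function chunkQueryForOffset(filesId, offset, chunkSize = 256 * 1024) {
  return {
    // Query document matching the compound index on files_id and n.
    query: { files_id: filesId, n: Math.floor(offset / chunkSize) },
    // Position of the requested byte within the chunk's data.
    offsetInChunk: offset % chunkSize,
  };
}

const lookup = chunkQueryForOffset("example-id", 300000);
console.log(lookup.query.n);       // 1
console.log(lookup.offsetInChunk); // 37856
```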

Redirection Notice
This page should redirect to http://docs.mongodb.org/meta-driver/latest/legacy/gridfs-specification/.
