| From: | MongoDB > Writing Drivers and Tools |
| To: | MongoDB > Old Pages |
| {toc} {redirect:http://docs.mongodb.org/meta-driver/latest/legacy/gridfs-specification/} |
| h3. Introduction

GridFS is a storage specification for large objects in MongoDB. It works by splitting a large object into small chunks, usually 256 KB in size. Each chunk is stored as a separate document in a {{chunks}} collection. Metadata about the file, including the filename, content type, and any optional information needed by the developer, is stored as a document in a {{files}} collection.

So for any given file stored using GridFS, there will exist one document in the {{files}} collection and one or more documents in the {{chunks}} collection.

If you're just interested in using GridFS, see the docs on [storing files|DOCS:Storing Files]. If you'd like to understand the GridFS implementation, read on.

{dochub:gridfsspec}

h3. Specification

h5. Storage Collections

GridFS uses two collections to store data:
* {{files}} contains the object metadata
* {{chunks}} contains the binary chunks with some additional accounting information

In order to make more than one GridFS namespace possible for a single database, the files and chunks collections are named with a prefix. By default the prefix is {{fs.}}, so any default GridFS store will consist of collections named {{fs.files}} and {{fs.chunks}}. The drivers make it possible to change this prefix, so you might, for instance, have another GridFS namespace specifically for photos where the collections would be {{photos.files}} and {{photos.chunks}}.

Here's an example of the standard GridFS interface in Java:

{code}
/*
 * default root collection usage - must be supported
 */
GridFS myFS = new GridFS(myDatabase);             // returns a default GridFS (e.g. "fs" root collection)
myFS.storeFile(new File("/tmp/largething.mpg"));  // saves the file into the "fs" GridFS store

/*
 * specified root collection usage - optional
 */
GridFS myContracts = new GridFS(myDatabase, "contracts");  // returns a GridFS where "contracts" is root
myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf"));  // retrieves object whose filename is "smithco"
{code}

Note that the above API is for demonstration purposes only - this spec does not (at this time) recommend any API. See individual driver documentation for API specifics.

h6. {{files}}

Documents in the {{files}} collection require the following fields:

{code}
{
  "_id" : <unspecified>,        // unique ID for this file
  "length" : data_number,       // size of the file in bytes
  "chunkSize" : data_number,    // size of each of the chunks. Default is 256 KB
  "uploadDate" : data_date,     // date when object first stored
  "md5" : data_string           // result of running the "filemd5" command on this file's chunks
}
{code}

Any other desired fields may be added to the files document; common ones include the following:

{code}
{
  "filename" : data_string,               // human name for the file
  "contentType" : data_string,            // valid mime type for the object
  "aliases" : data_array of data_string,  // optional array of alias strings
  "metadata" : data_object                // anything the user wants to store
}
{code}

Note that the {{\_id}} field can be of any type, at the discretion of the spec implementor.

h6. {{chunks}}

The structure of documents from the {{chunks}} collection is as follows:

{code}
{
  "_id" : <unspecified>,       // object id of the chunk in the chunks collection
  "files_id" : <unspecified>,  // _id of the corresponding files collection entry
  "n" : chunk_number,          // chunks are numbered in order, starting with 0
  "data" : data_binary         // the chunk's payload as a BSON binary type
}
{code}

Notes:
* The {{\_id}} is whatever type you choose. As with any MongoDB document, the default will be a BSON object id.
* The {{files_id}} is a foreign key containing the {{\_id}} field for the relevant {{files}} collection entry

h5. Indexes

GridFS implementations should create a unique, compound index in the {{chunks}} collection for {{files_id}} and {{n}}. Here's how you'd do that from the shell:

{code}
db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});
{code}

This way, a chunk can be retrieved efficiently using its {{files_id}} and {{n}} values, and all of a file's chunks can be read back in order:

{code}
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
{code} |
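To make the document layout above concrete, here's a minimal sketch, in plain JavaScript with no driver or server involved, of how a file's bytes could map onto one {{files}} document and a run of {{chunks}} documents, and how sorting by {{n}} puts the payload back together. The function names ({{toGridFSDocuments}}, {{fromGridFSDocuments}}) are made up for illustration and aren't part of any driver API. Plain objects stand in for MongoDB documents, and the {{md5}} field and the chunk {{\_id}} are left out, since those would normally come from the {{filemd5}} command and the driver.

```javascript
// Illustrative only: plain objects stand in for MongoDB documents.
const DEFAULT_CHUNK_SIZE = 256 * 1024; // the 256 KB default named in the spec

// Build one files document and its chunks documents for the given bytes.
// "md5" and the chunk "_id" are intentionally omitted in this sketch.
function toGridFSDocuments(fileId, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: fileId,                                 // foreign key to the files document
      n: n,                                             // chunk number, starting with 0
      data: bytes.subarray(offset, offset + chunkSize), // this chunk's payload
    });
  }
  const file = {
    _id: fileId,
    length: bytes.length,  // size of the file in bytes
    chunkSize: chunkSize,  // size of each chunk (the last one may be shorter)
    uploadDate: new Date(),
  };
  return { file, chunks };
}

// Reassemble the payload: select by files_id, order by n, concatenate.
function fromGridFSDocuments(file, allChunks) {
  const ordered = allChunks
    .filter((c) => c.files_id === file._id)
    .sort((a, b) => a.n - b.n);
  const out = new Uint8Array(file.length);
  let offset = 0;
  for (const chunk of ordered) {
    out.set(chunk.data, offset);
    offset += chunk.data.length;
  }
  return out;
}

// Usage: 10 bytes with a (deliberately tiny) chunk size of 4.
const bytes = new Uint8Array(10).map((_, i) => i);
const { file, chunks } = toGridFSDocuments("demo-file", bytes, 4);
console.log(chunks.map((c) => c.data.length)); // chunk sizes: 4, 4, 2
```

The filter-then-sort step in {{fromGridFSDocuments}} is the in-memory equivalent of querying {{chunks}} by {{files_id}} and sorting on {{n}}, which is the access pattern the compound index is there to support.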