gridfs - space ever re-used?


jb

Jan 18, 2012, 12:00:28 PM
to mongodb-user
I'm seeing my data on disk grow continuously and it's looking like
gridfs is the culprit. This is the same on primary and secondaries:

SECONDARY> db.fs.chunks.dataSize()
18716239308
SECONDARY> db.fs.chunks.storageSize()
93987114400

reading:

http://www.mongodb.org/display/DOCS/Excessive+Disk+Space

indicates space is never freed back to the OS (which is fine) but is
instead re-used. My continuous increase in space seems to indicate
this isn't the case, and the dataSize() vs. storageSize() seems to
indicate this as well. I'm using GridFS in a way that involves lots of
writes, but the data is freed up quickly by a continuous delete
process, so storageSize() shouldn't be anywhere near as large as it is
above. The dataSize() looks about correct.
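For a sense of scale, the two numbers above can be turned into a ratio. This is a minimal sketch in plain JavaScript (it runs in the mongo shell or in Node); `storageOverhead` is a hypothetical helper, not a shell built-in. Keep in mind that storageSize() counts all space allocated to the collection, including space left behind by deleted documents, so some gap is expected; a gap of roughly 5x is what makes this stand out.

```javascript
// Hypothetical helper: compares the space allocated to a collection with
// the space its live documents actually occupy.
function storageOverhead(dataSize, storageSize) {
  return {
    ratio: storageSize / dataSize,       // allocated bytes per live byte
    unusedBytes: storageSize - dataSize  // allocated but not holding live data
  };
}

// Use the shell's print() if available, otherwise Node's console.log.
var log = typeof print === "function" ? print : console.log;

// Values reported above for fs.chunks on the secondary.
var o = storageOverhead(18716239308, 93987114400);
log("ratio: " + o.ratio.toFixed(2));                     // ratio: 5.02
log("unused GB: " + (o.unusedBytes / 1e9).toFixed(1));   // unused GB: 75.3
```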

Am I missing something?

Eliot Horowitz

Jan 18, 2012, 6:13:45 PM
to mongod...@googlegroups.com
It's hard to tell from a static sample whether space is being re-used.
If you add new files now, what happens?


Jeff Behl

Jan 19, 2012, 1:00:28 AM
to mongod...@googlegroups.com
Here's a graph of disk space over the last week:

[image: graph of disk space over the last week]

I recently added some metrics concerning the number of backup jobs in gridfs:

[image: graph of the number of backup jobs in GridFS]


So, at least in this time frame, the number of jobs has stayed consistent. Obviously this isn't definitive, as the presence of a job doesn't mean all of its files were correctly deleted, but I'm pretty sure they were; otherwise the disparity between db.fs.chunks.dataSize() and db.fs.chunks.storageSize() wouldn't be so big.

I'll add additional GridFS metrics for the fs.chunks and fs.files counts, as well as dataSize and storageSize. This should paint a definitive picture, though again, based on the disparity between dataSize and storageSize alone, I'd say space is definitely not being reused. Maybe it has something to do with the chunks all being the same size? I could try compacting...
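A minimal sketch of such a metrics collector in mongo shell JavaScript. `sampleGridFsStats` and `growthBetween` are hypothetical names; the shell calls they rely on (`count()`, `dataSize()`, `storageSize()`) are the same ones used elsewhere in this thread. The `db` handle is passed in explicitly so the functions can also be exercised against a stub.

```javascript
// Hypothetical monitoring helper: takes one snapshot of GridFS usage.
// In the mongo shell, pass the current database handle `db`.
function sampleGridFsStats(db) {
  return {
    at: new Date(),
    files: db.fs.files.count(),              // number of stored files
    chunks: db.fs.chunks.count(),            // number of chunk documents
    dataSize: db.fs.chunks.dataSize(),       // bytes held by live chunks
    storageSize: db.fs.chunks.storageSize()  // bytes allocated to fs.chunks
  };
}

// Hypothetical helper: compares two snapshots taken some time apart.
function growthBetween(earlier, later) {
  var dataGrowth = later.dataSize - earlier.dataSize;
  var storageGrowth = later.storageSize - earlier.storageSize;
  return {
    dataGrowth: dataGrowth,
    storageGrowth: storageGrowth,
    // Storage growing while live data does not grow suggests that
    // freed space is not being reused.
    suspicious: storageGrowth > 0 && dataGrowth <= 0
  };
}
```

In the shell, one could take a sample with `sampleGridFsStats(db)` on a schedule (for example from a cron-driven `mongo --eval` script that defines these functions) and compare consecutive samples with `growthBetween`.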

Thanks -  Jeff

Scott Hernandez

Jan 19, 2012, 8:48:30 AM
to mongod...@googlegroups.com
Are you deleting old (GridFS) files? If so, how and when?

Depending on the size of the deleted files and size of the new ones it is possible that the space freed internally cannot be used well, and more must be allocated. Are all your files the same size?
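To illustrate how mismatched sizes can defeat reuse, here is a deliberately simplified toy model in JavaScript. It is NOT MongoDB's allocator: it assumes a freed record can only be reused whole, first fit, by a new record that fits inside it, and it ignores padding, extents and free-list buckets. `simulateReuse` is a hypothetical name.

```javascript
// Toy model of record reuse, NOT MongoDB's real allocator: a freed record
// can only be reused whole, by a new record that fits inside it.
function simulateReuse(ops) {
  var holes = [];   // sizes of freed slots available for reuse
  var live = {};    // record id -> size of the slot it occupies
  var fileEnd = 0;  // total bytes ever appended to the data file
  ops.forEach(function (op) {
    if (op.type === "insert") {
      // First fit: find the first hole big enough for the new record.
      var i = 0;
      while (i < holes.length && holes[i] < op.size) i++;
      if (i < holes.length) {
        live[op.id] = holes[i];  // any leftover inside the hole is wasted
        holes.splice(i, 1);
      } else {
        live[op.id] = op.size;   // no hole fits: grow the file
        fileEnd += op.size;
      }
    } else {
      // "remove": removing an unknown id is a no-op, much like a remove()
      // whose query matches nothing.
      if (!(op.id in live)) return;
      holes.push(live[op.id]);   // the slot becomes a hole
      delete live[op.id];
    }
  });
  var holeBytes = holes.reduce(function (sum, h) { return sum + h; }, 0);
  return { fileEnd: fileEnd, holeBytes: holeBytes };
}
```

For example, inserting a 100-byte record, removing it, and then inserting a 120-byte record leaves `fileEnd` at 220 with a 100-byte hole, whereas an 80-byte record in that last step would have fit into the hole with no growth.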

jb

Jan 19, 2012, 11:51:34 AM
to mongodb-user
Correct, I am deleting old GridFS files. Each backup job consists of
a couple thousand files, which are generally a couple hundred KB each.
They're not all the same size, but pretty close. After one backup job
has completed, all the files from the old one are deleted within a few
minutes. In the meantime, other backup jobs are backing up files.

I'll get some metrics going on the fs.files and fs.chunks counts.
This should shed some light on things, but I'm guessing that what you
are saying is the case.

Scott Hernandez

Jan 19, 2012, 12:26:52 PM
to mongod...@googlegroups.com
Yeah, stats will help. We have also seen cases where people thought they
were deleting GridFS files but weren't really. The remove/delete will
not error if you use an invalid or wrong _id/query, for example.
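One way to guard against silent no-op deletes is to check counts around the remove. A minimal sketch in mongo shell JavaScript, assuming the default `fs` prefix and the standard GridFS layout where each chunk references its file through `files_id`; `removeGridFsFile` is a hypothetical helper, not part of the shell or any driver. It also assumes the counts run on the same connection as the removes, so that they observe the removes' effects.

```javascript
// Hypothetical helper: removes one GridFS file and verifies the removal,
// because remove() itself does not complain when its query matches nothing.
function removeGridFsFile(db, fileId) {
  // A common mistake is passing the _id as a string when it is stored as an
  // ObjectId (or vice versa); such a query silently matches nothing.
  if (db.fs.files.count({ _id: fileId }) === 0) {
    throw new Error("no fs.files document with _id " + fileId);
  }
  db.fs.files.remove({ _id: fileId });
  db.fs.chunks.remove({ files_id: fileId });  // chunks point at their file via files_id

  // Confirm that neither the file document nor any of its chunks remain.
  var leftover = db.fs.files.count({ _id: fileId }) +
                 db.fs.chunks.count({ files_id: fileId });
  if (leftover !== 0) {
    throw new Error(leftover + " documents still reference " + fileId);
  }
}
```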

Jeff Behl

Jan 20, 2012, 8:35:25 PM
to mongod...@googlegroups.com
From the monitoring I put in yesterday, it's looking pretty clear that space is not being re-used. The two graphs below are:

db.fs.chunks.dataSize() and db.fs.chunks.storageSize()

[image: graph of db.fs.chunks.dataSize() and db.fs.chunks.storageSize()]

and for count() on fs.chunks and fs.files

[image: graph of count() on fs.chunks and fs.files]

Unless someone has a better idea, I'm going to try compact/defrag to see if that has any effect. In any case, this is definitely going to present problems with my intended use of GridFS...



Eliot Horowitz

Jan 20, 2012, 9:49:20 PM
to mongod...@googlegroups.com
Can you describe the files you are storing a bit?
What's the size distribution for example?

Jeff Behl

Jan 21, 2012, 11:43:46 AM
to mongod...@googlegroups.com
I ran a compact on fs.chunks, which resulted in a large decrease in storageSize() (see attached graph). Space on disk is no longer increasing, so it seems it's something to do with fragmentation?

[image: graph of storageSize() showing the decrease after the compact]
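For anyone wanting to repeat this, a minimal mongo shell sketch that records storageSize() before and after compacting fs.chunks. `compactChunks` is a hypothetical helper; `{ compact: "fs.chunks" }` is the server's compact command. One assumption worth checking against the docs for your version: compact blocks other operations on the database while it runs, so on a replica set it would normally be done on a secondary, one member at a time.

```javascript
// Hypothetical helper: runs compact on fs.chunks and reports storageSize()
// before and after, so the effect can be graphed alongside other metrics.
function compactChunks(db) {
  var before = db.fs.chunks.storageSize();
  var res = db.runCommand({ compact: "fs.chunks" });
  if (!res.ok) {
    throw new Error("compact failed: " + res.errmsg);
  }
  return { before: before, after: db.fs.chunks.storageSize() };
}
```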

jb

Jan 21, 2012, 11:47:43 AM
to mongodb-user
Files are generally pretty small, ranging from a few KB to 300KB. If
it helps, I can take the time to get a more detailed breakdown.
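To see why sizes in that range produce chunk documents of many different sizes, here is a small JavaScript sketch of how a file is split into GridFS chunks. It assumes the 256KB default chunk size that drivers used at the time; `chunkLayout` is a hypothetical name. Every file below the chunk size becomes a single chunk of its own size, and larger files end with a partial chunk, so most chunk documents in this workload would vary in size.

```javascript
// Hypothetical helper: splits a file of `fileSize` bytes into GridFS chunk
// sizes. The 256KB default is an assumption about the drivers of the time.
function chunkLayout(fileSize, chunkSize) {
  chunkSize = chunkSize || 256 * 1024;
  var sizes = [];
  for (var offset = 0; offset < fileSize; offset += chunkSize) {
    // Every chunk is full-size except possibly the last one.
    sizes.push(Math.min(chunkSize, fileSize - offset));
  }
  return sizes;
}

chunkLayout(4 * 1024);    // [4096]: one small chunk
chunkLayout(300 * 1024);  // [262144, 45056]: one full chunk plus a 44KB tail
```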


Eliot Horowitz

Jan 21, 2012, 8:29:03 PM
to mongod...@googlegroups.com
It's probably fragmentation; not sure why it's so bad in this case.

jb

Jan 21, 2012, 11:26:15 PM
to mongodb-user
Seems like what I'm seeing is a known issue:

https://jira.mongodb.org/browse/SERVER-2958

I'm surprised more folks haven't noticed it.
I'll try running the 'cleanup' job (responsible for deletes) less
frequently, though this is just a shot in the dark.

Chase

Jan 24, 2012, 11:13:21 AM
to mongodb-user
Hi - I have posted my original fix to the 2.0 branch of the following
GitHub repository: https://github.com/cwolfinger/mongo. There is a
new option called --allocateFixedSizes which solves the fragmentation
issue. You sacrifice some memory, since it rounds up the memory size
to the next whole chunk, but it is definitely worth it.
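The trade-off described here can be sketched arithmetically. This is a hypothetical JavaScript illustration of fixed-size allocation in general, not code from the patch: the helper names and the 128KB `unit` are assumptions, since the exact sizes --allocateFixedSizes uses are not described in this thread. Rounding makes freed slots interchangeable, at the cost of the rounding waste.

```javascript
// Hypothetical illustration of fixed-size allocation: every record gets a
// slot rounded up to a multiple of `unit`, so freed slots are interchangeable.
function roundUp(size, unit) {
  return Math.ceil(size / unit) * unit;
}

// Bytes lost per record to the rounding (the "sacrificed" space).
function roundingWaste(size, unit) {
  return roundUp(size, unit) - size;
}

var unit = 128 * 1024;                        // assumed slot size
roundUp(100 * 1024, unit);                    // 131072
roundingWaste(100 * 1024, unit);              // 28672 bytes wasted
// A 100KB and a 120KB record both get a 128KB slot, so either can reuse
// the slot the other one freed.
roundUp(120 * 1024, unit) === roundUp(100 * 1024, unit);  // true
```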

jb

Jan 27, 2012, 11:31:30 AM
to mongodb-user
That's great, Chase, I will check it out. Thanks much. Hopefully this
will eventually make it into the mainline release.
