Auto-delete the whole collection after a certain amount of time


Vladimir Smirnov

Jun 17, 2014, 2:05:44 PM
to mongod...@googlegroups.com
Hello,

I have a tool that records JSON messages as they become available and stores them in MongoDB. The collections it creates are specific to the type of the messages and generally span a two-hour window (several collections can be active at the same time, and it is important to distinguish between them depending on the messages that come in).

I only want to store the messages for 4 days, so I set TTL on every collection and the documents expire correctly after 4 days. However, the collections remain in the DB and they are not ever going to be reused, so what I am left with is a ton of empty collections that have no documents and are essentially cluttering the database. In addition to that, the disk space reserved by these collections is never released, thus after a few days/weeks, I am all out of space and have to manually clean up.

I am wondering if there is a way to:
  1. Set collections to be deleted after a given amount of time
  2. Have a daily job that deletes empty collections
  3. Have MongoDB release disk space after the documents in a collection have expired
Any of the three solutions would solve the problems I am having; however I would much prefer to have the collections deleted after they are no longer in use.

I couldn't find anything in MongoDB documentation that would address that issue, so any help would be greatly appreciated.

Thanks so much in advance.

Victor Hooi

Jun 18, 2014, 12:07:22 AM
to mongod...@googlegroups.com
Hi Vladimir,

There is no way to TTL or automatically drop a collection in MongoDB.

The best way of automatically removing a collection after a set period would be a scheduled job outside of MongoDB - for example, through a system job (e.g. cron or Windows Task Scheduler) or a job within your application.

Regarding your disk space: when documents are deleted, the space they occupied is not returned to the filesystem. However, MongoDB can reuse that space, depending on the size of the new documents and the level of fragmentation. You can also use db.repairDatabase() to reclaim the free space in your MongoDB database and return it to the filesystem; however, this requires free disk space equal to the current on-disk size of the database.

Another way of approaching this problem might be to create databases based on dates, for example:
  • cats_20140101
  • cats_20140102
  • cats_20140103
  • cats_20140104
You could then have a script to automatically drop databases based on dates (e.g. older than 4 days ago), which would also return the disk space afterwards.
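A minimal sketch of the selection step for such a script, in Python. The function name `expired_databases`, the `cats_` prefix, and the 4-day retention are illustrative assumptions taken from the example names above, not part of any MongoDB API. It only decides which names are old enough to drop; the actual drop is left to whatever driver or shell you use.

```python
from datetime import date, datetime, timedelta


def expired_databases(names, today, prefix="cats_", keep_days=4):
    """Return the date-suffixed database names older than keep_days.

    Names are expected to look like <prefix>YYYYMMDD; anything else
    (admin, local, unrelated databases) is ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the date-based databases
        suffix = name[len(prefix):]
        if len(suffix) != 8 or not suffix.isdigit():
            continue  # suffix is not in YYYYMMDD form; leave it alone
        try:
            day = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["admin", "cats_20140101", "cats_20140102",
             "cats_20140103", "cats_20140104"]
    # With "today" set to 2014-01-06, only databases dated before
    # 2014-01-02 are selected.
    print(expired_databases(names, today=date(2014, 1, 6)))
```

With a recent PyMongo, the names could come from `MongoClient.list_database_names()` and each selected name could be passed to `MongoClient.drop_database(name)`; run the job from cron or a similar scheduler once a day.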

Regards,
Victor

s.molinari

Jun 18, 2014, 9:46:30 AM
to mongod...@googlegroups.com
Would it not be feasible to simply not start new collections and instead keep the data going to the same collection per message type? Your two-hour window could be a single document, depending on the amount of data of course.

Scott


r...@jsonar.com

Jun 18, 2014, 10:42:36 AM
to mongod...@googlegroups.com
Take a look at the aggregation framework jsonstudio.com offers. I think it addresses your issue.  

Vladimir Smirnov

Jun 18, 2014, 11:32:31 AM
to mongod...@googlegroups.com
Thanks for the reply Victor. 

I will look into scheduling a daily job that runs through all the collections in the database and deletes those for which .stats().count returns 0. That should do the trick for now.
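A rough sketch of what such a job could look like in Python. It assumes `db` behaves like a PyMongo `Database` object (`list_collection_names()`, `count_documents()`, `drop_collection()`); the function name `drop_empty_collections` is made up for illustration. It checks for the existence of at least one document rather than reading `.stats().count`, which serves the same purpose here.

```python
def drop_empty_collections(db):
    """Drop every non-system collection in db that holds no documents.

    `db` is assumed to expose the PyMongo Database interface.
    Returns the names of the collections that were dropped.
    """
    dropped = []
    for name in db.list_collection_names():
        if name.startswith("system."):
            continue  # never touch MongoDB's internal collections
        # limit=1 stops counting at the first document found, so this
        # is a cheap "is it empty?" check even on large collections.
        if db[name].count_documents({}, limit=1) == 0:
            db.drop_collection(name)
            dropped.append(name)
    return dropped
```

One caveat: a collection that was just created and has not received its first message yet is also empty, so a job like this should run at a time, or with an extra check, that avoids dropping collections still in use.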

Scott, unfortunately I cannot store my data in this manner due to the domain and the requirements I have got. It is important that the structure of the database & collections does not change.

Cheers,
Vladimir