Re: Deleting Data That Has Been Processed

Stephen Steneker

Oct 7, 2012, 7:58:04 PM
to mongod...@googlegroups.com

I am writing a batch job to aggregate some data using the aggregation framework. Since the output is potentially larger than the document size limit, I am using a $limit at the top of my pipeline to reduce the number of documents that are processed at a time. After the aggregation is complete, I save the result to another collection. Now I would like to remove all the records I have processed so far. How would I do this reliably, without having to worry about race conditions? Or is there a better way to go about what I am trying to do?


Hi Toby,

This question has also been discussed on StackOverflow.

As suggested in the comments there, the expected approach is to filter on a deterministic boundary, such as a date or _id value, rather than relying on $limit. Capture the boundary before processing starts, aggregate only the documents up to that boundary, then remove exactly the same range.
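
For example, a minimal sketch in the mongo shell. The events and event_totals collection names and the $group stage are hypothetical, and reading the output from a "result" field assumes a 2.2-era shell:

    // Capture the current high-water mark before processing starts.
    // ObjectIds increase roughly with insertion time, so the default
    // _id can serve as a stable boundary.
    var boundary = db.events.find({}, { _id: 1 })
                            .sort({ _id: -1 }).limit(1).next()._id;

    // Aggregate only the documents at or below the boundary.
    var out = db.events.aggregate([
        { $match: { _id: { $lte: boundary } } },
        { $group: { _id: "$type", total: { $sum: "$value" } } }
    ]);

    // Save the results, then remove exactly the documents that were
    // aggregated. Documents inserted after the boundary was captured
    // are untouched, so there is no race with concurrent writers.
    out.result.forEach(function (doc) { db.event_totals.save(doc); });
    db.events.remove({ _id: { $lte: boundary } });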

Cheers,
Stephen

viC

Oct 8, 2012, 6:54:34 AM
to mongod...@googlegroups.com
In case you don't have criteria to separate them, try this:

    1> Rename your collection. New writes will transparently recreate an empty collection under the original name.
    
    2> Do all your processing on the renamed collection, then drop it when you are done (see the sketch below).

*Warning: it doesn't work well in sharded environments*
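
A minimal sketch of that trick, with the same hypothetical collection names as above (renames within a single unsharded mongod are effectively atomic, and new inserts against the original name simply recreate it):

    // Atomically swap the live collection out from under the writers.
    db.events.renameCollection("events_processing");

    // Aggregate the frozen snapshot at leisure; new writes are landing
    // in a freshly recreated "events" collection.
    var out = db.events_processing.aggregate([
        { $group: { _id: "$type", total: { $sum: "$value" } } }
    ]);

    // Persist the output (2.2-era shell: results are in the "result"
    // field), then discard the snapshot.
    out.result.forEach(function (doc) { db.event_totals.save(doc); });
    db.events_processing.drop();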

On Thursday, 4 October 2012 18:52:37 UTC+5:30, Toby Ho wrote:
Hello all,

I am writing a batch job to aggregate some data using the aggregation framework. Since the output is potentially larger than the document size limit, I am using a $limit at the top of my pipeline to reduce the number of documents that are processed at a time. After the aggregation is complete, I save the result to another collection. Now I would like to remove all the records I have processed so far. How would I do this reliably, without having to worry about race conditions? Or is there a better way to go about what I am trying to do?

Thanks,
Toby