You can do this with a simple function in Java:
import com.mongodb.*;

Mongo connection = new Mongo("localhost", 27017);
DB db = connection.getDB("test");
DBCollection myColl = db.getCollection("test");
DBCollection myNewColl = db.getCollection("NEW");
myNewColl.drop();

// Match every document whose "time" is below the given timestamp.
BasicDBObject query1 = new BasicDBObject("time", new BasicDBObject("$lt", 1324039518));
DBCursor cursor = myColl.find(query1);
System.out.println(cursor.count());

// Pull each matching document to the client and save it into "NEW".
while (cursor.hasNext()) {
    DBObject current = cursor.next();
    myNewColl.save(current);
}
In the code above, each document matching the query is pulled from
the server and inserted into a new collection ("NEW"). This obviously
creates a lot of extra network traffic, so it is natural to want to
perform the operation on the server instead. That can be done with a
very simple MapReduce. (In fact, because the data is not being
aggregated in any way, the reduce function is never actually used.)
myNewColl = db.getCollection("NEW_2");
myNewColl.drop();

// The map function simply re-emits each document's fields; the reduce
// function is a pass-through, since nothing is being aggregated.
String m = "function() { emit(this._id, {humans: this.humans, time: this.time, date: this.date}); }";
String r = "function(key, values) { return values; }";
String output = "NEW_2";
BasicDBObject query1 = new BasicDBObject("time", new BasicDBObject("$lt", 1324039518));
myColl.mapReduce(m, r, output, query1);
Unfortunately, due to the nature of MapReduce, the output collection
must be in the form of {"_id":some_id, "value":some_value}. The above
code will create a collection with documents that look like:
{ "_id" : 51, "value" : { "humans" : 9, "time" : 1324039008, "date" : ISODate("2011-12-16T12:36:48Z") } }
The Mongo Documentation on Map Reduce may be found here:
http://www.mongodb.org/display/DOCS/MapReduce
Additionally, here is a link to a MongoDB cookbook recipe for a Map
Reduce operation. The "Extras" section goes into more detail on how
Map Reduce works, and why the output has to be in this format.
http://cookbook.mongodb.org/patterns/finding_max_and_min
If your collection is not sharded, you can use the db.eval() command
to execute the JavaScript on the server:
myNewColl = db.getCollection("NEW_3");
myNewColl.drop();

// Run the entire find-and-copy loop as JavaScript on the server.
db.eval("function() { db.test.find({time: {$lt: 1325346084}}).forEach(function(t) { db.NEW_3.save(t); }); }");
This will create an output collection of the documents that match your
query, and all of the operations will be done on the server, reducing
network traffic. The only downside is that db.eval() cannot be used
with a sharded collection.
The Mongo Documentation on db.eval() is here:
http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server-sideCodeExecution-Using{{db.eval%28%29}}
Hopefully I understood your question correctly. Is this what you
wanted to do, or did you want to do some different operation on your
data?
Do you wish to calculate total_humans for documents whose "time" is
below the value that you give, and separately for those at or above
it?
The following map function emits the year, the time, and the humans
for each document in the collection.
The reduce function then sums the humans whose "time" falls before the
specified value and, separately, those whose "time" falls at or after
it. In each bucket, the greatest "time" seen is returned alongside the
"total_humans" value.
I am not sure this is exactly what you are looking for, but hopefully
it will be able to point you in the right direction, and at least get
you one step closer to achieving your goal.
Here is the example data I created, stored in the collection
"humans":
> db.humans.save({ "_id" : 1, "humans" : 111, "time" : 1324052258, "date" : ISODate("2011-12-16T16:17:38Z") });
> db.humans.save({ "_id" : 2, "humans" : 112, "time" : 1324052238, "date" : ISODate("2011-12-16T16:17:18Z") });
> db.humans.save({ "_id" : 3, "humans" : 9, "time" : 1324052208, "date" : ISODate("2011-12-16T16:16:48Z") });
> db.humans.save({ "_id" : 4, "humans" : 22, "time" : 1324052168, "date" : ISODate("2011-12-16T16:16:08Z") });
> db.humans.save({ "_id" : 5, "humans" : 21, "time" : 1324052258, "date" : ISODate("2010-01-23T23:43:10Z") });
> db.humans.save({ "_id" : 6, "humans" : 32, "time" : 1324052238, "date" : ISODate("2010-01-23T23:26:10Z") });
> db.humans.save({ "_id" : 7, "humans" : 61, "time" : 1324052208, "date" : ISODate("2010-01-23T23:09:00Z") });
> db.humans.save({ "_id" : 8, "humans" : 36, "time" : 1324052168, "date" : ISODate("2010-01-23T22:51:40Z") });
I have chosen time = 1324052238 as the input value.
map = function () {
    var year = this.date.getFullYear();
    emit(year, {"total_humans": this.humans, "time": this.time});
}
reduce = function (key, values) {
    print("Reducing " + key);  // debug output, visible in the server log
    var population = [];
    var total_humans_before = 0, max_time_before = 0;
    var total_humans_after = 0, max_time_after = 0;
    for (var i in values) {
        if (values[i].time < 1324052238) {
            // "time" is below the chosen value: add to the "before" bucket.
            total_humans_before += values[i].total_humans;
            if (values[i].time > max_time_before) {
                max_time_before = values[i].time;
            }
        } else {
            // "time" is at or above the chosen value: add to the "after" bucket.
            total_humans_after += values[i].total_humans;
            if (values[i].time > max_time_after) {
                max_time_after = values[i].time;
            }
        }
    }
    population.push({"total_humans": total_humans_before, "time": max_time_before});
    population.push({"total_humans": total_humans_after, "time": max_time_after});
    return {"population": population};
}
> result = db.runCommand({"mapreduce" : "humans","map" : map,"reduce" : reduce,"out" : "humans_output"})
> db.humans_output.find().pretty()
{
"_id" : 2010,
"value" : {
"population" : [
{
"total_humans" : 97,
"time" : 1324052208
},
{
"total_humans" : 53,
"time" : 1324052258
}
]
}
}
{
"_id" : 2011,
"value" : {
"population" : [
{
"total_humans" : 31,
"time" : 1324052208
},
{
"total_humans" : 223,
"time" : 1324052258
}
]
}
}
>
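As a sanity check on the totals above, the same before/after bucketing can be reproduced outside of MongoDB in plain Java. This is a standalone sketch using the example documents directly (no driver involved); the class and method names are mine, not part of any API:

```java
import java.util.*;

public class BucketCheck {
    static final long THRESHOLD = 1324052238L;

    // Mirrors the reduce function: split values into a "before" bucket
    // (time < THRESHOLD) and an "after" bucket (time >= THRESHOLD),
    // summing humans and tracking the greatest time in each.
    static long[] bucket(long[][] docs) {
        long beforeHumans = 0, beforeMaxTime = 0;
        long afterHumans = 0, afterMaxTime = 0;
        for (long[] d : docs) {            // d[0] = humans, d[1] = time
            if (d[1] < THRESHOLD) {
                beforeHumans += d[0];
                beforeMaxTime = Math.max(beforeMaxTime, d[1]);
            } else {
                afterHumans += d[0];
                afterMaxTime = Math.max(afterMaxTime, d[1]);
            }
        }
        return new long[]{beforeHumans, beforeMaxTime, afterHumans, afterMaxTime};
    }

    public static void main(String[] args) {
        // The four 2010 documents (_id 5-8): {humans, time}
        long[][] y2010 = {{21, 1324052258L}, {32, 1324052238L},
                          {61, 1324052208L}, {36, 1324052168L}};
        // The four 2011 documents (_id 1-4): {humans, time}
        long[][] y2011 = {{111, 1324052258L}, {112, 1324052238L},
                          {9, 1324052208L}, {22, 1324052168L}};
        System.out.println(Arrays.toString(bucket(y2010)));
        // -> [97, 1324052208, 53, 1324052258]
        System.out.println(Arrays.toString(bucket(y2011)));
        // -> [31, 1324052208, 223, 1324052258]
    }
}
```

The four printed numbers per year match the "total_humans" and "time" pairs in the MapReduce output above.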
Now on to your second question: is it faster to perform the operation
on the server via MapReduce, or from the client?
In a nutshell, there is a trade-off. Doing the operation on the
server via MapReduce reduces the amount of traffic on your network,
but it adds load to your Mongo server. On the other hand, performing
the operation in the client takes that processing load off the Mongo
server, but adds network traffic: each matching document has to be
sent from the server to the client, and each document in the output
collection has to be sent back to the server.
Additionally, if you would like your output collection to be of a form
other than {"_id" : some_id, "value" : some_value}, then you probably
want to do the operation in your client.
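For instance, here is a minimal sketch of the client-side reshaping step: hoisting the fields of the nested "value" sub-document up to the top level. It is plain Java with ordinary Maps standing in for DBObjects, and the class and method names are hypothetical, not part of the driver:

```java
import java.util.*;

public class Flatten {
    // Take a MapReduce result document of the form {"_id": ..., "value": {...}}
    // and produce a top-level document in whatever custom shape is wanted.
    static Map<String, Object> flatten(Map<String, Object> mrDoc) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("_id", mrDoc.get("_id"));
        @SuppressWarnings("unchecked")
        Map<String, Object> value = (Map<String, Object>) mrDoc.get("value");
        out.putAll(value);                 // e.g. humans, time, date
        return out;
    }

    public static void main(String[] args) {
        // Build a document shaped like the MapReduce output shown earlier.
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("humans", 9);
        value.put("time", 1324039008L);
        Map<String, Object> mrDoc = new LinkedHashMap<>();
        mrDoc.put("_id", 51);
        mrDoc.put("value", value);
        System.out.println(flatten(mrDoc));
        // -> {_id=51, humans=9, time=1324039008}
    }
}
```

In driver terms, the same reshaping would be applied to each document read from the MapReduce output collection before saving it into the final collection.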
If you have any follow-up questions concerning Map Reduce, the Mongo
Community is here to help. Good luck!