How to update a field with the value of another field (by using map/reduce) in java - using mongodb 1.8

Martin Grotzke

Mar 17, 2011, 5:27:48 AM
to mongod...@googlegroups.com
Hi,

-- first of all: congrats to the mongodb team on the 1.8 release, great job! --

Now my question: I want to set the value of field1 (e.g. oldVersion)
to the value of field2 (e.g. newVersion) using a server-side update,
so that I don't have to pull all docs, modify them and save them back.

With mongodb 1.6 I used a solution using map/reduce:
https://gist.github.com/865065

Right now I'm trying to get this running with mongodb 1.8, but as the
finalize function can no longer access the db this solution doesn't
work anymore.
I also played with output options "reduce" and "merge" but didn't get
it to work.

Is it possible to achieve what I want with 1.8 and map/reduce?
Or would you suggest choosing another way?

Thanx && cheers,
Martin

Nat

Mar 17, 2011, 9:22:19 AM
to mongodb-user
You aren't supposed to perform database operations inside M/R
functions. For the merge and reduce output options, see a nice blog
article from Kyle at http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/

Gaetan Voyer-Perrault

Mar 17, 2011, 12:14:13 PM
to mongod...@googlegroups.com
> Now my question: I want to set the value of field1 (e.g. oldVersion)
> to the value of field2 (e.g. newVersion) using a server-side update,
> so that I don't have to pull all docs, modify them and save them back.

There is a whole page in the docs on server-side JavaScript:
http://www.mongodb.org/display/DOCS/Server-side+Code+Execution

Is there something missing from those docs?

Also, how much data needs to be processed here?
How do you plan to track progress?
How would you recover from failures during the process?

On the one hand, I can understand that you don't want to send a bunch of data across the network just to update it. On the other hand, if you're updating millions of documents, the process may take a while, and doing it server-side means that you don't get any progress information.

- Gates


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Martin Grotzke

Mar 17, 2011, 12:30:11 PM
to mongod...@googlegroups.com, Nat
On Thu, Mar 17, 2011 at 2:22 PM, Nat <nat....@gmail.com> wrote:
> You aren't supposed to perform database operations inside M/R
> functions. For merge, reduce option, see a nice blog article from Kyle
> at http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/

Great, I just read this, tried again, and in the mongo shell it's working
fine with this:

> m = function() { if (this.processedVersion < this.newVersion) { emit(this._id, this); } }
> r = function(key, values) { values[0].processedVersion = values[0].newVersion; return values[0]; }
> db.things.mapReduce(m, r, { out : { reduce : "things" } });

Thanx && cheers,
Martin


--
Martin Grotzke
http://www.javakaffee.de/blog/

Martin Grotzke

Mar 17, 2011, 12:50:16 PM
to mongod...@googlegroups.com
On Thu, Mar 17, 2011 at 5:14 PM, Gaetan Voyer-Perrault <ga...@10gen.com> wrote:
>> Now my question: I want to set the value of field1 (e.g. oldVersion)
>> to the value of field2 (e.g. newVersion) using a server-side update,
>> so that I don't have to pull all docs, modify them and save them back.
> There is a whole page in the docs on server-side javascript.
> http://www.mongodb.org/display/DOCS/Server-side+Code+Execution
> Is there something missing from those docs?
No, I think the docs are fine.

> Likewise, how much data needs to be processed here?

I expect somewhere between 1,000 and 10,000 objects, but not more than
100,000 or 200,000, to be processed, as the map/reduce will get a query
that narrows it down to something like this.

> How do you plan to track progress?

I thought it would be possible to start this and wait until it's
finished. Is anything special necessary from Java, or is it enough to use
DBCollection.mapReduce(DBObject)?

> Recover from any failures during the process?

When my call is blocking I should get an error when anything goes wrong.

> On the one hand, I can understand that you don't want to send a bunch of
> data across the network just to update it. On the other hand, if you're
> updating millions of documents, the process may take a while, and
> doing it server-side means that you don't get any progress
> information.

I don't need progress info; it's done by a server that processes some
data. It's enough to see that it has finished at some point.

I wanted to compare both strategies performance-wise so that I can
choose one of them.

Cheers,
Martin

--
Martin Grotzke
http://www.javakaffee.de/blog/

Martin Grotzke

Mar 18, 2011, 11:47:18 AM
to mongod...@googlegroups.com
On Thu, Mar 17, 2011 at 5:30 PM, Martin Grotzke
<martin....@googlemail.com> wrote:
> On Thu, Mar 17, 2011 at 2:22 PM, Nat <nat....@gmail.com> wrote:
>> You aren't supposed to perform database operations inside M/R
>> functions. For merge, reduce option, see a nice blog article from Kyle
>> at http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
>
> Great, just this, tried again, and on the mongo shell it's working
> fine with this:
>
>> m = function() { if (this.processedVersion < this.newVersion) { emit(this._id, this); } }
>> r = function(key, values) { values[0].processedVersion = values[0].newVersion; return values[0]; }
>> db.things.mapReduce(m, r, { out : { reduce : "things" } });

I just wanted to pull this back to java and now when testing this
again I realized that it doesn't work, neither with merge nor with
reduce.

In detail:
> db.things.save( { processedVersion : 1, newVersion : 2 } );
> db.things.save( { processedVersion : 2, newVersion : 2 } );

> m = function() { if (this.processedVersion < this.newVersion) { emit(this._id, this); } }
> r = function(key, values) { values[0].processedVersion = values[0].newVersion; return values[0]; }

With merge:
> db.things.mapReduce(m, r, { out : { merge : "things" } });
> db.things.find();
{ "_id" : ObjectId("4d837bbf2b0ff30089e70d79"), "value" : { "_id" :
ObjectId("4d837bbf2b0ff30089e70d79"), "processedVersion" : 1,
"newVersion" : 2 } }
{ "_id" : ObjectId("4d837bc32b0ff30089e70d7a"), "processedVersion" :
2, "newVersion" : 2 }

With reduce (first create another thing with processedVersion=1):
> db.things.save( { processedVersion : 1, newVersion : 2 } );


> db.things.mapReduce(m, r, { out : { reduce : "things" } });

> db.things.find();
{ "_id" : ObjectId("4d837bbf2b0ff30089e70d79"), "value" : { "_id" :
ObjectId("4d837bbf2b0ff30089e70d79"), "processedVersion" : 1,
"newVersion" : 2 } }
{ "_id" : ObjectId("4d837bc32b0ff30089e70d7a"), "processedVersion" :
2, "newVersion" : 2 }
{ "_id" : ObjectId("4d837c8f2b0ff30089e70d7b"), "value" : { "_id" :
ObjectId("4d837c8f2b0ff30089e70d7b"), "processedVersion" : 2,
"newVersion" : 2 } }

What I want to have is something like this:
{ "_id" : ObjectId("4d837bc32b0ff30089e70d7a"), "processedVersion" :
2, "newVersion" : 2 }
{ "_id" : ObjectId("4d837c8f2b0ff30089e70d7b"), "processedVersion" :
2, "newVersion" : 2 }

Is it possible at all to create this result via map/reduce?

Cheers,
Martin

Gaetan Voyer-Perrault

Mar 18, 2011, 1:22:51 PM
to mongod...@googlegroups.com
Let's break this down quickly:
 map => outputs { _id, object }
 reduce => accepts an array of objects for a single key
    outputs the processed version of the first object in the array

Does that sound correct?

The output is exactly what I would expect.
But you're doing three very weird things:
 - emitting the whole object
 - only using one value in the array
 - applying the map-reduce to the original collection
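
To see why that output is expected, here is a minimal sketch in plain JavaScript, not MongoDB code. It assumes two documented behaviours: output documents have the shape { _id: key, value: ... }, and reduce is not called for a key that has only one emitted value. The helper name mapReduceMerge and the emit-as-argument convention are made up for this illustration (in the shell, emit is a global).

```javascript
// In-memory model of mapReduce with { out: { merge: <same collection> } }.
// mapReduceMerge is a hypothetical helper, not a MongoDB API.
function mapReduceMerge(docs, map, reduce) {
  const emitted = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));

  const byId = new Map(docs.map((doc) => [doc._id, doc]));
  for (const [key, values] of emitted) {
    // Assumption stated above: reduce is skipped for single-value keys.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    // The result document replaces any existing document with the same _id.
    byId.set(key, { _id: key, value: value });
  }
  return [...byId.values()];
}

const things = [
  { _id: "a", processedVersion: 1, newVersion: 2 },
  { _id: "b", processedVersion: 2, newVersion: 2 },
];

const m = function (emit) {
  if (this.processedVersion < this.newVersion) emit(this._id, this);
};
const r = function (key, values) {
  values[0].processedVersion = values[0].newVersion;
  return values[0];
};

const merged = mapReduceMerge(things, m, r);
console.log(JSON.stringify(merged, null, 2));
```

Because every _id is unique, each key has exactly one value, so r never runs and the stale document is simply wrapped in a "value" field, which matches the merge output posted above.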

Can you clarify what you're trying to do?

- Gates



Martin Grotzke

Mar 18, 2011, 3:10:45 PM
to mongod...@googlegroups.com
On Fri, Mar 18, 2011 at 6:22 PM, Gaetan Voyer-Perrault <ga...@10gen.com> wrote:
> Let's break this down quickly:
>  map => outputs { _id, object }
>  reduce => accepts an array of objects for a single key
>     outputs the processed version of the first object in the array
> Does that sound correct?
> The output is exactly what I would expect.
> But you're doing three very weird things:
>  - emitting the whole object
>  - only using one value in the array
>  - applying the map-reduce to the original collection
> Can you clarify what you're trying to do?
I want to set processedVersion = newVersion for each item where
processedVersion < newVersion.

Cheers,
Martin

--
Martin Grotzke
http://www.javakaffee.de/blog/

Gaetan Voyer-Perrault

Mar 18, 2011, 3:45:27 PM
to mongod...@googlegroups.com
So the typical approach to such a problem is as follows:
 1. Do a find for all documents that meet the criteria
 2. Cursor through these documents and update each one

So:
#1: db.things.find( { '$where' : "this.processedVersion < this.newVersion" } )
#2: 
.forEach ( function(doc) { 
  doc.processedVersion = doc.newVersion;
  db.things.save(doc);
});

You should be able to do this from the shell or the equivalent with the driver of your choice.
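
As a rough illustration of the loop above, and of the earlier questions about progress and recovery, here is a plain JavaScript model in which an array stands in for the collection. The function updateStale and its onProgress callback are hypothetical names for this sketch; no MongoDB API is used.

```javascript
// Find-then-update modelled on an in-memory array (not MongoDB code).
function updateStale(docs, onProgress) {
  // Step 1: select the documents that still need the update.
  const stale = docs.filter((doc) => doc.processedVersion < doc.newVersion);
  let done = 0;
  // Step 2: walk through them and update each one.
  for (const doc of stale) {
    doc.processedVersion = doc.newVersion;
    done += 1;
    if (onProgress) onProgress(done, stale.length);
  }
  return done;
}

const collection = [
  { _id: 1, processedVersion: 1, newVersion: 2 },
  { _id: 2, processedVersion: 2, newVersion: 2 },
  { _id: 3, processedVersion: 0, newVersion: 5 },
];

const firstRun = updateStale(collection, (done, total) =>
  console.log("updated " + done + " of " + total)
);
// Running it again selects nothing, because updated documents
// no longer satisfy processedVersion < newVersion.
const secondRun = updateStale(collection);
console.log(firstRun, secondRun);
```

The second run touching nothing is what makes a client-side loop easy to restart after a failure: the selection condition itself records which documents are already done.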

The intent of map-reduce is to aggregate existing data into a new collection. It is not intended to perform maintenance tasks such as the one you're describing.

- Gates

Martin Grotzke

Mar 18, 2011, 5:31:19 PM
to mongod...@googlegroups.com

As I already wrote, I want to compare the two ways of updating the field: client-side and server-side.
Is it possible to do this in mongodb directly? E.g. should I use eval with nolock?
There are multiple threads working on different parts of the same collection (query on a field "clientId") so they should not lock the collection.

Thanx && cheers,
Martin


Gaetan Voyer-Perrault

Mar 18, 2011, 8:17:40 PM
to mongod...@googlegroups.com
> As I already wrote I want to compare both/two ways of updating the field: client-side and server-side.

I just provided you with client side code.

Have you tried putting that into server-side code and then running with a non-blocking db.eval?

Martin Grotzke

Mar 19, 2011, 5:13:22 PM
to mongod...@googlegroups.com
On Sat, Mar 19, 2011 at 1:17 AM, Gaetan Voyer-Perrault <ga...@10gen.com> wrote:
>> As I already wrote I want to compare both/two ways of updating the field:
>> client-side and server-side.
> I just provided you with client side code.
Thanx for this!

> Have you tried putting that into server-side code and then running with a
> non-blocking db.eval?

Yes, I have turned it into an eval invoked via java:
https://gist.github.com/865065
This is the relevant code:

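// Imports needed at the top of the enclosing class file:
import com.mongodb.BasicDBObjectBuilder;
import com.mongodb.CommandResult;
import com.mongodb.DBCollection;
import com.mongodb.MongoException;

// Runs the find/forEach update server-side via $eval with nolock and
// returns the number of documents that were updated.
private static int updateCurrentVersionViaNewNonBlockingEval( final DBCollection things )
        throws MongoException {
    final String code = "var i = 0;" +
        "db.things.find({ '$where' : 'this.currentVersion < this.newVersion' })" +
        ".forEach( function(doc) {" +
        " doc.currentVersion = doc.newVersion;" +
        " db.things.save(doc);" +
        " i += 1; } );" +
        "return i;";
    final CommandResult commandResult = things.getDB().command(
        BasicDBObjectBuilder.start()
            .add( "$eval" , code )
            .add( "nolock" , Boolean.TRUE )
            .get() );
    commandResult.throwOnError();
    return commandResult.getInt( "retval" );
}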

Two questions regarding this:
1) If the find were find({ clientId : 42, '$where' :
'this.currentVersion < this.newVersion' }) and there were an
index on clientId, would this index also be used by the find
inside the eval? I'd expect this to be the case but just want to be
sure.

2) Should it be faster to do
  db.things.update( { _id : doc._id }, { $set : { currentVersion : doc.newVersion } } );
instead of
  doc.currentVersion = doc.newVersion;
  db.things.save(doc);
?
A very simple test on my local machine seems to indicate that the
latter gets slower the more fields the objects have, and the in-place
updates seem to be always faster.

Just because I'm curious: is it possible at all to do this
update-field-by-another-field with map/reduce in 1.8?

Cheers,
Martin

Gaetan Voyer-Perrault

Mar 22, 2011, 7:25:11 PM
to mongod...@googlegroups.com
> 2) Should it be faster to do
> db.things.update( { _id : doc._id }, { $set : { currentVersion :
> doc.newVersion } } );
> instead of
> doc.currentVersion = doc.newVersion;
> db.things.save(doc);
> ?

An update with $set will generally be faster than save(): there is
less data on the wire and less logic. save() actually uses update under the hood.
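
A small sketch of the "less data on the wire" point, assuming plain objects as stand-ins for the two request shapes and JSON length as a rough proxy for BSON size. The helper names (buildSavePayload, buildSetPayload, makeDoc) are invented for this illustration and no driver is involved.

```javascript
// Plain objects standing in for the two request shapes; not MongoDB code.
function buildSavePayload(doc) {
  // save() sends the whole document back.
  return { ...doc, currentVersion: doc.newVersion };
}

function buildSetPayload(doc) {
  // An update with $set sends only the selector and the changed field.
  return {
    query: { _id: doc._id },
    update: { $set: { currentVersion: doc.newVersion } },
  };
}

// Assumption of this sketch: JSON length approximates the payload size.
const size = (payload) => JSON.stringify(payload).length;

// Builds a document with the two version fields plus some filler fields.
function makeDoc(extraFields) {
  const doc = { _id: 1, currentVersion: 1, newVersion: 2 };
  for (let i = 0; i < extraFields; i++) {
    doc["field" + i] = "x".repeat(50);
  }
  return doc;
}

const small = makeDoc(0);
const wide = makeDoc(20);

console.log("save:", size(buildSavePayload(small)), "->", size(buildSavePayload(wide)));
console.log("$set:", size(buildSetPayload(small)), "->", size(buildSetPayload(wide)));
```

The $set payload stays the same size however many fields the document has, while the save payload grows with the document, which fits the observation that save gets slower as objects gain fields.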

> 1) If the find would be find({ clientId : 42, '$where' :
> 'this.currentVersion < this.newVersion' })
> ...

In your example the find *will* leverage the index on clientId. The $where clause will only be applied to those objects where clientId was 42.
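
The narrowing described here can be pictured with a toy model in plain JavaScript. This is only an illustration of "index first, then $where on the remainder", not how MongoDB's query engine is implemented; buildIndex and findIndexedThenWhere are hypothetical names.

```javascript
// Toy index: maps a field value to the documents that carry it.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// Looks up candidates via the index, then runs the $where-style predicate
// only on those candidates and counts how often it was evaluated.
function findIndexedThenWhere(index, value, where) {
  const candidates = index.get(value) || [];
  let evaluated = 0;
  const matches = candidates.filter((doc) => {
    evaluated += 1;
    return where.call(doc);
  });
  return { matches, evaluated };
}

const docs = [
  { _id: 1, clientId: 42, currentVersion: 1, newVersion: 2 },
  { _id: 2, clientId: 42, currentVersion: 2, newVersion: 2 },
  { _id: 3, clientId: 7, currentVersion: 1, newVersion: 3 },
  { _id: 4, clientId: 7, currentVersion: 3, newVersion: 3 },
];

const byClientId = buildIndex(docs, "clientId");
const result = findIndexedThenWhere(byClientId, 42, function () {
  return this.currentVersion < this.newVersion;
});
console.log(result.evaluated, result.matches.map((doc) => doc._id));
```

In this model the predicate runs only for the two documents with clientId 42; the documents of other clients are never passed to it.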

> Just as I'm curious: is it possible at all to do this
> update-field-by-another-field with map/reduce with 1.8?

You _may_ be able to leverage the new merge option on map-reduce.
However, for this to work you need a very specific data layout which is not very practical.

- Gates
