Running stored functions from Java API


HG

Jan 13, 2011, 8:32:41 AM
to mongodb-user
Hi!

I have a problem where N servers are all asking for string mappings at
the same time. The mapping is from an arbitrary unique string to an
integer, and the integers start from 0. So basically I'm thinking of a
centralized "hashmap" where the key is the string and the value is an
integer. Each time we see a new string, we give it a new int (max+1).
To do this in MongoDB, I thought I would save the current max int under
one special key.

Now, I cannot do this using the Java API's put and get directly,
because multiple threads can ask for the same max int before the
previous one has updated it, and then I would not get a unique mapping.
How should I do it? I know that I can create a stored function on the
server that gets and increments the max int and stores it with the new
string. And AFAIK, this should block the writes, so the mapping should
be unique. However, it's not clear from the tutorials how to call this
from the Java API. And is this the right way to do it?

Thanks for all tips!

Nat

Jan 13, 2011, 9:58:01 AM
to mongodb-user

Keith Branton

Jan 13, 2011, 10:22:18 AM
to mongod...@googlegroups.com
The findAndModify command is able to atomically change a document and return its new or old value in a single, safe operation.


In the Java driver the method is on DBCollection.

If you're not doing this very often, then one call to get and increment the number followed by an insert/update to store the number in your "hashmap" would probably work.

Keith.




HG

Jan 14, 2011, 3:37:34 AM
to mongod...@googlegroups.com
Hello!

Thanks for the answer. However, I think I was not clear enough on what
I need. I'll try again now that I know also a little more.

So, I have 100 nodes that are going through data with some strings in
it. Each of these strings needs to be mapped to a unique int; however,
the problem is that there are 100 nodes doing this concurrently.

So, here is some pseudo code in the java client in node N:
String S = read.line();
Integer i = MongoDB.getInt(S);
if (i == null) { // i.e. the string S was not in the DB, so let's insert it
    int nexti = MongoDB.getInt("Current Max Value");
    i = nexti + 1;
    MongoDB.setInt2S(S, i);
    MongoDB.setInt2S("Current Max Value", i);
}

So, the worry is that with 100 nodes doing this at the same time (and
in the worst case as fast as possible), different nodes could get the
same int mapped to different strings. Most of the calls should hit
existing values and be just plain gets, and threading and non-blocking
access is great for that.
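
The interleaving behind this worry can be replayed by hand. The following is only a sketch in plain Java: a HashMap stands in for the database, and the class name and the keys "stringA" and "stringB" are made up for illustration. No MongoDB call is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-replayed interleaving of two nodes. A plain HashMap stands in for the
// database, so this only illustrates the ordering problem, not MongoDB itself.
class RaceReplay {
    static final String MAX_KEY = "Current Max Value";

    static Map<String, Integer> replay() {
        Map<String, Integer> db = new HashMap<>();
        db.put(MAX_KEY, 0);

        // Both nodes read the max before either one writes it back.
        int maxSeenByNodeA = db.get(MAX_KEY);
        int maxSeenByNodeB = db.get(MAX_KEY);

        // Node A stores its new string and the new max.
        db.put("stringA", maxSeenByNodeA + 1);
        db.put(MAX_KEY, maxSeenByNodeA + 1);

        // Node B does the same, based on its stale read.
        db.put("stringB", maxSeenByNodeB + 1);
        db.put(MAX_KEY, maxSeenByNodeB + 1);
        return db;
    }

    public static void main(String[] args) {
        Map<String, Integer> db = replay();
        // Both strings end up mapped to the same int.
        System.out.println(db.get("stringA") + " " + db.get("stringB"));
    }
}
```

Because the read of the max and the two writes are separate steps, nothing stops the second node from reading before the first has written.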

So, if I now used $inc or findAndModify to update the "Current Max
Value", I'd still have the risk that different nodes asking for the
mapping of the same string would end up with different ints.

So, what I was thinking is to have a function like this:
function newMapping(id) {
    var max = db.map.findOne({"_id" : "Current Max Value"});
    max.value++;
    db.map.save(max);
    db.map.save({"_id" : id, "value" : max.value});
    return max.value;
}

And now, my thinking is that if I run this with eval, this will block
the DB until the function has run. And therefore I'm safely
incrementing the max value and storing the string with that value
before anybody else can ask for either one.

So, 2 questions:
- am I on the right track? Or did I misunderstand what you were saying?
- how do I call this function from Java?


Thanks again!

--
HG.

Nat

Jan 14, 2011, 3:43:02 AM
to mongodb-user
Did you read the link? The $inc operator with findAndModify allows you
to update and get the document in one call, so it's atomic.
Check out http://api.mongodb.org/java/current/com/mongodb/DBCollection.html

HG

Jan 14, 2011, 8:23:12 AM
to mongod...@googlegroups.com
Hi Nat!

Sorry, I'm just starting with MongoDB and I guess I don't see how I
could do this with $inc and findAndModify. Please bear with me. The
Java API docs don't have much explanation anywhere, so maybe I don't
understand these correctly. This, for instance, is how I understood
that I would write a simple put that either stores a new value under
the key or, if the key already exists, replaces the value:
public boolean put(String key, String value) {
    BasicDBObject row = new BasicDBObject();
    row.put("_id", key);
    DBObject old = coll.findOne(row);
    if (old == null) {
        BasicDBObject doc = new BasicDBObject();
        doc.put("_id", key);
        doc.put("value", value);
        coll.insert(doc);
        return true;
    } else {
        old.put("value", value);
        coll.findAndModify(row, old);
        return true;
    }
}

Have I understood findAndModify incorrectly?

However, that is not directly what I'm looking for. From the link that
you gave, the case "Insert if Not Present" is close to what I wanted
to do. They use neither $inc nor findAndModify, but they do create a
function for that. So, Nat, can you show me an example of how to do
what I want with $inc and findAndModify?

Thanks in advance!


--
HG.

Nat

Jan 14, 2011, 9:11:58 AM
to mongodb-user
Take a look at a sample in javascript at
https://github.com/mongodb/mongo/blob/master/jstests/find_and_modify4.js

Basically findAndModify will return the document, from which you can
retrieve the value:
http://api.mongodb.org/java/2.5-pre-/com/mongodb/DBCollection.html#findAndModify(com.mongodb.DBObject, com.mongodb.DBObject, com.mongodb.DBObject, boolean, com.mongodb.DBObject, boolean, boolean)

Keith Branton

Jan 14, 2011, 11:43:47 AM
to mongod...@googlegroups.com
HG, how are you getting on with this?

I meant for you to use findAndModify on a collection that maintains the max value. Sorry I was not clearer in my original post.

How you maintain the mapping in a concurrency-safe way is a little different. How big of a problem is it if you occasionally end up with gaps in your number sequence? If that's not a big deal then I would suggest an algorithm such as this, assuming one collection called map {_id:key, value:Integer}, and one called max {_id:0, current:Integer}:

public Integer getMappingFor(String key) {
    DBObject result = map.findOne(new BasicDBObject("_id",key));
    if (result != null) {
        return (Integer)result.get("value");
    }
    // at this point we need to record a new mapping
    // use find and modify on the max collection to get a new value
    DBObject query = new BasicDBObject("_id", 0);
    DBObject update = new BasicDBObject("$inc", new BasicDBObject("current", 1));
    Integer value = (Integer)max.findAndModify(query, null, null, false, update, true, true).get("current");
    // now we need to attempt to map the key to the value
    // note that another host may have done this since we started this function
    try {
        map.insert(new BasicDBObject("_id", key).append("value", value), WriteConcern.SAFE);
        return value;
    } catch (MongoException.DuplicateKey e) {
        // if we're here then another host already recorded the mapping,
        // so read back the value that host stored in the map collection
        return (Integer)map.findOne(new BasicDBObject("_id",key)).get("value");
    }
}

This assumes that DBCollections max and map are in scope. Untested code warning! Should be pretty close.

In the findAndModify call we are providing a query to find the one and only record, an update to increment current, and flags to insert that record if it did not exist (upsert) and to return the new document rather than the original.

On the insert we use WriteConcern.SAFE so that the DuplicateKey exception gets thrown if we try to insert a duplicate. Without that mode our request would be quietly ignored.

It is possible for sequence numbers to be skipped in the case of collisions, because if we can't insert the new mapping we discard the newly allocated number and use the one already stored for that key. If skipping sequence numbers is a problem then that too can be solved, but it adds complexity.
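
The collision behaviour can be modelled in memory as a rough sketch. The assumptions here are that an AtomicInteger stands in for the max collection updated with $inc, and that ConcurrentHashMap.putIfAbsent stands in for an insert that fails on a duplicate _id. The class and its helper methods are hypothetical, and this is not the MongoDB driver API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for the two collections; no MongoDB involved.
class InMemoryMapping {
    // Plays the role of the "max" collection updated with $inc.
    private final AtomicInteger max = new AtomicInteger(0);
    // Plays the role of the "map" collection with its unique _id.
    private final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

    Integer getMappingFor(String key) {
        Integer existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        // Atomic increment-and-read, like findAndModify returning the new document.
        Integer value = max.incrementAndGet();
        // Insert only if absent, like an insert rejected on a duplicate _id.
        Integer winner = map.putIfAbsent(key, value);
        // If another caller won the race, our number is dropped and leaves a gap.
        return winner != null ? winner : value;
    }

    int currentMax() {
        return max.get();
    }

    int size() {
        return map.size();
    }
}
```

Every caller asking for the same key gets the same number, and different keys never share a number. The counter can run ahead of the number of stored mappings, which is where the gaps come from.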

If you are performing many of these lookups on each host I would also recommend caching the results on each host to save requesting the same value more than once. How you cache depends on how many distinct values you are creating keys for.
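
One possible shape for such a per-host cache is sketched below. It assumes the mappings never change once assigned and that everything fits in memory. The class name is hypothetical, and the remoteLookup function is a placeholder for whatever call actually reaches the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Per-host cache in front of a slower lookup. The lookup function is a
// placeholder for the call that reaches the database.
class CachedMapping {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    private final Function<String, Integer> remoteLookup;

    CachedMapping(Function<String, Integer> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    Integer get(String key) {
        // Only the first request for a key reaches the remote lookup;
        // later requests are served from the local map.
        return cache.computeIfAbsent(key, remoteLookup);
    }
}
```

This cache is unbounded, so with a very large number of distinct strings a size-limited structure would be needed instead.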

Hope this helps,

Keith.

HG

Jan 17, 2011, 11:42:04 AM
to mongod...@googlegroups.com
Thanks Keith!

I think you understood my problem. What you propose seems to be ok -
except that I need to check the gaps. But otherwise - thanks a lot!
This is how far I got myself:
1) In java, get index for string
2) if it returns null, then call this function on server:
> function saveAndGetNew(s){
... var newstr = db.map.findOne({"_id" : s});
... if (newstr == null){
... var max = db.map.findOne({"_id" : 0});
... max.CurrentMax++;
... db.map.save(max);
... db.map.save({"_id" : s, "index" : max.CurrentMax});
... return max.CurrentMax;
... } else {
... return newstr.index;
... }
... }
> db.map.save({"_id": 0, "CurrentMax" : 0})

Which seems to work:

> db.map.find()
{ "_id" : 0, "CurrentMax" : 0 }
> saveAndGetNew("String1")
1
> saveAndGetNew("String2")
2
> saveAndGetNew("String3")
3
> saveAndGetNew("String2")
2
> db.map.find()
{ "_id" : 0, "CurrentMax" : 3 }
{ "_id" : "String1", "index" : 1 }
{ "_id" : "String2", "index" : 2 }
{ "_id" : "String3", "index" : 3 }

So, now if I can call this from Java, I can try your method and this
one to see which works faster. (Yes, I will cache all IDs on the client
nodes as far as I have memory there.) However, how do I call that from
Java (as was my original question)? The Java docs contain "eval" and
"doEval"; however, the javadoc doesn't say anything about those
commands, so I don't know what they do or how to use them (at least my
trial calls, such as the ones above from the shell, have not worked).

--
HG

Keith Branton

Jan 17, 2011, 12:18:41 PM
to mongod...@googlegroups.com
You can save the function from java using:

mongo.getDB("yourDbHere")
                .doEval("db.system.js.save({_id:'saveAndGetNew',value:function(s) {...etc...}});");

(or you can just save it from the shell)

Then you can call it using:

int current = ((Number) mongo.getDB("yourDBHere").eval(
                    "return saveAndGetNew(\"" + s + "\");")).intValue();



Keith Branton

Jan 17, 2011, 12:44:13 PM
to mongod...@googlegroups.com
One warning with the eval approach - you can't use it with sharded collections - so as long as you don't anticipate sharding this collection the eval approach should work.



HG

Jan 18, 2011, 10:41:56 AM
to mongod...@googlegroups.com
Hi!

Thanks! I guess my problem was that I had not saved the function to
the right place. However, what is the difference between .eval and
.doEval? The API doesn't say anything about them (or even the args).
Why are you first running .doEval and then just .eval?

Ps. no need for sharding yet, and if I get that big, I will need to use
something other than Mongo. I like Mongo a lot, but I've been told that
"there is no way to go to production with mongo because it is losing
data". So as long as I have everything under my control, I'm fine with
Mongo :-)

--
HG.

Keith Branton

Jan 18, 2011, 11:51:35 AM
to mongod...@googlegroups.com

> what is the difference with .eval and .doEval?
> The API doesn't say anything about them (or even the args). Why are
> you first running .doEval and then just .eval?

The best documentation is well written code - always accurate and very up-to-date. The mongo driver is pretty well written. The difference can be seen here:

    public CommandResult doEval( String code , Object ... args )
        throws MongoException {

        return command( BasicDBObjectBuilder.start()
                        .add( "$eval" , code )
                        .add( "args" , args )
                        .get() );
    }

    public Object eval( String code , Object ... args )
        throws MongoException {
        
        CommandResult res = doEval( code , args );
        
        if ( res.ok() ){
            return res.get( "retval" );
        }
        
        throw new MongoException( "eval failed: " + res );
    }

So eval simply wraps doEval, does a check for successful execution and returns the "retval" value from the result.

I used eval because I wanted the result of the function call. You could probably use eval for both calls - but once I got it working I didn't investigate much more. I don't use eval any more - I replaced it with findAndModify.

What I haven't found yet is much documentation on all the commands you can execute with db.command, and their args :(
 
> I've been told that
> "there is no way to go to production with mongo because it is losing
> data".

I have not experienced this, and I fully plan to go into production using Mongo in the next month or so.  Do you have any further details as to where it's losing data? I know about the following...
  • if you don't use WriteConcern.SAFE then it is possible for writes to fail and you will not be notified
  • if you write a lot and then query before the writes are all on disk then some may not be there yet
  • replication is necessary if durability is a requirement
  • you need a good optimistic-locking strategy if concurrent modifications are possible and you can't use the limited set of operation modifiers, otherwise later changes can overwrite previous changes
Are you aware of any other potential problems?
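
For the optimistic-locking point in the list above, here is a rough in-memory sketch of the idea: each record carries a version, and a write only goes through if the record is unchanged since it was read. The class and method names are hypothetical, a ConcurrentHashMap stands in for the collection, and the record is assumed to exist before update is called. With MongoDB the same check would normally be done by including the version in the update query, which is not shown here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

// Optimistic locking modelled in memory: a write succeeds only if the stored
// record is still the exact one that was read.
class OptimisticStore {
    static final class Versioned {
        final int version;
        final String value;

        Versioned(int version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentMap<String, Versioned> docs = new ConcurrentHashMap<>();

    void put(String id, String value) {
        docs.put(id, new Versioned(0, value));
    }

    Versioned get(String id) {
        return docs.get(id);
    }

    // Retry the read-modify-write until no other writer got in between.
    // Assumes a record with this id already exists.
    Versioned update(String id, UnaryOperator<String> change) {
        while (true) {
            Versioned current = docs.get(id);
            Versioned next = new Versioned(current.version + 1, change.apply(current.value));
            // Versioned does not override equals, so replace() only succeeds
            // if the stored object is the very instance we read.
            if (docs.replace(id, current, next)) {
                return next;
            }
        }
    }
}
```

Without the check in replace, two writers that read the same record would both write back, and the later write would silently overwrite the earlier one.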

HG

Jan 20, 2011, 5:02:50 AM
to mongod...@googlegroups.com
Hi!

On Tue, Jan 18, 2011 at 6:51 PM, Keith Branton <ke...@branton.co.uk> wrote:
>>
>> what is the difference with .eval and .doEval?
>> The API doesn't say anything about them (or even the args). Why are
>> you first running .doEval and then just .eval?
>>
> The best documentation is well written code - always accurate and very
> up-to-date. The mongo driver is pretty well written. The difference can be

Well... yes, it works for those who write that code, I agree. However,
for the rest of us who write our own code and want to use some great
library to help there, it really doesn't make sense to start learning
your code, as there is just no time for that. Documentation is what
enables us to use great libraries and products. With an IDE, you just
import the API docs and go from there. But since the API docs are
completely empty, it's really hard to use, and it's much easier to move
to something else that is documented and lets me concentrate on my
problem.
But thanks for the code - I understand the difference now.

>>  I've been told that
>> "there is no way to go to production with mongo because it is losing
>> data".
>
> I have not experienced this, and I fully plan to go into production using
> Mongo in the next month or so.  Do you have any further details as to where
> it's losing data? I know about the following...

Like I wrote, I've been told by our architects (I do trust them
mostly) not to trust it (no evidence). I don't question MongoDB! It
was just a side remark that I'm sure I will not be allowed to use
MongoDB for more than ad hoc things. I've not experienced any problems
and am not worried at all.

--
HG.
