Standard queries set up as a Mongo JS function

s.molinari

Jan 3, 2014, 12:49:35 AM
to mongod...@googlegroups.com
Hi,

If we had a certain set of queries, which were standard for certain CRUD functionality needed for our application, would it be possible or wise or useful to push those functions down to the database level as JS functions? What I am looking for is lowering network overhead of sending multiple queries to the database. Let's say reducing from 10 standard queries without the JS function to just 1 with the JS function. Or am I missing the purpose of JS functions on the DB server side completely? 

Scott


Sam Millman

Jan 3, 2014, 3:01:25 AM
to mongod...@googlegroups.com
Hell no. Make an abstraction layer on the client side to house those functions and fire them as needed. Honestly, forget about stored JS outside of MapReduce; it provides no other benefit and can actually slow your app down a fair amount.
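To make the "abstraction layer" idea concrete, here is a minimal Python sketch. `DeptRepository` and `InMemoryCollection` are hypothetical names: the in-memory class only stands in for a driver collection so the example runs without a server, and the repository assumes nothing beyond `find`/`find_one` accepting a filter document, which a PyMongo `Collection` also provides.

```python
from typing import Any, Dict, List, Optional


class InMemoryCollection:
    """Hypothetical stand-in for a driver collection so the sketch runs without a server.

    Only exact-match filters are supported, which is all this example needs.
    """

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = docs

    def find(self, filt: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [d for d in self._docs if all(d.get(k) == v for k, v in filt.items())]

    def find_one(self, filt: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        matches = self.find(filt)
        return matches[0] if matches else None


class DeptRepository:
    """Client-side layer that houses one "standard" read behind a single method.

    Callers ask for a department with its staff; the repository decides which
    queries to send. Anything exposing find/find_one(filter) can be plugged in.
    """

    def __init__(self, dept: Any, emp: Any) -> None:
        self.dept = dept
        self.emp = emp

    def department_with_employees(self, name: str) -> Optional[Dict[str, Any]]:
        d = self.dept.find_one({"name": name})  # query 1: the department
        if d is None:
            return None
        staff = list(self.emp.find({"dept": d["_id"]}))  # query 2: its employees
        return {**d, "employees": sorted(e["name"] for e in staff)}


repo = DeptRepository(
    InMemoryCollection([{"_id": 3, "name": "Support"}]),
    InMemoryCollection([
        {"_id": 2, "name": "William", "dept": 3},
        {"_id": 3, "name": "Jenna", "dept": 3},
    ]),
)
print(repo.department_with_employees("Support"))
```

The point of the design is that callers ask for one named operation while the queries behind it live in one place you can tune later (add projections, batch, cache) without moving any logic into stored JS.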

How big is a query? 4 KB? Most networks are designed to handle traffic measured in gigabits per second; you could send hundreds of thousands of queries before you start to hit a network bottleneck.
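A back-of-envelope version of that arithmetic, as a Python sketch. It assumes a 1 Gbit/s link and treats "4kb" as 4,096 bytes, and it ignores replies, protocol overhead, and latency, so the result is an upper bound rather than a measurement.

```python
GIGABIT_PER_SECOND = 1_000_000_000  # assumed link speed, in bits per second


def max_requests_per_second(link_bits_per_second: float, request_bytes: int) -> float:
    """Requests/s that would saturate the link with request payloads alone.

    Ignores replies, protocol overhead, and latency, so it is an upper bound.
    """
    return link_bits_per_second / 8 / request_bytes


for size in (4096, 1024):  # the 4 KB estimate and the 1 KB measured later in the thread
    rate = max_requests_per_second(GIGABIT_PER_SECOND, size)
    print(f"{size} B per query -> about {rate:,.0f} queries/s")
```

On those assumptions a single gigabit link carries roughly 30,000 queries per second at 4 KB each, or about 120,000 per second at 1 KB.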

Bear in mind, too, that you still have to send a query through mongos down to the shard to invoke that JS function, or to send data to it, so you won't really save anything.

Where you can actually cut network overhead is when returning large results: use projection so that only the fields you need travel over the network.
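Projection is the second argument to a query: in PyMongo, `db.emp.find({"dept": 3}, {"name": 1})` returns only `_id` and `name` from each matching document. The sketch below mimics that trimming in plain Python so the size difference is visible. `project` and `approx_size` are hypothetical helpers, only top-level inclusion projections are handled, and JSON length is just a rough proxy for the BSON bytes actually sent.

```python
import json
from typing import Any, Dict, Iterable


def project(doc: Dict[str, Any], fields: Iterable[str], include_id: bool = True) -> Dict[str, Any]:
    """Mimic a top-level inclusion projection such as {"name": 1}.

    As in MongoDB, _id is kept unless explicitly excluded. Dotted paths are not handled.
    """
    keep = set(fields)
    if include_id:
        keep.add("_id")
    return {k: v for k, v in doc.items() if k in keep}


def approx_size(doc: Dict[str, Any]) -> int:
    # JSON length is only a rough proxy for the BSON size on the wire.
    return len(json.dumps(doc).encode("utf-8"))


employee = {
    "_id": 2,
    "name": "William",
    "dept": 3,
    "bio": "x" * 2000,  # hypothetical large field the caller does not need
}

slim = project(employee, ["name"])
print(slim)
print(approx_size(employee), "->", approx_size(slim), "bytes (approx.)")
```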


--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb
 

Sam Millman

Jan 3, 2014, 3:03:04 AM
to mongod...@googlegroups.com
Actually, after some measurements, my average query is more like 1 KB.

s.molinari

Jan 3, 2014, 3:36:19 AM
to mongod...@googlegroups.com
We are looking at a very busy usage of Mongo, so every bit of free network traffic will be good. 

I did just see the warning in the MongoDB docs about using the JS functions sparingly.

> Note: We do not recommend using server-side stored functions if possible.


I like the fact Mongo does that in their docs. Warn about limitations or things to avoid. Not all products are that transparent.

I'll look up projection. Thanks. I have heard the term before, but I am still uncertain what it is. Still learning. :)

It is too bad one can't create a JS function on the server side of Mongo to basically pull data together with a number of queries (i.e. simulating joins?), to return a single result. When the network is being pushed to the limit, it seems there must be a benefit in having the client do less querying to get the same result, rather than keeping the logic client side and sending out numerous queries for every standard request for data.

I know, don't try to make Mongo what it isn't, an RDBMS, but still. :)

I can't think of an app that doesn't need relationships within its data, and it is a fact of MongoDB life that relationships must exist too. Gathering data across those relationships is monumentally important to any application. I don't think it is right when I hear, "If you need to join data, don't use Mongo." You might as well just say, "Don't use Mongo at all if you have even a moderately complex application." Or, "Only use Mongo if you have a simple application for it." I don't think that is the truth, because I am sure it isn't what the MongoDB guys set out to make, and Mongo is being used by a good number of fairly complex applications anyway. Right? So why not improve even more on the ability to join information smartly within the DB? Is it really an architectural or basic conceptual problem? We aren't talking schema here. We are talking data relationships, and "No-SQL" simply cannot mean "no relationships" or poor relationships. I ain't buying it. :)

Scott

Sam Millman

Jan 3, 2014, 4:57:18 AM
to mongod...@googlegroups.com
Well, to put it into perspective: I don't think Facebook sees a network bottleneck from querying, and they use huge, complex SQL statements for some of their work (way over 4 KB in size). If a decent network can handle their load, I would consider this optimization a bit premature.

They also use technologies like Cassandra that don't have stored procedures, so they can't get around the problem with server-side functions anyway.

Their bigger bottleneck is how much work they have to do on the database.



Sam Millman

Jan 3, 2014, 4:59:58 AM
to mongod...@googlegroups.com
A good example of a problem they face is the HUGE I/O load from writing out JOINs (which is why they front every DB server with memcached copies of the JOIN results). That is one of the problems MongoDB is supposed to solve.

If you really are heading into a high-usage tier, you might want to research how other sites do it and what ideas they use to combat the everyday problems of running their databases.
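The "memcached in front of the database" pattern described here is usually called cache-aside. A minimal sketch, assuming a plain dict in place of memcached and a hypothetical `expensive_join` function in place of the real query; `CacheAside` is an illustrative name, not a library class.

```python
from typing import Any, Callable, Dict, Hashable


class CacheAside:
    """Serve repeated join results from a cache; a dict stands in for memcached (assumption)."""

    def __init__(self, compute: Callable[[Hashable], Any]) -> None:
        self._compute = compute  # the expensive join, e.g. a function that queries the DB
        self._store: Dict[Hashable, Any] = {}
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._store:  # hit: the database is not touched at all
            return self._store[key]
        self.misses += 1  # miss: run the join once and keep the result
        value = self._compute(key)
        self._store[key] = value
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call after a write that changes the joined data; otherwise readers see stale results.
        self._store.pop(key, None)


def expensive_join(dept_id: int) -> list:
    # Hypothetical placeholder for the real multi-table (or multi-query) join.
    return ["Jenna", "Steven", "William"] if dept_id == 3 else []


cache = CacheAside(expensive_join)
cache.get(3)
cache.get(3)
print("database hits:", cache.misses)
```

The hard part in practice is the `invalidate` call: every write path that changes the underlying data has to know which cached results it affects.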

Sam Millman

Jan 3, 2014, 5:02:13 AM
to mongod...@googlegroups.com
> It is too bad one can't create a JS function on the server side of Mongo to basically pull data together with a number of queries (i.e. simulating joins?),

You must bear in mind that MongoDB is not written in JS; it is written in C++. Every time you call JS, the C++ code actually fires up V8 isolates (with their own threads) to run your JS in. This is extremely time- and resource-consuming.

You could always modify the core C++ code to support a basic JOIN.

s.molinari

Jan 3, 2014, 7:31:50 AM
to mongod...@googlegroups.com
I wish we had the talent to do customized C++ work. Maybe in the future.

We'll see how it goes as is for now. :-) 

Can you clarify this statement a bit? 

> A good example of a problem they face is the HUGE I/O load from writing out JOINs (which is why they front every DB server with memcached copies of the JOIN results). That is one of the problems MongoDB is supposed to solve.

How does Mongo solve this kind of issue? That is exactly what I am looking at. If I know how Mongo can solve it, then it becomes a non-issue.

Scott

Sam Millman

Jan 3, 2014, 7:43:13 AM
to mongod...@googlegroups.com
MongoDB, as you know, demands either:

a) you rewrite your schema, or
b) you do the join client side.

In a large distributed environment, two separate queries to fetch related documents can actually be faster than a server-side join, because those two queries add no overhead apart from network traffic, and that traffic would exist anyway, most likely in the same amount (maybe more, since JOIN syntax can be quite long).

So for small JOINs in large distributed environments, it soon becomes apparent that MongoDB can outperform.

For historical (and probably current) reasons, MongoDB did not implement JOINs, because doing them server side is very difficult in the environment it is designed for. It is a lot easier in SQL's environment, but that's a whole different environment with its own drawbacks.



Sam Millman

Jan 3, 2014, 7:45:18 AM
to mongod...@googlegroups.com
As a PS: if you are looking to do really complex paginated, sorted, and filtered JOINs across large datasets, it's worth knowing that your scenario is unlikely to work all that well in SQL either.

It is possible to code it in SQL, but, just like stored JS in MongoDB, that doesn't mean it is right to. At very high usage, you will find that large, complex joins just will not happen very easily.

Sam Millman

Jan 3, 2014, 7:49:33 AM
to mongod...@googlegroups.com
C++ is actually quite an easy language if you know Java or C# or even PHP (though they bear little resemblance; C++ is a far more formal and structured language, and I'm a PHP programmer, so I ain't just trolling). You will find that jumping over to it is mostly a matter of getting the difference between managed and unmanaged memory right and settling the terminology.

s.molinari

Jan 3, 2014, 12:02:46 PM
to mongod...@googlegroups.com
Thanks for your support, Sammaye.

Scott

William Zola

Jan 4, 2014, 12:33:03 AM
to mongod...@googlegroups.com
Hi Scott! 

My answers are in-line:


On Friday, January 3, 2014 12:36:19 AM UTC-8, s.molinari wrote:
> We are looking at a very busy usage of Mongo, so every bit of free network traffic will be good.
> [snip]

> It is too bad one can't create a JS function on the server side of Mongo to basically pull data together with a number of queries (i.e. simulating joins?), to return a single result. When the network is being pushed to the limit, it seems there must be a benefit in having the client do less querying to get the same result, rather than keeping the logic client side and sending out numerous queries for every standard request for data.

What makes you think that the network is the bottleneck?  Do you have evidence?  If it really is the bottleneck, is there any way you can get more bandwidth?  To quote Donald Knuth: "Premature optimization is the root of all evil (or at least most of it) in programming." 

> I know, don't try to make Mongo what it isn't, an RDBMS, but still. :)

> I can't think of an app that doesn't need relationships within its data, and it is a fact of MongoDB life that relationships must exist too. Gathering data across those relationships is monumentally important to any application. I don't think it is right when I hear, "If you need to join data, don't use Mongo." You might as well just say, "Don't use Mongo at all if you have even a moderately complex application." Or, "Only use Mongo if you have a simple application for it." I don't think that is the truth, because I am sure it isn't what the MongoDB guys set out to make, and Mongo is being used by a good number of fairly complex applications anyway. Right? So why not improve even more on the ability to join information smartly within the DB? Is it really an architectural or basic conceptual problem? We aren't talking schema here. We are talking data relationships, and "No-SQL" simply cannot mean "no relationships" or poor relationships. I ain't buying it. :)

You absolutely need to have relationships between various data entities in any reasonable data model.  MongoDB gives you multiple design patterns to model one-to-many and many-to-many relationships.  Google for "MongoDB embed vs. reference" and you'll see close to 70K results.  You have to choose the logical data model that is closest to your requirements.
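To make the embed-vs-reference choice concrete, here is a sketch that models the DEPT/EMP data used below both ways, with plain Python lists standing in for collections (an assumption, so it runs without a server). The referenced model needs two lookups to answer "who works in Support?"; the embedded model needs one.

```python
# Referenced model: each employee points at its department by _id (two lookups to join).
DEPT = [{"_id": 1, "name": "Sales"}, {"_id": 3, "name": "Support"}]
EMP = [
    {"_id": 1, "name": "Ben", "dept": 1},
    {"_id": 2, "name": "William", "dept": 3},
    {"_id": 3, "name": "Jenna", "dept": 3},
    {"_id": 4, "name": "Steven", "dept": 3},
]

# Embedded model: each department document carries its employees (one lookup).
DEPT_EMBEDDED = [
    {"_id": 1, "name": "Sales", "employees": [{"_id": 1, "name": "Ben"}]},
    {"_id": 3, "name": "Support", "employees": [
        {"_id": 2, "name": "William"},
        {"_id": 3, "name": "Jenna"},
        {"_id": 4, "name": "Steven"},
    ]},
]


def staff_referenced(dept_name: str) -> list:
    dept = next(d for d in DEPT if d["name"] == dept_name)  # lookup 1: the department
    return sorted(e["name"] for e in EMP if e["dept"] == dept["_id"])  # lookup 2: its employees


def staff_embedded(dept_name: str) -> list:
    dept = next(d for d in DEPT_EMBEDDED if d["name"] == dept_name)  # the only lookup
    return sorted(e["name"] for e in dept["employees"])


print(staff_referenced("Support"))
print(staff_embedded("Support"))
```

The trade-off: embedding answers this read with a single query, but it fits poorly when the embedded array can grow without bound (a single document is capped at 16 MB) or when employees are often queried or updated on their own, which is where referencing and an application-level join come in.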

One of the basic design requirements of MongoDB is that the database does not support any operation which cannot be done efficiently with a sharded cluster.  This means that supporting joins within the database means supporting generalized *distributed* joins.  The problem is, nobody knows how to do this efficiently and quickly.  

But it's Just Not True to say "you can't do joins with MongoDB".  You can do joins, and people do them all the time.  They're just *application-level* joins.  Here's an example in Python using the DEPT and EMP tables you may remember from a certain famous relational database.

Here's some (JavaScript) code to set up and populate the tables.   For legibility, I'm going to use integers for the _id field, but everything I will do will work with ObjectIDs as well:


> db.dept.save( {_id: 1, name: 'Sales'} );
> db.dept.save( {_id: 2, name: 'Marketing'} );
> db.dept.save( {_id: 3, name: 'Support'} );

> db.emp.save( { _id: 1, name: 'Ben', dept: 1 } );
> db.emp.save( { _id: 2, name: 'William', dept: 3 } );
> db.emp.save( { _id: 3, name: 'Jenna', dept: 3 } );
> db.emp.save( { _id: 4, name: 'Steven', dept: 3 } );


To make this efficient, I'll create a secondary index on the 'dept' field in the 'emp' collection, and the 'name' field in the 'dept' collection:

> db.emp.ensureIndex( { dept: 1 } );
> db.dept.ensureIndex( { name: 1 } );


By default, every collection already has an index on _id, so dept._id and emp._id are covered.

To find all of the employees in the "Support" department, I need to do two queries in an application-level join:

> result = db.dept.findOne({name: 'Support'}, {_id:1} );
{ "_id" : 3 }

> desired_dept = result["_id"];
3

> db.emp.find( { dept: desired_dept }).sort({name:1}).pretty();

{ "_id" : 3, "name" : "Jenna", "dept" : 3 }
{ "_id" : 4, "name" : "Steven", "dept" : 3 }
{ "_id" : 2, "name" : "William", "dept" : 3 }


Here's some code in Python that does a master/detail query:

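import sys

from pymongo import MongoClient


def setup():
    # Connect to a mongod/mongos on localhost at the port given on the command line.
    port = int(sys.argv[1])
    conn = MongoClient('localhost', port)
    return conn.test


def query_dept_emp(db):
    dept = db.dept
    emp = db.emp

    # Master query: every department, sorted by name.
    for ddoc in dept.find().sort("name", 1):
        print("Employees in", ddoc["name"])

        # Detail query: this department's employees, sorted by name
        # (served by the index on emp.dept created above).
        for edoc in emp.find({"dept": ddoc["_id"]}).sort("name", 1):
            print("  ", edoc["name"])


def main():
    if len(sys.argv) != 2:
        print("usage:", sys.argv[0], "<port number>")
        sys.exit(1)

    db = setup()
    query_dept_emp(db)


if __name__ == "__main__":
    main()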


I'll contend that there is no more than a rounding error's worth of difference in network bandwidth usage between this query and the equivalent SQL query. If you had a bunch of other fields in the 'dept' collection that you don't want to fetch, you could optimize this by using projections (e.g. passing projection=['_id', 'name'] to find(), a parameter called fields= in older PyMongo versions) to fetch only the fields of interest.
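One refinement worth knowing: the master/detail loop above issues one query for the departments plus one per department (1 + N round trips). Below is a sketch of the same report in two round trips, fetching all relevant employees with a single `$in` query and grouping them in the application. The lists stand in for the collections so it runs without a server, the comments show the PyMongo calls it assumes, and `dept_roster` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

DEPT = [{"_id": 1, "name": "Sales"}, {"_id": 2, "name": "Marketing"}, {"_id": 3, "name": "Support"}]
EMP = [
    {"_id": 1, "name": "Ben", "dept": 1},
    {"_id": 2, "name": "William", "dept": 3},
    {"_id": 3, "name": "Jenna", "dept": 3},
    {"_id": 4, "name": "Steven", "dept": 3},
]


def dept_roster(depts: List[dict], emps: List[dict]) -> Dict[str, List[str]]:
    # Round trip 1: every department, sorted by name.
    #   PyMongo: list(db.dept.find({}, {"name": 1}).sort("name", 1))
    dept_list = sorted(depts, key=lambda d: d["name"])
    dept_ids = [d["_id"] for d in dept_list]

    # Round trip 2: the employees of all those departments in one query.
    #   PyMongo: db.emp.find({"dept": {"$in": dept_ids}}, {"name": 1, "dept": 1}).sort("name", 1)
    wanted = set(dept_ids)
    emp_list = sorted((e for e in emps if e["dept"] in wanted), key=lambda e: e["name"])

    # The join itself happens in the application: group names under each department.
    by_dept: Dict[int, List[str]] = defaultdict(list)
    for e in emp_list:
        by_dept[e["dept"]].append(e["name"])
    return {d["name"]: by_dept.get(d["_id"], []) for d in dept_list}


for name, staff in dept_roster(DEPT, EMP).items():
    print("Employees in", name)
    for person in staff:
        print("  ", person)
```

With many departments the `$in` list grows large, so you would likely page through departments and issue one batched employee query per page.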

--William
 


Asya Kamsky

Jan 4, 2014, 9:18:41 PM
to mongodb-user
Hi Scott,

In addition to what William said, with which I whole-heartedly agree,
I want to make another point.

You say "wouldn't it be nice to have the server do the joins", but to
this I ask you: "what server?" You don't have a single server; you
have many of them. Because you plan to have a HUGE Mongo deployment,
you will eventually (or even right away) need to shard.
Now your data will be partitioned across multiple "servers". Which
one of them should the "join" request be sent to? What if the data to
be joined lives on two different "servers"?

Architecturally, there is no single server. Once you accept that, you
will not want to push the work to "the server" as much :)
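A toy illustration of the "which server?" question, assuming three shards and a made-up routing rule (`_id` modulo the shard count, not how MongoDB actually assigns chunks). Even with the tiny DEPT/EMP data from earlier in the thread, the Support department and its employees end up spread over every shard, so a "server-side" join would have to gather data from all of them.

```python
from typing import Dict, List

NUM_SHARDS = 3


def shard_for(doc_id: int) -> int:
    # Hypothetical routing rule: _id modulo the shard count. Real MongoDB routes
    # by ranges of the shard key (or of its hash) recorded in cluster metadata.
    return doc_id % NUM_SHARDS


def place(docs: List[dict]) -> Dict[int, List[dict]]:
    """Distribute documents across the shards by _id."""
    shards: Dict[int, List[dict]] = {n: [] for n in range(NUM_SHARDS)}
    for doc in docs:
        shards[shard_for(doc["_id"])].append(doc)
    return shards


DEPT = [{"_id": 1, "name": "Sales"}, {"_id": 2, "name": "Marketing"}, {"_id": 3, "name": "Support"}]
EMP = [
    {"_id": 1, "name": "Ben", "dept": 1},
    {"_id": 2, "name": "William", "dept": 3},
    {"_id": 3, "name": "Jenna", "dept": 3},
    {"_id": 4, "name": "Steven", "dept": 3},
]

dept_shards = place(DEPT)
emp_shards = place(EMP)

# 'name' is not the shard key, so finding the department means asking every shard.
dept_home = next(n for n, docs in dept_shards.items() if any(d["name"] == "Support" for d in docs))
support_id = next(d["_id"] for d in dept_shards[dept_home] if d["name"] == "Support")

# Employees are placed by their own _id, not by department, so they scatter too.
emp_homes = sorted(
    n for n, docs in emp_shards.items() if any(e["dept"] == support_id for e in docs)
)

print("Support lives on shard", dept_home)
print("its employees live on shards", emp_homes)
```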

Asya

Sam Millman

Jan 5, 2014, 4:27:13 PM
to mongod...@googlegroups.com
I do agree, though, that MongoDB could make much better use of indexes for joining than any application ever could, unless we had access to query the indexes directly and merge them.

As such, "server-side" can simply mean "in an environment that understands how to do the query right."