Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
any alternative to client side processing in place of what would be a RDMS join?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  18 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Brent Gracey  
View profile  
 More options Oct 7 2012, 6:06 am
From: Brent Gracey <brentgra...@gmail.com>
Date: Sun, 7 Oct 2012 03:06:38 -0700 (PDT)
Local: Sun, Oct 7 2012 6:06 am
Subject: any alternative to client side processing in place of what would be a RDMS join?

If a data set & use cases are focused on "relational" work - would that
reduce the fit of Mongodb for the project? For example a "friendship" data
set where questions like please get me the detailed info of users (held in
person_info) who interacted with each other (held in person interaction)
who bought product X ( held in product and purchase_history collections) at
event Y.

Would it be possible to use a map reduce function which operates over
multiple tables?

I guess foursquare have some way of answering those sorts of questions
efficiently with  Mongo!

(this is a shortened version of half of post
https://groups.google.com/forum/?fromgroups=#!searchin/mongodb-user/g... -
maybe the length of that post was too much!)

Regards,
Brent


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William Zola  
View profile  
 More options Oct 8 2012, 9:17 pm
From: William Zola <william.z...@10gen.com>
Date: Mon, 8 Oct 2012 18:17:09 -0700 (PDT)
Local: Mon, Oct 8 2012 9:17 pm
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi Brent!

For folks coming from the SQL world (such as myself) one of the hardest
things to learn about MongoDB is the new style of schema design.  In the
SQL world, everything goes into third normal form.  Folks come to think
that there is a single right way to design their schema, because there
typically is one.

There is a cognitive shift required from a relational mindset in order to
properly design a MongoDB schema.  If you are still "thinking
relationally", then you will miss most of the advantages of using MongoDB.

More importantly, if all you have is a hammer, everything looks like a
nail.  I know that, for myself, it took me a long time to take off the
"relational blinders".  While you still may have entities with one-to-one,
many-to-one, or many-to-many relationships, it is no more necessary to
force them into third normal form in the database than it is in your
programming language.  Just as there are ways to represent those
relationships within an object-oriented language such as Java or Ruby,
there are equally effective ways to represent them in a document-oriented
database, such as MongoDB.

Unlike the SQL world, in the MongoDB world, there is no one best schema
design.  More accurately, in MongoDB schema design depends on how the
application is going to access the data.  

Here are the key questions that you need to have answered in order to
design a good schema for MongoDB:
 - How much data do you have?
 - What are your most common operations?  Will you be mostly inserting new
data, updating existing data, or doing queries?
 - What are your most common queries?
 - What are your most common updates?
 - How many I/O operations do you expect per second?

Here's how these questions might play out if you are considering
one-to-many object relationships.

In SQL you model a many-to-one relationship by creating a pair of
master/detail tables with a primary key/foreign key relationship.  That's
really the only way to do it.

In MongoDB, you have a number of choices: you can embed the data, you can
create a linked relationship, you can duplicate and denormalize the data,
or you can use a hybrid approach. The correct approach would depend on a
lot of details about the use case of your application.

The key thing required to make this cognitive shift is to get enough
exposure to MongoDB schemas to un-learn what you have learned.  Here are
some references to get you started.

Here are some good general references on MongoDB schema design.

MongoDB presentations:
 - http://www.10gen.com/presentations/mongosf2011/schemabasics
 - http://www.10gen.com/presentations/mongosv-2011/schema-design-by-example
 - http://www.10gen.com/presentations/mongosf2011/schemascale
 -
http://www.10gen.com/presentations/MongoNYC-2012/Building-a-MongoDB-P...

Here is a book about MongoDB schema design that I think you would find
useful:
 - http://www.manning.com/banker/ (MongoDB in Action)

Here are some sample schema designs:
 - http://docs.mongodb.org/manual/use-cases/
 -
https://openshift.redhat.com/community/blogs/designing-mongodb-schema...

Ultimately, the way that relational databases work is by splitting your
data up among many tables & then joining them dynamically at query time.  
MongoDB has two modes.  The more native mode involves pre-aggregating your
data into logical chunks when you insert it, and then querying those
pre-aggregated chunks.  MongoDB can also work relationally: you just have
to do the joins on the client side in your code, rather than relying on the
database server to do it for you.  

In my opinion, you often get cleaner code and faster performance by
performing the join yourself: you know more about the semantics of the data
and the data distribution than any RDBMS optimizer can ever know.

I hope you found this useful.  Let me know if you have further questions.

 -William


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Oct 9 2012, 9:04 am
From: Brent Gracey <brentgra...@gmail.com>
Date: Tue, 9 Oct 2012 06:04:21 -0700 (PDT)
Local: Tues, Oct 9 2012 9:04 am
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi - thank you very much for you input - I am trying to take of the
blinders!

At the moment I am working on a new requirement which my data is not pre
aggregted to answer - are the below my full set of options, or am I missing
a trick or two?

1) Update schema to provide pre aggregration
+ Can lead to optimal Mongo performance
- Requires db migration and code update to keep aggrations current with
ongoing system usage

2) Answer requirement with client side processing - eg get Master and
Detail table, then loop over Master in client code, and lookup required
details
+ Low number of db connections
- Perhaps this is just a knowledge gap for me, but I have to ensure I
optimize handling larger data sets in my "web app" code (ie ensure find
product details based on product ID isn't doing a full list scan on the
detail data set)
Sorting the data would be the first approach I will look into - any other
suggestions welcome.

3) Mixture client and multiple queries - eg get the Master table and then
do selects to get Details as required
+ Detail selects can use DB indexes to be efficient
- Very high number of db connections

At the moment I'm using option 3, which is slower than what I would get in
the same setup with a RDMS just joining the tables based on an index, but
I'm hesitant to move to 2, as users being able to drive full tables selects
isn't a model I'm used to working with, and 1 is a fairly large piece of
work, which may have to be repeated when the next requirement come along


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
william.z...@10gen.com  
View profile  
 More options Oct 9 2012, 9:52 am
From: william.z...@10gen.com
Date: Tue, 9 Oct 2012 06:52:15 -0700 (PDT)
Local: Tues, Oct 9 2012 9:52 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

Hi Brent!

On Tuesday, October 9, 2012 6:04:21 AM UTC-7, Brent Gracey wrote:

> Hi - thank you very much for you input - I am trying to take of the
> blinders!

> At the moment I am working on a new requirement which my data is not pre
> aggregted to answer - are the below my full set of options, or am I missing
> a trick or two?

Unfortunately, you haven't given me enough information to answer the
question.  There's no one "right" generalized solution for a master/detail
relationship in MongoDB.  

I'd need to know the details of your data model and the semantics of what
you're trying to accomplish in order to make suggestions for your
particular use case.  Are you doing targeted queries to satisfy a web page?
 Or are you trying to do analytics to roll-up a large data set that's
spread across two collections?  The tools in the toolbox and the set of
options are highly dependent on which one you're doing.

Once I have more information I can give you a better answer.  

 -William


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Oct 9 2012, 4:50 pm
From: Brent Gracey <brentgra...@gmail.com>
Date: Tue, 9 Oct 2012 13:50:37 -0700 (PDT)
Local: Tues, Oct 9 2012 4:50 pm
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

Hi,

I would like to get my head around as many of the tools in the toolbox as
possible, so I can make an informed choice of the best fit for my current
requirements and understand what road map I should have in mind for as the
data set grows. So if there is an approach that doesn't fit under one of
the three methods I listed, it would be great if you could highlight that
to me.

Some more details on my current work is its a report, so its is aggregating
records, and it needs to be available in real time, so users can view a
summary of the event as it progresses through the day. At the moment the
record sets are in the 1000s on the two collections, and the relationship
is 1 to M but a low M, so 1,2 detail records is the most common case.

Thanks again for your help


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William Zola  
View profile  
 More options Oct 10 2012, 5:32 pm
From: William Zola <william.z...@10gen.com>
Date: Wed, 10 Oct 2012 14:32:35 -0700 (PDT)
Local: Wed, Oct 10 2012 5:32 pm
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

Hi Brent!

Here's a quick overview of the tools in the toolbox for many-to-one
relationships:

 - Embedding instead of referencing
 - Placing references to the detail document in the master document
 - Placing a reference to the master document in the detail document.  (AKA
a "Parent Pointer" tree.)  This lets you get all of the details for a
single master document in a single query.
 - Combining both master->detail and detail->master references
 - Denormalizing the data so that some of the needed information from the
detail documents are present in both the master and detail documents.  
(This works best if writes are infrequent.)

See here for more references on "link-vs-embed":
 -
http://stackoverflow.com/questions/5373198/a-simple-mongodb-question-...
 - http://www.slideshare.net/kbanker/mongodb-schema-design-mongofr

If you have a "hard" real-time (as in 'up-to-the-second') requirement, then
you have to do database queries at the time that the application requests
them.  If it's OK for data to be stale, then you can run batch jobs to do
the pre-aggregation, and then perform the reporting queries off of them.  
(Think "Data Mart" and "scheduled ETL".)

MongoDB has a reasonably powerful MapReduce functionality: you can often
use multiple mMapReduce jobs to join data from multiple collections.  While
not suitable for hard real-time requirements, it can be useful for

See here for details:
 -
http://stackoverflow.com/questions/3837394/mongodb-map-reduce-over-mu...
 - http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/

For more example of what you can do with MapReduce, see here:
 - http://cookbook.mongodb.org/patterns/pivot/
 - http://cookbook.mongodb.org/index.html

You can also use the Hadoop adaptor to do MapReduce jobs.  The Hadoop
"dependent jobs" feature can allow you to read from multiple collections in
the context of a single Hadoop job.

See here for details:
 -
http://www.10gen.com/presentations/webinar/mongodb-hadoop-taming-elep...
 - http://api.mongodb.org/hadoop/MongoDB%2BHadoop+Connector.html

Finally, if your document counts are only in the thousands or tens of
thousands, then I'd suggest staying with the simple strategy of doing joins
in your client application.  The database would have to do these anyway:
there's no extra work involved, just a matter of moving it off of the
database server.

Let me know if you have further questions.

 -William


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Oct 18 2012, 6:19 pm
From: Brent Gracey <brentgra...@gmail.com>
Date: Thu, 18 Oct 2012 15:19:54 -0700 (PDT)
Local: Thurs, Oct 18 2012 6:19 pm
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi - thanks for the comprehensive reply - I'm still working through most of
the links you have suggested. One thing I would like to clarify - for your
comment "suggest staying with the simple strategy of doing joins in your
client application.  The database would have to do these anyway:"

Yes the database application would have to join - but assuming it uses an
index, isn't the database operation not much more efficient than

for all x in array X
    Look up Y[x]

My assumptions for this are:

There is overhead in maintaining DB indexes - this has already been
"expended effort" at the time of getting data
Getting all the data from the DB to the client application loses the index
info
Repeating the effort of generating an index (not sure how many languages
would support that - any Scala specific thoughts welcome) is duplicated
effort
Not doing something to replicate index functionality will make the look up
in Y much less efficient.

So I can't see how client side processing can be as efficient as getting
the db to do it - and as one scales - you either get to a point of having
to do a large data restructure or live with the extra overhead.

Main point of the post if for people to point out if any of my assumptions
are incorrect.

Thanks


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William Zola  
View profile  
 More options Oct 29 2012, 1:40 pm
From: William Zola <william.z...@10gen.com>
Date: Mon, 29 Oct 2012 10:40:26 -0700 (PDT)
Local: Mon, Oct 29 2012 1:40 pm
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi Brent!

Sorry for taking so long to get back to you.  

The answer to your question is: the database is not going to be any more
efficient than doing the queries client side -- IF you have designed your
schema properly.

1) Let's begin by looking at how you'd do this in SQL:

CREATE TABLE dept (deptno NUMBER CONSTRAINT pk_dept PRIMARY KEY,
        dname VARCHAR2(30) );
CREATE TABLE emp (
        empid NUMBER CONSTRAINT pk_emp PRIMARY KEY,
        ename VARCHAR(20),
        deptno NUMER CONSTRAINT fk_deptno FOREIGN KEY REFERENCES
dept(deptno) );

First off: this creates two indexes "behind your back": one on dept.deptno,
one on emp.empid.

Second: consider how you would query these tables:

 SELECT empid, ename FROM emp, dept
    WHERE
        dept.dname = 'SALES'
    JOIN
        dept.deptno = emp.deptno
    ORDER BY ename;

Note that unless you index 'emp.deptno', this query is going to be slow
(since it will otherwise perform a full table scan on 'emp').  Unless you
index 'dept.dname', it will have to perform a full table scan on 'dept'.

You're now at no less than FOUR indexes.

This query is going to force the RDBMS to do the following:
  A) Read the 'dept.dname' index to find the location of the row(s) where
"dname = 'SALES'"
  B) Fetch the matching row(s) from the 'dept' table, and save off the set
of 'deptno' values
  C) Read the 'emp.deptno' index to find the locations of the matching rows
in the 'emp' table
  D) Fetch the matching rows from the 'emp' table
  E) Sort the result set in-memory

2) Now, let's consider how to do this in MongoDB.  For legibility, I'm
going to use integers for the _id field, but everything I will do will work
with ObjectIDs as well:

> db.dept.save( {_id: 1, name: 'Sales'} );
> db.dept.save( {_id: 2, name: 'Marketing'} );
> db.dept.save( {_id: 3, name: 'Support'} );
> db.emp.save( { _id: 1, name: 'Ben', dept: 1 } );
> db.emp.save( { _id: 2, name: 'William', dept: 3 } );
> db.emp.save( { _id: 3, name: 'Jenna', dept: 3 } );
> db.emp.save( { _id: 4, name: 'Steven', dept: 3 } );

To make this efficient, I'll create a secondary index on the 'dept' field
in the 'emp' collection, and the 'name' field in the 'dept' collection:

> db.emp.ensureIndex( dept: 1 );
> db.dept.ensureIndex( name: 1 );

By default, there are already indexes on "dept._id" and "emp._id".

To find all of the employees in the "Support" department, I need to do two
queries:

    > result = db.dept.findOne({name: 'Support'}, {_id:1} );
    { "_id" : 3 }
    > desired_dept = result["_id"];
    3
    > db.emp.find( { dept: desired_dept }).sort({name:1}).pretty();
    { "_id" : 3, "name" : "Jenna", "dept" : 3 }
    { "_id" : 4, "name" : "Steven", "dept" : 3 }
    { "_id" : 2, "name" : "William", "dept" : 3 }

In terms of the actual work done, there are the same number of indexes
involved, and the same number of operations involved as with the SQL
example.  The only difference is that I had to build one portion of the
query in the client side instead of on the server side.

Let me know if you have further questions.

 -William


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Nov 13 2012, 6:10 am
From: Brent Gracey <brentgra...@gmail.com>
Date: Tue, 13 Nov 2012 03:10:15 -0800 (PST)
Local: Tues, Nov 13 2012 6:10 am
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi William - thanks again for the detailed reply

I can see how doing this for a query that requires information for one
record is efficient, I however need to do a look up for multiple records.

If I pull all the detail records in one query - then the client side
"detail" look up will be inefficient as there is no "index" concept in the
client side code.

If I do a query to the db for each detail record then I get 100s of db
connections which is also a big overhead.

So my understanding is I would need to update my data structure to address
this efficiently.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Sam Millman  
View profile  
 More options Nov 13 2012, 7:32 am
From: Sam Millman <sam.mill...@gmail.com>
Date: Tue, 13 Nov 2012 12:31:27 +0000
Local: Tues, Nov 13 2012 7:31 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

"If I do a query to the db for each detail record then I get 100s of db
connections which is also a big overhead."

You only get one, and it should be a persistent one.

"If I pull all the detail records in one query - then the client side
"detail" look up will be inefficient as there is no "index" concept in the
client side code."

Indexes on JOINs are a questionable matter, I have seen many many times in
SQL where a simple JOIN has resulted in the index not being used for some
unknown reason even if that JOIN is specifically defined within the
internals of MySQL (which is what I was using at the time) using pk and fk
format, most specifically on non-standard ranges of data. You have also
gotta consider that JOINs can become a pain to scale in certain situations.

You have got to consider that in MySQL the client side "detail" lookup for
each record would be slow because of what it takes to make a SQL query and
result set for every row instead of a large JOINed result set and most
likely not because of the "index" concept.

MongoDB is quite different in how it queries, it is used to querying in
this way.

There is one caveat you will get into soon however, which I think needs
fixing, which is that MongoDB has no server-side sub selects, so if you
wanna do an $in based on another query you must pull the data over the wire
and operate on it client side even if you don't need those initial rows,
which is a little painful.

On 13 November 2012 11:10, Brent Gracey <brentgra...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Sam Millman  
View profile  
 More options Nov 13 2012, 7:32 am
From: Sam Millman <sam.mill...@gmail.com>
Date: Tue, 13 Nov 2012 12:32:08 +0000
Local: Tues, Nov 13 2012 7:32 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

"You have got to consider that in MySQL"

Sorry I mean any SQL tech there.

On 13 November 2012 12:31, Sam Millman <sam.mill...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Nov 13 2012, 8:39 am
From: Brent Gracey <brentgra...@gmail.com>
Date: Tue, 13 Nov 2012 05:39:19 -0800 (PST)
Local: Tues, Nov 13 2012 8:39 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

Thanks for the reply,

I know very well that joins in SQL can have their own nuances, but feel
that is a separate discussion. And yes, connection pooling will reduce the
overhead of multiple db queries, but it is still an overhead.

My current situation is I am doing multiple detail queries - and the report
is taking about 30 seconds to complete for  ~200 records, and my boss wants
it to go faster, and I don't know how to that without telling him we need
to re-architect our data structure.

I haven't got the exact breakdown of what is taking the time - so can't
tell you how much "overhead" the multiple db connections is introducing,
but I'll work on some logging to break down exactly where time is being
spend in the report.

Thanks


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Sam Millman  
View profile  
 More options Nov 13 2012, 9:22 am
From: Sam Millman <sam.mill...@gmail.com>
Date: Tue, 13 Nov 2012 14:22:40 +0000
Local: Tues, Nov 13 2012 9:22 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

At 200 records it should be giving 0 overhead.

You are right that the initial post was a little tl;dr. I definitely missed
it.

So let's start fresh: if you turn on logging for MongoDB and check the log
for slow queries do you see any?

On 13 November 2012 13:39, Brent Gracey <brentgra...@gmail.com> wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Nov 14 2012, 5:51 am
From: Brent Gracey <brentgra...@gmail.com>
Date: Wed, 14 Nov 2012 02:51:17 -0800 (PST)
Local: Wed, Nov 14 2012 5:51 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

Hi - so I done some logging to work out what is taking the time

First, a big correction - I'm actually hitting the db 2000 times, not 200,
I forgot that 1800 records don't actually do anything in this report, but I
have to go look for a detailed record for them before I know that.

Logging from my application code

09:27:31.409 [qtp741338313-114 - /man/json/counts] DEBUG
bootstrap.liftweb.RequestAnalyzer - Total request time on /json/counts: 0 ms
09:27:31.421 [qtp741338313-114 - /man/json/counts] DEBUG
c.b.sojo_manager.api.RestAPI - Starting to generate report
09:27:32.446 [qtp741338313-114 - /man/json/counts] DEBUG
c.b.sojo_manager.api.RestAPI - Got all users from the database -> This is
the first select that returns 2000 users
09:27:41.531 [qtp741338313-114 - /man/json/counts] DEBUG
c.b.sojo_manager.api.RestAPI - Have looped through all users and checked
which ones have devices with connections -> This is after looping through
all 2000 and doing a detail lookup in mongo, which takes about 10 seconds.
09:27:41.719 [qtp741338313-114 - /man/json/counts] DEBUG
c.b.sojo_manager.api.RestAPI - Have created the from to mapping for the
force diagram
09:27:41.719 [qtp741338313-114 - /man/json/counts] DEBUG
bootstrap.liftweb.RequestAnalyzer - Total request time on /json/counts:
10301 ms

I enabled  db.setProfilingLevel(2) - and after capturing commands for just
this report

> db.system.profile.find().count()

2845

Then I ran a summary - to just group everything and give a total "db" run
time - which comes in at 1794 milli sec, so I don't think the main part of
the 10 seconds is db execution. I am working on refining the application
logging to get a better view of what components of the query are taking the
remaining 8 seconds.

db.system.profile.group(
{ cond: {   }
, key: {client: "127.0.0.1"}
, initial: {count: 0, total_time:0}  
, reduce: function(doc, out){ out.count++; out.total_time+=doc.millis }
, finalize: function(out){ out.avg_time = out.total_time / out.count }

} );

OUTPUT

[
{
"client" : "127.0.0.1",
"count" : 2845,
"total_time" : 1794,
"avg_time" : 0.6305799648506151

}

]

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Sam Millman  
View profile  
 More options Nov 14 2012, 9:57 am
From: Sam Millman <sam.mill...@gmail.com>
Date: Wed, 14 Nov 2012 14:57:42 +0000
Local: Wed, Nov 14 2012 9:57 am
Subject: Re: [mongodb-user] Re: any alternative to client side processing in place of what would be a RDMS join?

1794ms still sounds quite slow for 2000, especially since that figure comes
from the database; it, however, appears that the main bottleneck is the
app. Let's see what you find there :)

On 14 November 2012 10:51, Brent Gracey <brentgra...@gmail.com> wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William Zola  
View profile  
 More options Nov 14 2012, 2:24 pm
From: William Zola <william.z...@10gen.com>
Date: Wed, 14 Nov 2012 11:24:26 -0800 (PST)
Local: Wed, Nov 14 2012 2:24 pm
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi Brent!

I'm not sure I understand your issues here.  You're right that doing the
lookup on the client side is expensive.  I don't understand your concerns
about running multiple queries, though.  

Here is some sample Python code: it only uses one database connection, and
a maximum of two database cursors:

def setup() :
    port = sys.argv[1]
    conn = Connection('localhost', int(port) )
    return conn.test

def query_dept_emp(db) :
    dept = db.dept
    emp = db.emp
    for ddoc in dept.find().sort("name",1) :
        print "Employees in", ddoc["name"]
        for edoc in emp.find( {"dept": ddoc["_id"] } ).sort("name",1) :
            print "  ", edoc["name"]

def main() :
    if len(sys.argv) != 2:
        print "usage: ", sys.argv[0], " <port number>"
        exit(1)

    db = setup()
    query_dept_emp(db)

if __name__ == "__main__":
    main()

As I said before: this is no different than what an RDBMS engine would do.  
The only difference is that you're running the query on the client side,
instead of having the server run it for you invisibly on the server side.

If you're running into a separate performance issue, then you'll need to
diagnose that.  

It's not clear from your application logs what language you're using, or
how you've structured your code.  If it doesn't look (roughly) like the
sample above, then you're probably doing something inefficient.

Let me know if you have further questions.

 -William


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brent Gracey  
View profile  
 More options Nov 16 2012, 7:22 am
From: Brent Gracey <brentgra...@gmail.com>
Date: Fri, 16 Nov 2012 04:22:26 -0800 (PST)
Local: Fri, Nov 16 2012 7:22 am
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi,

Its a scala application  - and I'm a few layers of abstraction above the
actual db connection. But it comes down to
/mongo-java-driver-2.6.5.jar!/com/mongodb/DBTCPConnector.class and if this
is the right documentation
http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency "The Mongo
object maintains an internal pool of connections to the database (default
pool size of 10)"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
William Zola  
View profile  
 More options Nov 26 2012, 2:49 pm
From: William Zola <william.z...@10gen.com>
Date: Mon, 26 Nov 2012 11:49:49 -0800 (PST)
Local: Mon, Nov 26 2012 2:49 pm
Subject: Re: any alternative to client side processing in place of what would be a RDMS join?

Hi Brent!

I don't know Scala at all, but I believe your concern to be unfounded.

The Scala driver, like the Java driver, does *not* use a new connection for
each query.  If you've got a connection from the MongoConnection() method
(or whatever your framework wraps around that structure), then you can have
as many queries as you like multiplexed on that single connection.

In my code sample from before:

def query_dept_emp(db) :

>     dept = db.dept
>     emp = db.emp
>     for ddoc in dept.find().sort("name",1) :
>         print "Employees in", ddoc["name"]
>         for edoc in emp.find( {"dept": ddoc["_id"] } ).sort("name",1) :
>             print "  ", edoc["name"]

This code opens two cursors but only uses a single connection.  The same
would be true of the equivalent code in Scala or Java.  The dept.find() opens
a new cursor using the existing connection, and the emp.find() opens a
different cursor using the same connection.  Queries to the two different
cursors can be interleaved on a single connection without any problem.

Let me know if you have further questions.

 -William

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »