I have multiple threads running; each fetches data from a different table.
In each thread, I first get a Mongo instance from a singleton. The thread then
loops through the documents, converting each one to a TableRow object and then
to an ArrayList<Object>. [See the code for one of the threads below, after the questions.]
1. The time for a thread to run collection.find(dbQuery) seems inconsistent -
I have seen 5ms to 160ms for the same query. Mongo is running on a 68GB EC2
instance, and index size plus storage size is around 32GB. All the query fields are
indexed. Any suggestion on how I can troubleshoot this? I want to get the query
time down below 10ms. (A hedged explain() sketch follows these questions.)
2. Any suggestion on the fastest way to convert a DBObject to a Java object?
Does Morphia or another POJO mapper have lower overhead than
calling BasicDBObject.getString() and BasicDBObject.getDouble()?
(A conversion sketch follows the code fragment below.)
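For question 1, a minimal troubleshooting sketch, assuming the 1.x Java driver
and the com.mongodb.* classes used elsewhere in this thread: DBCursor.explain()
returns the server's query plan, which shows whether the query actually hits an index.

DBObject plan = coll.find(dbQuery).explain();
// "cursor" should be a BtreeCursor (indexed), not BasicCursor (table scan);
// "nscanned" and "millis" show the per-query cost on the server.
System.out.println(plan);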
Mongo mongo = MongoConnection.getInstance().getMongo();
ArrayList<Tuple> tuples = new ArrayList<Tuple>();
long fStart = System.currentTimeMillis();
DB db = mongo.getDB(dbName);
if (db != null) {
    DBCollection coll = db.getCollection(collectionName);
    DBCursor cur = coll.find(dbQuery);
    while (cur.hasNext()) {
        BasicDBObject dbo = (BasicDBObject) cur.next();
        TableRow row = new TableRow(dbo);
        tuples.add(row.getTuple());
        // Bail out if this thread has exceeded its timeout budget.
        if (timeout - (System.currentTimeMillis() - fStart) < 0) {
            break;
        }
    }
}
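On question 2: a hedged sketch of the direct-extraction path. The TableRow below
is a hypothetical reconstruction (the field names "name" and "price" are
placeholders, not from the original post). POJO mappers such as Morphia sit on
top of exactly these getter calls plus reflection, so calling getString()/getDouble()
directly is generally the lower-overhead option.

// Hypothetical TableRow: pull typed fields straight off the document.
public class TableRow {
    private final String name;   // placeholder field
    private final double price;  // placeholder field

    public TableRow(BasicDBObject dbo) {
        this.name = dbo.getString("name");
        this.price = dbo.getDouble("price");
    }
}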
----- Original Message ----
From: Eliot Horowitz <elioth...@gmail.com>
To: mongod...@googlegroups.com
Sent: Wed, September 8, 2010 9:57:10 PM
Subject: Re: [mongodb-user] Suggestion on speeding up data fetch and conversion
Are field names case sensitive? For example, does Mongo treat
"lM" and "lm" as the same field name?
root 14757 29.3 33.7 36246288 24171516 ? Sl 17:18 66:01
/root/mongodb-linux-x86_64-1.6.2/bin/mongod --dbpath /mnt/data/lp --slave
--source ip-10-166-57-74:10000 --port 10000 --fork --logpath /mnt/data/logs/db
--maxConns 500 --oplogSize 50000 --autoresync
I see
com.mongodb.DBPortPool$SemaphoresOut: Out of semaphores to get db connection
in the webserver log.
http://groups.google.com/group/mongodb-user/browse_thread/thread/c699b5fde98eafe9
I set opt.connectionsPerHost = 1000 and will rerun the load test.
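A hedged sketch of the relevant pool settings, assuming the MongoOptions fields
in the 1.x/2.x Java driver: SemaphoresOut is thrown when more threads are waiting
for a connection than connectionsPerHost * threadsAllowedToBlockForConnectionMultiplier
allows, so both knobs matter.

MongoOptions opt = new MongoOptions();
opt.connectionsPerHost = 1000;                        // pool size per mongod
opt.threadsAllowedToBlockForConnectionMultiplier = 5; // wait queue: 1000 * 5 threads
Mongo mongo = new Mongo(new ServerAddress("localhost", 10000), opt);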
----- Original Message ----
From: Joseph Wang <joseph...@yahoo.com>
To: mongod...@googlegroups.com
http://www.mongodb.org/display/DOCS/Optimization#Optimization-UsingtheProfiler
didn't show any information related to my collection. I used
http://www.mongodb.org/display/DOCS/Database+Profiler
to set up the profiler. Is there anything else I need to do? How do I set it to
profile queries that are slower than 10ms? (A sketch follows the shell transcript below.)
[root@ip-10-166-59-166 ~]# mongodb-linux-x86_64-1.6.2/bin/mongo localhost:10000
MongoDB shell version: 1.6.2
connecting to: localhost:10000/test
> db.getProfilingLevel()
0
> db.setProfilingLevel(2);
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.getProfilingLevel()
2
> db.system.profile.find( { millis : { $gt : 0}})
> db.system.profile.find()
{ "ts" : "Sat Sep 18 2010 10:58:00 GMT-0400 (EDT)", "info" : "query test.$cmd
ntoreturn:1 command: { profile: -1.0 } reslen:74 bytes:58", "millis" : 0 }
{ "ts" : "Sat Sep 18 2010 11:13:30 GMT-0400 (EDT)", "info" : "query
test.system.profile reslen:36 nscanned:1 \nquery: { millis: { $gt: 0.0 } }
nreturned:0 bytes:20", "millis" : 0 }
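A hedged sketch of the 10ms threshold, assuming the profile command accepts a
slowms option in this version (the shell output above already reports a slowms
field). In the shell this is db.setProfilingLevel(1, 10); from the Java driver:

DB db = mongo.getDB(dbName);
// level 1 = log only operations slower than slowms (here, 10 ms)
CommandResult res = db.command(new BasicDBObject("profile", 1).append("slowms", 10));
System.out.println(res); // e.g. { "was" : 2, "slowms" : 100, "ok" : 1 }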
----- Original Message ----
From: Eliot Horowitz <elioth...@gmail.com>
To: mongod...@googlegroups.com
Sent: Wed, September 8, 2010 9:57:10 PM
Subject: Re: [mongodb-user] Suggestion on speeding up data fetch and conversion
1) Is it safe to share the mongo.getDB() object across multiple threads?
2) For read queries, do we still need to call db.requestStart() and
db.requestDone()?
3) For the same query, timing for the
while (cur.hasNext())
loop varies from 800ms to 20ms. I tried running the profiling option on the mongo
server but didn't see anything related to the collection.
a) If mongo automatically packs results up to 4MB, is there a danger in specifying
a batchSize() that may return results greater than 4MB?
b) What is the difference between calling DBCursor.size() and
DBCursor.count()?
c) Is there a safe way to turn each chunk of results into an array?
Something like
List<DBObject> results = new ArrayList<DBObject>();
while (cur.hasNext()) {
    results.addAll(cur.toArray());
}
d) Any explanation why the performance varies so much for the same query?
Code fragment:
ArrayList<Tuple> tuples = new ArrayList<Tuple>(200);
DB db = mongo.getDB(dbName);
if (db != null) {
    db.requestStart();
    DBCollection coll = db.getCollection(collectionName);
    DBCursor cur = coll.find(dbQuery).addOption(Bytes.QUERYOPTION_SLAVEOK);
    while (cur.hasNext()) {
        BasicDBObject dbo = (BasicDBObject) cur.next();
        BaseTableRow row = new BaseTableRow(dbo);
        tuples.add(row.getTuple());
    }
    db.requestDone();
}
> 1) Is it safe to share the mongo.getDB() object across multiple threads?
Yes.
> 2) For read queries, do we still need to call db.requestStart() and
> db.requestDone()?
No - you only need to use those if you are doing writes then reads and
need the reads to know about the writes.
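A minimal sketch of that write-then-read case, assuming the 1.x driver's request
API; requestStart() pins the calling thread to one connection so the subsequent
read sees the preceding write (doc and q are hypothetical variables):

db.requestStart();
try {
    coll.save(doc);                   // write on this thread's pinned connection
    DBObject fresh = coll.findOne(q); // read on the same connection sees the write
} finally {
    db.requestDone();                 // release the pinned connection
}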
> 3) For the same query, timing for the
> while (cur.hasNext())
> loop varies from 800ms to 20ms. I tried running the profiling option on the mongo
> server but didn't see anything related to the collection.
> a) If mongo automatically packs results up to 4MB, is there a danger in specifying
> a batchSize() that may return results greater than 4MB?
No - it will make sure it's sane.
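In other words, batchSize() is only a hint; a hedged illustration (the batch
size is arbitrary):

// The server trims each reply to the message size cap (~4MB in 1.6),
// no matter how large a batch is requested.
DBCursor cur = coll.find(dbQuery).batchSize(1000);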
> b) What is the difference between calling DBCursor.size() and
> DBCursor.count()?
See the javadocs - count is total, size looks at skip/limit.
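Concretely (the skip/limit values are arbitrary):

DBCursor cur = coll.find(dbQuery).skip(100).limit(10);
int total = cur.count(); // every document matching dbQuery, ignoring skip/limit
int page  = cur.size();  // at most 10: applies skip and limit first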
> c) Is there a safe way to turn each chunk of results into an array?
> Something like
> List<DBObject> results = new ArrayList<DBObject>();
> while (cur.hasNext()) {
>     results.addAll(cur.toArray());
> }
DBCursor has a toArray() method on it.
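That is, the loop above collapses to a one-liner:

// toArray() exhausts the cursor and returns all results as a List.
List<DBObject> results = coll.find(dbQuery).toArray();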
> d) Any explanation why the performance varies so much for the same query?
>
> Code fragment:
> ArrayList<Tuple> tuples = new ArrayList<Tuple>(200);
> DB db = mongo.getDB(dbName);
> if (db != null) {
>     db.requestStart();
>     DBCollection coll = db.getCollection(collectionName);
>     DBCursor cur = coll.find(dbQuery).addOption(Bytes.QUERYOPTION_SLAVEOK);
>     while (cur.hasNext()) {
>         BasicDBObject dbo = (BasicDBObject) cur.next();
>         BaseTableRow row = new BaseTableRow(dbo);
>         tuples.add(row.getTuple());
>     }
>     db.requestDone();
> }
Could be various things. How/where are you measuring?
To make sure it's not from previous writes - do a getLastError right before
doing the query if measuring client side.
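A sketch of that measurement, assuming DB.getLastError() in the 1.x driver
(it blocks until writes already sent on this connection are acknowledged):

db.getLastError();                    // drain pending writes before timing
long t0 = System.currentTimeMillis();
List<DBObject> results = coll.find(dbQuery).toArray();
System.out.println((System.currentTimeMillis() - t0) + " ms");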
> 3) For the same query, timing for the
> while (cur.hasNext())
> loop varies from 800ms to 20ms. I tried running the profiling option on the mongo
> server but didn't see anything related to the collection.
> a) If mongo automatically packs results up to 4MB, is there a danger in specifying
> a batchSize() that may return results greater than 4MB?
Eliot wrote:
No - it will make sure it's sane.
Does that mean that we will only gain a performance improvement if we
know the result set is less than 4MB?
> d) Any explanation why the performance varies so much for the same query?
>
Eliot wrote:
Could be various things. How/where are you measuring?
To make sure it's not from previous writes - do a getLastError right before
doing the query if measuring client side.
I am simply running queries - no writes. In production, we will write to the master
only and read from the slaves. I'm simply measuring the time difference before
and after the while (cur.hasNext()) loop.
On the master and the slave, I set maxConns to 2000. On the webserver,
I add both master and slave to an ArrayList<ServerAddress>. In the
collection.find(), I set the Bytes.QUERYOPTION_SLAVEOK option.
However, it seems the Java driver only connects to the master.
Both master and slave are running on port 10000.
[root@ip-10-160-26-159 ~]# netstat -an | grep 10000 | wc -l
2000
[root@ip-10-160-26-159 ~]# netstat -an | grep 10000 | grep -v 10.166.57.74 | wc -l
0
[root@ip-10-160-26-159 ~]# netstat -an | grep 10000 | grep 10.166.57.74 | wc -l
2000
Connection code:
MongoOptions opt = new MongoOptions();
opt.autoConnectRetry = true;
opt.connectionsPerHost = 2000;
String[] servers = server_list.split(",");
ArrayList<ServerAddress> addr = new ArrayList<ServerAddress>();
int serverCount = 0;
for (int i = 0; i < servers.length; ++i) {
    String[] serverInfo = servers[i].split(":");
    try {
        if (serverInfo.length == SERVER_INFORMATION_FIELD_SIZE) {
            ServerAddress host = new ServerAddress(
                    serverInfo[SERVER_NAME_FIELD],
                    Integer.parseInt(serverInfo[SERVER_PORT_FIELD]));
            addr.add(host);
            serverCount++;
        }
    } catch (Exception ex) {
        // Skip malformed host:port entries.
    }
}
if (serverCount > 0) {
    m = new Mongo(addr, opt);
}
Query code:
DBCollection coll = db.getCollection(collectionName);
DBCursor cur = coll.find(dbQuery).addOption(Bytes.QUERYOPTION_SLAVEOK);
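A hedged alternative - assuming the driver version in use exposes slaveOk() on
Mongo, which is an assumption here - is to mark the whole connection slave-ok
rather than each cursor, so reads against the address list above may be routed
to the slave:

Mongo m = new Mongo(addr, opt);
m.slaveOk(); // assumption: available in this driver version; allows reads from slaves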