Multi-thread query efficiency


屎克螂

Apr 2, 2016, 6:53:06 AM
to mongodb-user
import time
import threading
import pymongo

conn = pymongo.Connection('localhost', 27017)
db = conn.myproj

def xx(i):
    a=time.time()
    d = list(db.app_login_logout_log.find())
    print 'time:',time.time() - a, 'count:',len(d)

for i in range(1):
    time.sleep(0.1)
    t = threading.Thread(target=xx, args=(i,))
    t.start()

time.sleep(3)
print 'Three thread each time spent at the same time:'
for i in range(3):
    time.sleep(0.1)
    t = threading.Thread(target=xx, args=(i,))
    t.start()

out:
time: 1.94099998474 count: 479301
Three thread each time spent at the same time:
time: 6.62199997902 count: 479301
time: 6.79799985886 count: 479301
time: 6.9240000248 count: 479301

The multithreaded queries each take much longer. Is there a way to optimize this code?

Kevin Adistambha

Apr 11, 2016, 10:28:28 PM
to mongodb-user

Hello,

The reason for the threaded performance you are seeing is Python’s Global Interpreter Lock (GIL), which allows only one thread at a time to execute Python bytecode.

In your code, the “computation” is the line

d = list(db.app_login_logout_log.find())

where the GIL is held by the thread that is converting BSON into Python data structures.

The GIL is not held during I/O, but since the mongod you’re connected to is on localhost, comparatively little time is spent doing I/O.
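The effect can be reproduced without MongoDB at all. In this sketch (not from the original post), a pure-Python CPU-bound loop stands in for BSON decoding; running it in three threads takes roughly as long as running it three times sequentially, because the GIL serializes the bytecode:

```python
import threading
import time

def cpu_bound(n):
    # pure-Python loop: holds the GIL throughout, much like BSON decoding
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1000000

# run the work three times in a single thread
start = time.time()
for _ in range(3):
    cpu_bound(N)
sequential = time.time() - start

# run the same work in three threads "at once";
# the GIL still lets only one thread execute bytecode at a time
threads = [threading.Thread(target=cpu_bound, args=(N,)) for _ in range(3)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print('sequential: %.2fs  threaded: %.2fs' % (sequential, threaded))
```

On a typical CPython build the threaded run is no faster than the sequential one, which is exactly the pattern in your output above.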

Using the multiprocessing module allows the program to scale up a bit better. For example, by modifying the code a little to be:

import pymongo
import sys
import time
from multiprocessing import Process

def xx(i):
    # each worker process creates its own MongoClient:
    # connections must not be shared across fork()
    conn = pymongo.MongoClient('localhost', 27017)
    db = conn.test
    print i, 'started'
    a = time.time()
    d = list(db.test.find().limit(100000))
    print i,'finished. time:',time.time() - a

procs = [Process(target=xx, args=(i,)) for i in range(int(sys.argv[1]))]

start = time.time()
for p in procs:
    p.start()

for p in procs:
    p.join()

print 'all done: %.2f' % (time.time() - start)

The output shows better scaling vs. using threads:

$ python foo.py 1
0 started
0 finished. time: 0.338715076447
all done: 0.35
$ python foo.py 10
0 started
1 started
2 started
3 started
4 started
5 started
6 started
7 started
8 started
9 started
4 finished. time: 1.09985303879
5 finished. time: 1.10895490646
0 finished. time: 1.11398696899
8 finished. time: 1.12296009064
3 finished. time: 1.13394904137
6 finished. time: 1.13271999359
2 finished. time: 1.13802504539
9 finished. time: 1.13268995285
7 finished. time: 1.13850402832
1 finished. time: 1.15410399437
all done: 1.17
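If the parent process also needs the query results back, multiprocessing.Pool can collect each worker’s return value. A minimal sketch along the same lines (the find()/limit() call is replaced by a stand-in decode step so the example is self-contained):

```python
from multiprocessing import Pool

def fetch_chunk(skip):
    # in the real code this would open its own MongoClient and run e.g.
    #   list(db.test.find().skip(skip).limit(100000))
    # here a CPU-bound loop stands in for fetching and decoding one chunk
    return sum(i * i for i in range(skip, skip + 1000))

if __name__ == '__main__':
    pool = Pool(4)  # one worker process per chunk of the result set
    chunks = pool.map(fetch_chunk, [0, 1000, 2000, 3000])
    pool.close()
    pool.join()
    print('got %d chunks' % len(chunks))
```

Pool.map blocks until all workers finish and returns results in input order, which makes it convenient for splitting a large collection into ranges and fetching them in parallel.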

For more information, please see Thread State and the Global Interpreter Lock.

Best regards,
Kevin

屎克螂

Apr 18, 2016, 10:37:37 PM
to mongodb-user
Thank you, Kevin Adistambha.

On Tuesday, April 12, 2016 at 10:28:28 AM UTC+8, Kevin Adistambha wrote: