Re: Return distinct keys from MongoDB using asyncmongo


aliane abdelouahab

Sep 24, 2012, 5:29:06 PM
to Tornado Web Server
Have you tried Motor?
http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/

On 24 sep, 22:14, L-R <laur...@human.co> wrote:
> I'm using asyncmongo with Tornado + gen.engine, and just wondering what the
> syntax is for the equivalent of db.collection.distinct("mykey")?
>
> I assume it's something like
>
> result, error = yield gen.Task(settings.DB.my_data.find, {},
> distinct=[("key","mykey")])
>
> But that won't work. Anyone know how to do this?
>
> Thanks.

L-R

Sep 25, 2012, 12:10:23 AM
to python-...@googlegroups.com
I've heard of motor, yeah. But installing a new driver just to grab distinct values? I'd be essentially replacing my async driver with another async driver.

aliane abdelouahab

Sep 25, 2012, 5:27:42 AM
to Tornado Web Server
I'm not good (at all) at async, but from what I've read in forums and
blogs, asyncmongo has to be rewritten every time pymongo gets
updated, which is not the case with Motor. It also seems that Motor
will be officially maintained by 10gen, so it will support everything
offered by pymongo (like GridFS).
I'm sorry I can't go further; I'm just a beginner who loves learning
and reading a lot :D


Dan Yamins

Sep 25, 2012, 9:28:34 AM
to python-...@googlegroups.com
Not only will asyncmongo not stay current with newer versions of pymongo unless updated explicitly, it never actually correctly reproduced basic pymongo functions to begin with.  For example, as the creator of motor has said: "AsyncMongo can't connect to a replica set, can't get more than the first batch of query results (around 100 documents), can't tail a capped collection, can't easily create indexes, and is probably missing a hundred other features and bugfixes. It would take a lot of work just to figure out which improvements need to be ported from PyMongo to AsyncMongo, much less to actually port them."

As someone who wrote his own asynchronous mongo driver for all these reasons (apymongo), I suggest that you, and anyone else using asyncmongo, switch to a different driver as early as is convenient. To my mind, Motor seems the nicest.

aliane abdelouahab

Sep 25, 2012, 9:58:18 AM
to Tornado Web Server
Yes, and since it will be the next official async MongoDB driver for
Python, it is a good idea to learn it. In my case, I should
begin by learning the basic async functions!

A. Jesse Jiryu Davis

Sep 25, 2012, 11:29:07 AM
to python-...@googlegroups.com
Thanks for all the kind words about Motor. But if you're already using AsyncMongo and you just need to do a distinct query, you don't need to switch to Motor.

The problem is, "distinct" is not an option for a query; it is a separate command:

http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct

So use AsyncMongo's command() method:

>>> from tornado.ioloop import IOLoop
>>> import asyncmongo
>>> db = asyncmongo.Client(pool_id='mydb', host='127.0.0.1', port=27017, maxcached=10, maxconnections=50, dbname='test')
>>> def callback(result, error):
...     print result
...     IOLoop.instance().stop()
...
>>> db.command('distinct', 'my_data', key='my_key', callback=callback)
>>> IOLoop.instance().start()
{u'stats': {u'cursor': u'BasicCursor', u'timems': 0, u'nscannedObjects': 5, u'nscanned': 5, u'n': 5}, u'values': [1.0, 2.0], u'ok': 1.0}

The data you need is in result['values'].
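
A minimal sketch of a helper that checks for failure before reading result['values']. The name distinct_values is hypothetical (it is not part of AsyncMongo), and the sketch assumes the reply has the same shape as the one printed above, with a truthy 'ok' field on success:

```python
# Hypothetical helper, not part of AsyncMongo: pull the values out of a
# "distinct" command reply, or raise if the command failed.
def distinct_values(result, error=None):
    if error is not None:
        raise RuntimeError("distinct failed: %r" % (error,))
    # A successful command reply carries a truthy 'ok' field (1.0 above).
    if not result or not result.get("ok"):
        raise RuntimeError("distinct returned a not-ok reply: %r" % (result,))
    return result["values"]


# Reply copied from the session above, with the 'stats' sub-document omitted.
reply = {"values": [1.0, 2.0], "ok": 1.0}
print(distinct_values(reply))
```

Inside a real callback(result, error) you would call distinct_values(result, error) instead of printing the raw reply.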

More examples of using commands from AsyncMongo are in its test suite:

https://github.com/bitly/asyncmongo/blob/master/test/test_command.py

And information on MongoDB commands in general (the examples are in PHP but it's easy to understand even for Python coders like us):

http://www.kchodorow.com/blog/2011/01/25/why-command-helpers-suck/

A. Jesse Jiryu Davis

Sep 25, 2012, 11:30:59 AM
to python-...@googlegroups.com
Of course, it is a little nicer in Motor:

>>> import motor
>>> from tornado.ioloop import IOLoop
>>> db = motor.MotorConnection().open_sync().test

>>> def callback(result, error):
...     print result
...     IOLoop.instance().stop()
...
>>> db.my_data.distinct('my_key', callback=callback)
>>> IOLoop.instance().start()
[1.0, 2.0]

Serge S. Koval

Sep 25, 2012, 11:43:16 AM
to python-...@googlegroups.com
Biggest problem I see with Motor - it uses greenlets.

If you can use greenlets, why not use gevent instead of Tornado and write all code without thinking about callbacks?

Why it matters:
1. Compatibility. Greenlets won't work in PyPy. Well, they will, but PyPy will disable JIT for code which uses them.
2. Greenlet implementation in CPython is scary. They're great when they work though.

Too bad that "pure" AsyncMongo is falling behind in terms of features..

Serge.

A Jesse Jiryu Davis

Sep 25, 2012, 12:09:20 PM
to python-...@googlegroups.com
Gevent is great, I can't dispute that, and PyMongo already works well
with Gevent. The point of Motor is if you have some prior reason to
use Tornado instead of Gevent, e.g. you've built a Tornado application
already and you want to add MongoDB to your stack, or you depend on a
library that works with Tornado but not Gevent, or you have reason to
believe Tornado will perform better than Gevent for your application.

1. I'm unclear on the exact status of PyPy's JIT and greenlets. I know
PyPy's greenlets work -- I've tested Motor with PyPy. This bug
suggests some disabling of the JIT, but does it mean the JIT is
useless or only somewhat hampered? https://bugs.pypy.org/issue895

2. What about CPython's greenlets is "scary"? Have you seen any issues
with them?

L-R

Sep 25, 2012, 12:17:05 PM
to python-...@googlegroups.com
hey Jesse,
thanks for the info, indeed I realized that asyncmongo didn't seem to have this as a query. I spent the day looking at motor, and I indeed will be switching! Not so much because of this specific example, but I love this stuff http://emptysquare.net/motor/pymongo/api/motor/generator_interface.html#generator-interface, which happens all the time in the app I'm building. Being able to wait for 2 async calls to finish and then running a callback afterwards is just what I needed (the only answer on SO about this recommends to nest callbacks into one another...eurk). Plus all the added little bonuses of course :)

Cheers!

Serge S. Koval

Sep 25, 2012, 12:35:24 PM
to python-...@googlegroups.com
1. Here's the post: http://morepypy.blogspot.com/2011/11/pypy-17-widening-sweet-spot.html
Quote:
PyPy now comes with stackless features enabled by default. However, any loop using stackless features will interrupt the JIT for now, so no real performance improvement for stackless-based programs. Contact pypy-dev for info how to help on removing this restriction.

As far as I know, nothing has changed so far and PyPy still disables JIT for greenlet-enabled code. I'm not sure if it disables JIT for whole program or just for functions which use greenlet interface.

2. Greenlet is a hack: it has a small bit of assembly, which copies stack data into heap data and patches CPython interpreter structures to make it think that nothing bad happened.

There are some known problems with garbage collection: if a greenlet references itself, it is a guaranteed memory leak. There were some problems with C modules: you had a chance of hitting bizarre stack corruptions when a C module called your Python code. There was also a memory leak when more than one thread used greenlets, which was fixed by Mitsuhiko about 2 months ago.

So, to summarise: it is magic, which works most of the time, but might lead to a debugging nightmare. I'm not sure how PyPy implements greenlets, but from what I've heard, they have a "proper" (but slow) implementation.

Anyway, there's no silver bullet :-\

Serge.

Even though that post is pretty old, nothing has changed.

Ben Darnell

Sep 25, 2012, 1:29:33 PM
to python-...@googlegroups.com
Greenlet's implementation is kind of scary when you get into its guts,
but so is a JIT, or a garbage collector, etc. It's important to
distinguish greenlet (which just does pseudo-threading, and is what
motor uses) from the more comprehensive frameworks built on top of it
like eventlet and gevent. The latter are risky because of the way
they monkey patch python socket objects, so they're subtly
incompatible with many C extensions, etc. I think that motor is a
nice way to use greenlets to bridge the sync/async gap, but I'd stay
away from gevent (and either use real threads or do everything
asynchronously).

-Ben

L-R

Sep 25, 2012, 2:06:02 PM
to python-...@googlegroups.com
I'm having a small issue though: how would I go about using the last example of http://emptysquare.net/motor/pymongo/api/motor/generator_interface.html#motor.WaitAllOps (WaitAllOps), but with find() instead of find_one()? I'm getting "InvalidOperation: Pass a callback to next_object, each, to_list, count, or tail, not to find".

A Jesse Jiryu Davis

Sep 25, 2012, 2:11:07 PM
to python-...@googlegroups.com
Yeah, I have clearly not documented this well enough. find_one() passes the one document to the callback, but find() returns a MotorCursor, which supports to_list(callback), each(callback), and next_object(callback). So the example would become:

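@gen.engine
def get_two_documents_in_parallel(db, id_one, id_two, callback):
    # find() returns a MotorCursor, so the callback goes to to_list(),
    # not to find() itself.
    db.collection.find(
        {'query': 'goes here'}
    ).to_list(callback=(yield gen.Callback('one')))

    db.collection.find(
        {'query': 'goes here'}
    ).to_list(callback=(yield gen.Callback('two')))

    try:
        documents_one, documents_two = yield motor.WaitAllOps(['one', 'two'])
        # A Python 2 generator cannot "return" a value (it is a SyntaxError),
        # so the results are handed to a caller-supplied callback instead.
        callback(documents_one, documents_two)
    except Exception, e:
        print e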
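
For readers new to the pattern, the two-key wait in the example above behaves roughly like a small barrier: each operation gets its own callback, and a final callback fires once every key has reported. The sketch below is only an illustration of that idea in plain Python; the WaitAll class and its methods are hypothetical and are not how Tornado or Motor implement gen.Callback or WaitAllOps.

```python
# Hypothetical sketch of a "wait for all keys" barrier. This is NOT Tornado's
# or Motor's implementation, only the idea behind gen.Callback + WaitAllOps.
class WaitAll(object):
    def __init__(self, keys, on_done):
        self.keys = list(keys)
        self.on_done = on_done
        self.results = {}

    def callback(self, key):
        """Return a (result, error) callback that stores its result under key."""
        def _cb(result, error=None):
            if error is not None:
                raise error
            self.results[key] = result
            if len(self.results) == len(self.keys):
                # Deliver results in the order the keys were requested,
                # regardless of which operation finished first.
                self.on_done([self.results[k] for k in self.keys])
        return _cb


finished = []
barrier = WaitAll(['one', 'two'], finished.append)
cb_one = barrier.callback('one')
cb_two = barrier.callback('two')

# Simulate the second query completing before the first.
cb_two([{'_id': 2}], None)
cb_one([{'_id': 1}], None)

documents_one, documents_two = finished[0]
print(documents_one)
print(documents_two)
```

The final callback runs exactly once, after the last outstanding operation reports, which is what avoids nesting one callback inside another.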

L-R

Sep 25, 2012, 2:15:45 PM
to python-...@googlegroups.com
Ah, I see! Muchos thanks.

A Jesse Jiryu Davis

Sep 25, 2012, 2:27:16 PM
to python-...@googlegroups.com
You're welcome; sorry that's so unclear in the docs.