How to store an open connection to an external MongoDB

Minie Takalova

Nov 4, 2015, 2:39:17 AM
to Google App Engine
Hello

I'm trying to build an application on GAE which uses MongoDB hosted at the MongoLab provider.
The connection works if I make a new connection on every request, but if I try to reuse an already-established connection, GAE freezes and dies on a timeout.
I believe there must be some simple way to do such a simple task.


-----  This is working, but opens a new connection on every request: ------
import pymongo

def get_database_connection():
    # Build a fresh client and return its default database handle.
    client = pymongo.MongoClient(MONGO_URI)
    DBCONN = client.get_default_database()
    return DBCONN

# In the request handler:
DBCONN = get_database_connection()
# ... do something useful with DBCONN ...
# ... the connection is not stored anywhere ...
# ... end of request ...


---- This freezes GAE and dies on a timeout ---
import pymongo

def get_database_connection():
    if "DBCONN" in globals():             # look for an open connection in instance memory
        return globals()["DBCONN"]
    client = pymongo.MongoClient(MONGO_URI)
    DBCONN = client.get_default_database()
    globals()["DBCONN"] = DBCONN          # store the connection in instance memory
    return DBCONN

# In the request handler:
DBCONN = get_database_connection()
# ... do something useful with DBCONN ...
# ... end of request ...

--------

A GAE module with a backend thread could probably be used to keep the connection open in a separate thread, but after many hours I can't find a working solution.
I hope somebody has experience with this use case and can help me. A link to a well-tried solution would also be helpful.

Have a nice day.




Jeff Schnitzer

Nov 4, 2015, 2:38:59 PM
to Google App Engine
I'm using MongoDB as an analytics platform (not a primary datastore) from classic GAE/Java. I had to hack the driver pretty seriously with some help from the author. I haven't looked at the Python driver code, but in very general terms, what I had to do:

 * Stop the monitoring thread from being created
 * Manually pump the monitor method once on startup
 * Pump the monitor method whenever an error occurs
 * Wrap all mongo access in a proxy that retries everything once (sockets timeout frequently)

This is not for the faint of heart but it is working very well for me. I suspect it might be more complicated if using a cluster (I have only a single instance). I wouldn't use this to treat mongo as a primary datastore, but for running aggregations on a slave copy it's great. And at $150/mo for a 26G compute engine instance it's an order of magnitude cheaper than Cloud SQL and way faster than BigQuery or Keen (at least, for my dataset).
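
For the last bullet above, here is a rough Python sketch of a "retry everything once" wrapper. Jeff's actual change was in the Java driver; this decorator and the find_user example are only illustrative, not code from the thread:

import functools
from pymongo.errors import ConnectionFailure

def retry_once(func):
    """Retry a Mongo operation a single time if the pooled socket has gone stale."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ConnectionFailure:
            # Sockets time out frequently between requests; one retry
            # re-establishes the connection and repeats the operation.
            return func(*args, **kwargs)
    return wrapper

@retry_once
def find_user(db, user_id):
    # Hypothetical example query wrapped by the retry decorator.
    return db.users.find_one({"_id": user_id})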


Looking at the diffs might help you attack the Python driver, dunno.

Suerte,
Jeff

Nick (Cloud Platform Support)

Nov 4, 2015, 2:55:00 PM
to Google App Engine
Speaking at a high level, I don't see any reason why a background thread holding a persistent connection object in memory shouldn't work. In fact, I'm a little curious why the globals lookup is failing you with a timeout at all - it doesn't seem clear how or why this would happen. What kind of timeout is this: on mongo, on your user's request, within GAE, etc.? Is it possible that you see a coincidental timeout when reaching your DB, while the client object is actually retrieved from globals correctly?

The advice to modify the mongodb connector might be useful or relevant, although I personally can't say at this moment without debugging this situation more to see if that's a cause. It could be that this is expected behaviour given certain configuration steps or state conditions, which then results in the "timeout" but is ultimately avoidable through some means without changing anything in the mongo connector.

Minie Takalova

Nov 4, 2015, 4:39:20 PM
to Google App Engine
Thank you for your responses. 

Re: Jeff Schnitzer:
Nice mind-shift, changing the behaviour of the driver. That kind of solution hadn't occurred to me. On the other hand it is sad: instead of 5 lines of code, rebuilding the driver. I'll keep this solution as a last resort if I find nothing easier.
Maybe it will eventually prove to be the simplest solution.

Re: Nick:
My opinion is that a new thread is opened for the connection, but the GAE frontend wants every opened thread to be stopped before it finally returns the result.


--- The timeout message is: ----

Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.

--- Line of code where GAE freezes ---
It freezes on the response. It doesn't matter which type of response it is; it can be a simple one:
return HttpResponse('Done')

---Logs from development console ---
       .....

  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/mongo_client.py", line 372, in __init__
    executor.open()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/periodic_executor.py", line 64, in open
    thread.start()
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/threading.py", line 505, in start
    _start_new_thread(self.__bootstrap, ())
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/runtime.py", line 82, in StartNewThread
    return base_start_new_thread(Run, ())

--------------
     ....
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/mongo_client.py", line 353, in __init__
    self._topology.open()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/topology.py", line 60, in open
    self._ensure_opened()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/topology.py", line 273, in _ensure_opened
    self._update_servers()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/topology.py", line 341, in _update_servers
    server.open()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/server.py", line 35, in open
    self._monitor.open()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/monitor.py", line 74, in open
    self._executor.open()
  File "/base/data/home/apps/MyAPP/1.388269338940512065/pymongo/periodic_executor.py", line 64, in open
    thread.start()
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/threading.py", line 505, in start
    _start_new_thread(self.__bootstrap, ())
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/runtime.py", line 82, in StartNewThread
    return base_start_new_thread(Run, ())
--------------------
Threads started by this request continued executing past the hard deadline.





timh

Nov 5, 2015, 8:04:25 AM
to Google App Engine
This just won't work on a front end request.  Any thread you create will be destroyed at the end of the request.

The App Engine frontend is not designed for this type of capability.

You could consider running your mongo interface on a managed VM, and then use some sort of simple RPC mechanism to talk to this interface.
Alternatively, find a lightweight connection service that only exists for the duration of a request.

T

Minie Takalova

Nov 5, 2015, 3:34:52 PM
to Google App Engine
Hi

Re: timh: OK, I understand that keeping an open connection based on threads is prohibited in the frontend. So, what is the easiest and clearest way to solve it?

What I tried:

* Tried to convince PyMongo to work without threads. Tried many parameters (from http://api.mongodb.org/python/current/api/pymongo/mongo_client.html ) when establishing the connection, but GAE still ends on a timeout.

* Looked at other connection libraries: Motor https://motor.readthedocs.org/en/stable/ , asyncmongo https://github.com/bitly/asyncmongo , micromongo http://pythonhosted.org/micromongo/ , mongoengine http://mongoengine.org/
Most of them depend on PyMongo, and the others have unsuitable attributes. Maybe, after many more hours of research, some of them will prove useful.

* Tried to encapsulate the connection in a coroutine. This works, but only on the development server:
def coroutine_db_connection():
    # Built once, on the first next() call, then cached in the generator's state.
    client = pymongo.MongoClient(MONGO_URI)
    DBCON = client.get_default_database()
    while True:
        yield DBCON

* Tried to run the DB connection in a separate backend module. The old backends are deprecated, and the new, modules-based approach is badly documented. I cannot find a usable example :( I found https://cloud.google.com/appengine/docs/python/modules/
Is there some step-by-step tutorial on how to build a simple GAE backend module and how to get a variable (an open connection handle, in my case) from the backend to the frontend?

I'm also afraid of the cost if the connection stays open in a backend thread for the lifetime of the instance. Any clarification of that would be helpful.

* Make changes in PyMongo - in progress, I'm exploring the files now.


--- Oh my GOD, keeping an open connection to a database is such a simple task in other environments... ---









Minie Takalova

Nov 6, 2015, 1:16:17 AM
to Google App Engine
Can the Task Queue be used as a separate, non-request-dependent process for storing an open database connection?

Quote: "... With the Task Queue API, applications can perform work outside of a user request, initiated by a user request. If an app needs to execute some background work, it can use the Task Queue API ..."

Is there any simple example of that usage?



Jeff Schnitzer

Nov 6, 2015, 1:46:33 AM
to Google App Engine
Due to the nature of classic google app engine, you can't leave threads running. This doesn't mean you can't have sockets open between requests, but the MongoDB driver wants to use threads to manage those sockets for you and that's a no-no.

This leaves you pretty much two choices if you want to use MongoDB in some capacity with Google App Engine:

 1) Use Managed VMs. You can do almost anything you want in a Managed VM, including creating threads.
 2) Hack the driver to not use threads.

There really aren't a lot of other options. GAE is inherently constrained somewhat more than other platforms. There are advantages and disadvantages to this, but if you really want to make heavy use of MongoDB, I suggest investigating #1 above.

Jeff

Minie Takalova

Nov 6, 2015, 2:46:54 AM
to Google App Engine
Re: Jeff Schnitzer: Thank you for the advice.

So does it mean that without Managed VMs or hacking the DB driver there is no way to keep an open connection between requests?
Neither a GAE backend process, nor the Task Queue, nor backend threads, nor Multitenancy, nor non-blocking sockets can be used?





Nick (Cloud Platform Support)

Nov 6, 2015, 12:02:57 PM
to Google App Engine
Hey Minie,

As Jeff Schnitzer has pointed out, the issue stems from the fact that the PyMongo driver wants to use threads (the periodic executor's open() calls thread.start()) to manage the connection pool. The thread opened in this way will be joined to the main thread when the response is finally sent for each request. You can find the driver's source on GitHub.

As they also pointed out, you can use Managed VMs to run threads indefinitely, and your code will probably work without any alteration. In this regard - allowing threading, direct access to the network interface, process control, and the filesystem - Managed VMs are much more like a traditional VPS hosting your Python runtime (used to run your WSGI app) than an App Engine instance with its sandbox and special runtime.

Now, the other connection libraries you mention, if they're based on PyMongo, will of course share the same issue with its connection pool code wanting access to threads, since they use it as a basis and probably don't write their own connection pool code.

As for the coroutine-based code you posted, generators work by returning a generator object, which is then iterable with reference to its stored state. Creating the DBCONN object through client.get_default_database() will mean storing a copy of that object in memory which can be accessed by calls to "next()" on the generator. The issue is, this doesn't change the fact that behind the abstraction of the client object returned by client.get_default_database(), the implementation is still opening threads.
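
To make that concrete, here is a minimal sketch (names taken from the earlier post, the rest is illustrative) of how advancing the generator is what actually constructs the client, and with it the monitor thread, inside the serving request anyway:

import pymongo

def coroutine_db_connection():
    client = pymongo.MongoClient(MONGO_URI)   # the monitor thread is started here
    DBCON = client.get_default_database()
    while True:
        yield DBCON

gen = coroutine_db_connection()   # nothing runs yet
db = next(gen)                    # the first next() executes the body up to the yield,
                                  # so the thread is still created within this request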

As for Task Queues, they are a means of doing computation outside the life-cycle and limitations of a request on the instance, as specified by the limitations of the App Engine runtime, however they aren't the same as persistent threads, which would hold on to an open socket connection and communicate with request-handling threads. Through reading the docs, you can see tasks are scheduled to run, and a request is handled (with different deadlines) to represent the "invocation" of the task when it reaches the front of the queue. So, the one-time-invocation of tasks would prevent this from being a solution which you might have envisioned without looking more into the specifics.
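
As a concrete illustration (the handler paths and names here are hypothetical, not from the thread), a push task is simply delivered as another HTTP request to your app, with its own deadline and possibly on a different instance, so nothing it opens can be handed back to the original frontend handler:

from google.appengine.api import taskqueue
import webapp2

class EnqueueHandler(webapp2.RequestHandler):
    def get(self):
        # Enqueue work; this call returns immediately.
        taskqueue.add(url='/worker', params={'doc_id': '42'})
        self.response.write('queued')

class WorkerHandler(webapp2.RequestHandler):
    def post(self):
        doc_id = self.request.get('doc_id')
        # Any MongoClient opened here lives only for this task's request
        # and is gone when the handler returns.
        self.response.write('processed %s' % doc_id)

app = webapp2.WSGIApplication([
    ('/enqueue', EnqueueHandler),
    ('/worker', WorkerHandler),
])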

It seems as though the two pieces of advice offered by Jeff are in line with what I would recommend as well. The App Engine runtime is different from bare-metal processing, and the power offered by the platform needs to be kept in mind alongside the limitations which make it possible. Even so, Managed VMs seem to offer the best of both worlds. There may be ways to look into the source of the mongo driver library and modify it to use sockets, instance memory, and any other App Engine services so that it works as a persistent cross-request connection pool, although that would be up to you to look into, as it would likely be a rather complex task.

As a third option which would, in line with Jeff's idea to patch the driver, break out of certain implicit limitations in the problem-space as originally laid-out, you could drop the expectation of using mongo wire protocol and instead make use of REST to interact with the DB server. It's really up to you which direction to move in. I hope my contributions to this thread have been useful to you in clarifying exactly where you are, and possible roads forward.

Minie Takalova

Nov 6, 2015, 1:52:54 PM
to Google App Engine
Thank you, Nick, for the comprehensive answer.

I have ruled out using VMs because of the inadequate billing conditions for such a simple task as keeping a database connection open. The beta state of the VMs and the US-only placement are also not suitable.

Using a coroutine was an experiment to see whether the code inside it could satisfy the GAE rules. It does work, but only on the development server.

I also tried downgrading the PyMongo driver. Here is the list: http://api.mongodb.org/python/ and the changelog: http://api.mongodb.org/python/3.1/changelog.html
I examined more than 10 versions, from 2.2 to 3.0.3. Version 2.9 contains the deprecated pymongo.Connection(), which works without threads. Version 2.8.1 can also communicate with MongoDB >3 and use the new SCRAM-SHA-1 authentication mechanism. But every version below 3.0 had some issues.
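
For reference, a minimal sketch of that legacy path, assuming a PyMongo 2.x installation where the deprecated Connection class still exists (it was removed in 3.0); 'mydb' is a placeholder database name:

import pymongo

# PyMongo 2.x only: Connection() is deprecated there and gone in 3.0.
conn = pymongo.Connection(MONGO_URI)
db = conn['mydb']   # placeholder database name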


I was also testing gevent with greenlets as a replacement for threads. There is a nice solution, 2 lines of code: http://api.mongodb.org/python/current/examples/gevent.html
but unfortunately I can't find out how to apply that patch on the production server. And I don't know whether greenlets will be non-blocking on GAE.
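
For completeness, the two-line patch the linked page describes looks roughly like this; whether gevent's monkey-patching is even permitted inside the GAE sandbox is exactly the open question:

# Must run before pymongo is imported.
from gevent import monkey
monkey.patch_all()

import pymongo
client = pymongo.MongoClient(MONGO_URI)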


Nick, you didn't mention backend modules.

I can run a backend module with an open connection, but I don't know how to deliver the handle to the frontend. Are Multitenancy ( https://cloud.google.com/appengine/docs/python/multitenancy/multitenancy ) and shared namespaces the way to do something like this in the frontend: DBCONN = backend_process_give_me("DBCONN") ?
Or can a backend module, even with background threads, not be used for holding an open connection? Really, really?

Here it is written: "Code running on a backend can start a background thread, a thread that can 'outlive' the request that spawns it."

And what about a pull task queue, which can live on after a request? Can a task hold an open connection and give it back to the frontend?
Can one task, at the end of its life, create a new task and pass the open connection to it?
Is it possible to transfer the handle of an open connection from a task to the frontend?

I expect there are many solutions, also on GAE, for keeping a MongoDB connection open, because it is nonsense to open a connection on every request.
How do others solve this task?
Do all of them use VMs?
Do they hack PyMongo? Where can I find their patch?
Are they using another driver? Which one?
I hope I'm not the only one in such a situation.

Jeff Schnitzer

Nov 7, 2015, 1:28:10 PM
to Google App Engine
On Fri, Nov 6, 2015 at 10:52 AM, Minie Takalova <mint...@gmail.com> wrote:
I hope I'm not the only one in such a situation.

You appear to have some conceptual misunderstandings of how computers work. Or possibly there is a language/miscommunication issue:

 * A socket is not a thread and a thread is not a socket.
 * Socket handles cannot be passed between instances, which are separate processes and probably on separate machines.
 * Threads are not necessary to hold open sockets - any socket can stay open across requests normally. They're just file handles.

However, the mongodb driver creates a thread to manage the connection pool. You're right that long ago both the Python and Java drivers had the option to forego this, however, the main guy behind the drivers created a specification for how all drivers should work and that spec involves a thread. So all "modern" mongodb drivers create threads. And yes, older drivers have issues.

GAE backends are deprecated and won't work anyways because you still can't create threads using the standard API - you need to use a GAE specific API which the driver does not know about. Your question about tasks makes no sense.

You are, pretty much, alone in this situation. It is very possible that I am the only person actively using MongoDB with non-VM GAE. I am almost certainly the only person doing it with Java with anything resembling a recent version of MongoDB. This is pretty far off the beaten path. If you are not comfortable hacking the driver (and since there appear to be some gaps in your understanding of threads and sockets, it will not be easy), then you should reconsider using either MongoDB or GAE.

Jeff

tetsujin1979

Nov 7, 2015, 4:01:47 PM
to Google App Engine
If you're using MongoLab as a backend, is using the API an option? http://docs.mongolab.com/data-api/
I've used this with Java apps on GAE without any problems.
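
For what it's worth, a rough sketch of what that could look like from Python on GAE using urlfetch; the exact endpoint shape and the apiKey parameter are assumptions to be checked against the linked Data API docs, and 'mydb', 'users' and API_KEY are placeholders:

import json
import urllib
from google.appengine.api import urlfetch

API_KEY = 'your-mongolab-api-key'                                # placeholder
BASE = 'https://api.mongolab.com/api/1/databases/mydb/collections'  # assumed URL shape

def find_users(query):
    # Issue a plain HTTPS request instead of speaking the mongo wire protocol.
    url = '%s/users?%s' % (BASE, urllib.urlencode({
        'q': json.dumps(query),
        'apiKey': API_KEY,
    }))
    result = urlfetch.fetch(url, deadline=10)
    return json.loads(result.content)

# Example: find_users({'active': True})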

Minie Takalova

Nov 7, 2015, 6:00:50 PM
to Google App Engine

Re: Jeff Schnitzer: I like your pragmatic, open-eyed mindset. You are right, I have some conceptual misunderstandings of the GAE environment. Probably because I'm an old-school dinosaur, programming since 1986, I have a different point of view. In my "world", open socket connections can be passed between threads, processes, instances, even between applications on different computers. From my point of view it is all about bytes in memory and how they are handled. Sure, depending on the environment.


I hope that after many days of testing GAE I now understand the GAE environment and sandbox better ( https://cloud.google.com/appengine/docs/python/ ).

I don't see the GAE PaaS as one computer. I know that every request can be handled by another instance on another computer on the other side of the Earth.

I still don't understand why GAE has tools like backend modules, a sockets API, an API for backend threads, Multitenancy, ... and yet keeping an open connection to an external database is so complicated, even though it only needs a running thread and an open socket.


Probably something is wrong (with me) if I'm alone in this situation, but never mind, I feel there must be some nice, smart, easy and clear solution, something in 1-5 lines of code. This feeling has never betrayed me (maybe this is the first time). Only my weak knowledge and limited time stand between me and that "silver bullet" solution. I believe in silver bullets if you have enough time to find them :-)


OK, if I don't want to use VMs, what can I do now:

* I can try to rewrite the PyMongo driver.

* Use the REST API, as user tetsujin1979 mentions. Thanks for it. (I see it as a slower and less secure solution.)

* I still don't know enough about modules running on the backend, so I don't reject this path (GAE backends are deprecated, but not backend modules, if I understand correctly: https://cloud.google.com/appengine/docs/python/modules/).

* Use another driver; maybe it needs less rewriting than PyMongo. The Motor driver looks like a nice piece of code.

* Use something totally different.


For now I have slightly rewritten the application logic to make extensive use of bulk operations. A background worker with its own open connection is still an attractive idea.


Thank you all. If I find that "silver bullet" I'll be pleased to share it with others.

Prasanga Siripala

Jun 29, 2016, 12:43:11 AM
to Google App Engine
I believe that to keep the connection alive, the Sockets API requires a new request. Therefore, after the request is finished, you can't reuse the connection.

Nick (Cloud Platform Support)

Jun 30, 2016, 7:14:51 PM
to Google App Engine
Hey Prasanga, 

I'm unsure what you meant by your last post. Do you think you could clarify in more precise language what behaviour you're referring to? Do you mean the fact that "Sockets may be reclaimed after 2 minutes of inactivity; any socket operation keeps the socket alive for a further 2 minutes.", as mentioned in the docs?

Reviewing this thread, I'd say that forcing a software implementation dependent on opening OS threads into the App Engine mold simply may not be the best use of dev cycles. It might be best to delegate the mongoDB connection functionality to a GCE instance which holds connection pools on proper OS threads. For modest loads, one such instance ought not to be much of a bottleneck because of the low-latency nature of simply forwarding requests and data. It would act basically like a proxy.
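
A sketch of the frontend half of that proxy idea (the service URL, the /query path and the payload format are hypothetical): the GAE handler forwards the query over HTTP to a small service on the GCE instance, which owns the real PyMongo connection pool:

import json
from google.appengine.api import urlfetch

PROXY_URL = 'http://10.240.0.2:8080/query'  # hypothetical internal address of the GCE proxy

def run_query(collection, query):
    # Forward the query to the proxy; the proxy talks to MongoDB on proper OS threads.
    payload = json.dumps({'collection': collection, 'query': query})
    result = urlfetch.fetch(
        PROXY_URL,
        method=urlfetch.POST,
        payload=payload,
        headers={'Content-Type': 'application/json'},
        deadline=10)
    return json.loads(result.content)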

The other option is to connect to Mongo via REST, although this is more expensive, as it uses higher layers of the protocol stack, with HTTP headers, etc.

Anyways, let us know your thoughts Prasanga!

Cheers,

Nick
Cloud Platform Community Support 