getLastError and connection pools problem


mmaroti

Sep 14, 2012, 6:42:01 PM
to node-mong...@googlegroups.com
Hi Guys!

Since the default connection pool size was increased to 5, I have observed an error in our application: saved objects cannot be reloaded immediately. This is caused by the connection pool: save commands are still queued on some connections while I execute the findOne command on another connection. Calling getLastError on one connection does not solve the problem; I have to call it on all connections. I question the usefulness of db.lastError, because with a connection pool I never know which connection it will use to execute the command. I think that if no connection is specified, it should execute getLastError on all connections in the pool to make sure that there are no outstanding save requests and that everything went normally on every connection. Currently, I do this manually. See this simple test program to reproduce the error:


Miklos

christkv

Sep 16, 2012, 11:02:06 PM
to node-mong...@googlegroups.com
Yes, because you are not supposed to call getLastError yourself; use the safe parameter for insert/update/delete instead. Since the driver is async, there is no concept of checking out a connection from the pool. To ensure correct behavior, the driver packages the command with the getLastError and sends both in a single write to the socket. If you call getLastError yourself after doing an insert, even with a poolSize of 1, another insert might have happened in between.
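
The interleaving described above can be illustrated with a small in-memory sketch. It does not use the real driver: `FakeConnection`, `serve`, and the message shapes are hypothetical stand-ins, and the only assumption is that the server handles messages on one connection in the order they were written.

```javascript
// Hypothetical stand-in for a single driver connection; not the real driver API.
class FakeConnection {
  constructor() {
    this.wire = []; // messages in the order they reached the socket
  }

  // One call models one socket write; its messages stay adjacent on the wire.
  write(...messages) {
    this.wire.push(...messages);
  }
}

// Simulated server: handles messages in order. getLastError reports on
// whichever write came immediately before it on this connection.
function serve(connection) {
  const replies = [];
  let lastWrite = null;
  for (const msg of connection.wire) {
    if (msg.op === 'insert') {
      lastWrite = msg.doc;
    } else if (msg.op === 'getLastError') {
      replies.push({ askedBy: msg.askedBy, reportsOn: lastWrite });
    }
  }
  return replies;
}

// Case 1: insert and getLastError sent as separate writes.
const separate = new FakeConnection();
separate.write({ op: 'insert', doc: 'A' });
separate.write({ op: 'insert', doc: 'B' }); // another caller slips in between
separate.write({ op: 'getLastError', askedBy: 'A' });
const separateReplies = serve(separate);

// Case 2: each insert bundled with its getLastError in one write.
const bundled = new FakeConnection();
bundled.write({ op: 'insert', doc: 'A' }, { op: 'getLastError', askedBy: 'A' });
bundled.write({ op: 'insert', doc: 'B' }, { op: 'getLastError', askedBy: 'B' });
const bundledReplies = serve(bundled);

console.log(separateReplies, bundledReplies);
```

In the first case the reply requested for insert A actually reports on insert B; in the second case each reply reports on its own insert.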

mmaroti

Sep 17, 2012, 10:01:46 AM
to node-mong...@googlegroups.com
I am inserting around 1 million records, so it makes a huge difference whether I run with safe=true or call getLastError at the end. I do not care if another write slips past; all I care about is that everything I have saved so far without safe=true is actually visible to subsequent reads. With the current connection pool behavior, if you want to read your writes you must use safe=true; otherwise there will be edge cases where your writes are still waiting somewhere (in an in-memory queue, in TCP/IP buffers, etc.). Of course, calling getLastError (with optional sync) makes sure that all my writes have hit the server.
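
A rough simulation of that edge case, and of issuing getLastError on every connection before reading, might look like the following. Everything here is hypothetical (`FakePool`, `send`, `step`, `flushAll`); it models each pooled connection as a FIFO queue and is not the driver's API or its actual scheduling.

```javascript
// Hypothetical simulation: each connection is a FIFO queue. The server handles
// each queue in order, but nothing orders one queue relative to another.
class FakePool {
  constructor(size) {
    this.queues = Array.from({ length: size }, () => []);
    this.store = new Map(); // stands in for the collection on the server
  }

  // Queue an operation on a specific connection (an unacknowledged send).
  send(conn, op) {
    this.queues[conn].push(op);
  }

  // Let the server handle the oldest pending operation on one connection.
  step(conn) {
    const op = this.queues[conn].shift();
    if (op === undefined) return undefined;
    if (op.type === 'save') {
      this.store.set(op.id, op.doc);
      return undefined;
    }
    if (op.type === 'findOne') {
      return this.store.has(op.id) ? this.store.get(op.id) : null;
    }
    return { ok: 1 }; // getLastError: everything queued before it was handled
  }

  // Barrier: send getLastError on every connection and drain up to its reply.
  flushAll() {
    for (let conn = 0; conn < this.queues.length; conn++) {
      this.send(conn, { type: 'getLastError' });
      while (this.queues[conn].length > 0) this.step(conn);
    }
  }
}

// Without a barrier: the save is still queued on connection 0 when the
// findOne on connection 1 is handled.
const racy = new FakePool(2);
racy.send(0, { type: 'save', id: 1, doc: { name: 'x' } });
racy.send(1, { type: 'findOne', id: 1 });
const staleRead = racy.step(1);

// With a barrier across all connections before the read.
const flushed = new FakePool(2);
flushed.send(0, { type: 'save', id: 1, doc: { name: 'x' } });
flushed.flushAll();
flushed.send(1, { type: 'findOne', id: 1 });
const freshRead = flushed.step(1);

console.log(staleRead, freshRead);
```

The barrier costs one round trip per connection rather than one per write, which is the trade-off being argued for with a bulk load of this size.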

christkv

Sep 19, 2012, 3:07:50 PM
to node-mong...@googlegroups.com
There's no way to have your cake and eat it too. The only way you can guarantee write/read consistency is getLastError (safe=true) on each insert, or possibly a pool with only a single connection. You could set up X db instances, each with a pool of 1, to work around this.

matt.campbell....@gmail.com

Apr 24, 2013, 10:12:31 PM
to node-mong...@googlegroups.com
Yes, given that node.js runs in a single thread, it probably makes sense not to have any checkout functionality and to bundle the insert/update with a getLastError sequentially on the connection.

However, the next step to get more out of the node.js driver will be to use multiple threads for connections to increase throughput. So I suppose this will be the challenge if the project ever goes that route, because you will need to be able to check out and release connections, and also time out connections which have been idle for x number of seconds.

christkv

Apr 26, 2013, 2:30:24 AM
to node-mong...@googlegroups.com
Not really a problem, since node.js is probably never going to support threads, both because there is no way to synchronize threads in JavaScript and because of philosophical decisions.

matt.campbell....@gmail.com

Apr 26, 2013, 4:53:20 AM
to node-mong...@googlegroups.com
Sorry, I meant using multiple child processes.

Given that mongodb blocks on a connection and processes each message sent on the connection in order, do you see a need to use child processes to achieve more throughput? I assume that the more connections you have to mongod, the better they would be handled by child processes, which can access the other cores? In most cases the current bottleneck isn't going to be network IO but CPU time, given that node runs on a single core.

I know you would say move it up a level and run multiple node apps, but to me a better solution would be to move the pools to child processes, which can handle the IO while the main node process handles incoming requests.

I'd be interested in your thoughts on the architecture around pools. At what point does node.js running on a single core suffer when you open a number of connections to achieve throughput? For example, is it worth considering moving pools to child processes so that requests and responses over connection pools can be performed simultaneously?

christkv

Apr 26, 2013, 4:57:07 AM
to node-mong...@googlegroups.com
You cannot share objects across processes in node.js; each process gets its own pool, so it's not needed. I think you are confusing threads (which share memory with their parent process) with processes (which have completely isolated memory and resources).

matt.campbell....@gmail.com

Apr 26, 2013, 5:03:51 AM
to node-mong...@googlegroups.com
I'm not talking about sharing an object between processes, just passing it via IPC, similar to how node.js cluster works.

I guess I raise this because I like the feature in pymongo where you can check out a connection and use it for a bunch of things, so you are guaranteed to read your own writes because they are piped onto the same connection and processed in order.

I could be wrong, and correct me if I am, but doesn't the node driver just randomly allocate a connection for each command?
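
As a rough sketch of what such a checkout could look like (purely hypothetical: `CheckoutPool`, `checkout`, and `release` are made-up names, not pymongo's or the node driver's API), a caller pins one connection, sends the write and the read on it, then hands it back:

```javascript
// Hypothetical pool with explicit checkout/release; not a real driver API.
class CheckoutPool {
  constructor(size) {
    // Each fake connection just records what was sent on it, in order.
    this.idle = Array.from({ length: size }, (_, id) => ({ id, sent: [] }));
  }

  // Reserve a connection for exclusive use by one caller.
  checkout() {
    const conn = this.idle.pop();
    if (conn === undefined) throw new Error('no idle connection available');
    return conn;
  }

  // Return the connection so other callers can use it.
  release(conn) {
    this.idle.push(conn);
  }
}

const pool = new CheckoutPool(5);
const conn = pool.checkout();
try {
  conn.sent.push('save');    // the write goes out on the pinned connection
  conn.sent.push('findOne'); // the read is queued behind it on the same one
} finally {
  pool.release(conn); // always hand the connection back
}

console.log(conn.id, conn.sent);
```

Because both commands travel on the same connection and the server processes a connection's messages in order, the read cannot overtake the write.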

christkv

Apr 26, 2013, 5:33:52 AM
to node-mong...@googlegroups.com
Unless I've missed something, you can only pass incoming server connections over IPC between child processes, so this would not apply to the driver library, which uses client connections.

matt.campbell....@gmail.com

Apr 26, 2013, 9:36:58 AM
to node-mong...@googlegroups.com
I'm sorry, I don't understand what you mean.

You communicate between parent and child process.

At present, creating a connection pool lets us send more commands to mongodb without getting queued behind a long-running process on one connection (although the connection allocation could be better). However, as each connection receives a reply, we can only process one callback at a time. This callback may involve running more commands, not necessarily just returning an answer via the response.

I would have thought being able to split this processing up over a number of processes would assist throughput.

Maybe we just misunderstand each other?
