Re: Updating data in a collection and some "collection version" simultaneously

Jeremy Mikola

Sep 21, 2012, 3:02:31 PM

to mongod...@googlegroups.com

You mention tracking the version of an entire collection, and then querying that to effectively send back snapshots of changes to documents between "now" and the most recent version. How do you intend to formulate a "partial update" diff?

My first approach would be to track "updatedAt" times on documents within the collection and have a frontend that periodically polls the backend and queries for recently updated documents (sent back in their entirety, if the sizes are reasonable).
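A minimal sketch of the "updatedAt" polling approach, assuming every write stamps the document with the server's time. `Store`, `upsert`, `updated_since` and `poll` are hypothetical names; a real backend would run a MongoDB query such as `{"updatedAt": {"$gt": since}}` instead of scanning an in-memory dict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the 'movies' collection."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc_id, fields, now):
        doc = self.docs.setdefault(doc_id, {"_id": doc_id})
        doc.update(fields)
        doc["updatedAt"] = now  # bump the timestamp on every write

    def updated_since(self, since):
        # Whole documents modified strictly after the client's last poll
        # (the equivalent of {"updatedAt": {"$gt": since}}).
        return [dict(d) for d in self.docs.values() if d["updatedAt"] > since]


def poll(store, last_seen):
    """One polling round: fetch changed docs and advance the high-water mark."""
    changed = store.updated_since(last_seen)
    if changed:
        last_seen = max(d["updatedAt"] for d in changed)
    return changed, last_seen


# Usage: only the movie written after t0 comes back from the poll.
t0 = datetime(2012, 9, 21, tzinfo=timezone.utc)
store = Store()
store.upsert(1, {"title": "Alien"}, t0)
store.upsert(2, {"title": "Heat"}, t0 + timedelta(seconds=5))
changed, mark = poll(store, t0)
```

The client only keeps a high-water mark, so a document written later with exactly the same timestamp as the mark, or clock skew between writers, can still slip through; treat this as a starting point rather than a complete protocol.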

For a more intelligent approach, https://github.com/julianbrowne/rtsdemo is one example of using tailable cursors to feed data from a capped collection back to the web browser (via WebSockets). It starts off with an arbitrary capped collection, but then looks at following the oplog, which seems more relevant to what you're trying to do. Feeding back data from the oplog would give you the individual changes that you'd want to apply to the data represented in the frontend. Of course, this might require interpreting the various atomic operators that MongoDB uses in its update queries.
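Interpreting those atomic operators on the client might look like the following sketch. It assumes the oplog entry shape MongoDB used at the time (an `op` code of `i`, `u` or `d`, the inserted document or the update modifier in `o`, and the updated document's `_id` selector in `o2`) and handles only `$set`, `$unset` and `$inc` on nested objects (no arrays). `apply_update` and `apply_oplog_entry` are hypothetical helpers, not part of any driver.

```python
import copy


def _parent(doc, path, create):
    """Return (parent_dict, last_key) for a dotted path like 'meta.year'."""
    keys = path.split(".")
    for k in keys[:-1]:
        if k not in doc:
            if not create:
                return None, keys[-1]
            doc[k] = {}
        doc = doc[k]
    return doc, keys[-1]


def apply_update(doc, update):
    """Apply a small subset of MongoDB update operators to a plain dict.
    A modifier with no $-prefixed keys is treated as a full replacement."""
    if not any(k.startswith("$") for k in update):
        return {"_id": doc.get("_id"), **update}
    out = copy.deepcopy(doc)
    for op, fields in update.items():
        if op not in ("$set", "$unset", "$inc"):
            raise NotImplementedError(f"operator {op} not handled in this sketch")
        for path, value in fields.items():
            parent, key = _parent(out, path, create=(op != "$unset"))
            if parent is None:
                continue  # $unset of a path that does not exist is a no-op
            if op == "$set":
                parent[key] = value
            elif op == "$unset":
                parent.pop(key, None)
            else:  # $inc
                parent[key] = parent.get(key, 0) + value
    return out


def apply_oplog_entry(docs, entry):
    """Apply one oplog-shaped entry to a client-side {_id: doc} cache."""
    op = entry["op"]
    if op == "i":
        docs[entry["o"]["_id"]] = entry["o"]
    elif op == "u":
        _id = entry["o2"]["_id"]
        if _id in docs:  # ignore updates to documents the client never loaded
            docs[_id] = apply_update(docs[_id], entry["o"])
    elif op == "d":
        docs.pop(entry["o"]["_id"], None)
```

The server may rewrite operators when logging them (for example, recording the resulting value with `$set` instead of an `$inc`), so a real client has to cope with whatever forms its server version actually emits.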

As another idea, the interface you're describing reminds me of the functionality provided by MeteorJS, which is explained rather succinctly here: http://stackoverflow.com/questions/10214385/how-does-meteor-js-work

Николай Кучумов

Sep 21, 2012, 3:57:12 PM

to mongod...@googlegroups.com

Yay, first reply.

Let's say 'movies' is version 1.

Then I do 'movies'.update({}, { $set: { blah: 'blah' }}), and once this update is applied, the 'movies' version becomes 2.

If I were querying 'movies' while it was being updated, I'd get stale results, but I'd also have a way to detect this staleness, because the 'movies'.find({}) query would also return { ..., version: 1 }.

Then, on the client, I'd periodically (say, once every 10 seconds) query the current version of 'movies', and if it's greater than mine, I'd request partial updates for the missing versions.

In this case, I'd request the partial update 'movies'.update({}, { $set: { blah: 'blah' }}).

You are right here: this technique would require parsing MongoDB's expression language, but that is manageable as long as the update queries stay simple.

So then, in the browser, I would reapply this update to the <ul/> element listing the movies displayed on the page, and would therefore have an exact copy of the data residing on the MongoDB server.
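The collection-version scheme described here can be sketched in-process as follows. `VersionedCollection`, `Client`, `changes_since` and `sync` are hypothetical names; only `$set` with equality selectors is supported, and `{}` is assumed to match every document (in the MongoDB shell, `update()` touched a single document unless the multi flag was passed).

```python
import copy


def matches(doc, selector):
    # Equality-only matching; {} matches every document (assumption).
    return all(doc.get(k) == v for k, v in selector.items())


def apply_set(doc, update):
    # Only top-level $set, enough for the example in the thread.
    doc.update(update.get("$set", {}))


class VersionedCollection:
    """Hypothetical server side: every update bumps a collection-wide
    version and is appended to a change log."""

    def __init__(self, docs):
        self.docs = {d["_id"]: d for d in docs}
        self.version = 1
        self.log = []  # list of (version, selector, update)

    def update(self, selector, update):
        for doc in self.docs.values():
            if matches(doc, selector):
                apply_set(doc, update)
        self.version += 1
        self.log.append((self.version, selector, update))

    def find(self, selector=None):
        docs = [copy.deepcopy(d) for d in self.docs.values() if matches(d, selector or {})]
        return self.version, docs  # results stamped with the version they reflect

    def changes_since(self, version):
        return [entry for entry in self.log if entry[0] > version]


class Client:
    def __init__(self, server):
        self.server = server
        self.version, docs = server.find()
        self.docs = {d["_id"]: d for d in docs}

    def sync(self):
        # Cheap version check first; fetch missing log entries only if behind.
        if self.server.version <= self.version:
            return 0
        entries = self.server.changes_since(self.version)
        for version, selector, update in entries:
            for doc in self.docs.values():
                if matches(doc, selector):
                    apply_set(doc, update)
            self.version = version
        return len(entries)
```

Because everything runs in one process, bumping `version` together with the data is trivially atomic here; doing the same across many MongoDB documents is exactly the isolation problem the thread runs into, since MongoDB at the time had no multi-document transactions. The log also grows without bound, so a real system would need to trim it and fall back to a full reload for clients that fall too far behind.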

I have already thought about an "updatedAt" property, but since I need to paginate on the client, the fetched batch of data can have old "updatedAt" values while some far-away 'movie' has a fresh "updatedAt" value, and I would never know about it, because that freshly updated movie just isn't in the current query results.

Thanks for the link to "rtsdemo". I hadn't heard of it; I'll read it tomorrow.

As for the Meteor framework, I've heard of it, but I'm kind of skeptical about it, since it just got a couple of million dollars in investment (if I'm not mistaken).

I'm usually skeptical about all 'frameworks'.

Maybe because I just prefer writing my own :)

Jeremy Mikola

Sep 21, 2012, 4:07:28 PM

to mongod...@googlegroups.com

On Friday, September 21, 2012 3:57:12 PM UTC-4, Николай Кучумов wrote:

Yay, first reply.

Let's say 'movies' is version 1.
Then I do 'movies'.update({}, { $set: { blah: 'blah' }}), and once this update is applied, the 'movies' version becomes 2.
If I were querying 'movies' while it was being updated, I'd get stale results, but I'd also have a way to detect this staleness, because the 'movies'.find({}) query would also return { ..., version: 1 }.

Ah, so you're intending to store version information on the documents themselves? ElasticSearch does this (for optimistic concurrency control), which allows it to avoid updating stale data. In your case, I would consider using snapshotted queries if you are concerned about documents being updated while you're querying. That would at least ensure that your query returns all documents as they are at the time the query was executed (vs. changes occurring over the lifetime of the cursor).
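A per-document version used for optimistic concurrency control, in the spirit of what ElasticSearch does, can be sketched as a compare-and-set. `DocStore`, `update_if_version` and `VersionConflict` are hypothetical names; the comment notes how the same check is commonly expressed in MongoDB by putting the expected version into the update's selector.

```python
class VersionConflict(Exception):
    """Raised when a document changed since the caller last read it."""


class DocStore:
    """In-memory stand-in; each document carries its own 'version' field."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        self.docs[doc["_id"]] = {**doc, "version": 1}

    def get(self, _id):
        return dict(self.docs[_id])

    def update_if_version(self, _id, expected, fields):
        # Compare-and-set: apply the change only if nobody else has written
        # since the caller read the document. In MongoDB this is roughly an
        # update whose selector is {_id: ..., version: expected} with
        # $inc on version, followed by a check that one document matched.
        doc = self.docs.get(_id)
        if doc is None or doc["version"] != expected:
            raise VersionConflict(_id)
        doc.update(fields)
        doc["version"] += 1
        return doc["version"]


# Usage: two readers see version 1; the second writer loses the race.
store = DocStore()
store.insert({"_id": 1, "title": "Alien"})
a, b = store.get(1), store.get(1)
store.update_if_version(1, a["version"], {"title": "Aliens"})
```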

Then, on the client, I'd periodically (say, once every 10 seconds) query the current version of 'movies', and if it's greater than mine, I'd request partial updates for the missing versions.
In this case, I'd request the partial update 'movies'.update({}, { $set: { blah: 'blah' }}).

Is that partial update something you intend to store within the document? It seems like something you could already obtain by tailing the local.oplog.$main collection.

You are right here: this technique would require parsing MongoDB's expression language, but that is manageable as long as the update queries stay simple.
So then, in the browser, I would reapply this update to the <ul/> element listing the movies displayed on the page, and would therefore have an exact copy of the data residing on the MongoDB server.

I have already thought about an "updatedAt" property, but since I need to paginate on the client, the fetched batch of data can have old "updatedAt" values while some far-away 'movie' has a fresh "updatedAt" value, and I would never know about it, because that freshly updated movie just isn't in the current query results.

In this case, it seems like you may need to update even staler versions of documents that are currently off-screen on other pages. I was under the impression that the goal was to keep all models in the frontend in sync, which would include updating documents that may not be visible on the current page.

Николай Кучумов

Sep 21, 2012, 4:12:44 PM

to mongod...@googlegroups.com

Thank you very much, Jeremy.

I have so much to investigate now.

On Saturday, September 22, 2012 12:07:28 AM UTC+4, Jeremy Mikola wrote:

Николай Кучумов

Sep 22, 2012, 5:21:07 AM

to mongod...@googlegroups.com

Ah, so you're intending to store version information on the documents themselves? ElasticSearch does this (for optimistic concurrency control), which allows it to avoid updating stale data. In your case, I would consider using snapshotted queries if you are concerned about documents being updated while you're querying. That would at least ensure that your query returns all documents as they are at the time the query was executed (vs. changes occurring over the lifetime of the cursor).

Not exactly on the documents, but on the collection itself. And this 'current collection version' info would be automatically attached to any data retrieved by querying the collection.

Anyway, having read the link about snapshotting you provided, it seems to me that MongoDB, at its current stage of development, is unable to support a feature like mirroring a database on the client. The docs also say that new data can still get in the way even with find().snapshot(): "Even with snapshot mode, items inserted or deleted during the query may or may not be returned; that is, this mode is not a true point-in-time snapshot."

Perhaps such isolation would come at a high cost, which would be unacceptable in high-load environments.

In this case, it seems like you may need to update even staler versions of documents that are currently off-screen on other pages. I was under the impression that the goal was to keep all models in the frontend in sync, which would include updating documents that may not be visible on the current page.

The documents that are not currently displayed on the screen wouldn't be updated, because they would be queried from the database when the user scrolls down to them, so we would get (with 99.99% likelihood) their newest versions.

The documents at the top of the page would get updated as intended, with partial updates.

Accessing the "oplog" is a very interesting capability, but I don't think it would help much, since we wouldn't know which version the queried data has: do we need to apply 'this' update or 'that' update to bring our data up to date?

The update operation's version (already present in the oplog) would help, I think, if it were also somehow automatically attached to the queried data, so that we would know which oplog entries need to be applied to the data and which don't.

Anyway, this was a very interesting discussion for me (even though I've given up the idea of 'real-time data' for now because of its complexity and moved on to other tasks).

I even realized that I need to read the MongoDB docs section from title page to colophon, since it describes so many interesting things I didn't even know about.