Implementing transparent full document history

Jimmie Butler

Feb 15, 2018, 4:20:13 PM

to mongodb-user

I'm curious whether any projects out there already do this, and whether it's something that would be achievable with MongoDB.

Qualities

- Space efficient: it doesn't just copy the full document.

- Querying a point-in-time version of the document doesn't need to be as fast as querying the current document.

Essentially I'd like to be able to do something like

query({_id: "123", pointInTime: "2017-12-18 18:00:50"})

Guessing this is something that would largely happen at the software layer rather than via mongo config. What would the data structure and design of such a system look like?

Wan Bachtiar

Feb 19, 2018, 6:59:16 PM

to mongodb-user

Guessing this is something that would largely happen at the software layer rather than via mongo config. What would the data structure and design of such a system look like?

Hi Jimmie,

Generally, the document/schema design depends on how your application uses the data, i.e. how it queries, updates, and processes it. See also Data Model Examples and Patterns.

Based on the information you provided, one idea is to use an approach similar to an event-driven document structure, storing each change as its own document.
For example, suppose you have the following documents in a collection called events:

{"myid" : 1, "doc" : { "a" : 1, "b" : 2, "c" : 3 }, "t" : ISODate("2018-01-01T00:00:00Z") }
{"myid" : 1, "doc" : { "c" : 10 }, "t" : ISODate("2018-01-02T00:00:00Z") }
{"myid" : 1, "doc" : { "d" : 9 }, "t" : ISODate("2018-01-03T00:00:00Z") }
{"myid" : 2, "doc" : { "a" : 2, "b" : 8 }, "t" : ISODate("2018-01-01T00:00:00Z") }
{"myid" : 1, "doc" : { "a" : null }, "t" : ISODate("2018-01-04T00:00:00Z") }
{"myid" : 1, "doc" : { "c" : 0 }, "t" : ISODate("2018-01-05T00:00:00Z") }
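
To show how an application might produce partial documents like the ones above, here is a minimal sketch. The helper name diffDoc is hypothetical, and the convention of recording a removed field as null is an assumption; the thread does not say what the { "a" : null } event represents. It compares top-level fields only, since $mergeObjects replaces nested subdocuments wholesale rather than merging them.

```javascript
// Hypothetical helper: returns only the top-level fields of `next` that differ from `prev`.
function diffDoc(prev, next) {
    const delta = {};
    for (const key of Object.keys(next)) {
        // JSON comparison is a shortcut that is only adequate for plain JSON-like values.
        if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
            delta[key] = next[key];
        }
    }
    for (const key of Object.keys(prev)) {
        // Assumption: a field that disappeared is recorded as null.
        if (!(key in next)) {
            delta[key] = null;
        }
    }
    return delta;
}

const before = { a: 1, b: 2, c: 3 };
const after = { a: 1, b: 2, c: 10 };
const delta = diffDoc(before, after); // { c: 10 }

// The event document that would be stored, in the same shape as the examples above.
const changeEvent = { myid: 1, doc: delta, t: new Date() };
```

In the mongo shell, changeEvent could then be written with db.events.insertOne(changeEvent).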

You can run the following aggregation pipeline to retrieve the value of doc at a particular point in time:

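db.events.aggregate([
    // Keep only the events for this document up to the requested point in time.
    {"$match": {myid: 1, t: {"$lte": ISODate("2018-01-04")}}},
    // Oldest first, so later events overwrite earlier ones when merged.
    {"$sort": {"t": 1}},
    // Collect the partial documents in order and remember the latest timestamp.
    {"$group": {_id: "$myid", events: {"$push": "$doc"}, t: {"$last": "$t"}}},
    // Fold the partial documents into a single document.
    {"$project": {
        "doc": {"$mergeObjects": "$events"},
        "t": 1
    }}
]);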

Note: the $mergeObjects expression is new in MongoDB v3.6.
The pipeline should give you a result like the one below:

{"_id": 1, "t": ISODate("2018-01-04T00:00:00Z"), "doc": {"a": null, "b": 2, "c": 10, "d": 9 }}
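
For intuition, the same fold can be sketched in plain JavaScript. This is only an illustration of what the pipeline computes, not a replacement for it; the function name docAt is hypothetical, and JavaScript Date values stand in for ISODate.

```javascript
// Hypothetical in-application equivalent of the pipeline, for illustration only.
function docAt(events, myid, pointInTime) {
    return events
        .filter((e) => e.myid === myid && e.t <= pointInTime)   // like $match
        .sort((x, y) => x.t - y.t)                               // like $sort on t ascending
        .reduce((acc, e) => Object.assign(acc, e.doc), {});      // like $mergeObjects: later fields win
}

const events = [
    { myid: 1, doc: { a: 1, b: 2, c: 3 }, t: new Date("2018-01-01T00:00:00Z") },
    { myid: 1, doc: { c: 10 },            t: new Date("2018-01-02T00:00:00Z") },
    { myid: 1, doc: { d: 9 },             t: new Date("2018-01-03T00:00:00Z") },
    { myid: 2, doc: { a: 2, b: 8 },       t: new Date("2018-01-01T00:00:00Z") },
    { myid: 1, doc: { a: null },          t: new Date("2018-01-04T00:00:00Z") },
    { myid: 1, doc: { c: 0 },             t: new Date("2018-01-05T00:00:00Z") }
];

const snapshot = docAt(events, 1, new Date("2018-01-04T00:00:00Z"));
// snapshot: { a: null, b: 2, c: 10, d: 9 }
```

The event from 2018-01-05 is excluded by the time filter, which is why c is 10 rather than 0.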

Regards,
Wan.

Jimmie Butler

Feb 22, 2018, 11:19:23 AM

to mongodb-user

Thank you.

This is what I was after. The aggregation pipeline would handle this fairly efficiently with the right indexes I assume?

Wan Bachtiar

Mar 4, 2018, 7:04:36 PM

to mongodb-user

The aggregation pipeline would handle this fairly efficiently with the right indexes I assume?

Hi Jimmie,

Depending on your use case, with appropriate indexes and hardware resources it should be fairly efficient.
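
As a rough sketch of what an appropriate index could look like for the events example earlier in the thread: a compound index on myid and t can serve both the equality match on myid and the range filter and sort on t. The variable name historyIndex is hypothetical, and whether this index suits a real workload is an assumption to verify with explain output.

```javascript
// Equality field (myid) first, then the range/sort field (t).
const historyIndex = { myid: 1, t: 1 };

// `db` is only defined inside the mongo shell; the guard lets this file load elsewhere.
if (typeof db !== "undefined") {
    db.events.createIndex(historyIndex);

    // Check the plan: the $match/$sort stages should report an index scan (IXSCAN)
    // rather than a collection scan (COLLSCAN).
    printjson(db.events.explain().aggregate([
        {"$match": {myid: 1, t: {"$lte": ISODate("2018-01-04")}}},
        {"$sort": {"t": 1}}
    ]));
}
```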