I have the following use case:
an event tracking / analysis system that uses MongoDB as the central event store. Business requirements state that data should be kept for at least 2 years, but there is no defined retention policy.
What sets my case apart from most of the case studies I've read so far on event analytics and reporting with MongoDB is that those cases showcase well-defined event models with known attribute sets.
A sample event document is shown below:
{
    "_id" : ObjectId("5429776aa980524b8b7be8cf"),
    "appid" : "app.one",
    "uid" : "userX",
    "group" : "Accounts",
    "name" : "AccountCreationSuccess",
    "time" : ISODate("2014-09-14T14:05:16.243Z"),
    "device" : {
        "type" : "fablet",
        "manufacturer" : "DELL",
        "model" : "",
        "resolution" : "1280x800"
    },
    "geo" : {
        "country" : "DE",
        "city" : "Frankfurt",
        "coordinates" : "37.42242,-122.08585"
    },
    "data" : {
        "genre" : "jazz",
        "artist" : "John Coltrane"
    }
}
Unfortunately, in my case it's not easy to pre-aggregate statistical documents: the only stable attributes are 'appid', 'group' and 'name', while the requirements for the analysis part include dynamic filters on arbitrary metadata attributes, e.g. the total number of events from group A and of type (name) B during a given period of time and having attribute 'genre'='jazz', ... N.
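To make the idea concrete, the kind of query I have in mind would translate the user-supplied filters into a $match stage built at runtime. A rough sketch of what I mean (the helper name `build_pipeline` and the filter format are just illustrative, pymongo-style; the pipeline would then be run via `db.events.aggregate(...)`):

```python
from datetime import datetime

def build_pipeline(group, name, start, end, metadata_filters):
    """Build an aggregation pipeline counting events for a given
    group/name within a time window, filtered on arbitrary
    'data.*' metadata attributes supplied at runtime."""
    match = {
        "group": group,
        "name": name,
        "time": {"$gte": start, "$lt": end},
    }
    # Dynamic metadata filters: {"genre": "jazz"} -> {"data.genre": "jazz"}
    for key, value in metadata_filters.items():
        match["data." + key] = value
    return [
        {"$match": match},
        {"$group": {"_id": None, "total": {"$sum": 1}}},
    ]

pipeline = build_pipeline(
    "Accounts", "AccountCreationSuccess",
    datetime(2014, 9, 1), datetime(2014, 10, 1),
    {"genre": "jazz"},
)
# With a live connection: db.events.aggregate(pipeline)
```

Every reporting scenario I can think of reduces to some variation of this, with different stable attributes fixed and a different (unpredictable) set of 'data.*' filters attached.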
The only viable path I can see for now is keeping all event data in a single sharded collection and running aggregation queries with the specified filters. I can't really see any pattern for pre-aggregating data that would cover most of the reporting scenarios to come, so I'm waiting to implement 2-3 reporting features just to be able to start seeing some common ground for analysis.
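To keep those aggregations tolerable on one large collection, my assumption is that a compound index with the stable attributes as prefix would at least narrow the scan, with the arbitrary 'data.*' filters applied afterwards. Just my guess at the index spec (pymongo-style):

```python
# Assumed compound index: the stable attributes (appid, group, name)
# as the prefix, then time for the range filter. The arbitrary
# 'data.*' attributes can't all be indexed, so those filters would
# be evaluated against the documents selected by the index.
index_spec = [("appid", 1), ("group", 1), ("name", 1), ("time", 1)]
# With a live connection: db.events.create_index(index_spec)
```

Whether this prefix order is right probably depends on which of the stable attributes is most selective in practice.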
Has anyone come across a similar requirement? I'd really like to see how others have worked around this type of 'design problem'.
Thanks a lot in advance!