Using Map/Reduce to replicate RRD Derive


StevenE

Feb 10, 2012, 9:04:01 AM
to mongodb-user
Hi

I was wondering if Map/Reduce could be used to replicate the RRD
derive feature. For example:

I have a simple database structure for tracking a user's time on a
website. The session_time entry is cumulative, i.e. when you search and
sort by session_id and session_time, the first session_time entry will
always be 0 and the last entry will always be the total session time.

This is how the data is saved:

{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 0,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 31,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 61,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 93,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 123,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 154,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 184,
created_at: DATETIME }

It would be very helpful to show the data like this when querying
by date range and user_id:

{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 0,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 31,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 30,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 32,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 30,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 31,
created_at: DATETIME }
{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 30,
created_at: DATETIME }
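
For reference, the transformation being asked for can be sketched outside MongoDB in plain Python. This is only an illustration under assumptions: the function name derive_session_times is made up for this example, documents are plain dicts, and entries within a session are ordered by session_time rather than created_at.

```python
from collections import defaultdict


def derive_session_times(docs):
    """Return copies of docs with session_time replaced by the per-entry delta.

    Assumes session_time is cumulative within each session_id.
    """
    # Group the documents by session so each session is derived on its own.
    by_session = defaultdict(list)
    for doc in docs:
        by_session[doc["session_id"]].append(doc)

    result = []
    for session_id in sorted(by_session):
        entries = sorted(by_session[session_id], key=lambda d: d["session_time"])
        previous_total = 0
        for entry in entries:
            derived = dict(entry)  # copy, so the input is left untouched
            derived["session_time"] = entry["session_time"] - previous_total
            previous_total = entry["session_time"]
            result.append(derived)
    return result


docs = [
    {"user_id": "xxx", "session_id": "1234", "session_time": t}
    for t in (0, 31, 61, 93, 123, 154, 184)
]
print([d["session_time"] for d in derive_session_times(docs)])
# prints [0, 31, 30, 32, 30, 31, 30]
```

The grouping step plays the role of the map phase (emit by session_id) and the per-session loop plays the role of the reduce phase, although a real MongoDB Map/Reduce job would be written in JavaScript.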


Thanks and jolly salutes

Barrie

Feb 10, 2012, 11:57:11 AM
to mongodb-user
Hey Steven,

Technically you can use map reduce for this, but I think it would be a
better decision architecturally to not re-process the data. Instead
set up your application to save documents with the information you
need to query for, i.e. delta (most recent session_time minus the one
before it). Does that make sense?

Barrie

Steven Eksteen

Feb 10, 2012, 12:10:29 PM
to mongod...@googlegroups.com

It does, yes. I did try that a while ago, but I got strange results with that method. Thinking back now, a query caching issue must have caused those results. We'll just go back for round 2 :)

Thanks for the reply and advice
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Steven Eksteen

Feb 11, 2012, 5:19:48 PM
to mongod...@googlegroups.com

Ahh…. I just figured out now why that method didn't work for me last time

Using this example:

If a new session is started this would come in:

{ _id: "1", user_id: "xxx", session_id: "1234", session_time: 0, created_at: DATETIME }

All good.
The next one:

{ _id: "2", user_id: "xxx", session_id: "1234", session_time: 31, created_at: DATETIME }

Here I could just look up the last saved entry and save the difference, i.e. 31 - 0, so save it as is.
Then this one:

{ _id: "3", user_id: "xxx", session_id: "1234", session_time: 61, created_at: DATETIME }

This one too: 61 - 31 = 30, so save it as:

{ _id: "3", user_id: "xxx", session_id: "1234", session_time: 30, created_at: DATETIME }

But then, to properly get the rest, I would have to look up all previous entries for the session, add up their session_time values, and save the difference, keeping in mind that a session can last days. Looking up just the last saved entry would give me incorrect results. For example, if this came in:

{ _id: "xxx", user_id: "xxx", session_id: "1234", session_time: 93, created_at: DATETIME }

I would do 93 - 30 and save 63, which is wrong (the correct difference is 93 - 61 = 32).
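
The arithmetic behind this failure can be checked with a short Python sketch. The variable names are hypothetical and not from the thread: the first loop overwrites each entry with its delta and subtracts the last stored delta from the next incoming total, as in the steps above, while the second loop subtracts the previous cumulative total.

```python
# Cumulative session_time values as they arrive from the client.
incoming_totals = [0, 31, 61, 93]

# Flawed approach: subtract the last *stored* value, which is already a delta.
stored_deltas = []
for total in incoming_totals:
    last_saved = stored_deltas[-1] if stored_deltas else 0
    stored_deltas.append(total - last_saved)

# Correct approach: subtract the previous *cumulative* total.
correct_deltas = []
previous_total = 0
for total in incoming_totals:
    correct_deltas.append(total - previous_total)
    previous_total = total

print(stored_deltas)   # prints [0, 31, 30, 63]
print(correct_deltas)  # prints [0, 31, 30, 32]
```

The two approaches agree for the first three entries only because the first delta equals the first non-zero total; from the fourth entry on they diverge.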

It seems like a lot of overhead to add up all the session_time values on every request, seeing as one account can post as frequently as every minute and there are thousands of accounts. I would rather take the calculation hit once in a while, using Map/Reduce when an account report is generated.

Currently I am using Ruby to make the calculations, but reports are starting to take noticeably longer to generate.

Barrie

Feb 11, 2012, 6:46:31 PM
to mongodb-user
I definitely see how that won't work past the first three records in
your collection. To correct for that, you can add another field to
each record, let's call it session_total, and let that value be the
cumulative session time. Then, when you're computing the delta for the
newest record, you'd subtract the prior record's session_total from the
incoming cumulative value, instead of subtracting its session_time. Make sense?
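
A minimal Python sketch of this suggestion follows, under stated assumptions: build_record is a hypothetical helper, records are plain dicts, and the "prior record" is simply the last item of an in-memory list. In a real application it would come from a query for the latest record of that session_id.

```python
def build_record(previous_record, user_id, session_id, reported_total):
    """Build the document to save for one incoming cumulative value.

    session_total keeps the cumulative value; session_time holds the delta.
    """
    previous_total = previous_record["session_total"] if previous_record else 0
    return {
        "user_id": user_id,
        "session_id": session_id,
        "session_total": reported_total,
        "session_time": reported_total - previous_total,
    }


records = []
for total in (0, 31, 61, 93, 123, 154, 184):
    last = records[-1] if records else None  # stand-in for a "latest record" lookup
    records.append(build_record(last, "xxx", "1234", total))

print([r["session_time"] for r in records])
# prints [0, 31, 30, 32, 30, 31, 30]
```

Only one prior record is read per write, so the cost stays constant no matter how long a session lasts.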

Barrie

