Reporting database implementation

MattK

Dec 31, 2010, 2:42:12 AM

to mongodb-user

For reporting off of OLTP databases with an RDBMS, it is common to
create a separate read-only database to separate transactional and
reporting workloads. What procedures can be utilized with MongoDB to
produce the equivalent?

Additionally, I have seen the map-reduce implementation in MongoDB
described as more for ETL processes, not as well suited for
transactional application queries (which is why MongoDB has a dynamic
query language). Are there examples of how ETL procedures are written
in map-reduce?

Eliot Horowitz

Dec 31, 2010, 9:24:16 AM

to mongod...@googlegroups.com

> For reporting off of OLTP databases with an RDBMS, it is common to
> create a separate read-only database to separate transactional and
> reporting workloads. What procedures can be utilized with MongoDB to
> produce the equivalent?

What are you actually trying to achieve? Is it security, isolation,
performance?

> Additionally, I have seen the map-reduce implementation in MongoDB
> described as more for ETL processes, not as well suited for
> transactional application queries (which is why MongoDB has a dynamic
> query language). Are there examples of how ETL procedures are written
> in map-reduce?

Most map-reduces are ETL, so is there a specific example you're looking for?


MattK

Dec 31, 2010, 3:06:31 PM

to mongodb-user

> What are you actually trying to achieve? Is it security, isolation,
> performance?

Principally performance, isolating reporting workloads from incoming
application transactions.

> Most map-reduces are ETL, so is there a specific example you're looking for?

Since some other engines use m-r as the primary query mechanism, I am
looking for examples of what the m-r implementation in MongoDB is
geared towards, such as pre-aggregating / summarizing data for
reports. One case might be activity counts by user/account per time
unit over a time range.

I have working examples of simple GROUP BY type queries, using both
map-reduce and the native grouping functions, but coming from an RDBMS
background, I am wondering what processes would be used when such
operations are too expensive to run on-demand.
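
For illustration, the "activity counts by user per time unit" case could
be sketched in plain Python as below. This only models the map/reduce
flow and is not MongoDB's API; the field names `user` and `ts`, the
hourly bucket, and all function names are assumptions made for the
example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical activity documents; field names are assumptions.
events = [
    {"user": "alice", "ts": datetime(2010, 12, 31, 9, 5)},
    {"user": "alice", "ts": datetime(2010, 12, 31, 9, 40)},
    {"user": "bob", "ts": datetime(2010, 12, 31, 9, 15)},
    {"user": "alice", "ts": datetime(2010, 12, 31, 10, 2)},
]


def map_event(doc):
    """Emit one (key, value) pair per document; key is (user, hour bucket)."""
    hour = doc["ts"].replace(minute=0, second=0, microsecond=0)
    yield (doc["user"], hour), 1


def reduce_counts(key, values):
    """Sum partial counts; the result has the same shape as each input value."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group to one value."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


hourly_counts = map_reduce(events, map_event, reduce_counts)
```

A report can then read `hourly_counts` directly instead of grouping the
raw events on demand.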


Andreas Jung

Dec 31, 2010, 3:07:24 PM

to mongod...@googlegroups.com

MattK wrote:
>> What are you actually trying to achieve? Is it security, isolation,
>> performance?
>
> Principally performance, isolating reporting workloads from incoming
> application transactions.

MongoDB does not support transactions.

-aj

MattK

Dec 31, 2010, 3:53:31 PM

to mongodb-user

On Dec 31, 3:07 pm, Andreas Jung <li...@zopyx.com> wrote:
> MongoDB does not support transactions.

Across multiple documents, no, it does not. For multiple operations on
a single document, it does to an extent.

But my question is not about _database_ transactions; it is about
separation of workload: isolating the application(s) and user activity
from reporting / analytics.

Scott Hernandez

Dec 31, 2010, 3:57:21 PM

to mongod...@googlegroups.com

That can easily be done using a replica or slave of the data.

Eliot Horowitz

Dec 31, 2010, 4:19:35 PM

to mongod...@googlegroups.com

Generally what people do is run map/reduce with an "out" parameter.

The way that works is it builds up the entire collection, and then
atomically swaps out the old copy for the new one.

So if you want to generate charts and update once an hour, you always
have a copy around, and just have a background job creating new copies
hourly.
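
The build-then-swap pattern can be sketched in plain Python as below.
This is a model of the idea rather than MongoDB code: the `collections`
dict stands in for a database, and the names `rebuild_report`,
`count_by_user`, and the `tmp.` prefix are assumptions for the example.

```python
from collections import Counter


def count_by_user(docs):
    """Summarize raw documents into {user: count}."""
    return dict(Counter(doc["user"] for doc in docs))


def rebuild_report(collections, source, target, summarize):
    """Build the new report off to the side, then swap it in under `target`."""
    temp_name = "tmp." + target
    collections[temp_name] = summarize(collections[source])
    # Readers of `target` keep seeing the old copy until this rebinding,
    # which stands in for the atomic swap described above.
    collections[target] = collections.pop(temp_name)


collections = {
    "events": [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}],
    "report": {"alice": 1},  # stale copy left by the previous run
}

# A background job would call this on a schedule, for example hourly.
rebuild_report(collections, "events", "report", count_by_user)
```

Because the old report stays in place while the new one is built, a
chart that reads `report` never sees a half-finished result.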

MattK

Dec 31, 2010, 5:04:20 PM

to mongodb-user

On Dec 31, 3:57 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> That can easily be done using a replica or slave of the data.

On Dec 31, 4:19 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Generally what people do is run map/reduce with an "out" parameter.

Is there a mechanism to read from a replica/slave, and write the
results of a map/reduce "out" into another collection / database?

I know this could be accomplished by a script that reads a collection
from one database, and writes it into another, but I am wondering if
MongoDB supports cross-database queries.

Joseph Friesen

Jan 1, 2011, 1:11:05 PM

to mongodb-user

If I may interject, Eliot, is there a way for MongoDB to do the
mapping only on the new/modified records in a collection? I would be
concerned that, for instance, on a fast-growing append-only collection,
the hourly map-reduce job in your example would quickly grow in running
time and after a while take more than an hour.

Now I generally have the impression that map is much more expensive
than reduce. Likewise the mapping will logically not change in an
append only collection, assuming the map reduce spec doesn't change.
It would thus make sense if the mapping data could be saved and then
be added to, after which the reduction is done again, rather than
redoing the whole mapping. Although I've never worked with CouchDB,
from the gist I get of the documentation it works in this way. Can
MongoDB as well?

Regards,
Joseph Friesen


Scott Hernandez

Jan 1, 2011, 1:45:17 PM

to mongod...@googlegroups.com

On Sat, Jan 1, 2011 at 10:11 AM, Joseph Friesen <frie...@gmail.com> wrote:
> If I may interject, Eliot, is there a way for MongoDB to do the
> mapping only on the new/modified records in a collection? I would be
> concerned that, for instance, on a fast-growing append-only collection,
> the hourly map-reduce job in your example would quickly grow in running
> time and after a while take more than an hour.

You can use a query with map/reduce to only process new documents.

>
> Now I generally have the impression that map is much more expensive
> than reduce. Likewise the mapping will logically not change in an
> append only collection, assuming the map reduce spec doesn't change.
> It would thus make sense if the mapping data could be saved and then
> be added to, after which the reduction is done again, rather than
> redoing the whole mapping. Although I've never worked with CouchDB,
> from the gist I get of the documentation it works in this way. Can
> MongoDB as well?

There are output options that reduce new results into the existing
target collection, much like you describe (new in 1.7.4+):
http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions
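
As a rough model of combining the two suggestions (a query that selects
only new documents, plus reducing into the existing target), here is a
plain-Python sketch. It is not MongoDB code; the `seq` field, the
`last_seen` marker, and the function name `incremental_map_reduce` are
assumptions for the example.

```python
from collections import defaultdict


def incremental_map_reduce(docs, query, map_fn, reduce_fn, target):
    """Map only documents matching `query`, then fold results into `target`."""
    groups = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in map_fn(doc):
                groups[key].append(value)
    for key, values in groups.items():
        new_value = reduce_fn(key, values)
        if key in target:
            # Reduce the stored value together with the new one, so the
            # reduce function must accept its own output as input.
            new_value = reduce_fn(key, [target[key], new_value])
        target[key] = new_value
    return target


events = [
    {"seq": 1, "user": "alice"},
    {"seq": 2, "user": "bob"},
    {"seq": 3, "user": "alice"},
    {"seq": 4, "user": "alice"},
]
counts = {"alice": 1, "bob": 1}  # result of an earlier run covering seq <= 2
last_seen = 2

incremental_map_reduce(
    events,
    query=lambda doc: doc["seq"] > last_seen,  # only documents added since
    map_fn=lambda doc: [(doc["user"], 1)],
    reduce_fn=lambda key, values: sum(values),
    target=counts,
)
```

Each run then costs time proportional to the new documents rather than
to the whole collection, provided the job records where it left off.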


MattK

Jan 3, 2011, 7:01:34 AM

to mongodb-user

Back to the thread topic...

Is there a mechanism to read from a replica/slave, and write the
results of a map/reduce "out" into another collection / database?

I know this could be accomplished by a script that reads a collection
from one database, and writes it into another, but I am wondering if
MongoDB supports some form of cross-database queries.

Nat

Jan 3, 2011, 8:00:38 AM

to mongodb-user

The mechanism is there, but there is a caveat. Please watch:
http://jira.mongodb.org/browse/SERVER-2286

It will map/reduce into another collection but, as of now, that
collection has to be in the same database.

MattK

Jan 3, 2011, 12:40:45 PM

to mongodb-user

So, an m/r operation on a slave can currently write into that slave?

If so, it looks like SERVER-2286 might remove that behavior, allowing
only inline results.
