> I don't really understand why this is necessary
The standard output of a Map/Reduce is in fact "reducible": it has the
same shape as the values passed to reduce, so it can be reduced again.
It's actually kind of a nice feature. I think this is best illustrated
by an example.
Let's assume that you are running an on-line widget sales site. You
want to roll up widget sales by state, by day. The output would look
something like this:
{ _id : { day : "2011-02-09", state : "NY" }, value : { num : 5, revenue : 25 } }
{ _id : { day : "2011-02-09", state : "CA" }, value : { num : 7, revenue : 40 } }
{ _id : { day : "2011-02-08", state : "NY" }, value : { num : 3, revenue : 10 } }
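To make that concrete, here is a rough sketch of map and reduce
functions that would produce output of this shape. The raw sale fields
(day, state, price) and the names mapSale, reduceSales and runMapReduce
are made up for illustration, and the small driver only stands in for
the server so the snippet runs on its own. In a real job, map reads the
document from `this` and calls the global emit().

```javascript
// Hypothetical raw sales; the field names are assumptions for illustration.
const sales = [
  { day: "2011-02-09", state: "NY", price: 10 },
  { day: "2011-02-09", state: "NY", price: 15 },
  { day: "2011-02-08", state: "NY", price: 10 },
];

// map: emit one (key, value) pair per sale. The emitted value has the
// same shape as what reduce returns, which keeps the output reducible.
function mapSale(sale, emit) {
  emit({ day: sale.day, state: sale.state }, { num: 1, revenue: sale.price });
}

// reduce: sum the partial values collected for one key.
function reduceSales(key, values) {
  const total = { num: 0, revenue: 0 };
  for (const v of values) {
    total.num += v.num;
    total.revenue += v.revenue;
  }
  return total;
}

// Tiny in-memory driver standing in for the server (not MongoDB code).
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, { key: key, values: [] });
      groups.get(k).values.push(value);
    });
  }
  return [...groups.values()].map(
    (g) => ({ _id: g.key, value: reduce(g.key, g.values) })
  );
}

const rollup = runMapReduce(sales, mapSale, reduceSales);
console.log(JSON.stringify(rollup, null, 2));
```

Each element of rollup has the same { _id, value } layout as the
documents shown above.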
Some benefits of this structure:
- It clarifies how the data is organized. If you look at the "key"
(the _id), it is effectively the "group by" columns of an SQL query.
The key clearly indicates which data is static, while the value holds
the data that was calculated.
- It makes merges easier to program and easier to understand (merges
are new in 1.7.4, 1.8.0)
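Here is a small sketch of why the "reducible" shape makes merges easy.
Because a stored value looks exactly like an emitted value, merging new
results into an existing row is just one more call to the same reduce
function. The reduceSales function and the incoming numbers below are
assumptions for illustration; I believe this is what the "reduce"
output mode does for you on the server, but treat that as my reading
rather than a spec.

```javascript
// Same kind of summing reduce one would use for the roll-up (assumed).
function reduceSales(key, values) {
  const total = { num: 0, revenue: 0 };
  for (const v of values) {
    total.num += v.num;
    total.revenue += v.revenue;
  }
  return total;
}

const key = { day: "2011-02-09", state: "NY" };
const stored = { num: 5, revenue: 25 };   // value already in the output collection
const incoming = { num: 2, revenue: 12 }; // hypothetical value from a later run

// Merging is just reducing the old and new values together.
const merged = reduceSales(key, [stored, incoming]);
console.log(merged); // { num: 7, revenue: 37 }
```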
> ... so that the permanent collection I'm building from m/r has proper indexes and normal query access
From an indexing perspective, you automatically get an index on the
_id field. Given that you rolled up "grouped by" day and state, you're
probably going to want to query by day and state, so you probably
already have the basic index that you want.
If you want additional indexes, you can also index into that object.
For example, you can add an index on _id.day.
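As a sketch, adding that index from the shell could look like the
function below. The collection name widget_sales_by_day and the
function name addDayIndex are made up; `db` is assumed to be the
shell's database handle. ensureIndex is the shell helper as of this
writing (newer shells name it createIndex).

```javascript
// Adds an ascending index on the "day" part of the compound _id.
// `db` is the database handle; the collection name is hypothetical.
function addDayIndex(db) {
  return db.widget_sales_by_day.ensureIndex({ "_id.day": 1 });
}
```

In the shell you would simply call addDayIndex(db) once after the
map/reduce output collection exists.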
> ... normal query access
I think that we need some clarity on "normal query access". The
calculated fields sit under "value", so you query them with dotted
paths such as "value.num" and "value.revenue".
I agree that this is not "normal", but it should be easy to work with.
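For instance, a query against the roll-up might be sketched like this.
The collection name widget_sales_by_day, the function name busyStates
and the threshold parameter are assumptions for illustration; `db` is
assumed to be the shell's database handle.

```javascript
// Finds the roll-up rows for one day with at least `minSales` sales.
// Dotted paths reach into both the compound _id and the value object.
function busyStates(db, day, minSales) {
  return db.widget_sales_by_day.find({
    "_id.day": day,
    "value.num": { $gte: minSales }
  });
}
```

Something like busyStates(db, "2011-02-09", 5) would then return the
matching rows for that day.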
If this does not clarify what's happening, would you be able to
provide a clear example of what you would like to do?
It may be possible to write a JIRA task, or there may be another way
to achieve what you're looking for.
- Gates