Aggregation support - let us make this easy!


Ayende Rahien

Dec 19, 2010, 7:36:32 AM
to ravendb
I have been working on this fancy surprise: https://github.com/ayende/ravendb/tree/auto-map-reduce

Please note, only Count is implemented, and there aren't any tests yet. 
I am not yet sure how to expose this in the client API, either via the Lucene query or (heavens forbid) the Linq provider.

Thoughts?


Given the following documents:

{"Name":"Oren Eini"}
{"Name":"Oren Eini"}
{"Name":"Ayende Rahien"}


Will result in index: Temp/AllDocs/GroupedByCount


from doc in docs
select new { Count = 1 }

from result in results
group result by new {  }
into g
select new
{
Count = g.Sum(x=>x.Count)
}

And the following result:

{
  "Results": [
    {
      "Count": "3"
    }
  ],
  "Includes": [
    
  ],
  "IsStale": false,
  "IndexTimestamp": "\/Date(1292760357925)\/",
  "TotalResults": 1,
  "SkippedResults": 0,
  "IndexEtag": "00000000-0000-1500-0000-00000000000b"
}
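To make the semantics concrete, here is a minimal sketch that runs the same map and reduce over the three sample documents with LINQ to Objects instead of inside RavenDB. The in-memory array and the Program class are assumptions for illustration only; the two query shapes are taken from the index above.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // The three sample documents from the post, held in memory (assumption).
        var docs = new[]
        {
            new { Name = "Oren Eini" },
            new { Name = "Oren Eini" },
            new { Name = "Ayende Rahien" },
        };

        // Map: every document contributes a Count of 1.
        var mapped = from doc in docs
                     select new { Count = 1 };

        // Reduce: the empty key makes all entries equal, so they land in one group.
        var reduced = from result in mapped
                      group result by new { } into g
                      select new { Count = g.Sum(x => x.Count) };

        foreach (var r in reduced)
            Console.WriteLine(r.Count); // prints 3
    }
}
```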

While this:

Resulted in the following Temp/AllDocs/ByName/GroupedByCount:

from doc in docs
select new { Name = doc.Name, Count = 1 }

from result in results
group result by result.Name
into g
select new
{
Name = g.Key,
Count = g.Sum(x=>x.Count)
}


{
  "Results": [
    {
      "Count": "1",
      "Name": "Ayende Rahien"
    },
    {
      "Count": "2",
      "Name": "Oren Eini"
    }
  ],
  "Includes": [
    
  ],
  "IsStale": false,
  "IndexTimestamp": "\/Date(1292760357925)\/",
  "TotalResults": 2,
  "SkippedResults": 0,
  "IndexEtag": "00000000-0000-1500-0000-00000000000b"
}


Which used the same index as before, but generates just:

{
  "Results": [
    {
      "Count": "2",
      "Name": "Oren Eini"
    }
  ],
  "Includes": [
    
  ],
  "IsStale": false,
  "IndexTimestamp": "\/Date(1292760357925)\/",
  "TotalResults": 1,
  "SkippedResults": 0,
  "IndexEtag": "00000000-0000-1500-0000-00000000000b"
}
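The requests that produced the last two results are not shown in the post, so the following is only a sketch under assumptions: it groups the sample documents by Name in memory, then filters the reduced results on Name, which would explain why the same index returns a single entry. The filter value and the ordering are illustrative guesses, not the actual query.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // The three sample documents from the post, held in memory (assumption).
        var docs = new[]
        {
            new { Name = "Oren Eini" },
            new { Name = "Oren Eini" },
            new { Name = "Ayende Rahien" },
        };

        // Map: emit the grouping field and a Count of 1 per document.
        var mapped = from doc in docs
                     select new { Name = doc.Name, Count = 1 };

        // Reduce: one entry per distinct Name, summing the counts.
        var reduced = (from result in mapped
                       group result by result.Name into g
                       select new { Name = g.Key, Count = g.Sum(x => x.Count) })
                      .OrderBy(r => r.Name, StringComparer.Ordinal)
                      .ToList();

        foreach (var r in reduced)
            Console.WriteLine(r.Name + ": " + r.Count);

        // Assumed filter: querying the reduced entries by Name leaves only one of them.
        foreach (var r in reduced.Where(r => r.Name == "Oren Eini"))
            Console.WriteLine("filtered -> " + r.Name + ": " + r.Count);
    }
}
```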

Rob Ashton

Dec 19, 2010, 7:48:16 AM
to ravendb
Okay, my initial thought (I was already thinking about this) was
simply to (cry) and go look at the LINQ provider.

There is surely no reason we can't analyze the LINQ query

from blog in session.Query<Blog>()
group blog by blog.Category into g
select new
{
Category = g.Key,
Count = g.Count()
}


and convert that into the relevant map and reduce expression.
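As a rough sketch of that conversion, the snippet below pulls the grouping member out of a key selector expression and emits map and reduce text in the same shape as the generated indexes earlier in the thread. The MapReduceSketch class, its method names, and the Blog stand-in are all hypothetical, and only a simple property access is handled; a real LINQ provider would have to walk the whole query.

```csharp
using System;
using System.Linq.Expressions;

// Stand-in for the Blog class used in the query above (assumption).
class Blog
{
    public string Category { get; set; }
}

// Hypothetical helper, not part of RavenDB.
static class MapReduceSketch
{
    // Extracts "Category" from an expression such as blog => blog.Category.
    public static string KeyName<T, TKey>(Expression<Func<T, TKey>> keySelector)
    {
        var member = keySelector.Body as MemberExpression;
        if (member == null)
            throw new NotSupportedException("This sketch only handles simple property access.");
        return member.Member.Name;
    }

    // Map: emit the key plus Count = 1 for every document.
    public static string Map(string key)
    {
        return "from doc in docs\n" +
               "select new { " + key + " = doc." + key + ", Count = 1 }";
    }

    // Reduce: g.Count() in the LINQ query becomes a sum over the mapped counts.
    public static string Reduce(string key)
    {
        return "from result in results\n" +
               "group result by result." + key + " into g\n" +
               "select new { " + key + " = g.Key, Count = g.Sum(x => x.Count) }";
    }
}

class Program
{
    static void Main()
    {
        string key = MapReduceSketch.KeyName<Blog, string>(blog => blog.Category);
        Console.WriteLine(MapReduceSketch.Map(key));
        Console.WriteLine(MapReduceSketch.Reduce(key));
    }
}
```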

The only thing stopping me so far has been a fear of the LINQ provider
and the knowledge that it would take me a bit more than the usual 6-8
hours that I normally give to Raven features.
> And finally, we have http://localhost:8080/indexes/dynamic?aggregation=count&fetch=Name&qu..."Oren

Rob Ashton

Dec 19, 2010, 7:50:49 AM
to ravendb
(I.e., it is unlikely I can do it in the Sunday afternoon I usually use
for such endeavours)

Ayende Rahien

Dec 19, 2010, 7:52:21 AM
to rav...@googlegroups.com
I would say that the first step is to expose this via the Lucene query, then see what sort of info we need from the rest.

Rob Ashton

Dec 19, 2010, 8:00:13 AM
to ravendb
I'll give it a few cycles over lunch - I think the problem is a bit
more complex because we need the ability to select more than one
field.

It will also likely have implications for future "auto" work (I
foresee us eventually exposing live projections automatically too).

I'd prefer if we didn't further butcher the Lucene syntax for any of
these goals - Lucene is for querying the created index.

What we really need is

LuceneQuery<Blog>()
.Count(x=> new {
x.Category
})
.Where("Category:RavenDB")


Or something like that. We might need to look at the HTTP interface a
bit more for this too.

Rob Ashton

Dec 19, 2010, 8:03:13 AM
to ravendb
Probably best off not doing anything clever with expressions here
though, as invoking that from the LINQ provider will be an ass.

LuceneQuery<Blog>()
.Count("Category", "SomeOtherProperty", "Some.Nested.Property")
.Where("Category:RavenDB")
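One way the string-based overload could feed index generation is sketched below: each field path becomes a member of a composite grouping key. Everything here is hypothetical, including the MultiFieldSketch class and the choice to replace dots in nested paths with underscores to get a legal field name; the thread does not say how nested properties would be named.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of RavenDB.
static class MultiFieldSketch
{
    // Assumption: "Some.Nested.Property" is flattened to "Some_Nested_Property".
    static string FieldName(string path)
    {
        return path.Replace('.', '_');
    }

    // Map: project every requested path, plus Count = 1.
    public static string Map(params string[] paths)
    {
        var fields = paths.Select(p => FieldName(p) + " = doc." + p)
                          .Concat(new[] { "Count = 1" });
        return "from doc in docs\n" +
               "select new { " + string.Join(", ", fields) + " }";
    }

    // Reduce: group by a composite key made of all requested fields.
    public static string Reduce(params string[] paths)
    {
        var names = paths.Select(p => FieldName(p)).ToArray();
        var key = string.Join(", ", names.Select(n => "result." + n));
        var projections = names.Select(n => n + " = g.Key." + n)
                               .Concat(new[] { "Count = g.Sum(x => x.Count)" });
        return "from result in results\n" +
               "group result by new { " + key + " } into g\n" +
               "select new { " + string.Join(", ", projections) + " }";
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(MultiFieldSketch.Map("Category", "Some.Nested.Property"));
        Console.WriteLine(MultiFieldSketch.Reduce("Category", "Some.Nested.Property"));
    }
}
```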

Ayende Rahien

Dec 19, 2010, 8:03:25 AM
to rav...@googlegroups.com
Rob,
Auto Live Projections is something that I intend to implement today/tomorrow :-)

How about?

LuceneQuery<Blog>()
   .SelectFields("Category")
   .Aggregation(AggregationOperation.Count)
   .Where("Category:RavenDB")
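For readers who want to see the shape of such a fluent API, here is a small stand-alone sketch. It only records the three pieces of the request (fields, operation, Lucene clause) and prints them back; the AggregationQuerySketch class is hypothetical, the enum merely mirrors the name used in the proposal, and nothing here talks to a server.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the name from the proposal; only Count exists so far according to the thread.
enum AggregationOperation { Count }

// Hypothetical stand-in for LuceneQuery<T>(), for illustration only.
class AggregationQuerySketch
{
    private readonly List<string> fields = new List<string>();
    private AggregationOperation? operation;
    private string whereClause = "";

    public AggregationQuerySketch SelectFields(params string[] names)
    {
        fields.AddRange(names);
        return this; // returning this is what allows the calls to be chained
    }

    public AggregationQuerySketch Aggregation(AggregationOperation op)
    {
        operation = op;
        return this;
    }

    public AggregationQuerySketch Where(string luceneClause)
    {
        whereClause = luceneClause;
        return this;
    }

    public override string ToString()
    {
        return operation + " grouped by [" + string.Join(", ", fields) + "] where " + whereClause;
    }
}

class Program
{
    static void Main()
    {
        var query = new AggregationQuerySketch()
            .SelectFields("Category")
            .Aggregation(AggregationOperation.Count)
            .Where("Category:RavenDB");

        Console.WriteLine(query); // prints: Count grouped by [Category] where Category:RavenDB
    }
}
```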

Ayende Rahien

Dec 19, 2010, 8:04:07 AM
to rav...@googlegroups.com
Speaking of which, what sort of operations do we want to support for aggregation?

Count, Sum, Average

Anything else?
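One design point worth noting: Count and Sum can be re-reduced directly, but an average of averages is generally wrong, so an Average index would likely need to carry Sum and Count and divide only when reading. The sketch below shows that idea with LINQ to Objects; the documents and the Rating field are made up for illustration and do not come from the thread.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical documents with a numeric field (not from the thread).
        var docs = new[]
        {
            new { Category = "RavenDB", Rating = 4.0 },
            new { Category = "RavenDB", Rating = 5.0 },
            new { Category = "Lucene",  Rating = 3.0 },
        };

        // Map: emit Sum and Count rather than an average.
        var mapped = from doc in docs
                     select new { doc.Category, Sum = doc.Rating, Count = 1 };

        // Reduce: the output has the same shape as the map output,
        // so partial results could be fed through the reduce again.
        var reduced = from result in mapped
                      group result by result.Category into g
                      select new
                      {
                          Category = g.Key,
                          Sum = g.Sum(x => x.Sum),
                          Count = g.Sum(x => x.Count)
                      };

        // The average is derived at read time from the two running totals.
        foreach (var r in reduced)
            Console.WriteLine(r.Category + ": " + (r.Sum / r.Count));
    }
}
```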

Rob Ashton

Dec 19, 2010, 8:14:05 AM
to ravendb
Great - even more stuff to support via the LINQ provider ;-)

I think what you propose via the Lucene query is sane, and if you're
already heading down the live projections route then you'll already
have a mind on how to combine the two going forward.

It's not really sellable until there is LINQ support, though.

Ayende Rahien

Dec 19, 2010, 8:18:26 AM
to rav...@googlegroups.com
Actually, those features are being built for a client that wants to deal with things purely dynamically.
So those are actually quite attractive features for them.

Rob Ashton

Dec 19, 2010, 8:51:29 AM
to ravendb

I want to demo it though!

Ayende Rahien

Dec 19, 2010, 8:59:19 AM
to rav...@googlegroups.com
Yeah, obviously I care about making those features accessible :-)

Matt Warren

Dec 20, 2010, 9:57:06 AM
to rav...@googlegroups.com
+1 for this feature, I was just wondering if calling it Map/Reduce would make more sense, i.e.

LuceneQuery<Blog>()
   .SelectFields("Category")
   .MapReduce(AggregationOperation.Count)
   .Where("Category:RavenDB")

Then it's a bit more obvious what it's doing, or do you want to deliberately hide this?

Also, would you ever want Min/Max as supported aggregation options?

Ayende Rahien

Dec 20, 2010, 11:55:24 PM
to rav...@googlegroups.com
Matt,
MapReduce is scary to some people; Aggregation is something that they are more familiar with.