Transformer on a large dataset using two different document types


Chris Fellows

Sep 11, 2017, 1:04:03 PM
to RavenDB - 2nd generation document database
I have the two C# classes below and I'm trying to create a transformer that takes a list of Parent instances (around 50,000), finds the relevant Child instances, performs a calculation and returns a list of totals. There are about 400,000 Parent instances and about 20 Child instances for each one.

Currently I'm loading batches of Parent and Child instances into memory and performing the calculation on the client. I've created an index, but I want to speed this up further and, if possible, reduce the amount of data transferred over the network.

I would like to do the Raven equivalent of this SQL:

SELECT C.CategoryID, SUM(P.Multiplier * C.Total) AS Total
FROM Parents P 
INNER JOIN Child C ON C.PeriodDate BETWEEN P.StartDate AND P.EndDate
WHERE P.StartDate >= '2017-01-01' AND P.EndDate <= '2017-06-01'
GROUP BY C.CategoryID


Can someone help me out to get started?

My C# classes:

public class Parent
{
    public int ID { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public double Multiplier { get; set; }
}

public class Child
{
    public DateTime PeriodDate { get; set; }
    public int CategoryID { get; set; } // type missing in the original post; int is an assumption
    public double Total { get; set; }
}
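For comparison, here is a sketch of the SQL query above as an in-memory LINQ-to-objects computation over these classes, which is roughly what loading the documents and calculating on the client amounts to. `CategoryID` is assumed to be an `int`, and `ClientSideTotals.Compute` is a hypothetical helper, not a RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Parent
{
    public int ID { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public double Multiplier { get; set; }
}

public class Child
{
    public DateTime PeriodDate { get; set; }
    public int CategoryID { get; set; } // assumption: int (type missing in the post)
    public double Total { get; set; }
}

public static class ClientSideTotals
{
    // In-memory equivalent of the SQL: filter the parents, join each one to the
    // children whose PeriodDate falls inside its [StartDate, EndDate] range,
    // then sum Multiplier * Total per CategoryID.
    public static Dictionary<int, double> Compute(
        IEnumerable<Parent> parents,
        IReadOnlyCollection<Child> children,
        DateTime from,
        DateTime to)
    {
        return parents
            .Where(p => p.StartDate >= from && p.EndDate <= to)           // WHERE
            .SelectMany(p => children
                .Where(c => c.PeriodDate >= p.StartDate
                         && c.PeriodDate <= p.EndDate)                   // BETWEEN is inclusive
                .Select(c => new { c.CategoryID, Value = p.Multiplier * c.Total }))
            .GroupBy(x => x.CategoryID)                                  // GROUP BY
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Value));         // SUM(...)
    }
}
```

Because every parent is compared against every child, this naive version costs on the order of parents × children comparisons, which is part of why moving some of the work into an index is attractive.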

Oren Eini (Ayende Rahien)

Sep 11, 2017, 5:04:49 PM
to ravendb
Is there a reason you cannot do this as a map/reduce index?
You would also need to group by time, but that will reduce the total number of items by a lot.
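A sketch of why grouping by time can help, assuming it means grouping the children by `CategoryID` and `PeriodDate`: since Multiplier × (sum of totals) equals the sum of Multiplier × Total, the parents can be joined against one pre-summed entry per category and date instead of against every child. This is plain LINQ-to-objects standing in for the reduce step, not a RavenDB index definition; `CategoryDayTotal` and `GroupByTime` are hypothetical names, and `CategoryID` is assumed to be an `int`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Parent
{
    public int ID { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public double Multiplier { get; set; }
}

public class Child
{
    public DateTime PeriodDate { get; set; }
    public int CategoryID { get; set; } // assumption: int
    public double Total { get; set; }
}

// Hypothetical reduced entry: the summed Total of all children sharing a category and date.
public class CategoryDayTotal
{
    public int CategoryID { get; set; }
    public DateTime PeriodDate { get; set; }
    public double Total { get; set; }
}

public static class GroupByTime
{
    // Stand-in for the reduce: collapse children with the same CategoryID and PeriodDate.
    public static List<CategoryDayTotal> Reduce(IEnumerable<Child> children) =>
        children
            .GroupBy(c => new { c.CategoryID, c.PeriodDate })
            .Select(g => new CategoryDayTotal
            {
                CategoryID = g.Key.CategoryID,
                PeriodDate = g.Key.PeriodDate,
                Total = g.Sum(c => c.Total)
            })
            .ToList();

    // Joining against the reduced entries yields the same per-category sums as
    // joining against the raw children, because multiplication distributes over addition.
    public static Dictionary<int, double> Compute(
        IEnumerable<Parent> parents,
        IReadOnlyCollection<CategoryDayTotal> reduced,
        DateTime from,
        DateTime to) =>
        parents
            .Where(p => p.StartDate >= from && p.EndDate <= to)
            .SelectMany(p => reduced
                .Where(r => r.PeriodDate >= p.StartDate && r.PeriodDate <= p.EndDate)
                .Select(r => new { r.CategoryID, Value = p.Multiplier * r.Total }))
            .GroupBy(x => x.CategoryID)
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Value));
}
```

How much this shrinks the data depends on how many children share a category and period date.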

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris Fellows

Sep 12, 2017, 10:16:23 AM
to RavenDB - 2nd generation document database
I looked at map/reduce, but I couldn't figure out how to do it. Could you point me in the right direction, please?

Michael Yarichuk

Sep 12, 2017, 1:17:27 PM
to RavenDB - 2nd generation document database
Hi,
This Stack Overflow answer has a nice explanation of what a map/reduce index is: https://stackoverflow.com/a/4255207/320103

The following link also has some map/reduce index examples: https://ravendb.net/docs/article-page/3.5/all/indexes/map-reduce-indexes




--
Best regards,

 

Hibernating Rhinos Ltd

Michael Yarichuk l RavenDB Core Team 

RavenDB paving the way to "Data Made Simple"   http://ravendb.net/  

Chris Fellows

Sep 13, 2017, 2:52:41 AM
to RavenDB - 2nd generation document database
Thanks, but I've looked at lots of map/reduce examples and they don't really help me. I'm not using a simple join as in most of the examples (for example, Parent.ForeignID = Child.ID). I'm joining on Child.PeriodDate between Parent.StartDate and Parent.EndDate. I don't understand the syntax for identifying the Child documents to load.

Oren Eini (Ayende Rahien)

Sep 17, 2017, 4:44:30 AM
to ravendb
You would be better off doing something like:

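// map: one entry per Child, with single-element arrays so the reduce can concatenate them
from c in docs.Children
select new
{
    Dates = new[] { c.PeriodDate },
    Totals = new[] { c.Total },
    c.CategoryID
}

// reduce: concatenate the dates and totals of every child in the same category
from result in results
group result by result.CategoryID into g
select new
{
    Dates = g.SelectMany(x => x.Dates),
    Totals = g.SelectMany(x => x.Totals),
    CategoryID = g.Key
}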

This gives you half of the answer. You can then query for all the parents in the required date ranges and do the multiplication on the client.
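A sketch of that client-side second half, assuming the index results are loaded into a hypothetical `ChildrenByCategory` class carrying the index's `CategoryID`, `Dates` and `Totals` fields, and that `parents` is already the result of querying for the required date range. It also assumes `Dates[i]` and `Totals[i]` came from the same Child, which the index implicitly relies on because the map emits single-element arrays and the reduce flattens both in the same group order. `ClientSideMultiplication` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class Parent
{
    public int ID { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public double Multiplier { get; set; }
}

// Hypothetical client-side shape of one map/reduce index result.
public class ChildrenByCategory
{
    public int CategoryID { get; set; } // assumption: int
    public DateTime[] Dates { get; set; }
    public double[] Totals { get; set; }
}

public static class ClientSideMultiplication
{
    public static Dictionary<int, double> Compute(
        IEnumerable<ChildrenByCategory> indexResults,
        IReadOnlyCollection<Parent> parents)
    {
        var result = new Dictionary<int, double>();
        foreach (var r in indexResults)
        {
            double sum = 0;
            bool matched = false;
            // Assumes Dates[i] and Totals[i] describe the same Child.
            for (int i = 0; i < r.Dates.Length; i++)
            {
                foreach (var p in parents)
                {
                    // Inclusive range check, matching SQL BETWEEN.
                    if (r.Dates[i] >= p.StartDate && r.Dates[i] <= p.EndDate)
                    {
                        sum += p.Multiplier * r.Totals[i];
                        matched = true;
                    }
                }
            }
            // Like the INNER JOIN, skip categories with no child in any parent's range.
            if (matched)
                result[r.CategoryID] = sum;
        }
        return result;
    }
}
```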

If you _really_ want to do it all on the server, you can use a Scripted Index to generate a collection from the map/reduce results and then have another map/reduce index finish the work.


