The New Transform Results


Oren Eini (Ayende Rahien)

Jan 25, 2013, 2:01:59 AM
to ravendb
Guys,
I have been spending some time thinking about the common issues that people have with Transform Results.
I would really like to bury that feature and kill it and maybe kick it a few times for good measure :-)

Here is what I had in mind:

public class ComboBoxTransformer : AbstractResultTransformerCreationTask<User>
{
    public ComboBoxTransformer()
    {
        ResultsAreProjectionsFromIndex = false; // default
        TransformPagedResultsOnly = true; // default

        Transform = results =>
            from result in results
            select new
            {
                result.Id,
                result.Name
            };
    }
}

This is the classic Transform Results, in which we just want to project a few fields out. Note that this will work on anything that has Id & Name, not just user.
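A minimal sketch of the "anything that has Id & Name" idea, written against plain LINQ to Objects rather than the proposed transformer API. The names ShapeProjection and Project are made up for illustration, and dynamic binding only stands in for whatever the server would actually do; treat it as an analogy under those assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShapeProjection
{
    // Hypothetical helper: projects Id and Name from any object that exposes
    // string properties with those names. Members are resolved at runtime,
    // so the inputs need no shared base class or interface.
    public static List<(string Id, string Name)> Project(IEnumerable<dynamic> results)
    {
        return results
            .Select(r => ((string)r.Id, (string)r.Name))
            .ToList();
    }
}
```

Passing a mix of objects, say a user with Id, Name and Age next to some other document with only Id and Name, yields one row per input. An object missing either property fails at runtime with a binder exception rather than at compile time.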
Here is another one:

public class UsersFriendsTransformer : AbstractResultTransformerCreationTask<User>
{
    public UsersFriendsTransformer()
    {
        ResultsAreProjectionsFromIndex = true;
        TransformPagedResultsOnly = true; // default

        /*
        Assumes an index like this:
            from u in docs.Users
            from f in u.Friends
            select new
            {
                u.Id,
                FriendId = f.Id,
                f.Since
            }
            StoreAllFields(Yes);
        */

        Transform = results =>
            from result in results
            let f = LoadDocument(result.FriendId)
            select new
            {
                UserId = result.Id,
                f.Name,
                result.Since
            };
    }
}

This one shows how we can load related documents.

And this one is probably the one that you'll be drooling over:

public class FriendsByWeekTransformer : AbstractResultTransformerCreationTask<User>
{
    public FriendsByWeekTransformer()
    {
        ResultsAreProjectionsFromIndex = true;
        TransformPagedResultsOnly = false;

        Transform = results =>
            from result in results
            group result by result.Since.Year + "-" + result.Since.DayOfYear / 7
            into g
            select new
            {
                WeekNum = g.Key,
                Count = g.Count()
            };
    }
}

This one gives you the ability to do a secondary group by on the results, and access the entire result set, no matter how large.
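To see what that grouping key produces, here is a small sketch in plain LINQ to Objects that reuses the same key expression. WeeklyGrouping, WeekKey and CountByWeek are hypothetical names for illustration only; nothing here is part of the proposed transformer API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeeklyGrouping
{
    // Same key expression as the transformer: the year, a dash, then
    // DayOfYear / 7 using integer division.
    public static string WeekKey(DateTime since)
    {
        return since.Year + "-" + since.DayOfYear / 7;
    }

    // Counts how many dates fall into each week bucket.
    public static Dictionary<string, int> CountByWeek(IEnumerable<DateTime> sinceDates)
    {
        return sinceDates
            .GroupBy(d => WeekKey(d))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For example, Jan 25, 2013 is day 25 of the year, so its key is "2013-3". Because DayOfYear starts at 1, bucket 0 only covers days 1 through 6, so these buckets are not calendar weeks.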

And finally, we can also have:

public class UsersFriendsWithDefaultFromClientTransformer : AbstractResultTransformerCreationTask<User>
{
    public UsersFriendsWithDefaultFromClientTransformer()
    {
        ResultsAreProjectionsFromIndex = true;
        TransformPagedResultsOnly = true; // default

        Transform = results =>
            from result in results
            let f = LoadDocument(result.FriendId)
            select new
            {
                UserId = result.Id,
                Name = f.Name ?? Query["DefaultFriendName"],
                result.Since
            };
    }
}

Note that this allows us to access values from the client, so you can send params to the transformer for it to decide what to do.

Thoughts?

Troy

Jan 25, 2013, 2:44:16 AM
to rav...@googlegroups.com
Me like! So with the ability to use TransformResults to do additional reduce, would this be in addition to, or the way to do multi-step reduce? And yes, drooling over the UsersFriendsTransformer example!

I also assume these would still be transformed at runtime, not pre-calculated like a Map/Reduce.

Oren Eini (Ayende Rahien)

Jan 25, 2013, 2:50:51 AM
to ravendb
Multi step reduce will probably be part of the indexed document bundle. We will allow you to create a new document from the indexed results.
And yes, all of that happens at query time, not indexing time.



Troy

Jan 25, 2013, 3:05:32 AM
to rav...@googlegroups.com
Wonderful. I saw in a different post that a new stable may be coming in the next few days... will this next stable be 2.1 or a smaller stable release? Will this new Transformer be in the next stable build?

Oren Eini (Ayende Rahien)

Jan 25, 2013, 3:06:51 AM
to ravendb
No, it will be 2.01, mostly bug fixes, maybe with the SQL replication in it.
The transform results change is something big, and will be in 2.1.



Matt Warren

Jan 25, 2013, 6:38:32 AM
to ravendb
Ah nice, I didn't think of using the Index Properties bundle for this!


Jeremy Holt

Jan 25, 2013, 9:19:54 AM
to rav...@googlegroups.com
Have I understood this correctly - this is the solution to my previous message on getting "OpeningBalances" ??

I assume that you will be able to perform aggregate functions other than Count(), such as Sum(c=>c.Weight * c.Price)

I love the ComboBoxTransformer, but how would you tell it to work on *anything* - would you just pass an interface to AbstractResultTransformerCreationTask<IComboBox>?

Oren Eini (Ayende Rahien)

Jan 25, 2013, 9:30:22 AM
to ravendb
That is the idea, since it has access to the entire result set. Although I think that indexed document might be a better option.

And remember that on the server, we don't actually _have_ any types; anything that matches the shape (it doesn't have to be an interface) would work.



Matt Johnson

Jan 25, 2013, 9:49:28 AM
to rav...@googlegroups.com
Oren, a few questions:

- I assume that a "Result Transformer" will be a new item separate from the index, with its own tab in the studio, etc.  Does that mean you can apply a transformer to any index? What about dynamic indexes?  What does the LINQ/Lucene look like when you query?

- You say we will have access to the entire result set, no matter how large (when TransformPagedResultsOnly = false).  But you also say this happens at query time, not indexing time.  That's good, but does that mean that it's always doing a full index scan with O(n) results?  Won't counting or summing a million documents at query time be very expensive?  Or do you have some magic to precalc *some* of it?

- Instead of a boolean for ResultsAreProjectionsFromIndex, what if you just supplied both the document and the index explicitly and let us choose?  That would hopefully keep people from over-indexing just to have a field available to the transform.
result.document, result.indexFields, etc.
or perhaps
Transform = (documents, indexFields) => ...

- Regarding the running totals, I would want to be able to do one (or both) of these:

Transform = results =>
    from result in results
    select new
    {
        result.Timestamp,
        result.Amount,
        Balance = results.Where(x => x.Timestamp <= result.Timestamp).Sum(x => x.Amount)
    }

Transform = results =>
    from result in results
    orderby result.Timestamp
    let i = IndexOf(result)
    select new
    {
        result.Timestamp,
        result.Amount,
        Balance = (i == 0 ? 0 : result[i-1].Balance) + result.Amount
    }

Matt Warren

Jan 25, 2013, 10:05:07 AM
to ravendb
> You say we will have access to the entire result set, no matter how large (when TransformPagedResultsOnly = false).  
> But you also say this happens at query time, not indexing time.  That's good, but does that mean that it's always doing 
> a full index scan with O(n) results?  Won't counting or summing a million documents at query time be very expensive?  
> Or do you have some magic to precalc *some* of it?

Yeah, I was wondering about this as well.

Oren Eini (Ayende Rahien)

Jan 25, 2013, 10:08:17 AM
to ravendb
inline


On Fri, Jan 25, 2013 at 4:49 PM, Matt Johnson <mj1...@hotmail.com> wrote:
> Oren, a few questions:
>
> - I assume that a "Result Transformer" will be a new item separate from the index, with its own tab in the studio, etc.  Does that mean you can apply a transformer to any index? What about dynamic indexes?  What does the LINQ/Lucene look like when you query?

Yes, that is the idea.
Yes, you could apply result transformer to a dynamic index.

I haven't thought about the client API yet, but probably something like:

session.Query<User>()
   .UsingResultTransformer<ComboBoxTransformer>()
   .Where(x=>x.IsActive)
   .ToList();


Not sure that I like it, and it has issues with types, but that is the idea.
 

> - You say we will have access to the entire result set, no matter how large (when TransformPagedResultsOnly = false).  But you also say this happens at query time, not indexing time.  That's good, but does that mean that it's always doing a full index scan with O(n) results?  Won't counting or summing a million documents at query time be very expensive?  Or do you have some magic to precalc *some* of it?

No, it is going to get all the _query results_. For example, if you want to do secondary reduce, or want to do something like a report.
 

> - Instead of a boolean for ResultsAreProjectionsFromIndex, what if you just supplied both the document and the index explicitly and let us choose?  That would hopefully keep people from over-indexing just to have a field available to the transform.
> result.document, result.indexFields, etc.
> or perhaps
> Transform = (documents, indexFields) => ...

That would be pretty hard to do from an implementation perspective; you would have to load both in order to give them to the transformation function.
And the users might try to do things like joining on that, etc.
I would rather not get there.
 

> - Regarding the running totals, I would want to be able to do one (or both) of these:
>
> Transform = results =>
>     from result in results
>     select new
>     {
>         result.Timestamp,
>         result.Amount,
>         Balance = results.Where(x => x.Timestamp <= result.Timestamp).Sum(x => x.Amount)
>     }

I would recommend against this; it is an O(N^2) operation, because the Where/Sum rescans the whole result set for every result.
 

> Transform = results =>
>     from result in results
>     orderby result.Timestamp
>     let i = IndexOf(result)
>     select new
>     {
>         result.Timestamp,
>         result.Amount,
>         Balance = (i == 0 ? 0 : result[i-1].Balance) + result.Amount
>     }

As I said, here it is better to use the indexed documents.
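For comparison, a rough sketch of the running-balance computation done outside any transformer, in plain LINQ to Objects. Nesting a Where/Sum inside the select rescans every result for each row, which is quadratic; sorting once and carrying an accumulator costs a sort plus a single pass. RunningTotals and WithBalances are hypothetical names, and the tuple shape is an assumption that mirrors the Timestamp and Amount fields from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningTotals
{
    // Sorts once by timestamp, then carries the balance forward in a single
    // pass instead of re-summing all earlier rows for every row.
    public static List<(DateTime Timestamp, decimal Amount, decimal Balance)> WithBalances(
        IEnumerable<(DateTime Timestamp, decimal Amount)> results)
    {
        var output = new List<(DateTime Timestamp, decimal Amount, decimal Balance)>();
        decimal balance = 0m;

        foreach (var entry in results.OrderBy(r => r.Timestamp))
        {
            balance += entry.Amount;
            output.Add((entry.Timestamp, entry.Amount, balance));
        }

        return output;
    }
}
```

Each output row carries the cumulative sum of every amount up to and including its own timestamp, so the first row's balance equals its own amount.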

Oren Eini (Ayende Rahien)

Jan 25, 2013, 10:08:33 AM
to ravendb
You only have access to the query results, not the entire index.

Matt Warren

Jan 25, 2013, 10:39:39 AM
to ravendb
That makes sense, but if you're talking about all results, not just 1 page worth, couldn't that still be pretty large?



Matt Johnson

Jan 25, 2013, 10:58:19 AM
to rav...@googlegroups.com
Don't get me wrong - I like the general idea.  As usual, the devil is in the details...

I'm not sure how you'd want to express it, but I think it's important that you can project fields from both the document and the index simultaneously.  This is what happens client-side right now when you use AsProjection.  You only need to store the index fields you don't have in the document, but all the time I see people indexing and storing everything even when it's unnecessary for the result they are after.

The most common example is mapping the Id, even though it is already mapped implicitly.  Every so often we see the error about "__document_id is already mapped...", and the solution is to call it something other than Id - like UserId.  But then you get the same data in the index twice.  It would be great if the API guided you to not set up things like this.

All I'm saying is that with ResultsAreProjectionsFromIndex = true, one should still be able to get at the document properties.  Otherwise, people will index all fields in the document when they should just be indexing a few fields.

Regarding using the Indexed Properties bundle - you called it Indexed Document, is that different/new/upcoming?  If so, it would be great to put some of the same techniques you have for scripted patching to use. Rather than specifying specific fields to dump the results to, you could supply a bit of javascript that is used to write the index results to a document.  Or to a NEW document, or to multiple documents, or added to arrays or dictionaries within a document. Etc.

Then you could get multi-reduce just by having a new map/reduce index over the resulting document.  One would have to be careful about infinite loops in the chain of Doc-Index-Doc-Index-Doc-Index, etc.

Matt Warren

Jan 25, 2013, 11:19:49 AM
to ravendb
> Regarding using the Indexed Properties bundle - you called it Indexed Document, is that different/new/upcoming?  If so, it  
> would be great to put some of the same techniques you have for scripted patching to use. Rather than specifying specific 
> fields to dump the results to, you could supply a bit of javascript that is used to write the index results to a document.  
> Or to create a NEW document, or to multiple documents, or added to arrays or dictionaries within a document. Etc.

> Then you could get multi-reduce just by having a new map/reduce index over the resulting document.  One would have to be
> careful about infinite loops in the chain of Doc-Index-Doc-Index-Doc-Index, etc.

;-) That's exactly what's planned, see http://issues.hibernatingrhinos.com/issue/RavenDB-494 and http://issues.hibernatingrhinos.com/issue/RavenDB-495. I plan to work on it next week. I think that's where the new name has come from.



Troy

Jan 25, 2013, 11:31:49 AM
to rav...@googlegroups.com
I thought the Indexed Document Bundle (New) was the ability to write out map/reduce results to separate docs... so you could write an index against those reduce docs?

Oren Eini (Ayende Rahien)

Jan 25, 2013, 2:19:21 PM
to ravendb
Yes, it could be pretty large. The whole idea is that we want to allow it when you explicitly want it and are aware of what you are doing.



Oren Eini (Ayende Rahien)

Jan 25, 2013, 2:32:13 PM
to ravendb
inline


On Fri, Jan 25, 2013 at 5:58 PM, Matt Johnson <mj1...@hotmail.com> wrote:
> Don't get me wrong - I like the general idea.  As usual, the devil is in the details...

Yeah, this is probably one of the three big ticket items for 2.1
 

> I'm not sure how you'd want to express it, but I think it's important that you can project fields from both the document and the index simultaneously.  This is what happens client-side right now when you use AsProjection.  You only need to store the index fields you don't have in the document, but all the time I see people indexing and storing everything even when it's unnecessary for the result they are after.

Good point, I'll make sure to enable that, although I do want to make sure that if you don't access the document, it isn't loaded.
Not sure how possible that would be.
 

> The most common example is mapping the Id, even though it is already mapped implicitly.  Every so often we see the error about "__document_id is already mapped...", and the solution is to call it something other than Id - like UserId.  But then you get the same data in the index twice.  It would be great if the API guided you to not set up things like this.

How do you get this error? This shouldn't be happening.
 

> All I'm saying is that with ResultsAreProjectionsFromIndex = true, one should still be able to get at the document properties.  Otherwise, people will index all fields in the document when they should just be indexing a few fields.
>
> Regarding using the Indexed Properties bundle - you called it Indexed Document, is that different/new/upcoming?  If so, it would be great to put some of the same techniques you have for scripted patching to use. Rather than specifying specific fields to dump the results to, you could supply a bit of javascript that is used to write the index results to a document.  Or to a NEW document, or to multiple documents, or added to arrays or dictionaries within a document. Etc.

That is exactly the idea that we had in mind.
 

> Then you could get multi-reduce just by having a new map/reduce index over the resulting document.  One would have to be careful about infinite loops in the chain of Doc-Index-Doc-Index-Doc-Index, etc.


That is pretty much it, the infinite loop thing is complex, yes.
 

Oren Eini (Ayende Rahien)

Jan 25, 2013, 2:32:52 PM
to ravendb
No, it is the ability to have a JS script that can output stuff out to docs if you want.



Matt Johnson

Jan 25, 2013, 2:39:12 PM
to rav...@googlegroups.com
At the moment, when you have a transform on an index, and you use a where clause in your query, the where is applied before the transform, right?  I think this is confusing because the results can be different than what you queried.

The example you gave of the possible new syntax:

session.Query<User>()
   .UsingResultTransformer<ComboBoxTransformer>()
   .Where(x=>x.IsActive)
   .ToList();

Wouldn't it be more appropriate to use

session.Query<User>()
   .Where(x=>x.IsActive)
   .UsingResultTransformer<ComboBoxTransformer>()
   .ToList();

If the where comes after the transform, then I expect the transform to pick up the query filter.  I don't think it will based on what you have shown so far - or will it?

Regarding the "__document_id is already mapped..." error, I'll send you an issue with a repro.

Matt Johnson

Jan 26, 2013, 10:31:15 AM
to rav...@googlegroups.com
My memory was a bit off on the error.  This was what I was referring to:

Itamar Syn-Hershko

Jan 26, 2013, 12:08:58 PM
to rav...@googlegroups.com
On Fri, Jan 25, 2013 at 5:08 PM, Oren Eini (Ayende Rahien) <aye...@ayende.com> wrote:

> I haven't thought about the client API yet, but probably something like:
>
> session.Query<User>()
>    .UsingResultTransformer<ComboBoxTransformer>()
>    .Where(x=>x.IsActive)
>    .ToList();
 

I would probably go with something like:

session.Query<User>()
    .Where(x => x.IsActive)
    .TransformResults<ResultTransformerClass, ResultingType>()
    .ToList();

And seconding Matt's comment about the order of things (Where() before transformation).