Struggling with complex index

Ben

May 10, 2012, 6:18:07 PM
to rav...@googlegroups.com
It may be that I'm trying to achieve too much with one index. Here are my documents:


clients/1
{
"Name" : "Fabrikam"
}

vacancies/2
{
"Position" : "Developer Genius Guy",
"ClientId" : "clients/1"
}

candidates/1
{
"Name": "John Doe"
}

candidates/2 
{
"Name": "Bill Jones"
}

vacancyapplications/3
{
"VacancyId" : "vacancies/2",
"CandidateId" : "candidates/1",
"State" : "Unapproved"
}

vacancyapplications/4
{
"VacancyId" : "vacancies/2",
"CandidateId" : "candidates/2",
"State" : "Approved"
}

You can see that:

Vacancy holds a reference to Client.
VacancyApplication holds a reference to Vacancy and Candidate.

Originally I wanted to:

GET ALL VACANCY APPLICATIONS THAT ARE UNAPPROVED

I want the following information:

  • ApplicationId
  • VacancyId
  • CandidateId
  • ClientId
  • ClientName
  • Position
  • CandidateName
  • StateId
Originally I was doing this with a regular index on State (since that was the only thing I was actually querying on) and then fetching all the other information in TransformResults. This let me get from VacancyApplication to Client (via Vacancy):

    public class VacancyApplications_Summary : AbstractIndexCreationTask<VacancyApplication, VacancyApplicationSummary>
    {
        public VacancyApplications_Summary()
        {
            Map = applications => from a in applications
                                  select new
                                  {
                                      StateId = a.StateId,
                                  };

            TransformResults = (store, results) => from result in results
                                                   let candidate = store.Load<Candidate>(result.CandidateId)
                                                   let vacancy = store.Load<Vacancy>(result.VacancyId)
                                                   let client = store.Load<Client>(vacancy.ClientId)
                                                   orderby result.ReceivedDate descending
                                                   select new
                                                   {
                                                       Id = result.Id,
                                                       VacancyId = vacancy.Id,
                                                       CandidateId = candidate.Id,
                                                       ClientId = client.Id,
                                                       ClientName = client.Profile.Name,
                                                       Position = vacancy.Position,
                                                       CandidateName = candidate.ContactDetails.FullName,
                                                       ReceivedDate = result.ReceivedDate,
                                                       StateId = result.StateId
                                                   };

        }
    }

The problem came when I needed to answer the following:

GET ME ALL VACANCIES FOR CLIENT 1

Of course, if I try to query on ClientId I get:

System.ArgumentException: The field 'ClientId' is not indexed, cannot query on fields that are not indexed


because quite rightly, I am not indexing ClientId.

So I figured that maybe a multi map index is the way to go for this, something like:

            AddMap<VacancyApplication>(applications => from a in applications
                                                       select new
                                                       {
                                                           Id = a.Id,
                                                           VacancyId = a.VacancyId,
                                                           CandidateId = a.CandidateId,
                                                           ClientId = (string)null,
                                                           StateId = a.StateId,
                                                           ReceivedDate = a.ReceivedDate,
                                                           Position = (string)null
                                                       });

            AddMap<Vacancy>(vacancies => from v in vacancies
                                         select new
                                         {
                                             Id = (string)null,
                                             VacancyId = v.Id,
                                             CandidateId = (string)null,
                                             ClientId = v.ClientId,
                                             StateId = (string)null,
                                             ReceivedDate = DateTime.MinValue,
                                             Position = v.Position
                                         });

The thing is, I'm not sure how to group / reduce these results.

With the above documents I would end up with 2 VacancyApplications and 1 Vacancy in the results. How can I reduce them so that I get 2 results (the applications), each carrying the information I need from the Vacancy?




Oren Eini (Ayende Rahien)

May 11, 2012, 4:42:18 AM
to rav...@googlegroups.com
Why not just use:

     Map = applications => from a in applications
                                  select new
                                  {
                                      StateId = a.StateId,
                                      a.ClientId
                                  };

Ryan Heath

May 11, 2012, 5:38:53 AM
to rav...@googlegroups.com
Oren,

This assumes ClientId is stored within the application document.
Would it be possible to use Load in the map when the document does not
have the ClientId?

// Ryan

Oren Eini (Ayende Rahien)

May 11, 2012, 5:53:56 AM
to rav...@googlegroups.com
Oh, I see. I was confused because I thought CandidateId and ClientId were the same.

AddMap<VacancyApplication>(applications => from a in applications
                                           select new
                                           {
                                               VacancyId = a.VacancyId,
                                               CandidateIds = new[] {a.CandidateId},
                                               ClientId = (string)null,
                                           });

AddMap<Vacancy>(vacancies => from v in vacancies
                             select new
                             {
                                 VacancyId = v.Id,
                                 CandidateIds = new string[0],
                                 ClientId = v.ClientId,
                               
                             });

Reduce = results => from result in results
                    group result by result.VacancyId into g
                    select new 
                    {
                      VacancyId = g.Key,
                      CandidateIds = g.SelectMany(x=>x.CandidateIds),
                      ClientId = g.Select(x=>x.ClientId).Where(x=>x != null).FirstOrDefault()
                    };
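
To see what a reduce of this shape produces, the same grouping can be run over plain in-memory objects. This is only a LINQ-to-Objects sketch of the idea, not RavenDB code: `Entry` and `ReduceByVacancySketch` are hypothetical names, and the data mirrors the sample documents from the first post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one entry emitted by the maps.
public class Entry
{
    public string VacancyId { get; set; }
    public string[] CandidateIds { get; set; }
    public string ClientId { get; set; }
}

public static class ReduceByVacancySketch
{
    public static List<Entry> Run()
    {
        var mapped = new List<Entry>
        {
            // From vacancyapplications/3 and /4: the ClientId is not known here.
            new Entry { VacancyId = "vacancies/2", CandidateIds = new[] { "candidates/1" } },
            new Entry { VacancyId = "vacancies/2", CandidateIds = new[] { "candidates/2" } },
            // From vacancies/2: carries the ClientId but no candidates.
            new Entry { VacancyId = "vacancies/2", CandidateIds = new string[0], ClientId = "clients/1" },
        };

        // Same shape as the Reduce: one output entry per VacancyId.
        return (from e in mapped
                group e by e.VacancyId into g
                select new Entry
                {
                    VacancyId = g.Key,
                    CandidateIds = g.SelectMany(x => x.CandidateIds).ToArray(),
                    ClientId = g.Select(x => x.ClientId).FirstOrDefault(x => x != null)
                }).ToList();
    }
}
```

`ReduceByVacancySketch.Run()` yields a single entry for vacancies/2 that holds both candidate ids together with clients/1, so the two applications are collapsed into one result per vacancy.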

Ryan Heath

May 11, 2012, 6:22:46 AM
to rav...@googlegroups.com
Ah, I see what you are doing, but now we are grouping the candidates
into one result per vacancy, aren't we?
How can we solve it when we still want separate documents per
vacancyapplication, as in the original case?

// Ryan

Oren Eini (Ayende Rahien)

May 11, 2012, 6:39:52 AM
to rav...@googlegroups.com
Why do you care? You can search on this, and include the applications as well.

Ben

May 11, 2012, 7:02:08 AM
to rav...@googlegroups.com
The point is that I want to show a list of vacancy applications for a client.

If we group on VacancyId and the vacancy has 2 applications, we will actually only get one result.

It may be that this can't be achieved with an index and that I'll have to do something like this instead (loading the vacancies, then doing an "In" query to get the applications):

            var vacancies = (from v in session.Query<Vacancy>()
                             where v.ClientId == ctx.User.BusinessEntityId
                             orderby v.DateSubmitted descending
                             select new VacancySummary
                             {
                                 Id = v.Id,
                                 Position = v.Position,
                                 DateSubmitted = v.DateSubmitted,
                                 StateId = v.StateId,
                             }).Take(5).ToList(); // 5 latest vacancies

            var applicationCounts = session
                .Query<Vacancies_ApplicationCount.ReduceResult, Vacancies_ApplicationCount>()
                .Where(a => a.VacancyId.In(vacancies.Select(v => v.Id)));

            foreach (var vacancy in vacancies)
            {
                var applicationCount = applicationCounts.FirstOrDefault(x => x.VacancyId == vacancy.Id);
                vacancy.ApplicationCount = applicationCount != null ? applicationCount.ApplicationCount : 0;
            }

Oren Eini (Ayende Rahien)

May 11, 2012, 7:19:49 AM
to rav...@googlegroups.com
Let us go back to the drawing board.
What are the inputs and outputs that you actually need?

Ben

May 11, 2012, 7:59:48 AM
to rav...@googlegroups.com
Input is the documents I posted at the beginning of the thread.

Output is:

  • ApplicationId
  • VacancyId
  • CandidateId
  • ClientId
  • ClientName
  • Position
  • CandidateName
  • StateId

Oren Eini (Ayende Rahien)

May 11, 2012, 8:03:18 AM
to rav...@googlegroups.com
What are you searching by?

Ben

May 11, 2012, 8:07:34 AM
to rav...@googlegroups.com
Searching by StateId and ClientId.

This is why I had a problem with my original index: ClientId is not available in VacancyApplication, so I could not search on it. I need some way of including it in the index.

Oren Eini (Ayende Rahien)

May 11, 2012, 9:00:42 AM
to rav...@googlegroups.com
Okay, here it is:


The index:

AddMap<VacancyApplication>(applications => from a in applications
                                           select new
                                           {
                                               VacancyId = a.VacancyId,
                                               a.State,
                                               ApplicationIds = new []{ a.Id },
                                               CandidateIds = new[] {a.CandidateId},
                                               ClientId = (string)null,
                                           });

AddMap<Vacancy>(vacancies => from v in vacancies
                             // Emit the vacancy once per state so it joins every (VacancyId, State) group
                             from state in new[]{"Approved", "Unapproved"}
                             select new
                             {
                                 VacancyId = v.Id,
                                 State = state,
                                 ApplicationIds = new string[0],
                                 CandidateIds = new string[0],
                                 ClientId = v.ClientId,
                             });

Reduce = results => from result in results
                    group result by new {result.VacancyId, result.State} into g
                    select new 
                    {
                      g.Key.VacancyId,
                      g.Key.State,
                      ApplicationIds = g.SelectMany(x=>x.ApplicationIds),
                      CandidateIds = g.SelectMany(x=>x.CandidateIds),
                      ClientId = g.Select(x=>x.ClientId).Where(x=>x != null).FirstOrDefault()
                    };
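
The effect of grouping on both VacancyId and State can be checked with ordinary in-memory objects. The following is a LINQ-to-Objects sketch under stated assumptions, not RavenDB code: `MapEntry` and `ReduceByVacancyAndStateSketch` are hypothetical names, the data comes from the sample documents in the first post, and CandidateIds is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one entry emitted by the maps.
public class MapEntry
{
    public string VacancyId { get; set; }
    public string State { get; set; }
    public string[] ApplicationIds { get; set; }
    public string ClientId { get; set; }
}

public static class ReduceByVacancyAndStateSketch
{
    public static List<MapEntry> Reduce()
    {
        var mapped = new List<MapEntry>
        {
            // One entry per application; the ClientId is not known here.
            new MapEntry { VacancyId = "vacancies/2", State = "Unapproved",
                           ApplicationIds = new[] { "vacancyapplications/3" } },
            new MapEntry { VacancyId = "vacancies/2", State = "Approved",
                           ApplicationIds = new[] { "vacancyapplications/4" } },
        };

        // The vacancy is emitted once per state so that it lands in every group.
        mapped.AddRange(new[] { "Approved", "Unapproved" }.Select(s => new MapEntry
        {
            VacancyId = "vacancies/2",
            State = s,
            ApplicationIds = new string[0],
            ClientId = "clients/1"
        }));

        // Composite key: one output entry per (VacancyId, State) pair.
        return (from e in mapped
                group e by new { e.VacancyId, e.State } into g
                select new MapEntry
                {
                    VacancyId = g.Key.VacancyId,
                    State = g.Key.State,
                    ApplicationIds = g.SelectMany(x => x.ApplicationIds).ToArray(),
                    ClientId = g.Select(x => x.ClientId).FirstOrDefault(x => x != null)
                }).ToList();
    }
}
```

Filtering the output with `Reduce().Where(r => r.ClientId == "clients/1" && r.State == "Unapproved")` leaves one entry whose ApplicationIds contains only vacancyapplications/3. Both groups carry the ClientId because the vacancy entry was fanned out across the states.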



The query:

var results = session.Query<ReduceResult, Index>()
                .Include(x=>x.ApplicationIds)
                .Include(x=>x.VacancyId)
                .Include(x=>x.CandidateIds)
                .Include(x=>x.ClientId)
                .Where(x=>x.ClientId == clientId && x.State == "Approved")
                .ToList();

// The documents loaded below were requested via Include in the query above
foreach(var result in results)
{
  foreach(var appId in result.ApplicationIds)
  {
    var v = session.Load<Vacancy>(result.VacancyId);
    var va = session.Load<VacancyApplication>(appId);
    yield return new
    {
      ApplicationId = appId,
      VacancyId = result.VacancyId,
      CandidateId = va.CandidateId,
      ClientId = result.ClientId, 
      ClientName = session.Load<Client>(result.ClientId).Name,
      Position = v.Position,
      CandidateName = session.Load<Candidate>(va.CandidateId).Name,
      StateId = result.State
    };
  }
}


Note that you can also do this in the TransformResults, instead of on the client.
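
The client-side step, turning one grouped result back into one row per application, can be sketched without a database. This is a hedged illustration only: `ApplicationRow` and `FlattenSketch` are hypothetical names, the dictionaries stand in for `session.Load` on the included documents, and the single grouped result is what a query for clients/1 and "Unapproved" would be expected to return for the sample documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type holding the eight output fields from the thread.
public class ApplicationRow
{
    public string ApplicationId { get; set; }
    public string VacancyId { get; set; }
    public string CandidateId { get; set; }
    public string ClientId { get; set; }
    public string ClientName { get; set; }
    public string Position { get; set; }
    public string CandidateName { get; set; }
    public string StateId { get; set; }
}

public static class FlattenSketch
{
    public static List<ApplicationRow> Run()
    {
        // Stand-ins for documents that would be loaded from the session.
        var clientNames = new Dictionary<string, string> { { "clients/1", "Fabrikam" } };
        var positions = new Dictionary<string, string> { { "vacancies/2", "Developer Genius Guy" } };
        var candidateNames = new Dictionary<string, string>
        {
            { "candidates/1", "John Doe" },
            { "candidates/2", "Bill Jones" }
        };
        var applicationCandidates = new Dictionary<string, string>
        {
            { "vacancyapplications/3", "candidates/1" },
            { "vacancyapplications/4", "candidates/2" }
        };

        // One grouped result per (VacancyId, State), as the reduce produces it.
        var results = new[]
        {
            new { VacancyId = "vacancies/2", State = "Unapproved", ClientId = "clients/1",
                  ApplicationIds = new[] { "vacancyapplications/3" } }
        };

        // Expand each grouped result into one row per application id.
        return (from result in results
                from appId in result.ApplicationIds
                let candidateId = applicationCandidates[appId]
                select new ApplicationRow
                {
                    ApplicationId = appId,
                    VacancyId = result.VacancyId,
                    CandidateId = candidateId,
                    ClientId = result.ClientId,
                    ClientName = clientNames[result.ClientId],
                    Position = positions[result.VacancyId],
                    CandidateName = candidateNames[candidateId],
                    StateId = result.State
                }).ToList();
    }
}
```

The nested `from` plays the role of the two `foreach` loops: the number of rows equals the total number of application ids across the grouped results, so a vacancy with two matching applications would still produce two rows.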

Ben

May 14, 2012, 4:29:02 PM
to rav...@googlegroups.com
Thanks, this works great.