Suggestion for search index


Ben

May 8, 2012, 8:04:25 PM
to rav...@googlegroups.com
I'm working on a search page for my application where the user can search for Jobs using a keyword along with some additional "advanced" criteria.

The following document is a simplified version of the "Vacancy" aggregate:

{
  "Position": "ASP.NET Developer",
  "ClientId": "clients/10",
  "Salary": {
    "From": 30000.00,
    "To": 40000.00
  },
  "Campaigns": [
    {
      "Title": "ASP.NET MVC Web Developer",
      "IsActive": true,
      "Summary": "Some summary",
      "Keywords": [
        "ASP",
        "MVC",
        "C#"
      ]
    },
    {
      "Title": "ASP.NET MVC Web Developer",
      "IsActive": false,
      "Summary": "Some old summary",
      "Keywords": [
        "ASP"
      ]
    }
  ]
}

Each vacancy can have multiple campaigns, but only one campaign can be active at any one time (this rule is enforced by my domain layer). The keyword search should be applied to the active campaign's Title, Summary and Keywords. The advanced search will allow the user to search using a salary range.

Here's what I'm doing currently:

    public class Jobs_Search : AbstractIndexCreationTask<Vacancy, Jobs_Search.ReduceResult>
    {
        public Jobs_Search()
        {
            Map = vacancies => from v in vacancies
                               from c0 in v.Campaigns
                               let c = (WebsiteCampaign)c0
                               select new
                               {
                                   Id = v.Id,
                                   StartDate = c.CampaignPeriod.StartDate,
                                   EndDate = c.CampaignPeriod.EndDate,
                                   Salary = v.Salary.From, // always search against the lower range
                                   Query = new object[] {
                                        c.Title,
                                        c.Keywords,
                                        c.Summary,
                                        c.Description
                                   }
                               };

            Index(x => x.Query, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
        }

        public class ReduceResult
        {
            public DateTime StartDate { get; set; }
            public DateTime EndDate { get; set; }
            public decimal Salary { get; set; } 
            public string Query { get; set; }
            public int Id { get; set; }
        }        
    }

        public ActionResult Search(string q, decimal? salaryFrom, decimal? salaryTo, int? page)
        {
            var query = session.Query<Jobs_Search.ReduceResult, Jobs_Search>()
                              .Where(x => x.StartDate <= DateTime.UtcNow && x.EndDate >= DateTime.UtcNow)
                              .Search(x => x.Query, q);

            if (salaryFrom.HasValue)
                query = query.Where(x => x.Salary >= salaryFrom);

            if (salaryTo.HasValue)
                query = query.Where(x => x.Salary <= salaryTo);

            var jobs = query.As<Vacancy>().ToList();           

            return View(jobs);
        }

This is working fine.

What I'm not particularly happy with is that I then have to flatten out my vacancy, finding the active campaign and building my viewmodel.

Something I'm still struggling with is whether I can/should denormalize my data within an index. It seems a good way of encapsulating a query since, as in the above example, I always want a Vacancy with its active campaign. It would be nice if I could return something like the following from the server:

{
  "Title": "ASP.NET Developer",
  "SalaryFrom": 30000.00,
  "SalaryTo": 40000.00,
  "CampaignTitle": "ASP.NET MVC Web Developer",
  "CampaignSummary": "Some summary",
  "Keywords": [
    "ASP",
    "MVC",
    "C#"
  ]
}
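
For comparison, the client-side flattening I'd like to avoid looks roughly like the sketch below. It is plain LINQ to objects and does not touch RavenDB at all. The class names (`SalaryRange`, `Campaign`, `Vacancy`, `VacancySummary`, `VacancyFlattener`) are hypothetical, trimmed-down stand-ins for the real aggregate, and skipping vacancies without an active campaign is an assumption rather than a stated requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down shapes of the Vacancy document shown earlier.
public class SalaryRange
{
    public decimal From { get; set; }
    public decimal To { get; set; }
}

public class Campaign
{
    public string Title { get; set; }
    public bool IsActive { get; set; }
    public string Summary { get; set; }
    public List<string> Keywords { get; set; }
}

public class Vacancy
{
    public string Position { get; set; }
    public SalaryRange Salary { get; set; }
    public List<Campaign> Campaigns { get; set; }
}

// Hypothetical view model matching the flattened JSON shape above.
public class VacancySummary
{
    public string Title { get; set; }
    public decimal SalaryFrom { get; set; }
    public decimal SalaryTo { get; set; }
    public string CampaignTitle { get; set; }
    public string CampaignSummary { get; set; }
    public List<string> Keywords { get; set; }
}

public static class VacancyFlattener
{
    // Returns one summary per vacancy that has an active campaign.
    // Vacancies with no campaigns, or no active one, are skipped (an assumption).
    public static List<VacancySummary> Flatten(IEnumerable<Vacancy> vacancies)
    {
        return (from v in vacancies
                let active = (v.Campaigns ?? new List<Campaign>())
                                 .FirstOrDefault(c => c.IsActive)
                where active != null
                select new VacancySummary
                {
                    Title = v.Position,
                    SalaryFrom = v.Salary.From,
                    SalaryTo = v.Salary.To,
                    CampaignTitle = active.Title,
                    CampaignSummary = active.Summary,
                    Keywords = active.Keywords
                }).ToList();
    }
}
```

Calling `VacancyFlattener.Flatten(jobs)` on the list returned by the query would give the view model list. The point of the question is whether this projection can move to the server instead.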

Ben

May 8, 2012, 8:08:42 PM
to rav...@googlegroups.com
P.S.

You can ignore this mess:

                               from c0 in v.Campaigns
                               let c = (WebsiteCampaign)c0

I was testing storing different subclasses within the same document. You can assume they will all be of the same type.

Matt Warren

May 9, 2012, 5:49:00 AM
to rav...@googlegroups.com
Just a couple of things:

> What I'm not particularly happy with is that I then have to flatten out my vacancy, finding the active campaign and building my viewmodel.

You can use TransformResults to help with this, see http://ayende.com/blog/4661/ravendb-live-projections-or-how-to-do-joins-in-a-non-relational-database for more info.

For instance something like this:
    TransformResults = (database, results) => from result in results
                                              let activeCampaign = result.Campaigns.FirstOrDefault(x => x.IsActive)
                                              select new
                                              {
                                                  Position = result.Position,
                                                  CampaignTitle = activeCampaign.Title,
                                                  Keywords = activeCampaign.Keywords,
                                                  ....
                                              };

> from c0 in v.Campaigns
>    let c = (WebsiteCampaign)c0
 
I know you said ignore this, but a more general point is that casts like that are never going to have any effect on the server. The index is executed on the server, which knows nothing about your CLR types. At that point everything is dynamic, so property access should just work without the cast.

Ben

May 9, 2012, 7:31:11 AM
to rav...@googlegroups.com
Matt, I had to do the cast in order to access properties specific to that campaign type, in order to build the index.

I'll give transform results a shot. Thanks.

Matt Warren

May 9, 2012, 7:45:49 AM
to rav...@googlegroups.com
Okay, in that case the casting does make some sense.

Ben

May 9, 2012, 12:54:40 PM
to rav...@googlegroups.com
Matt, 

This is what I went for in the end. Unfortunately your example did not work exactly as written, because I had to provide a reduce result type in order to search against the multiple keyword properties (title, summary etc.). This meant I had to load the vacancy explicitly within the transform results. I'm not sure this is the most efficient approach, but it seems pretty quick:

    public class Jobs_Search : AbstractIndexCreationTask<Vacancy, Jobs_Search.Index>
    {
        public Jobs_Search()
        {
            Map = vacancies => from v in vacancies
                               from c in v.Campaigns
                               select new
                               {
                                   Id = v.Id,
                                   JobCategoryId = v.JobCategoryId,
                                   Salary = v.Salary.From,
                                   Location = new object[] {
                                        v.Location.TownCity,
                                        v.Location.County
                                   },
                                   Keywords = new object[] {
                                        c.Title,
                                        c.Keywords,
                                        c.Summary,
                                        c.Description
                                   }
                               };

            Index(x => x.Location, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
            Index(x => x.Keywords, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);

            TransformResults = (store, results) =>
                               from result in results
                               let vacancy = store.Load<Vacancy>(result.Id)
                               let activeCampaign = vacancy.Campaigns.FirstOrDefault(x => x.IsActive) // only the active campaign
                               where activeCampaign != null
                               select new
                               {
                                   Id = result.Id,
                                   Slug = activeCampaign.Slug,
                                   JobCategoryId = vacancy.JobCategoryId,
                                   Title = activeCampaign.Title,
                                   Summary = activeCampaign.Summary,
                                   TownCity = vacancy.Location.TownCity,
                                   County = vacancy.Location.County,
                                   SalaryFrom = vacancy.Salary.From,
                                   SalaryTo = vacancy.Salary.To
                               };
        }

        public class Index
        {
            public string Id { get; set; }
            public string JobCategoryId { get; set; }
            public decimal Salary { get; set; }
            public string Location { get; set; }
            public string Keywords { get; set; }
        }
    }

I'm transforming this to:

    public class JobSummary
    {
        public string Id { get; set; }
        public string Slug { get; set; }
        public string JobCategoryId { get; set; }
        public string Title { get; set; }
        public string Summary { get; set; }
        public string TownCity { get; set; }
        public string County { get; set; }
        public decimal SalaryFrom { get; set; }
        public decimal SalaryTo { get; set; }
    }

And here's the query:

        [HttpGet]
        public ActionResult Search(JobsSearchCriteria criteria)
        {
            if (!criteria.HasCriteria)
                return View(new JobsSearchModel());
            
            var query = session.Query<Jobs_Search.Index, Jobs_Search>()
                .Search(x => x.Keywords, criteria.q)
                .Search(x => x.Location, criteria.q);

            var model = new JobsSearchModel
            {
                Criteria = criteria,
                Results = query.As<JobSummary>().ToList()
            };

            return View(model);
        }

I chose to index the location and keywords separately as I may want to provide additional boosting to the location.
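
To illustrate what boosting the location would mean, here is a minimal sketch that builds a Lucene-style query string with a boost on the `Location` field. It is only an illustration of the query syntax (`field:(terms)^boost`), not the code the RavenDB client generates. The `BoostedQuerySketch` class and its `Build` method are hypothetical names, and the sketch assumes the search terms are plain words with no characters that need Lucene escaping.

```csharp
using System;
using System.Globalization;

public static class BoostedQuerySketch
{
    // Builds "Keywords:(terms) Location:(terms)^boost".
    // Assumption: `terms` holds plain words only; no Lucene escaping is done here.
    public static string Build(string terms, decimal locationBoost)
    {
        // Invariant culture keeps the decimal separator a '.' regardless of locale.
        string boost = locationBoost.ToString(CultureInfo.InvariantCulture);
        return string.Format("Keywords:({0}) Location:({0})^{1}", terms, boost);
    }
}
```

With separate fields, a match in `Location` can be weighted more heavily than a match in `Keywords`, which would not be possible if both were folded into one analyzed field.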

Still amazed that this works :)