index boost


Michael Weber

Sep 26, 2012, 8:37:19 AM
to rav...@googlegroups.com
I am trying to implement some boosts for fields and documents in a multi-map index. I get a compilation error on any index where I boost a term value and also boost the entire document itself (either boost on its own works). For example:

docs.Products
    .Select(x => new () {
        Content = new System.Object[] { x.Name, x.Tags.Boost(1) },
        Category = x.Category,
    }.Boost(5)
)

I would assume that the indexer would just add the boosts together for the double-boosted terms. Is this a bug or a missing feature?

mike

Chris Marisic

Sep 26, 2012, 9:20:29 AM
to rav...@googlegroups.com
I suppose it could add them together, but semantically it's the same as

Tags Boost 6
Category Boost 5

Michael Weber

Sep 26, 2012, 9:28:11 AM
to rav...@googlegroups.com
That's pretty funny, I didn't even think of that.

In the real index the 5 is really a formula based on other fields from the document, I guess I could just copy that formula to each of the terms.

Chris Marisic

Sep 26, 2012, 9:34:12 AM
to rav...@googlegroups.com
Could you post any information about what you're doing (and why) to dynamically boost content?

Michael Weber

Sep 26, 2012, 9:38:47 AM
to rav...@googlegroups.com
Sure -- the basic idea is that when you search for a product, the score is based on the number of times the product has been viewed, the number of times it has been purchased, the number of ratings it has, and its rating. We haven't tuned the algorithm yet, but a sample would be something like:

docs.Products
    .Select(x => new () {
        Content = new System.Object[] { x.Name },
        Category = x.Category,
    }.Boost((float)Math.Log10(x.ProductViewCount + 1) + (float)Math.Log10(x.ProductReviewCount * 10 + 1))
)
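
To get a feel for how the log scaling behaves, the boost expression can be pulled out into a plain function and tried on a few counts. This is only a sketch: `ProductBoost` and `Compute` are hypothetical names used for illustration, and nothing here touches RavenDB itself.

```csharp
using System;

// Hypothetical helper mirroring the boost expression from the index above.
public static class ProductBoost
{
    // Log-scaled popularity boost. The "+ 1" keeps Log10 defined when a count
    // is zero, so a product with no views and no reviews gets a boost of 0.
    public static float Compute(int productViewCount, int productReviewCount)
    {
        return (float)Math.Log10(productViewCount + 1)
             + (float)Math.Log10(productReviewCount * 10 + 1);
    }
}
```

With these definitions, `ProductBoost.Compute(99, 0)` is 2 and `ProductBoost.Compute(99, 9)` is roughly 3.96, so a hundredfold jump in views only adds 2 to the boost. A brand-new product with zero counts gets a boost of 0, so it may be worth checking how a zero boost affects ranking before relying on the formula as written.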

Chris Marisic

Sep 26, 2012, 10:14:37 AM
to rav...@googlegroups.com
That is unbelievably slick.

As a side note, you could use

let boost = ......

and then you could use a single variable instead of having the math calculated multiple times if you want to apply that initial boost to multiple things.
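
A minimal sketch of the `let` idea, written as ordinary LINQ query syntax over in-memory objects so it can be run without a server. The `Product` class, `LetExample`, and the tuple element names are assumptions made for illustration. In a real index the two values would be passed to `.Boost(...)` calls instead of being returned, and the `boost + 1f` line only mirrors the additive reading suggested earlier in the thread, not confirmed indexer behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape, reduced to the fields the formula needs.
public class Product
{
    public string Name { get; set; }
    public int ProductViewCount { get; set; }
    public int ProductReviewCount { get; set; }
}

public static class LetExample
{
    public static IEnumerable<(string Name, float NameBoost, float TagsBoost)> Project(
        IEnumerable<Product> docs)
    {
        return from x in docs
               // Evaluated once per document, then reused for every field below.
               let boost = (float)Math.Log10(x.ProductViewCount + 1)
                         + (float)Math.Log10(x.ProductReviewCount * 10 + 1)
               select (x.Name, boost, boost + 1f);
    }
}
```

The point of the `let` clause is that the formula lives in one place, so tuning it later means changing a single expression rather than every boosted field.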

I wonder if there would be any way to introduce entropy into this, so that product X, which came out a year ago and sold 5000 items back then, doesn't supersede product Y, which came out last week and has sold 500 items. I'm not sure offhand whether this would be possible inside Raven, although theoretically you could refilter or reorder the results in memory at the client to account for entropy.
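
The client-side reordering could look something like the sketch below, reading "entropy" as recency decay. Everything here is an assumption for illustration: `ScoredProduct`, `RecencyReRanker`, the `halfLifeDays` parameter, and the choice of exponential decay are not from RavenDB or from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical search hit: a relevance score plus the product's release date.
public class ScoredProduct
{
    public string Name { get; set; }
    public double Score { get; set; }
    public DateTime ReleasedOn { get; set; }
}

public static class RecencyReRanker
{
    // Exponential decay: a hit loses half its weight every halfLifeDays of age.
    public static double DecayedScore(ScoredProduct p, DateTime now, double halfLifeDays)
    {
        double ageDays = Math.Max(0, (now - p.ReleasedOn).TotalDays);
        return p.Score * Math.Pow(0.5, ageDays / halfLifeDays);
    }

    // Re-sorts an already fetched page of results in memory.
    public static List<ScoredProduct> ReRank(
        IEnumerable<ScoredProduct> results, DateTime now, double halfLifeDays)
    {
        return results.OrderByDescending(p => DecayedScore(p, now, halfLifeDays)).ToList();
    }
}
```

One caveat with this approach is that it only reorders the hits that were actually fetched, so a recent product that falls outside the fetched page never gets the chance to move up.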

Kijana Woodard

Sep 26, 2012, 10:24:54 AM
to rav...@googlegroups.com
I like this.

I'm wondering if it would be better to have a separate "product stats" document that gathers all the stats, do the searching and boosting against that, and then include the product doc when docs are returned from the search. I say this so you can thrash your stats model without thrashing the product docs. For example, if you decide to add UnitsSoldCount or whatever, you don't have to change your Product document.
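
A rough sketch of that split, using plain classes and an in-memory lookup as a stand-in for the include step. The class names, properties, and `StatsJoin.ResolveProducts` are hypothetical; this is not the RavenDB include API, just the shape of the data.

```csharp
using System.Collections.Generic;
using System.Linq;

// The product document holds only descriptive data.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}

// The stats document points back at its product and can change shape freely.
public class ProductStats
{
    public string Id { get; set; }
    public string ProductId { get; set; }
    public int ProductViewCount { get; set; }
    public int ProductReviewCount { get; set; }
}

public static class StatsJoin
{
    // Search hits come back as stats documents; resolve each to its product,
    // preserving the order of the hits.
    public static List<Product> ResolveProducts(
        IEnumerable<ProductStats> hits, IDictionary<string, Product> productsById)
    {
        return hits.Where(h => productsById.ContainsKey(h.ProductId))
                   .Select(h => productsById[h.ProductId])
                   .ToList();
    }
}
```

Adding a field such as UnitsSoldCount would then touch only `ProductStats`, leaving `Product` as it is.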

Chris Marisic

Sep 26, 2012, 10:33:33 AM
to rav...@googlegroups.com
I would likely do something along those lines if I ran a shopping site where search is live or die for the site. If users can't find stuff in basically one try, the drop-off rate has to be astronomical.

Michael Weber

Sep 26, 2012, 10:39:00 AM
to rav...@googlegroups.com
I didn't know about the let... I will have to try that.

Yeah -- we will have to worry about that in the future. All of the actual orders are backed by a SQL db, so it wouldn't be a problem for us to update the stats via a monthly cron job or something. That would allow us to generate a ProductsSold and a ProductsSoldHistoric. Then we can boost the two values differently.
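
As a sketch of what boosting the two values differently might mean, the formula below weights recent sales more heavily than historic ones. The weights (2.0 and 0.5) and the `SalesBoost` name are made up for illustration and would need tuning against real data.

```csharp
using System;

public static class SalesBoost
{
    // Hypothetical weighting: recent sales count four times as much as
    // historic sales, each on the same log scale as the other boosts.
    public static float Compute(
        int productsSold,
        int productsSoldHistoric,
        float recentWeight = 2.0f,
        float historicWeight = 0.5f)
    {
        return recentWeight * (float)Math.Log10(productsSold + 1)
             + historicWeight * (float)Math.Log10(productsSoldHistoric + 1);
    }
}
```

With these weights, a product with 500 recent sales and no history (`Compute(500, 0)`) comes out ahead of one with 5000 historic sales and nothing recent (`Compute(0, 5000)`), which is the ordering Chris was asking about.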

Michael Weber

Sep 26, 2012, 10:40:30 AM
to rav...@googlegroups.com
RavenDB is a document database.  We can add and remove properties from the product document whenever we want.


On Wednesday, September 26, 2012 10:24:56 AM UTC-4, Kijana Woodard wrote:

Chris Marisic

Sep 26, 2012, 10:49:07 AM
to rav...@googlegroups.com
A cron job that updates data trends with information from a real reporting database would likely be one of the most straightforward solutions, albeit at the cost of creating a dependency on that foreign db and job.

Michael Weber

Sep 26, 2012, 10:57:16 AM
to rav...@googlegroups.com
Well for our application the dependency runs way deeper than that already :-)

Kijana Woodard

Sep 26, 2012, 11:21:11 AM
to rav...@googlegroups.com
Yeah, but you're forgetting transaction boundaries.

You wouldn't want someone updating the product description to keep hitting 409 errors because the product is selling like hotcakes and the stats are updating... and you certainly wouldn't want to overwrite those stats with old data.
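
The conflict can be illustrated with a toy in-memory store that uses version checks. This is an assumption-laden stand-in, not RavenDB's API: `ToyStore`, its methods, and the integer version counter are invented here to show the effect of optimistic concurrency on a shared document.

```csharp
using System;
using System.Collections.Generic;

// Toy store: each document id has a version, and a write must present the
// version it originally read, otherwise it is rejected.
public class ToyStore
{
    private readonly Dictionary<string, int> versions = new Dictionary<string, int>();

    public int Read(string id)
    {
        int v;
        return versions.TryGetValue(id, out v) ? v : 0;
    }

    public void Write(string id, int expectedVersion)
    {
        if (Read(id) != expectedVersion)
            throw new InvalidOperationException(
                "409 Conflict: " + id + " changed since it was read");
        versions[id] = expectedVersion + 1;
    }
}
```

If stats and description live in one document, an editor who reads `products/1`, then saves after a stats update has bumped the version, gets the conflict. If the stats job writes to a separate id such as `products/1/stats`, the editor's save on `products/1` goes through untouched.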