Jan 18, 2008, 11:41:06 AM

to view.theinfo

In many applications, you have a long list of things of which you want

to highlight some interesting ones to the user. As a concrete example,

let's imagine you have a series of photos and you want to show the

user the most interesting photos in any section.

One traditional way of doing this is to let users vote on photos, then

sort photos by the number of votes. This is the solution employed by,

for example, bash.org. Unfortunately, it has a serious flaw: the

average user looks at the things with the most votes, likes most of

them, and gives them more votes. The result is a classic Matthew

effect: whichever items happen to end up on top stay there for just

about forever. (Furthermore, if a popular site links to one particular

item on your site, everyone votes that item up, giving it an absurd

number of votes.)
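To make the feedback loop concrete, here is a toy, deterministic sketch. It is an illustration under assumptions, not anyone's real site: the Item class, the quality numbers, and the rule that viewers only look at the current top_k items are all hypothetical. Each round, every item on the front page collects its expected upvotes (quality times viewers); items off the front page collect nothing.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    quality: float      # hypothetical: true fraction of viewers who would upvote
    votes: float = 0.0


def rank_by_votes(items):
    """The baseline described above: sort by raw vote count, highest first."""
    return sorted(items, key=lambda it: it.votes, reverse=True)


def simulate(items, rounds, viewers_per_round, top_k):
    """Each round, viewers look only at the current top_k items, and each of
    those gains its expected number of upvotes. Everything else gains nothing."""
    for _ in range(rounds):
        for it in rank_by_votes(items)[:top_k]:
            it.votes += it.quality * viewers_per_round
    return rank_by_votes(items)


items = [
    Item("early-mediocre", quality=0.6, votes=5.0),  # got a small head start
    Item("late-great", quality=0.9),
    Item("late-good", quality=0.8),
]
for it in simulate(items, rounds=10, viewers_per_round=100, top_k=1):
    print(it.name, it.votes)
```

With top_k=1 the early item's five-vote head start is never overturned, even though both late items are better: they are never seen, so they never get the votes they would need to climb.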

One solution to this is to rank not just on votes but on recency of

those votes. This is roughly what Reddit does. This has the advantage

that there are basically new things every day, but the disadvantage

that cross-day comparisons are basically worthless. Because traffic to

the site is growing over time, new stories always tend to have more

votes than old ones; thus the all-time hits on Reddit are totally

uninteresting.
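Reddit's actual formula isn't given in this thread, so the following is only a generic sketch of "rank on votes and on the recency of those votes", not Reddit's algorithm. It assumes each upvote's weight decays exponentially with age, using a hypothetical one-day half-life; the function names are likewise hypothetical.

```python
import time

DAY = 24 * 3600
HALF_LIFE = 1 * DAY  # hypothetical: a vote loses half its weight every day


def recency_score(vote_times, now, half_life=HALF_LIFE):
    """Each vote counts 0.5 ** (age / half_life), so old votes fade toward zero."""
    return sum(0.5 ** ((now - t) / half_life) for t in vote_times)


def rank_by_recent_votes(votes_by_item, now=None):
    """votes_by_item maps an item name to the timestamps (seconds) of its upvotes."""
    now = time.time() if now is None else now
    return sorted(votes_by_item,
                  key=lambda name: recency_score(votes_by_item[name], now),
                  reverse=True)


now = 100 * DAY
votes_by_item = {
    "old-hit": [now - 10 * DAY] * 100,  # 100 votes, all ten days old
    "new-post": [now] * 3,              # 3 votes, all fresh
}
print(rank_by_recent_votes(votes_by_item, now))
```

Under these assumptions, 100 ten-day-old votes add up to about 0.1 of a fresh vote, so three fresh votes win easily. That keeps the front page new, but the score says little about how good the old item actually was.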

So here's my new idea and I want to hear your thoughts:

What you really want to rank things by is the probability someone will

like it, which, if we leave personalization out for the moment, is

just the percentage of people who like it. Now obviously it's

impossible to know the percentage of people who will like an item in

advance, but each vote gives you more information about that. So

instead of sorting by the number of votes, you sort by the estimated

percentage of people who like something (which you calculate from the

votes so far using Bayes' theorem).
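The post leaves the math open, so here is one common way to do "estimate the percentage from the votes using Bayes' theorem". This is an assumed model, not one stated in the thread: each vote is an independent like/dislike with unknown rate p, and p has a Beta prior. By Bayes' theorem the posterior is again a Beta distribution, and its mean is a simple ratio. The vote counts below are hypothetical.

```python
def posterior_like_rate(up, down, prior_up=1.0, prior_down=1.0):
    """Posterior mean of p, the fraction of people who like an item.

    Assumed model: votes are independent likes/dislikes with probability p,
    and p ~ Beta(prior_up, prior_down). The posterior is then
    Beta(up + prior_up, down + prior_down), whose mean is returned. With the
    uniform Beta(1, 1) prior this is Laplace's rule of succession,
    (up + 1) / (up + down + 2).
    """
    return (up + prior_up) / (up + down + prior_up + prior_down)


# hypothetical (upvotes, downvotes) per item
votes = {"two-for-two": (2, 0), "ninety-of-hundred": (90, 10), "brand-new": (0, 0)}
ranking = sorted(votes, key=lambda k: posterior_like_rate(*votes[k]), reverse=True)
print(ranking)
```

With a uniform prior, two likes out of two votes estimate to 0.75, below the 0.89 for 90 of 100, and an item with no votes at all starts at exactly 0.5.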

Now by default, this would just mean that new stories would tend to

settle in the middle of the pack, where they're unlikely to get voted

on. So when you're drawing top pages, randomly increase the expected

probability of items you're unsure about (i.e. have few votes). That

way, when users view the top stories, they'll also get a few

could-be-top-stories mixed in; they'll vote on those, you'll know

whether they're good or not, and that'll improve the rankings for next

time.
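One standard way to get "randomly boost the items you're unsure about" is Thompson sampling, a technique from the multi-armed-bandit literature that the post doesn't name, so treat this as a swapped-in suggestion. It assumes the same Beta model as the estimate above. On each page render, every item is ranked by a random draw from its posterior instead of by the posterior mean. Items with few votes have wide posteriors, so their draws occasionally land near the top. Unlike the post's wording, this also sometimes lowers an uncertain item; ranking on max(posterior mean, draw) instead would make the perturbation upward-only. All names and counts here are hypothetical.

```python
import random


def sample_like_rate(up, down, rng, prior_up=1.0, prior_down=1.0):
    """One random draw from the Beta posterior over an item's like rate.
    Few votes -> wide posterior -> draws that swing far from the mean."""
    return rng.betavariate(up + prior_up, down + prior_down)


def exploratory_ranking(votes, rng):
    """Thompson sampling: re-rank on a fresh posterior draw per page view."""
    draws = {item: sample_like_rate(up, down, rng) for item, (up, down) in votes.items()}
    return sorted(draws, key=draws.get, reverse=True)


rng = random.Random(0)  # seeded only so the example is repeatable
votes = {"established": (900, 100), "unknown": (0, 0), "known-bad": (10, 990)}
first_place = {item: 0 for item in votes}
for _ in range(1000):  # 1000 hypothetical page renders
    first_place[exploratory_ranking(votes, rng)[0]] += 1
print(first_place)
```

The unknown item has a uniform posterior, so it takes first place roughly one render in ten (whenever its draw beats the established item's draw near 0.9). Its new votes then narrow its posterior. The known-bad item essentially never surfaces.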

Thoughts on the concept? Help with the math?

Jan 18, 2008, 11:53:45 AM

to view-t...@googlegroups.com

On 18/01/2008, Aaron Swartz <m...@aaronsw.com> wrote:

>

> In many applications, you have a long list of things of which you want

> to highlight some interesting ones to the user. As a concrete example,

> let's imagine you have a series of photos and you want to show the

> user the most interesting photos in any section.

Or abstract it to 'I've selected n items given your search criteria'.

>

> One traditional way of doing this is to let users vote on photos, then

> sort photos by the number of votes.

> it has a serious flaw: the

> average user looks at the things with the most votes, likes most of

> them, and gives them more votes. The result is a classic Matthew

> effect: whichever items happen to end up on top stay there for just

> about forever.

> So here's my new idea and I want to hear your thoughts:

>

> What you really want to rank things by is the probability someone will

> like it,

> So

> instead of sorting by the number of votes, you sort by the estimated

> percentage of people who like something (which you calculate from the

> votes so far using Bayes' theorem).

Comparing the above... it seems to me that you're going to finish up

with the same items (excepting your random insertions)?

Votes vs. 'votes so far using Bayes'?

Is there a biggish difference I'm missing, Aaron?

regards

--

Dave Pawson

XSLT XSL-FO FAQ.

http://www.dpawson.co.uk

Jan 20, 2008, 10:42:48 AM

to view-t...@googlegroups.com

> > instead of sorting by the number of votes, you sort by the estimated

> > percentage of people who like something (which you calculate from the

> > votes so far using Bayes' theorem).

>

> Comparing the above... it seems to me that you're going to finish up

> with the same items (excepting your random insertions)?

>

> Votes vs. 'votes so far using Bayes'?

>

> Is there a biggish difference I'm missing, Aaron?

There are two big differences:

1. The randomness

2. The fact that it's a percentage and not a flat number

Raw vote counts lead to the runaway Matthew effect I described, whereas

percentages cannot go above 100. And depending on whether you make it

percent-of-votes-that-are-positive or

percent-of-views-that-led-to-a-positive-vote, I think you'll see some

very different results.
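To see how much the choice of denominator can matter, here is a small sketch comparing the two percentages on the same items. The counts are hypothetical, and both ratios are left unsmoothed for clarity. With small counts, either one could be combined with a prior.

```python
def pct_of_votes_positive(up, down):
    """Share of cast votes that are upvotes."""
    return up / (up + down) if up + down else 0.0


def pct_of_views_positive(up, views):
    """Share of all views that ended in an upvote."""
    return up / views if views else 0.0


# hypothetical counts: (upvotes, downvotes, views)
stats = {
    "niche": (40, 10, 2000),    # few viewers vote, but voters love it
    "broad": (300, 200, 1000),  # many viewers vote, opinions are split
}
by_votes = sorted(stats, key=lambda k: pct_of_votes_positive(*stats[k][:2]), reverse=True)
by_views = sorted(stats, key=lambda k: pct_of_views_positive(stats[k][0], stats[k][2]), reverse=True)
print(by_votes, by_views)
```

Both ratios are bounded, so neither can run away the way a raw count can. Still, "niche" wins on share of votes (0.8 vs. 0.6), while "broad" wins on share of views (0.3 vs. 0.02).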
