Building a Recommendation System

Anil

Mar 25, 2013, 5:49:47 AM

to wncc...@googlegroups.com

Scenario:

I have a set of users and a set of items.

To do:

When a user is viewing an item, I wish to recommend a set of other items for him to also take a look at.

One approach is content-based recommendation: essentially, use the entire text content about each item and a search engine like Elasticsearch to find items similar to the given title.
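As a rough illustration of the content-based idea (not how Elasticsearch works internally), here is a minimal Python sketch that ranks items by TF-IDF cosine similarity over their text. The sample item texts, the `similar_items` helper and the smoothed IDF formula are assumptions made up for this example; a real search engine would take care of indexing, stemming and scale.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: dict item_id -> text. Returns dict item_id -> {term: weight}."""
    tokens = {item: tokenize(text) for item, text in docs.items()}
    n = len(docs)
    df = Counter()  # number of items each term appears in
    for toks in tokens.values():
        df.update(set(toks))
    vectors = {}
    for item, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF (an assumption): rarer terms get a larger weight.
        vectors[item] = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_items(item_id, docs, k=3):
    """Return up to k other item ids, most similar first, skipping zero scores."""
    vectors = tfidf_vectors(docs)
    target = vectors[item_id]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


# Hypothetical catalogue: item id -> text content.
ITEMS = {
    "a": "Intro to Python programming",
    "b": "Advanced Python programming patterns",
    "c": "Gardening for beginners",
    "d": "Python web development with Django",
}

if __name__ == "__main__":
    print(similar_items("a", ITEMS))
```

With this sample data, the two other Python titles are returned for item "a", and the gardening title is left out because it shares no terms with it.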

- Does anyone know how exactly real-world applications like Amazon, or something smaller like SlideShare, implement their recommendation systems?

- Does anyone know of a package/library for this? (I found crab, pysuggest and django-recommends, but none of them is really popular.)

Adwait Dongare

Mar 25, 2013, 5:53:21 AM

to wncc...@googlegroups.com

Have a look at the Netflix Prize challenge, which had the exact same
problem statement. The winning team was required to publish their
findings in the end in return for the million dollars. I think a team
that included researchers from AT&T Labs won it.

I'm not sure if there is a ready-made library for it.


munish minia

Mar 25, 2013, 5:53:50 AM

to wncc...@googlegroups.com

You can suggest items he has previously seen (you would have to keep a record of those), or

do a boolean "OR" search on the title of the item he is viewing and base the suggestions on the results you get.

Regards,

Munish Minia


aayush singhal

Mar 25, 2013, 5:58:28 AM

to wncc...@googlegroups.com

'Collaborative filtering' is the term for this from machine learning.
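For readers new to the term, here is a minimal sketch of one simple flavour, item-to-item collaborative filtering: two items count as similar when the same users viewed both. The `VIEWS` data and the function names are hypothetical, and cosine similarity over co-views is just one common choice assumed for this example.

```python
import math
from collections import defaultdict


def item_similarities(views):
    """views: dict user -> set of item ids the user viewed.

    Returns dict item -> {other_item: cosine similarity of their viewer sets}.
    """
    users_by_item = defaultdict(set)
    for user, items in views.items():
        for item in items:
            users_by_item[item].add(user)

    sims = defaultdict(dict)
    items = sorted(users_by_item)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            common = len(users_by_item[a] & users_by_item[b])
            if common:
                score = common / math.sqrt(
                    len(users_by_item[a]) * len(users_by_item[b])
                )
                sims[a][b] = score
                sims[b][a] = score
    return sims


def also_viewed(item, views, k=3):
    """Return up to k items most often co-viewed with `item`, best first."""
    sims = item_similarities(views)
    ranked = sorted(sims.get(item, {}).items(), key=lambda p: (-p[1], p[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical viewing history: user -> items viewed.
VIEWS = {
    "u1": {"laptop", "mouse", "keyboard"},
    "u2": {"laptop", "mouse"},
    "u3": {"laptop", "keyboard", "monitor"},
    "u4": {"novel"},
}

if __name__ == "__main__":
    print(also_viewed("laptop", VIEWS))
```

Note that no item text is used at all here; the signal comes purely from user behaviour, which is what separates this from the content-based approach.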

Anil Shanbhag

Mar 25, 2013, 6:01:59 AM

to wncc...@googlegroups.com

@Munish - I think there is a difference between a recommendation system, which ranks entities, and full-text search, which compares on the basis of just plain text.

@Adwait - Sounds interesting - Reading it now

http://www.netflixprize.com/

@Aayush - Yes, there are two approaches: content-based and collaborative filtering. This is, however, more on the theoretical side.

The question was more: how does one implement these systems in real life?

--

Anil Shanbhag

Manager, WnCC

Avishek Dan

Mar 25, 2013, 6:03:06 AM

to wncc...@googlegroups.com

Try this

http://research.yahoo.com/files/korenBellChapterSpringer.pdf

On 25 March 2013 15:31, Avishek Dan <avish...@gmail.com> wrote:

Collaborative filtering

--
Avishek Dan
M. Tech. II

PG Coordinator, Web n Coding Club

Center for Indian Language Technology
Dept. of Computer Science and Engineering
Indian Institute of Technology Bombay
Email: avish...@gmail.com

http://www.cse.iitb.ac.in/~avishekdan/


Tanuj Bhojwani

Mar 25, 2013, 7:05:48 AM

to wncc...@googlegroups.com

@Anil :

Your problem as it stands is pretty open-ended.

If you look at Reddit's algorithm for choosing what goes on the front page, or Hacker News' algorithm, they operate on a few simple characteristics: submission times and the number of upvotes/downvotes. (Btw, Randall Munroe of xkcd fame wrote up Reddit's "best" comment-sorting algorithm, which is separate from the front-page ranking.) They realize that this is all the data they can gather and use reliably. The math here is pretty simple, and you can see the code on the link. And if you're a redditor, you know that it works.
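To show how little math the front-page ranking needs, here is a sketch of a "hot" score in the form commonly quoted from Reddit's open-sourced code: the log of the net votes plus a term that grows with submission time. Treat the constants (the reference timestamp and the 45000-second divisor) as assumptions taken from that commonly quoted version rather than a statement of what Reddit runs today.

```python
import math
from datetime import datetime, timedelta, timezone

# Reference timestamp and time divisor as commonly quoted (assumptions).
EPOCH_OFFSET = 1134028003
SECONDS_PER_POINT = 45000  # 12.5 hours of age is worth a 10x change in votes


def hot(ups, downs, posted_at):
    """Score a submission from its votes and submission time (aware datetime)."""
    score = ups - downs
    order = math.log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)  # 1, 0 or -1
    seconds = posted_at.timestamp() - EPOCH_OFFSET
    return round(sign * order + seconds / SECONDS_PER_POINT, 7)


if __name__ == "__main__":
    now = datetime(2013, 3, 25, tzinfo=timezone.utc)
    older_popular = hot(100, 0, now)
    newer_modest = hot(10, 0, now + timedelta(hours=13))
    print(newer_modest > older_popular)
```

Because votes enter through a logarithm and time enters linearly, a newer submission needs far fewer votes than an older one to rank at the same level.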

On the other end of the spectrum, Netflix had a very different challenge. They had a tremendous amount of usage data and also very rich metadata on each movie, which meant hundreds of characteristics. A bunch of people from Opera Solutions won second place in the Netflix Challenge, and Opera recently acquired the firm that came first. I wasn't at Opera at the time, but I had the pleasure of attending a talk by a bunch of people who were on the team that worked on the Netflix Prize.

The first step, as you can imagine, is a bit of modelling to work out which characteristics actually contribute to the result and which are noise. This part is more an art than a science. The sub-teams that were working on the problem had differing opinions on what worked and what didn't. What they eventually did was very interesting: "The Ensemble" was basically just that, an ensemble of all the algorithms that showed some promise. It let each algorithm work on the test set and come up with a result. The results of the algorithms were then weighted and summed. The weights themselves were decided by a machine learning algorithm.
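The weighted-sum step can be sketched in a few lines. This is only an illustration: the post does not say which learning algorithm chose the weights, so plain least squares on a held-out set is assumed here as a stand-in, and the two "models" are made-up prediction lists.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_blend_weights(model_preds, targets):
    """Least-squares weights for combining per-model predictions.

    model_preds: one list of predictions per model, all on the same
    held-out examples; targets: the true ratings for those examples.
    """
    k = len(model_preds)
    gram = [
        [sum(p * q for p, q in zip(model_preds[i], model_preds[j])) for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(p * t for p, t in zip(model_preds[i], targets)) for i in range(k)]
    return solve(gram, rhs)


def blend(model_preds, weights):
    """Weighted sum of the models' predictions, example by example."""
    count = len(model_preds[0])
    return [sum(w * p[n] for w, p in zip(weights, model_preds)) for n in range(count)]


# Made-up held-out ratings and two imperfect models whose errors cancel out.
TARGETS = [4.0, 3.0, 5.0, 2.0]
MODEL_A = [4.5, 2.5, 5.5, 1.5]
MODEL_B = [3.5, 3.5, 4.5, 2.5]

if __name__ == "__main__":
    weights = fit_blend_weights([MODEL_A, MODEL_B], TARGETS)
    print(weights, blend([MODEL_A, MODEL_B], weights))
```

The toy data is chosen so that the two models err in opposite directions, which is exactly the situation where a blend beats either model alone.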

Also, there's an interesting idea from Stephen Fry (no link for him here; you should know who he is):
http://youtu.be/4byn2CIwec0?t=8m20s

Why recommend only what you know people will like?

Regards,

Tanuj Bhojwani
+91 98671 04169

Anil

Mar 25, 2013, 11:57:55 AM

to wncc...@googlegroups.com

@Tanuj : Awesome links. Liked the "Why recommend only what you know people will like?" - also mentioned in Stephen Fry's video.

I was aware of the ranking algorithms for Hacker News and Reddit. The main question was how one does the implementation. Unlike on Reddit, there is metadata, and it is a recommendation per item instead of a ranking of the entire index (more like Netflix).

On the implementation side, I will have a peek inside the Reddit source. However, I was wondering whether there exists source code for a Netflix-style recommendation system, maybe even a slightly naive implementation, to look at.

Mayank Singhal

Mar 25, 2013, 3:24:35 PM

to wncc_iitb

Did you have a look at Weka and Mahout (an Apache project built on top of Hadoop)?

Mayank Singhal

--

zubin mehta

Mar 25, 2013, 4:35:38 PM

to wncc...@googlegroups.com

You can look at this one. Not sure if it is actively developed anymore, but it has a very clean API.

http://muricoca.github.com/crab/tutorial.html

Also, take a look at the recommender systems chapter of Andrew Ng's ML course on Coursera. It requires some linear algebra.

You can easily code it up using the math shown in the videos. Extra points for a completely vectorised implementation!
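A rough sketch of what coding it up might look like: learn a small feature vector per user and per item by gradient descent on the regularised squared error over the known ratings. This is written with plain loops for readability rather than vectorised, and the hyperparameters, the stochastic update order and the `RATINGS` data are all assumptions made up for this example.

```python
import math
import random


def factorize(ratings, n_users, n_items, n_features=2,
              lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user and item feature vectors from known ratings.

    ratings: list of (user_index, item_index, rating) triples.
    Returns (theta, x): per-user and per-item feature vectors.
    """
    rng = random.Random(seed)
    theta = [[rng.uniform(0.0, 0.5) for _ in range(n_features)]
             for _ in range(n_users)]
    x = [[rng.uniform(0.0, 0.5) for _ in range(n_features)]
         for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = sum(t * f for t, f in zip(theta[u], x[i])) - r
            for k in range(n_features):
                t, f = theta[u][k], x[i][k]
                # Gradient step on squared error plus L2 regularisation.
                theta[u][k] -= lr * (err * f + reg * t)
                x[i][k] -= lr * (err * t + reg * f)
    return theta, x


def predict(theta, x, user, item):
    """Predicted rating: dot product of user and item features."""
    return sum(t * f for t, f in zip(theta[user], x[item]))


def rmse(theta, x, ratings):
    """Root mean squared error of the predictions on the given ratings."""
    total = sum((predict(theta, x, u, i) - r) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Made-up ratings: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 1),
]

if __name__ == "__main__":
    theta, x = factorize(RATINGS, n_users=4, n_items=4)
    print(rmse(theta, x, RATINGS))
    print(predict(theta, x, 0, 3))  # a rating user 0 never gave
```

To recommend, you would predict a rating for every item the user has not rated and show the highest ones.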

--
Zubin Mehta
http://zubinmehta.wordpress.com

--
"For millions of years
mankind lived just like the animals.
Then something happened
which unleashed the power of our imagination.
We learned to talk."
