Recommendation Algorithm Contest


Christian Winkelmann

Nov 18, 2011, 6:19:52 AM
to gensim
Hej all,
a while ago I evaluated the gensim simserver for a web-based
recommendation engine. The process was interesting; the results were
not that good, but that was not the simserver's fault.
This week contest.plista.com launched, an algorithm contest in the
tradition of the Netflix Prize.
The goal is to recommend the news articles most relevant to users who
are currently browsing a website. If such a user is interested in a
recommended article, he might click on it, read it, and of course
receive another recommendation.

Currently there is a weekly prize of 100 EUR for whoever has the
highest ratio of clicked items to recommended items. As of now, every
participating team is made up of plista employees, who are of course
excluded from winning anything.

To implement a recommendation engine, a participant only needs to write
a server (a Flask server is one of the easiest starting points), then
parse the incoming JSON requests and return the recommendations as
JSON.
If there is interest and management is OK with it, we can provide a
fairly simple Flask wrapper for the simserver that allows training,
indexing and querying over HTTP.
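
To make the request/response loop above concrete, here is a minimal
sketch, not the contest's actual protocol. It swaps Flask for the
standard-library WSGI interface so it runs without extra packages; a
Flask route could wrap the same `recommend` function. The JSON field
names (`item_id`, `limit`, `recommendations`) and the `ARTICLES`
catalogue are assumptions made up for illustration.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical in-memory catalogue; a real entry would query a trained model.
ARTICLES = {1: "politics", 2: "sports", 3: "culture"}


def recommend(request: dict) -> dict:
    """Return up to `limit` article ids other than the one being read."""
    current = request.get("item_id")        # hypothetical field name
    limit = int(request.get("limit", 2))    # hypothetical field name
    picks = [aid for aid in sorted(ARTICLES) if aid != current][:limit]
    return {"recommendations": picks}


def app(environ, start_response):
    """WSGI entry point: JSON request body in, JSON response body out."""
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
        status, body = "200 OK", recommend(payload)
    except (ValueError, TypeError, AttributeError):
        status, body = "400 Bad Request", {"error": "invalid JSON request"}
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def serve(host="localhost", port=8080):
    """Run the app with the reference WSGI server (fine for local testing)."""
    make_server(host, port, app).serve_forever()
```

Calling `serve()` starts a local server; posting a JSON body such as
`{"item_id": 1, "limit": 2}` to it returns a JSON list of other article
ids.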

There is a reference implementation in a git repository (see the
contest website), written in PHP, which works out of the box but lacks
an actual recommender.

I hope I got you interested. If there are any questions feel free to
ask me.
Regards
Christian

Radim

Nov 18, 2011, 6:58:36 AM
to gensim
Hi Christian,

that's pretty cool! I love those contests :)


> The goal is to recommend news articles with the highest relevance to
> users who currently browse a website. If such a user is interested in
> the article he might click on it and will read it and of course will
> receive another recommendation.

I haven't read through the rules completely, but isn't this the
standard case that sensationalist articles (involving gore/nudity/pop) will
produce better conversions than "relevant" articles? Or do you equate
relevancy with CTR? Increasing short-term CTR (short-term profit++)
often ruins brand trust by providing poor value and annoyance to the
majority of (non-clicking) people (long-term profit--), if not done
carefully.


> If there is interest and the management is ok with that we can provide
> a quite simple Flask server Wrapper for the simserver which allows
> training, indexing and querying over http.

Sure. I found the reference implementation, but it's all PHP =)

Best,
Radim

Christian Winkelmann

Nov 18, 2011, 9:50:09 AM
to gensim
Hi Radim,
then it would be great if you participated. As soon as I am done with
the Python "transport" layer that handles the requests, I will push it
to a git repository. Then everyone who prefers Python over PHP will
have a starting point.
Whether relevancy equals CTR is the big question. Simply pushing the
articles closest in content to the one currently being read is the
most direct route to a recommendation, but that hasn't been very
successful so far, though we never used a large training corpus. The
next step would be to categorize the articles and then fit a
regression of the form: category x matches best to category y.
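
As a rough illustration of the "category x matches best to category y"
idea, the sketch below uses plain click counts rather than a fitted
regression, which is a simpler stand-in and not what the contest teams
necessarily run. The click log and category names are invented for the
example.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (category of the article being read,
#                          category of the recommendation that was clicked)
clicks = [
    ("sports", "sports"), ("sports", "local"), ("sports", "sports"),
    ("politics", "economy"), ("politics", "economy"), ("politics", "local"),
]


def best_target_categories(click_log):
    """Map each source category to the category clicked most often from it."""
    counts = defaultdict(Counter)
    for source, target in click_log:
        counts[source][target] += 1
    return {source: targets.most_common(1)[0][0]
            for source, targets in counts.items()}


mapping = best_target_categories(clicks)
```

Here `mapping` says which category to draw recommendations from for a
reader of each source category; a regression would additionally let
other features influence that choice.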
Long-term studies measuring trust in the so-called onsite
recommendations can't work here: if you have two teams and one of them
makes recommendations that cause users to dislike the whole
recommendation widget, that would just decrease the overall CTR for
both teams.
By the way, the website currently testing the contest is www.ksta.de,
a German local newspaper. If you browse any article you will find "Das
könnte Sie auch interessieren" ("You might also be interested in"),
and that is the widget which sometimes gets filled with
recommendations made by the teams.

Regards
Christian

Radim

Nov 18, 2011, 11:22:09 AM
to gensim
Hi Christian,

unfortunately I don't have the time to participate, but I worked on
creating an ad targeting system (for a search engine) in the past, so
the topic interests me :)


> If relevancy equals ctr is the big question. A team just pushing
> articles which are closest to the one currently being read is the
> closest route to a recommendation, but that wasn't that successful so
> far, but we never used a large training corpus. Doing categorization
> and then doing a regression like: category x matches best to category
> y would be the next step.

In my experience, content match is only good for seeding (if you have
no click data, and want to show something better than purely random
recommendations/ads). The most important signal is the past click data
for that item, possibly nuanced by aggregating the click data among
users with similar background: similar interests, similar past click
patterns, etc. From what I saw, you provide a "user id" for each
request, so perhaps this path is also viable.
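
One way to combine the two signals described above, sketched under my
own assumptions: each item's past click-through rate is smoothed toward
a prior derived from content similarity, so items with no click data
fall back to content match while well-observed items are ranked by
their clicks. The additive smoothing, the `prior_weight` and `base_ctr`
values, and all names here are illustrative choices, not a description
of any real system.

```python
def smoothed_ctr(clicks, impressions, prior_ctr, prior_weight=20.0):
    """Blend the observed CTR with a prior; the prior dominates when data is scarce."""
    return (clicks + prior_weight * prior_ctr) / (impressions + prior_weight)


def rank(candidates, stats, similarity, base_ctr=0.01):
    """Order candidate ids by smoothed CTR, best first.

    stats:      id -> (clicks, impressions); missing ids count as unseen
    similarity: id -> content similarity in [0, 1] to the current article
    """
    scored = []
    for item in candidates:
        clicks, impressions = stats.get(item, (0, 0))
        # Content match only seeds the estimate: it scales the prior CTR.
        prior = base_ctr * (0.5 + similarity.get(item, 0.0))
        scored.append((smoothed_ctr(clicks, impressions, prior), item))
    return [item for _, item in sorted(scored, reverse=True)]
```

To aggregate over users with similar backgrounds, the same function
could be fed `stats` computed per user segment instead of globally.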

tl;dr: content match is not necessarily a good click predictor (but I
never worked on news article data). I'm looking forward to following
the competition, to see what people come up with :)

Best,
Radim
