Description:
Development of a recommender for reddit
A visualization of this data
Hi Guys, I downloaded the data you have posted here to create a visualization of how closely different subreddits are linked to one another. Essentially, for every pair of subreddits I checked whether users voted in both more often than would be expected by chance and, if so, made a link (using...
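One simple way to read "more often than would be expected by chance" is a lift test: compare the number of users who voted in both subreddits with the number expected if the two were independent. The sketch below is only an illustration under that assumption, not the poster's method (the snippet is cut off before it says what was used). The function name `co_vote_links`, the `min_lift` threshold and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_vote_links(user_subs, min_lift=1.0):
    """Return {(a, b): lift} for subreddit pairs co-voted more than chance predicts.

    user_subs maps a user name to the set of subreddits that user voted in.
    lift = observed co-voters / expected co-voters under independence.
    """
    n_users = len(user_subs)
    sub_counts = Counter()   # users per subreddit
    pair_counts = Counter()  # users per subreddit pair
    for subs in user_subs.values():
        subs = sorted(set(subs))  # sorted so each pair has one canonical order
        sub_counts.update(subs)
        pair_counts.update(combinations(subs, 2))

    links = {}
    for (a, b), observed in pair_counts.items():
        expected = sub_counts[a] * sub_counts[b] / n_users
        lift = observed / expected
        if lift > min_lift:
            links[(a, b)] = lift
    return links


# Toy data (hypothetical), only to show the call.
votes = {
    "u1": {"python", "programming"},
    "u2": {"python", "programming"},
    "u3": {"pics", "funny"},
    "u4": {"pics", "funny", "programming"},
}
links = co_vote_links(votes)
```

With real data a significance test would be a safer criterion than a raw lift threshold, since pairs with very few voters produce noisy ratios.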
A topic-based subreddit recommender
Hi all, A few of us at XPLR have released a first version of a subreddit recommender: [link]. This recommender builds on top of a clustering algorithm. It is directly usable in your browser and on all Reddit pages, through a user script. See [link] for both code,...
Mahout
Apache Mahout: Scalable machine learning and data mining
[link]
This is very relevant to what we're doing since it implements many
algorithms which can be used for collaborative filtering:
(Full list of algorithms: [link])...
plans
Hi,
So it looks like the basic infrastructure for link-level recommendations
is ready. It only lacks a quality-measuring tool, which isn't hard to
implement.
There are things which should be improved (like a bigger data set),
but I think they can wait.
I plan to work on infrastructure for subreddit-level recommendations,...
LSI Test #1 -- Success
So I ran my LSI code on this new data set in a 'proper' fashion: with
test-links, test-users and an independent check of predictions.
So far it looks like it kinda works, although I'm now too sleepy to
interpret the data properly.
Here's the data for 100 users. The first number is the number of votes in
the test set, the second is the number of predicted links, i.e. top 100 links, the third is...
SVD Test #1 - Fail
Hello Everyone,
I've loaded Alex's data into an SVD-based recommender with little
success. This is my first SVD-based project and I think I will need
to approach it a bit differently. Right now I'm doing the following
(using Python with numpy/scipy):
Create matrix A using the training data (links as columns, users as...
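For readers unfamiliar with the general idea, a truncated SVD can be sketched in a few lines of numpy. This is not the poster's code, which the listing cuts off. It assumes users are the rows of A (the snippet only confirms that links are columns), and the function name `svd_scores`, the rank `k` and the toy matrix are hypothetical.

```python
import numpy as np


def svd_scores(A, k):
    """Rank-k SVD reconstruction of a vote matrix.

    Entries of the result can be read as predicted scores: a link the user
    has not voted on but that scores high is a candidate recommendation.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Toy matrix (hypothetical): rows = users (assumed), columns = links,
# 1 = voted, 0 = no vote.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)
scores = svd_scores(A, k=2)
```

On a data set of the size discussed in this group, a dense SVD would not fit in memory; a sparse matrix and a truncated solver would be needed instead.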
test data set is available
Hi, I've implemented a simple algorithm which splits the whole data set into training and test subsets. All links are split into training and test sets. Users are not split, so you need to make predictions for all users. Votes for links in the training set are represented as is. Votes for links in the test set are split into two parts: the "info" part is...
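The link-level part of such a split can be sketched as below. This is a minimal illustration, not the poster's implementation: the further division of test-set votes into an "info" part and the rest is cut off in the listing and is not reproduced here. The function name `split_votes_by_link`, the `(user, link, vote)` tuple layout, `test_fraction` and `seed` are all assumptions.

```python
import random


def split_votes_by_link(votes, test_fraction=0.2, seed=0):
    """Split votes into train/test by link, leaving users unsplit.

    votes is a list of (user, link, vote) tuples. Every vote for a given
    link lands in the same subset, so test links are unseen in training.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    links = sorted({link for _, link, _ in votes})
    rng.shuffle(links)
    n_test = int(len(links) * test_fraction)
    test_links = set(links[:n_test])

    train = [v for v in votes if v[1] not in test_links]
    test = [v for v in votes if v[1] in test_links]
    return train, test


# Toy votes (hypothetical): +1 = upvote, -1 = downvote.
votes = [
    ("alice", "l1", 1), ("alice", "l2", -1), ("alice", "l5", 1),
    ("bob", "l1", 1), ("bob", "l3", 1),
    ("carol", "l3", -1), ("carol", "l4", 1), ("carol", "l5", 1),
]
train, test = split_votes_by_link(votes, test_fraction=0.4)
```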
some data available
Hi,
I've made an extract out of public data, as assembled by Quentin's
dumper.py. Only part of the data was used (I haven't collected everything
yet), but it should be enough for a start. (I'll release an update
later.)
There are 5260381 votes, 2337323 links and 17261 users in this set.
[link]...
issue tracker
Hi,
Some people asked for an issue tracker.
Well, it looks like GitHub can do that. (Although I haven't tried it
yet, so I don't know whether it is convenient enough. If it's not, we
can try something else. Maybe Trac.)
So here it is:
[link]
It's empty, though. As is the repository. Well, somebody has to write...