MLcomp seems awesome, why isn't it more popular?

rrenaud

Jul 28, 2011, 12:47:13 AM

to MLcomp

The idea seems great, and the execution is good. Why isn't this more
popular?

Do you suspect that lots of people developing ML algs "cheat" by over-tuning
hyperparameters or under-tuning competitors, and that keeping them
honest would mean less published research?

Is the 200 MB limit just too small for "real" datasets?

Percy Liang

Aug 1, 2011, 10:58:02 AM

to mlc...@googlegroups.com

These are all good questions.

On Thu, Jul 28, 2011 at 12:47 AM, rrenaud <rre...@gmail.com> wrote:
> The idea seems great, and the execution is good. Why isn't this more
> popular?

I wish I knew the answer to this. Here are some thoughts:

First, we simply haven't done proper evangelism: writing about it,
giving tutorials and talks, etc. I think many people just don't know
about it. Please help spread the word. :-)

Second, people who design new ML algorithms (e.g., those who go to
ICML, NIPS, etc.) often work on problems (e.g., sparsity, topic
modeling) that are not standard classification and do not fit any of
our standard "MLcomp domains". However, MLcomp is extensible, and it
would be easy to create a new domain for such a non-standard problem
(just ask Benoit Favre, one of our serious users). If people want
something like this, let us know, and we'll make it happen.

That said, several people who have used MLcomp say that it has been
very helpful: it's actually easier to run experiments using MLcomp.
This should be a strong incentive, since you don't have to find and
process your own datasets; everything is already on MLcomp in a
standard format. And as Jake mentioned, many people are using MLcomp
as an algorithm/dataset repository, and this activity isn't visible on
the site.

I just want to mention one thing that I thought was very cool:
recently, Christian Raymond (pepin_de_landen) uploaded an algorithm
(bonzaiboost) that works really well across over a hundred datasets.
Meanwhile, several people had independently uploaded datasets, and
Christian's bonzaiboost worked best on them. This is a paragon of
what MLcomp was meant to do: serve as a marketplace where people with
algorithms and people with datasets can "meet" and accomplish more,
and more quickly, than if they had worked separately.

> Do you suspect that lots of people developing ML algs "cheat" by over-tuning
> hyperparameters or under-tuning competitors, and that keeping them
> honest would mean less published research?

Sometimes it's not even blatant cheating or unfair hyperparameter
tuning, but bias in the choice of datasets or training conditions:
people choose conditions under which their algorithm tends to work
better. There is nothing wrong with this per se (after all, people
should show when their algorithm works), but pretending or implying
that the algorithm is better in all settings is a bit misleading.
It's unreasonable to expect a single algorithm to be uniformly better
than everything else in every way. I think the interesting question
is understanding in which regimes one algorithm is better than
another.
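A minimal sketch of the safeguard the question alludes to: give every algorithm the same hyperparameter search budget, select hyperparameters on a validation set, and report error only on a separate test set. Everything here is hypothetical and for illustration only (the synthetic 1-D data, the two model families `threshold_family` and `band_family`, the `BUDGET` of 21 grid points); it is not MLcomp's actual evaluation code.

```python
import random

def make_data(n, seed):
    """Hypothetical synthetic data: x uniform in [-1, 1], label 1 iff x > 0.3, 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0.3)
        if rng.random() < 0.1:  # flip 10% of labels as noise
            y = 1 - y
        data.append((x, y))
    return data

def error(predict, data):
    """Fraction of (x, y) pairs that `predict` labels incorrectly."""
    return sum(predict(x) != y for x, y in data) / len(data)

def threshold_family(t):
    """Hypothetical algorithm 1: predict 1 when x exceeds threshold t."""
    return lambda x: int(x > t)

def band_family(w):
    """Hypothetical algorithm 2: predict 1 when x lies within w of 0.65."""
    return lambda x: int(abs(x - 0.65) < w)

def tuned_test_error(family, lo, hi, budget, valid, test):
    """Try `budget` evenly spaced hyperparameters in [lo, hi], pick the one
    with the lowest validation error, and report its error on the test set."""
    grid = [lo + (hi - lo) * i / (budget - 1) for i in range(budget)]
    best = min(grid, key=lambda h: error(family(h), valid))
    return best, error(family(best), test)

valid, test = make_data(500, seed=1), make_data(500, seed=2)
BUDGET = 21  # identical search budget for every algorithm being compared
results = {
    "threshold": tuned_test_error(threshold_family, -1.0, 1.0, BUDGET, valid, test),
    "band": tuned_test_error(band_family, 0.0, 1.0, BUDGET, valid, test),
}
for name, (h, err) in results.items():
    print(f"{name}: hyperparameter={h:.2f}, test error={err:.3f}")
```

In this framing, under-tuning a competitor corresponds to giving it a smaller budget or a worse search range than one's own method, and over-tuning corresponds to selecting on `test` instead of `valid`. Running every entry through the same protocol removes both effects, though it does nothing about the dataset-selection bias described above.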

> Is the 200 MB limit just too small for "real" datasets?

Many "real" datasets are indeed larger than 200 MB, and in the future
we'd like to increase our limit, especially if there's demand for it.
For now, one can always subsample and upload a smaller dataset. Then
the real question is whether the subsample is representative in the
following sense: suppose I have two algorithms A1 and A2, a large
dataset D, and a subsample d. Does error(A1,d) < error(A2,d) imply
error(A1,D) < error(A2,D)? Of course the answer depends on properties
of A1, A2, D, and d (and one can probably even say something formal
about this), but the point is that if the implication holds, then d is
in some sense a good substitute for D. Sometimes a dataset is huge,
but its intrinsic complexity isn't that large because there's a lot
of redundancy, in which case the size makes things appear scarier
than they really are.
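One way to probe this implication empirically is to draw many random subsamples d of D and measure how often the ordering of error(A1,d) and error(A2,d) matches the ordering on D. The sketch below assumes a hypothetical setup: synthetic redundant data, two toy learners (`threshold_learner` as A1, `majority_learner` as A2), and error(A, d) defined as "train A on the first half of d, test on the second half". None of this is an MLcomp feature.

```python
import random

def make_data(n, rng):
    """Hypothetical synthetic data: x uniform in [-1, 1], label 1 iff x > 0.3, 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0.3)
        if rng.random() < 0.1:  # flip 10% of labels as noise
            y = 1 - y
        data.append((x, y))
    return data

def threshold_learner(train):
    """Hypothetical A1: pick the grid threshold with the lowest training error."""
    grid = [i / 10 for i in range(-10, 11)]
    best = min(grid, key=lambda t: sum(int(x > t) != y for x, y in train))
    return lambda x: int(x > best)

def majority_learner(train):
    """Hypothetical A2: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(2 * ones >= len(train))
    return lambda x: label

def error(algorithm, data):
    """error(A, d): train A on the first half of d, measure error on the second half."""
    half = len(data) // 2
    predict = algorithm(data[:half])
    test = data[half:]
    return sum(predict(x) != y for x, y in test) / len(test)

def ranking_agreement(a1, a2, full, sub_size, trials, rng):
    """Fraction of random subsamples d of `full` (size `sub_size`) on which
    error(a1, d) < error(a2, d) has the same truth value as on `full`."""
    full_order = error(a1, full) < error(a2, full)
    hits = 0
    for _ in range(trials):
        d = rng.sample(full, sub_size)
        hits += (error(a1, d) < error(a2, d)) == full_order
    return hits / trials

rng = random.Random(0)
D = make_data(20000, rng)
for m in (20, 200, 2000):
    rate = ranking_agreement(threshold_learner, majority_learner, D, m, trials=50, rng=rng)
    print(f"subsample size {m}: ranking agreement {rate:.2f}")
```

Ties on small subsamples count as disagreements here. Because this synthetic D is highly redundant, one would expect the agreement rate to approach 1 long before d reaches the size of D, which is the situation the last paragraph describes; on real data the curve, and how large d must be, will depend on A1, A2, and D.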