Folks,
Rdatasets looks like a worthy effort. Vincent, would you know which CSVs are suitable for regression?
csv/MASS/Boston.csv looks familiar.
Fwiw, some sizes --
# nrow ncol file colnames
2783 1 csv/MASS/SP500.csv r500
1000 3 csv/MASS/synth.te.csv xs ys yc
506 14 csv/MASS/Boston.csv crim zn indus chas nox rm age dis rad tax ptrati
365 9 csv/MASS/gilgais.csv pH00 pH30 pH80 e00 e30 e80 c00 c30 c80
314 2 csv/MASS/GAGurine.csv Age GAG
299 2 csv/MASS/geyser.csv waiting duration
250 3 csv/MASS/synth.tr.csv xs ys yc
205 7 csv/MASS/Melanoma.csv time status sex age year thickness ulcer
189 10 csv/MASS/birthwt.csv low age lwt race smoke ptl ht ui ftv bwt
133 2 csv/MASS/mcycle.csv times accel
114 4 csv/MASS/beav1.csv day time temp activ
100 4 csv/MASS/beav2.csv day time temp activ
...
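For anyone who wants to reproduce a listing like the one above, here is a minimal sketch using only the Python standard library. The names `shape_of` and `list_sizes` are made up for illustration, and the code assumes each CSV has a header row and that a leading column named "" or "rownames" holds R row names rather than data; adjust if your copy of the files differs.

```python
import csv
from pathlib import Path


def shape_of(lines):
    """Return (nrow, ncol, colnames) for CSV text given as an iterable of lines."""
    reader = csv.reader(lines)
    header = next(reader, [])
    # Assumption: a leading column named "" or "rownames" holds R row names, not data.
    if header and header[0] in ("", "rownames"):
        header = header[1:]
    nrow = sum(1 for row in reader if row)  # count data rows, skipping blank lines
    return nrow, len(header), header


def list_sizes(root, min_rows=100):
    """Collect (nrow, ncol, path, colnames) for CSVs under root, largest first."""
    found = []
    for path in Path(root).rglob("*.csv"):
        with open(path, newline="") as f:
            nrow, ncol, names = shape_of(f)
        if nrow >= min_rows:
            found.append((nrow, ncol, str(path), names))
    return sorted(found, key=lambda t: -t[0])


if __name__ == "__main__":
    print("# nrow ncol file colnames")
    for nrow, ncol, path, names in list_sizes("csv/MASS"):
        print(nrow, ncol, path, " ".join(names))
```

Run it from the directory that contains `csv/`; the `min_rows` cutoff is only a convenience for skipping the small datasets.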
I was looking for datasets with more than, say, 100 observations,
so that I could split them into, say, 2/3 train and 1/3 test,
fit each regressor on Xytrain, and score it on Xytest.
My goal was just to get an overview of the many *Regress* estimators in sklearn;
see
github.com/denis-bz/sklearn-regress .
But that may be silly -- I'm not a statistician, hardly know R, terrible.
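To make the split-fit-score idea concrete, here is a rough sketch, not the code in the repository mentioned above. It uses synthetic data as a stand-in for a real CSV, and the helper name `compare_regressors`, the choice of three regressors, and their `alpha` values are all assumptions picked for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, regressors, test_size=1/3, seed=0):
    """Fit each regressor on the train split; return {name: R^2 on the test split}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    scores = {}
    for name, reg in regressors.items():
        reg.fit(X_train, y_train)
        scores[name] = reg.score(X_test, y_test)  # R^2 for sklearn regressors
    return scores


# Synthetic stand-in data (NOT one of the Rdatasets files): y is linear in X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=300)

# Illustrative choice of regressors and hyperparameters.
regressors = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
}
scores = compare_regressors(X, y, regressors)
for name, r2 in scores.items():
    print(f"{name:18s} R^2 on test = {r2:.3f}")
```

With a real dataset, X and y would come from the CSV columns instead; a single random split gives a noisy score on a few hundred rows, so repeating with several seeds is probably wise.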
Somewhat offtopic but funny (wry-funny):
http://stats.stackexchange.com/questions/1164/why-havent-robust-and-resistant-statistics-replaced-classical-techniques
cheers
-- denis