Question: HW1 3.5

Yitong Zhou

Feb 4, 2013, 7:29:26 PM

to 10-701-spri...@googlegroups.com

This section requires "picking 1%, 2%, 5% ~~~ 100% of data and compare performances".

But when it refers to a certain percentage of the data, how should I split that into training and test sets? Does it mean I should hold out 1%, 2%, ..., 100% of the data for testing and train on the rest, or does it mean I should pick 1% of my data to use for both training and testing, and maybe do 10-fold cross-validation within it?

Thanks,

Yitong Zhou

David Fouhey

Feb 5, 2013, 2:53:09 PM

to Yitong Zhou, 10-701-spri...@googlegroups.com

I'm curious about this as well -- is either interpretation of this
question ok, or is there a preferred interpretation?

Thanks,
David

> --
> http://alex.smola.org/teaching/cmu2013-10-701 (course website)
> http://www.youtube.com/playlist?list=PLZSO_6-bSqHQmMKwWVvYwKreGu4b4kMU9
> (YouTube playlist)

zhuoc

Feb 5, 2013, 8:14:44 PM

to 10-701-spri...@googlegroups.com

Anyone answering this question?

matin...@gmail.com

Feb 5, 2013, 8:15:15 PM

to zhuoc, 10-701-spri...@googlegroups.com

The way some of my friends and I interpreted this was that the percentages refer to the fraction of the whole dataset you use, and you then apply 10-fold CV to that fraction of the data.

Best,
Matineh


Vagelis Papalexakis

Feb 5, 2013, 8:28:01 PM

to matin...@gmail.com, 10-701-spri...@googlegroups.com, zhuoc

I independently assumed the same thing.

So I guess that since more people did it this way, it's a reasonable thing to do (although this conclusion is not always right :-) )

Vagelis

milad memarzadeh

Feb 5, 2013, 8:30:26 PM

to Vagelis Papalexakis, Matineh Eybpoosh, 10-701-spri...@googlegroups.com, zhuoc

I hope this is correct, because I'm not gonna run the entire thing again :) That means first choose 1% of your data and do the k-fold cross validation on them, then repeat it with more data until you do it for the full dataset.
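For anyone who wants to see this interpretation spelled out, here is a rough Python sketch of "take a fraction of the data, then do k-fold CV inside it". It only builds the index splits; the classifier is left out. The function name `kfold_on_fraction`, the dataset size of 3000, and the way folds are formed are assumptions made for illustration, not something taken from the assignment.

```python
import random

def kfold_on_fraction(n_points, fraction, k=10, seed=0):
    """Pick `fraction` of the indices at random, then split them into k folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Keep at least k points so every fold is non-empty (an assumption),
    # but never ask for more points than exist.
    subset_size = min(n_points, max(k, round(n_points * fraction)))
    subset = rng.sample(range(n_points), subset_size)
    # Deal the sampled indices into k nearly equal-sized folds.
    folds = [subset[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

# Only the percentages named in the question; the ones in between are omitted.
for pct in (1, 2, 5, 100):
    splits = kfold_on_fraction(3000, pct / 100)
    train_idx, test_idx = splits[0]
    print(pct, len(train_idx), len(test_idx))
```

You would train on `train_idx` and evaluate on `test_idx` for each fold, average the k scores, and report one number per percentage.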

Hope it helps

Milad Memarzadeh, M.Sc.
Doctoral Candidate, Advanced Infrastructure Systems
Department of Civil and Environmental Engineering
Carnegie Mellon University

Leila Wehbe

Feb 5, 2013, 8:31:17 PM

to Vagelis Papalexakis, matin...@gmail.com, 10-701-spri...@googlegroups.com, zhuoc

Sorry guys I replied with my cs address so the email didn't go through:

You can do both. The better option is to split your original data into a train set and a test set (for example 2000 and 1000 points) and use 1%, 2%, ... 100% of the training set. But you can also do the entire cross-validation on 1%, 2%, etc. of the data.
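A minimal Python sketch of the first option, assuming the 2000/1000 split from the example above: the test set is fixed once, and only the training portion grows. The function name `learning_curve_subsets` is hypothetical, and making each smaller training subset a prefix of the larger ones is a design choice made here, not something the assignment specifies.

```python
import random

def learning_curve_subsets(n_points, n_train, percentages, seed=0):
    """Fix one train/test split, then take growing slices of the training pool.

    Returns (subsets, test_indices), where subsets maps each percentage to
    the training indices used at that percentage. Illustrative helper only.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    train_pool, test_idx = indices[:n_train], indices[n_train:]
    subsets = {}
    for pct in percentages:
        size = max(1, round(n_train * pct / 100))  # at least one training point
        subsets[pct] = train_pool[:size]  # nested: smaller sets are prefixes
    return subsets, test_idx

subsets, test_idx = learning_curve_subsets(3000, 2000, (1, 2, 5, 100))
for pct, train_idx in subsets.items():
    print(pct, len(train_idx), len(test_idx))
```

Because every model is scored on the same 1000 held-out points, the numbers at different percentages are directly comparable, which is presumably why this is the preferred option.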

Thanks

Leila
