Lesson 4.3 Activities 8,9,10


Juan Sierra Pons

Mar 26, 2014, 11:56:25 AM
to wekamooc...@googlegroups.com
Hi

I have been following the course without any problems up to activities 8, 9 and 10 of lesson 4.3.

I cannot relate what the questions are asking to what I understood from the lesson.

Any hint is welcome.

Thanks

Best regards

--------------------------------------------------------------------------------------
Juan Sierra Pons                                 ju...@elsotanillo.net
Linux User Registered: #257202      
Web: http://www.elsotanillo.net Git: http://www.github.com/juasiepo
GPG key = 0xA110F4FE
Key Fingerprint = DF53 7415 0936 244E 9B00  6E66 E934 3406 A110 F4FE
--------------------------------------------------------------------------------------

Birone L

Mar 26, 2014, 6:14:59 PM
to wekamooc...@googlegroups.com
I remember these were trickier than most of the other questions up to that point. I'll look at that part again & see if I can offer a hint of some kind tomorrow.

Birone L

Mar 27, 2014, 5:48:23 AM
to wekamooc...@googlegroups.com
Ok,  the questions actually do a good job leading people through a somewhat convoluted thought process - otherwise I wouldn't have followed. But it's easy to get lost; maybe this will help.

First, click Edit (on the Preprocess tab) & look at the data. What order are the nominal classes in? Then do 1-7 (during 3, 5 & 7, be sure to switch on output predictions under "More options" on the Classify tab; I just forgot...). After applying the filter each time, look at the class value in the data...

After doing 1-7, review the two types of multiresponse methods described on p19 of the slides:
- Training: perform a regression for each class
  – Set output to 1 for training instances that belong to the class, 0 for instances that don’t
- Prediction: choose the class with the largest output
... or use “pairwise linear regression”, which performs a regression for every pair of classes
Which type have we done (implicitly) to get the answers for Qs 3, 5 & 7? (Notice that because the MakeIndicator filter defaults to the last class value, the questions have asked us to do the regression using a class value of 1 for virginica first, then versicolor, then setosa: the reverse of the order in which they appear in the dataset.)
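
To make the "one regression per class" idea concrete, here is a minimal Python sketch. It is not Weka code and not the course's solution: the nine data points are invented, only a single feature is used, and the helper names (`make_indicator`, `fit_line`, `train`, `predict`) are hypothetical stand-ins for what the MakeIndicator filter plus LinearRegression do in the Explorer.

```python
# Sketch of multiresponse linear regression with ONE feature.
# All data below is made up for illustration; it is not the iris dataset.

def make_indicator(labels, target):
    """Return 1.0 for instances of `target` and 0.0 for all others."""
    return [1.0 if lab == target else 0.0 for lab in labels]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def train(xs, labels):
    """Fit one 0/1 regression per class."""
    return {c: fit_line(xs, make_indicator(labels, c))
            for c in sorted(set(labels))}

def predict(models, x):
    """Evaluate every regression and choose the class with the largest output."""
    outputs = {c: a + b * x for c, (a, b) in models.items()}
    return max(outputs, key=outputs.get), outputs

# Invented "petal length"-like values, three per class.
xs = [1.4, 1.5, 1.3, 4.5, 4.7, 4.0, 6.0, 5.8, 6.1]
labels = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
models = train(xs, labels)

for x in (1.4, 4.5, 6.0):
    winner, outputs = predict(models, x)
    print(x, winner, {c: round(v, 3) for c, v in outputs.items()})
```

Printing the three outputs side by side for a few inputs shows that the regression outputs are not probabilities and that some classes' 0/1 targets are fitted more closely by a straight line than others.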

So...

Q8: how will the multiresponse method we've done the groundwork/preparation for predict the class?  Based on the results summaries, which class is being predicted (fitted) worst by the model we've fitted? The one that's being predicted worst is more likely to have wildly higher/lower predicted values than it 'should'...

For Q9 & Q10, it's helpful to copy-paste the rows for the first four predictions from each regression into a text file, noting for each regression which nominal class value is getting the numeric value 1.

Q9: which column in the table is used in the multiresponse method? (See brief description above.) How will the figure in this column for the class that multiresponse predicts compare to the equivalent figures for the classes it doesn't predict? (Hint: see the description of the multiresponse method.) And how do we then know if the prediction was right? (Hint: do your three tables have 1s in the same rows of the actual column?)
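
The comparison described for Q9 can be sketched in Python as follows. The numbers are invented purely to show the mechanics and are not Weka output; `tables` and `combine` are hypothetical names. Each table holds (actual, predicted) pairs for the same four instances, one table per indicator regression.

```python
# Hypothetical (actual, predicted) pairs for the same four instances,
# one list per indicator regression. Values are invented, not from Weka.
tables = {
    "virginica":  [(0, 0.10), (1, 0.80), (0, 0.45), (0, -0.05)],
    "versicolor": [(1, 0.60), (0, 0.25), (1, 0.40), (0, 0.10)],
    "setosa":     [(0, 0.30), (0, -0.05), (0, 0.15), (1, 0.95)],
}

def combine(tables):
    """For each row, pick the class with the largest predicted value
    and compare it with the class whose actual column holds a 1."""
    n_rows = len(next(iter(tables.values())))
    results = []
    for i in range(n_rows):
        predicted = max(tables, key=lambda c: tables[c][i][1])
        actual = next(c for c in tables if tables[c][i][0] == 1)
        results.append((i + 1, actual, predicted, actual == predicted))
    return results

for row, actual, predicted, ok in combine(tables):
    print(row, actual, predicted, "ok" if ok else "ERROR")
```

With these made-up values, only the third row is flagged, because the virginica regression's output there (0.45) exceeds the versicolor regression's (0.40) even though the actual class is versicolor.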

Q10: We need to use a filter that adds an attribute identifying each instance. And we need to select an option that outputs that additional attribute when results are reported...
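
Why an ID attribute helps can be shown with a small Python sketch. This is only an analogy: it uses Python's own shuffle rather than Weka's, the six rows are invented, and `add_id`, the seed value and `position` are hypothetical choices for illustration.

```python
import random

def add_id(rows):
    """Prepend a 1-based ID to every row, loosely like an ID-adding filter."""
    return [(i + 1, *row) for i, row in enumerate(rows)]

# Six invented rows in their original (unshuffled) order.
rows = [("setosa",), ("setosa",), ("versicolor",),
        ("versicolor",), ("virginica",), ("virginica",)]

with_id = add_id(rows)

# Shuffle a copy with a fixed seed (the seed value 1 is arbitrary here).
shuffled = with_id[:]
random.Random(1).shuffle(shuffled)

# Suppose the row of interest is the third one in the shuffled output.
position = 3
original_id = shuffled[position - 1][0]
print("shuffled position", position, "is original instance", original_id)
```

Because the ID travels with the row through the shuffle, the instance's place in the original dataset can be read straight off the output.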

HTH!

(Bonus Q: check your answer to Q10 by outputting & finding the number of the instance we worked out would be incorrectly classified in Q9. You'll need to select an option by filling in a field with a number; the name causes Weka to hang on my system.)

Juan Sierra Pons

Mar 27, 2014, 12:48:58 PM
to wekamooc-general
Hi,

I finally got Q.8 by myself.

The problem with Q.9 was that with cross-validation set to 10 folds,
Weka gives 10 iterations of the predictions and I wasn't sure which one
to choose, so I got confused in the copy & paste part.

I am still figuring out Q.10, because when I add the attribute using
AddID, the predictions change and I don't see the point of it.

Thanks for your time.

Best regards

Birone L

Mar 27, 2014, 5:34:24 PM
to wekamooc...@googlegroups.com
> The problem with the Q.9 was that with the cross-validation set to 10
> gives 10 iterations of the predictions and I wasn't sure which one to
> choose so I was confused in the copy & paste part.

Maybe the question would be clearer phrased like this: "Multiresponse linear regression will make just one error on the first four instances (of the total of 150 used in the 10-fold cross-validation)"?
 
> And still figuring the Q.10 because when I add the attribute using the
> AddID ,the predictions change and I don't see the point in here.

Oops, I hadn't noticed that the predictions changed. (I guess this is an unintentional example of spurious correlation, arising because the difference between the three species is basically petal size, and the observations happen to be grouped with the larger species having higher ID numbers...) But I don't think the change in the predictions matters: we're only running the regression with the ID so that we can identify the instance misclassified in Q9. The order depends on the random number seed, so it's fixed between runs on the datasets with the same number of instances, I think. If you check the order of 1s & 0s with and without the ID in the dataset, they're the same, at least for the first few and the last...
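
The point about the seed can be illustrated with a short Python sketch. It relies on Python's `random` module, not Weka's generator, so the actual orderings will differ from what Weka prints; `shuffled_ids` and the seed values are hypothetical. The idea it shows is only that a seeded shuffle of same-sized datasets yields the same permutation regardless of the class values.

```python
import random

# 150 labels laid out like the iris file: 50 of each class, in order.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50

def shuffled_ids(target, seed):
    """Build (id, indicator) rows for one class, shuffle them with a
    seeded generator, and return the resulting order of original IDs."""
    rows = [(i + 1, 1 if lab == target else 0)
            for i, lab in enumerate(labels)]
    random.Random(seed).shuffle(rows)
    return [row_id for row_id, _ in rows]

order_virginica = shuffled_ids("virginica", seed=1)
order_versicolor = shuffled_ids("versicolor", seed=1)
order_setosa = shuffled_ids("setosa", seed=1)
order_other_seed = shuffled_ids("setosa", seed=2)

# Same seed, same size: the first four instances line up across all three runs.
print(order_virginica[:4], order_versicolor[:4], order_setosa[:4])
```

So "the first four instances" refers to the same four rows in each of the three prediction tables, as long as the seed and the number of instances are unchanged.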

Tyler Neill

Feb 24, 2017, 10:30:46 AM
to WekaMOOC-general
Activity 4.3.9:
"Weka outputs predictions in the shuffled order that is used by the cross-validation, not in the instances' original order..."

Birone:
"...the order depends on the random number seed, so it's fixed between runs on the datasets with the same number of instances, I think."

Exactly. This was what caused me a lot of confusion in answering question 4.3.9. I forgot that the "shuffling" is remarkably consistent, as long as the "random seed" value remains the same. Because it is, it is then possible, in this case, to ask about "the first four instances (of the total of 150 instances used in the 10-fold cross-validation)" and actually mean the first four instances in the first (consistently shuffled) "fold" for each of the three models. My more naive assumption was that this "first four instances (out of the total...)" was referring to the original first four, unshuffled instances in the dataset (which are, then, all setosas), and it took me a long time to figure out that this was not the point of the question—nor necessarily possible to answer. A wild goose chase, that is, because of the wording of the question.

First major hang-up in the entire course—with which I'm largely very happy, btw!

Cheers
-Tyler

Ian Witten

Mar 1, 2017, 10:54:59 PM
to wekamooc...@googlegroups.com
Thanks! I’ve clarified this for the upcoming version of the course.
ian
