Here's a link to the paper, for those who are interested:
A. Kowalczyk and O. Chapelle. An Analysis of the Anti-Learning Phenomenon for the Class Symmetric Polyhedron. In Sanjay Jain, Hans Ulrich Simon, and Etsuji Tomita, eds., Algorithmic Learning Theory: 16th International Conference, ALT 2005. Springer, 2005.
http://kyb.mpg.de/publications/attachments/alt_05_%5B0%5D.pdf
And a link to a video of a lecture by Dr. Kowalczyk about this issue:
"In the talk we shall analyze and theoretically explain some counter-intuitive experimental and theoretical findings that systematic reversal of classifier decisions can occur when switching from training to independent test data (the phenomenon of anti-learning). We demonstrate this on both natural and synthetic data and show that it is distinct from overfitting."
http://videolectures.net/mlss06au_kowalczyk_al/
Doing something quick and stupid now: taking your email and ...
> grep '^\s*desireds:.*\]$' /tmp/1.txt | sed -e 's/.*\[\(.*\)\].*/\1/g' | tr '\n' ' ' | PYTHONPATH=$PWD python -c "import sys; from mvpa.datasets.miscfx import *; print SequenceStats(sys.stdin.readlines()[0].strip().split(' '))"
Original sequence had 114 entries from set ['1', '2']
Counter-balance table for orders up to 2:
Labels/Order O1 | O2 |
1: 23 34 | 27 29 |
2: 34 22 | 29 27 |
Correlations: min=-0.19 max=0.23 mean=-0.0088 sum(abs)=8.8
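For the curious, that counter-balance table counts how often each label is followed by each label at lag 1 (the O1 columns) and lag 2 (the O2 columns). A hypothetical re-computation in plain Python, with a short stand-in sequence rather than the real 114 labels:

from collections import Counter

# Count how often label i is followed by label j at the given lag;
# lag=1 reproduces the O1 columns above, lag=2 the O2 columns.
def transition_counts(seq, lag=1):
    return Counter(zip(seq[:-lag], seq[lag:]))

seq = ['1', '2', '2', '1', '1', '2']  # stand-in, not the real sequence
print(transition_counts(seq, lag=1))
print(transition_counts(seq, lag=2))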
And something which might be related:
In [2]: (23.0/34)/2
Out[2]: 0.33823529411764708
So, maybe, if a classifier for some reason manages to learn something
related to the precedence/order effect of the conditions, you would obtain
your anti-learning instead of a pure-chance distribution.
I wonder why you had such a disproportion of trials in the splits? Are
those subject responses matching some criterion (e.g., correct answers)?
On Fri, 25 Sep 2009, Jesse Rissman wrote:
> Hi Al-Rawi & Francisco,
> Thanks for your interest in helping me out with this quandary. Here's
> one example of a subject where classification performance is
> below chance. Across 10 cross-validation folds (which constitute 10
> scanning runs), the classifier makes a total of 114 guesses (57
> exemplars from each class). Each exemplar constitutes the data from
> one trial (averaged across 2 TRs, 4-8 sec post-stimulus). Within each
--
Yaroslav Halchenko (yoh@onerussian.com, www.onerussian.com)
If you have a marginally sparse design, I guess you could maybe
get away from anti-learning if you do just N-2 cross-validation -- i.e.,
holding out not full runs/blocks but just two samples (one from each
category), repeated a sufficiently large number of times ;) (since now you
would need to arbitrarily select one sample from class A and another one
from class B).
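For concreteness, a minimal sketch of that N-2 scheme in NumPy (hypothetical helper names; a simple nearest-mean classifier stands in for whatever classifier you actually use):

import numpy as np

def nearest_mean_predict(X_train, y_train, X_test):
    # classify each test sample by the nearest class mean
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def leave_one_per_class_out(X, y, n_draws=1000, seed=0):
    # hold out one randomly chosen sample per class, train on the rest,
    # and average accuracy over many such draws (two-class case only)
    rng = np.random.RandomState(seed)
    idx_a, idx_b = (np.where(y == c)[0] for c in np.unique(y))
    hits = 0
    for _ in range(n_draws):
        test = np.array([rng.choice(idx_a), rng.choice(idx_b)])
        train = np.setdiff1d(np.arange(len(y)), test)
        pred = nearest_mean_predict(X[train], y[train], X[test])
        hits += int((pred == y[test]).sum())
    return hits / (2.0 * n_draws)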
Sorry for too much speculation.
> To answer your question, my cross-validation folds have unequal numbers
> of trials because I'm only selecting trials from two particular conditions
> (that do depend on behavior), and some runs have fewer of these trial
> types. My requirement that each run has an equal number of Class A and
> Class B trials results in a further loss of data. In this case, I'm left
> with 114 trials out of 400 total.
Well, I can only say that behavior might matter here ;) Without
specifics of the experiment/design/etc., it is hard to say anything specific.
> -- Jesse
Regards,
-Rawi
One anti-learning nonbeliever
> From: Hunar Ahmad <huna...@gmail.com>
> To: mvpa-t...@googlegroups.com
> Sent: Wednesday, April 17, 2013 7:40 PM
> Subject: [mvpa-toolbox] Re: revisiting the issue of below-chance classification performance
>Hi Jesse,
>
>Thanks for this post, even though it was a long time ago -- but now I'm running into the exact problem you described!
>I am wondering if you or anybody from this group has found a solution or a theoretical interpretation of why this is happening. I really
>don't know what to do with the subjects that show clearly below-chance classification performance (30-40%, where chance level is 50%), while most of the
>other subjects had classification performance above 60%. Should I discard the below-chance subjects from the analysis,
>or consider them low-performing subjects? I'm comparing two groups (older and younger adults), and being unbiased is really important for the study!
>Any help or suggestion is really appreciated...
>Thanks a lot in advance
>
>Hunar
Hi Hunar,

I tested your data with Libsvm (linear kernel, default parameters). The script is attached. If you don't use Libsvm in Matlab, I could change to another classifier, e.g. a simple PCA+NN classifier. The results are:

Training and testing on the original sample separation:
Accuracy = 28.083% (230/819) (classification)

Training and testing by 2-fold cross-validation, with samples randomly separated:
Accuracy = 76.2755% (598/784) (classification)
Accuracy = 80.4847% (631/784) (classification)
The mean classification accuracy of 2-fold cross-validation is 0.7838.

So the below-chance accuracy (anti-learning) results from the strange data distribution -- in other words, from the way the samples are separated into training and testing sets. This is not strange in itself. For example, if you have a data set (x_train, x_test, label_train, label_test) with accuracy 0.8+, then the dataset (x_train, x_test, label_train, 1-label_test) (binary case here; it is possible in reality) would get an accuracy below 0.2. It is not a problem of the algorithms; the data distribution is just strange, and the data happened to be separated into two parts in a special manner.

Best regards,
Jing Wang.
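To see the arithmetic of that last point in isolation, here is a toy sketch with hypothetical random data (not the attached Libsvm script): once the predictions are fixed, flipping the binary test labels turns accuracy a into exactly 1 - a.

import numpy as np

rng = np.random.RandomState(0)
y_test = rng.randint(0, 2, size=100)                        # true binary labels
y_pred = np.where(rng.rand(100) < 0.8, y_test, 1 - y_test)  # ~80% correct predictions

acc = np.mean(y_pred == y_test)
acc_flipped = np.mean(y_pred == 1 - y_test)
print(acc, acc_flipped)  # acc_flipped == 1 - acc, so 0.8+ becomes below 0.2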