revisiting the issue of below-chance classification performance


Jesse Rissman

Sep 17, 2009, 4:46:35 PM
to mvpa-t...@googlegroups.com
I know that I've brought up this issue before on the mailing list (http://groups.google.com/group/mvpa-toolbox/browse_thread/thread/3b16fd93f001adc0/986053f4ef0d4e1d?pli=1), but this is still bothering me, so I want to bring it up again to see if anyone has any insights.  

Over the past two years I've literally run thousands of different classification analyses on the fMRI data from dozens of subjects.  Most of the time classification performance is well better than chance, occasionally it hovers at or near chance (for cognitive states that are challenging to differentiate), but once in a rare while performance is well below chance.  That is to say, the classifier is reliably guessing that the test examples come from the opposite class to the one they actually belong to (typically yielding accuracy levels of 38-44% correct).  Moreover, the stronger the classifier thinks an example is from Class A (as measured by the scalar probability values of the output nodes), the more likely it is to be from Class B.

When this happens, it happens regardless of the classification algorithm I use (regularized logistic regression or support vector machines), and whether I use aggressive feature selection (1000 voxels) or no feature selection at all (23,000 voxels).  The number of Class A and Class B examples in my training set is always balanced, as are those in my testing set, and the classifier does not develop any bias to guess one class more than the other (i.e., the classifier guesses Class A and Class B roughly an equal amount; the guesses just tend to be wrong more often than not).  I use a leave-one-run-out cross-validation approach with 10 runs, and performance is below chance on most of the test iterations (and always below chance overall), so this doesn't seem to be a fluke of scanner drift, cumulative subject motion, etc.  I see below-chance performance most commonly when there are a small number of training examples of each class (e.g., 20), but I've also observed this when I have 70 examples of each class.

Importantly, when I scramble the Class A and Class B labels prior to running the classifier, performance settles at chance levels -- again indicating that this isn't something wrong with my data or my analysis code.  From informal conversations with MVPA users from several labs, I know that others have also encountered below-chance classification performance in their data.  Below-chance performance is so frustrating because it means that the classifier is actually able to extract meaningful information about the neural signatures of the two classes -- it just somehow learns precisely the opposite labeling from the one it should.  Very puzzling...
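For anyone who wants to run the same label-scrambling control, here is a minimal sketch in MATLAB.  It assumes the data sit in a [trials x voxels] matrix with numeric class labels and run indices, and it uses a hypothetical train_and_test() wrapper in place of whatever MVPA Toolbox / classifier calls you actually use -- a sketch of the idea, not the toolbox's own routine:

% Label-scrambling sanity check: with shuffled labels, cross-validated
% accuracy should settle near 0.5.
% Assumes: data   [nTrials x nVoxels]
%          labels [nTrials x 1] with values 1/2
%          runs   [nTrials x 1] with run indices 1..nRuns
%          train_and_test(trainX, trainY, testX) -- hypothetical classifier wrapper
nPerms  = 100;
permAcc = zeros(nPerms, 1);
for p = 1:nPerms
    shuffled = labels(randperm(numel(labels)));     % scramble the class labels
    acc = zeros(max(runs), 1);
    for r = 1:max(runs)                             % leave-one-run-out CV
        testIdx  = (runs == r);
        trainIdx = ~testIdx;
        guesses  = train_and_test(data(trainIdx,:), shuffled(trainIdx), data(testIdx,:));
        acc(r)   = mean(guesses == shuffled(testIdx));
    end
    permAcc(p) = mean(acc);
end
% The accuracy obtained with the true labels can then be compared against
% this empirical chance distribution.
fprintf('scrambled-label accuracy: mean %.3f, range [%.3f %.3f]\n', ...
        mean(permAcc), min(permAcc), max(permAcc));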

Last time I brought this up on the mvpa list, Yaroslav Halchenko sent me a link to an interesting article by Adam Kowalczyk that discusses the "anti-learning" phenomenon in supervised machine learning:

Here's a link to the paper, for those who are interested:

A. Kowalczyk and O. Chapelle. An Analysis of the Anti-Learning Phenomenon for the Class Symmetric Polyhedron. In S. Jain, H. U. Simon, and E. Tomita (eds.), Algorithmic Learning Theory: 16th International Conference, ALT 2005. Springer, 2005.

http://kyb.mpg.de/publications/attachments/alt_05_%5B0%5D.pdf

And a link to a video of a lecture by Dr. Kowalczyk about this issue:

"In the talk we shall analyze and theoretically explain some counter-intuitive experimental and theoretical findings that systematic reversal of classifier decisions can occur when switching from training to independent test data (the phenomenon of anti-learning). We demonstrate this on both natural and synthetic data and show that it is distinct from overfitting."

http://videolectures.net/mlss06au_kowalczyk_al/


While Dr. Kowalczyk's work is intriguing, indicating that below-chance classification performance is a real phenomenon in need of further study, I'm throwing this issue out to the list again because I still can't conceptually wrap my mind around what below-chance classification means, why it happens, and what one might do about it.  Any ideas or anecdotes from your own analysis experiences would be greatly welcome.

Thanks,
Jesse

-------------------------------------------------
Jesse Rissman, Ph.D.
Dept. of Psychology
Stanford University
Jordan Hall, Bldg 420
Stanford, CA 94305-2130

MS Al-Rawi

Sep 25, 2009, 10:41:18 AM
to Princeton MVPA Toolbox for Matlab
Re

> Over the past two years I've literally runs thousands of different
> classification analyses on the fMRI data from dozens of subjects

May I ask how many exemplars (or examples, as you like to call them) were in
the training set, and how many were in the testing set (roughly speaking, for
the below-chance experiment)?

Regards

Al-Rawi
University of Aveiro
Portugal

Francisco Pereira

Sep 25, 2009, 11:21:57 AM
to mvpa-t...@googlegroups.com
The other thing that would be useful to know is the accuracy you get
in each cross-validation fold,
i.e. is it always below chance or do you have some at or above and
others far below?

Francisco

Jesse Rissman

Sep 25, 2009, 1:02:12 PM
to mvpa-t...@googlegroups.com
Hi Al-Rawi & Francisco,

Thanks for your interest in helping me out with this quandary.  Here's one example of a subject where classification performance is below chance.  Across 10 cross-validation folds (which correspond to the 10 scanning runs), the classifier makes a total of 114 guesses (57 exemplars from each class).  Each exemplar constitutes the data from one trial (averaged across 2 TRs, 4-8 sec post-stimulus).  Within each fold, the number of exemplars from Class A and Class B is always artificially balanced prior to training/testing.  No data-based feature selection is used here -- the L2-regularized logistic regression operates on 23,000 voxels.

In this example, the classifier yields an overall accuracy level of 36% across the 114 trials.  If I only look at the 25% of trials for which the classifier has the strongest 'confidence' in its assessment (i.e., the strongest probability estimates of an item being in one class or the other), classification accuracy drops even further to 27.5%.  When performance is indexed by measuring the area under the ROC curve (my preferred way to assess classification performance), I get an AUC of 0.37.  The mean AUC for this particular classification scheme across 16 subjects is 0.60.  And I should note that I can classify other types of trials in this particular subject quite well, so the issue is unlikely to be related to major artifacts in his fMRI data (not that those should yield below-chance performance anyway).
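For reference, the two summary measures mentioned above (accuracy on the most 'confident' quarter of trials, and the area under the ROC curve) can be computed from the classifier's scalar outputs roughly as follows.  This is only a sketch: the variable names are illustrative rather than the toolbox's own, and tiedrank comes from the Statistics Toolbox (any ranking routine would do):

% Assumes: probA    [nTrials x 1] classifier probability that each test trial is Class A
%          guesses  [nTrials x 1] predicted labels (1/2)
%          desireds [nTrials x 1] true labels (1 = Class A, 2 = Class B)

% Accuracy on the 25% of trials with the most extreme probabilities:
confidence = abs(probA - 0.5);
[~, order] = sort(confidence, 'descend');
topQ       = order(1:round(0.25 * numel(order)));
accTopQ    = mean(guesses(topQ) == desireds(topQ));

% Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
isA   = (desireds == 1);
ranks = tiedrank(probA);
auc   = (sum(ranks(isA)) - sum(isA) * (sum(isA) + 1) / 2) / (sum(isA) * sum(~isA));
fprintf('top-quartile accuracy: %.3f, AUC: %.3f\n', accTopQ, auc);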

The guesses, desireds, and corrects for each fold are pasted below.  As you can see, 9 of the 10 testing folds have accuracies below 50% (the remaining fold sits exactly at 50%).

-- Jesse


K>> results.iterations.perfmet

ans = 

          guesses: [2 1 1 1 2 1 2 1]
         desireds: [2 1 1 2 1 2 1 2]
         corrects: [1 1 1 0 0 0 0 0]
             perf: 0.3750
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 1 1 2 1 1 1 1 1]
         desireds: [1 1 1 2 1 2 2 2 1 2]
         corrects: [1 1 1 0 0 0 0 0 1 0]
             perf: 0.4000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [2 2 2 2 2 1 2 2 1 1]
         desireds: [1 1 2 2 1 2 1 2 1 2]
         corrects: [0 0 1 1 0 0 0 1 1 0]
             perf: 0.4000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 1 2 1 1 2 2 1 1 2 1 1 1 2 2 2 2 2 2]
         desireds: [1 1 2 1 2 2 1 1 2 1 1 1 2 2 2 2 1 1 2 2]
         corrects: [1 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 1 1]
             perf: 0.4000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 2 1 1 2 2 1 1 2 2 2]
         desireds: [1 2 1 1 2 1 2 2 2 2 1 1]
         corrects: [1 0 0 1 0 0 1 0 0 1 0 0]
             perf: 0.3333
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 2 1 1 2 2 2 2 1 1 1 1 2 1]
         desireds: [2 1 1 2 2 2 1 2 2 2 1 1 1 1]
         corrects: [0 0 1 0 1 1 0 1 0 0 1 1 0 1]
             perf: 0.5000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [2 2 2 1 2 2 2 2 2 1 1 1]
         desireds: [2 2 1 2 2 1 1 1 1 2 1 2]
         corrects: [1 1 0 0 1 0 0 0 0 0 1 0]
             perf: 0.3333
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 2 2 1 1 1 1 2 1]
         desireds: [2 2 1 2 2 1 1 1 1 2]
         corrects: [0 0 0 1 0 1 1 1 0 0]
             perf: 0.4000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [2 1 2 1]
         desireds: [1 2 1 2]
         corrects: [0 0 0 0]
             perf: 0
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [2 2 2 2 2 2 1 1 2 2 1 2 2 1]
         desireds: [1 2 1 1 2 1 2 2 1 2 2 1 1 2]
         corrects: [0 1 0 0 1 0 0 0 0 1 0 0 0 0]
             perf: 0.2143
       scratchpad: []
    function_name: 'perfmet_maxclass'

Greg Detre

Sep 25, 2009, 1:06:57 PM
to mvpa-t...@googlegroups.com
What happens if you switch classifier?

g
--


---
Greg Detre
cell: 617 642 3902
email: gr...@gregdetre.co.uk
web: http://www.gregdetre.co.uk

Jesse Rissman

Sep 25, 2009, 1:49:39 PM
to mvpa-t...@googlegroups.com
Hi Greg,

To address your question, I re-ran this classification analysis (on the same sample of 114 trials) using a linear SVM and got almost exactly the same outcome.  If you compare the results pasted below with those from my previous email, you'll see that the classifier made 3 additional correct guesses (yielding an overall accuracy of 38.6%; AUC = 0.37), with all of its other guesses being the same as those obtained with the logistic regression classifier.

-- Jesse




results.iterations.perfmet

ans = 

          guesses: [2 1 1 1 2 1 2 1]
         desireds: [2 1 1 2 1 2 1 2]
         corrects: [1 1 1 0 0 0 0 0]
             perf: 0.3750
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 1 1 2 1 1 1 1 1]
         desireds: [1 1 1 2 1 2 2 2 1 2]
         corrects: [1 1 1 0 0 0 0 0 1 0]
             perf: 0.4000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [2 2 2 2 2 2 2 2 1 1]
         desireds: [1 1 2 2 1 2 1 2 1 2]
         corrects: [0 0 1 1 0 1 0 1 1 0]
             perf: 0.5000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 1 2 1 1 2 2 1 1 2 1 1 1 2 2 2 2 2 2]
         desireds: [1 1 2 1 2 2 1 1 2 1 1 1 2 2 2 2 1 1 2 2]
         corrects: [1 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 1 1]
             perf: 0.4000
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [1 1 2 1 2 2 2 1 1 2 2 2]
         desireds: [1 2 1 1 2 1 2 2 2 2 1 1]
         corrects: [1 0 0 1 1 0 1 0 0 1 0 0]
             perf: 0.4167
       scratchpad: []
    function_name: 'perfmet_maxclass'


ans = 

          guesses: [2 2 2 2 2 2 1 1 2 2 1 2 2 2]
         desireds: [1 2 1 1 2 1 2 2 1 2 2 1 1 2]
         corrects: [0 1 0 0 1 0 0 0 0 1 0 0 0 1]
             perf: 0.2857
       scratchpad: []
    function_name: 'perfmet_maxclass'

Yaroslav Halchenko

Sep 25, 2009, 2:33:57 PM
to mvpa-t...@googlegroups.com
I wonder if it has anything to do (once again) with temporal dependence
among the trials:

doing something stupid now and taking your email and ...

> grep '^\s*desireds:.*\]$' /tmp/1.txt | sed -e 's/.*\[\(.*\)\].*/\1/g' | tr '\n' ' ' | PYTHONPATH=$PWD python -c "import sys; from mvpa.datasets.miscfx import *; SequenceStats(sys.stdin.readlines()[0].strip().split(' '));"
Original sequence had 114 entries from set ['1', '2']
Counter-balance table for orders up to 2:
Labels/Order O1 | O2 |
1: 23 34 | 27 29 |
2: 34 22 | 29 27 |
Correlations: min=-0.19 max=0.23 mean=-0.0088 sum(abs)=8.8

and something which might be related:

In [2]: (23.0/34)/2
Out[2]: 0.33823529411764708

so, maybe, if a classifier for some reason manages to learn something
related to the precedence/order effect of the conditions -- you would obtain
your anti-learning instead of a pure chance distribution.

I wonder why you have such a disproportion of trials across the splits -- are
those trials selected to match some behavioral criterion (e.g., correct answers)?
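For readers without PyMVPA, a rough base-MATLAB equivalent of this order check (transition counts plus the lag-1 autocorrelation of the label sequence) might look like the following; desireds is assumed to be the vector of 1/2 labels in presentation order:

% Count first-order transitions in the label sequence:
n11 = sum(desireds(1:end-1) == 1 & desireds(2:end) == 1);
n12 = sum(desireds(1:end-1) == 1 & desireds(2:end) == 2);
n21 = sum(desireds(1:end-1) == 2 & desireds(2:end) == 1);
n22 = sum(desireds(1:end-1) == 2 & desireds(2:end) == 2);
fprintf('transitions  1->1:%d  1->2:%d  2->1:%d  2->2:%d\n', n11, n12, n21, n22);

% Lag-1 autocorrelation of the labels; a value far from zero means the label
% sequence itself carries temporal structure a classifier could latch onto.
x  = desireds - mean(desireds);
r1 = sum(x(1:end-1) .* x(2:end)) / sum(x .^ 2);
fprintf('lag-1 label autocorrelation: %.3f\n', r1);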

--
.-.
=------------------------------ /v\ ----------------------------=
Keep in touch // \\ (yoh@|www.)onerussian.com
Yaroslav Halchenko /( )\ ICQ#: 60653192
Linux User ^^-^^ [175555]


Jesse Rissman

Sep 25, 2009, 2:58:43 PM
to mvpa-t...@googlegroups.com
Hi Yaroslav,

I think you're probably on to something with the temporal dependence issue, but I'm not quite sure I understand why this influences what the classifier learns during training and how it ultimately results in anti-learning.  I could write some code to artificially re-arrange my training sets such that the order of the trials is Class A, Class B, Class A, Class B, etc..., but I'm not sure if this would solve the problem or make things worse.

To answer your question, my cross-validation folds have unequal numbers of trials because I'm only selecting trials from two particular conditions (that do depend on behavior) and some runs have fewer of these trial types.  My requirement that each run has an equal number of Class A and Class B trials results in a further loss of data.  In this case, I'm left with 114 trials out of 400 total.

-- Jesse  

Yaroslav Halchenko

Sep 25, 2009, 3:18:49 PM
to mvpa-t...@googlegroups.com

> I think you're probably on to something with the temporal
> dependence issue, but I'm not quite sure I understand why this
> influences what the classifier learns during training and how it
> ultimately results in anti-learning. I could write some code to
> artificially re-arrange my training sets such that the order of the
> trials is Class A, Class B, Class A, Class B, etc..., but I'm not
> sure if this would solve the problem or make things worse.
My wild guessing/blurbing: imagine that there is some physiological or
instrumental signal in the data which has a harmonic at a frequency close to
the autocorrelation present within the labeling of the trials... Then a
classifier might simply learn that 'frequency'.  I am not yet sure how to
explain actual anti-learning (besides that, due to non-100% coherence between
the autocorrelation of the labels and that rhythm in the data, the hold-out
set might always be in the 'anti-phase'), but clearly it should widen the
"null distribution".  And you can see now why permutation testing would not
provide a reliable assessment -- since it obliterates the autocovariance
within the labeling.

If you have a marginally sparse design, I guess you could maybe get away from
anti-learning if you do just N-2 cross-validation -- i.e., holding out not
full runs/blocks but just two samples (1 from each category), taken out a
sufficiently large number of times ;) (since now you would need to arbitrarily
select 1 from class A and another one from class B).

sorry for too much of speculation
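A minimal sketch of the leave-two-out ('N-2') scheme suggested above, using the same hypothetical train_and_test() wrapper as earlier: on each iteration one randomly chosen Class A trial and one Class B trial are held out, and the rest are used for training.

% Assumes: data [nTrials x nVoxels], labels [nTrials x 1] with values 1/2.
idxA  = find(labels == 1);
idxB  = find(labels == 2);
nIter = 500;                                   % number of random hold-out pairs
correct = zeros(nIter, 2);
for it = 1:nIter
    holdOut  = [idxA(randi(numel(idxA))); idxB(randi(numel(idxB)))];
    trainIdx = setdiff((1:numel(labels))', holdOut);
    guesses  = train_and_test(data(trainIdx,:), labels(trainIdx), data(holdOut,:));
    correct(it,:) = (guesses(:)' == labels(holdOut)');
end
fprintf('leave-one-pair-out accuracy: %.3f\n', mean(correct(:)));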

> To answer your question, my cross-validation folds have unequal numbers
> of trials because I'm only selecting trials from two particular
> conditions (that do depend on behavior) and some runs have fewer of
> these trial types. My requirement that each run has an equal number of
> Class A and Class B trials results in a further loss of data. In this
> case, I'm left with 114 trials out of 400 total.
Well, I can only say that behavior might matter here ;) -- without specifics
of the experiment/design/etc. it is hard to say anything specific.

> -- Jesse

MS Al-Rawi

Sep 28, 2009, 6:14:23 AM
to Princeton MVPA Toolbox for Matlab
My humble opinion is that you have too few exemplars, so the reliability of
the testing is hard to establish, I guess!  Thus, one way to go is to also try
2-fold, 3-fold, 4-fold, ..., 9-fold cross-validation.
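A sketch of that suggestion, assuming cvpartition from the Statistics Toolbox (any stratified fold assignment would do) and, again, the hypothetical train_and_test() wrapper from earlier.  Note that trial-wise folds ignore the run structure, so within-run temporal dependence can bias the estimates.

% Assumes: data [nTrials x nVoxels], labels [nTrials x 1] with values 1/2.
for k = 2:9
    cvp = cvpartition(labels, 'KFold', k);     % stratified k-fold partition
    acc = zeros(k, 1);
    for f = 1:k
        tr = training(cvp, f);
        te = test(cvp, f);
        guesses = train_and_test(data(tr,:), labels(tr), data(te,:));
        acc(f)  = mean(guesses == labels(te));
    end
    fprintf('%d-fold accuracy: %.3f\n', k, mean(acc));
end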

Regards

Al-Rawi

MS Al-Rawi

Apr 18, 2013, 5:51:15 AM
to mvpa-t...@googlegroups.com
Since below-chance accuracy occurs in permutation-testing experiments with
nearly the same probability as above-chance accuracy, this implies that
below-chance and above-chance results could both be stochastic.  I would say
(and there is a high possibility that I could be wrong) that if only one
subject out of 10 is giving below-chance accuracy, then this could be due to
different sources of noise/degradation (e.g., scanner noise, head motion
during acquisition, the subject's attention, etc.).  In this case, maybe you
can overlook that subject and talk about the effect you are targeting in your
study.  If, on the other hand, the number of below-chance subjects is close to
the number of above-chance subjects, then there is no correlation between the
fMRI data and the stimuli.
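As a rough calibration of how far 'pure chance' can wander with a test set of this size, one can simulate a coin-flip classifier; the trial count below is taken from Jesse's example and is otherwise arbitrary.

% Distribution of accuracies for a classifier that guesses at random:
n    = 114;                                    % number of test trials
nSim = 10000;
acc  = sum(rand(nSim, n) < 0.5, 2) / n;        % accuracy of each simulated coin-flip run
sacc = sort(acc);
fprintf('2.5th / 97.5th percentile of chance accuracy: %.3f / %.3f\n', ...
        sacc(ceil(0.025 * nSim)), sacc(floor(0.975 * nSim)));
% For n = 114 this interval is roughly 0.41 to 0.59, so accuracies of 36-39%
% fall outside it for a single test -- though across many subjects an
% occasional outlier is still expected.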

Regards,
-Rawi

One anti-learning nonbeliever

>________________________________
> From: Hunar Ahmad <huna...@gmail.com>
>To: mvpa-t...@googlegroups.com
>Sent: Wednesday, April 17, 2013 7:40 PM
>Subject: [mvpa-toolbox] Re: revisiting the issue of below-chance classification performance
>
>
>
>Hi Jesse,
>
>Thanks for this post, even though it was a long time ago! But now I'm running into the exact problem you described!
>I'm wondering if you or anybody from this group has found a solution or a theoretical interpretation of why this is happening. I really
>don't know what to do with the subjects that have clearly below-chance classification performance (30-40%, where chance level is 50%), while most of the
>other normal subjects had classification performance of >60%! Should I discard the below-chance subjects from the analysis,
>or consider them as low-performing subjects? Since I'm comparing two groups, older and younger adults, being unbiased is really important for the study!!
>Any help or suggestion is really appreciated...
>Thanks a lot in advance
>
>Hunar

Hunar Ahmad

Apr 18, 2013, 6:40:23 AM
to mvpa-t...@googlegroups.com
Thanks a lot for the reply, Rawi.

But anti-learning is a real phenomenon, and there are many papers discussing it as a problem distinct from over-fitting!
If I flip the labels, the below-chance performance becomes above-chance, and if I add noise to the data, the below-chance performance attenuates to chance level. Unfortunately, not many people encounter this problem, as it is very unusual and mostly occurs in data with high dimensionality and low sample numbers!

Regards
Hunar



Yaroslav Halchenko

Apr 18, 2013, 10:26:48 AM
to mvpa-t...@googlegroups.com, pkg-expp...@lists.alioth.debian.org
Hi Hunar,

NB cross-posting to the PyMVPA mailing list as well, since it would be of
interest there too, and I doubt we have a complete overlap between audiences.

Thanks a bunch for attacking this issue and bringing the discussion back.
Unfortunately I have no conclusive answer myself to this phenomenon in the
analysis of fMRI -- I have tried a few ideas, but it seems they were not
representative, at least on simulated data...  Since you have already explored
so much -- would you mind trying more? ;-)

E.g., for the dataset(s) where you have already observed consistent
anti-learning -- you have mentioned that you have tried different classifiers,
but have you tried principally different classification schemes (e.g.,
non-linear classifiers allow learning a combinatorial coding instead of the
simple linear separation you have explored so far)?  The most obvious one that
comes to mind would be an SVM with an RBF kernel.  You have mentioned a study
with 70 samples per class; since training an RBF SVM requires tuning
hyper-parameters, it is better to have a "larger" training set for the nested
cross-validation needed for that purpose.

If you provide me with your dataset (e.g., in a .mat file with a brief README
on what is what there -- data, labels, run labels), I could try re-analyzing
it with PyMVPA and different classification schemes to see how the results
differ from anti-learning.
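A sketch of the nested cross-validation described above for an RBF-kernel SVM.  The use of fitcsvm/kfoldLoss/predict from MATLAB's Statistics and Machine Learning Toolbox is an assumption (LIBSVM's svmtrain/svmpredict could be substituted), the hyper-parameter grids are purely illustrative, and data, labels and runs are as in the earlier sketches.

Cs     = 10 .^ (-2:2);                         % candidate BoxConstraint values
sigmas = 10 .^ (0:3);                          % candidate KernelScale values
outerAcc = zeros(max(runs), 1);
for r = 1:max(runs)                            % outer loop: leave-one-run-out
    te = (runs == r);
    tr = ~te;
    best = struct('acc', -inf, 'C', NaN, 's', NaN);
    for C = Cs                                 % inner loop: tune C and the kernel width
        for s = sigmas
            mdl = fitcsvm(data(tr,:), labels(tr), 'KernelFunction', 'rbf', ...
                          'BoxConstraint', C, 'KernelScale', s, 'KFold', 5);
            a = 1 - kfoldLoss(mdl);            % inner 5-fold accuracy
            if a > best.acc
                best = struct('acc', a, 'C', C, 's', s);
            end
        end
    end
    final = fitcsvm(data(tr,:), labels(tr), 'KernelFunction', 'rbf', ...
                    'BoxConstraint', best.C, 'KernelScale', best.s);
    outerAcc(r) = mean(predict(final, data(te,:)) == labels(te));
end
fprintf('nested-CV RBF-SVM accuracy: %.3f\n', mean(outerAcc));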

--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik

J.A. Etzel

Apr 18, 2013, 12:31:01 PM
to mvpa-t...@googlegroups.com
I collected some thoughts in longer form here:
http://mvpa.blogspot.com/2013/04/below-chance-classification-accuracy.html

In short: I would really, really look at stability and check for dataset
errors before going further, and would not consider that it might be a true
signal (anti-learning, etc.) except as a last resort and with direct evidence.

Good Luck!
Jo


--
Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
http://mvpa.blogspot.com/

Jesse Rissman

Apr 18, 2013, 12:54:13 PM
to mvpa-t...@googlegroups.com
Thanks everyone for sharing your thoughts on this perplexing topic.  I think that Jo's blog post nicely highlights some of the things to examine when your classifier is churning out below-chance accuracy values.  That said, I still haven't been able to make sense of the rare situations where I observe below-chance classification performance in my own data. But to answer your last question, Hunar, I don't think it'd be appropriate to discard your below-chance subjects from your analysis, since it is very important for readers to know how reliably your decoding effect can be observed in individual subjects.

-- Jesse





Hunar Ahmad

Apr 20, 2013, 11:42:17 AM
to mvpa-t...@googlegroups.com
Thank you all for your thoughtful replies and precious advice.

Yaroslav Halchenko, thanks for your interest in testing the data with your classifiers. I will be glad to share the data (which I think shows anti-learning) with you, and I will appreciate your help a lot. I have attached the training and the testing data with their labels in .mat format to this message; the data is a matrix of unprocessed BOLD signal values from the left inferior frontal gyrus, about 1543 voxels in total.

Unfortunately, the data was primarily designed for univariate analysis, with a block design of 2 distinct classes, and the number of samples is quite low (which is one of the reasons for anti-learning according to the anti-learning literature). I have reused the data as a project for my Master's degree, and my objective was to train a classifier on the encoding data and then test it on the retrieval data. I have 29 subjects, and the classification accuracy was around 60-80% (chance 50%); however, 3 subjects were consistently performing below chance (30-40%), even though I used exactly the same settings for all subjects. The only way to make those subjects perform well is to reverse the labels of the testing set!


sub6.zip

Hunar Ahmad

May 28, 2013, 2:30:18 PM
to mvpa-t...@googlegroups.com
Hi Wang,
Thanks a lot for trying my data. The idea of randomly separating the training and testing labels sounds cool. However, I noticed that you have mixed the testing set (retrieval data) with the training set. Unfortunately, my study requires training the classifier on the encoding data and testing it on the retrieval data, not the other way around. I have tried your method while doing random separations of the labels within the training set (encoding data) alone, but that didn't help! Do you have an explanation of what the cause could be?

The thing that makes me suspect it is anti-learning is that if I do n-fold cross-validation within the encoding data alone, or within the retrieval data alone, both give good results (>> chance level), indicating that the classifier is able to learn from the data within each session separately (encoding and retrieval). However, when I train on one session alone and test on the other session, it gives me below-chance performance -- and this happens only in a few subjects. Any more ideas will be greatly appreciated. Thanks a lot in advance.
Regards
Hunar


On Sat, May 25, 2013 at 1:33 PM, Wang Jing <yuzh...@gmail.com> wrote:
Hi, Hunar,

I tested your data with LIBSVM (linear kernel, default parameters). The script is attached. If you don't use LIBSVM in MATLAB, I could change to another classifier, e.g., a simple PCA+NN classifier. The results are:

Training and testing on the original sample separation.
Accuracy = 28.083% (230/819) (classification)

Training and testing by 2-fold cross validation, samples are randomly separated.
Accuracy = 76.2755% (598/784) (classification)
Accuracy = 80.4847% (631/784) (classification)

The mean classification accuracy of 2-fold cross validation is 0.7838

So the below-chance accuracy (anti-learning) results from the strange data distribution -- in other words, from the way the samples are separated into training and testing samples. This is not strange.

For example, if you have a data set (x_train, x_test, label_train, label_test) with accuracy 0.8+, then another dataset (x_train, x_test, label_train, 1-label_test) (a binary case here; it's possible in reality) would get an accuracy of less than 0.2. It's not a problem with the algorithms; it's just that the data distribution is strange and the data happen to be separated into two parts in a special manner.
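Jing Wang's label-flip point in code form (a trivial sketch; guesses and label_test are assumed to be 1/2-coded vectors): for a two-class problem, flipping the test labels maps any accuracy a onto 1 - a, so 'anti-learning' at 20% carries exactly as much information about the labeling as learning at 80% does.

acc         = mean(guesses == label_test);     % original accuracy
flipped     = 3 - label_test;                  % swap classes coded as 1 and 2
acc_flipped = mean(guesses == flipped);        % equals 1 - acc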

Best regards,
Jing Wang.