below-chance classification performance


Jesse Rissman

Sep 15, 2008, 7:02:01 PM
to mvpa-t...@googlegroups.com
I have recently encountered a few situations in which a 2-layer backpropagation neural net classifier consistently yields below-chance levels of performance.  The classification is binary, and the number of training and testing trials for each class is balanced, so chance should theoretically be 50%.  I am relatively confident that a coding error is not driving the effect.  Rather, it seems that the classifier is actually learning the reverse category labels and applying these 'flipped' labels to the test data with a fair degree of accuracy.  In fact, on the trials where the classifier is most 'confident' of its classification (i.e., where the activation levels of the two output units differ the most), it is least accurate.  And the classifier is not biased to guess one class over the other -- it simply guesses wrongly more often than it guesses correctly.

When this phenomenon occurs in my data, the below-chance classification performance is typically 43-45% correct, and these numbers are consistent when the classifier is re-run many times, each time with a randomly initialized weight matrix and/or a different subset of trials selected for balancing purposes.  I also recently observed this phenomenon in a colleague's data, where accuracy was ~25% (with chance also being 50%).  In his case, it occurred when he used a feature selection technique that intentionally chose the least diagnostic voxels (those with the lowest absolute t-values in a t-test of condition A vs. B) to feed into the classifier.  In my case, it simply occurred during a particularly difficult classification problem (one where I can get 60% accuracy for the best subjects, but only 53-55% for many others).

So the problem of below-chance performance only seems to arise when the classifier has very little good information to go on, either because it is given the worst voxels to work with, or because it is faced with discriminating two classes of brain activity that may not be all that discriminable.  However, I still find it puzzling that the classifier could "learn the opposite" of what we're teaching it during training.  Has anyone else encountered below-chance generalization performance in a properly balanced binary classification situation?  Any thoughts or insights on this matter would be much appreciated.
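For concreteness, the kind of sanity check I have been doing looks roughly like this (a minimal Python sketch with numpy/scikit-learn rather than my actual code; the data here are random stand-ins):

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))   # trials x voxels, random stand-in
y = np.repeat([0, 1], 50)             # balanced binary labels

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)
proba = cross_val_predict(clf, X, y, cv=10, method='predict_proba')
pred = proba.argmax(axis=1)

print('accuracy:        ', (pred == y).mean())
print('flipped accuracy:', ((1 - pred) == y).mean())  # ~ 1 - accuracy for binary

# 'confidence' = gap between the two output activations; if the
# most-confident quartile is the *least* accurate, the labels are being
# learned in reverse rather than merely guessed at random
conf = np.abs(proba[:, 1] - proba[:, 0])
top = np.argsort(conf)[-len(y) // 4:]
print('accuracy in most-confident quartile:', (pred[top] == y[top]).mean())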

Thanks,
Jesse

Francisco Pereira

Sep 15, 2008, 7:07:35 PM
to mvpa-t...@googlegroups.com
On Mon, Sep 15, 2008 at 7:02 PM, Jesse Rissman <ris...@gmail.com> wrote:
> Has anyone else encountered below-chance generalization performance in a properly balanced binary classification situation?  Any thoughts or insights on this matter would be much appreciated.

I have, in situations where there is cross-validation and the examples in some folds are markedly different from the examples in other folds.  Whenever such a fold (or folds) appears as the test set, the classifier tends to get the vast majority of the examples wrong.  The cases where I saw this were ones where a fold corresponded to a scanner run, but I wonder if this would happen in other circumstances...
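A quick way to see whether particular folds are the culprit is to look at the per-fold accuracies rather than just the average.  A rough sketch of what I mean (Python with numpy/scikit-learn, random stand-in data, and a linear SVM standing in for whatever classifier you actually use):

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 500))   # trials x voxels, random stand-in
y = np.tile([0, 1], 60)               # balanced binary labels
runs = np.repeat(np.arange(6), 20)    # scanner-run index for each trial

# one accuracy per held-out run; a single very low score flags a run
# whose examples differ markedly from the rest of the data
scores = cross_val_score(LinearSVC(), X, y, groups=runs,
                         cv=LeaveOneGroupOut())
for run, score in zip(np.unique(runs), scores):
    print('run %d: accuracy %.2f' % (run, score))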

cheers,
Francisco


Yaroslav Halchenko

Sep 15, 2008, 7:11:53 PM
to mvpa-t...@googlegroups.com
I saw something similar whenever the testing samples (with different labels) were from opposite temporal ends of the experiment (i.e., label-0 obtained at the beginning of the scan, and label-1 closer to the end...).  I didn't trace down what caused it, but I guess it could be many things, starting with motion, trends, signal saturation, etc.
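e.g., if slow trends are the culprit, detrending each voxel within each run before classification might already change the picture.  A rough sketch (Python with numpy + scipy; X and runs are hypothetical names for your trial matrix and run indices):

import numpy as np
from scipy.signal import detrend

def detrend_within_runs(X, runs):
    # X: trials x voxels in acquisition order; runs: run index per trial.
    # Removes a per-run linear trend from every voxel, so that slow drift
    # (label-0 early, label-1 late) cannot masquerade as signal.
    Xd = X.astype(float).copy()
    for r in np.unique(runs):
        idx = runs == r
        Xd[idx] = detrend(Xd[idx], axis=0)
    return Xd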

On Mon, 15 Sep 2008, Francisco Pereira wrote:

> On Mon, Sep 15, 2008 at 7:02 PM, Jesse Rissman <ris...@gmail.com>
> wrote:

--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik

Yaroslav Halchenko

Sep 15, 2008, 7:09:23 PM
to mvpa-t...@googlegroups.com
I observed similar behavior whenever the testing set wasn't well balanced, but you stated that yours is, so I guess that is not the case here.

IMHO, a good strategy in such cases is to check the actual empirical chance performance on the given dataset (data + labels) ;-) Permute the labels randomly and run exactly the same learning/feature_selection/testing pipeline... do that quite a few times... and then see how well training on randomized labels does.
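Something along these lines (a rough Python sketch with numpy/scikit-learn; random stand-in data, and a linear SVM standing in for the full feature_selection + net pipeline):

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))   # trials x voxels, random stand-in
y = np.repeat([0, 1], 50)             # balanced binary labels

def pipeline_accuracy(X, labels):
    # stand-in for the full learning/feature_selection/testing pipeline;
    # any feature selection must be re-run inside each permutation too
    return cross_val_score(LinearSVC(), X, labels, cv=10).mean()

observed = pipeline_accuracy(X, y)
null = np.array([pipeline_accuracy(X, rng.permutation(y))
                 for _ in range(100)])

print('observed accuracy:', observed)
print('empirical chance: ', null.mean())
# fraction of permutations scoring at or below the observed accuracy --
# if observed sits in the far left tail, below-chance is itself 'significant'
print('left-tail p:      ', (null <= observed).mean())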

On Mon, 15 Sep 2008, Jesse Rissman wrote:

> I have recently encountered a few situations in which a 2-layer
> backpropagation neural net classifier consistently yields below-chance
> levels of performance. The classification is binary, and the number of
