GNB: Use a Gaussian Naive Bayes classifier to learn regressors.

raw...@yahoo.com

May 27, 2009, 9:48:12 AM
to Princeton MVPA Toolbox for Matlab

Hi

I was trying the SPM tutorial. Before going into the GNB issue, I would
like to make a few SPM-related comments. I am using SPM8. Since SPM
contains several compiled routines that operate mainly on the subject
files, Matlab's current directory should be changed to the one
containing the subject data in order for those compiled routines to
work correctly (or the path handled some other way). There was a stray
underscore in ['haxby8_r' index '_.img']; it should be changed to
['haxby8_r' index '.img'];.

As for the GNB, I was astonished that it gave 0.59 performance,
compared to 0.48 when using a multi-layer neural network trained with
back-propagation of error with several hidden units. I changed the
uniform prior to a non-uniform one and got the same 0.59. Then, I
embedded a random prior (fake, just to judge the result). I kept
defaults.uniform_prior = true; and changed line 80 in train_gnb.m
to

scratch.prior = (rand(nConds,1) / nConds); % so it's not uniform any
more, and does not sum to 1

The result is the same, 0.59, with slight differences on each
performance run. So, I thought that the likelihood is not normalized.
Tracing the problem, it seems that
acts = acts ./ repmat(sum(acts,1), nConds, 1); in test_gnb.m gave lots
of NaN's in acts due to zero division.
This produced incorrect results when calculating the performance using
[yg guesses] = max(acts); [yd desireds] = max(targs); corrects =
guesses == desireds; in perfmet_maxclass.m.
If my runs are correct, this problem should be resolved.

M.S. Al-Rawi


Greg Detre

May 27, 2009, 1:09:02 PM
to mvpa-t...@googlegroups.com
hello,

good to hear from you.

> I was trying the SPM tutorial. Before going into the GNB issue, I
> would like to make a few SPM-related comments. I am using SPM8.
> Since SPM contains several compiled routines that operate mainly on
> the subject files, Matlab's current directory should be changed to
> the one containing the subject data in order for those compiled
> routines to work correctly (or the path handled some other way).

can you explain this issue again, perhaps with an example?

g

Greg Detre

May 27, 2009, 1:11:39 PM
to mvpa-t...@googlegroups.com
p.s. have you read the section in the 'Setup' documentation about
setting up your matlab paths?

https://compmem.princeton.edu/mvpa_docs/Setup#head-01613791df977b9b831a41c1a02a9fe10f618f74

g
--
Greg Detre
cell: 617 642 3902
email: gr...@gregdetre.co.uk
web: http://www.princeton.edu/~gdetre/

Greg Detre

May 27, 2009, 1:14:30 PM
to mvpa-t...@googlegroups.com
> As for the GNB, I was astonished that it gave 0.59 performance,
> compared to 0.48 when using a multi-layer neural network trained with
> back-propagation of error with several hidden units. I changed the
> uniform prior to a non-uniform one and got the same 0.59. Then, I
> embedded a random prior (fake, just to judge the result). I kept
> defaults.uniform_prior = true; and changed line 80 in train_gnb.m
> to
>
> scratch.prior = (rand(nConds,1) / nConds); % so it's not uniform any
> more, and does not sum to 1
>
> The result is the same, 0.59, with slight differences on each
> performance run. So, I thought that the likelihood is not
> normalized. Tracing the problem, it seems that acts = acts ./
> repmat(sum(acts,1), nConds, 1); in test_gnb.m gave lots of NaN's in
> acts due to zero division. This produced incorrect results when
> calculating the performance using [yg guesses] = max(acts); [yd
> desireds] = max(targs); corrects = guesses == desireds; in
> perfmet_maxclass.m. If my runs are correct, this problem should be
> resolved.

hmmm. are there equal numbers of timepoints in each of your conditions?
it sounds like there's something wrong with your regressors or runs
somehow. is this using the standard tutorial dataset and script?

raw...@yahoo.com

May 28, 2009, 1:30:56 PM
to Princeton MVPA Toolbox for Matlab
Thanks, Greg.

Yes, the experiments were carried out using the standard tutorial and
dataset.
Well, my post included two issues: the first is a note on how to solve
a simple path problem related to executable files in SPM8; the second
is a probable bug in GNB. Here are my answers to the posted replies:
1- Setting the Matlab path is a very simple issue, and I did that
already, but the path problem I talked about happens when Matlab calls
executables, including mex files (by the way, the SPM8 folder and all
its subfolders were added to the path too).
The problem by example: running the SPM tutorial, after trying
easy_tutorial using AFNI and all the advanced tutorials (shifting
regressors, HRF convolution, store to HD, etc.), I moved to the SPM
tutorial. Thus, I am using the same dataset provided by MVPA (in all
my experiments).
Here is the experiment example demonstrating the path dilemma:

[subj results] = tutorial_easy_spm();
??? Error using ==> spm_vol>subfunc at 111
File "mask_cat_select_vt.img" does not exist.

But the file does exist. Tracing and debugging into the code showed
that the source of this error is a compiled routine called
spm_existfile() in the SPM8 main folder (remember that the SPM8 path
is defined in Matlab). Now, since the compiled routine spm_existfile()
needs to act on the file "mask_cat_select_vt.img", you may change the
current directory to the folder containing "mask_cat_select_vt.img",
which is called "..\Working_set" in this example and which is already
on the path too. Many other solutions to this problem exist; for
example, you may define the path as part of the file name.
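
A minimal sketch of both workarounds (the folder name ..\Working_set
is taken from this example; adjust it, and the hypothetical mask_fname
variable, to your own setup):

% Workaround 1: change into the data directory before running the tutorial
cd('..\Working_set');   % the folder containing mask_cat_select_vt.img
[subj results] = tutorial_easy_spm();

% Workaround 2: build an absolute filename, so spm_existfile() gets the
% full path no matter what the current directory is
mask_fname = fullfile(pwd, '..', 'Working_set', 'mask_cat_select_vt.img');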

2- This is the major issue in the post: the SPM tutorial, loading
ANALYZE format. As I said, I tried the several tutorials using a
multi-layer neural network trained with back-propagation of error, and
I could not get above 0.50 average performance in many experiments.
Is this using the standard tutorial dataset and script? Yes, of
course, like I said. There is nothing wrong with the regressors or the
timepoints. OK, I will try the SPM tutorial using neural networks (10
hidden neurons):
090518_2805: Cross-validation using train_bp and test_bp - got
total_perfs - 0.49587
Here is another one:
090518_2809: Cross-validation using train_bp and test_bp - got
total_perfs - 0.46281
which makes sense.

So, is there a problem with the SPM dataset? We can find out by
running GNB using AFNI; here it is:

[subj results] = tutorial_easy_afni();
Setting all non-zero values in the mask_cat_select_vt+orig.BRIK mask
to one
Mask 'VT_category-selective' created by load_afni_pattern
... etc. ...
090518_2812: Cross-validation using train_gnb and test_gnb - got
total_perfs - 0.59091

This is the same as one gets using GNB on the Haxby et al. (Science,
2001) data in ANALYZE format, and the same zero-division problem
exists in line 68 of test_gnb.m. In fact, the zeros are generated by
the expression repmat(sum(acts,1), nConds, 1) in line 68 of
test_gnb.m. Tracing further, it is log_posterior that causes the
problem: its minimum value is -1.6353e+03, and exp(-1.6353e+03) = 0.
Maybe the MLE calculation needs revision.
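
(For reference, this is a genuine floating-point underflow, easy to
confirm at the Matlab prompt; a double cannot represent exp(x) for x
below about -745:)

>> exp(-1.6353e+03)
ans =
     0
>> log(realmin)   % log of the smallest normalized double
ans =
 -708.3964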

Greg Detre

Jun 1, 2009, 11:24:44 AM
to mvpa-t...@googlegroups.com
> Well, my post included two issues, the first is a note on how to solve
> a simple path problem related to executable files in SPM8.

Garrett is going to reply to the list about the SPM path issue.

g

Garrett McGrath

Jun 1, 2009, 11:58:25 AM
to mvpa-t...@googlegroups.com
I think the main problem is that you're using the wrong version of
SPM: the MVPA SPM code is built around the SPM5 code base. Please also
make sure you've got the latest version of MVPA from SVN; the naming
convention for some of the files in the dataset has changed, and the
latest copies of the tutorials should reflect this.
-Garrett

Greg Detre

Jun 1, 2009, 12:27:54 PM
to mvpa-t...@googlegroups.com
Can you tell me exactly what I should type to reproduce the
zero-division error, and also paste the output from matlab into the email?

This is what I ran:

[subj results] = tutorial_easy_spm('fextension','.img');
class_args.train_funct_name = 'train_gnb';
class_args.test_funct_name = 'test_gnb';
[subj results] =
cross_validation(subj,'epi_z','conds','runs_xval','epi_z_thresh0.05',class_args);

What should I be doing differently?

g

Greg Detre

Jun 14, 2009, 11:28:41 PM
to mvpa-t...@googlegroups.com
thanks for the detailed report. things are a little busy at the moment,
but next time i'm in MVPA mode, i'll try and have a look.

in the meantime, if anyone else wants to take a peek at the code to see
what's happening, i'd welcome your insights.

g


raw...@yahoo.com wrote:
> Hello again
>
> (Oh, about SPM8: it is much better than SPM5, many bugs are
> resolved. I think we are just using it to read ANALYZE files in my
> discussion, so it works perfectly.)
>
> Here is the result after removing rest:
>
> temp_sel = ones(1,size(regs,2));
> temp_sel(find(sum(regs)==0)) = 0;
> subj = init_object(subj,'selector','no_rest');
> subj = set_mat(subj,'selector','no_rest',temp_sel);
> subj = create_xvalid_indices(subj,'runs','actives_selname','no_rest');
> Selector group 'runs_xval' created by create_xvalid_indices
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> subj = shift_regressors(subj,'conds','runs',3); % the lag of HRF
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> subj = feature_select(subj,'epi_z','conds','runs_xval');
> Starting 10 statmap_anova iterations
> 1 2 3 4 5 6 7 8 9 10
> Pattern statmap group 'epi_z_anova' and mask group 'epi_z_thresh0.05'
> created by feature_select
>
> class_args.train_funct_name = 'train_gnb';
> class_args.test_funct_name = 'test_gnb';
> class_args.nHidden = 10;
>
> [subj results] = cross_validation(subj,'epi_z','conds','runs_xval','epi_z_thresh0.05',class_args);

> Starting 10 cross-validation classification iterations - train_gnb
> 1 0.43
> 2 0.31
> 3 0.53
> 4 0.43
> 5 0.38
> 6 0.47
> 7 0.39
> 8 0.38
> 9 0.44
> 10 0.44
>
> 090613_0306: Cross-validation using train_gnb and test_gnb - got
> total_perfs - 0.41944
>
> As expected, performance is reduced; the many NaN's are no longer
> cheating, I guess!
>
> Removing the rest resulted in 72 timepoints (instead of 121). The
> acts matrix (from test_gnb.m line 68) is copied and pasted below
> (NaN's are still around):
>
> [acts matrix, 8 conditions x 72 timepoints, abridged: most entries
> are NaN or 0. Rows 2-6 contain nothing but NaN's and 0's; rows 1, 7
> and 8 hold the only finite probabilities, with values ranging from 1
> down to magnitudes like 4.14e-47.]



raw...@yahoo.com

Jun 3, 2009, 7:44:58 AM
to Princeton MVPA Toolbox for Matlab


Dear Greg, what you wrote is right; however, you cannot see the zero
division without tracing inside the code of test_gnb.m (put a
breakpoint at the line you want to inspect, and execute
[subj results] = cross_validation(subj,'epi_z','conds','runs_xval','epi_z_thresh0.05',class_args);).

Here is a snapshot of the contents of acts (the first three columns):
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN

Now, tracing
[yg guesses] = max(acts);

the first three columns of yg are
NaN NaN NaN
and the guesses are
1 1 1
So, NaN's are classified as condition 1!!
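
(You can confirm this behaviour directly at the prompt: Matlab's max
skips NaN's, but when every element is NaN it returns NaN with index
1.)

>> [y, g] = max([NaN; NaN; NaN])
y =
   NaN
g =
     1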


It might be good practice (in MVPA) to mention the reference for any
related Matlab function. I think the reference for the GNB
implementation in test_gnb.m and train_gnb.m is the following:

[1] T. M. Mitchell, R. Hutchinson, R. S. Niculescu, F. Pereira, X. R.
Wang, M. Just, and S. Newman, "Learning to decode cognitive states
from brain images," Machine Learning, vol. 57, pp. 145-175, Oct-Nov
2004.

The above is one of the pioneering works in the classification of fMRI data.

Using GNB has two main problems: the zero-division problem, and the
performance dilemma of using perfmet_maxclass, which is necessary
under Bayes' theorem; lots of discussion about the max classifier
already exists in the mvpa-toolbox forum. When perfmet_maxclass works
on values obtained incorrectly through zero division, we get totally
incorrect performance.
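
Until the underflow itself is fixed, a minimal workaround sketch
(based on the three perfmet_maxclass lines quoted earlier; the copy in
your checkout may differ) would be to refuse credit for all-NaN
columns rather than letting max default them to condition 1:

[yg guesses] = max(acts);
[yd desireds] = max(targs);
corrects = (guesses == desireds);
corrects(isnan(yg)) = false; % an all-NaN column is a failed prediction, not condition 1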


However, many other works discuss the theory and practice of the
naive Bayes classifier (apparently it is the classifier that is naive,
not Bayes). How about calling the functions train_bnc.m and
test_bnc.m?

Anyway, looking at the BNC (or GNB) in Ref. [1], we see that the
denominator is a scalar normalization factor. This appears in
test_gnb.m, line 68, as the code
acts = acts ./ repmat(sum(acts,1), nConds, 1);

Please check repmat(sum(acts,1), nConds, 1); you will see that it
contains many zeros. So why can't you feel the zero division? Because
Matlab treats division by zero as a normal operation and execution
continues. Try the following:
>> exp(-1/0)+5
The answer is 5.

What went wrong?

1- The scaling might be incorrect; it could be done according to:

temp = sum(sum(acts));
acts = acts / temp;

SumOfAllP = sum(sum(acts)); % just checking that the sum is 1; if this
value is 1, we are OK
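
A different sketch, for anyone who wants to attack the underflow
itself rather than the scaling: normalize each timepoint in log space
before exponentiating. This assumes a variable holding the nConds x
nTimepoints unnormalized log posteriors, called log_posterior here as
in the thread; the exact name in your copy of test_gnb.m may differ.

m = max(log_posterior, [], 1); % 1 x nTimepoints column maxima
shifted = log_posterior - repmat(m, nConds, 1); % best condition per column is now 0
acts = exp(shifted); % the winning condition no longer underflows
acts = acts ./ repmat(sum(acts,1), nConds, 1); % every column sums to >= 1, so no 0/0

The relative posteriors within each timepoint are unchanged, because
subtracting m in log space only divides each column by a constant.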


Note 1: Removing rest timepoints might reduce how many NaN's there
are, but some will definitely stay. I haven't tried removing rest yet.

Note 2: All the above comments are based on the assumption that my
runs are correct; I hope I did not miss anything.


Regards


Al-Rawi
IEETA
Univ. of Aveiro






smith...@gmail.com

Feb 4, 2015, 7:00:36 PM
to mvpa-t...@googlegroups.com
Hi Al-Rawi,

I still did not understand how to modify test_gnb.m to fix the
underflow problem in the Gaussian Naive Bayes classifier. Can you
paste the modified code here?

Thanks!
Smith