Greetings,
> On Oct 30, 2023, at 12:46, Andrea Ivan Costantino <
andreaivan...@gmail.com> wrote:
>
> Thanks. I am aware of the problem of double dipping, and it seems indeed that this is what is happening here.
>
> However, it's not clear to me how the partitioning into train/test data would be implemented when we want to do feature selection. I can implement it manually for each fold, but I feel there must be an easier way using CoSMoMVPA's native functions. This is how I am running the MVPA:
>
>
> % Define labels for the data samples and other arguments needed for classification
> ds.sa.targets = results.targets_table.CheckmateTarget; % assign the checkmate targets as the target labels
>
> % Initialize an empty structure to hold classification arguments
> cosmoArgs = struct();
> % Define the classifier function to be used (linear discriminant analysis)
> cosmoArgs.classifier = @cosmo_classify_lda;
> % Define how to partition the data for cross-validation
> cosmoArgs.partitions = cosmo_nfold_partitioner(ds);
> % Specify the output type ('fold_accuracy' means accuracy is computed per fold)
> cosmoArgs.output = 'fold_accuracy';
> % Set the maximum number of features to be considered in the classification
> cosmoArgs.max_feature_count = 10000;
>
> % Run the MVPA classification
> checkRes = cosmo_crossvalidation_measure(ds, cosmoArgs);
>
> How would I use the cosmo_meta_feature_selection_classifier function here, or, more generally, how would I do feature selection in this LDA analysis?
Actually, cosmo_meta_feature_selection_classifier is deprecated; the updated function is now called cosmo_classify_meta_feature_selection. Its documentation contains an example using a searchlight.
There the final line of code is:
res=cosmo_searchlight(ds_tl,nbrhood,measure,measure_args,...
'progress',false);
which, if you want to run the analysis only once on the entire dataset in ds_tl (without searchlight), can be changed into:
res=measure(ds_tl, measure_args);
The idea is that, for the measure arguments used with cosmo_classify_meta_feature_selection:
- child_classifier is the classifier applied after feature selection; you would use @cosmo_classify_lda there
- feature_selector and feature_selection_ratio_to_keep define how the 'best' features are selected
- other arguments, such as partitions, are passed on to the child_classifier.
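Putting those pieces together, the setup could look something like the sketch below. It is a minimal illustration, not a drop-in script: the dataset variable ds_tl, the choice of cosmo_anova_feature_selector (which the documentation example uses), and the ratio 0.1 are assumptions you would adapt to your own data.

```matlab
% Sketch: cross-validation with fold-wise feature selection, run once on
% the whole dataset (no searchlight). Field values here are illustrative.
measure = @cosmo_crossvalidation_measure;

measure_args = struct();
% the meta classifier performs feature selection before classifying
measure_args.classifier = @cosmo_classify_meta_feature_selection;
% the actual ('child') classifier applied after feature selection
measure_args.child_classifier = @cosmo_classify_lda;
% how to rank features (selector used in the documentation example)
measure_args.feature_selector = @cosmo_anova_feature_selector;
% keep the top 10% of features (illustrative value)
measure_args.feature_selection_ratio_to_keep = 0.1;
% cross-validation partitions, passed on to the child classifier
measure_args.partitions = cosmo_nfold_partitioner(ds_tl);
% report accuracy per fold
measure_args.output = 'fold_accuracy';

% run the measure once on the entire dataset
res = measure(ds_tl, measure_args);
```

Because the feature selection runs inside each training fold, the held-out test samples never influence which features are kept, which is exactly what avoids the double-dipping problem you mentioned.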
Does that help?