Hi all,
Several issues were raised; let me know whether I have addressed all of them:
1) Robustness against translation and scaling on a small example:
Here are the various cases (code I wrote using the data objects
provided by the platform):
% Script to test the Hebbian object from the sample code
a=10^5;       % large factor used below for scaling and shifting
num_patt=10;  % number of patterns (examples)
num_feat=5;   % number of features
% Create training and test data
t{1}='original';
D{1}=data_object(rand(num_patt,num_feat),sign(randn(num_patt,1)));
DT{1}=data_object(rand(num_patt,num_feat),sign(randn(num_patt,1)));
% Create a variant of the training data with first feature flipped
t{2}='one_flipped';
D{2}=D{1};
D{2}.X(:,1)=-D{2}.X(:,1);
DT{2}=DT{1};
DT{2}.X(:,1)=-DT{2}.X(:,1);
% Create a variant of the training data with first feature scaled
t{3}='one_scaled';
D{3}=D{1};
D{3}.X(:,1)=a*D{3}.X(:,1);
DT{3}=DT{1};
DT{3}.X(:,1)=a*DT{3}.X(:,1);
% Create a variant of the training data with all features scaled
t{4}='all_scaled';
D{4}=D{1};
D{4}.X=a*D{4}.X;
DT{4}=DT{1};
DT{4}.X=a*DT{4}.X;
% Create a variant of the training data with first feature shifted
t{5}='one_shifted';
D{5}=D{1};
D{5}.X(:,1)=D{5}.X(:,1)+a;
DT{5}=DT{1};
DT{5}.X(:,1)=DT{5}.X(:,1)+a;
% Create a variant of the training data with all features shifted
t{6}='all_shifted';
D{6}=D{1};
D{6}.X=D{6}.X+a;
DT{6}=DT{1};
DT{6}.X=DT{6}.X+a;
% Examine the results
for k=1:length(D)
  fprintf('------ %s ------\n', t{k});
  [DD, MM]=train(hebbian, D{k});
  fprintf('W = [');
  for j=1:length(MM.W)
    fprintf('%5.4f ', MM.W(j));
  end
  fprintf(']\n');
  DD=test(MM, DT{k});
  fprintf('AUC=%5.4f\n', auc(DD));
end
Result:
=====
------ original ------
W = [0.2056 -0.1720 0.0726 0.0137 -0.2762 ]
AUC=0.5238
------ one_flipped ------
W = [-0.2056 -0.1720 0.0726 0.0137 -0.2762 ]  <== the sign of the first weight is flipped
AUC=0.5238  <== the AUC is unchanged, because the first feature is also flipped in the test data
------ one_scaled ------
W = [20555.2882 -0.1720 0.0726 0.0137 -0.2762 ]  <== the first weight is scaled
AUC=0.3810  <== the AUC changes, because the relative importance of the features has changed
------ all_scaled ------
W = [20555.2882 -17204.0064 7262.2386 1366.3886 -27617.9802 ]  <== all weights are scaled
AUC=0.5238  <== the AUC is unchanged, because the relative importance of the features is not changed
------ one_shifted ------
W = [0.2056 -0.1720 0.0726 0.0137 -0.2762 ]  <== the weights are the same
AUC=0.5238  <== the AUC is unchanged, because it is insensitive to a bias change
------ all_shifted ------
W = [0.2056 -0.1720 0.0726 0.0137 -0.2762 ]  <== the weights are the same
AUC=0.5238  <== the AUC is unchanged, because it is insensitive to a bias change
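The weight pattern above is consistent with the Hebbian object computing, up to normalization, the difference between the class means of each feature. I am not reproducing the platform's actual code here; the following is only a Python sketch of that rule (the function name and data are illustrative), showing the same flip/scale/shift behavior:

```python
import numpy as np

def hebbian_weights(X, y):
    # Class-mean difference per feature: flips sign with a feature flip,
    # scales with a feature scaling, and is invariant to shifts.
    return X[y > 0].mean(axis=0) - X[y < 0].mean(axis=0)

rng = np.random.default_rng(0)
X = rng.random((10, 5))
y = np.array([1, -1, 1, 1, -1, -1, 1, -1, 1, -1])

w = hebbian_weights(X, y)
a = 1e5

Xf = X.copy(); Xf[:, 0] *= -1   # flip the sign of the first feature
assert np.isclose(hebbian_weights(Xf, y)[0], -w[0])

Xs = X.copy(); Xs[:, 0] *= a    # scale the first feature
assert np.isclose(hebbian_weights(Xs, y)[0], a * w[0])

# Shifting all features leaves the weights (numerically) unchanged.
assert np.allclose(hebbian_weights(X + a, y), w)
```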
2) Changes in ALC on the test example
http://litpc45.ulb.ac.be/SylvesterTestScore.zip
Here we have:
all(all(A81(:,2:end)==A82(:,2:end)))
all(all(A81(:,1)==1000-A82(:,1)))
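Underlying the invariance claims is that the AUC is a rank statistic: it depends only on the ordering of the classifier's scores, so any strictly increasing transform leaves it unchanged, and flipping the sign of the scores together with the corresponding sign flip induced in the learned weight cancels out. A minimal rank-based AUC in Python (a sketch, not the platform's implementation):

```python
import numpy as np

def rank_auc(scores, y):
    """AUC as the probability that a random positive example
    outscores a random negative one (ties count half)."""
    pos = scores[y > 0]
    neg = scores[y < 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
y = np.array([-1, -1, 1, 1, -1])

base = rank_auc(scores, y)  # 4 of the 6 pos-neg pairs are ranked correctly: 4/6

# Any strictly increasing transform of the scores leaves the AUC unchanged.
assert rank_auc(3.0 * scores + 100.0, y) == base
assert rank_auc(np.exp(scores), y) == base

# Flipping the sign of both the scores and the labels also leaves it unchanged,
# which is what a sign-flipped feature does through the sign-flipped weight.
assert rank_auc(-scores, -y) == base
```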
In principle, all AUCs and ALCs should be the same, as per the
invariances investigated above. In the evaluation software we took
care to use the same set of randomly drawn data splits. HOWEVER, we
use a floating number of data splits, i.e. we keep averaging the AUC
over an increasing number of splits until the error bar drops below a
threshold. Even in double precision, there are enough rounding errors
that, for small numbers of training examples, this makes a difference.
Here are the log files of the two runs:
Example81:
-------------------- Point 1 ----------------------
296 repeats, auc= 0.54+- 0.01 -----------------
-------------------- Point 2 ----------------------
254 repeats, auc= 0.62+- 0.01 -----------------
-------------------- Point 3 ----------------------
126 repeats, auc= 0.74+- 0.01 -----------------
-------------------- Point 4 ----------------------
53 repeats, auc= 0.81+- 0.01 -----------------
-------------------- Point 5 ----------------------
31 repeats, auc= 0.88+- 0.01 -----------------
-------------------- Point 6 ----------------------
22 repeats, auc= 0.91+- 0.01 -----------------
-------------------- Point 7 ----------------------
10 repeats, auc= 0.94+- 0.01 -----------------
Example82
-------------------- Point 1 ----------------------
160 repeats, auc= 0.55+- 0.01 -----------------
-------------------- Point 2 ----------------------
173 repeats, auc= 0.62+- 0.01 -----------------
-------------------- Point 3 ----------------------
122 repeats, auc= 0.73+- 0.01 -----------------
-------------------- Point 4 ----------------------
55 repeats, auc= 0.81+- 0.01 -----------------
-------------------- Point 5 ----------------------
31 repeats, auc= 0.88+- 0.01 -----------------
-------------------- Point 6 ----------------------
22 repeats, auc= 0.91+- 0.01 -----------------
-------------------- Point 7 ----------------------
10 repeats, auc= 0.94+- 0.01 -----------------
The resulting learning curves are very similar, but differ slightly
because of numerical precision.
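The stopping rule can be sketched as follows; the threshold value, repeat limits, and function names are illustrative, not the actual evaluation code. Per-split AUCs that differ only by rounding errors can cross the error-bar threshold a repeat earlier or later, so the two runs stop at different repeat counts (e.g. 296 vs 160 at Point 1) and average over different sets of splits:

```python
import numpy as np

def average_auc(auc_of_split, threshold=0.01, min_repeats=10, max_repeats=1000):
    # Keep averaging per-split AUCs until the error bar (standard error
    # of the mean) drops below the threshold, then stop.
    aucs = []
    for k in range(max_repeats):
        aucs.append(auc_of_split(k))
        if len(aucs) >= min_repeats:
            sem = np.std(aucs, ddof=1) / np.sqrt(len(aucs))
            if sem < threshold:
                break
    return float(np.mean(aucs)), len(aucs)

# Two runs whose per-split AUCs differ only by tiny rounding errors may
# cross the threshold after a different number of repeats, and therefore
# average over different sets of splits.
rng = np.random.default_rng(0)
per_split = 0.55 + 0.1 * rng.standard_normal(1000)
mean1, n1 = average_auc(lambda k: per_split[k])
mean2, n2 = average_auc(lambda k: per_split[k] + 1e-12 * (-1) ** k)
```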
3) Quantization:
If the whole matrix were only translated and scaled, and there were no
numerical-precision issues, both the ALC and the AUC would be the same
before and after quantization. However, because our implementation
allows a floating number of repeats, and because of finite precision,
the results can end up slightly different.
Let me know whether everything is now clear.
Best regards,
The organizers
On Apr 13, 10:35 am, Yann <yann...@gmail.com> wrote:
> Hi,
>
> As I understand the way the scores are computed, translating or
> scaling the data before submitting them should not make any
> difference.
>
> I however observed potentially big differences, just by changing the
> sign of a variable in the validation set.
>
> As an example, see http://litpc45.ulb.ac.be/SylvesterTestScore.zip