Dear Challenge team,
While processing the new dataset, we ran into a problem with the digitization: the approach appears to be inconsistent across individual records, and even across channels of the same patient within the same record.
We downloaded the new helper code and used it, since it changed for the official round. Reading a file with the WFDB library now even gives different output than the helper code.
In the unofficial round, the signals were digitized with ADC gain = 32/uV, i.e. one digital unit corresponded to 1/32 = 0.03125 microvolts:
ICARE_0613_67 18 100 30000
ICARE_0613_67.mat 16+24 32/uV 16 0 0 0 0 Fp1-F7
ICARE_0613_67.mat 16+24 32/uV 16 0 0 0 0 F7-T3
ICARE_0613_67.mat 16+24 32/uV 16 0 0 0 0 T3-T5
ICARE_0613_67.mat 16+24 32/uV 16 0 0 0 0 T5-O1
ICARE_0613_67.mat 16+24 32/uV 16 0 0 0 0 Fp2-F8
(Example file in the unofficial round with a random set of channels)
In the official round, it looks as if the ADC gain was set dynamically according to the signal range, e.g.:
0613_077_067_EEG 19 500 1800000
0613_077_067_EEG.mat 16+24 -0.0003181560314260423 16 0 1 1800000 0 Fp1
0613_077_067_EEG.mat 16+24 6.291050794970943e-06 16 0 1 1800000 0 Fp2
0613_077_067_EEG.mat 16+24 -0.0003429266798775643 16 0 1 1800000 0 F3
0613_077_067_EEG.mat 16+24 -0.00020546276937238872 16 0 1 1800000 0 F4
0613_077_067_EEG.mat 16+24 -0.00013665278675034642 16 0 1 1800000 0 C3
0613_077_067_EEG.mat 16+24 -0.0002076699456665665 16 0 1 1800000 0 C4
0613_077_067_EEG.mat 16+24 -0.0001065958640538156 16 0 1 1800000 0 P3
(Same file in official round, same patient/hour/channels as in unofficial round, but now with full 1h record)
At the same time, some ADC gain values are even negative (and sometimes reach large absolute values, e.g. -32768). If the ADC gain is very small, e.g. 0.001, one digital unit corresponds to 1000 microvolts. Given the physiological range of EEG amplitudes, a signal digitized in this way is unusable.
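The step sizes implied by the header lines quoted above can be checked with a short sketch. It assumes the standard WFDB signal-line layout, in which the third field is gain(baseline)/units with the baseline and units optional. The function names parse_gain and step_size are hypothetical and are not part of the helper code or the WFDB library. The official-round lines do not state a unit, so reading the result in microvolts is an assumption carried over from the unofficial round.

```python
def parse_gain(signal_line):
    """Return (gain, units) from one WFDB header signal line.

    The third whitespace-separated field has the form gain(baseline)/units,
    where the baseline and the units are optional.
    """
    gain_field = signal_line.split()[2]
    number, _, units = gain_field.partition("/")
    number = number.split("(")[0]  # drop an optional "(baseline)" suffix
    return float(number), (units or None)


def step_size(gain):
    """Physical units represented by one digital unit (sign of gain ignored)."""
    return 1.0 / abs(gain)


# Two signal lines copied from the headers quoted in this message.
unofficial = "ICARE_0613_67.mat 16+24 32/uV 16 0 0 0 0 Fp1-F7"
official = "0613_077_067_EEG.mat 16+24 6.291050794970943e-06 16 0 1 1800000 0 Fp2"

for line in (unofficial, official):
    gain, units = parse_gain(line)
    # Print channel name, gain, stated units (None if absent), and step size.
    print(line.split()[-1], gain, units, step_size(gain))
```

Running this over all signal lines of a record would show at a glance how far the per-channel step sizes diverge.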
It seems that in a large number of channels the ADC gain was set to a relatively small value (in absolute value significantly smaller than 32/uV, and in some cases smaller than 1/uV). This makes it possible to capture extreme values caused by measurement errors or artifacts, but it leads to low resolution in the physiologically interesting range. Moreover, there are significant differences not only between individual EEGs, but sometimes also between individual channels within one EEG. This can affect the channel differences computed when re-referencing the montage.
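The effect on a channel difference can be illustrated with synthetic data. The sketch below is not taken from the dataset: the two sine waves, their amplitudes, and the gain of 0.001 are made-up values chosen only to show what happens when one channel of a bipolar pair is quantized much more coarsely than the other.

```python
import math


def quantize(samples_uv, gain_per_uv):
    """Round to integer digital units, then convert back to microvolts."""
    return [round(x * gain_per_uv) / gain_per_uv for x in samples_uv]


# Synthetic 10 Hz signals, one second at 500 Hz (illustrative values only).
fs = 500
t = [n / fs for n in range(fs)]
ch_a = [50.0 * math.sin(2 * math.pi * 10 * x) for x in t]
ch_b = [30.0 * math.sin(2 * math.pi * 10 * x + 0.5) for x in t]
true_diff = [a - b for a, b in zip(ch_a, ch_b)]


def max_diff_error(gain_a, gain_b):
    """Largest error (in uV) of the quantized difference ch_a - ch_b."""
    qa = quantize(ch_a, gain_a)
    qb = quantize(ch_b, gain_b)
    return max(abs((a - b) - d) for a, b, d in zip(qa, qb, true_diff))


err_uniform = max_diff_error(32.0, 32.0)  # both channels at 32/uV
err_mixed = max_diff_error(32.0, 0.001)   # second channel at 1000 uV per step
print(err_uniform, err_mixed)
```

With the coarse gain the second channel rounds to zero everywhere, so the computed difference is effectively just the first channel.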
In my opinion, it would be better to clip extreme values and to use a uniform ADC gain during digitization, large enough to give the required resolution in the physiologically interesting range, since a significant share of the signals in the dataset is degraded in this way.
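One possible reading of this suggestion, as a minimal sketch: scale every channel with the same gain and clip whatever does not fit into the integer range. The default of 32/uV is taken from the unofficial round, and 16 bits matches the ADC resolution field in the headers above. The function name digitize_uniform is hypothetical. At these settings the representable range is roughly -1024 to +1024 microvolts, and whether that is wide enough is a separate question.

```python
def digitize_uniform(samples_uv, gain_per_uv=32.0, adc_bits=16):
    """Convert microvolt samples to clipped signed integers with one fixed gain."""
    lo = -(2 ** (adc_bits - 1))      # -32768 for 16 bits
    hi = 2 ** (adc_bits - 1) - 1     # 32767 for 16 bits
    return [min(hi, max(lo, round(x * gain_per_uv))) for x in samples_uv]


# In-range values keep 1/32 uV resolution; out-of-range values saturate.
print(digitize_uniform([0.0, 1.0, 5000.0, -5000.0]))
```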
Best regards, David