Error in Loading Data - checksum issue

morteza...@gmail.com

Jun 20, 2023, 10:01:54 PM
to physionet-challenges
Hello,

I encountered an error while attempting to load the data. There appears to be an inconsistency between the checksum in the header file and the actual sum of the values in the data file. The error message I received is as follows:

"helper_code.py", line 116, in load_recording_data
    raise ValueError('The checksum in header file {}'.format(header_file) \
ValueError: The checksum in header file ~/physionet.org/files/i-care/2.0/training/0284/0284_001_004_EEG.hea is inconsistent with the initial value for channel.

I have checked the provided files and their corresponding checksums, but the values do not align. I believe this discrepancy is causing the error during the loading process. 

I would appreciate any guidance on how to address this issue. Thank you!

Warm regards,
Morteza

PhysioNet Challenge

Jun 20, 2023, 10:07:14 PM
to physionet-challenges
Hi Morteza,

It seems like your local copy of the 0284_001_004_EEG record may be corrupted.

I re-downloaded this record using wget. The checksums matched, and I was able to load the file, so it seems like the copy on the server is OK. Could you please try re-downloading the 0284_001_004_EEG.mat file and loading the data again?
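One way to compare a local copy against a fresh download is to hash both files and check that the digests agree. The following is a minimal sketch, not part of helper_code.py; the function name `sha256_of_file` and the chunk size are illustrative assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Hypothetical helper for comparing two copies of the same record;
    it is not part of the Challenge's helper_code.py.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large .mat files are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the old and the re-downloaded 0284_001_004_EEG.mat files produce the same digest, the local copy is byte-identical to the server copy and corruption can be ruled out.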

In this case, the checksum is working as intended to verify the integrity of the signal data, but I'll push a change to this function to allow you to optionally disable this check in case you have several corrupted files and want to try to work with the data as you verify it.

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

Christian Canedo

Jul 6, 2023, 8:48:49 AM
to physionet-challenges
Hello, 

We're having the same problem with the Python functions provided in helper_code.py. We downloaded the 0284_001_004_EEG data (.mat and .hea) from
https://physionet.org/content/i-care/2.0/#files as you recommended above, but the checksums and the data sums still don't match:

For Fp1 channel:
sum: -720840266 
checksum: 37933865398

This is just the example Morteza showed, but we have seen the same mismatch in many other signals. It does not look like a corrupted-file problem, and we don't know what is causing it.

Thank you very much for the help!

Regards, 
Christian.

jstyoon96

Jul 6, 2023, 8:48:55 AM
to physionet-challenges
Hi Morteza,

Our team had the same checksum issue.

We manually checked whether the data was corrupted, and it was not.

Problem
The problem we found is that when the .mat file is loaded, the default dtype is int16 or int32. When the checksum is calculated (by adding all the EEG amplitude values), the sum overflows and we get the following warning: [Python RuntimeWarning: overflow encountered in long scalars]

Solution
Check the dtype of the array loaded from the .mat file and, if it is int16 or int32, convert it to int64 before computing the checksum.
Alternatively, check your SciPy version (as far as I know, versions below 1.0 default to int16 when using sp.io.loadmat).
Check out the link below for details.
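The overflow described above can be reproduced with a short sketch. It uses synthetic int16 samples rather than the real recording, forces a 32-bit accumulator to imitate the overflow, and then repeats the sum with a 64-bit accumulator; the helper `wrap_int32` is a hypothetical name introduced only for this illustration. As a side note, the two values reported earlier in the thread for the Fp1 channel (-720840266 and 37933865398) differ by exactly 9 × 2^32, which is consistent with a sum that wrapped around in 32 bits.

```python
import numpy as np


def wrap_int32(value: int) -> int:
    """Map an arbitrary Python int onto the signed 32-bit range (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


# Synthetic stand-in for a long recording that uses the full int16 range.
samples = np.full(200_000, 32767, dtype=np.int16)

# A 32-bit accumulator silently wraps around once the sum exceeds 2**31 - 1.
sum_int32 = int(np.sum(samples, dtype=np.int32))

# A 64-bit accumulator holds the true sum.
sum_int64 = int(np.sum(samples, dtype=np.int64))

print("int32 sum:", sum_int32)
print("int64 sum:", sum_int64)
print("int32 sum equals wrapped int64 sum:", sum_int32 == wrap_int32(sum_int64))

# The values posted for the Fp1 channel are related in the same way.
print("Fp1 values match after wrapping:", wrap_int32(37933865398) == -720840266)
```

Passing `dtype=np.int64` to `np.sum` widens the accumulator without first making a 64-bit copy of the whole signal, which matters for recordings of this size.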

Best,
Justin
On Wednesday, June 21, 2023 at 11:07:14 AM UTC+9, PhysioNet Challenge wrote:

PhysioNet Challenge

Jul 6, 2023, 8:52:01 AM
to physionet-challenges
Dear Morteza, Christian, Justin,

Yes, this is the most likely explanation. The updated recordings are longer, and they use the full range of 16-bit signed integers, so they are more likely to overflow. The example code doesn't use all of the recording segments or channels, so we didn't encounter this issue when testing the code. In all likelihood, your files are not corrupted, and I apologize for all of the confusion.

I have updated the helper_code file to promote the signal data to 64-bit signed integers for the checksum calculation, and I have disabled the checksum calculation by default to avoid the problem with incorrect checksums. Due to the size of the data, we will not be able to share updated files now, but we plan to regenerate all of the checksums at the end of the Challenge.
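For readers who want to see the shape of such a check, here is a minimal sketch of a per-channel sum comparison with a 64-bit accumulator and an opt-in flag. It is not the actual helper_code implementation: the function name `check_channel_sums`, its parameters, and the error wording are assumptions made for illustration.

```python
import numpy as np


def check_channel_sums(data, expected_sums, channels, check_values=False):
    """Compare per-channel sums against expected values (hypothetical helper).

    data          : 2-D integer array, one row per channel
    expected_sums : iterable of expected per-channel sums
    channels      : iterable of channel names, used only in the error message
    check_values  : the check is skipped unless this is True
    """
    if not check_values:
        return

    # Accumulate in int64 so long int16/int32 signals cannot overflow.
    actual_sums = np.sum(data, axis=1, dtype=np.int64)

    for name, actual, expected in zip(channels, actual_sums, expected_sums):
        if int(actual) != int(expected):
            raise ValueError(
                "Sum for channel {} is {}, but {} was expected.".format(
                    name, int(actual), int(expected)
                )
            )
```

Calling `check_channel_sums(data, sums, names, check_values=True)` enables the comparison; with the default `check_values=False`, the function returns without checking, which mirrors the "disabled by default" behavior described above.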


Best,
Matt
(On behalf of the Challenge team.)
