Hi all,
For instance, when I try to load the “msk_impact_50k_2026” study into the database, I see the following errors on stdout:
ERROR: data_clinical_sample.txt: line 5: column 20: Attribute name not in upper case.; value encountered: 'purity_estimate_from_mutations'
And
ERROR: data_sv.txt: lines [25, 150, 154, (354 more)]: column 1: Sample ID not defined in clinical file; values encountered: ['P-0012686-T01-IM5', 'P-0013979-T01-IM5', 'P-0010429-T01-IM5', '(317 more)']
WARNING: data_sv.txt: lines [7423, 7840, 12522, (1 more)]: Hugo Symbol should not start with a number.; values encountered: ['48787', '30302_C.890', '1311DEL']
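In case it helps with triage, the first two errors can be checked outside the importer with a short script. This is only a minimal sketch and not the validator's own logic: the function names are hypothetical, and it assumes tab-separated files, metadata rows starting with "#" at the top of data_clinical_sample.txt, and ID columns named SAMPLE_ID (clinical file) and Sample_Id (data_sv.txt).

```python
import csv


def _table_lines(text):
    """Return the non-empty lines of a file, dropping '#' metadata rows."""
    return [ln for ln in text.splitlines() if ln and not ln.startswith("#")]


def read_column(text, column):
    """Collect the distinct values found in one column of a tab-separated table."""
    reader = csv.DictReader(_table_lines(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[column] for row in reader if row.get(column)}


def missing_sample_ids(clinical_text, sv_text,
                       clinical_column="SAMPLE_ID",  # assumed column name
                       sv_column="Sample_Id"):       # assumed column name
    """Sample IDs used in the SV file that the clinical sample file does not define."""
    return read_column(sv_text, sv_column) - read_column(clinical_text, clinical_column)


def non_uppercase_attributes(clinical_text):
    """Attribute names in the clinical header row that are not fully upper case."""
    lines = _table_lines(clinical_text)
    if not lines:
        return []
    return [name for name in lines[0].split("\t") if name != name.upper()]
```

Reading both files from the study folder and passing their contents to missing_sample_ids and non_uppercase_attributes should list the same sample IDs and attribute names that the errors above complain about, provided the assumptions about the column names hold.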
I am concerned that I am looking in an out-of-date location for the public datasets, since many of the datasets in the linked GitHub repo fail to load into the database.
Please let me know how I can get these datasets loaded or if there is an updated data repository.
Thank you!