Hi Ritika,
It seems the validator cannot recognize some of the values that stand for NAs in the samples, mutations_extended and CNA files.
Here is most of the output and I have highlighted the errors in red:
....
INFO: -: Validation of case list folder complete
INFO: data_gene_matrix.txt: line 1: This column can be replaced by a 'gene_panel' property in the respective meta file; value encountered: 'cna'
ERROR: data_gene_matrix.txt: lines [11, 22, 26, (35601 more)]: column 3: Blank cell found in column; value encountered: ''' (in column 'cna')'
ERROR: data_gene_matrix.txt: lines [11, 22, 26, (35601 more)]: Gene panel ID is not in database. Please import this gene panel before loading study data.; value encountered: ''
INFO: data_gene_matrix.txt: Validation of file complete
INFO: data_gene_matrix.txt: Read 135707 lines. Lines with warning: 0. Lines with error: 35604
WARNING: data_CNA.txt: line 1: The recommended column Entrez_Gene_Id was not found. Using Hugo_Symbol for all gene parsing.
WARNING: data_CNA.txt: lines [92, 99, 179, (46 more)]: Gene symbol not known to the cBioPortal instance. This record will not be loaded.; values encountered: ['BRE', 'C11ORF30', 'CXORF67', '(46 more)']
INFO: data_CNA.txt: Validation of file complete
INFO: data_CNA.txt: Read 965 lines. Lines with warning: 50. Lines with error: 0
WARNING: data_clinical_patient.txt: Columns OS_MONTHS and/or OS_STATUS not found. Overall survival analysis feature will not be available for this study.
WARNING: data_clinical_patient.txt: Columns DFS_MONTHS and/or DFS_STATUS not found. Disease free analysis feature will not be available for this study.
ERROR: data_clinical_patient.txt: lines [6, 10, 13, (77894 more)]: columns [7, 10, 6, (1 more)]: Value of numeric attribute is not a real number; values encountered: ['Unknown', 'Not Applicable', 'Not Collected', '(1 more)']
INFO: data_clinical_patient.txt: Validation of file complete
INFO: data_clinical_patient.txt: Read 121226 lines. Lines with warning: 0. Lines with error: 77897
WARNING: data_fusions.txt: lines [32, 162, 246, (1435 more)]: Gene symbol not known to the cBioPortal instance. This record will not be loaded.; values encountered: ['PARK2', 'HIST1H2BD', 'MRE11A', '(659 more)']
WARNING: data_fusions.txt: lines [2530, 2531, 2532, (2664 more)]: Entrez gene id is not an integer. This record will not be loaded.; values encountered: ['238.0', '324.0', '8289.0', '(709 more)']
WARNING: data_fusions.txt: lines [13995, 14252, 35873, (1 more)]: Hugo Symbol is not in gene or alias table and starts with a number. This can be caused by unintentional gene conversion in Excel.; values encountered: ['48787', '30302_C.890', '1311DEL']
INFO: data_fusions.txt: Validation of file complete
INFO: data_fusions.txt: Read 41199 lines. Lines with warning: 4105. Lines with error: 0
WARNING: data_mutations_extended.txt: column 60: A SWISSPROT column was found in datafile without specifying associated 'swissprot_identifier' in metafile, assuming 'swissprot_identifier: name'.
WARNING: data_mutations_extended.txt: lines [2, 3, 4, (9652 more)]: Variant_Type indicates a SNP, but length of Reference_Allele, Tumor_Seq_Allele1 and/or Tumor_Seq_Allele2 do not equal 1.; values encountered: ['(T, , C)', '(G, , T)', '(C, , A)', '(9 more)']
WARNING: data_mutations_extended.txt: lines [2, 3, 4, (216850 more)]: Missing value in SWISSPROT column; this column is recommended to make sure that the UniProt canonical isoform is used when drawing Pfam domains in the mutations view.; value encountered: ''
INFO: data_mutations_extended.txt: lines [35, 49, 59, (9949 more)]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; values encountered: ['Intron', 'Silent', '3'UTR', '(4 more)']
WARNING: data_mutations_extended.txt: lines [172, 225, 301, (160 more)]: Variant_Type indicates a DNP, but length of Reference_Allele, Tumor_Seq_Allele1 and/or Tumor_Seq_Allele2 do not equal 2.; values encountered: ['(CG, , AA)', '(CC, , AA)', '(GG, , TT)', '(27 more)']
WARNING: data_mutations_extended.txt: lines [303, 315, 1545, (114 more)]: Variant_Type indicates a ONP, but length of Reference_Allele, Tumor_Seq_Allele1 and 2 are not bigger than 3 or are of unequal lengths.; values encountered: ['(GTG, , AAA)', '(ACCAC, , GTGGT)', '(CTG, , TTG)', '(77 more)']
WARNING: data_mutations_extended.txt: lines [12632, 12702, 12888, (3345 more)]: Entrez gene id exists, but gene symbol specified is not known to the cBioPortal instance. The gene symbol will be ignored. Might be wrong mapping, new or deprecated gene symbol.; values encountered: ['MEF2BNB-MEF2B', 'PARK2', 'RFWD2', '(18 more)']
WARNING: data_mutations_extended.txt: lines [12632, 14434, 14478, (789 more)]: Off panel variant. Gene symbol not known to the targeted panel.; values encountered: ['MEF2BNB-MEF2B', 'GNB2L1', 'NUTM1', '(19 more)']
WARNING: data_mutations_extended.txt: lines [29888, 45896, 45961, (27 more)]: No Amino_Acid_Change or HGVSp_Short value. This mutation record will get a generic "MUTATED" flag
WARNING: data_mutations_extended.txt: lines [226807, 226808, 226809, (478212 more)]: Missing value in SWISSPROT column; this column is recommended to make sure that the UniProt canonical isoform is used when drawing Pfam domains in the mutations view.; value encountered: ''
INFO: data_mutations_extended.txt: lines [226825, 226837, 226846, (9415 more)]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; values encountered: ['Intron', 'Silent', 'RNA', '(4 more)']
WARNING: data_mutations_extended.txt: lines [226849, 226916, 226917, (5774 more)]: Entrez gene id exists, but gene symbol specified is not known to the cBioPortal instance. The gene symbol will be ignored. Might be wrong mapping, new or deprecated gene symbol.; values encountered: ['BRE', 'WHSC1L1', 'WHSC1', '(44 more)']
WARNING: data_mutations_extended.txt: lines [227152, 227218, 227353, (350 more)]: Off panel variant. Gene symbol not known to the targeted panel.; values encountered: ['PGBD3', 'MEF2BNB-MEF2B', 'FIP1L1', '(24 more)']
WARNING: data_mutations_extended.txt: lines [227289, 228078, 228676, (534 more)]: Variant_Type indicates a ONP, but length of Reference_Allele, Tumor_Seq_Allele1 and 2 are not bigger than 3 or are of unequal lengths.; values encountered: ['(CTC, CTC, ATT)', '(CTC, CTC, TTT)', '(TGC, TGC, GAA)', '(306 more)']
WARNING: data_mutations_extended.txt: lines [230068, 232465, 243076, (67 more)]: No Amino_Acid_Change or HGVSp_Short value. This mutation record will get a generic "MUTATED" flag
WARNING: data_mutations_extended.txt: lines [331868, 331869, 331870, (1911 more)]: Variant_Type indicates a SNP, but length of Reference_Allele, Tumor_Seq_Allele1 and/or Tumor_Seq_Allele2 do not equal 1.; values encountered: ['(G, , A)', '(C, , G)', '(A, , C)', '(13 more)']
WARNING: data_mutations_extended.txt: lines [331968, 331970, 331974, (265 more)]: Variant_Type indicates a DNP, but length of Reference_Allele, Tumor_Seq_Allele1 and/or Tumor_Seq_Allele2 do not equal 2.; values encountered: ['(CG, , GG)', '(CC, , AC)', '(CG, , AG)', '(55 more)']
ERROR: data_mutations_extended.txt: lines [332357, 332496, 332549, (19 more)]: No Entrez gene id or gene symbol provided for gene.
WARNING: data_mutations_extended.txt: lines [337450, 337462, 337596, (3420 more)]: Gene symbol not known to the cBioPortal instance. This record will not be loaded.; values encountered: ['PAK7', 'PARK2', 'WHSC1', '(10 more)']
WARNING: data_mutations_extended.txt: lines [414405, 415440, 593496]: All Values in columns Reference_Allele, Tumor_Seq_Allele1 and Tumor_Seq_Allele2 are equal.; values encountered: ['(GAGG, GAGG, GAGG)', '(AGG, AGG, AGG)']
WARNING: data_mutations_extended.txt: lines [714440, 714441, 714442, (277644 more)]: Missing value in SWISSPROT column; this column is recommended to make sure that the UniProt canonical isoform is used when drawing Pfam domains in the mutations view.; value encountered: ''
INFO: data_mutations_extended.txt: lines [714469, 714526, 714532, (73719 more)]: Line will not be loaded due to the variant classification filter. Filtered types: [Silent, Intron, 3'UTR, 3'Flank, 5'UTR, 5'Flank, IGR, RNA]; values encountered: ['5'Flank', '3'UTR', '5'UTR', '(4 more)']
WARNING: data_mutations_extended.txt: lines [714505, 714510, 714737, (1713 more)]: Entrez gene id exists, but gene symbol specified is not known to the cBioPortal instance. The gene symbol will be ignored. Might be wrong mapping, new or deprecated gene symbol.; values encountered: ['HIST1H3E', 'HIST1H3C', 'HIST1H1C', '(44 more)']
WARNING: data_mutations_extended.txt: lines [714571, 714621, 714736, (2660 more)]: Gene symbol not known to the cBioPortal instance. This record will not be loaded.; values encountered: ['FAM46C', 'PAK7', 'PARK2', '(28 more)']
WARNING: data_mutations_extended.txt: lines [714616, 715842, 717575, (333 more)]: Variant_Type indicates a ONP, but length of Reference_Allele, Tumor_Seq_Allele1 and 2 are not bigger than 3 or are of unequal lengths.; values encountered: ['(CTC, CTC, TTT)', '(CAC, CAC, AAA)', '(GAG, GAG, AAA)', '(238 more)']
WARNING: data_mutations_extended.txt: lines [732877, 740361, 743455, (713 more)]: No Amino_Acid_Change or HGVSp_Short value. This mutation record will get a generic "MUTATED" flag
WARNING: data_mutations_extended.txt: lines [778513, 778538, 794417, (448 more)]: Off panel variant. Gene symbol not known to the targeted panel.; values encountered: ['C1orf147', 'HIST2H3D', 'PCDHAC1', '(14 more)']
WARNING: data_mutations_extended.txt: lines [796770, 796773, 796774, (31020 more)]: Variant_Type indicates a SNP, but length of Reference_Allele, Tumor_Seq_Allele1 and/or Tumor_Seq_Allele2 do not equal 1.; values encountered: ['(A, , G)', '(C, , T)', '(T, , A)', '(13 more)']
WARNING: data_mutations_extended.txt: lines [796810, 796813, 796826, (414 more)]: Variant_Type indicates a DNP, but length of Reference_Allele, Tumor_Seq_Allele1 and/or Tumor_Seq_Allele2 do not equal 2.; values encountered: ['(CC, , AC)', '(GA, , AA)', '(CC, , TC)', '(77 more)']
WARNING: data_mutations_extended.txt: lines [846157, 846235, 846527, (169 more)]: All Values in columns Reference_Allele, Tumor_Seq_Allele1 and Tumor_Seq_Allele2 are equal.; values encountered: ['(-, -, -)', '(CT, CT, CT)', '(CG, CG, CG)', '(2 more)']
WARNING: data_mutations_extended.txt: lines [846157, 846527, 846846, (132 more)]: Given value for Variant_Classification column is not one of the expected values. This can result in mapping issues and subsequent missing features in the mutation view UI, such as missing COSMIC information.; values encountered: ['In_Frame_DEL', 'Frame_Shift_DEL']
WARNING: data_mutations_extended.txt: lines [859622, 859623, 859624, (5 more)]: Entrez gene id and gene symbol do not match. The gene symbol will be ignored. Might be wrong mapping or recycled gene symbol.; value encountered: '(KMT2D, 9757)'
ERROR: data_mutations_extended.txt: lines [880084, 880092, 880098, (551 more)]: No Entrez gene id or gene symbol provided for gene.
INFO: data_mutations_extended.txt: Validation of file complete
INFO: data_mutations_extended.txt: Read 1065808 lines. Lines with warning: 972715. Lines with error: 576
WARNING: genie_data_cna_hg19.seg: lines [106307, 153216, 334275, (6 more)]: Segment is zero bases wide and will not be loaded; values encountered: ['153023184-153023184', '65096797-65096797', '36854039-36854039', '(6 more)']
INFO: genie_data_cna_hg19.seg: Validation of file complete
INFO: genie_data_cna_hg19.seg: Read 3748118 lines. Lines with warning: 9. Lines with error: 0
INFO: -: Validation complete
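
For reference, here is a rough Python sketch of the kind of clean-up that might address two of the messages above: the non-numeric placeholders reported for data_clinical_patient.txt ('Unknown', 'Not Applicable', 'Not Collected') and the float-formatted Entrez IDs reported for data_fusions.txt ('238.0', '324.0', ...). The function names are hypothetical, and using 'NA' as the missing-value marker is an assumption on my side, not something the validator output confirms.

```python
import re

# Placeholders taken from the validator message for data_clinical_patient.txt.
# ASSUMPTION: 'NA' is an acceptable missing-value marker for numeric attributes.
PLACEHOLDERS = {"unknown", "not applicable", "not collected"}

# Matches IDs that were written as floats, e.g. '238.0'.
FLOAT_FORMATTED_INT = re.compile(r"^(\d+)\.0+$")


def normalize_numeric_cell(value: str) -> str:
    """Replace a free-text missing-value placeholder with 'NA'; keep other values."""
    return "NA" if value.strip().lower() in PLACEHOLDERS else value


def normalize_entrez_id(value: str) -> str:
    """Turn a float-formatted ID such as '238.0' into '238'; keep other values."""
    match = FLOAT_FORMATTED_INT.match(value.strip())
    return match.group(1) if match else value
```

These helpers only cover the values quoted in the log; the truncated entries ('(1 more)', '(709 more)') would still need to be checked by hand.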
Thanks,
Shakuntala
---------------------------------------------------------------
Assoc. Prof. (Dr.) Shakuntala Baichoo
Department of Digital Technologies, FoICDT, University of Mauritius
Phone: +230 4037762