Dear Ritika,
Thank you very much for your quick reply.
In the cBioPortal study "Head and Neck Squamous Cell Carcinoma (TCGA, PanCancer Atlas)", there is a column named "Subtype" in the Clinical Data table, which provides information about the patients’ HPV status.
Below the table heading, there is a note stating “the original data is here”, which links to the GDC page titled “TCGA-PanCanAtlas Publications”.
From that landing page, I downloaded the “Clinical with Follow-up” table, which contains data from multiple TCGA projects. I opened this table in R, removed all samples that were not HNSC, and examined all remaining columns that might provide HPV-related information for TCGA-HNSC patients. These columns were:
hpv_test, hpv_status_by_ish_testing, hpv_status_by_p16_testing, human_papillomavirus_other_type_text, human_papillomavirus_laboratory_procedure_performed_text, and human_papillomavirus_type.
After analyzing these columns, I was able to identify 81 HPV-negative and 44 HPV-positive samples.
However, the cBioPortal table lists HPV information for 487 samples in total.
My question, therefore, is how the HPV status data in cBioPortal was obtained, since I could not find a description of the method used on the website.
I also searched for this information in the clinical data table that can be downloaded from the GDC Data Portal, but unfortunately that table did not contain anything on this either.
I would greatly appreciate your help with this.
Kind regards,
Arian
> Original message:
> From: Ritika Kundra <ritika...@gmail.com>
> To: Arian Michael Daschner <A.Das...@campus.lmu.de>
> Cc: cbiop...@googlegroups.com
> Date: Wed Oct 22 16:25:45 CEST 2025