I’m working with the GDC TCGA LUAD dataset via the UCSC Xena Browser.
I noticed that there are 721 samples with phenotypic information, while the Ensemble Somatic Variant (WXS) file is a 194,731 × 12 table containing 577 unique samples. However, it is not clear whether only these 577 samples were successfully sequenced via WXS, or whether more were sequenced but are missing from the table because no somatic variants were reported for them.
Could you clarify how to distinguish between:
1. Samples that did not have WXS successfully performed, and
2. Samples for which WXS was performed but no somatic variants were detected?
For example, if gene X is altered in 100 patients, I’m unsure whether the appropriate denominator should be 577 (assuming all sequenced samples appear in the WXS file), or a larger number if some sequenced samples have no reported variants.
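To make the concern concrete, here is a minimal sketch of how much the estimate moves with the choice of denominator. It uses only the counts quoted above. The function name `alteration_frequency` is my own, and using 721 as the upper bound assumes every sample with phenotype data was sequenced, which may well not be true.

```python
def alteration_frequency(n_altered: int, n_sequenced: int) -> float:
    """Return the fraction of sequenced samples carrying an alteration."""
    if n_sequenced <= 0:
        raise ValueError("n_sequenced must be positive")
    if not 0 <= n_altered <= n_sequenced:
        raise ValueError("n_altered must be between 0 and n_sequenced")
    return n_altered / n_sequenced


N_ALTERED = 100           # hypothetical count for gene X, from the example above
N_IN_VARIANT_TABLE = 577  # unique samples present in the somatic variant table
N_WITH_PHENOTYPE = 721    # samples with phenotype data (assumed upper bound)

# Case A: every sequenced sample appears in the variant table.
high = alteration_frequency(N_ALTERED, N_IN_VARIANT_TABLE)

# Case B: all phenotyped samples were sequenced, and some had no variants.
low = alteration_frequency(N_ALTERED, N_WITH_PHENOTYPE)

print(f"Denominator {N_IN_VARIANT_TABLE}: {high:.1%}")  # 17.3%
print(f"Denominator {N_WITH_PHENOTYPE}: {low:.1%}")     # 13.9%
```

Depending on which assumption holds, the reported frequency for the same gene would range from roughly 13.9% to 17.3%, which is why I would like to pin down the correct denominator.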
Thank you very much for your help!
Best regards,
Joshua Lau