Hello,
I have noticed that the TCGA PanCancerAtlas datasets differ dramatically from the TCGA GDC datasets in their copy number variation calls (HOMDEL, HETLOSS, GAIN, AMP in the Oncoqueries). This is surprising given that exactly the same samples are analyzed, yet many of
the samples previously reported to harbour a shallow deletion of a given gene in the PanCancerAtlas now appear to have normal copy number, a gain, or an amplification of the same gene in the GDC version of the dataset. I have found this trend to hold
for all the genes tested so far, in all the cancer types I have compared.
Could you please explain why the pipeline used to reanalyze the data in the GDC version so often produces a different result from the PanCancerAtlas pipeline? Is one pipeline more accurate than the other? Are there known biases in the process that may skew
the results one way or the other?
Any insight into these discrepancies would be greatly appreciated.
Thank you very much,
- Lucile Jeusset