Clarifying data format, continuous copy number


Michael Rightmire

Jul 14, 2021, 10:30:51 AM
to cbiop...@googlegroups.com
Hello All,

I'm looking at the cBioPortal documentation at https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats#continuous-copy-number-data

The meta file description discusses using either ...

datatype: CONTINUOUS
stable_id: linear_CNA

...or...

datatype: LOG2-VALUE
stable_id: log2CNA

It then says that you can use the data from the GISTIC-generated file <prefix>_all_data_by_genes.txt. I'm not super familiar with GISTIC output, but I assume that if I use the GISTIC file, I should use the log2 datatype and stable_id?

datatype: LOG2-VALUE
stable_id: log2CNA

Thanks!

Michael Rightmire

Bioinformatics and Omics Data Analytics (B240)

Omics Software Architect


German Cancer Research Center (DKFZ)

Im Neuenheimer Feld 280

69120 Heidelberg

Germany

phone: +49 176 7131 8758

fax: +49 6221 42-3563


M.Rightmire@dkfz-heidelberg.de

www.dkfz.de






Ino de Bruijn

Jul 14, 2021, 12:44:58 PM
to Michael Rightmire, cBioPortal for Cancer Genomics Discussion Group, Ritika Kundra, Y
Hi Michael,

Thanks for reaching out!

I believe you should indeed use:

datatype: LOG2-VALUE
stable_id: log2CNA

CC'ing some more members of our team to confirm.

Best wishes,
Ino
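As a minimal sketch (assuming Python and a hand-written loading script), the choice discussed above can be captured as a lookup from data kind to the datatype/stable_id pair, with the GISTIC gene-level file mapped to LOG2-VALUE / log2CNA. The helper name build_cna_meta, the profile names, the study ID and the file name are hypothetical. The remaining meta fields (cancer_study_identifier, genetic_alteration_type, show_profile_in_analysis_tab, profile_name, profile_description, data_filename) are taken from the cBioPortal file-format page linked earlier and should be checked against the portal version you load into.

```python
# Hypothetical helper: field names other than datatype/stable_id follow the
# cBioPortal file-format docs and should be double-checked for your version.
CNA_PROFILES = {
    # kind: (datatype, stable_id, profile_name); profile names are placeholders
    "linear": ("CONTINUOUS", "linear_CNA", "Linear copy-number values"),
    # e.g. GISTIC <prefix>_all_data_by_genes.txt, as suggested in this thread
    "log2": ("LOG2-VALUE", "log2CNA", "Log2 copy-number values"),
}


def build_cna_meta(study_id: str, kind: str, data_filename: str) -> str:
    """Return the text of a meta file for a continuous copy-number profile."""
    datatype, stable_id, profile_name = CNA_PROFILES[kind]
    fields = [
        ("cancer_study_identifier", study_id),
        ("genetic_alteration_type", "COPY_NUMBER_ALTERATION"),
        ("datatype", datatype),
        ("stable_id", stable_id),
        ("show_profile_in_analysis_tab", "false"),  # assumption: continuous CNA is not shown in the analysis tab
        ("profile_name", profile_name),
        ("profile_description", f"{profile_name} per gene"),
        ("data_filename", data_filename),
    ]
    # One "key: value" line per field, the same layout as the snippets above.
    return "".join(f"{key}: {value}\n" for key, value in fields)


print(build_cna_meta("my_study", "log2", "data_log2_cna.txt"))
```

Keeping the datatype and stable_id together in one table entry avoids accidentally pairing LOG2-VALUE with linear_CNA (or vice versa).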


Michael Rightmire

Jul 14, 2021, 4:55:18 PM
to Ino de Bruijn, cBioPortal for Cancer Genomics Discussion Group, Ritika Kundra, Y
Hi Ino,
--

Michael Rightmire

Jul 23, 2021, 7:15:07 AM
to Ino de Bruijn, cBioPortal for Cancer Genomics Discussion Group, Ritika Kundra, Y, gistic...@broadinstitute.org
Hi Ino,

Upon further examination, it seems the problem isn't the negative values in the all_data_by_genes.txt file; it's that the Entrez_Gene_Id column occasionally contains a negative integer instead of a valid gene ID. It seems to happen only on data lines whose Hugo symbols are miRNAs in the format hsa-mir-xxxx. Any idea why this is happening? Do those negative integers hold any meaning, or can they just be replaced with N/A?

Hugo_Symbol   Entrez_Gene_Id  Cytoband  <ID1>.tumor021-01  <ID2>.tumor011-01  <ID3>.tumor001-01  <ID2>.tumor011-01
hsa-mir-4320  -948            18q21.1   0.180              -0.013             0.122              -0.008
hsa-mir-122   -1013           18q21.31  0.335              -0.013             0.122              -0.015

Thanks!
Mike

Ino de Bruijn

Aug 2, 2021, 4:15:00 PM
to Michael Rightmire, cBioPortal for Cancer Genomics Discussion Group, Ritika Kundra, Y, gistic...@broadinstitute.org
Hi Michael,

Apologies for the delay in replying. The negative integers can be safely ignored. They aren't valid gene IDs; we've used them internally to represent non-gene entities such as miRNA expression.

Best wishes,
Ino
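If you would rather not carry these placeholder IDs into the portal at all, a minimal Python sketch could blank them out before loading. It assumes a tab-delimited file with a header column named Entrez_Gene_Id, as in the excerpt above (raw GISTIC output may name this column differently, hence the id_column parameter), and it assumes that an empty ID lets the importer fall back to Hugo_Symbol. That fallback is an assumption, so run the result through the cBioPortal validator. The function name blank_negative_entrez_ids is hypothetical.

```python
import csv
import io


def blank_negative_entrez_ids(text: str, id_column: str = "Entrez_Gene_Id") -> str:
    """Hypothetical cleanup: empty out negative IDs in a tab-delimited table.

    GISTIC uses negative integers as internal placeholders for non-gene
    entities (e.g. miRNAs such as hsa-mir-4320); they are not Entrez IDs.
    Assumes an empty ID lets the importer fall back to Hugo_Symbol.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    col = header.index(id_column)  # raises ValueError if the column is missing

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        value = row[col].strip()
        # Only touch values that look like negative integers, e.g. "-948".
        if value.startswith("-") and value[1:].isdigit():
            row[col] = ""
        writer.writerow(row)
    return out.getvalue()
```

Values are kept as strings throughout, so copy-number columns such as 0.180 are written back unchanged. Reading and writing the actual file is left out: pass in the file's contents and write the returned string to a new file.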