cBioPortal Import Error: Duplicate SV/Fusion Entries

Setlem, Rohit, M.S.

Jan 24, 2026, 3:52:57 PM
to cbiop...@googlegroups.com, Sivasankaran, Gopi
Hi Team,

We are attempting to upload RNA fusion SV data to cBioPortal and are encountering the following error:

ERROR: data_fusions_extended.txt: lines [130, 131, 132, (12417 more)]:
Duplicate entry in structural variant data; values encountered:
[ ... Results (already defined on line 117) … ]

Below are some example entries from the file that trigger the error:


Column                       Entry 1        Entry 2
Sample_Id                    RNA-Control-2  RNA-Control-2
SV_Status                    SOMATIC        SOMATIC
Site1_Hugo_Symbol            VCL            VCL
Site1_Ensembl_Transcript_Id  NM_014000      NM_014000
Site1_Entrez_Gene_Id         VCL            VCL
Site1_Region_Number          16             16
Site1_Region                 exon           exon
Site1_Chromosome             chr10          chr10
Site1_Contig                 NA             NA
Site1_Position               75865111       75865111
Site2_Hugo_Symbol            NTRK2          NTRK2
Site2_Ensembl_Transcript_Id  Refseq_3       Refseq_3
Site2_Entrez_Gene_Id         NTRK2          NTRK2
Site2_Region_Number          12             13
Site2_Region                 exon           exon
Site2_Chromosome             chr9           chr9
Site2_Contig                 NA             NA
Site2_Position               87356807       87359888
Site2_Effect_On_Frame        In-Frame       In-Frame
NCBI_Build                   GRCh37         GRCh37
Class                        FUSION         FUSION

In this case, the fusion is between the same gene pair in the same sample (VCL-NTRK2), but the 3’ breakpoint differs (chr9:87356807 vs chr9:87359888) and the region number differs (12 vs 13).

We also see similar failures in other cases where the gene pair is the same but the transcript identifiers differ, and the importer still reports the entries as duplicates.

Could you please clarify which fields are used to define duplicates for SV/fusion entries during import?
Additionally, what is the recommended way to represent the same gene pair within a sample when the breakpoints or transcripts differ?
We would appreciate your guidance on handling these situations so that these biologically distinct events upload successfully.

Thanks,

Rohit Setlem | Senior Bioinformatician | Quantitative Health Sciences | Email: setlem...@mayo.edu

Mayo Clinic | 200 First Street SW | Rochester, MN 55905 | www.mayoclinic.org

 

Benjamin Gross

Jan 24, 2026, 5:25:13 PM
to Setlem, Rohit, M.S., cbiop...@googlegroups.com, Sivasankaran, Gopi
Hi Rohit,

Thank you for the email. The error is generated by validateData.py, which the meta importer calls before the actual import. The validator takes the columns below into account when determining SV record uniqueness (see https://github.com/cBioPortal/cbioportal-core/blob/main/scripts/importer/validateData.py#L3151). If the column headers in your file are not named exactly as shown below, the validator may ignore them when checking for uniqueness, so I would check that first.

    UNIQUENESS_COLUMNS = [
        'Sample_Id',
        'Site1_Hugo_Symbol',
        'Site1_Entrez_Gene_Id',
        'Site1_Chromosome',
        'Site1_Position',
        'Site1_Region_Number',
        'Site1_Ensembl_Transcript_Id',
        'Site2_Hugo_Symbol',
        'Site2_Entrez_Gene_Id',
        'Site2_Chromosome',
        'Site2_Position',
        'Site2_Region_Number',
        'Site2_Ensembl_Transcript_Id',
        'Event_Info']
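
If it helps, here is a minimal sketch for checking the header names locally before running the importer. It is not the validator's code: it only mimics the behavior described above, where rows are compared on whichever of the listed columns are present under exactly those names. The function name find_duplicate_rows is made up for illustration, and the sketch assumes a tab-delimited file whose first non-comment line is the header.

```python
from collections import defaultdict

# Column list as quoted in this thread from validateData.py.
UNIQUENESS_COLUMNS = [
    'Sample_Id',
    'Site1_Hugo_Symbol', 'Site1_Entrez_Gene_Id', 'Site1_Chromosome',
    'Site1_Position', 'Site1_Region_Number', 'Site1_Ensembl_Transcript_Id',
    'Site2_Hugo_Symbol', 'Site2_Entrez_Gene_Id', 'Site2_Chromosome',
    'Site2_Position', 'Site2_Region_Number', 'Site2_Ensembl_Transcript_Id',
    'Event_Info',
]


def find_duplicate_rows(lines):
    """Hypothetical helper, not part of cBioPortal.

    Returns (missing, duplicates): the uniqueness columns absent from the
    header, and a dict mapping each repeated key to the 1-based line
    numbers on which it occurs.
    """
    # Keep original line numbers; skip blank lines and '#' comment lines.
    rows = [(n, line.rstrip('\r\n').split('\t'))
            for n, line in enumerate(lines, start=1)
            if line.strip() and not line.startswith('#')]
    header = rows[0][1]
    present = [c for c in UNIQUENESS_COLUMNS if c in header]
    missing = [c for c in UNIQUENESS_COLUMNS if c not in header]
    index = {c: header.index(c) for c in present}

    seen = defaultdict(list)
    for n, fields in rows[1:]:
        # Assumption: only exactly-named columns take part in the key.
        key = tuple(fields[index[c]] if index[c] < len(fields) else ''
                    for c in present)
        seen[key].append(n)
    duplicates = {k: v for k, v in seen.items() if len(v) > 1}
    return missing, duplicates


# Small in-memory demo: two misnamed headers hide the differing values.
demo = [
    'Sample_Id\tSite1_Hugo_Symbol\tSite2_Hugo_Symbol\tSite2_Region_No\tSite2_Pos\n',
    'RNA-Control-2\tVCL\tNTRK2\t12\t87356807\n',
    'RNA-Control-2\tVCL\tNTRK2\t13\t87359888\n',
]
missing, duplicates = find_duplicate_rows(demo)
print('Missing uniqueness columns:', missing)
print('Duplicate keys and their lines:', duplicates)
```

To run it on the real file, pass an open file handle, for example find_duplicate_rows(open('data_fusions_extended.txt')). Any column reported as missing is a header worth comparing character by character against the list above.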

Let me know how it goes.
-Benjamin

--
You received this message because you are subscribed to the Google Groups "cBioPortal for Cancer Genomics Discussion Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cbioportal+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/cbioportal/SJ2PR01MB848445781B2915BCDAD511D2FA94A%40SJ2PR01MB8484.prod.exchangelabs.com.