[email Thorben S. 19.05.2021]
We observed that there are some duplicates in one of the count files we uploaded for analysis. We noticed that the playground automatically reduces the number of counts and excludes the duplicates from the analysis. However, we were wondering which of the duplicates is kept and which ones are excluded. Or is some kind of merging process occurring?
The duplicates originate from a protein grouping algorithm in which a protein ID (or a gene ID, for gene groups) may be present in multiple groups, depending on the identified peptides. However, since we have to reduce the protein/gene groups to single identifiers, we typically just keep the first identifier of each group and delete the rest. Is that approach okay? I don't know how to handle protein groups for differential analysis or gene set enrichment analysis, where single identifiers are needed. Could you give me some guidance on how to handle this kind of data?
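For illustration, here is a minimal Python sketch of the reduction described above and of how it produces duplicates. The `;` separator, the function name `first_identifier` and the example rows are assumptions made up for this example. They are not our actual pipeline, and the sketch says nothing about how the playground itself resolves duplicates.

```python
from collections import Counter


def first_identifier(group, sep=";"):
    """Return the first identifier of a protein/gene group string.

    The ';' separator is an assumption; adjust it to the real file format.
    """
    return group.split(sep)[0].strip()


# Hypothetical example rows: (protein group, counts per sample)
rows = [
    ("P12345;Q67890", [10, 12]),
    ("Q67890", [3, 4]),
    ("P12345;P99999", [7, 1]),
]

# Reduce every group to its first identifier, keeping the counts untouched
reduced = [(first_identifier(group), counts) for group, counts in rows]

# Identifiers that now occur more than once are the duplicates in question
occurrences = Counter(identifier for identifier, _ in reduced)
duplicates = sorted(i for i, n in occurrences.items() if n > 1)
print(duplicates)
```

In this made-up example two different groups collapse to the same first identifier, so the reduced table contains that identifier twice with different counts.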