isoforms ACs in input file

Shahab Mirshahvaladi

May 29, 2022, 7:29:31 PM

to Omics Playground

Hi all,

Does anyone know whether we have to remove the isoform suffixes from the accession numbers, or can we leave them as they are? For instance, I had to convert P17677-2 to P17677 for it to be included in the input list. Does the pipeline distinguish between isoforms at all?

Thanks,

Shahab

Shahab Mirshahvaladi

Jun 1, 2022, 9:11:26 PM

to BigOmics Analytics Team, Omics Playground, Ivo Kwee

Thanks for the clarification. The good news is that we can leave the isoform suffixes on the AC numbers and they are still included in the pipeline.

Cheers,

Shahab

On Tue, May 31, 2022 at 10:57 PM BigOmics Analytics Team <te...@bigomics.ch> wrote:

Dear Shahab,

The platform does not support isoform analysis. That said, you can use isoform IDs, and you should be able to do expression comparisons and view them in the dataview tab. But anything related to GSEA or gene UMAP clustering will only be performed at the gene level, and if you add them as custom isoform names, they will not be combined into genes.
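For anyone who wants to keep track of both levels before upload, here is a minimal sketch of splitting an accession into its base and isoform number. It is not part of the platform; `split_accession` is a hypothetical helper, and it assumes accessions written like `P17677-2` (base accession, a hyphen, then the isoform number).

```python
import re
from typing import Optional, Tuple

# Assumed format: base accession, optionally followed by "-<isoform number>".
_ACCESSION_RE = re.compile(r"^(?P<base>[A-Z0-9]+)(?:-(?P<isoform>\d+))?$")


def split_accession(ac: str) -> Tuple[str, Optional[int]]:
    """Split 'P17677-2' into ('P17677', 2); a bare 'P17677' gives ('P17677', None)."""
    match = _ACCESSION_RE.match(ac.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {ac!r}")
    isoform = match.group("isoform")
    return (match.group("base"), int(isoform) if isoform is not None else None)


if __name__ == "__main__":
    for ac in ["P17677", "P17677-2"]:
        print(ac, "->", split_accession(ac))
```

Keeping the base accession in a separate column makes it easy to switch between an isoform-level and a gene-level input file later.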

I'll let Ivo add any extra comments if needed.

Axel

--

Shahab Mirshahvaladi,

Shahab Mirshahvaladi

Jun 6, 2022, 9:18:21 AM

to Omics Playground

Hi there,

Just wondering what would be the best way to compare two (or more) TMT proteomics data sets from independent runs? Should we normalize the intensities beforehand? Can we use batch correction to reduce bias?

Thanks,

--

Shahab Mirshahvaladi,

Ivo Kwee

Jun 24, 2022, 11:43:32 AM

to Omics Playground

See separate thread.

Staffan Holmqvist

Jan 13, 2023, 8:16:53 AM

to Omics Playground

I am currently working on TMT proteomics analysis.

The software does not (to my knowledge!) handle different isoforms. It does state, on this page of the FAQ, that duplicate isoforms are added together.

My TMT data had the peculiarity of having EXACTLY the same value for many of the isoforms (e.g. P17677-2 and P17677). I talked to the technical staff, and I think the reason is that they match the data points against a library: counts are registered for every sequence/protein ID that matches the identified data point. In my case this meant that the software added all of those together, in effect multiplying the number of counts by the number of isoforms.

Therefore only those isoforms showing unique sets of counts are real.

I therefore split my dataset into two.

a) For each protein, I added together the unique counts of its isoforms, giving the full counts for that protein across all isoforms.

b) I also did an "isoform" analysis, in which I kept the unique isoform counts. Since the gene names will then be e.g. Slc1a3-2, pathway analysis won't work, but DEG etc. should (see the more accurate answer above). I only used this for comparing the abundance of different isoforms between my samples.
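The a)/b) split above could be sketched roughly as follows. This is only one reading of the procedure, not the code I actually ran: `collapse_isoforms` is a hypothetical helper, the input is assumed to be a dict from accession to a list of per-sample counts, and "duplicate" is taken to mean that an isoform's values are exactly identical to those of another isoform of the same protein. The accessions and numbers in the example are made up.

```python
from collections import defaultdict


def collapse_isoforms(counts):
    """Return (protein_totals, unique_isoforms).

    counts: dict mapping accession (e.g. 'P17677-2') to per-sample values.
    Isoforms of one protein with exactly identical values are treated as
    duplicates of a single measurement and only the first one is kept.
    """
    # Group isoforms under their base accession (text before the first '-').
    groups = defaultdict(list)
    for accession, values in counts.items():
        base = accession.split("-", 1)[0]
        groups[base].append((accession, tuple(values)))

    protein_totals = {}
    unique_isoforms = {}
    for base, members in groups.items():
        seen = set()
        kept = []
        for accession, values in members:
            if values in seen:
                continue  # same numbers as an earlier isoform: skip it
            seen.add(values)
            kept.append((accession, values))
        # b) isoform-level table: only isoforms with their own set of counts
        for accession, values in kept:
            unique_isoforms[accession] = list(values)
        # a) protein-level table: sum the unique isoforms sample by sample
        protein_totals[base] = [sum(col) for col in zip(*(v for _, v in kept))]
    return protein_totals, unique_isoforms


if __name__ == "__main__":
    example = {
        "P17677": [10, 20],
        "P17677-2": [10, 20],  # identical to P17677, counted once
        "P43006": [5, 5],
        "P43006-2": [1, 2],
    }
    totals, isoforms = collapse_isoforms(example)
    print(totals)
    print(isoforms)
```

Exact equality is a strict test; if your export rounds values or contains missing values, the duplicate check would need adjusting.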

Other things

1. My standard TMT analysis automatically excluded all proteins that did not have counts in every sample. This was obviously a problem, since it excluded a gene that I had knocked out in half the samples.

Therefore I recommend using the "raw counts" as input to omicsPG. It will normalise within each sample as counts per million (CPM).

2. Manually remove identified cRAP genes (Hs contamination) from your dataset.

3. As always, make sure that proteins whose names Excel/Sheets converts to dates, such as sep-2 or march-5, keep their actual names (Septin2 and marchf5).
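A rough sketch of the three points above, in case it helps. Everything here is illustrative: `clean_rows` and `cpm_per_sample` are hypothetical helpers, the contaminant set and the name-fix table are things you would have to supply yourself, and the CPM function only shows what "counts per million within a sample" means. It is not how omicsPG computes it internally.

```python
def clean_rows(rows, contaminants, name_fixes):
    """Drop contaminant rows and restore names that a spreadsheet turned into dates.

    rows: dict mapping gene/protein name to per-sample raw counts.
    contaminants: set of names to remove (e.g. your own cRAP hits).
    name_fixes: dict mapping a mangled name in lower case to the real name.
    """
    cleaned = {}
    for name, values in rows.items():
        if name in contaminants:
            continue
        fixed = name_fixes.get(name.lower(), name)
        cleaned[fixed] = list(values)
    return cleaned


def cpm_per_sample(rows):
    """Scale every sample (column) so that its values sum to one million."""
    if not rows:
        return {}
    n_samples = len(next(iter(rows.values())))
    totals = [sum(values[j] for values in rows.values()) for j in range(n_samples)]
    if any(total == 0 for total in totals):
        raise ValueError("a sample has zero total counts")
    return {
        name: [values[j] / totals[j] * 1_000_000 for j in range(n_samples)]
        for name, values in rows.items()
    }


if __name__ == "__main__":
    raw = {"sep-2": [1, 2], "KRT1": [100, 100], "Gap43": [3, 2]}  # made-up numbers
    cleaned = clean_rows(
        raw,
        contaminants={"KRT1"},  # assumed contaminant list
        name_fixes={"sep-2": "Septin2", "march-5": "marchf5"},
    )
    print(cpm_per_sample(cleaned))
```

Note that a knocked-out gene with zero counts in some samples stays in the table here, which is the point of starting from raw counts.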