Hi there,
I have a proteomics dataset ready for statistical analysis, and I'm currently exploring different software options to see which works best for it. During this process, I came across gene set enrichment analysis (GSEA) software. My species of interest is Atlantic salmon, which is considered a non-model organism compared to humans and mice.
I'm aware that GSEA offers a wealth of gene sets for humans and mice, but unfortunately, I couldn't find any specific to salmon on the different platforms I searched.
I have a few questions:
1) Since my data is from proteomics and uses protein accession numbers, I believe the software may not recognize them directly. So I attempted to map my 6000 protein IDs to Ensembl IDs using BioMart. However, I could only map 2500 of them, which is less than half. I'm uncertain whether this approach is correct, especially considering that the unmapped proteins may carry meaningful information.
2) My main challenge lies in preparing gene sets for my data. As far as I understand, creating gene sets involves gathering enough information about genes contributing to specific biological pathways in the organism of interest. For example, if I'm comparing the oxidative phosphorylation pathway between two sample time points, I would need to compile a gene set related to this process based on previous literature evidence. Could you confirm if my understanding is accurate?
I even tried to use gene sets from a related, better-annotated fish species such as zebrafish. Unfortunately, I couldn't retrieve zebrafish gene sets from any platform I searched. Do you have any recommendations for platforms where I could look for them?
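To make question 2 concrete, here is roughly what I picture when I say "compile a gene set". It is only a minimal sketch under my own assumptions: the identifiers and pathway names are made-up placeholders (not real salmon accessions), and the function names are hypothetical. The output follows the tab-separated GMT layout (set name, description, then members) that GSEA reads.

```python
from collections import defaultdict


def build_gene_sets(annotations):
    """Group identifiers by pathway.

    `annotations` is an iterable of (identifier, pathway) pairs, e.g. taken
    from literature curation or an annotation table (hypothetical input).
    """
    gene_sets = defaultdict(set)
    for identifier, pathway in annotations:
        gene_sets[pathway].add(identifier)
    return gene_sets


def to_gmt_lines(gene_sets, description="na"):
    """Format gene sets as GMT lines: name, description, then members."""
    lines = []
    for name in sorted(gene_sets):
        members = sorted(gene_sets[name])
        lines.append("\t".join([name, description, *members]))
    return lines


# Placeholder annotations, invented purely for illustration.
annotations = [
    ("PROT_A", "oxidative_phosphorylation"),
    ("PROT_B", "oxidative_phosphorylation"),
    ("PROT_C", "glycolysis"),
]

gmt_lines = to_gmt_lines(build_gene_sets(annotations))
print("\n".join(gmt_lines))
```

If this is the right idea, my remaining problem is where the (identifier, pathway) pairs should come from for salmon.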
Given the limited number of Ensembl IDs in my expression dataset and the difficulty in preparing gene sets for my research model, what would you suggest as the best approach for my situation?
I'm looking forward to your response.
Thanks in advance