Merging PSM reports

Ruben Bakker

Nov 19, 2021, 5:52:03 AM
to PeptideShaker
Hi PeptideShaker group, 

I have a large dataset of 36 samples. It seems that with a smaller subset I can export a PSM report just fine. Downstream I would like to use moFF. I was wondering how my analysis would be impacted if I split the dataset into two sets, exported two PSM reports and simply merged them.

SearchGUI has been run on all of the data to identify the peptides, and moFF would use match-between-runs for peptide quantification. Would I introduce any bias into my analysis if I were to merge PSM reports instead of doing a single export?
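Mechanically, the merge itself is simple as long as both exports use the same report type and therefore the same columns. Below is a minimal Python sketch, assuming the PSM reports are tab-separated text files with a single header row. The function `merge_psm_reports` and the column names in the example are hypothetical, not part of PeptideShaker or moFF.

```python
import io
from typing import Iterable, Optional, TextIO


def merge_psm_reports(reports: Iterable[TextIO], out: TextIO) -> int:
    """Concatenate text PSM reports that share one header row (assumed format).

    The header is written once, followed by every non-empty data line,
    copied verbatim. Returns the number of data lines written and raises
    ValueError if the reports have different headers.
    """
    header: Optional[str] = None
    n_rows = 0
    for report in reports:
        lines = iter(report)
        first = next(lines, None)
        if first is None:  # empty file, nothing to add
            continue
        this_header = first.rstrip("\r\n")
        if header is None:
            header = this_header
            out.write(header + "\n")
        elif this_header != header:
            raise ValueError("PSM reports have different columns; cannot merge")
        for line in lines:
            line = line.rstrip("\r\n")
            if line.strip():  # skip blank lines
                out.write(line + "\n")
                n_rows += 1
    return n_rows


# Usage with in-memory stand-ins for two exported reports (hypothetical columns).
part1 = io.StringIO("Spectrum Title\tSequence\nrun1_scan100\tPEPTIDEK\n")
part2 = io.StringIO("Spectrum Title\tSequence\nrun2_scan050\tLESSONR\n")
merged = io.StringIO()
print(merge_psm_reports([part1, part2], merged))  # 2
```

For real exports, pass file objects opened with `open(path)`. Because the two exports cover different samples, each spectrum should appear in only one report, so plain concatenation should not create duplicate rows.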

I also ran into problems on usegalaxy.eu, so I now want to do the analysis on a local computer. The problem occurs after loading the data, both from the terminal and from the GUI, on both Ubuntu 20.04 LTS and Windows 10. On Ubuntu in particular, I can export the data as PSM reports just fine when working with half of the samples.

Kind regards, 
Ruben 

Harald Barsnes

Nov 26, 2021, 6:25:04 AM
to PeptideShaker
Hi Ruben,

Splitting up the processing of the samples will (most likely) have a minor impact on the results, in particular on the scoring of the PSMs and the FDR calculations. How big this effect will be depends on the properties of the data, mainly on how different the samples are. The impact may also differ between search engines. However, the impact should be lowest at the PSM level (compared to the peptide and protein levels). Hence, if you later merge the results at this level, I think you should be ok.
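To illustrate why FDR-based validation can shift when subsets are processed separately, here is a deliberately simplified, generic target-decoy sketch in Python. It is not PeptideShaker's actual validation procedure. The function `score_threshold`, the toy scores and the loose 50% cut-off are hypothetical, chosen only so that a tiny example produces visible differences. Each subset gets its own score cut-off, so a PSM can pass in one configuration and fail in the other.

```python
from itertools import groupby
from typing import List, Tuple

# Each PSM is a (score, is_decoy) pair; higher scores are better.
Psm = Tuple[float, bool]


def score_threshold(psms: List[Psm], max_fdr: float = 0.01) -> float:
    """Lowest score cut-off whose estimated FDR (decoys / targets) is <= max_fdr.

    Simplified global target-decoy estimate for illustration only.
    Returns infinity if no cut-off satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    threshold = float("inf")
    targets = decoys = 0
    # Evaluate the FDR only after a whole group of tied scores has been counted,
    # because a cut-off at `score` accepts every PSM with that score.
    for score, group in groupby(ranked, key=lambda p: p[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical toy data for two subsets of samples.
subset_a = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
subset_b = [(9.5, False), (8.5, True), (7.5, True), (5.0, False)]

print(score_threshold(subset_a, max_fdr=0.5))             # 7.0
print(score_threshold(subset_b, max_fdr=0.5))             # 9.5
print(score_threshold(subset_a + subset_b, max_fdr=0.5))  # 8.5
```

With these toy numbers the split runs accept four target PSMs (three from subset A, one from subset B), while the pooled run accepts three: the score-7.0 target PSM in subset A passes only when it is processed separately.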

Note that the quantification in moFF is done at the individual PSM level, so there you should not see any direct impact from the splitting/merging. But of course there may be a difference in which PSMs score high enough to be quantified in the first place.

I'm not sure how much of this has been tested in detail, though, so I cannot guarantee that there won't be any changes. I would recommend that you try it on a smaller subset of your data (one that you can process in one go) and compare the differences. Based on this, you can then decide whether or not to continue with this approach. I'd be very interested in learning what you find out, so please share your findings here.
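The subset comparison recommended above can start with the overlap between the PSMs validated in a single-run export and in the split-and-merged exports. A minimal Python sketch, assuming each PSM can be identified by a spectrum identifier string taken from the reports; `compare_validated` and the identifiers are hypothetical.

```python
from typing import Dict, Set


def compare_validated(single_run: Set[str], split_merged: Set[str]) -> Dict[str, float]:
    """Summarise the overlap between two sets of validated PSM identifiers."""
    shared = single_run & split_merged
    union = single_run | split_merged
    return {
        "shared": len(shared),
        "only_single_run": len(single_run - split_merged),
        "only_split_merged": len(split_merged - single_run),
        # Jaccard index: 1.0 means both runs validated exactly the same PSMs.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical spectrum identifiers validated in each run.
single = {"run1_scan100", "run1_scan200", "run2_scan050", "run2_scan300"}
merged = {"run1_scan100", "run1_scan200", "run2_scan050", "run2_scan310"}
print(compare_validated(single, merged))
# {'shared': 3, 'only_single_run': 1, 'only_split_merged': 1, 'jaccard': 0.6}
```

The PSMs counted under `only_single_run` and `only_split_merged` are the ones whose validation changed between the two approaches, and are the natural place to look first.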

Regarding the problems with processing the data in the first place, what kind of issues are you seeing? And I assume that you have already tried increasing the amount of memory given to PeptideShaker?

Best regards,
Harald