Problem with Philosopher/Fragpipe converter (MSstatsTMT)

Miguel Cosenza

Jul 21, 2022, 10:55:55 AM

to MSstats

Hello MSstats team,

I am working with MSstatsTMT on different TMT datasets after Fragpipe search.

Right now I am facing an error with the converter function (PhilosophertoMSstatsTMTFormat), apparently related to the data.table package.

The log and the error are below.

I am sharing a link to the files I am trying to process, including the annotation file and the code to reproduce my conversion attempt.

https://1drv.ms/u/s!ArJgSkDejwROqe9vcZSRBpvrwbCsCg?e=zOy6ah

I have checked the annotation file against another one that I used successfully with a different dataset, so I believe the formatting is correct, although I would appreciate your feedback on that front too.

PhilosophertoMSstatsTMTFormat(input = list_mst_vir,
                              annotation = annotation_mst,
                              protein_id_col = "ProteinAccessions",
                              peptide_id_col = "PeptideSequence",
                              Purity_cutoff = 0.5,
                              PeptideProphet_prob_cutoff = 0.7,
                              useUniquePeptide = TRUE,
                              rmPSM_withfewMea_withinRun = FALSE,
                              rmPeptide_OxidationM = FALSE,
                              rmProtein_with1Feature = FALSE,
                              summaryforMultipleRows = sum,
                              use_log_file = TRUE,
                              append = FALSE,
                              verbose = TRUE,
                              log_file_path = NULL)

INFO [2022-07-21 16:12:59] ** Raw data from Philosopher imported successfully.
INFO [2022-07-21 16:13:02] ** Using provided annotation.
INFO [2022-07-21 16:13:02] ** Run and Channel labels were standardized to remove symbols such as '.' or '%'.
INFO [2022-07-21 16:13:02] ** The following options are used:
- Features will be defined by the columns: PeptideSequence, PrecursorCharge
- Shared peptides will be removed.
- Proteins with single feature will not be removed.
- Features with less than 3 measurements within each run will be kept.
INFO [2022-07-21 16:13:03] ** Rows with values not greater than 0.5 in Purity are removed
INFO [2022-07-21 16:13:03] ** Rows with values not greater than 0.7 in PeptideProphetProbability are removed
INFO [2022-07-21 16:13:05] ** Features with all missing measurements across channels within each run are removed.
INFO [2022-07-21 16:13:06] ** Shared peptides are removed.
INFO [2022-07-21 16:13:10] ** Features with all missing measurements across channels within each run are removed.
INFO [2022-07-21 16:34:11] ** PSMs have been aggregated to peptide ions.
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in 1431948 rows; more than 1125550 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
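
The error text points to duplicate key values in one of the joined tables. Below is a minimal sketch of how that could be checked on the annotation side. It assumes the annotation is joined on `Run` and `Channel`, which is an assumption about the converter's internals and not confirmed in this thread. `annotation_toy` is a hypothetical stand-in for `annotation_mst`.

```r
library(data.table)

# Hypothetical toy annotation; replace with as.data.table(annotation_mst).
annotation_toy <- data.table(
  Run       = c("run1", "run1", "run1", "run2"),
  Channel   = c("126",  "127N", "126",  "126"),
  Condition = c("A",    "B",    "A",    "A")
)

# Count rows per (Run, Channel) pair and keep pairs that occur more than once.
# Assumption: Run + Channel is the join key; adjust if the key differs.
dup_keys <- annotation_toy[, .N, by = .(Run, Channel)][N > 1]

# Any rows printed here are duplicated keys that could inflate a join.
print(dup_keys)
```

If this returns no rows for the real annotation, the duplicates may instead come from the PSM tables or from an intermediate step inside the converter.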

Mateusz Staniak

Jul 22, 2022, 6:42:37 PM

to MSstats

Hi,

Thanks for sharing the data. I will check what's causing the issue as soon as I can.

Kind regards

Mateusz
