Problem with Philosopher/Fragpipe converter (MSstatsTMT)


Miguel Cosenza

Jul 21, 2022, 10:55:55 AM
to MSstats
Hello MSstats team,

I am working with MSstatsTMT on different TMT datasets after Fragpipe search.

Right now I am facing an error with the converter function (PhilosophertoMSstatsTMTFormat), apparently related to the data.table package.

The log and the error are below.

I am sharing a link to the files I am trying to process, including the annotation file, along with code to reproduce my attempt to convert them.


I have checked the annotation file against another one that I used successfully with a different dataset, so I believe the formatting is correct, although I would appreciate your feedback on that front too.

PhilosophertoMSstatsTMTFormat(input = list_mst_vir,
                              annotation = annotation_mst,
                              protein_id_col = "ProteinAccessions",
                              peptide_id_col = "PeptideSequence",
                              Purity_cutoff = 0.5,
                              PeptideProphet_prob_cutoff = 0.7,
                              useUniquePeptide = TRUE,
                              rmPSM_withfewMea_withinRun = FALSE,
                              rmPeptide_OxidationM = FALSE,
                              rmProtein_with1Feature = FALSE,
                              summaryforMultipleRows = sum,
                              use_log_file = TRUE,
                              append = FALSE,
                              verbose = TRUE,
                              log_file_path = NULL)

INFO  [2022-07-21 16:12:59] ** Raw data from Philosopher imported successfully.
INFO  [2022-07-21 16:13:02] ** Using provided annotation.
INFO  [2022-07-21 16:13:02] ** Run and Channel labels were standardized to remove symbols such as '.' or '%'.
INFO  [2022-07-21 16:13:02] ** The following options are used:
  - Features will be defined by the columns: PeptideSequence, PrecursorCharge
  - Shared peptides will be removed.
  - Proteins with single feature will not be removed.
  - Features with less than 3 measurements within each run will be kept.
INFO  [2022-07-21 16:13:03] ** Rows with values not greater than 0.5 in Purity are removed
INFO  [2022-07-21 16:13:03] ** Rows with values not greater than 0.7 in PeptideProphetProbability are removed
INFO  [2022-07-21 16:13:05] ** Features with all missing measurements across channels within each run are removed.
INFO  [2022-07-21 16:13:06] ** Shared peptides are removed.
INFO  [2022-07-21 16:13:10] ** Features with all missing measurements across channels within each run are removed.
INFO  [2022-07-21 16:34:11] ** PSMs have been aggregated to peptide ions.
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  :
  Join results in 1431948 rows; more than 1125550 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
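
The data.table message above suggests checking for duplicate key values in one of the joined tables. A minimal base R sketch for ruling that out in an annotation table is below. It assumes (this is not confirmed anywhere in the thread) that each Run and Channel combination should appear only once; the toy values are hypothetical and are not taken from the shared files.

```r
# Toy annotation with hypothetical values; the real file is not shown here.
annotation <- data.frame(
  Run = c("run1", "run1", "run1", "run2"),
  Channel = c("126", "127N", "126", "126"),
  Condition = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Assumption: Run + Channel should identify each annotation row uniquely.
key_cols <- c("Run", "Channel")
keys <- annotation[, key_cols]

# Flag every row whose key occurs more than once (first and later copies).
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
dup_rows <- annotation[is_dup, ]

# Any rows printed here share a Run/Channel key with another row.
print(dup_rows)
```

If `dup_rows` is empty for the real annotation, duplicated annotation keys are probably not the cause, and the duplicates would have to come from the PSM tables or from inside the converter.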


Mateusz Staniak

Jul 22, 2022, 6:42:37 PM
to MSstats
Hi,

Thanks for sharing the data. I will check what's causing the issue as soon as I can.


Kind regards
Mateusz