I’m analyzing 515 LC-MS/MS files (FragPipe → MSstats). Below is the current workflow for dataProcess, diagnostic plots, and protein–sample matrix generation:
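In outline, the workflow has three steps: run dataProcess(), draw the diagnostic plots, and pull out a protein-by-sample quantification. A minimal sketch of that shape, using standard MSstats calls and a placeholder input object (`msstats_input` stands in for the FragPipe-converted data; the full script is longer):

```r
library(MSstats)

# msstats_input: placeholder for the MSstats-format data frame produced
# upstream by FragPipetoMSstatsFormat() — not the actual object name.
processed <- dataProcess(msstats_input)

# Diagnostic plots; address = FALSE displays instead of writing a PDF.
dataProcessPlots(processed, type = "QCPlot", address = FALSE)

# Protein-by-sample quantification matrix.
quant <- quantification(processed, type = "Sample")
```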
Unfortunately, processing is slow and frequently stalls at:
Could you advise on a more efficient way to code this to improve throughput?
Regards,
Ben
Hi Devon,
Thank you for the quick response. Unfortunately, I’m hitting an error with MSstatsBig when using the Arrow backend. Below are the relevant code snippet and error message:
# ===========================
# MSstatsBig
# ===========================
formattedData <- bigFragPipetoMSstatsFormat(
  input_file = raw_file,
  output_file_name = NULL,        # ← no CSV writing
  backend = "arrow",
  max_feature_count = TOP_N,
  filter_unique_peptides = FALSE,
  aggregate_psms = FALSE,
  filter_few_obs = FALSE
) %>%
  dplyr::collect()                # materialize to a data.frame for MSstats
+++++++++++++++++++++++++++++++++++++++++++++++++++++
Error in `arrow::write_csv_arrow()`:
! x must be an object of class 'data.frame', 'RecordBatch', 'Dataset', 'Table', or 'RecordBatchReader', not 'arrow_dplyr_query'.
Run `rlang::last_trace()` to see where the error occurred.
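For reference, I can reproduce the same class error with plain arrow/dplyr, and materializing the lazy query first avoids it, so the write seems to happen inside bigFragPipetoMSstatsFormat() before my dplyr::collect() ever runs. A minimal standalone sketch with toy data (not my actual files):

```r
library(arrow)
library(dplyr)

# Toy table standing in for the FragPipe output (hypothetical data).
tbl <- arrow_table(protein = c("A", "B", "C"), intensity = c(10, 20, 30))

# Any dplyr verb on an Arrow Table yields a lazy arrow_dplyr_query.
q <- tbl %>% filter(intensity > 15)

# write_csv_arrow(q, tempfile())  # would raise the same "not 'arrow_dplyr_query'" error
write_csv_arrow(compute(q), tempfile(fileext = ".csv"))  # works once computed to a Table
```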
Could you please advise on how to resolve this?
Regards,
Ben