Running PICRUSt2 on a large dataset

Chieh-Chang Chen

Nov 11, 2025, 7:03:02 PM
to picrust-users
I'm currently running a large dataset through PICRUSt2, but the process has been extremely slow—it's been over two weeks with minimal progress.
The input dataset was filtered to exclude ASVs present in fewer than 0.5% of samples and with fewer than 100 total reads (a rough sketch of that filtering step follows the summary below). After filtering, the dataset includes:
- Samples: 4,630
- Features (ASVs): 14,246
- Total read count: 461,615,396
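
For reference, here is a minimal sketch of how that prevalence/total-read filter might look in pandas, assuming the feature table has been exported as a TSV with ASVs as rows and samples as columns; the file names, and the exact boolean combination of the two thresholds, are placeholders rather than the exact commands I ran:

import pandas as pd

# Hypothetical input: ASV-by-sample count table exported as TSV
# (rows = ASVs, columns = samples). File name is a placeholder.
table = pd.read_csv("feature_table.tsv", sep="\t", index_col=0)

n_samples = table.shape[1]

# Prevalence: fraction of samples in which each ASV has a nonzero count.
prevalence = (table > 0).sum(axis=1) / n_samples

# Total reads per ASV summed across all samples.
total_reads = table.sum(axis=1)

# Exclude ASVs that are present in fewer than 0.5% of samples AND have
# fewer than 100 total reads; adjust the combination if the intended
# criterion was either/or rather than both.
keep = (prevalence >= 0.005) | (total_reads >= 100)
filtered = table.loc[keep]

filtered.to_csv("feature_table_filtered.tsv", sep="\t")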
Does anyone have suggestions for optimizing performance or troubleshooting this issue?

I am running PICRUSt2 version 2.6.2, and memory usage is currently around 300 GB.
The run also produces warnings like this one:
metagenome_pipeline.py:317: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  func_abun_subset['taxon'] = func_abun_subset.index.to_list()
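
For context, that PerformanceWarning refers to the general pandas pattern of adding columns to a DataFrame one at a time, which fragments its internal blocks and slows later operations. The snippet below is a generic, self-contained illustration of that pattern and of the pd.concat(axis=1) alternative the warning suggests; it is not the actual metagenome_pipeline.py code, and I don't know how much this contributes to the overall runtime here:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = pd.DataFrame(rng.integers(0, 10, size=(1000, 5)),
                    columns=[f"s{i}" for i in range(5)])

# Pattern pandas warns about: inserting many columns one at a time
# leaves the frame highly fragmented.
fragmented = base.copy()
for i in range(200):
    fragmented[f"extra_{i}"] = rng.random(len(base))

# Alternative suggested by the warning: build the new columns separately
# and join them in a single pd.concat call along axis=1.
extra = pd.DataFrame({f"extra_{i}": rng.random(len(base)) for i in range(200)})
combined = pd.concat([base, extra], axis=1)

# As the warning also notes, copying a fragmented frame consolidates it.
defragmented = fragmented.copy()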

Thanks!