Running PICRUSt2 for a large dataset


Chieh-Chang Chen

Nov 11, 2025, 7:03:02 PM
to picrust-users
I'm currently running a large dataset through PICRUSt2, but the process has been extremely slow: it has been running for over two weeks with minimal progress.
The input dataset was filtered to exclude ASVs present in fewer than 0.5% of samples and with fewer than 100 total reads. After filtering, the dataset includes:
- Samples: 4,630
- Features (ASVs): 14,246
- Total read count: 461,615,396
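For context, the prevalence/abundance filter described above can be sketched in pandas. This is a hypothetical illustration, not the exact command used on the dataset; the table, sample names, and thresholds are placeholders, and a real feature table would typically be loaded from a BIOM file.

```python
import pandas as pd

# Toy feature table (rows = ASVs, columns = samples); names are illustrative.
table = pd.DataFrame(
    {
        "S1": [50, 0, 30],
        "S2": [60, 0, 30],
        "S3": [0, 0, 30],
        "S4": [0, 2, 30],
    },
    index=["ASV1", "ASV2", "ASV3"],
)

min_prevalence = 0.005  # present in at least 0.5% of samples
min_total_reads = 100   # at least 100 reads summed across all samples

prevalence = (table > 0).mean(axis=1)  # fraction of samples containing each ASV
total_reads = table.sum(axis=1)
filtered = table[(prevalence >= min_prevalence) & (total_reads >= min_total_reads)]
print(list(filtered.index))  # ASV2 is dropped for having only 2 total reads
```

Filtering like this shrinks the feature table before PICRUSt2 sees it, which is usually the single biggest lever on its runtime and memory use.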
Does anyone have suggestions for optimizing performance or troubleshooting this issue?

The PICRUSt2 version is 2.6.2, and the memory usage is currently around 300 GB.
There is a warning:
metagenome_pipeline.py:317: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  func_abun_subset['taxon'] = func_abun_subset.index.to_list()
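For what it's worth, that PerformanceWarning is advisory rather than fatal: it flags a pandas anti-pattern in which columns are inserted one at a time, fragmenting the DataFrame's internal blocks. Below is a generic sketch of the pattern and the fix the warning itself suggests (`pd.concat(axis=1)`); it is not PICRUSt2's actual code, and the column names are placeholders.

```python
import numpy as np
import pandas as pd

n_taxa, n_samples = 1000, 50
columns = {f"sample_{i}": np.full(n_taxa, float(i)) for i in range(n_samples)}

# Fragmented pattern (what triggers the PerformanceWarning):
#   frame = pd.DataFrame(index=range(n_taxa))
#   for name, values in columns.items():
#       frame[name] = values   # each assignment inserts a new block
#
# De-fragmented alternative: assemble all columns in a single concat call.
frame = pd.concat(
    {name: pd.Series(values) for name, values in columns.items()}, axis=1
)
```

The concat version builds the frame's memory layout once instead of reallocating on every insertion, which is why pandas recommends it for wide tables like per-sample function abundances.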

Thanks!

Robyn Wright

Nov 26, 2025, 8:42:43 AM
to picrust-users
Hi there,

Sorry for the delay.

That does seem to be quite a lot of ASVs, as well as quite a lot of samples, although it is not too dissimilar from a dataset that I've run myself recently (~7,000 ASVs and 4,800 samples). I didn't check the memory usage for that run, but I used 24 threads and it apparently took 7,081 seconds (~2 hours) on our server, which has 1.5 TB of RAM. I would not expect the total read count to make much difference, as it is only used when generating the metagenome abundances, but mine had ~110M total reads.

How many threads were you using for this? Did it end up finishing? 

Best wishes,
Robyn