I'm currently running a large dataset through PICRUSt2, but the process has been extremely slow — it has been over two weeks with minimal progress.
The input dataset was filtered to exclude ASVs present in fewer than 0.5% of samples and with fewer than 100 total reads. After filtering, the dataset includes:
- Samples: 4,630
- Features (ASVs): 14,246
- Total read count: 461,615,396
Does anyone have suggestions for optimizing performance or troubleshooting this issue?
The PICRUSt2 version is 2.6.2, and memory usage is currently around 300 GB.
I'm also seeing warnings like this:

```
metagenome_pipeline.py:317: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  func_abun_subset['taxon'] = func_abun_subset.index.to_list()
```
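For context, this pandas warning is about a general anti-pattern, not PICRUSt2 specifically: adding many columns one at a time fragments the DataFrame's internal blocks, while building the columns first and joining them with a single `pd.concat(axis=1)` avoids that. A minimal sketch with made-up data (the names `frag` and `dense` are just illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = pd.DataFrame(rng.random((1000, 1)), columns=["c0"])

# Fragmenting pattern: each single-column assignment can add a new
# internal block, which is what triggers the PerformanceWarning.
frag = base.copy()
for i in range(1, 200):
    frag[f"c{i}"] = rng.random(1000)

# Defragmented alternative: build all new columns up front,
# then join them to the base frame in one concat call.
new_cols = pd.DataFrame({f"c{i}": rng.random(1000) for i in range(1, 200)})
dense = pd.concat([base, new_cols], axis=1)

assert frag.shape == dense.shape == (1000, 200)
```

The warning itself shouldn't cause a multi-week stall, but the fragmented-insert pattern can add noticeable overhead on a table this wide.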
Thanks!