Hi Thomas,
I've been meaning to write an entry in the HUMAnN2 manual on downstream analysis, so your email gives me a chance to draft out some of those ideas!
* As you found, there are _a lot_ of gene families in the output -- too many to test en masse, probably even with aggressive filtering. The full table is provided for the sake of completeness and to help with grouping genes into different pathway systems (e.g. MetaCyc or UniProt's own pathway definitions). You can also use the gene table for strain-level analysis (as described in the manual) or to pull out specific genes of interest for testing. Most folks will want to do their testing at the pathway level, however.
* Even after grouping UniRef gene families into reactions or pathways, the stratified output can still be quite large if your community contains a lot of different species. For this reason, we recommend doing your testing on the community totals, and then digging into the stratified output to understand the mechanisms underlying significant functional changes. For example, given that the abundance of pathway A changed significantly, was it because a particular A-contributing species expanded, or because new A-contributing species appeared (including potential "unclassified" species)? The one drawback of this approach is that it fails to capture situations where community-level function was constant but individual species' contributions changed. However, I would argue that such situations are better described as a change in community composition, which one can assay directly by analyzing the MetaPhlAn profiles from the samples.
* After collapsing to pathways and considering community totals, you may still have a lot of pathways to analyze. In this case, we recommend prioritizing based on a combination of pathway (i) mean/median abundance and (ii) variance. For example, you might first select pathways with median abundance in the top ~50% to exclude rare pathways, and then among those select pathways with variance in the top ~25% to exclude housekeeping functions. Other selection schemes with a similar spirit (removing rare/invariant pathways) would also be fair game. The goal here is mostly to maintain statistical power, so if you have more samples you can be less stringent with your filtering.
In the future we can add a utility script to help with these trimming procedures.
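Until then, here is a rough sketch of the trimming described above, not an official HUMAnN2 utility. It assumes a tab-separated pathway abundance table with one header line, feature names in the first column, stratified rows marked by a "|" in the name, and special UNMAPPED / UNINTEGRATED rows. The function names, the toy pathway IDs, and the 50% / 25% cutoffs are just for illustration.

```python
import csv
import io
import math
import statistics

# Rows that are not real pathways (assumed names of the special rows).
SPECIAL_ROWS = {"UNMAPPED", "UNINTEGRATED"}

def read_table(handle):
    """Parse a tab-separated table into {feature name: [value per sample]}."""
    rows = csv.reader(handle, delimiter="\t")
    next(rows)  # skip the header line of sample names
    return {row[0]: [float(x) for x in row[1:]] for row in rows if row}

def top_fraction(scores, fraction):
    """Names of the highest-scoring `fraction` of entries (at least one)."""
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]

def prioritize_pathways(table, abundance_fraction=0.5, variance_fraction=0.25):
    """Community totals only: drop rare pathways, then keep the most variable."""
    # Community totals are the rows without a "|species" suffix.
    totals = {
        name: values
        for name, values in table.items()
        if "|" not in name and name not in SPECIAL_ROWS
    }
    # Step 1: keep pathways whose median abundance is in the top fraction.
    medians = {name: statistics.median(v) for name, v in totals.items()}
    abundant = top_fraction(medians, abundance_fraction)
    # Step 2: among those, keep the most variable (needs >= 2 samples).
    variances = {name: statistics.variance(totals[name]) for name in abundant}
    return top_fraction(variances, variance_fraction)

# Toy table with made-up pathway IDs and abundances.
example_rows = [
    ("# Pathway", "S1", "S2", "S3", "S4"),
    ("UNMAPPED", 500, 450, 520, 480),
    ("PWY-A", 10, 12, 11, 9),
    ("PWY-B", 100, 20, 60, 140),
    ("PWY-B|g__X.s__x", 90, 10, 50, 120),
    ("PWY-C", 0, 0, 1, 0),
    ("PWY-D", 50, 51, 49, 50),
    ("PWY-E", 0, 2, 0, 1),
    ("PWY-F", 30, 35, 25, 30),
    ("PWY-G", 1, 1, 2, 1),
    ("PWY-H", 3, 0, 4, 2),
]
example_tsv = "\n".join("\t".join(map(str, row)) for row in example_rows)

table = read_table(io.StringIO(example_tsv))
selected = prioritize_pathways(table)
print(selected)  # ['PWY-B']
```

Here PWY-D is abundant but nearly constant across samples (the "housekeeping" case), so it passes the abundance step and is dropped at the variance step. With more samples you could raise either fraction to filter less aggressively.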