Thank you, Eric, for the detailed answer. I like using the taxonomic profile as an example, as I have experience only with 16S and I am new to meta’omics. Regarding normalization, my approach with 16S has always been to keep all the unclassified reads. On average, in most of my samples, 30% of the reads were unclassified. The reason I keep them is to be able to compare studies or batches on fair ground. I don’t believe that I should compare my study (with 30% unclassified reads) with someone else’s that had 60% unclassifiable reads.
In this study, I compare the shift in rumen function between two groups of animals (4 that received yeast vs. 4 that received placebo) using metatranscriptomics.
I tried to use my “approach” by calculating relative abundance (i.e. each sample summing to 100%), as this is simple and meaningful for comparing the two groups (ANOVA).
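To make concrete what I mean by relative abundance with the unclassified fraction kept in, here is a minimal Python sketch. It is not the HUMAnN script, just plain Python, and the feature names and RPK values are invented for illustration (the `to_relab` helper is hypothetical as well):

```python
# Hypothetical RPK table: one row per feature, one value per sample.
# All numbers are invented for illustration; they are not real data.
rpk = {
    "UNMAPPED": [300.0, 600.0],
    "K00001":   [500.0, 300.0],
    "K00002":   [200.0, 100.0],
}

def to_relab(table, scale=1.0):
    """Divide each sample column by its column total, keeping every row."""
    n_samples = len(next(iter(table.values())))
    # Column totals include the UNMAPPED row, so nothing is dropped.
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        feature: [scale * row[j] / totals[j] for j in range(n_samples)]
        for feature, row in table.items()
    }

relab = to_relab(rpk)            # each sample column sums to 1
cpm = to_relab(rpk, scale=1e6)   # same proportions, scaled to one million
```

Because the unmapped row stays in the denominator, a sample with 30% unclassified signal and one with 60% remain visibly different after normalization.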
Answers to the following questions will help me decide what to do:
- Is renorm_table necessary only if you regroup, or can the script also be used to convert to CPM or relab?
- The way I understand it, renorm_table does two things: 1) omits the genes that are not grouped to any function and 2) converts the units from RPK to CPM (RPK per million?) or relative abundance (total of 1). Is that right?
- Can I normalize without omitting any genes (for example, by keeping an "ungrouped" group)? I am curious to know how many did not group to a function.
- What is the sequence of events in this case: regroup > rename > renormalize?
- When I collapsed my uniref50 table (1.1 million gene families) to uniref_ko (19k), the screen said only 85% were renamed. I checked the file and there were some “NO NAME” lines. Would this be a large loss of information?
- Even after collapsing to KO, there is still a very large number of gene families. My plan is to use a cut-off, i.e. >XX% in all samples, and then present only those that differ significantly (ANOVA) between the groups of animals. What would be an acceptable cut-off?
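In case it helps to see the cut-off idea from the last question written out, this is roughly the filter I have in mind, as a plain Python sketch. The table values are invented, the `filter_by_min_abundance` helper is hypothetical, and the 0.1% threshold is only a placeholder, not a recommendation:

```python
# Hypothetical relative-abundance table (fractions; each sample sums to 1).
# All numbers are invented for illustration.
relab = {
    "UNGROUPED": [0.15, 0.20, 0.10],
    "K00001":    [0.50, 0.45, 0.60],
    "K00002":    [0.3499, 0.35, 0.2999],
    "K00003":    [0.0001, 0.0, 0.0001],
}

def filter_by_min_abundance(table, min_fraction):
    """Keep only features at or above min_fraction in every sample."""
    return {
        feature: row
        for feature, row in table.items()
        if all(value >= min_fraction for value in row)
    }

# Placeholder threshold of 0.1% (0.001 as a fraction).
kept = filter_by_min_abundance(relab, min_fraction=0.001)

# Share of each sample that did not group to any function.
ungrouped_share = relab["UNGROUPED"]
```

With these made-up numbers, K00003 is dropped because it falls below the threshold in every sample, and the UNGROUPED row shows directly how much of each sample did not map to a function.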
Many, many thanks for all the help.
OA