Hi Robert,
I’m also working with the human gut microbiota, and I concur with Meren: co-assembly across subjects is problematic. There’s simply too much interpersonal variability, at all levels ranging from the proportions of major phyla to strain-level
variation within the most prevalent gut species. Feeding more data into an assembly is only helpful if the additional reads are covering the same genomes, and it’s far from clear that unrelated human adults share many of the same genomes. (‘Same’ here is
defined operationally by the stringency of your assembly algorithm for overlapping reads; ‘same genome’ is a lot more stringent than belonging to a prevalent gut species, as assessed by 16S studies looking for a ‘core human gut microbiota’.) This matters all the more because
the ‘same’ genome would have to be not just present, but reasonably abundant in multiple samples to be the source of many additional reads.
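To put rough numbers on the abundance point, here is a back-of-envelope sketch in Python. It assumes reads are drawn in proportion to each genome's share of the community DNA, and every number in it (5 Gbp per sample, a 5 Mbp genome, the relative abundances) is a made-up illustration, not a value from any real dataset.

```python
def expected_coverage(total_bases, relative_abundance, genome_size):
    """Mean per-base coverage of one genome in one sample.

    Assumes reads are sampled in proportion to the genome's share of
    community DNA (a simplification that ignores extraction and GC bias).
    """
    return total_bases * relative_abundance / genome_size


def pooled_coverage(samples, genome_size):
    """Coverage of the same genome summed over several samples.

    `samples` is a list of (total_bases, relative_abundance) pairs.
    """
    return sum(expected_coverage(tb, ab, genome_size) for tb, ab in samples)


# Hypothetical cohort: the genome is at 1% in one subject, 0.01% in a
# second subject, and absent from a third.
GENOME_SIZE = 5e6  # assumed 5 Mbp genome
cohort = [(5e9, 0.01), (5e9, 0.0001), (5e9, 0.0)]

single = expected_coverage(5e9, 0.01, GENOME_SIZE)
pooled = pooled_coverage(cohort, GENOME_SIZE)
print(f"one subject: {single:.1f}x, three subjects pooled: {pooled:.1f}x")
```

Under these assumed numbers, pooling triples the sequencing going into the assembly but barely changes the coverage of that genome, since nearly all of its reads come from the one subject where it is abundant.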
You could consider an approach like the original MetaHIT paper (Qin 2010) and do the within-subject assembly first, then merge unassembled reads across subjects for another round of attempted assembly…but I haven’t had to really think through
this issue myself, and the MetaHIT paper was just trying to get long enough contigs from 75-base GA-II reads to reliably call genes, and then just analyzed the gene catalog. They weren’t aiming for MAGs.
Depending on the depth of your sequencing per sample (per subject), you might not have much success getting MAGs for each subject. I remember from the first Banfield lab forays into assembly-based metagenomics of complex samples (contaminated
aquifer in their case), they were shooting for ~30 GB of sequencing from the community to be able to get partial MAGs from dozens of microbial strains. Think about that ratio…roughly a factor of 1000 between raw sequencing depth within one complex community
and summed length of things large enough to reasonably be called MAGs, as opposed to just contigs.
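If it helps to plug in your own numbers, that ratio can be turned into a very crude estimator. This is only a sketch: the factor of 1000 is the rough figure from the paragraph above rather than a measured constant, I'm reading "30 GB" as about 30 billion bases, and the 3 Mbp mean genome size is my own assumption for illustration.

```python
DEPTH_TO_MAG_RATIO = 1000  # rough rule of thumb from the text, not a measured constant


def rough_mag_yield_bp(sequenced_bp, ratio=DEPTH_TO_MAG_RATIO):
    """Very rough summed length (bp) of MAG-quality assembly to expect."""
    return sequenced_bp / ratio


def rough_genome_equivalents(sequenced_bp, mean_genome_bp=3e6, ratio=DEPTH_TO_MAG_RATIO):
    """Same estimate expressed as complete-genome equivalents.

    mean_genome_bp is an assumed average genome size, for illustration only.
    """
    return rough_mag_yield_bp(sequenced_bp, ratio) / mean_genome_bp


for depth_bp in (30e9, 5e9, 1e9):
    mbp = rough_mag_yield_bp(depth_bp) / 1e6
    genomes = rough_genome_equivalents(depth_bp)
    print(f"{depth_bp / 1e9:.0f} Gbp sequenced -> ~{mbp:.0f} Mbp in MAGs (~{genomes:.1f} genome equivalents)")
```

Treat the output as an order-of-magnitude guide at best; the real yield depends heavily on how even the community is and how much strain-level variation it contains.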
If you don’t have (and can’t get) the depth to assemble much within your individual subjects, you might consider trying to map your reads to the updated human gut gene catalog. Li et al. 2014 (Nature Biotechnology) is a direct descendant of the
original MetaHIT catalog, and would likely be both easier to work with and more appropriate than a general reference database such as UniRef or NCBI genomes.