- Created a "counts" file from the EMIRGE output, giving a count for each sequence. I calculated each "count" by multiplying the sequence's NormPrior by the total number of mapped reads in the sample, then rounding to the nearest integer. In theory you could use the NormPrior value directly, but I'm not sure how QIIME would handle fractional sequence counts.
- Mapped these counts to the QIIME sequence IDs generated by split_library_output.py, creating a "counts" file that pairs each QIIME sequence ID with its EMIRGE "count".
- Modified the make_otu_table.py script to take an optional "counts" argument. If a "counts" file is provided, the script adds up the "count" of each sequence for each sample in the OTU map, rather than just counting the number of sequences.
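A minimal Python sketch of the first and third steps, assuming the usual QIIME OTU-map layout (a tab-separated OTU ID followed by member sequence IDs of the form SampleID_N, with the sample ID taken as everything before the last underscore). The function names emirge_count and weighted_otu_table are hypothetical; this is not the actual patch to make_otu_table.py, only an illustration of the weighting idea.

```python
from collections import defaultdict


def emirge_count(norm_prior, total_mapped_reads):
    """Pseudo read count for one EMIRGE sequence: NormPrior x mapped reads, rounded."""
    return int(round(norm_prior * total_mapped_reads))


def weighted_otu_table(otu_map_lines, counts=None):
    """Build {otu_id: {sample_id: abundance}} from QIIME OTU-map lines.

    Each line is tab-separated: otu_id, then member sequence IDs such as
    'S1D_1'. The sample ID is assumed to be the part before the last '_'.
    Without `counts`, each member adds 1 (plain sequence counting); with
    `counts` (a hypothetical {seq_id: count} dict), each member adds its
    EMIRGE count instead.
    """
    table = defaultdict(lambda: defaultdict(int))
    for line in otu_map_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        otu_id, *members = line.split("\t")
        for seq_id in members:
            sample_id = seq_id.rsplit("_", 1)[0]
            weight = counts[seq_id] if counts is not None else 1
            table[otu_id][sample_id] += weight
    # Convert nested defaultdicts to plain dicts for readability.
    return {otu: dict(samples) for otu, samples in table.items()}
```

With no counts supplied, every sequence contributes 1, which matches the ordinary per-sequence tally. Note that Python's round() sends exact halves to the nearest even integer.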
Example files:
emirge.fasta
>130|X79495.1.2400 Prior=0.094795 Length=1777 NormPrior=0.080338
emirge.counts
130|X79495.1.2400 1410
seqs.fna
>S1D_1 S1D130|X79495.1.2400 orig_bc= new_bc=,S1D bc_diffs=0
seqs.count
S1D_1 1410
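The example files above could be produced with something like the following sketch. It assumes one {emirge_id: count} dict per sample, and that the sample ID is both the prefix of the QIIME ID (before the last underscore) and the prefix added to the original EMIRGE ID; the function names are hypothetical. The total number of mapped reads has to come from the read-mapping results, since it is not in the FASTA header.

```python
def emirge_counts_line(header, total_mapped_reads):
    """'>130|X79495.1.2400 ... NormPrior=0.080338' -> '130|X79495.1.2400 <count>'."""
    fields = header.lstrip(">").split()
    seq_id = fields[0]
    # Collect the key=value attributes EMIRGE writes after the ID.
    attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    count = int(round(float(attrs["NormPrior"]) * total_mapped_reads))
    return f"{seq_id} {count}"


def qiime_counts(seqs_fna_headers, emirge_counts):
    """Map QIIME sequence IDs to EMIRGE counts.

    seqs_fna_headers: lines like '>S1D_1 S1D130|X79495.1.2400 orig_bc=...'
    emirge_counts:    {sample_id: {emirge_id: count}}  (assumed layout)
    """
    result = {}
    for header in seqs_fna_headers:
        qiime_id, original_id = header.lstrip(">").split()[:2]
        sample_id = qiime_id.rsplit("_", 1)[0]
        # The sample ID was prepended to the EMIRGE header; strip it off again.
        if not original_id.startswith(sample_id):
            raise ValueError(f"{original_id!r} does not start with {sample_id!r}")
        emirge_id = original_id[len(sample_id):]
        result[qiime_id] = emirge_counts[sample_id][emirge_id]
    return result
```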
The other thing to note is that I used the -j run_prefix option in split_libraries.py and prepended the sample ID to the sequence headers of the EMIRGE FASTA. This is an old trick I learned from using Sanger FASTA files with QIIME.
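A sketch of the header trick, assuming the mapping file's run_prefix column contains the same sample ID (e.g. S1D) so that split_libraries.py can assign each record to its sample. That mapping-file detail and the function name prefix_fasta_headers are assumptions here, not part of the steps described above.

```python
def prefix_fasta_headers(fasta_lines, sample_id):
    """Prepend sample_id to every FASTA header: '>130|X...' -> '>S1D130|X...'.

    Sequence lines are passed through unchanged.
    """
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            out.append(">" + sample_id + line[1:])
        else:
            out.append(line)
    return out
```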
Dylan Bodington