Exporting EMIRGE data to a QIIME-usable format


blake...@gmail.com

Dec 5, 2013, 7:31:31 PM
to emirge...@googlegroups.com
I was wondering whether it is possible to concatenate or process the output from EMIRGE runs on multiple samples into something QIIME can digest, so that I can run the various diversity analyses, make pretty figures, etc. that I'm used to with 16S data. I was hoping there is some way to do this while preserving the abundance data, akin to what you have after pick_rep_set.py.

Thank you!

Chris Miller

Dec 14, 2013, 5:11:02 PM
to emirge...@googlegroups.com
Hi,

There is nothing available to do this at the moment. We have been working on more general code that would do this, but it is not yet sufficiently tested. I will be sure to post back to this thread when we get it done and up on GitHub.

Thanks,
Chris

massac...@gmail.com

Mar 4, 2016, 6:32:26 PM
to EMIRGE users
Hello there,

Any news on this?

Thanks!
André

Dylan Bodington

May 30, 2016, 3:15:34 AM
to EMIRGE users, massac...@gmail.com
This was my fairly hacky way of doing it:

  1. Created a "counts" file with one entry per sequence from the EMIRGE output. I calculated each "count" by multiplying the NormPrior of the sequence by the total mapped reads of the sample, then rounding to an integer. In theory you could use the NormPrior value directly, but I'm not sure how QIIME would deal with fractional sequence counts.
  2. Mapped these counts to the QIIME sequence IDs generated by split_libraries.py, to create a "counts" file of QIIME sequence ID and EMIRGE "count".
  3. Modified the make_otu_table.py script to take an optional "counts" argument. If a "counts" file is provided, it adds the "count" of each sequence in the OTU map to its sample's total, rather than just adding the number of sequences.
Example files:

emirge.fasta
>130|X79495.1.2400 Prior=0.094795 Length=1777 NormPrior=0.080338

emirge.counts
130|X79495.1.2400      1410

seqs.fna
>S1D_1 S1D130|X79495.1.2400 orig_bc= new_bc=,S1D bc_diffs=0

seqs.count
S1D_1 1410
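
For anyone wanting to try steps 1 and 2, here is a minimal Python sketch of how the example files above could be produced. It is not the posted scripts. The function names are hypothetical, the total mapped reads per sample has to be supplied by you, and the ID matching assumes the seqs.fna header layout shown above (QIIME ID first, then the original EMIRGE ID with the sample ID stuck on the front).

```python
import re


def emirge_counts(fasta_lines, total_mapped_reads):
    """Step 1: return {emirge_id: integer count} from EMIRGE FASTA headers.

    count = round(NormPrior * total mapped reads of the sample).
    total_mapped_reads is an input you must provide; it is not in the FASTA.
    """
    counts = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        seq_id = line[1:].split()[0]
        match = re.search(r"NormPrior=([0-9.eE+-]+)", line)
        if match is None:
            continue  # header without a NormPrior field
        counts[seq_id] = int(round(float(match.group(1)) * total_mapped_reads))
    return counts


def qiime_counts(seqs_fna_lines, counts):
    """Step 2: map EMIRGE counts onto QIIME sequence IDs from seqs.fna headers.

    Assumes headers like '>S1D_1 S1D130|X79495.1.2400 ...', i.e. the second
    field is the EMIRGE ID prefixed with the sample ID.
    """
    mapped = {}
    for line in seqs_fna_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if len(fields) < 2:
            continue
        qiime_id, original = fields[0], fields[1]
        sample_id = qiime_id.rsplit("_", 1)[0]  # 'S1D_1' -> 'S1D'
        if original.startswith(sample_id):
            original = original[len(sample_id):]  # strip the sample prefix
        if original in counts:
            mapped[qiime_id] = counts[original]
    return mapped
```

Writing each dictionary out as tab-separated "ID, count" lines would give files shaped like emirge.counts and seqs.count above.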

The other thing to note is that I used -j run_prefix in split_libraries.py and prepended the sample ID to the sequence headers of the EMIRGE FASTA (as in the seqs.fna example above). This is an old trick I learned from using Sanger FASTA files with QIIME.
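
The idea behind step 3 (summing counts rather than tallying sequences) can be sketched in standalone Python as below. This is an illustration only, not the modified make_otu_table.py. The function name is hypothetical, and it assumes a tab-separated OTU map (OTU ID followed by its sequence IDs) and sequence IDs of the form SampleID_N. Sequences missing from the counts fall back to a count of 1, which is my assumption about the "no counts file" behaviour.

```python
from collections import defaultdict


def weighted_otu_table(otu_map_lines, seq_counts):
    """Build {otu_id: {sample_id: abundance}} from an OTU map.

    Each sequence contributes seq_counts[seq_id] to its sample's total
    instead of contributing 1 (assumed fallback when no count is known).
    """
    table = defaultdict(lambda: defaultdict(int))
    for line in otu_map_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank line or OTU with no members
        otu_id, seq_ids = fields[0], fields[1:]
        for seq_id in seq_ids:
            if not seq_id:
                continue
            sample_id = seq_id.rsplit("_", 1)[0]  # 'S1D_1' -> 'S1D'
            table[otu_id][sample_id] += seq_counts.get(seq_id, 1)
    return {otu: dict(samples) for otu, samples in table.items()}
```

The resulting nested dictionary would still need to be written out in whatever table format your QIIME version expects.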

Dylan Bodington

Dylan Bodington

May 30, 2016, 4:06:28 AM
to EMIRGE users, massac...@gmail.com
If anyone is interested in using EMIRGE with QIIME, I've posted my modded scripts.
