MED for fungi

Rachel Adams

Mar 15, 2016, 4:00:07 PM

to Oligotyping and MED

Hi - I'm looking to analyze Illumina fungal data (the ITS1 region) using MED, and I am trying to adapt the best practices to fungi. Because the length of the ITS region varies across taxa, and because the read length (250 bp in my case) can exceed the amplicon length, I will have sequence length variation that is real. But aligning ITS is known to be tricky, and it's debatable how valid an ITS alignment would be. I'm thinking the best option would be to trim sequences to a common length (excluding those that are shorter) before running "decompose." Any thoughts would be greatly appreciated. Thanks.

A. Murat Eren

Mar 15, 2016, 4:07:40 PM

to Oligotyping and MED

Hi,

No alignment is necessary for Illumina data. I guess you don't have paired-end reads. In that case, you can trim your _quality-filtered_ reads to a reasonable length to get rid of terribly low-quality ends, but you don't need to remove shorter reads: they also represent valid organisms, since the length variation is biological. I would simply try the o-pad-with-gaps script to deal with the remaining length variation. There is a lot of information in this article, and although it talks about oligotyping, everything in it is also valid for MED:

http://merenlab.org/2014/09/14/oligotyping-and-alignment/
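The two options discussed so far (trimming to a common length and dropping shorter reads, versus keeping every read and padding it with gaps) can be contrasted with a minimal Python sketch. It is not the o-pad-with-gaps script itself. It assumes that "padding" means appending gap characters ('-') to the end of each read until it matches the longest read. The function names are hypothetical.

```python
# Minimal sketch contrasting two ways to handle real length variation in
# single-end ITS reads before MED. NOT the o-pad-with-gaps script; it only
# assumes padding = appending '-' until every read matches the longest one.


def pad_with_gaps(seqs: list[str], gap: str = "-") -> list[str]:
    """Keep every read; append gaps so all reads share the longest length."""
    longest = max(len(s) for s in seqs)
    return [s + gap * (longest - len(s)) for s in seqs]


def trim_to_common_length(seqs: list[str], length: int) -> list[str]:
    """Trim reads to `length`; reads shorter than `length` are discarded."""
    return [s[:length] for s in seqs if len(s) >= length]


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTACG", "ACGTACGTAAGG"]
    print(pad_with_gaps(reads))
    # ['ACGTACGTAA--', 'ACGTACG-----', 'ACGTACGTAAGG']
    print(trim_to_common_length(reads, 10))
    # ['ACGTACGTAA', 'ACGTACGTAA']
```

Padding keeps the 7-base read, while trimming to 10 bases silently drops it. That dropped read is exactly the kind of biologically real, shorter sequence the reply above says should be kept.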

Best,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

--
You received this message because you are subscribed to the Google Groups "Oligotyping and MED" group.
To unsubscribe from this group and stop receiving emails from it, send an email to oligotyping...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/oligotyping/cbdf1a2f-a4a4-41eb-8050-86d02abcb25c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rachel Adams

Mar 15, 2016, 4:23:02 PM

to Oligotyping and MED

Hi - I do have paired reads, but for the sake of this analysis I was thinking of using just the R1 reads. Here is my adaptation of the "best practices" list for fungi and MED; please comment.

1. Get all reads for all samples in a project into one FASTA file, and format the deflines to meet the standard explained here.
2. Quality filter your reads (using your preferred method?).
3. Use RDP or GAST to assign taxonomy to each read individually.
4. If you are working with Illumina HiSeq or MiSeq reads, skip the alignment step, but run this script on your FASTA file to accommodate any length variation. ->> This link is broken. What does that script do?
5. Remove common gaps from the resulting alignment file using this script. ->> This link is also broken. Is this the o-pad-with-gaps script?
6. Run decompose on the FASTA file.
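Step 5 (removing common gaps) amounts to dropping every column that is a gap in all sequences. A minimal Python sketch follows. It assumes the sequences are already the same length and treats '-' and '.' as gap characters. It is an illustration only, not the script the broken link pointed to, and the function name is hypothetical. If reads were padded to the length of the longest read, the longest read fills every column, so no column is all-gap and this step changes nothing. It matters for alignment output.

```python
# Minimal sketch: drop columns that are gaps in every sequence.
# Assumes equal-length (aligned or padded) sequences; '-' and '.' are
# treated as gap characters (an assumption).

GAP_CHARS = frozenset("-.")


def remove_common_gaps(seqs: list[str]) -> list[str]:
    if not seqs:
        return []
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all sequences must have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(width) if any(s[i] not in GAP_CHARS for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


if __name__ == "__main__":
    print(remove_common_gaps(["A-C-G", "A-CTG", "A--TG"]))
    # ['AC-G', 'ACTG', 'A-TG']
```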

Thanks again.

A. Murat Eren

Mar 15, 2016, 4:49:45 PM

to Oligotyping and MED

I fixed the post you referenced, but I will not be able to adapt it to fungi and MED since I have never worked with fungal ITS data :(

If your reads partially overlap, you should consider merging them instead of using only R1; the merging step will improve quality dramatically. There are many ways to do quality filtering (and merging, too). I usually use this library because I am most familiar with it, but I would strongly suggest exploring other alternatives as well:

https://github.com/meren/illumina-utils

If you go with illumina-utils, here are some more suggestions:

If your reads are partially overlapping, you can use iu-merge-pairs, and work with the resulting FASTA file.
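To illustrate what merging partially overlapping pairs means, here is a toy Python sketch. It reverse-complements R2, then looks for the longest overlap between the end of R1 and the start of the reverse-complemented R2 that stays under a mismatch threshold. It is not the algorithm iu-merge-pairs uses: it ignores base qualities entirely, and the parameter names and defaults are hypothetical. It also assumes the amplicon is longer than each read. When the amplicon is shorter than the read length, the reads run into adapter sequence, which would have to be trimmed first.

```python
# Toy sketch of merging a partially overlapping read pair. NOT the
# iu-merge-pairs algorithm: base qualities are ignored, and the longest
# overlap that passes a simple mismatch-ratio threshold is accepted.
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 20,
               max_mismatch_ratio: float = 0.1) -> Optional[str]:
    """Return the merged sequence, or None if no acceptable overlap exists."""
    r2_rc = reverse_complement(r2)
    # Try the longest possible overlap first, down to min_overlap.
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        tail = r1[-overlap:]
        head = r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_ratio * overlap:
            # Simplification: R1's bases win inside the overlap.
            return r1 + r2_rc[overlap:]
    return None


if __name__ == "__main__":
    # Hand-made example: a 12 bp amplicon read from both ends with 8 bp reads.
    print(merge_pair("ACGTTAGC", "CATGGCTA", min_overlap=3, max_mismatch_ratio=0.0))
    # ACGTTAGCCATG
```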

If your reads do not partially overlap (or if you are not interested in that option), you can use iu-filter-quality-minoche to quality filter your paired-end reads, and then use only R1 for each sample. If you go with iu-filter-quality-minoche, I would suggest still trimming the last 50 bases from the resulting quality-filtered reads anyway; you can use 'iu-trim-fastq -t 200' for that (for other parameters, please see 'iu-trim-fastq -h').

Then you will need to convert each trimmed R1 FASTQ file into a FASTA file (iu-fastq-to-fasta), and merge all individual FASTA files into a single MED-compatible FASTA file. This step will be a bit painful and will require some scripting (I apologize for that in advance), but if you make it all the way here and get stuck, let me know and I can try to help with this step.
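For the scripting step mentioned above, here is a minimal Python sketch. It trims each sample's quality-filtered R1 FASTQ to a fixed length (200 bases, which for 250 bp reads is the same as dropping the last 50) and writes every sample into one FASTA file. Trimming and FASTQ-to-FASTA conversion happen in the same pass instead of through iu-trim-fastq and iu-fastq-to-fasta. The sketch assumes plain four-line FASTQ records. It also assumes MED takes the sample name from the part of the defline before the underscore, so it writes deflines as >SAMPLE_N and rejects sample names that contain underscores; check the MED documentation for the exact defline standard. Function and sample names are hypothetical.

```python
# Minimal sketch: trim each sample's R1 reads to a fixed length and write
# all samples into one FASTA with assumed >SAMPLE_N deflines.
import io
import sys
from typing import Dict, Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from plain four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def combine_samples(samples: Dict[str, TextIO], out: TextIO, length: int = 200) -> int:
    """Trim every read to `length` bases and write all samples into one FASTA.

    Returns the number of reads written.
    """
    written = 0
    for sample, handle in samples.items():
        # The sample name is assumed to be the part before the underscore,
        # so an underscore inside the name would make it ambiguous.
        if "_" in sample:
            raise ValueError(f"sample name {sample!r} must not contain '_'")
        for i, (_header, seq, _qual) in enumerate(read_fastq(handle), start=1):
            out.write(f">{sample}_{i}\n{seq[:length]}\n")
            written += 1
    return written


if __name__ == "__main__":
    # In-memory stand-in for one sample's quality-filtered R1 FASTQ file.
    demo = {"SampleA": io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGGCCAA\n+\nIIIIIIII\n")}
    combine_samples(demo, sys.stdout, length=4)
    # prints:
    # >SampleA_1
    # ACGT
    # >SampleA_2
    # TTGG
```

With real data, `samples` would map each sample name to an open file handle, e.g. `{"Sample01": open("Sample01_R1.fastq")}` (a hypothetical file name). The quality strings are read but not used, because trimming here is purely by position.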

I hope this helps.

Best,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg

clout...@appstate.edu

Aug 4, 2017, 10:57:07 AM

to Oligotyping and MED

Hi Rachel,

Were you able to perform oligotyping with fungal sequences? If so, have you been able to publish any of your findings yet? I would love to see your pipeline.

-Mara