Workflow help

Morgan Olmstead

Jun 30, 2017, 4:45:17 PM
to Qiime 1 Forum
Hello,

I've been mucking around with the workflow for my samples and keep hitting roadblocks. I've read through many tutorials and still keep facing issues. I'm looking at comparing insect microbiomes across several metadata types. The machine is a HiSeq and it returns three files per sample. The fastq files I receive from the genome center are labelled like AFF1G_AAGGAGCGCCTT_L001_R1_001.fastq, with R1, R2, and R3 versions. I believe the second file holds the barcodes, as it is the smallest of the three. My first question is: do I need that second file, or should I toss it?

Second, I am unsure how to format the #SampleID column in my mapping file. Do I use the name of the long fastq file, or the sample ID name I gave to the genome center initially? Is there a way to check and make sure the names match up (other than the validate_mapping_file.py command)?

So far my commands have looked like this:
1. validate_mapping_file.py -m new_mapping.txt -o check_id_output/ -p -b -j Description (as I do not know my barcode or linker primer sequences right now).

2. multiple_join_paired_ends.py -i 16s_raw -o multiple_joined --read1_indicator '_R1_' --read2_indicator '_R3_'

3. multiple_extract_barcodes.py -i multiple_joined -o no_barcodes

4. multiple_split_libraries_fastq.py -i no_barcodes -o split_multiple_reads

5. pick_open_reference_otus.py -m usearch61 -i seqs.fna -o usearch61_picked_otus
Should I run a chimera check here?

6. pick_open_reference_otus.py -m uclust -i seqs.fna -o uclust_picked_otus


7.  merge_otu_tables.py -i otu_table_mc2.biom,otu_table_mc22.biom -o merged_otu.biom


8. biom summarize-table -i merged_otu.biom -o mc2_summary.txt

This command does something weird: somehow the max and min end up being exactly the same number.


9.  filter_samples_from_otu_table.py -i merged_otu.biom -o otu_table_not_control.biom -m new_mapping.txt -s 'Description:*,!Control'

Here is where I tried to filter the OTUs found in the negative control from the other OTUs, but an empty biom file is created and I don't know why.


10. make_phylogeny.py -i $PWD/rep_set_aligned_pfiltered.fasta -o $PWD/rep_phylo.tre


11. single_rarefaction.py -i otu_table_mc2.biom -o rarefactiontable.biom -d 27093


12. alpha_diversity.py -i rarefactiontable.biom -o alpha_div -t rep_set.tre -m
And this is where things just fell apart.


Before I figure out what's going on in the alpha_diversity part, I would like to sort out my mapping file situation and the other issues mentioned above. When I tried the alpha diversity step, it gave an error saying I needed to add metadata to the biom file, which I thought the mapping file was supposed to do. Please let me know if you need any more information or have any questions. I really appreciate any help given.


TonyWalters

Jul 2, 2017, 2:52:14 AM
to Qiime 1 Forum
Hello Morgan,

There indeed could be something awry with the mapping/sampleIDs.

You want your sampleIDs to be something like AFF1G, right? Is this what you have in your mapping file?
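One quick way to check (a rough sketch, assuming your mapping file is new_mapping.txt and that split libraries produced a seqs.fna; the cut on "_" relies on QIIME 1 sample IDs not containing underscores) is to compare the first column of the mapping file against the sample labels written into the seqs.fna headers:
cut -f 1 new_mapping.txt | grep -v "^#" | sort
grep "^>" seqs.fna | sed 's/^>//' | cut -d "_" -f 1 | sort -u
The two lists should contain the same IDs; any mismatch points to the naming problem described below.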

Steps 1 and 2 are fine, I think.
You may need to filter out the unjoined reads before the split libraries step. To get rid of the unjoined data (first make sure you have reasonable counts of joined data, e.g. check that the joined files are much larger than the unjoined files), here's an example Linux command to do so:
find input_dir/ -name "fastqjoin.un*" -print -exec mv {} output_dir/ \;
where input_dir is the folder containing all of the subfolders with your joined reads (i.e. multiple_joined/ above, but this depends upon where you are running the command from), and output_dir is where the unjoined files will be moved (create this folder before running the command). After doing this, you should have only the joined fastq files remaining.

I don't think you need to do step 3, since the barcodes are already in R2.

Step 4 is where you might have hit a snag with the naming of the samples. I'd recommend getting rid of the unjoined data as above, and rerunning multiple_split_libraries_fastq.py using the multiple_joined/ folder as input. If you add -w, it will print the command instead of running it, and you can redirect that output to a text file by adding:
-w > slf_command.txt
I would make a copy of that text file, edit it, look for the --sample_ids section at the end, and make sure the IDs match the desired SampleIDs in your mapping file. You may have to fix these by hand. Then you can copy the full command from the text file and run it directly.
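As a rough sketch, the full command might look something like this (the --include_input_dir_path and --remove_filepath_in_name flags are assumptions about your layout; they take the sample name from each per-sample subfolder rather than from the fastqjoin.join filenames, so check the script help and adjust):
multiple_split_libraries_fastq.py -i multiple_joined/ -o split_multiple_reads/ --include_input_dir_path --remove_filepath_in_name -w > slf_command.txt
Then open slf_command.txt and edit the --sample_ids as described above before running it.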
See this thread for an example of editing that multiple_split_libraries_fastq.py command:

Chimera checking may not be necessary, since there are some filters built in that likely capture the chimeras (filtering singletons, and filtering sequences that do not align to the template 16S alignment). If you do want to do chimera checking, I would follow this approach: http://qiime.org/tutorials/chimera_checking.html#usearch-6-1
which means you would do the chimera checking before the open-reference OTU picking with usearch61 (or download vsearch, https://github.com/torognes/vsearch, rename it usearch61, and use it instead if you run into 32-bit memory issues).
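Roughly sketched, that tutorial approach would be (the reference path here is a placeholder for the chimera reference database the tutorial points to):
identify_chimeric_seqs.py -m usearch61 -i seqs.fna -r /path/to/chimera_reference.fasta -o usearch61_chimera_checking/
filter_fasta.py -f seqs.fna -o seqs_chimera_filtered.fna -s usearch61_chimera_checking/chimeras.txt -n
You would then use seqs_chimera_filtered.fna as the input to pick_open_reference_otus.py.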

Whether you do or skip chimera checking, I'd recommend checking that everything is okay once you have completed the open-reference OTU picking. You can summarize the OTU table with:
biom summarize-table -i OTU-TABLE.biom -o table_stats.txt
which will show the sequence counts per sample and whether the sampleIDs look correct.

-Tony