Demultiplexed 16S MiSeq Data


ffynn...@gmail.com

Jun 27, 2017, 5:59:28 AM
to Qiime 1 Forum
Hi,

I have some 16S MiSeq data and I'm unsure how to proceed. The samples are demultiplexed (fastq format) and the barcodes have been removed. My plan was to analyse them in the following steps:

1. merge the R1 and R2 folders for each using multiple_join_paired_ends.py

2. merge the fasta from step 1 into 1 file using add_qiime_labels.py

3. pick otus and quality filter using the pick_otus.py with the usearch_qf function (for removing chimeras etc)

4. pick rep set, assign taxonomy, align sequences, etc.

My questions are about the first 3 steps. Is it reasonable to proceed from step 1 through 3 or is there something else that needs to be done? Also, the files that have been created in step 1 are all named the same thing. Is this a problem for merging the files in step 2?

Thanks,

Nia

Colin Brislawn

Jun 27, 2017, 10:55:28 AM
to Qiime 1 Forum
Hello Nia,

Thanks for posting on the forums. I think your method sounds good.

For step 2, you could also consider using the multiple_split_libraries_fastq.py script.

This script works well with the output of multiple_join_paired_ends.py. For example, you can have samples labeled with the names of their folders, which is helpful when the names of the files are all the same.

Let me know what you try. There are lots of ways to get demultiplexing set up.
Colin

ffynn...@gmail.com

Jun 27, 2017, 11:10:21 AM
to Qiime 1 Forum
Hi Colin,

Thanks for the reply! Would you mind just explaining what the main benefit of doing split libraries would be? I've read over several threads and I don't quite understand it. Is it that it produces .fna files ready for downstream analysis?

So, just to clarify if I add the multiple split libraries I will do the following steps:


1. merge the R1 and R2 folders for each using multiple_join_paired_ends.py

2. run multiple split libraries (to produce .fna files)

3. merge the fasta from step 2 into 1 file using add_qiime_labels.py

4. pick otus and quality filter using the pick_otus.py with the usearch_qf function (for removing chimeras etc)

5. pick rep set, assign taxonomy, align sequences, etc.


Thank you!

Nia

Colin Brislawn

Jun 27, 2017, 11:57:06 AM
to Qiime 1 Forum
Hello Nia,

The script split_libraries_fastq.py performs quality filtering on fastq files, removing low-quality reads, before it produces a .fna file with valid QIIME labels. The script add_qiime_labels.py does not perform filtering. It's a good fit if you have already performed quality filtering using some other method.

As an added bonus, the script multiple_split_libraries_fastq.py will do this quality filtering on many input fastq files, including automatic renaming, which makes it a perfect fit for the output of the multiple_join_paired_ends.py script. The output will be many .fna files, each with the valid QIIME label from the sample it came from.

The process could be:
1. merge the R1 and R2 folders for each using multiple_join_paired_ends.py
2. run multiple_split_libraries_fastq.py (to produce .fna files)
3. merge the fasta from step 2 into a single seqs.fna file using the linux 'cat' command.
4. pick otus and quality filter using pick_otus.py

So we need step 2 to do quality filtering, and we need step 3 to combine the files. In step 3, we already have labels on them, so we don't need to use add_qiime_labels.py anymore.
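A minimal Python sketch of the concatenation in step 3, for anyone who prefers a script over 'cat'. The function name combine_fna and the folder layout are assumptions for illustration; it assumes the reads already carry valid QIIME labels, so nothing is relabeled.

```python
from pathlib import Path


def combine_fna(input_dir, output_path):
    """Concatenate every .fna file under input_dir into output_path.

    Returns the number of files combined. Each file is read fully into
    memory, which is fine for a sketch but not for very large inputs.
    """
    output_path = Path(output_path)
    # Sort for a reproducible order and skip the output file itself
    # in case it is written inside input_dir.
    sources = sorted(
        p for p in Path(input_dir).rglob("*.fna")
        if p.resolve() != output_path.resolve()
    )
    with output_path.open("w") as out:
        for src in sources:
            text = src.read_text()
            out.write(text)
            # A missing trailing newline would glue two records together.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(sources)
```

Calling combine_fna("split_output", "seqs.fna") (both names hypothetical) would write one combined file and report how many inputs went into it.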

Does that help?

Colin

ffynn...@gmail.com

Jul 4, 2017, 9:51:12 AM
to Qiime 1 Forum
Hi Colin,

Thank you for the reply!

I have a slight problem... I stitched the reads using multiple_join_paired_ends.py, which seemed to work OK. I only have 4 samples in this first set of data. This produced 3 fastq files per sample, stored in 1 folder (1 folder per sample). I then put all the sample folders in one directory, ready for multiple_split_libraries_fastq.py. I've tried running it without any additional options (i.e. without a parameters file for the quality filtering, which I would like to do) and I get this error message:

"qiime@qiime-190-virtual-box:~$ multiple_split_libraries_fastq.py -i /home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir -o /home/qiime/Desktop/Jing_Samples/4.Multiple_Split_Libraries
Traceback (most recent call last):
  File "/usr/local/bin/multiple_split_libraries_fastq.py", line 219, in <module>
    main()
  File "/usr/local/bin/multiple_split_libraries_fastq.py", line 216, in main
    close_logger_on_success=True)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:

*** ERROR RAISED DURING STEP: split_libraries_fastq.py
Command run was:
 split_libraries_fastq.py  -i /home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N15/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N15/fastqjoin.un2.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N14/fastqjoin.un2.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N17/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N14/fastqjoin.un1.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N15/fastqjoin.un1.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N17/fastqjoin.un1.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N17/fastqjoin.un2.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N16/fastqjoin.un1.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N16/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N14/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/2.Merged_Reads/Merged_Reads_Dir/N16/fastqjoin.un2.fastq --sample_ids fastqjoin.join.fastq,fastqjoin.un2.fastq,fastqjoin.un2.fastq,fastqjoin.join.fastq,fastqjoin.un1.fastq,fastqjoin.un1.fastq,fastqjoin.un1.fastq,fastqjoin.un2.fastq,fastqjoin.un1.fastq,fastqjoin.join.fastq,fastqjoin.join.fastq,fastqjoin.un2.fastq -o /home/qiime/Desktop/Jing_Samples/4.Multiple_Split_Libraries  --barcode_type 'not-barcoded'
Command returned exit status: 1
Stdout:

Stderr
Traceback (most recent call last):
  File "/usr/local/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/usr/local/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/usr/local/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 239, in process_fastq_single_end_read_file_no_barcode
    phred_offset=phred_offset):
  File "/usr/local/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/usr/local/lib/python2.7/dist-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: M01867:38:000000000-B4YR5:1:1101:18624:1923. This may be because you passed an incorrect value for phred_offset."

Could you advise on what to do with the samples? I would also like to add a quality filtering step (Q20) via the parameters file (split_libraries_fastq:phred_quality_threshold 19 in a text file), and I also want to change the names of the files. I also have another question: from the three files generated in the multiple_join_paired_ends.py step, is it possible to select only one of them for the multiple split libraries step, i.e. the .join file and not the .un files (for R1 and R2)?

thanks :-)

Nia

 


Colin Brislawn

Jul 4, 2017, 5:06:31 PM
to Qiime 1 Forum
Hello Nia,

Lots of stuff to do here! Let's start at the start:

> This produced 3 fastq files which were stored in 1 folder (1 folder per sample).
Good! These files are the joined reads, and two .un files, which are the unjoined reads from R1 and R2.

> I then put all sample folders in one directory ready for the multiple split libraries.
Good. You may want to delete these unjoined reads before running the next step. You can do that with a linux command like

rm -f folder*/*un*

In this example, folder* will match your folder names and *un* will match the file names you want to remove. 
You asked about selecting only the *join* files, and removing the *un* files is a great way to accomplish that. 
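A more cautious variant of the same cleanup can be sketched in Python. The function name remove_unjoined is hypothetical, and the pattern assumes the fastqjoin.un1.fastq / fastqjoin.un2.fastq file names seen in this thread, one level below the directory that holds the sample folders. It defaults to a dry run so the matches can be checked before anything is deleted.

```python
from pathlib import Path


def remove_unjoined(merged_dir, dry_run=True):
    """Find fastqjoin.un1.fastq / fastqjoin.un2.fastq one level below merged_dir.

    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned so they can be inspected first.
    """
    # This pattern is narrower than "*un*", so a file that merely has
    # "un" somewhere in its name is left alone.
    matches = sorted(Path(merged_dir).glob("*/fastqjoin.un[12].fastq"))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default dry_run=True, look at the returned paths, and only then call it again with dry_run=False.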

You can pass a parameter file to the multiple_split_libraries_fastq.py script with the -p option.
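As a small sketch of what such a parameters file contains, the helper below writes lines in the "script_name:option value" form quoted earlier in the thread (split_libraries_fastq:phred_quality_threshold 19). The helper name write_parameters and the dictionary layout are assumptions for illustration; a text editor works just as well.

```python
from pathlib import Path


def write_parameters(path, settings):
    """Write one 'script_name:option value' line per setting.

    settings maps (script_name, option) tuples to values.
    Returns the lines that were written, without newlines.
    """
    lines = [
        f"{script}:{option} {value}"
        for (script, option), value in settings.items()
    ]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

For the Q20 filter discussed above, the settings would be {("split_libraries_fastq", "phred_quality_threshold"): 19}, and the resulting file is what gets passed with -p.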

Changing the names is a bit harder, but also possible. 

I think all these changes may solve your FastqParseError too, but let's wait and see.

Let me know what you find,
Colin

ffynn...@gmail.com

Jul 5, 2017, 3:16:28 AM
to Qiime 1 Forum
Hi Colin,

I deleted the unjoined files as suggested so now I have the joined files in a folder each and then I put all sample folders into one directory. I tried running the script again but got an error message. I tried with and without passing the parameter file in case that was the problem. I still get an error message. Any ideas?

WITH PARAMETERS FILE:

qiime@qiime-190-virtual-box:~$ multiple_split_libraries_fastq.py -i '/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only' -o '/home/qiime/Desktop/Jing_Samples/5.Multiple_Split_Libraries' -p '/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/qiime_parameters.txt'
Traceback (most recent call last):
  File "/usr/local/bin/multiple_split_libraries_fastq.py", line 219, in <module>
    main()
  File "/usr/local/bin/multiple_split_libraries_fastq.py", line 216, in main
    close_logger_on_success=True)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:

*** ERROR RAISED DURING STEP: split_libraries_fastq.py
Command run was:
 split_libraries_fastq.py --phred_quality_threshold 19 -i /home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N15/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N17/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N14/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N16/fastqjoin.join.fastq --sample_ids fastqjoin.join.fastq,fastqjoin.join.fastq,fastqjoin.join.fastq,fastqjoin.join.fastq -o /home/qiime/Desktop/Jing_Samples/5.Multiple_Split_Libraries  --barcode_type 'not-barcoded'

Command returned exit status: 1
Stdout:

Stderr
Traceback (most recent call last):
  File "/usr/local/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/usr/local/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/usr/local/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 239, in process_fastq_single_end_read_file_no_barcode
    phred_offset=phred_offset):
  File "/usr/local/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/usr/local/lib/python2.7/dist-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: M01867:38:000000000-B4YR5:1:1101:18624:1923. This may be because you passed an incorrect value for phred_offset.

WITHOUT PARAMETERS FILE:

qiime@qiime-190-virtual-box:~$ multiple_split_libraries_fastq.py -i '/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only' -o '/home/qiime/Desktop/Jing_Samples/5.Multiple_Split_Libraries'

Traceback (most recent call last):
  File "/usr/local/bin/multiple_split_libraries_fastq.py", line 219, in <module>
    main()
  File "/usr/local/bin/multiple_split_libraries_fastq.py", line 216, in main
    close_logger_on_success=True)
  File "/usr/local/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:

*** ERROR RAISED DURING STEP: split_libraries_fastq.py
Command run was:
 split_libraries_fastq.py  -i /home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N15/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N17/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N14/fastqjoin.join.fastq,/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir/N16/fastqjoin.join.fastq --sample_ids fastqjoin.join.fastq,fastqjoin.join.fastq,fastqjoin.join.fastq,fastqjoin.join.fastq -o /home/qiime/Desktop/Jing_Samples/5.Multiple_Split_Libraries  --barcode_type 'not-barcoded'

ffynn...@gmail.com

Jul 5, 2017, 7:11:11 AM
to Qiime 1 Forum
Also, this is an example of the header of the sequences in the sample files: @M01867:38:000000000-B4YR5:1:1101:18624:1923

Could this be causing a problem in the analysis?

Thanks,

Nia

Colin Brislawn

Jul 5, 2017, 2:16:25 PM
to Qiime 1 Forum, William Walters
Hello Nia,

Great to hear from you. Thanks for trying it with and without the parameter file.

Let's play with the settings a bit. First, I noticed this in the log files:
--sample_ids fastqjoin.join.fastq, fastqjoin.join.fastq,fastqjoin.join.fastq,....

These all have the same sample_ids because your files have the same names. Because your folders are named after your samples, try passing --include_input_dir_path and also --remove_filepath_in_name in your command. This should add better sample_IDs to your reads.
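To illustrate why the folder name makes a better sample ID here, this short Python sketch compares IDs taken from the file name with IDs taken from the containing folder, using the paths from the log above. It only illustrates the idea; it is not how QIIME builds its sample IDs internally, and both function names are made up for this example.

```python
from pathlib import PurePosixPath


def sample_ids_from_filenames(fastq_paths):
    """Use the file name as the sample ID (identical for every sample here)."""
    return [PurePosixPath(p).name for p in fastq_paths]


def sample_ids_from_folders(fastq_paths):
    """Use the name of the folder that contains each file as the sample ID."""
    return [PurePosixPath(p).parent.name for p in fastq_paths]


base = "/home/qiime/Desktop/Jing_Samples/4.Joined_Reads_Joined_Only/Joined_Reads_Only_Dir"
paths = [f"{base}/{sample}/fastqjoin.join.fastq" for sample in ("N15", "N17", "N14", "N16")]

print(sample_ids_from_filenames(paths))  # four copies of the same file name
print(sample_ids_from_folders(paths))    # one distinct ID per sample folder
```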

Now let's look at that error:
> This may be because you passed an incorrect value for phred_offset.
I'm a little surprised to see this error; Qiime should detect this automatically. Let me cc another developer who may know more.
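For background, the offset is the number subtracted from each quality character's ASCII code to get its Phred score. The sketch below shows only that arithmetic and one way a conversion can fail when the offset is wrong. It is not the scikit-bio implementation, and the quality string is a made-up example.

```python
def decode_quality(quality_line, phred_offset):
    """Convert a FASTQ quality string to a list of Phred scores.

    Raises ValueError if the chosen offset yields a negative score,
    which cannot be a valid Phred value.
    """
    scores = [ord(ch) - phred_offset for ch in quality_line]
    if min(scores, default=0) < 0:
        raise ValueError(f"negative Phred score with offset {phred_offset}")
    return scores


example = "#5?I"  # hypothetical quality string in Phred+33 encoding
print(decode_quality(example, 33))
try:
    decode_quality(example, 64)
except ValueError as err:
    print("offset 64 fails:", err)
```

The same characters decode cleanly with offset 33 but go negative with offset 64, so a wrong guess at the offset is enough to stop the parser.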

Colin

ffynn...@gmail.com

Jul 6, 2017, 9:18:42 AM
to Qiime 1 Forum, william....@gmail.com
Hi Colin,

Thanks for the reply. It turns out that I am in the same university as Tony Walters and asked for his advice. I ran this suggested script:



and my samples worked. It seems that the phred offset (33) was the problem, though I don't know why.
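One rough way to check which offset a file uses is to look at the lowest quality character it contains. The Python sketch below is a heuristic under stated assumptions, not QIIME's own detection logic: it assumes plain 4-line FASTQ records, and it treats any character below ';' (ASCII 59) as proof of Phred+33, since Phred+64 data cannot contain such characters. Both function names are hypothetical.

```python
from itertools import islice


def quality_lines(fastq_path, max_records=10000):
    """Yield the quality line (4th line) of up to max_records FASTQ records."""
    with open(fastq_path) as handle:
        for line in islice(handle, 3, max_records * 4, 4):
            yield line.rstrip("\n")


def guess_phred_offset(lines):
    """Return 33, 64, or None when there are no quality characters to inspect."""
    lowest = min((ord(ch) for line in lines for ch in line), default=None)
    if lowest is None:
        return None
    # Characters below ';' (59) cannot occur in Phred+64 data.
    if lowest < 59:
        return 33
    # Otherwise 64 is only a guess: a few very high-quality Phred+33
    # reads would look the same.
    return 64
```

Something like guess_phred_offset(quality_lines("fastqjoin.join.fastq")) on one of the joined files would show whether 33 is a plausible value for that file.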

Best, Nia
Auto Generated Inline Image 1

Colin Brislawn

Jul 6, 2017, 10:54:27 AM
to Qiime 1 Forum
Hello Nia,

Glad you got it working! IDK why qiime would pick the wrong phred offset, but it's working now.

Tony's great! I'm glad you get to work with him.

Colin

ffynn...@gmail.com

Jul 7, 2017, 4:12:48 AM
to Qiime 1 Forum
Thanks for all your help :-)