join_paired_ends output fastq is broken

Masha T

Jan 10, 2017, 10:40:33 AM

to Qiime 1 Forum

Hello, I have PE250 MiSeq data with primers 505/806 that has been demultiplexed, and the barcodes are in the header. When I run join_paired_ends.py, fastq.join.fastq is the largest output file, so I thought it had worked. Looking more closely at that file, though, it seems broken: the beginning is in the correct format, but by the end of the file there are extra "+" lines separating the sequences and the quality scores. This is the command I ran:

MacQIIME Mashas-MacBook-Pro:290_joined $ join_paired_ends.py -f MI.M03555_0163.001.FLD0290.SCI017690_SLE16_R1.fastq.gz -r MI.M03555_0163.001.FLD0290.SCI017690_SLE16_R2.fastq.gz -o 290_joined

This is the beginning of the fastq.join.fastq

+

ABBB@BBFFBBBGGGGGGGGGGHHEGGGGAGHHHEGGGGGEHHHGGFGGHHHHHHHFFGGFFGGGGGGGGGGGGFFHGGFGHHGHHHGGHHHHHHHHGHHHHHGGGGGGHHHHHHHHHGHGHHHHHHHHHHHHHHHHHHHH

HHHHGHHHHGHHGHGGGGHHHHHGGGHHHHHGGGGGGHHHHHHFHFFHHHGHGHGHGHHGHHFGGGFHGGGGHHHHHHHHHHHHHHHGGHHFHHHHHHHHGHHHHHGGFFAHHHHGEGGHHHHHFHHHHHHGGGDGGGGFE

BBFDFFFBAA>3

@M03555:163:000000000-ATTLR:1:2105:16042:1617 1:N:0:TCTAGCGTGG

CTAGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCT

TGAGTGCAGTTGAGGCAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCCTGCTAAGCTGCAACTGACATTGAGGCTCGAAAGTGTGGGTATCAAACAGGATTAGAT

ACCCCAGTAGTCCA

+

>AAAAFFFFFBFGGGGGGGGGGFBGCF?EEGGFFHCEEAEG?GBGGG1EFHFFFGFFFFEFFFHHGH1EFEFFBFGGHGBBFFGFFFGEGGHHHHHGBGAGFHHEGGCGG<FFFHGDGFGGHGGHHFGFG1?GH1?GF>DH

GGHFGH1GHFFHHHGHGGDFG?GHECCEGGD1GGGEGHGDHHHHGFHHBGFGFFB0EHHFHHFEF>E9EGGEGHHHHGFCGGHHHFBFFHHGFDFGDHGHFHGHHGFACGAEEHHHEFCHHHHCHHHHFBBGHGFG3G1FE

?GAFFFFCDA>>1>

@M03555:163:000000000-ATTLR:1:2105:13480:1620 1:N:0:TCTAGCGTGG

ACGTGCCAGCCGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGGTTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAATTGATACTGGCAGTCTT

GAGTACAGTTGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTACTAACCTGTAACTGACATTGATGCTCGAAAGTGTGGGTATCAAACAGGATTAGATA

CCCCAGTAGTCA

But the end of the fastq file looks like this:

@M03555:163:000000000-ATTLR:1:1109:20044:28413 1:N:0:TCTAGCGTGG

+

GTGCCAGCAGCCGCGGTAATCCGGAGGCTCCGAGCGTTATCCGGATTTAGTGGGTTTAAAGGGAGCGTAGATGGATTTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGTATATCTTGAGTGCAGTTGAGGCAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGAGAAGGCAGCCTGCTAAGCTGCAACTGACATTGAGGCTCGAAAGTGTGGGTATCAAACAGGATTAGATCCCCTTGTAGTCC

+

^@A?ABFBAFFFBGCGCECGGGGE2AEG2AFF2E0ECGGAGHHGFFEEH55EGFFEFGHHEHHGC>EEGFG;?BFHFGHFHHHDHHHHFGEGFFGBGFFGH2</<??3BBD>FCFHHHGBGGHFGHH11<<G1FFHHHHFFFGHFHHHHHGGGFFEGABD>HF<CEGGGHFE/E>CFDFHHGG@FB22FFGFGFFB2HHGFBF>BBFG1DDB1F@@GDBFBDFBBFGF0AEGBBB;DGFFD;C9HHGCF/EGA1BFEEGGG3GEFBHGGBHGAB13GA1A11>CDFB1AA>11

@M03555:163:000000000-ATTLR:1:1109:20050:28431 1:N:0:TCTAGCGTGG

+

GTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTTGAGTGCAGTTGAGGCAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCCTGCTAAGCTGCAACTGACATTGAGGCTCGAAAGTGTGGGTATCAAACAGGATTAGATACCCTTGGAGTCC

+

^@AABBFFBFFFBGGGGGCFG5FGGGGG2BFFEEFGGGGGHGHGFEFGHFGGHHHFGGHHHHHGFGGGGGGAFFHHE;FF?BGHHHFHFHHHGHGFHHB?GA<0GBAFGFFGEG?FF@GHHGCGHHHDGHHHHHHGHHHFFHHHFHGHHHHHGHGGGG?DGCCEHFEFHGEEGFGG2HBHG>1CFF1CF1FFGF0;CG@FBHEFFFFD>/DE/B;1AFEAEFEFFFFF01BFFFD;9;FGFFBBGBABC0AFB1GGECFGF3FDGFGFF1A31313GEF11A1B@B111>111

I believe this is causing the problems I am having in split_libraries, so any help would be appreciated, thanks!

Masha
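For readers hitting the same symptom, here is a minimal Python sketch of how one might locate the first malformed record in a joined file. It is not part of QIIME or fastq-join; the names open_text and check_fastq are made up for illustration, and it assumes each record is exactly four lines (header, sequence, "+", quality) with no wrapped sequences.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file as text (assumes '.gz' means gzip)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(handle, max_report=5):
    """Return (record_number, message) pairs for malformed 4-line records.

    Hypothetical helper: reads four lines at a time instead of searching
    for '@', because quality strings may legitimately start with '@'.
    """
    problems = []
    record = 0
    while len(problems) < max_report:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            break  # clean end of file
        record += 1
        header, seq, plus, qual = (line.rstrip("\r\n") for line in lines)
        if not header.startswith("@"):
            problems.append((record, "header does not start with '@'"))
        elif not plus.startswith("+"):
            problems.append((record, "third line does not start with '+'"))
        elif len(seq) != len(qual):
            problems.append((record, "sequence and quality lengths differ"))
    return problems
```

It could be called as check_fastq(open_text("fastq.join.fastq")). Once one record is off by a line, every later record is read out of frame, so only the first reported record number is really informative.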

Stefan Janssen

Jan 10, 2017, 12:02:00 PM

to Qiime 1 Forum

Hi Masha,
It is hard to check formatting issues without having the files in hand. How big are they? Would you mind sending them to me, e.g. via Dropbox? If that is OK, could you mail the link to sjan...@ucsd.edu?

Masha T

Jan 10, 2017, 2:16:46 PM

to Qiime 1 Forum

Shared it with you via dropbox, thanks!

I've shared the Dropbox folder with you. It contains the original R1 and R2 files, my output folder from join_paired_ends.py, and the split libraries output, in case you're interested in what the downstream error looked like.

thanks again!

Stefan Janssen

Jan 10, 2017, 4:51:55 PM

to Qiime 1 Forum

Hi Masha,

I downloaded your files and re-ran join_paired_ends.py. My results differ from what you uploaded: they do not contain those additional lines with "+" symbols. Could you please double-check which version of fastq-join you are using by running: fastq-join --help
I have version 1.3.1.

I uploaded my results to the Dropbox folder. I cannot test split_libraries myself, since I don't have the metadata file containing the barcodes. Please try running split_libraries.py on my results and see whether the error is reproducible.

Best,
Stefan

Masha T

Jan 14, 2017, 2:34:24 PM

to Qiime 1 Forum

Hi Stefan,

Thanks so much for your help, but unfortunately it's still a problem. I had fastq-join version 1.2.1, so I updated it to 1.3.1 and re-ran the command, but I still get the same additional lines with pluses. I tried extracting barcodes and running split libraries on your joined file, and both worked. I'm at a loss as to what could be causing this. I have the latest version of QIIME, 1.9.1; here's what I get from print_qiime_config.py:

System information

==================

Platform: darwin

Python version: 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:05:08) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]

Python executable: /Users/mashataguer/miniconda2/envs/qiime-github/bin/python

QIIME default reference information

===================================

For details on what files are used as QIIME's default references, see here:

https://github.com/biocore/qiime-default-reference/releases/tag/0.1.3

Dependency versions

===================

QIIME library version: 1.9.1-dev

QIIME script version: 1.9.1-dev

qiime-default-reference version: 0.1.3

NumPy version: 1.11.3

SciPy version: 0.16.0

pandas version: 0.16.2

matplotlib version: 1.4.3

biom-format version: 2.1.5

h5py version: Not installed.

qcli version: 0.1.1

pyqi version: 0.3.2

scikit-bio version: 0.2.3

PyNAST version: 1.2.2

Emperor version: 0.9.60

burrito version: 0.9.1

burrito-fillings version: 0.1.1

sortmerna version: SortMeRNA version 2.0, 29/11/2014

sumaclust version: Not installed.

swarm version: Swarm 1.2.19 [Jan 14 2017 14:02:16]

gdata: Installed.

QIIME config values

===================

For definitions of these settings and to learn how to configure QIIME, see here:

http://qiime.org/install/qiime_config.html

http://qiime.org/tutorials/parallel_qiime.html

blastmat_dir: None

pick_otus_reference_seqs_fp: /Users/mashataguer/miniconda2/envs/qiime-github/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta

sc_queue: all.q

topiaryexplorer_project_dir: None

pynast_template_alignment_fp: /Users/mashataguer/miniconda2/envs/qiime-github/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta

cluster_jobs_fp: start_parallel_jobs.py

pynast_template_alignment_blastdb: None

assign_taxonomy_reference_seqs_fp: /Users/mashataguer/miniconda2/envs/qiime-github/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta

torque_queue: friendlyq

jobs_to_start: 1

slurm_time: None

denoiser_min_per_core: 50

assign_taxonomy_id_to_taxonomy_fp: /Users/mashataguer/miniconda2/envs/qiime-github/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

temp_dir: /var/folders/x6/sl6dpg0156q050yy_wm4rjb00000gn/T/

slurm_memory: None

slurm_queue: None

blastall_fp: blastall

seconds_to_sleep: 1

And here's the command I ran: join_paired_ends.py -f MI.M03555_0163.001.FLD0290.SCI017690_SLE16_R1.fastq.gz -r MI.M03555_0163.001.FLD0290.SCI017690_SLE16_R2.fastq.gz -o 290

Any other suggestions? Thanks!

Stefan Janssen

Jan 15, 2017, 12:30:22 PM

to Qiime 1 Forum

Transferring files between Windows and Mac/Linux often causes problems because of different line-ending conventions (Windows uses \r\n, Mac/Linux only \n). Is there a Windows machine involved in your pipeline?
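One way to answer that question without guessing is to count the line endings in the input files directly. The following is a rough Python sketch, not a QIIME tool; count_line_endings is a hypothetical name, and it assumes a ".gz" suffix means the file is gzip-compressed.

```python
import gzip


def count_line_endings(path):
    """Count lines ending in CRLF (Windows) versus bare LF (Mac/Linux)."""
    # Assumption: files ending in '.gz' are gzip-compressed.
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = {"crlf": 0, "lf": 0}
    with opener(path, "rb") as handle:
        for line in handle:  # binary mode splits on b"\n" only
            if line.endswith(b"\r\n"):
                counts["crlf"] += 1
            elif line.endswith(b"\n"):
                counts["lf"] += 1
    return counts
```

A non-zero "crlf" count for the R1 or R2 file would suggest that it passed through a Windows tool at some point; a count of zero would point away from line endings as the cause.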

Masha T

Jan 22, 2017, 8:51:46 PM

to Qiime 1 Forum

Hi, just wanted to follow up on what I've found since.

We found a bunch of null characters (ASCII 0) in the output files that were 'broken'. I also tried SeqPrep, which gave me a segfault. Then I saw this issue on GitHub, which made me think the problem lies somewhere within MacQIIME or OS X: "Some tests are failing when building on OS X versus Redhat 6.6. OS X gives "nan" while Redhat gives "-nan"."

I'm on OS X 10.11.6, and these problems were reproducible on another Mac.

So I tried this on a virtual machine, and success!
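For anyone who wants to check their own output for the same null characters, here is a small Python sketch. find_nul_bytes is a hypothetical helper, not something shipped with QIIME or fastq-join, and it assumes the file is uncompressed. Incidentally, some viewers render a NUL byte as "^@", which may or may not be what the "^@" at the start of the quality lines quoted earlier in this thread reflects.

```python
def find_nul_bytes(path, chunk_size=1 << 20):
    """Return (count, first_offset) for NUL bytes in a file.

    first_offset is the byte position of the first NUL, or None if
    the file contains none. Reads in chunks to keep memory use flat.
    """
    count = 0
    first_offset = None
    position = 0  # byte offset of the current chunk within the file
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            found = chunk.count(b"\x00")
            if found and first_offset is None:
                first_offset = position + chunk.index(b"\x00")
            count += found
            position += len(chunk)
    return count, first_offset
```

A valid FASTQ file should contain no NUL bytes at all, so any non-zero count indicates a corrupted file, and the offset gives a starting point for inspecting the surrounding records.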

Stefan Janssen

Jan 23, 2017, 12:18:56 PM

to Qiime 1 Forum

Hi Masha. That is a really hard bug to track down. Thanks a lot for sharing your findings!
