fastq.gz files


yum...@gmail.com

Feb 26, 2013, 8:56:39 PM
to stacks...@googlegroups.com
Hi there:
I have started running Stacks on a RADseq run to look for SNPs, on a local cluster running Linux. But I am stuck at the first step, process_radtags. I am trying to process my raw data files, which are in fastq.gz format. My command line is something like:

 /usr/local/stacks/bin/process_radtags -f ./HroiRADseq/raw/HrYuma01_R1_2.fastq.gz -o ./HroiRADseq/samples -b ./HroiRADseq/barcodes/barcodes -e sbfI -i gzfastq -r -c -q

However, I got this error message:

Using Phred+64 encoding for quality scores.
Found 1 input file(s).
Loaded 48, 6bp barcodes.
Processing file 1 of 1 [HrYuma01_R1_2.fastq.gz]
Attempting to read first input record, unable to allocate Seq object (Was the correct input type specified?).


I unzipped the same file and ran it with the following command line:


 /usr/local/stacks/bin/process_radtags -f ./HroiRADseq/raw/HrYuma01_R1_2.fastq -o ./HroiRADseq/samples -b ./HroiRADseq/barcodes/barcodes -e sbfI  -r -c -q

And it works. But this is just my first try with a few samples; I am worried that when I do the real runs, the memory on my computer will not be enough for all the unzipped files.

Any suggestions?


Julian Catchen

Feb 26, 2013, 10:12:57 PM
to stacks...@googlegroups.com
Hi,

Make sure you are using a sufficiently new version of Stacks. I added
gzip support in version 0.99993. You would also have needed libz for the
compilation to work, although this library is more or less standard on most
machines.

julian
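
For anyone hitting the same error, one check that does not depend on the Stacks version is to confirm that the input really is gzip-compressed FASTQ, reading only the first record so that nothing is unpacked to disk. A minimal Python sketch, not part of Stacks; the names `looks_gzipped` and `first_fastq_record` are illustrative:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def first_fastq_record(path):
    """Read only the first four-line FASTQ record from a gzipped file.

    The file is decompressed as a stream, so nothing is written to disk.
    Raises ValueError if the record does not look like FASTQ.
    """
    with gzip.open(path, "rt") as handle:
        header, seq, plus, qual = (handle.readline().rstrip("\n") for _ in range(4))
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("first record is not FASTQ-formatted")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header, seq, plus, qual
```

If both functions succeed on the file passed to `-f` while process_radtags still reports the error, the file itself is probably not the cause.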

yum...@gmail.com

Mar 1, 2013, 12:34:25 AM
to stacks...@googlegroups.com, jcat...@uoregon.edu
Hi Julian:
Thanks a lot for your reply and for all the great work you are doing with Stacks. I checked: the installed version is 0.99996 and libz.so is on the system. But Stacks still does not read my fastq.gz files. Any other suggestions?


Cheers

Jon

Julian Catchen

Mar 1, 2013, 12:44:08 AM
to yum...@gmail.com, stacks...@googlegroups.com
Hi Jon,

Thanks, I hope you find Stacks useful in your work.

I have not released version 0.99996 (yet!), so is it possible you actually have
0.9996? (Confusing, I know...) Anyway, if this is true, then you need to
upgrade the software to use the gzip functionality (and 0.9996 is quite old, so
it would be good to upgrade).

Best,

julian

--
Julian M Catchen, Ph.D.
Institute of Ecology and Evolution
University of Oregon
--
jcat...@uoregon.edu
http://www.uoregon.edu/~jcatchen/
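
The mix-up is easy to make because these pre-1.0 Stacks version numbers differ only in how many 9s they contain. The comparisons made in this thread (0.9996 being older than 0.99993, and 0.999991 being newer than 0.998) are consistent with reading the versions as decimal fractions rather than as dotted components. A small Python sketch under that assumption; `has_gzip_support` is a hypothetical helper, not a Stacks function, and the 0.99993 threshold is the one quoted earlier in this thread:

```python
from decimal import Decimal

# Version in which gzip support was added, according to this thread.
GZIP_SUPPORT_SINCE = Decimal("0.99993")


def has_gzip_support(version):
    """Compare a pre-1.0 Stacks version string as a decimal fraction.

    Assumption: versions such as 0.9996 and 0.99993 order numerically,
    which matches how they are compared in this thread.
    """
    return Decimal(version) >= GZIP_SUPPORT_SINCE


# Print the verdict for each version number mentioned in the thread.
for v in ("0.998", "0.9996", "0.99993", "0.99996", "0.999991"):
    print(v, has_gzip_support(v))
```

Under this reading, 0.9996 falls below the threshold while 0.99996 would not, which is why a single missing 9 matters.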

yum...@gmail.com

Mar 1, 2013, 2:11:22 AM
to stacks...@googlegroups.com
Thanks a lot

Manfred Klaas

May 31, 2013, 11:28:19 AM
to stacks...@googlegroups.com
Hello,

I am running a GBS experiment and am trying to de-multiplex the results from an Illumina sequencing run. The command line looks like this:

process_radtags -f /home/mklas/work/GBS_L001_R1_001_fastq.gz -o /home/mklas/work/results_radtags -b /home/mklas/work/barcodes_9bp.txt -c -q -i gzfastq -D -e pstI

I got the very same error message described earlier by Yuma More:
...Attempting to read first input record, unable to allocate Seq object (Was the correct input type specified?).
I followed the advice from this thread, had our Stacks 0.998 updated to the newest version, 0.999991, and tried again. Now the input file is apparently no longer recognized. This is from the error file:

Processing file 1 of 1 [GBS_L001_R1_001_fastq.gz]
Failed to open gzipped file '/home/mklas/work/GBS_L001_R1_001_fastq.gz': No such file or directory

The directory is fine, since the barcode file is read properly, so it looks like the problem is with the file.

Is this a problem with Linux or with process_radtags v0.999991, and how could I get around it?

Julian Catchen

May 31, 2013, 1:45:05 PM
to stacks...@googlegroups.com
Hi Manfred,

I just double-checked process_radtags here with your parameters and filename but
using one of my single-end gzipped, raw Illumina files and it worked fine. Is it
possible your file is corrupted? Can you gunzip it successfully?

Best,

julian
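
The two questions here (can the file be opened at all, and does it decompress cleanly) can be checked without writing the uncompressed data to disk. A hedged Python sketch that does roughly what `gzip -t` does; `check_gzip_file` is an illustrative name and the returned messages are made up for this example:

```python
import gzip
import os
import zlib


def check_gzip_file(path, chunk_size=1 << 20):
    """Return (ok, message) after streaming through a gzipped file.

    The file is decompressed chunk by chunk and the data is discarded,
    so a truncated or corrupted archive is detected without unpacking it.
    """
    if not os.path.isfile(path):
        return False, "path does not exist or is not a regular file"
    if not os.access(path, os.R_OK):
        return False, "file exists but is not readable by this user"
    total = 0
    try:
        with gzip.open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error) as err:
        return False, f"decompression failed: {err}"
    return True, f"OK, {total} uncompressed bytes"
```

If this reports the file as readable and intact while process_radtags still cannot open it, the problem more likely lies in how the program was built or run than in the file.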


Manfred Klaas

Jun 4, 2013, 11:45:57 AM
to stacks...@googlegroups.com, jcat...@uoregon.edu
Hi Julian,
Thanks for the quick reply! I unzipped the file without an error and looked at the first lines of the output; they seemed OK (FASTQ-formatted sequences). I had actually already used the .gz file as input to fastqc for a quality check, and that program opened the .gz file without a problem. I have now started process_radtags using the unzipped version of the input file, and that worked.
Thanks a lot,
Manfred 

Julian Catchen

Jun 4, 2013, 2:49:22 PM
to stacks...@googlegroups.com, mjk...@yahoo.com
Hi Manfred,

I'm not sure you want to pursue this further, but if you do, you could re-gzip
your file and try it in process_radtags one more time. And, if it fails to work
you could provide the file (or a fragment of the file) to me for debugging.

But, of course, you may just want to move on with your analysis.

Also, was the file you fed into process_radtags the same, raw file you fed into
fastqc, or was it the file processed by fastqc?

If anyone else experiences unreadable gzipped files, let me know, so we can try
to find a test case to debug the software.

Best,

julian

Manfred Klaas

Jun 6, 2013, 10:55:02 AM
to stacks...@googlegroups.com, mjk...@yahoo.com, jcat...@uoregon.edu
Hi Julian,
I have gzipped the unzipped file again, using the default compression level, so the resulting file was a little smaller than the one we got from the sequencing company (9.7 vs 11.9 GB). The result was the same as with the original .gz file: Failed to open gzipped file '/home/mklas/work/temp/gbs_misc_new.gz': No such file or directory
The input file for fastqc and for my first process_radtags runs was the same 11.9 GB file provided by the sequencing company.
 
How could I send you the file, or a fragment that is meaningful enough for debugging?
Best regards,
Manfred 

Julian Catchen

Jun 7, 2013, 4:57:46 PM
to stacks...@googlegroups.com
Hi Manfred,

After thinking about this a bit (and thinking about the difficulty of moving a
12G file), I wonder if Stacks failed to find and include the gzip library (libz)
during the compilation step. Are you able to read any gzipped files with
process_radtags?

I would be inclined to rebuild it and look for errors during configure or during
make where libz fails to be found. In the compilation steps you should see "-lz"
in the linking step, like this:

g++ -fopenmp -g -O2 -fopenmp -o process_radtags
src/process_radtags-process_radtags.o src/process_radtags-utils.o
src/process_radtags-write.o src/process_radtags-clean.o
src/process_radtags-file_io.o src/process_radtags-input.o -lz -lgomp

julian
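
When rebuilding, the link step can be checked automatically instead of by eye. A small Python sketch, assuming the `make` output was saved to a text file with each command on a single line; `links_against_libz` is a hypothetical helper, and matching on `-o process_radtags` simply mirrors the example link command above:

```python
import shlex


def links_against_libz(build_log, program="process_radtags"):
    """Report whether the link command for `program` includes -lz.

    A link command is taken to be any log line whose arguments contain
    "-o <program>". Returns True or False for the first such line,
    or None if no such line is found.
    """
    for line in build_log.splitlines():
        try:
            args = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        for i, arg in enumerate(args[:-1]):
            if arg == "-o" and args[i + 1] == program:
                return "-lz" in args
    return None
```

A result of `None` only means that no matching line was found, for example because the build wraps the link step differently, so it should not be read as proof either way.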