Prinseq-lite: generates a single line with 160,000,000 characters

prinseq

lee

Sep 21, 2016, 7:41:10 AM

to Edwards Lab Tools

I am currently running prinseq-lite-0.20.4.

The command I ran was

perl prinseq-lite-0.20.4/prinseq-lite.pl -lc_method entropy -lc_threshold 60 -min_qual_mean 20 -ns_max_n 15 -min_len 20 -trim_tail_right 8 -fastq somefile.fastq -out_good fastq/qc_good -out_bad fastq/qc_bad

When I inspect the qc_good.fastq file generated by prinseq-lite, it contains a single line with over 160,000,000 characters (most of which are non-ASCII), and ~3 million entries from somefile.fastq are missing from qc_good.fastq. All the other lines in the file are formatted normally.

The read that gave me grief (shown here as it appears in a prinseq run that did not mess up) was:
@SN1052:358:C9A18ACXX:4:1101:2222:7943 1:N:0:GTGAAA
CTCAGCAATAGGCAAGTTATTCTAATCATATGTTATCCCAAAAGGCTTCT
+SN1052:358:C9A18ACXX:4:1101:2222:7943 1:N:0:GTGAAA <---------This line was too long!
@@@FFFFDDFHHHGGBHCGGHI?FIIJDG@HHIJJJEHJJI9DGIJJFHF

The input file was 5.2GB and it was run as a batch job with 60GB allocated to it. When I re-ran the job, this strange line was _not_ reproduced.

Are there any ideas out there as to what may be going on or how to further debug this problem?
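
For anyone who wants to check their own output for the same symptom, here is a minimal sketch of the scan I have in mind. It is not part of prinseq; the function name `find_suspect_lines` and the 1000-byte `max_len` threshold are my own assumptions. It reads the file in binary mode and reports any line that is unusually long or contains non-ASCII bytes.

```python
import sys

# Every byte value that counts as plain ASCII (0-127).
ASCII_BYTES = bytes(range(128))

def find_suspect_lines(path, max_len=1000):
    """Yield (line_number, length, non_ascii_count) for suspicious lines.

    max_len is an arbitrary threshold (an assumption): it only needs to be
    longer than any legitimate header or read in the file.
    """
    # Binary mode, so corrupted bytes cannot raise a decoding error.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\r\n")
            # Deleting all ASCII bytes leaves only the non-ASCII ones.
            non_ascii = len(line.translate(None, ASCII_BYTES))
            if len(line) > max_len or non_ascii:
                yield number, len(line), non_ascii

if __name__ == "__main__" and len(sys.argv) > 1:
    for number, length, non_ascii in find_suspect_lines(sys.argv[1]):
        print(f"line {number}: {length} bytes, {non_ascii} non-ASCII")
```

Saved under any script name and run with the path to qc_good.fastq as its only argument, it prints the position of each bad line, which could then be compared against the same position in a clean run.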