Problems with output files in process_radtags

Matthias Bernt

Apr 18, 2019, 11:35:59 AM
to Stacks
Dear list,

I have observed two problems with files created by process_radtags (stacks v2.3c).

1. The gzip-compressed output files are sometimes corrupt. When I decompress them with gunzip, I get messages like these:

```
gzip: D_ITH_C.reverse.fq.gz: invalid compressed data--format violated
gzip: D_ITZ_A.forward.fq.gz: decompression OK, trailing garbage ignored
gzip: U_COR_12.forward.fq.gz: invalid compressed data--crc error
gzip: U_COR_12.forward.fq.gz: invalid compressed data--length error
```

The gzipped files often have a size of only 20 bytes (maybe the fastq files are empty?).

2. I observed one case where an invalid fastq file was generated. Here is a snippet:


```
...
@150_2_2201_3628_3247/1
AATTCCTAATGCTTATGTTAAAACGTTCCAAGGTCCTCCTCACGGAATCCAAGTGGAAAGAGATAAATTGAACAAGTATGGTCGTCCTCT
+

+
FFFHHHHHJJJJJJIJJJJJJJJJJHJJJJJJJGIJJJJJJJJJJJJJJJJJJFHGJJJJJJJJJIIHHHEHHFEFDFFEECEDDDDDDDDDE
@150_2_2201_3875_3190/1
AATTCAACGGCGCTGTCCGTTTCCCCAAAGAGTCGACGGTTTGCCCAGAAAGTTCAGCCGATGGTCGACGGTGGTCGTCGGGGTCGTCGAATA
....
```
I'm currently running it again to ensure that I can reproduce the error.
Best,
Matthias
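
For anyone who wants to scan their own output for the kind of broken record shown in the snippet above, here is a minimal sketch of a check. It assumes strict 4-line FASTQ records (no wrapped sequence or quality lines), and the function name `find_bad_fastq_records` is made up for this illustration; it is not part of Stacks.

```python
from typing import Iterable, Iterator, Tuple


def find_bad_fastq_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, reason) for each malformed 4-line record.

    Assumption: every record is exactly four lines (header, sequence,
    '+' separator, quality), which is what the snippet in this thread uses.
    """
    block = []
    number = 0
    for number, raw in enumerate(lines, start=1):
        block.append(raw.rstrip("\n"))
        if len(block) < 4:
            continue
        header, seq, plus, qual = block
        start = number - 3  # line number of the record's header
        if not header.startswith("@"):
            yield start, "header does not start with '@'"
        elif not plus.startswith("+"):
            yield start, "third line does not start with '+'"
        elif len(seq) != len(qual):
            yield start, f"sequence length {len(seq)} != quality length {len(qual)}"
        block = []
    if block:
        # Leftover lines that do not form a complete record.
        yield number - len(block) + 1, "incomplete record at end of file"
```

It can be fed any iterable of lines, for example `find_bad_fastq_records(gzip.open(path, "rt"))`. Once one record is broken, the 4-line framing is lost, so only the first reported problem should be trusted; later reports are likely knock-on effects.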

Matthias Bernt

Apr 23, 2019, 2:10:28 PM
to Stacks
The second problem can be reproduced. A closer look at the data makes me suspect that a "duplicate" entry in the barcode file is the cause (it would be nice if someone could verify this).
By duplicate I mean that there are two individuals with different barcode and index entries.

Best,
Matthias
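
To see whether a barcode file contains such entries, something like the following sketch may help. It rests on two assumptions that are not confirmed in this thread: that "duplicate" means the same sample name listed on more than one row with different barcodes, and that each row is whitespace-separated with the sample name in the last column. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def samples_with_multiple_barcodes(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each sample name that appears on more than one row to its barcodes.

    Assumption: the last column is the sample name and the preceding
    column(s) hold the barcode, or the barcode and index pair.
    """
    barcodes_by_sample = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        *barcode, sample = fields
        barcodes_by_sample[sample].append("+".join(barcode))
    # Keep only the samples that were listed more than once.
    return {s: b for s, b in barcodes_by_sample.items() if len(b) > 1}
```

Calling it as `samples_with_multiple_barcodes(open("barcodes.tsv"))` (file name made up) returns an empty dict when every sample has exactly one row.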

Tanner Myers

Jan 19, 2021, 5:35:55 PM
to Stacks
I think that I also encountered this problem in the gzipped fastq files output by process_radtags. 

When I piped the output of zcat to head, all of my fastq files looked normal:
zcat 10202.1.fq.gz | head
@241_8_1101_12763_993/1
TGCAGGAGCCCATCATATATCTCAGGGTTACTTCTGGTGGGACTCTGTTCCAGGTCCAGCTCTGCAGCATGCTCTGAGGAAGCCTTGAGGATATGGGTGACTACCCATATCCTCCTTTAACAAGCCAAAAACAGCACGGAAAGAC
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JJAJJJAJJJFJJJFJJJAJJJ7FJ7AJJJJJJJJJFJJA7FJJJ
@241_8_1101_11272_1010/1
TGCAGGAAAGGCACAATCAAAGTATTCCTGGCAGGTGGCCATCCAGCTTTCACTTAAAAGCCTCCAAGGAAGGAGCTTCCACCAGGCTCTGAGGACGCAGAGAGTTCCCCTGCTGAATTGCTCTTCTTGTGGTCAGGAAGTTCTT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<FJJJJJJJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFJJF-FFF7JJ7JJJJJJJ-FAJFFJJAFJJJJJF-AJJF-AJJ-FJ
@241_8_1101_13809_1010/1
TGCAGGATAGGGAAGCTCTGAATCGATACGTCTCTCCAAGGTCCCGTTTGGCTCACTCATAGCTATGAGAGCTTACTTGAAGGCTACGCTTCAGCACCTTGGATAGCTCTTCAAAGCTTGAATGGGGATGGGTGTCGAATCCAAC

But, when piping to tail, I received the following error message:
zcat 10202.1.fq.gz | tail

gzip: 10202.1.fq.gz: invalid compressed data--format violated
ATG7J27JACAGCFJGTCCG0005C79T7T42-CAGATTACT42-CAGGGGGF7CA0TAA<78T7CG<TTFA44444444TA00FF3CTGC7TFAGGAC02<GT<C544507
CACTGJ<<<76<2AF7CJ-T4GC5GG-GA-CC5AA9JJJAA77GT<C0-F1TFCACGTTCA9GTTATJ-JAG3F-ACCGAFF7GG2AF7AJ757FGG7FTA<FA<J0A05FAAA-FTGJJTJCTA77JGTCG34G
GATJ7GAAFC-GAA-FTGJ-A5TAC0050ACAGFG55T-5GG-GT-GJ7T<C-FGJ<--F-77777GA55AAJ5CT
G7TTTT3-F0G--C-TA---3JT2AF7AJ7
8AA7-J
CGC7T222CFFTG17T-444TA0ACJ<TJJJ777777777777GFAGCTTF-5FJTAF77AA7AA35FCCCA0CTGGF<CA5
CACTGJ75---C7C07GCCFG7J7TC6ACC74<JGTJA777ACFA22TTJ7A<AA27-77A8-8TAAT5G2J407ATTC-8T-77AAG2JTG7AGT<<5500GGG-C-AFC4J7GATTA7777777A5
CACTGJ75-7A5
AGT7GACAFGGA55G7JFC-GAA-FTGJ-<7GT7_CA4A7G77JCAFJGGA-CGFG55T-5CJCFAG1AF7GFGFA<TAAJTJT7777777GFAGC5JCTGCTCF<377777A5
T77FGGC63<FFF63<F7F-7GGAG7G30<FCF-AAC-C27A77A5GA------A79A-AAFJJTGA64TFFCCA74<-7CAF-6CG0J<AAACG9CATJ7A<AAGJAATG17--AC7F4-AC5AJAJ5C0-3A2TJ6CT7TAAA5F00CAA7AG-GAG7G30<ATCC5<GCF7FTG7A1GGGG7CJ607T8JAA7CGCA2-AAATFFAACAAGFAJGAA<443-AF343-TAGA_C-CFCGGGG2G4G56<C04A-CGJJTAGA7G177AGTCGFF37CAGGTAAGCATCT1GGGG7CT7CAGGAG<F716GA1G7CAGFAF775-3JT21A<TAACT7770F4<A7J2AF7GGA7-TA_C7A67GA<JGG<C<78TC--GGA5TG8J9GT-JG-GAGG5TG77<TGGGGGC5ACA-AG7AAFC-TGJG-GAGGAFJCTGT2AAA_CA<7--JJ7T<CCCCCCCAFT<GAGTCG7FAJ<AATCF7AAAJ9AAGT-GGGTTTTTTT2AF4<GCF7AFFTAG2J4FAJ<AA6C-A0<-AAATA7G17AAAFA9JAJT73<7-AGAG777-77AAG2JT-CT7TG6ATTFF7-<GA<9CG3-F_F--77AGA5FCAG7<57FGAF9T<86ACTGGAFFT6CGTT-8G1TT-7<A7JG15G----011<AAFJ10AF7TG8J9GT-JG-GAGG5TG7FAGC<AATTTTAC4AATC<<AGAAGGAA7CFAJ9GT-JG8TC--ACACJ<GG8FCA5TACT7T42AT-8GAAA-AFA7G<A1FFF7GJT-CT7TG6AAGGGGG77CTG22AGAF7CJAJJJJTTAG0JJCCAG777A7A<1GG57<A7FGC<7T19TFC6AAA477GTF<AFCGT0TT55T7AGA5C5A-FGJ5FTTJ1-GGT74G227G2AF4GT7FGA1<T7FGA1<GG8FCAA0A0A4GJFTATTTTTTTTTG00FF3CTFJGC4AAGC7G70GG      JJAJAG9F1TAF2CA<6JF7CTA7-CG<TT111GG210ATGGGAG7<8GATC5FTGF<AFC5CAAAJ9<AAF-AT<ACAGGF<7C7==ރDB�ACAGGF<7C7==ރD8�s8�s8�s8�s

There isn't anything in the process_radtags log or standard error/output (.oe) files to suggest that the job didn't finish correctly, so it's unclear to me what is causing gzip to fail on some of my data.

Thanks,
Tanner
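
Since the logs give no hint, one option is to test every output file after the run by decompressing it completely. The sketch below does that with Python's standard `gzip` module; the function names and the `*.fq.gz` pattern are assumptions for illustration. It also flags files that decompress to nothing: an empty gzip stream written without a stored file name is 20 bytes (10-byte header, 2-byte empty deflate block, 8-byte trailer), which would be consistent with the 20-byte files reported at the top of this thread.

```python
import gzip
import zlib
from pathlib import Path
from typing import Dict, Optional


def gzip_problem(path: Path, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return None if the file decompresses cleanly, else a short description."""
    total = 0
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so that corruption anywhere in the file is hit.
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        return f"{type(exc).__name__}: {exc}"
    if total == 0:
        return "decompresses to zero bytes"
    return None


def check_directory(directory: Path, pattern: str = "*.fq.gz") -> Dict[str, str]:
    """Map file name to problem description for every file that fails the check."""
    problems = {}
    for path in sorted(directory.glob(pattern)):
        problem = gzip_problem(path)
        if problem is not None:
            problems[path.name] = problem
    return problems
```

Python's gzip module may treat trailing bytes differently from the command-line tool, so running `gzip -t` on the same files remains a useful cross-check.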

Tanner Myers

Jan 19, 2021, 6:38:19 PM
to Stacks
I forgot to mention this, but I used stacks 2.5 to run process_radtags.

Nolan Bornowski

Oct 18, 2021, 5:06:39 PM
to Stacks
Hi all,

Just wanted to chime in since I am also having this problem with process_radtags on stacks v2.55.

I have some samples that were sequenced with multiple barcodes (e.g. 1 unique sample : 3 unique barcodes), and process_radtags messes up the demultiplexed .gz output for those samples. Other samples with a 1 sample : 1 barcode entry are demultiplexed fine.

No error codes were thrown in the logs or output, as Tanner mentioned. I was only alerted to this when I was QCing the demultiplexed files. I noticed that the number of output reads was a fraction of what I would expect and that the .gz file appeared to be truncated.

Running the same command as Tanner:

zcat test.fq.gz | tail

gzip: test.fq.gz: decompression OK, trailing garbage ignored


Demultiplexing with NGSEP and Cutadapt worked fine for the 1 sample : multiple barcodes entries, in case anyone is looking for an alternative solution for this use case, though they were slower and lack the nice built-in filters that stacks provides.

Best,
Nolan

Nolan Bornowski

Oct 20, 2021, 7:04:37 AM
to Stacks
Update: I installed Stacks v2.59 and demultiplexing of samples with multiple barcodes worked fine. Their fq.gz files were not truncated and contained the same reads as when demultiplexing with the other demultiplexers.
It seems this feature was added in v2.57.
Thanks,
Nolan
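
For anyone who wants to repeat this kind of comparison between two demultiplexers, a rough sketch follows. It compares the read IDs found in two gzipped 4-line FASTQ files. The function names are hypothetical, and the `normalize` hook is there on the assumption that different tools may write read names differently (for instance with or without a `/1` suffix), in which case the names need to be brought into a common form first.

```python
import gzip
from typing import Callable, Set, Tuple


def read_ids(path: str, normalize: Callable[[str], str] = lambda name: name) -> Set[str]:
    """Collect read IDs from a gzipped FASTQ file with 4-line records.

    The ID is the header text after '@', up to the first whitespace,
    passed through the caller-supplied normalize function.
    """
    ids = set()
    with gzip.open(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 != 0:
                continue  # only header lines carry the read name
            name = line[1:].strip()
            if name:
                ids.add(normalize(name.split()[0]))
    return ids


def compare_read_sets(
    path_a: str,
    path_b: str,
    normalize: Callable[[str], str] = lambda name: name,
) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in file A, IDs only in file B)."""
    ids_a = read_ids(path_a, normalize)
    ids_b = read_ids(path_b, normalize)
    return ids_a - ids_b, ids_b - ids_a
```

Two empty sets mean both files contain the same reads under the chosen normalization. A truncated or corrupt file will raise an error during reading rather than silently returning a partial set.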
