High percentage of barcode not found


steven yodas

Jan 9, 2025, 11:56:49 AM
to Stacks
Dear all, 
I am new to Stacks and am currently facing an issue during demultiplexing. My experiment followed the protocol by Paterson (2012), using PstI-HF and MspI for digestion. I designed 5 PCR indices, and the sequencing facility provided me with 10 fastq.gz files (5 indices, paired-end 150 bp). I used process_radtags to demultiplex the samples according to the barcode list, but only a low percentage of reads was retained (below 20%). Even after raising the allowed barcode mismatches (--barcode-dist-1) to 2, the percentage of retained reads stayed low. My lab mate, who followed the same protocol with the same materials, did not run into this problem. I am wondering whether something is wrong with my script or barcode list, and I would greatly appreciate any insights or suggestions.

Here is the script that I used to process one of the indices.
>process_radtags -P -p stacklib1 -b lib1new.tsv -o stacklib1 --inline-null --renz-1 pstI --renz-2 mspI -c -q -r --barcode-dist-1 1 -i gzfastq -y gzfastq --score-limit 10

And below is what my barcode list looks like (lib1new.tsv):
AACCTA TARI_SP80
CAGATA TARI_SP63
GATCAT TARI_SP71
GAAGTG AVPP1708
GATTCA AVPP1714
ATAGAT 2257_001
GGCTTA 2257_002
GAACTT 2257_003
GCGCTT 2257_005
TGTGGA 2257_006
TGGATA 2257_008
TTCACG 2257_009
TCTTGG 2257_010
ATAAGG 2257_011
TGTACA 2257_013
TTCGTT 2257_014
TCGGCG 2257_015
CCTTCG 2257_017
GGATTG 2257_018
AATTAG 2257_019
ACAACT 2257_020
CGTCTG 2257_021
ACTGCT 2257_024
AACTGG 2257_025
CTGTTG 2257_026
CCGACG 2257_027
TTAGAG 2257_028
ACCAAG 2257_029
TTGGAA 2257_030
GTCATG 2257_031
CATAAG 2257_032
CCGTGA 2257_034
CTGTGT 2257_035
CTAACA 2257_036
GGCCTG 2257_037
TGACGT 2257_038
ACTGAG 2257_039
GCGCCG 2257_041
TCCGAG 2257_044

Here is the demultiplexing result for my preliminary data:
[Attached image: Screenshot 2025-01-09 144624.png]

I would greatly appreciate any advice or guidance on what might be going wrong and how I can resolve this issue.

Thank you so much in advance!

Catchen, Julian

Jan 9, 2025, 12:26:41 PM
to stacks...@googlegroups.com

Hi, are your barcodes inline with the sequence (part of the single-end read), or an Illumina index that is provided in the FASTQ header of the sequenced files? See this for examples of what I mean:

 

https://catchenlab.life.illinois.edu/stacks/manual/#barcode
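
One quick way to answer the inline-versus-index question is to look at what the reads actually start with. The sketch below is only an illustration, not part of Stacks: `tally_read_starts` is a made-up helper name, and it assumes plain four-line FASTQ records arriving on standard input. If the barcodes are inline, the most frequent 6-base prefixes should match entries in the barcode list (for PstI they would normally be followed by the TGCAG cut-site remnant).

```shell
# Tally the first N bases of every read in a FASTQ stream on stdin.
# Prints "count<TAB>prefix", most frequent prefix first.
# tally_read_starts is a hypothetical helper, not a Stacks command.
tally_read_starts() {
  awk -v n="$1" '
    NR % 4 == 2 { counts[substr($0, 1, n)]++ }   # sequence is line 2 of each 4-line record
    END { for (p in counts) printf "%d\t%s\n", counts[p], p }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Example call (the file name is a placeholder for one of your R1 files):
#   zcat lib1_R1.fastq.gz | head -n 400000 | tally_read_starts 6 | head
```

If the top prefixes do not resemble the barcode list at all, the barcodes are probably not where `--inline-null` expects them.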

 


--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/stacks-users/9318f30c-e942-4ab5-8614-a6e0535e1be8n%40googlegroups.com.

Laís Aline Grossel

Apr 16, 2025, 6:07:27 PM
to Stacks
Hi, guys!
Let me join this conversation, because I have a similar problem (but maybe a little bit worse lol).
We used 3RAD with ClaI and EcoRI for digestion, and Illumina for sequencing. We received the samples already demultiplexed (four files per sample). Since the barcodes appear only in the file headers, I understood that I didn't need to remove them (I don't even know whether it's possible to remove something from the header). Is that right?

This is what the beginning of the files looks like:
zcat files_1/F10n05_R1_001.fastq.gz | head -n 20
@LH00401:259:22HKMYLT4:5:1101:2449:1070 1:N:0:AACCTTGG+GTCAGTAC
TNATGGTCAATCGGCCTCAAGGCGCGAATTATCGTGCACAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAACCTTGGATCTGGGGCGGCGCCCCCTCCCCTCAAAAGAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
I#IIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9II-9----9-9------------99--99---9-9I99-99-9999999I99I9IIIIII9-99
@LH00401:259:22HKMYLT4:5:1101:46365:1084 1:N:0:AACCTTGG+GTCAGTAC
TNATGGTCAATCGGAACAAATCAGTTTTAAGTGGGACTGTCTGCTCAGTGCTATGACGACCTGGTGCGCAACGGCTGAGGAAGAAAGAATTATCGTGCACAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAACCTTGGATCTGGG
+
I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIII9IIII-II9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIII9IIIIIIIIIIIII99I-
@LH00401:259:22HKMYLT4:5:1101:49853:1098 1:N:0:AACCTTGG+GTCAGTAC
TNATGGTCAATCGGGCGAGGGTCCGAGAATTATCGTGCACAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAACCTTGGATCTGGGGGGGCCCCCCCCCCCCCCCAACAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
9#IIIII9999II9IIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIII9IIIIIIII99I9----9--9-----9------9--9--------99999-99999I-II-9I9999-99999
@LH00401:259:22HKMYLT4:5:1101:8866:1140 1:N:0:AACCTTGG+GTCAGTAC
TCATGGTCAATCGGAACACCCAGGACGCGATCTCACCATCTCGAGATCTCGGGAGCTGTGTCTTAGCCCACGTGGCCATCCCGCCCCCTTTCAAGAAACGTCCATAGCCGTCCTGTTAGCACGGCTCCTTTATATCAAAGACAACTTTCT
+
IIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII99IIIIIIIII9IIIIIII--99III9IIII999IIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIII
@LH00401:259:22HKMYLT4:5:1101:26272:1154 1:N:0:AACCTTGG+GTCAGTAC
TCATGGTCACTCGATTCTGTTTCCCACAGTTTTCAAACTGCTGAAAATTTTCTGTACCTTACCAGTGTCCACAAGCACAGCTGAACGTTCATTTCCATCTTTGATAATAGTTAAAACATACTTACGCAGTACAATGTCCGAAAACCGTCT
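
To confirm that a file really contains a single index combination, the header lines can be tallied. This is a rough sketch that assumes Illumina-style headers like the ones above, where the index pair is the last colon-separated field; `tally_header_indexes` is a hypothetical helper name and the input must be uncompressed FASTQ on standard input.

```shell
# Count how often each index string appears in the FASTQ headers on stdin.
# Assumes the index pair is the last ":"-separated field of the header line.
# tally_header_indexes is a hypothetical helper, not a Stacks command.
tally_header_indexes() {
  awk '
    NR % 4 == 1 { n = split($0, parts, ":"); counts[parts[n]]++ }   # header is line 1 of each record
    END { for (i in counts) printf "%d\t%s\n", counts[i], i }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Example call:
#   zcat files_1/F10n05_R1_001.fastq.gz | tally_header_indexes | head
```

A file that was demultiplexed by the facility should be dominated by one index pair, possibly with a few near-matches.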

I'm still unsure about the barcodes because when I run process_radtags, the percentage of good-quality reads is around 0.1%, which is making me desperate. In my case, though, the problem is "RAD cutsite not found" (this example is for a single sample, but the issue is the same for the others):
1657030 total sequences
      0 barcode not found drops (0.0%)
      0 low quality read drops (0.0%)
1656811 RAD cutsite not found drops (100.0%)
    219 retained reads (0.0%)
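
When nearly every read is dropped for a missing cut site, it can help to check where in the read the expected cut-site remnant actually sits. The sketch below is an assumption-laden illustration: `tally_motif_offsets` is a hypothetical helper, the motif has to be replaced with the remnant expected for your enzyme and adapter design, and the input is uncompressed FASTQ on standard input. It reports the 1-based position of the first occurrence of the motif within the first few bases of each read, with 0 meaning it was not found there.

```shell
# For each read on stdin, find where MOTIF first occurs within the first WINDOW bases.
# Prints "count<TAB>offset" (offset is 1-based; 0 = motif not found in the window).
# tally_motif_offsets is a hypothetical helper, not a Stacks command.
tally_motif_offsets() {
  awk -v motif="$1" -v window="$2" '
    NR % 4 == 2 {
      pos = index(substr($0, 1, window), motif)
      counts[pos]++
    }
    END { for (p in counts) printf "%d\t%d\n", counts[p], p }
  ' | LC_ALL=C sort -k1,1nr -k2,2n
}

# Example call (MOTIF is a placeholder for the remnant you expect):
#   zcat files_1/F10n05_R1_001.fastq.gz | head -n 400000 | tally_motif_offsets MOTIF 20
```

If most reads show the motif at an offset greater than 1, something precedes the cut site in the read (for example an inline barcode or spacer), which would be consistent with process_radtags not finding the cut site at the position it expects.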


Thank you for any ideas and suggestions!
Best,
Laís